rss

v0.0.0-...-ed60a1d Latest
Published: Jan 9, 2024 License: BSD-3-Clause Imports: 17 Imported by: 0

README

erkie/rss

This library is based on https://github.com/slymarbo/rss. It is a fork with backwards-incompatible changes and fixes relative to the original repo.

Changes

  • Changes to the Feed struct, removing unread flags and more
  • Changes to feed parsing, for compatibility reasons
  • ... and more. This README was written long after the fork was created, so feel free to check the history.

Fixes

  • Charset decoding is a lot more lenient than standard Go toward faulty or broken encodings.

Documentation

Overview

Package rss is a small library for simplifying the parsing of RSS and Atom feeds.

The package could do with more testing, but it conforms to the RSS 1.0, 2.0, and Atom 1.0 specifications, to the best of my ability. I've tested it with about 15 different feeds, and it seems to work fine with them.

If anyone has any problems with feeds being parsed incorrectly, please let me know so that I can debug and improve the package.

Example usage:

```go
package main

import "github.com/erkie/rss"

func main() {
	feed, err := rss.Fetch("http://example.com/rss")
	if err != nil {
		// handle error.
	}

	// ... Some time later ...

	err = feed.Update()
	if err != nil {
		// handle error.
	}
}

```

The output structure is pretty much as you'd expect:

```go
type Feed struct {
	Nickname    string // This is not set by the package, but could be helpful.
	Title       string
	Description string
	Link        string // Link to the creator's website.
	UpdateURL   string // URL of the feed itself.
	Image       *Image // Feed icon.
	Items       []*Item
	Refresh     time.Time // Earliest time this feed should next be checked.
}

type Item struct {
	Title   string
	Summary string
	Content string
	Link    string
	Date    time.Time
	ID      string
	Read    bool
}

type Image struct {
	Title  string
	Url    string
	Height uint32
	Width  uint32
}
```

The library does its best to follow the appropriate specifications and not to set the Refresh time too soon. It currently follows all update time management methods in the RSS 1.0, 2.0, and Atom 1.0 specifications. If a feed does not provide one, it defaults to 10-minute intervals. If you are having issues with feed providers dropping connections, please let me know and I can increase this default, or you can increase the Refresh time manually. The Feed.Update method uses this Refresh time, so if Update seems to be returning very quickly with no new items, it is likely not making a request because the provider's Refresh interval has not yet elapsed.
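
The Refresh behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's code: `cachedFeed`, `shouldFetch`, `scheduleNext`, and `demo` are hypothetical names, and only the `Refresh` field and the 10-minute default come from the text above.

```go
package main

import (
	"fmt"
	"time"
)

// defaultRefresh mirrors the 10-minute default described in the README.
const defaultRefresh = 10 * time.Minute

// cachedFeed is a hypothetical stand-in type; only the Refresh field is
// taken from the README's Feed struct.
type cachedFeed struct {
	Refresh time.Time // Earliest time this feed should next be checked.
}

// shouldFetch reports whether a network request is due at time now.
func (f *cachedFeed) shouldFetch(now time.Time) bool {
	return !now.Before(f.Refresh)
}

// scheduleNext sets Refresh from the provider's interval and falls back to
// the default when the feed supplied none (zero or negative).
func (f *cachedFeed) scheduleNext(now time.Time, providerInterval time.Duration) {
	if providerInterval <= 0 {
		providerInterval = defaultRefresh
	}
	f.Refresh = now.Add(providerInterval)
}

// demo runs a small scenario and returns the three shouldFetch results.
func demo() []bool {
	now := time.Date(2024, time.January, 9, 12, 0, 0, 0, time.UTC)
	f := &cachedFeed{}
	first := f.shouldFetch(now) // zero Refresh: a fetch is due
	f.scheduleNext(now, 0)      // no provider interval: use the default
	early := f.shouldFetch(now.Add(5 * time.Minute))
	onTime := f.shouldFetch(now.Add(10 * time.Minute))
	return []bool{first, early, onTime}
}

func main() {
	fmt.Println(demo()) // [true false true]
}
```

A caller that gets `false` from a check like `shouldFetch` would return immediately without a request, which matches the "returning very quickly with no new items" symptom described above.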

Constants

const DATE = "15:04:05 MST 02/01/2006"

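DATE is a time layout in Go's reference-time notation: hour:minute:second, zone abbreviation, then day/month/year.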
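
As a quick illustration of what this layout produces, the sketch below formats a fixed UTC instant with it. The constant is copied from the declaration above; `formatStamp` and `exampleStamp` are hypothetical helpers, and where the package itself uses DATE is not shown here.

```go
package main

import (
	"fmt"
	"time"
)

// DATE is copied from the constant documented above.
const DATE = "15:04:05 MST 02/01/2006"

// formatStamp is a hypothetical helper that renders ts with the DATE layout.
func formatStamp(ts time.Time) string {
	return ts.Format(DATE)
}

// exampleStamp formats a fixed UTC instant so the result is deterministic.
func exampleStamp() string {
	return formatStamp(time.Date(2024, time.January, 9, 13, 4, 5, 0, time.UTC))
}

func main() {
	fmt.Println(exampleStamp()) // 13:04:05 UTC 09/01/2024
}
```

Note that the day comes before the month in this layout (`02/01/2006`), so 9 January renders as `09/01`.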

Variables

var TimeLayouts = []string{
	"Mon, _2 Jan 2006 15:04:05",
	"Mon, _2 Jan 2006 15:04:05 MST",
	"Mon, _2 Jan 2006 15:04:05 Z",
	"Mon, _2 Jan 06 15:04:05 MST",
	"Mon, _2 Jan 2006 15:04:05 -0700",
	"Mon, _2 Jan 06 15:04:05 -0700",
	"_2 Jan 2006 15:04:05 MST",
	"_2 Jan 06 15:04:05 MST",
	"_2 Jan 2006 15:04:05 -0700",
	"_2 Jan 06 15:04:05 -0700",
	"2006-01-02 15:04:05",
	"Jan _2, 2006 15:04 PM MST",
	"Jan _2, 06 15:04 PM MST",
	time.ANSIC,
	time.UnixDate,
	time.RubyDate,
	time.RFC822,
	time.RFC822Z,
	time.RFC850,
	time.RFC1123,
	time.RFC1123Z,
	time.RFC3339,
	time.RFC3339Nano,
	"2006-01-02T15:04:05",
	"02-Jan-2006 15:04:05",
}

TimeLayouts contains a list of time.Parse() layouts that are used in attempts to convert item.Date and item.PubDate strings to time.Time values. The layouts are tried in list order, from first to last, until either time.Parse() succeeds or all layouts have been tried.
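
The try-each-layout strategy described above can be sketched like this. It is an illustration only: `layouts` is a short excerpt of the list above, and `parseDate` is a hypothetical helper, not the package's own function.

```go
package main

import (
	"fmt"
	"time"
)

// layouts is a short excerpt of the TimeLayouts list documented above.
var layouts = []string{
	"Mon, _2 Jan 2006 15:04:05 -0700",
	"2006-01-02 15:04:05",
	time.RFC3339,
}

// parseDate is a hypothetical helper: it tries each layout in order and
// returns the first successful parse, or an error if none matches.
func parseDate(value string) (time.Time, error) {
	for _, layout := range layouts {
		if parsed, err := time.Parse(layout, value); err == nil {
			return parsed, nil
		}
	}
	return time.Time{}, fmt.Errorf("no layout matched %q", value)
}

func main() {
	inputs := []string{
		"Tue, 9 Jan 2024 10:00:00 +0000", // matches the first layout
		"2024-01-09T10:00:00Z",           // falls through to time.RFC3339
		"not a date",                     // matches nothing
	}
	for _, s := range inputs {
		parsed, err := parseDate(s)
		fmt.Println(parsed.Unix(), err)
	}
}
```

Because the first match wins, the order of the list matters when one input could satisfy more than one layout.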

Functions

func CharsetReader

func CharsetReader(theCharset string, input io.Reader) (io.Reader, error)

CharsetReader is a lenient charset reader suited to web input. Its signature matches the CharsetReader field of ParseOptions.
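
A function with this signature can also be assigned to the `CharsetReader` field of the standard library's `xml.Decoder`. The sketch below shows that wiring with a hypothetical stand-in, `lenientCharsetReader`, which only handles Latin-1 and passes everything else through. It is not the package's implementation, whose actual leniency rules are not documented here.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// lenientCharsetReader is a hypothetical stand-in with the same signature as
// CharsetReader above. It is NOT the package's implementation.
func lenientCharsetReader(charset string, input io.Reader) (io.Reader, error) {
	switch strings.ToLower(strings.TrimSpace(charset)) {
	case "iso-8859-1", "latin1", "latin-1":
		raw, err := io.ReadAll(input)
		if err != nil {
			return nil, err
		}
		// Each Latin-1 byte maps to the Unicode code point of the same value.
		runes := make([]rune, len(raw))
		for i, b := range raw {
			runes[i] = rune(b)
		}
		return strings.NewReader(string(runes)), nil
	default:
		// Lenient fallback (an assumption): pass unknown charsets through.
		return input, nil
	}
}

// decodeText decodes a one-element XML document and returns its text.
func decodeText(doc []byte) (string, error) {
	var out struct {
		Text string `xml:",chardata"`
	}
	dec := xml.NewDecoder(bytes.NewReader(doc))
	dec.CharsetReader = lenientCharsetReader
	if err := dec.Decode(&out); err != nil {
		return "", err
	}
	return out.Text, nil
}

func main() {
	// \xe9 is "é" encoded as a single Latin-1 byte.
	doc := []byte("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><t>caf\xe9</t>")
	text, err := decodeText(doc)
	fmt.Println(text, err)
}
```

Without a charset reader, `xml.Decoder` returns an error for any declared encoding other than UTF-8, which is why feeds from the open web need one.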

func DiscardInvalidUTF8IfUTF8

func DiscardInvalidUTF8IfUTF8(input []byte, responseHeaders http.Header) []byte

DiscardInvalidUTF8IfUTF8 checks whether input declares itself as UTF-8 and, if so, discards XML-invalid characters, because Go's XML parser returns an error when they are present.
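
To make "XML-invalid characters" concrete, here is a minimal sketch of the filtering step, using the character ranges allowed by the XML 1.0 `Char` production. `isXMLChar` and `stripInvalidXMLChars` are hypothetical names, and the check for whether the input declares UTF-8 (via the document or the response headers) is left out; the package's actual logic may differ.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isXMLChar reports whether r is allowed by the XML 1.0 Char production.
func isXMLChar(r rune) bool {
	return r == 0x9 || r == 0xA || r == 0xD ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// stripInvalidXMLChars is a hypothetical helper that drops malformed UTF-8
// bytes and runes that XML 1.0 does not allow, keeping everything else.
func stripInvalidXMLChars(input []byte) []byte {
	out := make([]byte, 0, len(input))
	for len(input) > 0 {
		r, size := utf8.DecodeRune(input)
		// RuneError with size 1 signals a malformed byte, not a real U+FFFD.
		malformed := r == utf8.RuneError && size == 1
		if !malformed && isXMLChar(r) {
			out = append(out, input[:size]...)
		}
		input = input[size:]
	}
	return out
}

func main() {
	// Contains a NUL, a stray 0xFF byte, and a vertical tab.
	dirty := []byte("a\x00b\xffc\x0bd")
	fmt.Printf("%q\n", stripInvalidXMLChars(dirty)) // "abcd"
}
```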

Types

type Enclosure

type Enclosure struct {
	URL    string `json:"url"`
	Type   string `json:"type"`
	Length string `json:"length"`
}

Enclosure holds enclosure data

func (*Enclosure) Get

func (e *Enclosure) Get() (io.ReadCloser, error)

Get returns an io.ReadCloser for the data held by the Enclosure

type Feed

type Feed struct {
	Type        string
	Title       string
	Description string
	Link        string // Link to the creator's website.
	UpdateURL   string // URL of the feed itself.
	Items       []*Item
	Links       []*Link
	Categories  []string
}

Feed is the top-level structure.

func Parse

func Parse(data []byte, options ParseOptions) (*Feed, error)

Parse parses RSS or Atom data.

func (*Feed) String

func (f *Feed) String() string

type Item

type Item struct {
	Title      string            `json:"title"`
	Summary    string            `json:"summary"`
	Content    string            `json:"content"`
	Category   string            `json:"category"`
	Link       string            `json:"link"`
	Date       time.Time         `json:"date"`
	ID         string            `json:"id"`
	Enclosures []*Enclosure      `json:"enclosures"`
	Meta       map[string]string `json:"meta"`
}

Item represents a single story.
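
The struct tags above determine how an Item serializes with `encoding/json`. The sketch below mirrors the documented `Item` and `Enclosure` definitions locally so that it compiles on its own; `sampleItem` and `encodeItem` are hypothetical helpers and the field values are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Enclosure and Item mirror the definitions documented above so that this
// example is self-contained.
type Enclosure struct {
	URL    string `json:"url"`
	Type   string `json:"type"`
	Length string `json:"length"`
}

type Item struct {
	Title      string            `json:"title"`
	Summary    string            `json:"summary"`
	Content    string            `json:"content"`
	Category   string            `json:"category"`
	Link       string            `json:"link"`
	Date       time.Time         `json:"date"`
	ID         string            `json:"id"`
	Enclosures []*Enclosure      `json:"enclosures"`
	Meta       map[string]string `json:"meta"`
}

// sampleItem builds an item with illustrative values.
func sampleItem() *Item {
	return &Item{
		Title:    "Hello",
		Category: "news",
		Link:     "http://example.com/1",
		Date:     time.Date(2024, time.January, 9, 10, 0, 0, 0, time.UTC),
		ID:       "1",
		Enclosures: []*Enclosure{
			{URL: "http://example.com/a.mp3", Type: "audio/mpeg", Length: "123"},
		},
		Meta: map[string]string{"source": "demo"},
	}
}

// encodeItem is a hypothetical helper that returns the item's JSON form.
func encodeItem(it *Item) (string, error) {
	b, err := json.Marshal(it)
	return string(b), err
}

func main() {
	out, err := encodeItem(sampleItem())
	fmt.Println(out, err)
}
```

Since none of the tags use `omitempty`, empty fields such as Summary and Content still appear in the output as empty strings.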

func (*Item) Format

func (i *Item) Format(indent int) string

Format formats an item nicely

func (*Item) String

func (i *Item) String() string

type Link

type Link struct {
	URL string
	Rel string
}

Link represents a link element as defined inside RSS feeds, which can carry various information

type ParseOptions

type ParseOptions struct {
	CharsetReader   func(charset string, input io.Reader) (io.Reader, error)
	ResponseHeaders http.Header
	FinalURL        string
}

type ParserFunc

type ParserFunc func(data []byte, options ParseOptions) (*Feed, error)

ParserFunc is the function signature for a parser. Parse has this same signature.
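
One way a function type like ParserFunc can be used is to hold several parsers and try them in turn. The sketch below is an assumption about usage, not a description of how the package dispatches internally: `Feed` and `ParseOptions` are trimmed local mirrors of the documented types, and `parseAtomStub`, `parseRSSStub`, and `tryParsers` are hypothetical.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Feed and ParseOptions are trimmed local mirrors of the documented types.
type Feed struct {
	Type  string
	Title string
}

type ParseOptions struct {
	FinalURL string
}

// ParserFunc matches the signature documented above.
type ParserFunc func(data []byte, options ParseOptions) (*Feed, error)

// parseAtomStub is a placeholder parser that only sniffs for an Atom root.
func parseAtomStub(data []byte, _ ParseOptions) (*Feed, error) {
	if !bytes.Contains(data, []byte("<feed")) {
		return nil, errors.New("not an Atom document")
	}
	return &Feed{Type: "atom"}, nil
}

// parseRSSStub is a placeholder parser that only sniffs for an RSS root.
func parseRSSStub(data []byte, _ ParseOptions) (*Feed, error) {
	if !bytes.Contains(data, []byte("<rss")) {
		return nil, errors.New("not an RSS document")
	}
	return &Feed{Type: "rss"}, nil
}

// tryParsers calls each parser in order and returns the first success.
func tryParsers(data []byte, options ParseOptions, parsers ...ParserFunc) (*Feed, error) {
	lastErr := errors.New("no parsers given")
	for _, parse := range parsers {
		feed, err := parse(data, options)
		if err == nil {
			return feed, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

func main() {
	data := []byte(`<rss version="2.0"></rss>`)
	feed, err := tryParsers(data, ParseOptions{}, parseAtomStub, parseRSSStub)
	fmt.Println(feed.Type, err)
}
```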
