gofeed

package module
v1.2.0
Warning

This package is not in the latest version of its module.

Published: Jan 29, 2023 License: MIT Imports: 16 Imported by: 813

README

The gofeed library is a robust feed parser that supports parsing RSS, Atom and JSON feeds. The library provides a universal gofeed.Parser that will parse and convert all feed types into a hybrid gofeed.Feed model. You also have the option of using the feed-specific atom.Parser, rss.Parser or json.Parser, which generate atom.Feed, rss.Feed and json.Feed respectively.

Features

Supported feed types:
  • RSS 0.90
  • Netscape RSS 0.91
  • Userland RSS 0.91
  • RSS 0.92
  • RSS 0.93
  • RSS 0.94
  • RSS 1.0
  • RSS 2.0
  • Atom 0.3
  • Atom 1.0
  • JSON 1.0
  • JSON 1.1
Extension Support

The gofeed library provides support for parsing several popular predefined extensions into ready-made structs, including Dublin Core and Apple’s iTunes.

It parses all other feed extensions in a generic way (see the Extensions section for more details).

Invalid Feeds

A best-effort attempt is made at parsing broken and invalid XML feeds. Currently, gofeed can successfully parse feeds with the following issues:

  • Unescaped/Naked Markup in feed elements
  • Undeclared namespace prefixes
  • Missing closing tags on certain elements
  • Illegal tags within feed elements without namespace prefixes
  • Missing "required" elements as specified by the respective feed specs.
  • Incorrect date formats

Overview

The gofeed library consists of a universal feed parser and several feed-specific parsers. Which one you choose depends entirely on your use case. If you will be handling RSS, Atom and JSON feeds, it makes sense to use the gofeed.Parser. If you know ahead of time that you will only be parsing one feed type, it makes sense to use rss.Parser, atom.Parser or json.Parser.

Universal Feed Parser

The universal gofeed.Parser works in 3 stages: detection, parsing and translation. It first detects the feed type that it is currently parsing. Then it uses a feed-specific parser to parse the feed into its true representation, which will be either an rss.Feed, an atom.Feed or a json.Feed. These models cover every field possible for their respective feed types. Finally, they are translated into a gofeed.Feed model that is a hybrid of all feed types. Performing the universal feed parsing in these 3 stages allows for more flexibility and keeps the code base more maintainable by separating RSS, Atom and JSON parsing into separate packages.

The translation step is done by anything that satisfies the gofeed.Translator interface. The DefaultRSSTranslator, DefaultAtomTranslator and DefaultJSONTranslator are used behind the scenes when you use the gofeed.Parser with its default settings. You can see how they translate fields from atom.Feed, rss.Feed or json.Feed to the universal gofeed.Feed struct in the Default Mappings section. However, should you disagree with the way certain fields are translated, you can supply your own gofeed.Translator and override this behavior. See the Advanced Usage section for an example of how to do this.

Feed Specific Parsers

The gofeed library provides three feed-specific parsers: atom.Parser, rss.Parser and json.Parser. If the hybrid gofeed.Feed model that the universal gofeed.Parser produces does not contain a field from the atom.Feed, rss.Feed or json.Feed model that you require, it might be beneficial to use the feed-specific parsers. When using the atom.Parser, rss.Parser or json.Parser directly, you can access all of the fields found in the atom.Feed, rss.Feed and json.Feed models. It is also marginally faster because you are able to skip the translation step.

Basic Usage

Universal Feed Parser

The most common usage scenario will be to use gofeed.Parser to parse an arbitrary RSS or Atom or JSON feed into the hybrid gofeed.Feed model. This hybrid model allows you to treat RSS, Atom and JSON feeds the same.

Parse a feed from a URL:
fp := gofeed.NewParser()
feed, _ := fp.ParseURL("http://feeds.twit.tv/twit.xml")
fmt.Println(feed.Title)
Parse a feed from a string:
feedData := `<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>`
fp := gofeed.NewParser()
feed, _ := fp.ParseString(feedData)
fmt.Println(feed.Title)
Parse a feed from an io.Reader:
file, _ := os.Open("/path/to/a/file.xml")
defer file.Close()
fp := gofeed.NewParser()
feed, _ := fp.Parse(file)
fmt.Println(feed.Title)
Parse a feed from a URL with a 60s timeout:
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
fp := gofeed.NewParser()
feed, _ := fp.ParseURLWithContext("http://feeds.twit.tv/twit.xml", ctx)
fmt.Println(feed.Title)
Parse a feed from a URL with a custom User-Agent:
fp := gofeed.NewParser()
fp.UserAgent = "MyCustomAgent 1.0"
feed, _ := fp.ParseURL("http://feeds.twit.tv/twit.xml")
fmt.Println(feed.Title)
Feed Specific Parsers

You can easily use the rss.Parser, atom.Parser or json.Parser directly if you have a usage scenario that requires it:

Parse an RSS feed into an rss.Feed:
feedData := `<rss version="2.0">
<channel>
<webMaster>example@site.com (Example Name)</webMaster>
</channel>
</rss>`
fp := rss.Parser{}
rssFeed, _ := fp.Parse(strings.NewReader(feedData))
fmt.Println(rssFeed.WebMaster)
Parse an Atom feed into an atom.Feed:
feedData := `<feed xmlns="http://www.w3.org/2005/Atom">
<subtitle>Example Atom</subtitle>
</feed>`
fp := atom.Parser{}
atomFeed, _ := fp.Parse(strings.NewReader(feedData))
fmt.Println(atomFeed.Subtitle)
Parse a JSON feed into a json.Feed:
feedData := `{"version":"1.0", "home_page_url": "https://daringfireball.net"}`
fp := json.Parser{}
jsonFeed, _ := fp.Parse(strings.NewReader(feedData))
fmt.Println(jsonFeed.HomePageURL)

Advanced Usage

Instantiate a Parser with basic authentication:
fp := gofeed.NewParser()
fp.AuthConfig = &gofeed.Auth{
  Username: "foo",
  Password: "bar",
}
Parse a feed while using a custom translator

The mappings and precedence order that are outlined in the Default Mappings section are provided by the following three structs: DefaultRSSTranslator, DefaultAtomTranslator and DefaultJSONTranslator. If you have fields that you think should have a different precedence, or if you want to make a translator that is aware of an unsupported extension, you can do this by specifying your own RSS, Atom or JSON translator when using the gofeed.Parser.

Here is a simple example of creating a custom Translator that makes the /rss/channel/itunes:author field have a higher precedence than the /rss/channel/managingEditor field in RSS feeds. We will wrap the existing DefaultRSSTranslator since we only want to change the behavior for a single field.

First we must define a custom translator:


import (
	"fmt"

	"github.com/mmcdole/gofeed"
	"github.com/mmcdole/gofeed/rss"
)

type MyCustomTranslator struct {
	defaultTranslator *gofeed.DefaultRSSTranslator
}

func NewMyCustomTranslator() *MyCustomTranslator {
	t := &MyCustomTranslator{}

	// We create a DefaultRSSTranslator internally so we can wrap its Translate
	// call since we only want to modify the precedence for a single field.
	t.defaultTranslator = &gofeed.DefaultRSSTranslator{}
	return t
}

func (ct *MyCustomTranslator) Translate(feed interface{}) (*gofeed.Feed, error) {
	// Named rssFeed so the variable does not shadow the rss package.
	rssFeed, found := feed.(*rss.Feed)
	if !found {
		return nil, fmt.Errorf("feed did not match expected type of *rss.Feed")
	}

	f, err := ct.defaultTranslator.Translate(rssFeed)
	if err != nil {
		return nil, err
	}

	// Feed.Author is a *gofeed.Person, so the string values are wrapped.
	if rssFeed.ITunesExt != nil && rssFeed.ITunesExt.Author != "" {
		f.Author = &gofeed.Person{Name: rssFeed.ITunesExt.Author}
	} else {
		f.Author = &gofeed.Person{Name: rssFeed.ManagingEditor}
	}
	return f, nil
}

Next you must configure your gofeed.Parser to utilize the new gofeed.Translator:

feedData := `<rss version="2.0">
<channel>
<managingEditor>Ender Wiggin</managingEditor>
<itunes:author>Valentine Wiggin</itunes:author>
</channel>
</rss>`

fp := gofeed.NewParser()
fp.RSSTranslator = NewMyCustomTranslator()
feed, _ := fp.ParseString(feedData)
fmt.Println(feed.Author.Name) // Valentine Wiggin

Extensions

Every element which does not belong to the feed's default namespace is considered an extension by gofeed. These are parsed and stored in a tree-like structure located at Feed.Extensions and Item.Extensions. These fields should allow you to access and read any custom extension elements.

In addition to the generic handling of extensions, gofeed also has built-in support for parsing certain popular extensions into their own structs for convenience. It currently supports the Dublin Core and Apple iTunes extensions, which you can access at Feed.ITunesExt, Feed.DublinCoreExt, Item.ITunesExt and Item.DublinCoreExt.

Default Mappings

The DefaultRSSTranslator, the DefaultAtomTranslator and the DefaultJSONTranslator map the following rss.Feed, atom.Feed and json.Feed fields to their respective gofeed.Feed fields. They are listed in order of precedence (highest to lowest):

gofeed.Feed fields, with their RSS, Atom and JSON sources:

Title
  • RSS: /rss/channel/title, /rdf:RDF/channel/title, /rss/channel/dc:title, /rdf:RDF/channel/dc:title
  • Atom: /feed/title
  • JSON: /title
Description
  • RSS: /rss/channel/description, /rdf:RDF/channel/description, /rss/channel/itunes:subtitle
  • Atom: /feed/subtitle, /feed/tagline
  • JSON: /description
Link
  • RSS: /rss/channel/link, /rdf:RDF/channel/link
  • Atom: /feed/link[@rel="alternate"]/@href, /feed/link[not(@rel)]/@href
  • JSON: /home_page_url
FeedLink
  • RSS: /rss/channel/atom:link[@rel="self"]/@href, /rdf:RDF/channel/atom:link[@rel="self"]/@href
  • Atom: /feed/link[@rel="self"]/@href
  • JSON: /feed_url
Updated
  • RSS: /rss/channel/lastBuildDate, /rss/channel/dc:date, /rdf:RDF/channel/dc:date
  • Atom: /feed/updated, /feed/modified
  • JSON: /items[0]/date_modified
Published
  • RSS: /rss/channel/pubDate
  • Atom: (none)
  • JSON: /items[0]/date_published
Author
  • RSS: /rss/channel/managingEditor, /rss/channel/webMaster, /rss/channel/dc:author, /rdf:RDF/channel/dc:author, /rss/channel/dc:creator, /rdf:RDF/channel/dc:creator, /rss/channel/itunes:author
  • Atom: /feed/authors[0]
  • JSON: /author
Authors
  • RSS: /rss/channel/managingEditor, /rss/channel/webMaster, /rss/channel/dc:author, /rdf:RDF/channel/dc:author, /rss/channel/dc:creator, /rdf:RDF/channel/dc:creator, /rss/channel/itunes:author
  • Atom: /feed/authors
  • JSON: /authors, /author
Language
  • RSS: /rss/channel/language, /rss/channel/dc:language, /rdf:RDF/channel/dc:language
  • Atom: /feed/@xml:lang
  • JSON: /language
Image
  • RSS: /rss/channel/image, /rdf:RDF/image, /rss/channel/itunes:image
  • Atom: /feed/logo
  • JSON: /icon
Copyright
  • RSS: /rss/channel/copyright, /rss/channel/dc:rights, /rdf:RDF/channel/dc:rights
  • Atom: /feed/rights, /feed/copyright
  • JSON: (none)
Generator
  • RSS: /rss/channel/generator
  • Atom: /feed/generator
  • JSON: (none)
Categories
  • RSS: /rss/channel/category, /rss/channel/itunes:category, /rss/channel/itunes:keywords, /rss/channel/dc:subject, /rdf:RDF/channel/dc:subject
  • Atom: /feed/category
  • JSON: (none)

gofeed.Item fields, with their RSS, Atom and JSON sources:

Title
  • RSS: /rss/channel/item/title, /rdf:RDF/item/title, /rdf:RDF/item/dc:title, /rss/channel/item/dc:title
  • Atom: /feed/entry/title
  • JSON: /items/title
Description
  • RSS: /rss/channel/item/description, /rdf:RDF/item/description, /rss/channel/item/dc:description, /rdf:RDF/item/dc:description
  • Atom: /feed/entry/summary
  • JSON: /items/summary
Content
  • RSS: /rss/channel/item/content:encoded
  • Atom: /feed/entry/content
  • JSON: /items/content_html
Link
  • RSS: /rss/channel/item/link, /rdf:RDF/item/link
  • Atom: /feed/entry/link[@rel="alternate"]/@href, /feed/entry/link[not(@rel)]/@href
  • JSON: /items/url
Updated
  • RSS: /rss/channel/item/dc:date, /rdf:RDF/rdf:item/dc:date
  • Atom: /feed/entry/modified, /feed/entry/updated
  • JSON: /items/date_modified
Published
  • RSS: /rss/channel/item/pubDate, /rss/channel/item/dc:date
  • Atom: /feed/entry/published, /feed/entry/issued
  • JSON: /items/date_published
Author
  • RSS: /rss/channel/item/author, /rss/channel/item/dc:author, /rdf:RDF/item/dc:author, /rss/channel/item/dc:creator, /rdf:RDF/item/dc:creator, /rss/channel/item/itunes:author
  • Atom: /feed/entry/author
  • JSON: /items/author/name
Authors
  • RSS: /rss/channel/item/author, /rss/channel/item/dc:author, /rdf:RDF/item/dc:author, /rss/channel/item/dc:creator, /rdf:RDF/item/dc:creator, /rss/channel/item/itunes:author
  • Atom: /feed/entry/authors[0]
  • JSON: /items/authors, /items/author/name
GUID
  • RSS: /rss/channel/item/guid
  • Atom: /feed/entry/id
  • JSON: /items/id
Image
  • RSS: /rss/channel/item/itunes:image, /rss/channel/item/media:image
  • Atom: (none)
  • JSON: /items/image, /items/banner_image
Categories
  • RSS: /rss/channel/item/category, /rss/channel/item/dc:subject, /rss/channel/item/itunes:keywords, /rdf:RDF/channel/item/dc:subject
  • Atom: /feed/entry/category
  • JSON: /items/tags
Enclosures
  • RSS: /rss/channel/item/enclosure
  • Atom: /feed/entry/link[@rel="enclosure"]
  • JSON: /items/attachments

Dependencies

License

This project is licensed under the MIT License

Credits

Documentation

Constants

This section is empty.

Variables

var ErrFeedTypeNotDetected = errors.New("Failed to detect feed type")

ErrFeedTypeNotDetected is returned when the detection system cannot determine the feed format.

Functions

This section is empty.

Types

type Auth added in v1.2.0

type Auth struct {
	Username string
	Password string
}

Auth holds the credentials used for HTTP basic authentication during the request. Set it on your new Parser through the AuthConfig field.

type DefaultAtomTranslator

type DefaultAtomTranslator struct{}

DefaultAtomTranslator converts an atom.Feed struct into the generic Feed struct.

This default implementation defines a set of mapping rules between atom.Feed -> Feed for each of the fields in Feed.

func (*DefaultAtomTranslator) Translate

func (t *DefaultAtomTranslator) Translate(feed interface{}) (*Feed, error)

Translate converts an Atom feed into the universal feed type.

type DefaultJSONTranslator added in v1.1.0

type DefaultJSONTranslator struct{}

DefaultJSONTranslator converts a json.Feed struct into the generic Feed struct.

This default implementation defines a set of mapping rules between json.Feed -> Feed for each of the fields in Feed.

func (*DefaultJSONTranslator) Translate added in v1.1.0

func (t *DefaultJSONTranslator) Translate(feed interface{}) (*Feed, error)

Translate converts a JSON feed into the universal feed type.

type DefaultRSSTranslator

type DefaultRSSTranslator struct{}

DefaultRSSTranslator converts an rss.Feed struct into the generic Feed struct.

This default implementation defines a set of mapping rules between rss.Feed -> Feed for each of the fields in Feed.

func (*DefaultRSSTranslator) Translate

func (t *DefaultRSSTranslator) Translate(feed interface{}) (*Feed, error)

Translate converts an RSS feed into the universal feed type.

type Enclosure

type Enclosure struct {
	URL    string `json:"url,omitempty"`
	Length string `json:"length,omitempty"`
	Type   string `json:"type,omitempty"`
}

Enclosure is a file associated with a given Item.

type Feed

type Feed struct {
	Title           string                   `json:"title,omitempty"`
	Description     string                   `json:"description,omitempty"`
	Link            string                   `json:"link,omitempty"`
	FeedLink        string                   `json:"feedLink,omitempty"`
	Links           []string                 `json:"links,omitempty"`
	Updated         string                   `json:"updated,omitempty"`
	UpdatedParsed   *time.Time               `json:"updatedParsed,omitempty"`
	Published       string                   `json:"published,omitempty"`
	PublishedParsed *time.Time               `json:"publishedParsed,omitempty"`
	Author          *Person                  `json:"author,omitempty"` // Deprecated: Use feed.Authors instead
	Authors         []*Person                `json:"authors,omitempty"`
	Language        string                   `json:"language,omitempty"`
	Image           *Image                   `json:"image,omitempty"`
	Copyright       string                   `json:"copyright,omitempty"`
	Generator       string                   `json:"generator,omitempty"`
	Categories      []string                 `json:"categories,omitempty"`
	DublinCoreExt   *ext.DublinCoreExtension `json:"dcExt,omitempty"`
	ITunesExt       *ext.ITunesFeedExtension `json:"itunesExt,omitempty"`
	Extensions      ext.Extensions           `json:"extensions,omitempty"`
	Custom          map[string]string        `json:"custom,omitempty"`
	Items           []*Item                  `json:"items"`
	FeedType        string                   `json:"feedType"`
	FeedVersion     string                   `json:"feedVersion"`
}

Feed is the universal Feed type that atom.Feed, rss.Feed and json.Feed get translated to. It represents a web feed. Sorting with sort.Sort will order the Items from oldest to newest publish time.

func (Feed) Len

func (f Feed) Len() int

Len returns the length of Items.

func (Feed) Less

func (f Feed) Less(i, k int) bool

Less compares PublishedParsed of Items[i], Items[k] and returns true if Items[i] is less than Items[k].

func (Feed) String

func (f Feed) String() string

func (Feed) Swap

func (f Feed) Swap(i, k int)

Swap swaps Items[i] and Items[k].

type FeedType

type FeedType int

FeedType represents one of the possible feed types that we can detect.

const (
	// FeedTypeUnknown represents a feed that could not have its
	// type determined.
	FeedTypeUnknown FeedType = iota
	// FeedTypeAtom represents an Atom feed
	FeedTypeAtom
	// FeedTypeRSS represents an RSS feed
	FeedTypeRSS
	// FeedTypeJSON represents a JSON feed
	FeedTypeJSON
)

func DetectFeedType

func DetectFeedType(feed io.Reader) FeedType

DetectFeedType attempts to determine the type of feed by looking for specific xml elements unique to the various feed types.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/mmcdole/gofeed"
)

func main() {
	feedData := `<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>`
	feedType := gofeed.DetectFeedType(strings.NewReader(feedData))
	if feedType == gofeed.FeedTypeRSS {
		fmt.Println("Wow! This is an RSS feed!")
	}
}

type HTTPError

type HTTPError struct {
	StatusCode int
	Status     string
}

HTTPError represents an HTTP error returned by a server.

func (HTTPError) Error

func (err HTTPError) Error() string

type Image

type Image struct {
	URL   string `json:"url,omitempty"`
	Title string `json:"title,omitempty"`
}

Image is an image that is the artwork for a given feed or item.

type Item

type Item struct {
	Title           string                   `json:"title,omitempty"`
	Description     string                   `json:"description,omitempty"`
	Content         string                   `json:"content,omitempty"`
	Link            string                   `json:"link,omitempty"`
	Links           []string                 `json:"links,omitempty"`
	Updated         string                   `json:"updated,omitempty"`
	UpdatedParsed   *time.Time               `json:"updatedParsed,omitempty"`
	Published       string                   `json:"published,omitempty"`
	PublishedParsed *time.Time               `json:"publishedParsed,omitempty"`
	Author          *Person                  `json:"author,omitempty"` // Deprecated: Use item.Authors instead
	Authors         []*Person                `json:"authors,omitempty"`
	GUID            string                   `json:"guid,omitempty"`
	Image           *Image                   `json:"image,omitempty"`
	Categories      []string                 `json:"categories,omitempty"`
	Enclosures      []*Enclosure             `json:"enclosures,omitempty"`
	DublinCoreExt   *ext.DublinCoreExtension `json:"dcExt,omitempty"`
	ITunesExt       *ext.ITunesItemExtension `json:"itunesExt,omitempty"`
	Extensions      ext.Extensions           `json:"extensions,omitempty"`
	Custom          map[string]string        `json:"custom,omitempty"`
}

Item is the universal Item type that atom.Entry and rss.Item get translated to. It represents a single entry in a given feed.

type Parser

type Parser struct {
	AtomTranslator Translator
	RSSTranslator  Translator
	JSONTranslator Translator
	UserAgent      string
	AuthConfig     *Auth
	Client         *http.Client
	// contains filtered or unexported fields
}

Parser is a universal feed parser that detects a given feed type, parses it, and translates it to the universal feed type.

func NewParser

func NewParser() *Parser

NewParser creates a universal feed parser.

func (*Parser) Parse

func (f *Parser) Parse(feed io.Reader) (*Feed, error)

Parse parses an RSS, Atom or JSON feed into the universal gofeed.Feed. It takes an io.Reader which should return the XML or JSON content.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/mmcdole/gofeed"
)

func main() {
	feedData := `<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>`
	fp := gofeed.NewParser()
	feed, err := fp.Parse(strings.NewReader(feedData))
	if err != nil {
		panic(err)
	}
	fmt.Println(feed.Title)
}

func (*Parser) ParseString

func (f *Parser) ParseString(feed string) (*Feed, error)

ParseString parses a feed string into the universal feed type.

Example
package main

import (
	"fmt"

	"github.com/mmcdole/gofeed"
)

func main() {
	feedData := `<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>`
	fp := gofeed.NewParser()
	feed, err := fp.ParseString(feedData)
	if err != nil {
		panic(err)
	}
	fmt.Println(feed.Title)
}

func (*Parser) ParseURL

func (f *Parser) ParseURL(feedURL string) (feed *Feed, err error)

ParseURL fetches the contents of a given url and attempts to parse the response into the universal feed type.

Example
package main

import (
	"fmt"

	"github.com/mmcdole/gofeed"
)

func main() {
	fp := gofeed.NewParser()
	feed, err := fp.ParseURL("http://feeds.twit.tv/twit.xml")
	if err != nil {
		panic(err)
	}
	fmt.Println(feed.Title)
}

func (*Parser) ParseURLWithContext

func (f *Parser) ParseURLWithContext(feedURL string, ctx context.Context) (feed *Feed, err error)

ParseURLWithContext fetches the contents of a given URL and attempts to parse the response into the universal feed type. To use basic authentication during the HTTP call, set the parser's AuthConfig to an Auth value holding your Username and Password; the credentials are added to the request header automatically. The request can be canceled or timed out via the given context.

type Person

type Person struct {
	Name  string `json:"name,omitempty"`
	Email string `json:"email,omitempty"`
}

Person is an individual specified in a feed (e.g. an author)

type Translator

type Translator interface {
	Translate(feed interface{}) (*Feed, error)
}

Translator converts a particular feed (atom.Feed, rss.Feed or json.Feed) into the generic Feed struct.

Directories

Path Synopsis
cmd
internal
