feeder

v0.0.0-...-5b2f063

This package is not in the latest version of its module.
Published: Feb 22, 2018 License: CC0-1.0 Imports: 8 Imported by: 0

README

RSS

This package allows us to fetch RSS and Atom feeds from the internet. They are parsed into an object tree which is a hybrid of both the RSS and Atom standards.

Supported feeds are:

  • RSS v0.91, 0.92 and 2.0
  • Atom 1.0

The package manages cache timeouts. This prevents us from querying servers for feed updates too often and risking IP bans. Apart from setting a cache timeout manually, the package can also optionally adhere to the TTL, SkipDays and SkipHours values specified in the feeds themselves.

Note that the TTL, SkipDays and SkipHours fields are only part of the RSS spec. For Atom feeds, we use the CacheTimeout in the Feed struct.

Because the object structure is a hybrid of the RSS and Atom specs, not all fields will be filled when requesting either an RSS or an Atom feed. I have tried to create as many shared fields as possible, but some fields exist in only one of the two specs.

The Feed object supports notifications of new channels and items. This is achieved by passing two handler functions to feeder.New(). They are called whenever a feed is updated from the remote source and a new channel or item is found that did not previously exist. This lets you easily monitor a feed for changes. See feed_test.go for an example of how this works.
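The "new since last fetch" idea behind these notifications can be illustrated with a small standalone sketch: remember a key for every item seen so far and hand only unseen items to the callback. The `item` type, its `key` field and `diffNew` are hypothetical stand-ins for the package's `Item`, `Item.Key()` and internal bookkeeping, whose exact implementation is not documented here.

```go
package main

import "fmt"

// item is a hypothetical stand-in for feeder's Item; key plays the role of Item.Key().
type item struct {
	key, title string
}

// diffNew returns the items whose keys are not yet in seen, and records them.
func diffNew(seen map[string]bool, items []item) []item {
	var fresh []item
	for _, it := range items {
		if !seen[it.key] {
			seen[it.key] = true
			fresh = append(fresh, it)
		}
	}
	return fresh
}

func main() {
	seen := make(map[string]bool)
	// onNewItems plays the role of the item handler passed to feeder.New().
	onNewItems := func(items []item) {
		for _, it := range items {
			fmt.Println("new:", it.title)
		}
	}

	// Two successive "fetches" of the same feed.
	fetches := [][]item{
		{{"a", "First post"}},
		{{"a", "First post"}, {"b", "Second post"}},
	}
	for _, items := range fetches {
		if fresh := diffNew(seen, items); len(fresh) > 0 {
			onNewItems(fresh)
		}
	}
	// Prints "new: First post", then "new: Second post".
}
```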

DEPENDENCIES

github.com/jteeuwen/go-pkg-xmlx

USAGE

An idiomatic example program can be found in testdata/example.go.

Documentation

Overview

Credits go to github.com/SlyMarbo/rss for inspiring this solution.

Author: jim teeuwen <jimteeuwen@gmail.com>
Dependencies: go-pkg-xmlx (http://github.com/jteeuwen/go-pkg-xmlx)

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewDatabase

func NewDatabase() *database

Types

type Author

type Author struct {
	Name  string
	Uri   string
	Email string
}

type Category

type Category struct {
	Domain string
	Text   string
}

type Channel

type Channel struct {
	Title          string
	Links          []Link
	Description    string
	Language       string
	Copyright      string
	ManagingEditor string
	WebMaster      string
	PubDate        string
	LastBuildDate  string
	Docs           string
	Categories     []*Category
	Generator      Generator
	TTL            int
	Rating         string
	SkipHours      []int
	SkipDays       []int
	Image          Image
	Items          []*Item
	Cloud          Cloud
	TextInput      Input
	Extensions     map[string]map[string][]Extension

	// Atom fields
	Id       string
	Rights   string
	Author   Author
	SubTitle SubTitle
}

func (*Channel) Key

func (c *Channel) Key() string

type ChannelHandler

type ChannelHandler interface {
	ProcessChannels(f *Feed, newchannels []*Channel)
}

func NewDatabaseChannelHandler

func NewDatabaseChannelHandler(db *database, chanhandler ChannelHandler) ChannelHandler

type ChannelHandlerFunc

type ChannelHandlerFunc func(f *Feed, newchannels []*Channel)

func (ChannelHandlerFunc) ProcessChannels

func (h ChannelHandlerFunc) ProcessChannels(f *Feed, newchannels []*Channel)
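ChannelHandlerFunc (and likewise ItemHandlerFunc) follows the same adapter idiom as `http.HandlerFunc` in the standard library: a plain function type gains a method so that it satisfies the handler interface. The sketch below reproduces the idiom with simplified, hypothetical `Feed` and `Channel` types (one field each) so that it compiles on its own; it illustrates the pattern and is not the package's source.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for feeder's types.
type Feed struct{ Url string }
type Channel struct{ Title string }

// ChannelHandler has the same method set as the documented interface.
type ChannelHandler interface {
	ProcessChannels(f *Feed, newchannels []*Channel)
}

// ChannelHandlerFunc lets an ordinary function act as a ChannelHandler.
type ChannelHandlerFunc func(f *Feed, newchannels []*Channel)

// ProcessChannels simply calls the underlying function.
func (h ChannelHandlerFunc) ProcessChannels(f *Feed, newchannels []*Channel) {
	h(f, newchannels)
}

// counter is a struct-based handler, for comparison.
type counter struct{ n int }

func (c *counter) ProcessChannels(f *Feed, newchannels []*Channel) {
	c.n += len(newchannels)
}

func main() {
	f := &Feed{Url: "https://example.com/feed.xml"}
	chans := []*Channel{{Title: "Example"}}

	var h ChannelHandler = ChannelHandlerFunc(func(f *Feed, cs []*Channel) {
		fmt.Printf("%d new channel(s) in %s\n", len(cs), f.Url)
	})
	h.ProcessChannels(f, chans)

	c := &counter{}
	h = c
	h.ProcessChannels(f, chans)
	fmt.Println(c.n) // 1
}
```

A function value is convenient for one-off callbacks, while a struct-based handler can carry state between calls.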

type Cloud

type Cloud struct {
	Domain            string
	Port              int
	Path              string
	RegisterProcedure string
	Protocol          string
}

type Content

type Content struct {
	Type string
	Lang string
	Base string
	Text string
}

type Enclosure

type Enclosure struct {
	Url    string
	Length int64
	Type   string
}

type Extension

type Extension struct {
	Name      string
	Value     string
	Attrs     map[string]string
	Childrens map[string][]Extension
}

type Feed

type Feed struct {
	// Custom cache timeout in minutes.
	CacheTimeout int

	// Make sure we adhere to the cache timeout specified in the feed. If
	// our CacheTimeout is higher than that, we will use that instead.
	EnforceCacheLimit bool

	// Type of feed. Rss, Atom, etc
	Type string

	// Version of the feed. Major and Minor.
	Version [2]int

	// Channels with content.
	Channels []*Channel

	// Url from which this feed was created.
	Url string
	// contains filtered or unexported fields
}

func New

func New(cachetimeout int, enforcecachelimit bool, ch ChannelHandlerFunc, ih ItemHandlerFunc) *Feed

New is a helper function that stays semi-compatible with the old code. It includes the database handler so that this approach is functionally identical to the old database/handlers version.

func NewWithHandlers

func NewWithHandlers(cachetimeout int, enforcecachelimit bool, ch ChannelHandler, ih ItemHandler) *Feed

NewWithHandlers creates a new feed with the given handlers. This is the preferred constructor for new code.
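The difference between New and NewWithHandlers is one of composition: per the description of New, it wraps the supplied functions in database-backed handlers (NewDatabaseChannelHandler, NewDatabaseItemHandler). The decorator pattern this implies is sketched below with hypothetical, simplified types; the real `database` type is unexported, and the assumption that it filters out already-delivered items by key is ours, not documented behaviour. `ProcessItems` is also reduced to a single parameter here.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the package's types.
type Item struct{ Guid string }

type ItemHandler interface {
	ProcessItems(newitems []*Item)
}

type ItemHandlerFunc func(newitems []*Item)

func (h ItemHandlerFunc) ProcessItems(newitems []*Item) { h(newitems) }

// database records keys that have already been delivered (assumed behaviour).
type database struct{ seen map[string]bool }

func newDatabase() *database { return &database{seen: make(map[string]bool)} }

// dbItemHandler decorates another ItemHandler, forwarding only unseen items.
type dbItemHandler struct {
	db   *database
	next ItemHandler
}

func (h *dbItemHandler) ProcessItems(newitems []*Item) {
	var fresh []*Item
	for _, it := range newitems {
		if !h.db.seen[it.Guid] {
			h.db.seen[it.Guid] = true
			fresh = append(fresh, it)
		}
	}
	if len(fresh) > 0 {
		h.next.ProcessItems(fresh)
	}
}

func main() {
	var h ItemHandler = &dbItemHandler{
		db: newDatabase(),
		next: ItemHandlerFunc(func(items []*Item) {
			fmt.Println(len(items), "new item(s)")
		}),
	}
	h.ProcessItems([]*Item{{Guid: "1"}, {Guid: "2"}}) // 2 new item(s)
	h.ProcessItems([]*Item{{Guid: "2"}, {Guid: "3"}}) // 1 new item(s)
}
```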

func (*Feed) CanUpdate

func (this *Feed) CanUpdate() bool

CanUpdate reports whether the CacheTimeout value has expired. If Feed.EnforceCacheLimit is true, it additionally honours the RSS spec's SkipDays and SkipHours values. If it returns true, you can be sure that a fresh feed update will be performed.

func (*Feed) Fetch

func (this *Feed) Fetch(uri string, charset xmlx.CharsetFunc) (err error)

Fetch retrieves the feed's latest content if necessary.

The charset parameter overrides the XML decoder's CharsetReader. This allows you to specify a custom character-encoding conversion routine when dealing with non-UTF-8 input. Pass nil to use the default from Go's xml package.

This is equivalent to calling FetchClient with http.DefaultClient.

func (*Feed) FetchBytes

func (this *Feed) FetchBytes(uri string, content []byte, charset xmlx.CharsetFunc) (err error)

FetchBytes parses the feed's content from the supplied []byte instead of retrieving it over the network.

The charset parameter overrides the XML decoder's CharsetReader. This allows you to specify a custom character-encoding conversion routine when dealing with non-UTF-8 input. Pass nil to use the default from Go's xml package.

func (*Feed) FetchClient

func (this *Feed) FetchClient(uri string, client *http.Client, charset xmlx.CharsetFunc) (err error)

FetchClient retrieves the feed's latest content if necessary.

The charset parameter overrides the XML decoder's CharsetReader. This allows you to specify a custom character-encoding conversion routine when dealing with non-UTF-8 input. Pass nil to use the default from Go's xml package.

The client parameter allows the use of arbitrary network connections, for example the Google App Engine "URL Fetch" service.

func (*Feed) GetVersionInfo

func (this *Feed) GetVersionInfo(doc *xmlx.Document) (ftype string, fversion [2]int)

GetVersionInfo returns the type of the feed, i.e. "atom" or "rss", and the version number as a two-element array: the first element is the major version number and the second the minor.
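Detecting the feed type largely comes down to inspecting the root element. The sketch below does this with `encoding/xml` rather than go-pkg-xmlx; `detectFeed` is hypothetical, recognises only `<rss>` and `<feed>` roots, and assumes every Atom document is reported as version 1.0, which may not match the package's exact rules.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strconv"
	"strings"
)

// detectFeed returns "rss" or "atom" plus a {major, minor} version pair,
// based only on the document's root element.
func detectFeed(doc string) (string, [2]int, error) {
	var version [2]int
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err != nil {
			return "", version, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok {
			continue // skip the XML declaration, comments and whitespace
		}
		switch start.Name.Local {
		case "rss":
			// RSS carries its version in an attribute, e.g. version="0.91".
			for _, a := range start.Attr {
				if a.Name.Local == "version" {
					parts := strings.SplitN(a.Value, ".", 2)
					version[0], _ = strconv.Atoi(parts[0])
					if len(parts) == 2 {
						version[1], _ = strconv.Atoi(parts[1])
					}
				}
			}
			return "rss", version, nil
		case "feed":
			// Assumption: any <feed> root is treated as Atom 1.0.
			return "atom", [2]int{1, 0}, nil
		default:
			return "", version, fmt.Errorf("unsupported root element <%s>", start.Name.Local)
		}
	}
}

func main() {
	for _, doc := range []string{
		`<?xml version="1.0"?><rss version="2.0"><channel/></rss>`,
		`<rss version="0.91"><channel/></rss>`,
		`<feed xmlns="http://www.w3.org/2005/Atom"></feed>`,
	} {
		ftype, version, err := detectFeed(doc)
		fmt.Println(ftype, version, err)
	}
}
```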

func (*Feed) IgnoreCacheOnce

func (this *Feed) IgnoreCacheOnce()

Until the next *successful* fetching of the feed's content, the fetcher will ignore all cache values and update interval hints, and always attempt to retrieve a fresh copy of the feed.

func (*Feed) LastUpdate

func (this *Feed) LastUpdate() time.Time

LastUpdate returns the time at which the feed was last updated.

func (*Feed) SecondsTillUpdate

func (this *Feed) SecondsTillUpdate() int64

SecondsTillUpdate returns the number of seconds that must elapse before the feed should be updated.

func (*Feed) SetUserAgent

func (this *Feed) SetUserAgent(s string)

func (*Feed) TillUpdate

func (this *Feed) TillUpdate() (time.Duration, error)

TillUpdate returns the duration that must elapse before the feed should be updated.

type Generator

type Generator struct {
	Uri     string
	Version string
	Text    string
}

type Image

type Image struct {
	Title       string
	Url         string
	Link        string
	Width       int
	Height      int
	Description string
}

type Input

type Input struct {
	Title       string
	Description string
	Name        string
	Link        string
}

type Item

type Item struct {
	// RSS and Shared fields
	Title       string
	Links       []*Link
	Description string
	Author      Author
	Categories  []*Category
	Comments    string
	Enclosures  []*Enclosure
	Guid        *string
	PubDate     string
	Source      *Source

	// Atom specific fields
	Id           string
	Generator    *Generator
	Contributors []string
	Content      *Content
	Updated      string

	Extensions map[string]map[string][]Extension
}

func (*Item) Key

func (i *Item) Key() string

func (*Item) ParsedPubDate

func (i *Item) ParsedPubDate() (time.Time, error)

type ItemHandler

type ItemHandler interface {
	ProcessItems(f *Feed, ch *Channel, newitems []*Item)
}

func NewDatabaseItemHandler

func NewDatabaseItemHandler(db *database, itemhandler ItemHandler) ItemHandler

type ItemHandlerFunc

type ItemHandlerFunc func(f *Feed, ch *Channel, newitems []*Item)

func (ItemHandlerFunc) ProcessItems

func (h ItemHandlerFunc) ProcessItems(f *Feed, ch *Channel, newitems []*Item)

type Link

type Link struct {
	Href     string
	Rel      string
	Type     string
	HrefLang string
}

type MissingRssNodeError

type MissingRssNodeError struct{}

func (*MissingRssNodeError) Error

func (err *MissingRssNodeError) Error() string

type Source

type Source struct {
	Url  string
	Text string
}

type SubTitle

type SubTitle struct {
	Type string
	Text string
}

type UnsupportedFeedError

type UnsupportedFeedError struct {
	Type    string
	Version [2]int
}

func (*UnsupportedFeedError) Error

func (err *UnsupportedFeedError) Error() string
