slurp

package
v0.0.0-...-23e6414
Published: Aug 20, 2022 License: AGPL-3.0 Imports: 8 Imported by: 2

README

This is a client for slurping articles via the scrapeomat slurp API.


Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ArtStream

type ArtStream struct {

	// NextSinceID is set to a non-zero value when the stream ends
	// if there are more articles to grab.
	NextSinceID int
	// contains filtered or unexported fields
}

func (*ArtStream) Close

func (as *ArtStream) Close()

func (*ArtStream) Next

func (as *ArtStream) Next() (*Article, error)

Next returns the next article in the stream. It returns io.EOF at the end of the stream.

type Article

type Article struct {
	ID           int    `json:"id,omitempty"`
	CanonicalURL string `json:"canonical_url"`

	// all known URLs for article (including canonical)
	// TODO: first url should be considered "preferred" if no canonical?
	URLs []string `json:"urls"`

	Headline string   `json:"headline"`
	Authors  []Author `json:"authors,omitempty"`

	// Content contains HTML, sanitised using a subset of tags
	Content string `json:"content"`

	// Published contains date of publication.
	// An ISO8601 string is used instead of time.Time, so that
	// less-precise representations can be held (eg YYYY-MM)
	Published   string      `json:"published,omitempty"`
	Updated     string      `json:"updated,omitempty"`
	Publication Publication `json:"publication,omitempty"`
	// Keywords contains data from rel-tags, meta keywords etc...
	Keywords []Keyword `json:"keywords,omitempty"`
	Section  string    `json:"section,omitempty"`
	Tags     []string  `json:"tags,omitempty"`

	// extra fields from twitcooker
	Extra struct {
		RetweetCount  int `json:"retweet_count,omitempty"`
		FavoriteCount int `json:"favorite_count,omitempty"`
		// resolved links
		Links []string `json:"links,omitempty"`
	} `json:"extra,omitempty"`
}

Article is the wire format for article data.

type Author

type Author struct {
	Name    string `json:"name"`
	RelLink string `json:"rel_link,omitempty"`
	Email   string `json:"email,omitempty"`
	Twitter string `json:"twitter,omitempty"`
}

type CookedSummary

type CookedSummary struct {
	PubCodes []string
	Days     []string
	// An array of array of counts
	// access as: Data[pubcodeindex][dayindex]
	Data [][]int
	Max  int
}

func CookSummary

func CookSummary(raw RawSummary) *CookedSummary

CookSummary cooks raw article counts into a CookedSummary, filling in any missing days.
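A self-contained sketch of what the cooking step plausibly involves, with local re-declarations of the types (the `cookSummary` body here is an assumption about the behaviour, not the package's actual implementation): collect pubcodes, build a contiguous day range so gaps become zero counts, and lay the counts out as `Data[pubcodeindex][dayindex]`.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Local re-declarations for illustration; the real types live in the slurp package.
type RawSummary map[string]map[string]int

type CookedSummary struct {
	PubCodes []string
	Days     []string
	Data     [][]int // access as Data[pubcodeindex][dayindex]
	Max      int
}

// cookSummary fills in missing days and grids the counts.
func cookSummary(raw RawSummary) *CookedSummary {
	cooked := &CookedSummary{}
	minDay, maxDay := "", ""
	for pub, days := range raw {
		cooked.PubCodes = append(cooked.PubCodes, pub)
		for day := range days {
			if minDay == "" || day < minDay {
				minDay = day
			}
			if day > maxDay {
				maxDay = day
			}
		}
	}
	sort.Strings(cooked.PubCodes)
	if minDay != "" {
		start, _ := time.Parse("2006-01-02", minDay)
		end, _ := time.Parse("2006-01-02", maxDay)
		// Contiguous day range: days absent from raw get a zero count.
		for d := start; !d.After(end); d = d.AddDate(0, 0, 1) {
			cooked.Days = append(cooked.Days, d.Format("2006-01-02"))
		}
	}
	for _, pub := range cooked.PubCodes {
		row := make([]int, len(cooked.Days))
		for i, day := range cooked.Days {
			row[i] = raw[pub][day] // missing entries yield 0
			if row[i] > cooked.Max {
				cooked.Max = row[i]
			}
		}
		cooked.Data = append(cooked.Data, row)
	}
	return cooked
}

func main() {
	raw := RawSummary{
		"mirror": {"2022-08-01": 3, "2022-08-03": 5},
		"sun":    {"2022-08-02": 2},
	}
	c := cookSummary(raw)
	fmt.Println(c.Days)    // [2022-08-01 2022-08-02 2022-08-03]
	fmt.Println(c.Data[0]) // mirror's row: [3 0 5]
	fmt.Println(c.Max)     // 5
}
```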

type Filter

type Filter struct {
	// date ranges are [from,to)
	PubFrom time.Time
	PubTo   time.Time
	//	AddedFrom time.Time
	//	AddedTo   time.Time
	PubCodes []string
	SinceID  int
	Count    int
}

type Keyword

type Keyword struct {
	Name string `json:"name"`
	URL  string `json:"url,omitempty"`
}

type Msg

type Msg struct {
	Article *Article `json:"article,omitempty"`
	Error   string   `json:"error,omitempty"`
	Next    struct {
		SinceID int `json:"since_id,omitempty"`
	} `json:"next,omitempty"`
}

Msg is a single message; it can hold an article or an error message.

type Publication

type Publication struct {
	// Code is a short, unique name (eg "mirror")
	Code string `json:"code"`
	// Name is the 'pretty' name (eg "The Daily Mirror")
	Name   string `json:"name,omitempty"`
	Domain string `json:"domain,omitempty"`
}

type RawSummary

type RawSummary map[string]map[string]int

RawSummary is a map of maps: pubcode -> day -> count.

type Slurper

type Slurper struct {
	Client *http.Client
	// eg "http://localhost:12345/ukarticles"
	Location string
}

Slurper is a client for talking to a slurp server.

func NewSlurper

func NewSlurper(location string) *Slurper

func (*Slurper) FetchCount

func (s *Slurper) FetchCount(filt *Filter) (int, error)

FetchCount returns the number of articles on the server matching the filter.

func (*Slurper) Slurp

func (s *Slurper) Slurp(filt *Filter) (chan Msg, chan struct{})

Deprecated: use Slurp2 instead.

Slurp downloads a set of articles from the server, returning a channel that streams out messages. Errors are returned via Msg; in the case of network errors, Slurp may synthesise fake Msgs containing the error message. Slurp will issue repeated requests until all results have been returned. Note that the filter's Count param is not a total: it is the maximum number of articles to return per request.

func (*Slurper) Slurp2

func (s *Slurper) Slurp2(filt *Filter) *ArtStream
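Slurp2 pairs with ArtStream.NextSinceID for pagination: drain a stream, and if NextSinceID is non-zero, feed it back into Filter.SinceID and request again. A self-contained sketch with local stand-ins for the package's types (the stub Slurp2 fakes a five-article server paging in memory; the real client performs HTTP requests against Location):

```go
package main

import (
	"fmt"
	"io"
	"time"
)

// Local stand-ins for illustration; the real types live in the slurp package.
type Article struct {
	ID       int
	Headline string
}

type Filter struct {
	PubFrom, PubTo time.Time // date range is [from,to)
	PubCodes       []string
	SinceID        int
	Count          int // max articles per request, not a total
}

type ArtStream struct {
	arts        []*Article
	pos         int
	NextSinceID int // non-zero when more articles remain
}

func (as *ArtStream) Next() (*Article, error) {
	if as.pos >= len(as.arts) {
		return nil, io.EOF
	}
	a := as.arts[as.pos]
	as.pos++
	return a, nil
}
func (as *ArtStream) Close() {}

type Slurper struct{ Location string }

// Slurp2 here fakes a server holding five articles, serving filt.Count
// per call; this in-memory paging is purely illustrative.
func (s *Slurper) Slurp2(filt *Filter) *ArtStream {
	all := []*Article{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}, {5, "e"}}
	var page []*Article
	for _, a := range all {
		if a.ID > filt.SinceID && len(page) < filt.Count {
			page = append(page, a)
		}
	}
	st := &ArtStream{arts: page}
	if len(page) == filt.Count && page[len(page)-1].ID < 5 {
		st.NextSinceID = page[len(page)-1].ID
	}
	return st
}

func main() {
	s := &Slurper{Location: "http://localhost:12345/ukarticles"}
	filt := &Filter{Count: 2}
	total := 0
	for {
		stream := s.Slurp2(filt)
		for {
			if _, err := stream.Next(); err == io.EOF {
				break
			}
			total++
		}
		stream.Close()
		if stream.NextSinceID == 0 {
			break // no more pages
		}
		filt.SinceID = stream.NextSinceID // resume where we left off
	}
	fmt.Println("fetched", total, "articles")
}
```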

func (*Slurper) Summary

func (s *Slurper) Summary(filt *Filter) (RawSummary, error)

Summary returns a map of maps: pubcode -> day -> count.
