scraper

package
v0.0.0-...-6cf3b9a
Warning

This package is not in the latest version of its module.

Published: Feb 18, 2023 License: MIT Imports: 21 Imported by: 0

Documentation


Constants

const (
	ElasticHost     = "http://localhost"
	ElasticPort     = 9200
	ElasticUser     = "elastic"
	ElasticPassword = "changeme"
)
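These constants look like defaults for a local Elasticsearch node. As a minimal sketch of how the host and port might be combined into a connection URL, the block below copies the documented values; `elasticURL` is a hypothetical helper and not part of this package.

```go
package main

import "fmt"

// Values copied from the documented constants.
const (
	ElasticHost     = "http://localhost"
	ElasticPort     = 9200
	ElasticUser     = "elastic"
	ElasticPassword = "changeme"
)

// elasticURL is a hypothetical helper (an assumption, not a package function)
// that joins a host and a port into a base URL.
func elasticURL(host string, port int) string {
	return fmt.Sprintf("%s:%d", host, port)
}

func main() {
	fmt.Println(elasticURL(ElasticHost, ElasticPort))
}
```

Because the credentials are compile-time constants, pointing at a different cluster presumably means editing the source or building the client yourself.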

Variables

This section is empty.

Functions

func ExtractWithBoilerpipe

func ExtractWithBoilerpipe(urlStr string, html string) (string, error)

func ExtractWithGoOse

func ExtractWithGoOse(url string, html string) (string, error)
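Both extraction functions share the shape `(url, html) -> (text, error)`, so a caller could treat them interchangeably. The sketch below shows one possible fallback pattern under that assumption; `extractor`, `firstSuccessful`, and the two stand-in functions are hypothetical, and the package itself may not combine the extractors this way.

```go
package main

import (
	"errors"
	"fmt"
)

// extractor matches the signature shared by ExtractWithBoilerpipe and
// ExtractWithGoOse. The type name is an assumption for this sketch.
type extractor func(url string, html string) (string, error)

// firstSuccessful tries each extractor in order and returns the first
// non-empty result. If none succeeds, it returns the last error seen.
func firstSuccessful(url, html string, extractors ...extractor) (string, error) {
	lastErr := errors.New("no extractor produced text")
	for _, ex := range extractors {
		text, err := ex(url, html)
		if err != nil {
			lastErr = err
			continue
		}
		if text != "" {
			return text, nil
		}
	}
	return "", lastErr
}

// failing and working are stand-ins for the real extractors.
func failing(url, html string) (string, error) {
	return "", errors.New("simulated extraction failure")
}

func working(url, html string) (string, error) {
	return "article body", nil
}

func main() {
	text, err := firstSuccessful("https://example.com/a", "<html></html>", failing, working)
	fmt.Println(text, err)
}
```

In real use, `ExtractWithBoilerpipe` and `ExtractWithGoOse` would be passed in place of the stand-ins.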

func NewElasticClient

func NewElasticClient() (*elastic.Client, error)

NewElasticClient creates a new client for connecting to an Elasticsearch cluster.

Types

type Article

type Article struct {
	FeedItem *feedreader.FeedItem
	HTML     string
}

func (*Article) Extract

func (article *Article) Extract() error

Extract the content of an article

func (*Article) Fetch

func (article *Article) Fetch() error

Fetch the content of an article from the web

func (*Article) Write

func (article *Article) Write(outDir string, dayTime *time.Time) error

Write article to file

type FetchError

type FetchError struct {
	Msg    string    `json:"message"`
	URL    string    `json:"url"`
	Status int       `json:"status"`
	Time   time.Time `json:"time"`
}

func (*FetchError) Error

func (e *FetchError) Error() string

type Scraper

type Scraper struct {
	Feeds         []feedreader.Feed
	Lang          string
	Articles      int
	Failures      int
	ElasticClient *elastic.Client
	Verbose       bool
}

func New

func New(feedsFile string) (Scraper, error)

New creates a scraper instance

func (*Scraper) Scrape

func (scraper *Scraper) Scrape(outDir string, day *time.Time) error

Scrape downloads the content of the provided list of URLs.
