crawl

package
v0.0.0-...-a1bc676
Warning

This package is not in the latest version of its module.

Published: Mar 27, 2024 License: Apache-2.0 Imports: 6 Imported by: 1

README

Package cloudeng.io/glean/crawlindex/crawl

import cloudeng.io/glean/crawlindex/crawl

Types

Type Crawler
type Crawler struct {
	// contains filtered or unexported fields
}

Crawler represents a crawler instance that contains global configuration information.

Functions
func New(resources Resources) *Crawler

New creates a Crawler that uses the supplied Resources.

Methods
func (c *Crawler) Run(ctx context.Context, fv *Flags, datasource string) error

Type Flags
type Flags struct {
	config.FileFlags
	Outlinks bool `subcmd:"outlinks,false,display extracted outlinks"`
	Progress bool `subcmd:"progress,true,'display progress of downloads'"`
}

Flags represents the flags that are used to control the crawl.
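Each subcmd struct tag on Flags appears to follow a "name,default,usage" layout. Progress wraps its usage text in single quotes; Outlinks does not. The sketch below reads those tags with reflect and splits them. parseSubcmdTag and flagSpec are hypothetical helpers that handle only the simple tags shown here. They are not the subcmd package's own parser. FileFlags is an empty stand-in for config.FileFlags, whose fields are not listed on this page.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// FileFlags is a hypothetical empty stand-in for config.FileFlags.
type FileFlags struct{}

// Flags mirrors the exported fields and tags of crawl.Flags.
type Flags struct {
	FileFlags
	Outlinks bool `subcmd:"outlinks,false,display extracted outlinks"`
	Progress bool `subcmd:"progress,true,'display progress of downloads'"`
}

// flagSpec holds the three comma-separated parts of a subcmd tag.
type flagSpec struct {
	Name, Default, Usage string
}

// parseSubcmdTag splits "name,default,usage" (usage may itself contain commas)
// and strips surrounding single quotes from the usage text. Hypothetical helper.
func parseSubcmdTag(tag string) (flagSpec, error) {
	parts := strings.SplitN(tag, ",", 3)
	if len(parts) != 3 {
		return flagSpec{}, fmt.Errorf("malformed subcmd tag %q", tag)
	}
	return flagSpec{Name: parts[0], Default: parts[1], Usage: strings.Trim(parts[2], "'")}, nil
}

func main() {
	t := reflect.TypeOf(Flags{})
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag, ok := f.Tag.Lookup("subcmd")
		if !ok {
			continue // the embedded FileFlags carries no subcmd tag
		}
		spec, err := parseSubcmdTag(tag)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: --%s (default %s) %s\n", f.Name, spec.Name, spec.Default, spec.Usage)
	}
}
```

Running it prints one line per tagged field, for example "Progress: --progress (default true) display progress of downloads".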

Type Resources
type Resources struct {
	Extractors      map[content.Type]outlinks.Extractor
	PopulateCrawlFS func(ctx context.Context, cfg config.CrawlService, factories map[string]crawlcmd.FSFactory) error
	NewContentFS    func(ctx context.Context, cfg crawlcmd.CrawlCacheConfig) (content.FS, error)
}

Resources represents the resources that are used by the crawler.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.
