apicrawlcmd

package
v0.0.0-...-ccb06fb
Warning

This package is not in the latest version of its module.

Published: Mar 26, 2024 License: Apache-2.0 Imports: 6 Imported by: 14

README

Package cloudeng.io/webapi/operations/apicrawlcmd

import "cloudeng.io/webapi/operations/apicrawlcmd"

Package apicrawlcmd provides support for building command line tools that implement API crawls.

Functions

Func CachePaths
func CachePaths(crawls Crawls) []string

CachePaths returns the paths of all cache directories.

Func CheckpointPaths
func CheckpointPaths(crawls Crawls) []string

CheckpointPaths returns the paths of all checkpoint directories.

Func ParseCrawlConfig
func ParseCrawlConfig[T any](crawls Crawls, name string, crawlConfig *Crawl[T]) (bool, error)

ParseCrawlConfig parses an API specific crawl config of the specified name.

Types

Type Crawl
type Crawl[T any] struct {
	RateControl crawlcmd.RateControl      `yaml:",inline"`
	Cache       crawlcmd.CrawlCacheConfig `yaml:",inline"`
	Service     T                         `yaml:",inline"`
}

Crawl is a generic type that defines common crawl configuration options as well as allowing for service specific ones.

Type Crawls
type Crawls map[string]struct {
	RateControl crawlcmd.RateControl      `yaml:",inline"`
	Cache       crawlcmd.CrawlCacheConfig `yaml:",inline"`
	Service     yaml.Node                 `yaml:"service"`
}

Crawls represents the configuration of multiple API crawls.

Documentation

Overview

Package apicrawlcmd provides support for building command line tools that implement API crawls.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ParseCrawlConfig

func ParseCrawlConfig[T any](cfg Crawl[yaml.Node], service *Crawl[T]) error

ParseCrawlConfig parses an API-specific crawl config. It is parameterized by the type T of the service-specific configuration.

Types

type Crawl

type Crawl[T any] struct {
	RateControl crawlcmd.RateControl      `yaml:",inline"`
	Cache       crawlcmd.CrawlCacheConfig `yaml:"cache"`
	Service     T                         `yaml:"service_config" cmd:"service specific configuration"`
}

Crawl is a generic type that defines common crawl configuration options as well as allowing for service specific ones. The type of the service specific configuration is generally determined by the API being crawled.

type Crawls

type Crawls map[string]Crawl[yaml.Node]

Crawls represents the configuration of multiple API crawls.

type Resources

type Resources struct {
	// Token contains all authentication tokens/credentials required to access the
	// API being crawled.
	Token *apitokens.T

	NewOperationsFS func(ctx context.Context, cfg crawlcmd.CrawlCacheConfig) (operations.FS, error)

	NewCheckpointOp func(ctx context.Context, cfg crawlcmd.CrawlCacheConfig) (checkpoint.Operation, error)
}

Resources represents the resources typically required to perform an API crawl.

func (Resources) CreateResources

func (r Resources) CreateResources(ctx context.Context, cfg crawlcmd.CrawlCacheConfig) (store operations.FS, chkpt checkpoint.Operation, err error)

type State

type State[T any] struct {
	Config     Crawl[T]
	Token      *apitokens.T
	Store      operations.FS
	Checkpoint checkpoint.Operation
}

func NewState

func NewState[T any](ctx context.Context, config Crawl[yaml.Node], resources Resources) (State[T], error)
