Documentation ¶
Index ¶
- Constants
- Variables
- func DownloadCompleted(outputFilename, resumeFilename string) bool
- func IsInDebugMode() bool
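- type AudistoAPIClient
- func NewClient(username string, password string, crawl uint64, mode string, noDetails bool, chunknumber uint64, chunkSize uint64, filter string, order string) (*AudistoAPIClient, error)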
- func (api *AudistoAPIClient) Do(request *http.Request) (*http.Response, error)
- func (api *AudistoAPIClient) FetchRawChunk(forTheFirstRequest bool) ([]byte, int, error)
- func (api *AudistoAPIClient) FetchTotalElements() ([]byte, int, error)
- func (api *AudistoAPIClient) GetAPIEndpoint() string
- func (api *AudistoAPIClient) GetBaseURL() string
- func (api *AudistoAPIClient) GetFullQueryURL(forTheFirstRequest bool) string
- func (api *AudistoAPIClient) GetQueryParams(forTheFirstRequest bool) url.Values
- func (api *AudistoAPIClient) GetRelativePath() string
- func (api *AudistoAPIClient) GetRequestMethod() string
- func (api *AudistoAPIClient) GetRequestURL() (*url.URL, error)
- func (api *AudistoAPIClient) GetTotalElements() (uint64, error)
- func (api *AudistoAPIClient) GetURLPath() string
- func (api *AudistoAPIClient) IsValid() error
- func (api *AudistoAPIClient) ResetChunkSize()
- func (api *AudistoAPIClient) SetChunkSize(size uint64)
- func (api *AudistoAPIClient) SetNextChunkNumber(number uint64)
- func (api *AudistoAPIClient) SetRequestMethod(method string) error
- func (api *AudistoAPIClient) SetTargetPageFilter(pageID uint64)
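- type Downloader
- func (d *Downloader) PersistConfig() error
- func (d *Downloader) ProgressReport() StatusReport
- func (d *Downloader) Setup(username string, password string, crawl uint64, mode string, noDetails bool, chunknumber uint64, chunkSize uint64, output string, filter string, noResume bool, order string, targets string) error
- func (d *Downloader) Start() error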
- type LogType
- type StatusReport
- func (ps *StatusReport) IsDone() bool
Constants ¶
const (
	// AudistoAPIDomain is the domain name of the Audisto API.
	AudistoAPIDomain = "api.audisto.com"
	// AudistoAPIEndpoint is the URL endpoint of the Audisto API; use "" or "/" if the endpoint is at the root of the domain.
	AudistoAPIEndpoint = "/crawls/"
	// AudistoAPIVersion is the version of the Audisto API this downloader talks to.
	AudistoAPIVersion = "2.0"
	// EndpointSchema is http or https; this probably won't change, hence it is set here.
	EndpointSchema = "https"
	// DefaultRequestMethod is used when the HTTP request method is not explicitly set.
	DefaultRequestMethod = "GET"
	// DefaultOutputFormat is the default format (file extension) of the response from the Audisto API if not explicitly set.
	DefaultOutputFormat = "tsv"
	// DefaultChunkSize is the default chunk size for interacting with the Audisto API if not explicitly set.
	// This should not affect the way throttling works.
	DefaultChunkSize = 10000
	// ContentType is the content type of the HTTP request sent using the HTTP client.
	ContentType = "application/x-www-form-urlencoded"
	// AcceptEncoding is the content encoding for the HTTP request.
	AcceptEncoding = "gzip, deflate"
	// ConnectionType is the value of the "Connection" HTTP header sent using the HTTP client.
	ConnectionType = "Keep-Alive"
)
const (
	// SMOOTHINGFACTOR -
	SMOOTHINGFACTOR = 0.005
	// SelfTargetSuffix is appended to the output filename when --targets=self is used.
	SelfTargetSuffix = "_links"
)
const (
// DebugEnvKey debug environment variable
DebugEnvKey = "DD_DEBUG"
)
const (
// ETAFactor ETA milliseconds estimation factor
ETAFactor = 175
)
Variables ¶
var (
	// RefreshInterval is the time between two progress updates.
	// Exported so the caller can fine-tune it.
	RefreshInterval = time.Millisecond * 100
)
progress bar elements
var StatusCodesErrors = map[int]string{
401: "Wrong credentials",
403: "Access denied. Wrong credentials?",
404: "Not found. Correct crawl ID?",
429: "Error while getting total number of elements: 429, multiple requests",
504: "Error while getting total number of elements: 504, server timeout",
}
StatusCodesErrors maps HTTP status codes to the error messages shown above.
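A caller might translate a response status code into one of these messages roughly as follows. This is only a sketch: the map is copied from the declaration above, while `describeStatus` and its fallback message are hypothetical and not part of the package.

```go
package main

import "fmt"

// statusCodesErrors mirrors the StatusCodesErrors map documented above.
var statusCodesErrors = map[int]string{
	401: "Wrong credentials",
	403: "Access denied. Wrong credentials?",
	404: "Not found. Correct crawl ID?",
	429: "Error while getting total number of elements: 429, multiple requests",
	504: "Error while getting total number of elements: 504, server timeout",
}

// describeStatus is a hypothetical helper: it returns the known message
// for a status code, or a generic fallback for codes not in the map.
func describeStatus(code int) string {
	if msg, ok := statusCodesErrors[code]; ok {
		return msg
	}
	return fmt.Sprintf("unexpected status code: %d", code)
}

func main() {
	fmt.Println(describeStatus(404))
	fmt.Println(describeStatus(500))
}
```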
Functions ¶
func DownloadCompleted ¶
func DownloadCompleted(outputFilename, resumeFilename string) bool
DownloadCompleted is a helper function that checks whether the download for a given output filename has been completed. A download is considered completed when the output file exists and its resume file does not. This is an inference rather than a certainty, because the meta-info held in the resume file is not available.
func IsInDebugMode ¶
func IsInDebugMode() bool
IsInDebugMode checks if the app is running in debug mode
Types ¶
type AudistoAPIClient ¶
type AudistoAPIClient struct {
	// request path / DSN
	BasePath string
	Username string
	Password string
	Mode     string
	CrawlID  uint64

	// request query params
	Deep        bool
	Filter      string
	Order       string
	Output      string
	ChunkNumber uint64
	ChunkSize   uint64
	// contains filtered or unexported fields
}
AudistoAPIClient is a struct holding all the information required to construct a URL with query params for the Audisto API.
func NewClient ¶
func NewClient(username string, password string, crawl uint64, mode string, noDetails bool, chunknumber uint64, chunkSize uint64, filter string, order string) (*AudistoAPIClient, error)
NewClient makes a new Audisto API client and checks that it is valid.
func (*AudistoAPIClient) Do ¶
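func (api *AudistoAPIClient) Do(request *http.Request) (*http.Response, error)
Do executes an HTTP request, adding the Audisto API header values. It also validates the client's variables before executing the request, to avoid unnecessary HTTP round trips.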
func (*AudistoAPIClient) FetchRawChunk ¶
func (api *AudistoAPIClient) FetchRawChunk(forTheFirstRequest bool) ([]byte, int, error)
FetchRawChunk makes an http request to the server for a given chunk
func (*AudistoAPIClient) FetchTotalElements ¶
func (api *AudistoAPIClient) FetchTotalElements() ([]byte, int, error)
FetchTotalElements sets up the request for the first chunk in json, containing the total number of elements.
func (*AudistoAPIClient) GetAPIEndpoint ¶
func (api *AudistoAPIClient) GetAPIEndpoint() string
GetAPIEndpoint constructs the Audisto API endpoint, without the query params or the DSN part.
func (*AudistoAPIClient) GetBaseURL ¶
func (api *AudistoAPIClient) GetBaseURL() string
GetBaseURL constructs the base URL for querying the Audisto API, in the form: username:password@api.audisto.com
func (*AudistoAPIClient) GetFullQueryURL ¶
func (api *AudistoAPIClient) GetFullQueryURL(forTheFirstRequest bool) string
GetFullQueryURL returns the full url for interacting with Audisto API, INCLUDING query params
func (*AudistoAPIClient) GetQueryParams ¶
func (api *AudistoAPIClient) GetQueryParams(forTheFirstRequest bool) url.Values
GetQueryParams uses the net/url package to construct the query params. If forTheFirstRequest is true, chunk_size and deep are set to 0 and the output is forced to be JSON. This is used to request the first chunk in JSON and get the total number of elements.
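The two modes can be sketched with `url.Values`. Only the `chunk_size` and `deep` keys are named in the description. The `chunk`, `output`, `filter` and `order` keys, the `queryOptions` type, and the choice to keep `filter` and `order` on the first request are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// queryOptions is a hypothetical subset of the AudistoAPIClient fields.
type queryOptions struct {
	Deep        bool
	Filter      string
	Order       string
	Output      string
	ChunkNumber uint64
	ChunkSize   uint64
}

// queryParams builds the query string values for one request.
func queryParams(o queryOptions, forTheFirstRequest bool) url.Values {
	v := url.Values{}
	if o.Filter != "" {
		v.Set("filter", o.Filter) // assumed key name
	}
	if o.Order != "" {
		v.Set("order", o.Order) // assumed key name
	}
	if forTheFirstRequest {
		// Documented behaviour: no payload, JSON output, used to read the total.
		v.Set("chunk_size", "0")
		v.Set("deep", "0")
		v.Set("output", "json") // assumed key name
		return v
	}
	deep := "0"
	if o.Deep {
		deep = "1"
	}
	v.Set("deep", deep)
	v.Set("chunk", strconv.FormatUint(o.ChunkNumber, 10)) // assumed key name
	v.Set("chunk_size", strconv.FormatUint(o.ChunkSize, 10))
	v.Set("output", o.Output)
	return v
}

func main() {
	o := queryOptions{Deep: true, Output: "tsv", ChunkNumber: 2, ChunkSize: 10000}
	fmt.Println(queryParams(o, true).Encode())
	fmt.Println(queryParams(o, false).Encode())
}
```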
func (*AudistoAPIClient) GetRelativePath ¶
func (api *AudistoAPIClient) GetRelativePath() string
GetRelativePath returns the path relative to the API domain name, e.g. /2.0/crawls/123456/links
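The example path above can be reproduced from the package constants, the crawl ID and the mode. This is a sketch of one way to assemble it, not necessarily how the package does it; `relativePath` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// Values mirror the AudistoAPIVersion and AudistoAPIEndpoint constants.
const (
	apiVersion  = "2.0"
	apiEndpoint = "/crawls/"
)

// relativePath joins version, endpoint, crawl ID and mode into one path.
// path.Join cleans duplicate slashes, so an endpoint of "" or "/" also works.
func relativePath(crawlID uint64, mode string) string {
	return "/" + path.Join(apiVersion, apiEndpoint, strconv.FormatUint(crawlID, 10), mode)
}

func main() {
	fmt.Println(relativePath(123456, "links"))
	fmt.Println(relativePath(123456, "pages"))
}
```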
func (*AudistoAPIClient) GetRequestMethod ¶
func (api *AudistoAPIClient) GetRequestMethod() string
GetRequestMethod returns the HTTP request method, GET (by default)
func (*AudistoAPIClient) GetRequestURL ¶
func (api *AudistoAPIClient) GetRequestURL() (*url.URL, error)
GetRequestURL returns a validated instance of url.URL, and an error if the validation fails
func (*AudistoAPIClient) GetTotalElements ¶
func (api *AudistoAPIClient) GetTotalElements() (uint64, error)
GetTotalElements asks the server for the total number of elements
func (*AudistoAPIClient) GetURLPath ¶
func (api *AudistoAPIClient) GetURLPath() string
GetURLPath returns the full url for interacting with Audisto API, WITHOUT query params e.g. username:password@api.audisto.com/crawls/pages|links
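A URL with embedded credentials, as in the form shown above, can be built safely with `net/url`, which escapes special characters in the username and password. The sketch below uses the documented scheme and domain. The `requestURL` helper and the sample credentials are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// Values mirror the EndpointSchema and AudistoAPIDomain constants.
const (
	endpointSchema = "https"
	apiDomain      = "api.audisto.com"
)

// requestURL is a hypothetical helper that assembles a URL carrying
// the credentials in its userinfo part.
func requestURL(username, password, relativePath string) *url.URL {
	return &url.URL{
		Scheme: endpointSchema,
		User:   url.UserPassword(username, password),
		Host:   apiDomain,
		Path:   relativePath,
	}
}

func main() {
	u := requestURL("alice", "s3cret", "/2.0/crawls/123456/links")
	fmt.Println(u.String())
}
```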
func (*AudistoAPIClient) IsValid ¶
func (api *AudistoAPIClient) IsValid() error
IsValid checks whether the struct info looks good. It does not make any remote request.
func (*AudistoAPIClient) ResetChunkSize ¶
func (api *AudistoAPIClient) ResetChunkSize()
func (*AudistoAPIClient) SetChunkSize ¶
func (api *AudistoAPIClient) SetChunkSize(size uint64)
SetChunkSize sets AudistoAPIClient.ChunkSize to a new size
func (*AudistoAPIClient) SetNextChunkNumber ¶
func (api *AudistoAPIClient) SetNextChunkNumber(number uint64)
SetNextChunkNumber sets AudistoAPIClient.ChunkNumber to the next chunk number
func (*AudistoAPIClient) SetRequestMethod ¶
func (api *AudistoAPIClient) SetRequestMethod(method string) error
SetRequestMethod sets the HTTP request method for interacting with the Audisto API. Allowed methods: GET, POST, PATCH, DELETE
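A validation of this kind might look like the sketch below. The allowed methods and the GET default come from the documentation. The `client` type, the case-insensitive matching and the error text are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// defaultRequestMethod mirrors the DefaultRequestMethod constant.
const defaultRequestMethod = "GET"

// client is a hypothetical stand-in holding only the request method.
type client struct {
	method string
}

// setRequestMethod accepts GET, POST, PATCH or DELETE and rejects anything else.
// Assumption: input is normalised to upper case before comparison.
func (c *client) setRequestMethod(method string) error {
	m := strings.ToUpper(strings.TrimSpace(method))
	switch m {
	case http.MethodGet, http.MethodPost, http.MethodPatch, http.MethodDelete:
		c.method = m
		return nil
	}
	return fmt.Errorf("unsupported request method %q", method)
}

// requestMethod falls back to the default when no method has been set.
func (c *client) requestMethod() string {
	if c.method == "" {
		return defaultRequestMethod
	}
	return c.method
}

func main() {
	c := &client{}
	fmt.Println(c.requestMethod())
	fmt.Println(c.setRequestMethod("post"), c.requestMethod())
	fmt.Println(c.setRequestMethod("PUT"), c.requestMethod())
}
```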
func (*AudistoAPIClient) SetTargetPageFilter ¶
func (api *AudistoAPIClient) SetTargetPageFilter(pageID uint64)
type Downloader ¶
type Downloader struct {
	OutputFilename            string        `json:"outputFilename"`
	TargetsFilename           string        `json:"targetsFilename"`
	DoneElements              uint64        `json:"doneElements"`
	TotalElements             uint64        `json:"totalElements"`
	NoDetails                 bool          `json:"noDetails"`
	TargetsFileMD5            string        `json:"targetsFileMD5"`
	TargetsFileNextID         int           `json:"targetsFileNextID"`
	CurrentTarget             currentTarget `json:"currentTarget"`
	PagesSelfTargetsCompleted bool          `json:"pagesSelfTargetsCompleted"`

	// Stop is a switch to stop the current download
	Stop bool
	// contains filtered or unexported fields
}
Downloader initiates or resumes a persisted download process using AudistoAPIClient. It also tracks and increments the chunk number, taking into account the total number of elements to be downloaded.
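The relationship between total elements, done elements and chunk size can be illustrated with a little arithmetic. The helpers below are hypothetical and say nothing about how the Downloader numbers its chunks. They only show how many requests a download of a given size would need and how many elements remain.

```go
package main

import "fmt"

// chunksNeeded returns how many chunks of chunkSize cover total elements.
// It avoids the (total + chunkSize - 1) form, which can overflow uint64.
func chunksNeeded(total, chunkSize uint64) uint64 {
	if chunkSize == 0 {
		return 0
	}
	n := total / chunkSize
	if total%chunkSize != 0 {
		n++
	}
	return n
}

// remaining returns how many elements are still to be downloaded.
func remaining(total, done uint64) uint64 {
	if done >= total {
		return 0
	}
	return total - done
}

func main() {
	const total, chunkSize = 25000, 10000 // 10000 is the documented DefaultChunkSize
	fmt.Println("chunks:", chunksNeeded(total, chunkSize))
	fmt.Println("remaining after two chunks:", remaining(total, 2*chunkSize))
}
```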
func (*Downloader) PersistConfig ¶
func (d *Downloader) PersistConfig() error
PersistConfig saves the resumer to a file
func (*Downloader) ProgressReport ¶
func (d *Downloader) ProgressReport() StatusReport
ProgressReport makes the downloader report its current status
func (*Downloader) Setup ¶
func (d *Downloader) Setup(username string, password string, crawl uint64, mode string, noDetails bool, chunknumber uint64, chunkSize uint64, output string, filter string, noResume bool, order string, targets string) error
Setup assigns the params and executes the Run() function
func (*Downloader) Start ¶
func (d *Downloader) Start() error
Start runs the overall download logic after the initialization and validation steps
type StatusReport ¶
type StatusReport struct {
	ETA                          time.Duration
	ChunkSize                    uint64
	TotalElements, DoneElements  uint64
	Mode                         string
	TimeoutsCount, ErrorsCount   int
	ProgressPercentage           float64
	OutputFilename               string
	Logs                         []map[LogType]string
	IsIngTargetMode              bool
	TotalIDsCount                int
	CurrentIDOrderNumber         int
}
StatusReport is a struct holding the progress status of the current download
func (*StatusReport) IsDone ¶
func (ps *StatusReport) IsDone() bool
IsDone is a helper function that reports whether the download is considered done.