README
========================
 Chrome File Downloader
========================

.. contents::
   :depth: 2

The sole purpose of this package is to download files from the Internet with headless Chrome, bypassing Cloudflare and possibly some other annoying browser checks. It does so by implementing the solutions posted in "`bypass headless chrome detection issue`_" for chromedp_.

This library may help you if other download methods, e.g. curl or the standard ``http.Get()``, don't work.

The implementation is based on this `chromedp example`_.

Thanks to `@ZekeLu`_ for huge help in getting this going.

Compatibility
-------------

Tested with:

* Chrome (stable) v90.0.4430.93
* github.com/chromedp/chromedp v0.6.12
* github.com/chromedp/cdproto v0.0.0-20210323015217-0942afbea50e

Newer versions of Chrome will require some code changes, as described in `this issue`_, because the package uses calls that are deprecated in newer protocol versions in order to stay compatible with the current stable version of Chrome (see above).

When using the headless-shell Docker image, please use the following tag::

  FROM chromedp/headless-shell:90.0.4430.93

LICENCES
--------

chromedp_: Copyright (c) 2016-2020 Kenneth Shaw

.. _`this issue`: https://github.com/chromedp/chromedp/issues/807
.. _`chromedp example`: https://github.com/chromedp/examples/tree/master/download_file
.. _`@ZekeLu`: https://github.com/ZekeLu
.. _chromedp: https://github.com/chromedp/chromedp
.. _`bypass headless chrome detection issue`: https://github.com/chromedp/chromedp/issues/396
Documentation
Overview ¶
Package ChromeDL uses chromedp to download files. It may come in handy when one needs to get a file from a protected website that blocks regular methods such as curl or http.Get().
It is heavily based on https://github.com/chromedp/examples/tree/master/download_file with minor modifications.
Constants ¶
const DefaultUA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
DefaultUA is the default user agent string that will be used by the browser instance. Can be changed
Variables ¶
var ErrNoChrome = errors.New("no chrome instance in the context")
ErrNoChrome indicates that there's no chrome instance in the context.
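A sentinel error like this is normally checked with errors.Is, which also matches when the error has been wrapped. The sketch below is only an illustration: errNoChrome is a local stand-in with the same message, and attach is a hypothetical function, since the documentation above does not say which call returns ErrNoChrome.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoChrome is a local stand-in that mirrors the message of the
// package's ErrNoChrome; it is NOT the package variable itself.
var errNoChrome = errors.New("no chrome instance in the context")

// attach is hypothetical. It simulates a constructor that requires an
// already running browser and wraps the sentinel error when none exists.
func attach(hasBrowser bool) error {
	if !hasBrowser {
		return fmt.Errorf("attach: %w", errNoChrome)
	}
	return nil
}

// isNoChrome reports whether err is, or wraps, the sentinel error.
func isNoChrome(err error) bool {
	return errors.Is(err, errNoChrome)
}

func main() {
	if err := attach(false); isNoChrome(err) {
		fmt.Println("start a browser first:", err)
	}
}
```

With the real package, the same check would presumably read errors.Is(err, chromedl.ErrNoChrome), assuming the import name is chromedl.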
Functions ¶
func Download ¶ added in v0.1.1
Download downloads a file from the provided uri using chromedp. It returns a reader with the buffered file contents and an error, if any. The reader may be non-nil even when an error is returned, provided the file was downloaded and read successfully. Once the download completes, the file is stored in a temporary directory, buffered, and then cleaned up on a best-effort basis. No timeout is set by default; set one on the context if required. Configuration options for the downloader can optionally be passed.
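The store, buffer, and clean-up sequence described above can be sketched with the standard library alone. This is not the package's implementation: fakeDownload is a hypothetical stand-in that writes a payload to a temporary directory instead of driving a browser, and the helper names and the 30-second timeout are assumptions made for illustration. It does show why a caller may receive both a reader and an error (a failed cleanup after a successful read) and how a timeout is supplied through the context.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"time"
)

// bufferAndCleanup reads the file at path into memory and then removes dir.
// If only the cleanup fails, the reader is still returned with the error.
func bufferAndCleanup(dir, path string) (io.Reader, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		os.RemoveAll(dir) // best effort; the read error is the one reported
		return nil, err
	}
	if err := os.RemoveAll(dir); err != nil {
		return bytes.NewReader(data), fmt.Errorf("cleanup: %w", err)
	}
	return bytes.NewReader(data), nil
}

// fakeDownload is a hypothetical stand-in for the browser download: it
// stores payload in a temporary directory, then buffers and cleans up.
func fakeDownload(ctx context.Context, payload []byte) (io.Reader, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // context already cancelled or timed out
	}
	dir, err := os.MkdirTemp("", "dl-sketch-*")
	if err != nil {
		return nil, err
	}
	path := filepath.Join(dir, "download.bin")
	if err := os.WriteFile(path, payload, 0o600); err != nil {
		os.RemoveAll(dir)
		return nil, err
	}
	return bufferAndCleanup(dir, path)
}

// run applies a timeout via the context and returns the number of buffered
// bytes. A non-nil reader is consumed even when an error accompanies it.
func run(payload []byte) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	r, err := fakeDownload(ctx, payload)
	if r == nil {
		return 0, err
	}
	data, readErr := io.ReadAll(r)
	if readErr != nil {
		return 0, readErr
	}
	return len(data), err
}

func main() {
	n, err := run([]byte("PK\x03\x04 example"))
	if err != nil {
		fmt.Println("warning or failure:", err)
	}
	fmt.Println("bytes buffered:", n)
}
```

When calling the real Download, the same caller-side pattern applies: check the reader for nil first, and treat an error that arrives together with a non-nil reader as a warning rather than a lost download.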
Example ¶
Output:
file size > 0: true
file signature: PK
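The example's source is not reproduced here, but the output suggests it checks that the downloaded data is non-empty and starts with the bytes "PK", which is how ZIP archives begin. A minimal sketch of such a check on any io.Reader follows; describe and sample are hypothetical names, and sample returns hard-coded bytes rather than a real download.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// describe reads r fully and reports whether it held any data and what
// its first two bytes are (empty if fewer than two bytes were read).
func describe(r io.Reader) (nonEmpty bool, signature string, err error) {
	data, err := io.ReadAll(r)
	if err != nil {
		return false, "", err
	}
	if len(data) >= 2 {
		signature = string(data[:2])
	}
	return len(data) > 0, signature, nil
}

// sample is a stand-in for the reader a download would return: the first
// four bytes are a ZIP local file header signature, the rest is filler.
func sample() io.Reader {
	return bytes.NewReader([]byte("PK\x03\x04filler"))
}

func main() {
	ok, sig, err := describe(sample())
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println("file size > 0:", ok)
	fmt.Println("file signature:", sig)
}
```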
Types ¶
type Instance ¶ added in v0.1.0
type Instance struct {
// contains filtered or unexported fields
}
Instance is the browser instance that will be used for downloading files.
func New ¶ added in v0.1.0
New creates a new Instance, starting up the headless chrome to do the download. Once finished, call Stop to terminate the browser.
func NewWithChromeCtx ¶ added in v0.1.1
NewWithChromeCtx creates a new Instance for an existing browser instance. Stop will not terminate the browser, but will cancel the event listener.
func (*Instance) Download ¶ added in v0.1.1
Download downloads the file returning the reader with contents.