scrapio

command module

Version: v0.0.0-...-2f0050d | Published: May 4, 2020 | License: Apache-2.0 | Imports: 5 | Imported by: 0

Repository

github.com/koshqua/scrapio

README

Scrapio

Scrapio is a lightweight and user-friendly web crawling and scraping library. The main goal of the project is to make scraping large amounts of similar data from the web easy. It may be useful for a wide range of applications, such as data mining, data processing and archiving. Later on, I plan to turn it into a standalone service that works as an API.

Features

At the moment it works as a library that can be used to crawl and scrape data from the web. What it can do:

Crawl all pages on a host and return all the links.
Scrape text, image URLs and links from the pages in the crawl result.
It leaves the choice of output format (CSV, JSON, etc.) up to you.
It's free and quite powerful.
Written in Go and concurrent; depending on network speed, it can crawl and scrape up to 2k pages per minute.
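
The README does not show how the concurrency is implemented. As a rough illustration only, a bounded worker pool is one common way to process many pages in parallel in Go. Everything in this sketch (processAll, fetchFunc, the fake fetcher) is a hypothetical name made up for the example and is not part of scrapio's API.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc is a hypothetical stand-in for whatever downloads and parses a page.
type fetchFunc func(url string) (string, error)

// processAll runs fetch over urls using at most `workers` goroutines and
// returns the successful results keyed by URL. Failed URLs are skipped.
func processAll(urls []string, workers int, fetch fetchFunc) map[string]string {
	if workers < 1 {
		workers = 1 // avoid blocking forever when no worker is started
	}
	jobs := make(chan string)
	results := make(map[string]string)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				body, err := fetch(u)
				if err != nil {
					continue // a real crawler would probably log or retry here
				}
				mu.Lock()
				results[u] = body
				mu.Unlock()
			}
		}()
	}

	// Feed the workers, then wait until every job has been handled.
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Fake fetcher so the example runs without network access.
	fake := func(u string) (string, error) { return "page at " + u, nil }
	out := processAll([]string{"https://example.com/a", "https://example.com/b"}, 4, fake)
	fmt.Println(len(out), "pages processed")
}
```

The number of workers caps how many requests are in flight at once, which is usually what keeps a crawler from overloading the target host.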

Installation

go get github.com/koshqua/scrapio

Usage

The crawler is easy to use. You just need to specify a starting URL, and it will crawl all the URLs on that host.

    // Init a new crawler and give it a start URL; it does not have to be the site's root URL.
    cr := &crawler.Crawler{StartURL: "https://gulfnews.com/"}
    // Start crawling.
    // Later on I'm going to add more config options here, like max results, etc.
    cr.Crawl()
    // Do something with the result; it's up to you.
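
To make "crawl all the URLs on the host" more concrete, here is a minimal sketch of how links found on a page could be resolved against the start URL and filtered down to the same host. It uses only the standard net/url package. The function name sameHostLinks is hypothetical, and this is an assumption about the general technique, not a description of how scrapio itself does it.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// sameHostLinks resolves each href against the start URL and keeps the unique
// http(s) links that point to the same host. Fragments are dropped, and the
// start URL itself is treated as already visited.
func sameHostLinks(start string, hrefs []string) ([]string, error) {
	base, err := url.Parse(start)
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{base.String(): true}
	var out []string
	for _, h := range hrefs {
		ref, err := url.Parse(h)
		if err != nil {
			continue // skip malformed links
		}
		abs := base.ResolveReference(ref)
		abs.Fragment = "" // "#top" and similar do not identify a new page
		if abs.Scheme != "http" && abs.Scheme != "https" {
			continue // skip mailto:, javascript:, etc.
		}
		if abs.Host != base.Host {
			continue // stay on the starting host
		}
		key := abs.String()
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, key)
	}
	return out, nil
}

func main() {
	links, err := sameHostLinks("https://example.com/news/", []string{
		"/a", "b.html", "https://other.org/x", "#top", "mailto:x@y.z",
	})
	if err != nil {
		log.Fatalln(err)
	}
	for _, l := range links {
		fmt.Println(l)
	}
}
```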

The scraper uses the data structure produced by the crawler. Before initializing a scraper, you need to create a few selectors and assign them to it. Selectors are simple CSS-like selectors.

    // Create the selectors you want to scrape.
    h2 := scraper.NewSelector("h2", true, true, true)
    img := scraper.NewSelector("img", true, true, true)
    p := scraper.NewSelector("p:first-of-type", true, true, true)
    // Init a new scraper with the given selectors.
    // The scraper depends on the crawler from the previous code snippet.
    // It gets the pages and creates a new structure with selectors and scrape results.
    sc := scraper.InitScraper(*cr, []scraper.Selector{h2, img, p})
    // And just start scraping.
    err := sc.Scrap()
    if err != nil {
        log.Fatalln(err)
    }
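
Since the output format is left to you, the scraped data can be written out with the standard library. The sketch below shows one possible way to produce JSON and CSV. The PageResult type and its fields are hypothetical: this README does not show the actual result structures of scrapio, so you would need to map them to something like this yourself.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
	"log"
	"strings"
)

// PageResult is a hypothetical shape for one scraped page.
// It is NOT a scrapio type; adapt it to the real result structure.
type PageResult struct {
	URL    string   `json:"url"`
	Texts  []string `json:"texts"`
	Images []string `json:"images"`
}

// toJSON renders the results as indented JSON.
func toJSON(results []PageResult) (string, error) {
	b, err := json.MarshalIndent(results, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// toCSV renders one row per page; multi-value fields are joined with " | ".
func toCSV(results []PageResult) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write([]string{"url", "texts", "images"}); err != nil {
		return "", err
	}
	for _, r := range results {
		row := []string{r.URL, strings.Join(r.Texts, " | "), strings.Join(r.Images, " | ")}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return buf.String(), w.Error()
}

func main() {
	results := []PageResult{{
		URL:    "https://example.com/",
		Texts:  []string{"Headline"},
		Images: []string{"https://example.com/a.png"},
	}}
	j, err := toJSON(results)
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(j)
	c, err := toCSV(results)
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Print(c)
}
```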


Directories ¶

Path	Synopsis
api
crawler module
scraper module
