pdf_crawler

command
Version: v0.0.0-...-584605b
Published: Mar 15, 2022 | License: Apache-2.0 | Imports: 8 | Imported by: 0

README

  1. Given a website URL, crawl the site for all links and filter out the PDF URLs.
  2. Download and parse the PDF files and extract their text content.
  3. Use the Stanford NER library to identify named entities in the extracted text.
  4. Save the results to the GIG API.
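
Step 1 above can be illustrated with a minimal sketch that uses only the Go standard library. It is not the crawler's actual code: the function name extractPDFLinks, the regular-expression approach to finding href attributes, and the sample page are all assumptions made for illustration. A real crawler would more likely use a proper HTML parser.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// hrefPattern matches double-quoted href attributes. Using a regular
// expression is a simplification for this sketch, not the crawler's method.
var hrefPattern = regexp.MustCompile(`(?i)href\s*=\s*"([^"]+)"`)

// extractPDFLinks (hypothetical helper) resolves every href found in page
// against baseURL and returns the unique absolute URLs whose path ends in ".pdf".
func extractPDFLinks(baseURL, page string) ([]string, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	var pdfs []string
	seen := make(map[string]bool)
	for _, m := range hrefPattern.FindAllStringSubmatch(page, -1) {
		ref, err := url.Parse(strings.TrimSpace(m[1]))
		if err != nil {
			continue // skip malformed links
		}
		abs := base.ResolveReference(ref)
		// Compare the path only, so query strings such as "?v=2" are ignored.
		if !strings.HasSuffix(strings.ToLower(abs.Path), ".pdf") {
			continue
		}
		link := abs.String()
		if !seen[link] {
			seen[link] = true
			pdfs = append(pdfs, link)
		}
	}
	return pdfs, nil
}

func main() {
	// Sample HTML, invented for this example.
	page := `<a href="tender1.pdf">T1</a>
<a href="/about.html">About</a>
<a href="https://site.lk/docs/Gazette.PDF?v=2">G</a>`

	links, err := extractPDFLinks("https://site.lk/notices/", page)
	if err != nil {
		panic(err)
	}
	for _, link := range links {
		fmt.Println(link)
	}
}
```

Matching on the URL path rather than the raw string keeps links with query parameters, and the seen map avoids downloading the same PDF twice when a page links to it more than once.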

How to Run:

1. Set the category variable to match the source category (e.g. Tenders, Gazettes).
2. Run: go run pdf_crawler.go "https://site.lk"
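
The two run steps might look roughly like the sketch below. The README only says to set a "category var"; the variable name category, its value, the helper startURL, and the validation rules are assumptions, so the real pdf_crawler.go may handle its argument differently.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
)

// category labels the source being crawled (e.g. "Tenders", "Gazettes").
// The name and value here are assumptions based on the README's wording.
var category = "Tenders"

// startURL (hypothetical helper) validates the single command-line argument
// and returns it as an absolute http(s) URL.
func startURL(args []string) (string, error) {
	if len(args) != 1 {
		return "", errors.New(`usage: go run pdf_crawler.go "https://site.lk"`)
	}
	u, err := url.Parse(args[0])
	if err != nil {
		return "", err
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return "", fmt.Errorf("not an absolute http(s) URL: %q", args[0])
	}
	return u.String(), nil
}

func main() {
	target, err := startURL(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("crawling %s (category: %s)\n", target, category)
}
```

Rejecting an argument without a scheme up front (for example site.lk instead of https://site.lk) gives a clear error before any crawling starts.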

Documentation

There is no documentation for this package.
