webhog

package webhog

Version: v0.0.0-...-eaa2c0c
Warning: This package is not in the latest version of its module.

Published: May 26, 2014 License: MIT Imports: 25 Imported by: 0

Documentation

Overview

webhog is a package that downloads a given URL (including its JS, CSS, and images), stores it for offline use, and uploads it to a given AWS-S3 account.

Index

Constants

const (
	CompleteStatus  = "complete"
	ParsingStatus   = "parsing"
	UploadingStatus = "uploading"
	ErrorStatus     = "error"
)

Status values that track an entity's progress through processing.
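The constant names suggest that "parsing" and "uploading" are intermediate states while "complete" and "error" end processing. The sketch below encodes that reading in a hypothetical helper, isTerminal, which is not part of webhog; the terminal/intermediate split is an assumption inferred from the names alone.

```go
package main

import "fmt"

// Status values as documented for webhog.
const (
	CompleteStatus  = "complete"
	ParsingStatus   = "parsing"
	UploadingStatus = "uploading"
	ErrorStatus     = "error"
)

// isTerminal is a hypothetical helper (not part of webhog). It assumes that
// "complete" and "error" end processing and the other statuses do not.
func isTerminal(status string) bool {
	switch status {
	case CompleteStatus, ErrorStatus:
		return true
	default:
		return false
	}
}

func main() {
	for _, s := range []string{ParsingStatus, UploadingStatus, CompleteStatus, ErrorStatus} {
		fmt.Printf("%-9s terminal=%v\n", s, isTerminal(s))
	}
}
```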

Variables

var Config = new(configuration)
var Conn = new(connection)

Global variable that holds the database connection.

var EntityDir string

Temporary directory in which the entity's files are stored.

var ExpirationTime = time.Hour * 168

A processed URL expires after one week (168 hours), after which it needs to be reprocessed.

var Models = []Model{}

Holds references to all registered models.

Functions

func ArchiveFinalFiles

func ArchiveFinalFiles(entDir string) (string, error)

Creates a gzip-compressed tar archive of the files found in the entity directory, ready for upload.

func Create

func Create(m Model) error

func Cursor

func Cursor(m Model) *mgo.Collection

func DeleteEntity

func DeleteEntity(entity Entity, r render.Render)

func Destroy

func Destroy(m Model, query interface{}) error

func Entities

func Entities(params martini.Params, r render.Render)

func ExtractData

func ExtractData(entity *Entity, url string)

Makes a GET request to the given URL and starts parsing the returned HTML.

func Find

func Find(m Model, query interface{}) *mgo.Query

func GetEntity

func GetEntity(params martini.Params, r render.Render)

func KeyRequired

func KeyRequired() martini.Handler

func LoadConfig

func LoadConfig() error

func LoadDB

func LoadDB()

Connects to the configured database.

func LoadRoutes

func LoadRoutes()

func NewEntityDir

func NewEntityDir() (err error)

Creates a temporary directory in which to store entity files.

func ParseHTML

func ParseHTML(n *html.Node, entity *Entity, done chan bool)

Parses the HTML, pulling the href/src attributes of JS, CSS, and image resources for download.

func Register

func Register(m Model)

Registers a model object in Models.
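A minimal sketch of the registration pattern, reusing the documented Model interface and Models slice. Implementing Register as a plain append is an assumption. The Entity type is trimmed to the method the interface needs, and its collection name "entities" is hypothetical.

```go
package main

import "fmt"

// Model mirrors the documented interface.
type Model interface {
	Collection() string
}

// Models holds references to all registered models.
var Models = []Model{}

// Register appends m to Models (a sketch of the documented behavior).
func Register(m Model) {
	Models = append(Models, m)
}

// Entity is trimmed to the Collection method; "entities" is a hypothetical name.
type Entity struct{}

func (e Entity) Collection() string { return "entities" }

func main() {
	Register(Entity{})
	for _, m := range Models {
		fmt.Println(m.Collection())
	}
}
```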

func Scrape

func Scrape(url Url, r render.Render)

func StoreHTML

func StoreHTML(html bytes.Buffer, entDir string) (err error)

Stores the final HTML in an index.html file.

func StoreResource

func StoreResource(resource, attr, entDir string) (name string, err error)

Stores the given JS, CSS, or image file in the given temporary directory under a temporary name.

func Update

func Update(m Model, query, updates interface{}) error

func UploadEntity

func UploadEntity(dir string, entity *Entity) (string, error)

Types

type Entity

type Entity struct {
	Id        bson.ObjectId `bson:"_id,omitempty" json:"id"`
	UUID      string        `bson:"uuid" json:"uuid"`
	Url       string        `bson:"url" json:"url"`
	AwsLink   string        `bson:"aws_link,omitempty" json:"aws_link"`
	Status    string        `bson:"status" json:"status"`
	CreatedAt time.Time     `bson:"created_at" json:"created_at"`
}

Entity represents a web page and its corresponding UUID, as stored on AWS-S3.

func NewScraper

func NewScraper(url string) (*Entity, error)

Starts the scraping process.

func (Entity) Collection

func (e Entity) Collection() string

type Model

type Model interface {
	Collection() string
}

Model wraps database models to provide a common querying interface.

type Url

type Url struct {
	Url  string `form:"url" json:"url"`
	UUID string `form:"uuid" json:"uuid"`
}

func (Url) Validate

func (urlType Url) Validate(errors binding.Errors, req *http.Request) binding.Errors
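Url.Validate's rules are not documented here, and the real method reports problems through martini's binding.Errors. As a purely illustrative stand-in, the hypothetical validateURL below shows checks such a validator might plausibly apply using net/url: a non-empty, absolute http(s) URL with a host.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateURL is a hypothetical stand-in for Url.Validate; its rules are
// assumptions, not webhog's documented behavior.
func validateURL(raw string) error {
	if raw == "" {
		return errors.New("url is required")
	}
	u, err := url.ParseRequestURI(raw)
	if err != nil {
		return fmt.Errorf("invalid url %q: %v", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if u.Host == "" {
		return errors.New("url must include a host")
	}
	return nil
}

func main() {
	for _, s := range []string{"http://example.com", "", "example.com", "ftp://example.com"} {
		fmt.Printf("%q -> %v\n", s, validateURL(s))
	}
}
```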
