warc

Version: v0.0.0-...-eb7282a
Published: Feb 6, 2021 License: GPL-3.0 Imports: 13 Imported by: 4

README

Some Go code and tools for using WARC files.

Currently very use-case specific - I just want to read stored HTTP requests
and responses.

Documentation

Functions

func Read

func Read(in io.Reader) (*http.Response, error)

Read reads an HTTP response from an io.Reader.

func ReadFile

func ReadFile(filename string) (*http.Response, error)

ReadFile reads an HTTP response from a WARC file. If filename has a .gz suffix, the file is assumed to be gzip-compressed.

func Write

func Write(w io.Writer, resp *http.Response, srcURL string, timeStamp time.Time) error

Write writes out an HTTP response, including its body. It tries to leave the response unaltered, but it works by reading the entire Body into memory and replacing it with a []byte-backed reader reset to the beginning. This should be fine for most applications; just be aware that it is not 100% non-intrusive.

TODO: pass in optional extra headers instead of srcURL.

Types

type WARCReader

type WARCReader struct {
	// contains filtered or unexported fields
}

func NewReader

func NewReader(in io.Reader) *WARCReader

func (*WARCReader) ReadRecord

func (r *WARCReader) ReadRecord() (*WARCRecord, error)

ReadRecord reads the next WARC record in the file. It returns nil, io.EOF when no more records are available.
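A typical loop over a reader with this contract looks like the sketch below. Because the module's import path is not shown on this page, the example uses stand-in types (record and sliceReader, both hypothetical) whose ReadRecord has the same nil, io.EOF behaviour; with the real package the loop body would call (*WARCReader).ReadRecord instead.

```go
package main

import (
	"fmt"
	"io"
)

// record and sliceReader are hypothetical stand-ins for WARCRecord and
// WARCReader, used only so this example is self-contained.
type record struct {
	Block []byte
}

type sliceReader struct {
	recs []*record
}

// ReadRecord returns the next record, or nil, io.EOF when none remain.
func (s *sliceReader) ReadRecord() (*record, error) {
	if len(s.recs) == 0 {
		return nil, io.EOF
	}
	rec := s.recs[0]
	s.recs = s.recs[1:]
	return rec, nil
}

// countRecords drains r and reports how many records it yielded.
// io.EOF marks a clean end of input; any other error is passed on.
func countRecords(r *sliceReader) (int, error) {
	n := 0
	for {
		_, err := r.ReadRecord()
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		n++
	}
}

func main() {
	r := &sliceReader{recs: []*record{
		{Block: []byte("first")},
		{Block: []byte("second")},
	}}
	n, err := countRecords(r)
	fmt.Println(n, err)
}
```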

type WARCRecord

type WARCRecord struct {
	Version string

	// Header contains the WARC header fields.
	// Note that the names are canonicalised, so
	// use "Warc-Target-Uri" instead of "WARC-Target-URI", for example.
	Header textproto.MIMEHeader

	// Block is the payload data.
	Block []byte
}

func (*WARCRecord) TargetURI

func (rec *WARCRecord) TargetURI() string

TargetURI is a helper that reads the "Warc-Target-Uri" header, stripping any surrounding angle brackets. The WARC spec requires the URI to be enclosed in angle brackets (i.e. "<http://example.com>"), but a lot of tooling and examples don't do this.
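The bracket stripping can be sketched as below. This is an assumption about the behaviour described above, not the package's actual code; stripAngleBrackets is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// stripAngleBrackets is a hypothetical helper (not part of warc). It
// removes one pair of surrounding angle brackets if both are present and
// returns the input otherwise unchanged (apart from trimmed whitespace).
func stripAngleBrackets(uri string) string {
	uri = strings.TrimSpace(uri)
	if strings.HasPrefix(uri, "<") && strings.HasSuffix(uri, ">") {
		return uri[1 : len(uri)-1]
	}
	return uri
}

func main() {
	// Both the bracketed and the bare form yield the same URI.
	fmt.Println(stripAngleBrackets("<http://example.com>"))
	fmt.Println(stripAngleBrackets("http://example.com"))
}
```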
