archiver

Published: Jan 22, 2026 License: AGPL-3.0 Imports: 40 Imported by: 0

README

Readeck Archiver

This package is a fork of Obelisk by Radhi Fadlillah.

What started as a soft fork with a few changes is now an independent package that retains most of Obelisk's logic for finding resources but introduces a modular and less memory-consuming way to store resources after downloading them.

The Archiver

The Archiver is a structure that provides public methods to archive a document. It visits all of the document's resources: images, stylesheets, scripts, etc.

The archiver doesn't store any content itself; it only provides private utilities:

  • fetchInfo: fetches a resource but only sniffs its content type and, when it's an image, its dimensions
  • fetch: retrieves an io.ReadCloser (the response's body)
  • saveResource: saves an io.ReadCloser into the Collector

The Collector

Upon each visit, the archiver may call a Collector that takes care of the following:

  • Give the resource a name (defaulting to a UUID) that is used as the new attribute value (or CSS URL)
  • Provide an io.Writer in which the resource's content can be saved

In the simplest cases, this design provides a direct connection between an http.Response.Body and the provided io.Writer, without any intermediate storage.

DownloadCollector

DownloadCollector is a partial Collector that keeps a resource inventory and takes care of retrieving remote documents.

FileCollector

FileCollector is a full Collector that renames resources to a UUID (URL namespace) and saves them into an os.Root filesystem.

ZipCollector

ZipCollector is a full Collector that saves resources inside a zip.Writer.

SingleFileCollector

SingleFileCollector is a Collector that doesn't save files but replaces all URLs with data: URIs. As one can imagine, it is memory intensive.
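
Inlining a resource as a data: URI is a one-liner with the standard library; a sketch (dataURI is an illustrative helper, not part of this package):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// dataURI inlines raw bytes as a data: URI with the given MIME type, the kind
// of URL replacement a single-file archive performs for every resource.
func dataURI(mimeType string, data []byte) string {
	return "data:" + mimeType + ";base64," + base64.StdEncoding.EncodeToString(data)
}

func main() {
	// The first bytes of a PNG file.
	fmt.Println(dataURI("image/png", []byte{0x89, 'P', 'N', 'G'}))
	// data:image/png;base64,iVBORw==
}
```

Because every resource's full content must be held this way inside the final document, memory usage grows with the total size of all resources, which is why this collector is memory intensive.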

Documentation

Overview

Package archiver provides functions to archive a full HTML page with its assets.

Index

Constants

This section is empty.

Variables

var (
	// ErrSkippedURL is an error returned so the current URL is not processed.
	ErrSkippedURL = errors.New("skip processing url")

	// ErrRemoveSrc joins [ErrSkippedURL] and instructs the archiver to
	// skip the URL and remove the related node.
	ErrRemoveSrc = errors.Join(ErrSkippedURL, errors.New("remove source"))
)

Functions

func GetExtension

func GetExtension(mimeType string) string

GetExtension returns an extension for a given mime type. It defaults to .bin when none was found.
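
This behavior can be approximated with the standard mime package; a sketch under the assumption that the package behaves similarly (getExtension below is illustrative, not the package's implementation):

```go
package main

import (
	"fmt"
	"mime"
)

// getExtension picks a file extension for a MIME type, falling back to ".bin"
// when the type is unknown.
func getExtension(mimeType string) string {
	exts, err := mime.ExtensionsByType(mimeType)
	if err != nil || len(exts) == 0 {
		return ".bin"
	}
	return exts[0]
}

func main() {
	fmt.Println(getExtension("application/x-totally-unknown")) // .bin
	fmt.Println(getExtension("image/png"))
}
```

Note that mime.ExtensionsByType may consult system MIME tables on some platforms, so the exact extension returned for a known type can vary between hosts.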

func GetNodeContext

func GetNodeContext(ctx context.Context) *html.Node

GetNodeContext returns an html.Node stored in context.

func IsArchiverRequest

func IsArchiverRequest(req *http.Request) bool

IsArchiverRequest returns true when an http.Request was made using the archiver.

func Logger

func Logger(c Collector) *slog.Logger

Logger returns the Collector's logger when it's a LoggerCollector. It returns a null logger otherwise.

func MultiReadCloser

func MultiReadCloser(readers ...io.Reader) io.ReadCloser

MultiReadCloser returns an io.ReadCloser that's the concatenation of multiple io.Reader. It stores a reader resulting from io.MultiReader and a list of io.Closer for the provided readers that implement io.Closer.

func NodeLogValue

func NodeLogValue(n *html.Node) slog.LogValuer

NodeLogValue is an slog.LogValuer for an *html.Node. Its LogValue method renders and truncates the node as HTML.

func WithTimeout

func WithTimeout(timeout time.Duration) func(c *http.Client)

WithTimeout sets the HTTP client's timeout for downloading resources.

Types

type ArchiveFlag

type ArchiveFlag uint8

ArchiveFlag is an archiver feature to enable.

const (
	// EnableCSS enables extraction of CSS files and tags.
	EnableCSS ArchiveFlag = 1 << iota

	// EnableEmbeds enables extraction of embedded contents.
	EnableEmbeds

	// EnableJS enables extraction of JavaScript contents.
	EnableJS

	// EnableMedia enables extraction of media contents
	// other than image.
	EnableMedia

	// EnableImages enables extraction of images.
	EnableImages

	// EnableFonts enables font extraction.
	EnableFonts

	// EnableDataAttributes enables data attributes in HTML elements.
	EnableDataAttributes

	// EnableBestImage enables an image sorting process to find the
	// best suitable image from srcset and picture>source elements.
	EnableBestImage
)
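
The flags are standard bit flags: combine them with bitwise OR and test them with bitwise AND. A sketch that redeclares them locally for illustration (the values are assumed to follow the iota-based declaration above):

```go
package main

import "fmt"

// ArchiveFlag mirrors the documented iota-based bit flags.
type ArchiveFlag uint8

const (
	EnableCSS ArchiveFlag = 1 << iota
	EnableEmbeds
	EnableJS
	EnableMedia
	EnableImages
	EnableFonts
	EnableDataAttributes
	EnableBestImage
)

func main() {
	// Combine features with OR; check a single feature with AND.
	flags := EnableCSS | EnableImages | EnableFonts
	fmt.Println(flags&EnableImages != 0) // true: images enabled
	fmt.Println(flags&EnableJS != 0)     // false: JS not enabled
}
```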

type Archiver

type Archiver struct {
	// contains filtered or unexported fields
}

Archiver is the core of the archiver process. It holds the ArchiveFlag flags and a Collector that caches collected content.

func New

func New(options ...Option) *Archiver

New creates a new Archiver.

func (*Archiver) ArchiveDocument

func (arc *Archiver) ArchiveDocument(ctx context.Context, doc *html.Node, uri *url.URL, name string) (err error)

ArchiveDocument runs the archiver on a document *html.Node for a given URL. If name is empty, the Collector will generate one.

func (*Archiver) ArchiveReader

func (arc *Archiver) ArchiveReader(ctx context.Context, r io.Reader, uri *url.URL, name string) error

ArchiveReader runs the archiver on an io.Reader for a given URL. If name is empty, the Collector will generate one.

type ClientOptions

type ClientOptions func(c *http.Client)

ClientOptions is a function that sets an HTTP client's properties.

type Collector

type Collector interface {
	sync.Locker
	Get(uri string) (*Resource, bool)
	Set(uri string, res *Resource)
	Name(uri string) string
	Fetch(req *http.Request) (*http.Response, error)
	Create(res *Resource) (io.Writer, error)
	Resources() iter.Seq[*Resource]
}

Collector describes a resource collector. Its role is to provide some methods to retrieve and keep track of remote resources. A collector is orchestrated by [Archiver.fetch] and [Archiver.saveResource].

type ConvertCollector

type ConvertCollector interface {
	Convert(ctx context.Context, res *Resource, r io.ReadCloser) (io.ReadCloser, error)
}

ConvertCollector describes a collector providing a method to transform a response's body and/or the associated resource.

type DownloadCollector

type DownloadCollector struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

DownloadCollector is a Collector that takes care of keeping track of fetched resources and their cached state.

func NewDownloadCollector

func NewDownloadCollector(client *http.Client, options ...ClientOptions) *DownloadCollector

NewDownloadCollector returns a DownloadCollector.

func (*DownloadCollector) Fetch

func (c *DownloadCollector) Fetch(req *http.Request) (*http.Response, error)

Fetch calls the collector's HTTP client and returns an *http.Response.

func (*DownloadCollector) Get

func (c *DownloadCollector) Get(uri string) (res *Resource, ok bool)

Get returns the *Resource associated with a given URL.

func (*DownloadCollector) Resources

func (c *DownloadCollector) Resources() iter.Seq[*Resource]

Resources returns an iter.Seq of all the collected resources.

func (*DownloadCollector) Set

func (c *DownloadCollector) Set(uri string, res *Resource)

Set sets a *Resource for a given URL.

type FileCollector

type FileCollector struct {
	*DownloadCollector
	// contains filtered or unexported fields
}

FileCollector is a Collector that saves resources on a filesystem.

func NewFileCollector

func NewFileCollector(root string, client *http.Client, options ...ClientOptions) *FileCollector

NewFileCollector returns a new *FileCollector.

func (*FileCollector) Create

func (c *FileCollector) Create(res *Resource) (io.Writer, error)

Create implements Collector. It creates and returns a new io.Writer for the resource's content. At this point, the *Resource properties can change, including its name, and the change will be reflected in the final document.

func (FileCollector) Name

func (c FileCollector) Name(uri string) string

Name returns a name for a URL, using UUID's URL namespace.
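
A version-5 UUID in the URL namespace is a SHA-1 hash of the namespace UUID followed by the name, with the version and variant bits patched in. A dependency-free sketch of that derivation (the package presumably relies on a UUID library for this):

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// namespaceURL is the standard RFC 4122 URL namespace UUID,
// 6ba7b811-9dad-11d1-80b4-00c04fd430c8.
var namespaceURL = [16]byte{
	0x6b, 0xa7, 0xb8, 0x11, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuidV5 derives a name-based (version 5) UUID from a namespace and a name.
// The same URL always yields the same UUID, so a resource keeps a stable name.
func uuidV5(ns [16]byte, name string) string {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // set version 5
	u[8] = (u[8] & 0x3f) | 0x80 // set RFC 4122 variant

	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func main() {
	fmt.Println(uuidV5(namespaceURL, "https://example.com/style.css"))
}
```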

type LoggerCollector

type LoggerCollector interface {
	Log() *slog.Logger
}

LoggerCollector describes a logger provider.

type Option

type Option func(arc *Archiver)

Option is a function that sets an Archiver's options.

func WithCollector

func WithCollector(collector Collector) Option

WithCollector sets a Collector to an Archiver.

func WithConcurrency

func WithConcurrency(v int64) Option

WithConcurrency sets the maximum concurrent downloads that can take place during archiving.

func WithFlags

func WithFlags(flags ArchiveFlag) Option

WithFlags sets ArchiveFlag to an Archiver.

type PostWriteCollector

type PostWriteCollector interface {
	PostWrite(res *Resource, w io.Writer)
}

PostWriteCollector describes a collector providing a method to perform an action just after writing a resource's content.

type Resource

type Resource struct {
	Name        string
	ContentType string
	Width       int
	Height      int
	Size        int64
	Contents    *bytes.Buffer
	// contains filtered or unexported fields
}

Resource is a remote resource.

func (*Resource) Saved

func (c *Resource) Saved() bool

Saved returns the resource's saved state.

func (*Resource) URL

func (c *Resource) URL() string

URL returns the resource's URL.

func (*Resource) Value

func (c *Resource) Value() string

Value returns the resource value. It's usually [Resource.Name] but it can be [Resource.Contents] when the latter is not nil.

type SingleFileCollector

type SingleFileCollector struct {
	*DownloadCollector
	// contains filtered or unexported fields
}

SingleFileCollector is a Collector that produces a single HTML file with every resource URL base64 encoded. Note that it is very memory-inefficient and should only be used for testing purposes.

func NewSingleFileCollector

func NewSingleFileCollector(w io.Writer, client *http.Client, options ...ClientOptions) *SingleFileCollector

NewSingleFileCollector returns a new SingleFileCollector.

func (*SingleFileCollector) Create

func (c *SingleFileCollector) Create(res *Resource) (io.Writer, error)

Create implements Collector. For resources other than index.html, it returns a bytes.Buffer that will be filled with the resource's content.

func (SingleFileCollector) Name

func (c SingleFileCollector) Name(uri string) string

Name returns a name for a URL, using UUID's URL namespace.

func (*SingleFileCollector) PostWrite

func (c *SingleFileCollector) PostWrite(res *Resource, w io.Writer)

PostWrite implements PostWriteCollector. For any resource that's not index.html, it renames the resource to a data: URL using the previously created buffer.

type URLLogValue

type URLLogValue string

URLLogValue is a slog.LogValuer for URLs. It truncates the string when it's too long (e.g. data: URLs).

func (URLLogValue) LogValue

func (s URLLogValue) LogValue() slog.Value

LogValue implements slog.LogValuer.

type ZipCollector

type ZipCollector struct {
	*DownloadCollector
	// contains filtered or unexported fields
}

ZipCollector is a Collector that saves resources in a zip file.

func NewZipCollector

func NewZipCollector(zw *zip.Writer, client *http.Client, options ...ClientOptions) *ZipCollector

NewZipCollector returns a ZipCollector instance. The zip.Writer must be open and it's the caller's responsibility to close it when done adding files.

func (*ZipCollector) Create

func (c *ZipCollector) Create(res *Resource) (io.Writer, error)

Create implements Collector. The returned io.Writer is a zip fileWriter. It creates the necessary directory entries. See FileCollector.Create for more information.

func (ZipCollector) Name

func (c ZipCollector) Name(uri string) string

Name returns a name for a URL, using UUID's URL namespace.
