ltxmlharvest

Version: v0.0.0-...-a2351e4
Warning: This package is not in the latest version of its module.
Published: Mar 15, 2021 License: GPL-3.0 Imports: 13 Imported by: 0

README

ltxmlharvest

A new harvester for LaTeXML-produced XHTML.

Documentation

Overview

Package ltxmlharvest provides a MathWebSearch harvester for documents produced by LaTeXML.

Constants

This section is empty.

Variables

This section is empty.

Functions

func HarvestFS

func HarvestFS(fsys fs.FS, accept func(path string) bool, uri func(path string) string, writer func(path string, harvest Harvest) error, logger *log.Logger)

HarvestFS recursively harvests all files in fsys. The files of each directory are grouped into a single harvest.

func HarvestReader

func HarvestReader(reader io.Reader, URI string, writer io.WriteCloser) error

HarvestReader harvests the single document read from reader and writes the output to writer.

Types

type Harvest

type Harvest []HarvestFragment

Harvest represents a single harvest. It implements sort.Interface

func HarvestFragments

func HarvestFragments(jobs []Job, logger *log.Logger) Harvest

HarvestFragments executes jobs and returns the resulting fragments as a Harvest, writing log output to logger.

func (Harvest) Len

func (harvest Harvest) Len() int

func (Harvest) Less

func (harvest Harvest) Less(i, j int) bool

func (Harvest) MarshalXML

func (harvest Harvest) MarshalXML(e *xml.Encoder, start xml.StartElement) error

MarshalXML marshals this harvest into xml form

func (Harvest) Swap

func (harvest Harvest) Swap(i, j int)

func (Harvest) WriteTo

func (harvest Harvest) WriteTo(writer io.Writer) (n int64, err error)

WriteTo writes this harvest into writer. The returned byte count n is always 0; only err is meaningful.

type HarvestFormula

type HarvestFormula struct {
	// ID of this formula
	ID string

	// Dual (Content + Presentation) MathML contained in this document
	// Content and Presentation should be linked using "xref" attributes.
	// May use "m" and "mws" namespaces.
	DualMathML string

	// Content MathML corresponding to the DualMathML above.
	// Must use the "m" namespace.
	ContentMathML string
}

HarvestFormula represents a single formula found within the harvest

func ReadFormula

func ReadFormula(math *etree.Element) (HarvestFormula, error)

ReadFormula parses a formula from the given math element.

type HarvestFragment

type HarvestFragment struct {
	// ID is an internal, but unique, id of this harvest fragment
	// typically just the running id of this fragment
	ID string

	// URI is the URI of the corresponding document
	URI string

	// XHTMLContent of this document, substituting "math" + id for formulae
	XHTMLContent string

	// List of formulae within the harvest
	Formulae []HarvestFormula
}

HarvestFragment represents a single document fragment within a harvest

func (HarvestFragment) MarshalXML

func (frag HarvestFragment) MarshalXML(e *xml.Encoder, start xml.StartElement) error

MarshalXML marshals this document into xml

func (*HarvestFragment) ReadFrom

func (f *HarvestFragment) ReadFrom(reader io.Reader) (n int64, err error)

type Job

type Job struct {
	Reader func() (io.ReadCloser, error)
	URI    string
}

Job describes a job for the harvester
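Because Reader is a function and not an open stream, a Job can defer opening its input until it is actually run. The sketch below mirrors the struct with a local type; job, jobFromString and readAll are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// job mirrors the Job struct shown above.
type job struct {
	Reader func() (io.ReadCloser, error)
	URI    string
}

// jobFromString is a hypothetical helper that serves content from memory.
// The reader is only created when Reader is called.
func jobFromString(content, uri string) job {
	return job{
		Reader: func() (io.ReadCloser, error) {
			return io.NopCloser(strings.NewReader(content)), nil
		},
		URI: uri,
	}
}

// readAll opens the job's input, reads it fully, and closes it again.
func readAll(j job) (string, error) {
	rc, err := j.Reader()
	if err != nil {
		return "", err
	}
	defer rc.Close()
	data, err := io.ReadAll(rc)
	return string(data), err
}

func main() {
	j := jobFromString("<html/>", "https://example.org/doc.xhtml")
	content, err := readAll(j)
	if err != nil {
		panic(err)
	}
	fmt.Println(j.URI, content)
}
```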

func JobFromFile

func JobFromFile(fsys fs.FS, path string, URI string) Job

JobFromFile creates a new Job from the file at path in fsys and a URI base.

func (Job) Do

func (job Job) Do(wg *sync.WaitGroup, n int, fragments chan<- HarvestFragment, logger *log.Logger) (err error)

Directories

Path: cmd/ltxmlharvest (command)
