documentloaders

package

Version: v0.0.0-...-7b6d569 | Published: Jan 20, 2024 | License: MIT | Imports: 13 | Imported by: 0

Documentation ¶

Overview ¶

Package documentloaders includes a standard interface for loading documents from a source and implementations of this interface.

Index ¶

type CSV
- func NewCSV(r io.Reader, columns ...string) CSV
- func (c CSV) Load(_ context.Context) ([]schema.Document, error)
- func (c CSV) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)
type HTML
- func NewHTML(r io.Reader) HTML
- func (h HTML) Load(_ context.Context) ([]schema.Document, error)
- func (h HTML) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)
type Loader
type PDF
- func NewPDF(r io.ReaderAt, size int64, opts ...PDFOptions) PDF
- func (p PDF) Load(_ context.Context) ([]schema.Document, error)
- func (p PDF) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)
type PDFOptions
- func WithPassword(password string) PDFOptions
type Text
- func NewText(r io.Reader) Text
- func (l Text) Load(_ context.Context) ([]schema.Document, error)
- func (l Text) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type CSV ¶

type CSV struct {
	// contains filtered or unexported fields
}

CSV represents a CSV document loader.

func NewCSV ¶

func NewCSV(r io.Reader, columns ...string) CSV

NewCSV creates a new CSV loader from an io.Reader and optional column names used for filtering.

func (CSV) Load ¶

func (c CSV) Load(_ context.Context) ([]schema.Document, error)

Load reads from the io.Reader and returns a single document with the data.

func (CSV) LoadAndSplit ¶

func (c CSV) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)

LoadAndSplit reads text data from the io.Reader and splits it into multiple documents using a text splitter.
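
The column filtering that NewCSV mentions can be pictured with the standard library alone. The following is a minimal sketch, not the package's implementation: `filterRows` and `csvDemo` are hypothetical helpers, and the "header: value" output format is an assumption made for illustration. It treats the first row as the header and keeps only the named columns (or all columns when none are named).

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// filterRows is a hypothetical helper (not part of documentloaders).
// It reads CSV data, treats the first record as the header, and renders
// each following record as "header: value" lines for the kept columns.
// With no column names given, every column is kept.
func filterRows(r io.Reader, columns ...string) ([]string, error) {
	keep := make(map[string]bool, len(columns))
	for _, c := range columns {
		keep[c] = true
	}

	rd := csv.NewReader(r)
	header, err := rd.Read()
	if err == io.EOF {
		return nil, nil // empty input: nothing to load
	}
	if err != nil {
		return nil, err
	}

	var rows []string
	for {
		rec, err := rd.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		var fields []string
		// csv.Reader rejects records whose length differs from the
		// first record by default, so header[i] is always in range.
		for i, v := range rec {
			if len(keep) > 0 && !keep[header[i]] {
				continue
			}
			fields = append(fields, header[i]+": "+v)
		}
		rows = append(rows, strings.Join(fields, "\n"))
	}
	return rows, nil
}

// csvDemo runs filterRows on a small in-memory table.
func csvDemo() ([]string, error) {
	data := "name,age,city\nAda,36,London\nAlan,41,Wilmslow\n"
	return filterRows(strings.NewReader(data), "name", "city")
}

func main() {
	rows, err := csvDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, row := range rows {
		fmt.Println(row)
		fmt.Println("---")
	}
}
```

With the real loader, the equivalent call would presumably be `NewCSV(r, "name", "city")` followed by `Load`; how that loader formats each row is not stated in this reference.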

type HTML ¶

type HTML struct {
	// contains filtered or unexported fields
}

HTML loads, parses, and sanitizes HTML content from an io.Reader.

func NewHTML ¶

func NewHTML(r io.Reader) HTML

NewHTML creates a new HTML loader from an io.Reader.

func (HTML) Load ¶

func (h HTML) Load(_ context.Context) ([]schema.Document, error)

Load reads from the io.Reader and returns a single document with the data.

func (HTML) LoadAndSplit ¶

func (h HTML) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)

LoadAndSplit reads text data from the io.Reader and splits it into multiple documents using a text splitter.

type Loader ¶

type Loader interface {
	// Load loads from a source and returns documents.
	Load(ctx context.Context) ([]schema.Document, error)
	// LoadAndSplit loads from a source and splits the documents using a text splitter.
	LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)
}

Loader is the interface for loading and splitting documents from a source.
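
To illustrate what satisfying this interface involves, here is a self-contained sketch. `Document` and `TextSplitter` are local stand-ins for schema.Document and textsplitter.TextSplitter; their field and method names (`PageContent`, `Metadata`, `SplitText`) are assumptions, since those packages are not documented here. `StaticLoader` and `lineSplitter` are hypothetical types invented for the example.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Document is a local stand-in for schema.Document (field names assumed).
type Document struct {
	PageContent string
	Metadata    map[string]any
}

// TextSplitter is a local stand-in for textsplitter.TextSplitter
// (method name and signature assumed).
type TextSplitter interface {
	SplitText(text string) ([]string, error)
}

// Loader mirrors the shape of the interface documented above,
// using the stand-in types.
type Loader interface {
	Load(ctx context.Context) ([]Document, error)
	LoadAndSplit(ctx context.Context, splitter TextSplitter) ([]Document, error)
}

// StaticLoader is a hypothetical loader that serves in-memory strings.
type StaticLoader struct {
	texts []string
}

// Load returns one document per stored string.
func (s StaticLoader) Load(_ context.Context) ([]Document, error) {
	docs := make([]Document, 0, len(s.texts))
	for i, t := range s.texts {
		docs = append(docs, Document{
			PageContent: t,
			Metadata:    map[string]any{"index": i},
		})
	}
	return docs, nil
}

// LoadAndSplit loads the documents, then splits each one into chunks.
// Chunks share the metadata map of the document they came from.
func (s StaticLoader) LoadAndSplit(ctx context.Context, splitter TextSplitter) ([]Document, error) {
	docs, err := s.Load(ctx)
	if err != nil {
		return nil, err
	}
	var out []Document
	for _, d := range docs {
		chunks, err := splitter.SplitText(d.PageContent)
		if err != nil {
			return nil, err
		}
		for _, c := range chunks {
			out = append(out, Document{PageContent: c, Metadata: d.Metadata})
		}
	}
	return out, nil
}

// lineSplitter is a hypothetical splitter that cuts text on newlines.
type lineSplitter struct{}

func (lineSplitter) SplitText(text string) ([]string, error) {
	return strings.Split(text, "\n"), nil
}

// Compile-time check that StaticLoader satisfies Loader.
var _ Loader = StaticLoader{}

// loadDemo loads two strings and splits them into lines.
func loadDemo() ([]Document, error) {
	var l Loader = StaticLoader{texts: []string{"a\nb", "c"}}
	return l.LoadAndSplit(context.Background(), lineSplitter{})
}

func main() {
	docs, err := loadDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range docs {
		fmt.Println(d.PageContent)
	}
}
```

Code that accepts a Loader rather than a concrete CSV, HTML, PDF, or Text value can work with any of them, which is the usual reason to depend on the interface.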

type PDF ¶

type PDF struct {
	// contains filtered or unexported fields
}

PDF loads text data from an io.ReaderAt.

func NewPDF ¶

func NewPDF(r io.ReaderAt, size int64, opts ...PDFOptions) PDF

NewPDF creates a new PDF loader from an io.ReaderAt, the size of the PDF data, and optional PDFOptions.
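
Unlike the other constructors, NewPDF takes an io.ReaderAt and a size rather than a plain io.Reader. The sketch below shows two standard-library ways to obtain such a pair; `fromBytes` and `fromFile` are hypothetical helper names, and the sketch assumes `size` means the length of the PDF data in bytes, which this reference does not state explicitly.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// fromBytes wraps in-memory data. *bytes.Reader implements io.ReaderAt,
// and the size is simply the length of the slice.
func fromBytes(data []byte) (io.ReaderAt, int64) {
	return bytes.NewReader(data), int64(len(data))
}

// fromFile opens a file on disk. *os.File implements io.ReaderAt, and
// Stat reports the file size. The caller is responsible for closing
// the returned file once loading is finished.
func fromFile(path string) (*os.File, int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, 0, err
	}
	info, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, 0, err
	}
	return f, info.Size(), nil
}

func main() {
	// Placeholder bytes, not a valid PDF; they only demonstrate ReadAt.
	r, size := fromBytes([]byte("%PDF-placeholder"))

	buf := make([]byte, 4)
	if _, err := r.ReadAt(buf, 0); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(buf), size)
	// The pair (r, size) has the types NewPDF's first two parameters expect.
}
```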

func (PDF) Load ¶

func (p PDF) Load(_ context.Context) ([]schema.Document, error)

Load reads the PDF data from the io.ReaderAt and returns documents containing the data, each with metadata recording its page number and the total number of pages in the PDF.

func (PDF) LoadAndSplit ¶

func (p PDF) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)

LoadAndSplit reads PDF data from the io.ReaderAt and splits it into multiple documents using a text splitter.

type PDFOptions ¶

type PDFOptions func(pdf *PDF)

PDFOptions are options for the PDF loader.

func WithPassword ¶

func WithPassword(password string) PDFOptions

WithPassword sets the password for the PDF.
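
PDFOptions follows the functional options pattern: each option is a function that mutates the loader, and the constructor applies the options in order. The sketch below reproduces that pattern with stand-in names, since the real PDF type's fields are unexported and not shown here. `pdfLoader`, `option`, `withPassword`, and `newPDFLoader` are all hypothetical.

```go
package main

import "fmt"

// pdfLoader is a hypothetical stand-in for the PDF type. The field
// names are assumptions made for this example.
type pdfLoader struct {
	size     int64
	password string
}

// option mirrors the shape of PDFOptions: a function that mutates
// the loader it is given.
type option func(*pdfLoader)

// withPassword mirrors WithPassword: it returns an option that
// records the password on the loader.
func withPassword(password string) option {
	return func(p *pdfLoader) {
		p.password = password
	}
}

// newPDFLoader applies each option in order to a loader that starts
// with default values.
func newPDFLoader(size int64, opts ...option) pdfLoader {
	p := pdfLoader{size: size}
	for _, opt := range opts {
		opt(&p)
	}
	return p
}

func main() {
	plain := newPDFLoader(1024)
	locked := newPDFLoader(1024, withPassword("secret"))
	fmt.Printf("plain password: %q\n", plain.password)
	fmt.Printf("locked password: %q\n", locked.password)
}
```

Because options are variadic, callers that need no password pass nothing extra, and further options could be added later without changing the constructor's signature.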

type Text ¶

type Text struct {
	// contains filtered or unexported fields
}

Text loads text data from an io.Reader.

func NewText ¶

func NewText(r io.Reader) Text

NewText creates a new text loader with an io.Reader.

func (Text) Load ¶

func (l Text) Load(_ context.Context) ([]schema.Document, error)

Load reads from the io.Reader and returns a single document with the data.
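
As a rough picture of "a single document with the data", the sketch below reads everything from an io.Reader and wraps it in one document. It is not the package's code: `Document` is a local stand-in for schema.Document with assumed field names, and `textLoader` and `loadText` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// Document is a local stand-in for schema.Document (field names assumed).
type Document struct {
	PageContent string
	Metadata    map[string]any
}

// textLoader is a hypothetical stand-in for the Text type.
type textLoader struct {
	r io.Reader
}

// Load drains the reader and returns its contents as one document.
func (l textLoader) Load(_ context.Context) ([]Document, error) {
	data, err := io.ReadAll(l.r)
	if err != nil {
		return nil, err
	}
	doc := Document{PageContent: string(data), Metadata: map[string]any{}}
	return []Document{doc}, nil
}

// loadText builds a loader over an in-memory string and loads it.
func loadText(s string) ([]Document, error) {
	return textLoader{r: strings.NewReader(s)}.Load(context.Background())
}

func main() {
	docs, err := loadText("hello\nworld")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(docs), "document(s)")
	fmt.Println(docs[0].PageContent)
}
```

In this sketch the reader is consumed by the first call, so a second Load on the same loader would yield an empty document. Whether the real Text loader behaves the same way is not stated here, but creating a fresh loader per source is the safe assumption for any io.Reader-based API.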

func (Text) LoadAndSplit ¶

func (l Text) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)

LoadAndSplit reads text data from the io.Reader and splits it into multiple documents using a text splitter.
