document

package
v1.1.10 Latest
Warning

This package is not in the latest version of its module.

Published: May 11, 2026 License: MIT Imports: 25 Imported by: 0

Documentation

Functions

func New

func New(content string, mime string) core.Document

func Open

func Open(filePath string) (core.Document, error)

Types

type ParseFunc

type ParseFunc func(r io.Reader) (*RawDocument, error)

type RawDocument

type RawDocument struct {
	Text   string
	Images []core.Image
	Meta   map[string]any
}

func NewRawDoc

func NewRawDoc(text string) *RawDocument

func ParseCSV

func ParseCSV(r io.Reader) (*RawDocument, error)

func ParseDocx

func ParseDocx(r io.Reader) (*RawDocument, error)

ParseDocx reads a .docx file and converts it to Markdown. It uses only the standard library (archive/zip and encoding/xml).

func ParseHTML

func ParseHTML(r io.Reader) (*RawDocument, error)

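ParseHTML converts HTML content into a RawDocument in Markdown format. It uses the html-to-markdown library to handle headings, lists, links, tables, images, code blocks, and other elements, and it automatically extracts the document title from the <title> tag.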

func ParseImage

func ParseImage(r io.Reader) (*RawDocument, error)

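ParseImage reads an image file and returns a RawDocument. As an optimization, it loads only a thumbnail (224x224) to save memory, and it detects the image's actual MIME type.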
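
The two steps described for ParseImage, MIME detection and thumbnailing, can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the package's code: detectMIME, thumbnail, decodeThumbnail, and samplePNG are hypothetical helpers, the nearest-neighbor scaling is a swapped-in technique, and preserving the aspect ratio inside a 224x224 box is a guess about the intended behavior.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/png" // also registers the PNG decoder with image.Decode
	"net/http"
)

// maxSide is the thumbnail bound taken from the 224x224 figure in the description.
const maxSide = 224

// detectMIME sniffs the content type from the leading bytes, ignoring any file extension.
func detectMIME(data []byte) string {
	return http.DetectContentType(data)
}

// thumbnail scales src with nearest-neighbor sampling so that it fits inside
// a maxSide x maxSide box. Keeping the aspect ratio is an assumption.
func thumbnail(src image.Image) image.Image {
	b := src.Bounds()
	w, h := b.Dx(), b.Dy()
	if w <= maxSide && h <= maxSide {
		return src // already small enough
	}
	tw, th := maxSide, maxSide
	if w >= h {
		th = h * maxSide / w
	} else {
		tw = w * maxSide / h
	}
	if tw < 1 {
		tw = 1
	}
	if th < 1 {
		th = 1
	}
	dst := image.NewRGBA(image.Rect(0, 0, tw, th))
	for y := 0; y < th; y++ {
		for x := 0; x < tw; x++ {
			dst.Set(x, y, src.At(b.Min.X+x*w/tw, b.Min.Y+y*h/th))
		}
	}
	return dst
}

// decodeThumbnail decodes image bytes and returns the scaled-down image.
func decodeThumbnail(data []byte) (image.Image, error) {
	src, _, err := image.Decode(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	return thumbnail(src), nil
}

// samplePNG encodes a blank w x h image so the sketch has input to work on.
func samplePNG(w, h int) ([]byte, error) {
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, w, h))); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := samplePNG(448, 224)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(detectMIME(data))
	img, err := decodeThumbnail(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(img.Bounds().Dx(), img.Bounds().Dy())
}
```

Note that this sketch decodes the full image before scaling it, so it only reduces the memory held afterward; how the package limits memory during decoding is not shown in this documentation.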

func ParsePDF

func ParsePDF(r io.Reader) (*RawDocument, error)

func ParsePPTX

func ParsePPTX(r io.Reader) (*RawDocument, error)

ParsePPTX reads a .pptx file and converts it to Markdown. It uses only the standard library (archive/zip and encoding/xml).

func ParseText

func ParseText(r io.Reader) (*RawDocument, error)

ParseText reads a text file and returns a RawDocument.

func ParseXlsx

func ParseXlsx(r io.Reader) (*RawDocument, error)

ParseXlsx reads an .xlsx file and converts it to Markdown tables.
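
As an illustration of the output side of such a conversion, the sketch below renders rows of cell strings as a Markdown table. It does not read .xlsx files and is not the package's implementation: rowsToMarkdown and escapeCell are hypothetical helpers, and treating the first row as the header is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCell keeps cell content from breaking the table layout.
func escapeCell(s string) string {
	s = strings.ReplaceAll(s, "|", `\|`)
	return strings.ReplaceAll(s, "\n", " ")
}

// rowsToMarkdown renders rows as a Markdown table. The first row is treated
// as the header (an assumption), and short rows are padded with empty cells.
func rowsToMarkdown(rows [][]string) string {
	cols := 0
	for _, row := range rows {
		if len(row) > cols {
			cols = len(row)
		}
	}
	if cols == 0 {
		return ""
	}

	var sb strings.Builder
	writeRow := func(row []string) {
		sb.WriteString("|")
		for i := 0; i < cols; i++ {
			cell := ""
			if i < len(row) {
				cell = escapeCell(row[i])
			}
			sb.WriteString(" " + cell + " |")
		}
		sb.WriteString("\n")
	}

	writeRow(rows[0])
	sb.WriteString("|" + strings.Repeat(" --- |", cols) + "\n")
	for _, row := range rows[1:] {
		writeRow(row)
	}
	return sb.String()
}

func main() {
	fmt.Print(rowsToMarkdown([][]string{
		{"Name", "Qty"},
		{"apple", "3"},
		{"pear"},
	}))
}
```

Padding short rows to the widest row keeps every line at the same column count, which Markdown renderers generally expect.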

func (*RawDocument) AddImage

func (r *RawDocument) AddImage(data []byte) *RawDocument

func (*RawDocument) AddImages

func (r *RawDocument) AddImages(data [][]byte) *RawDocument

func (*RawDocument) GetID

func (r *RawDocument) GetID() string

func (*RawDocument) SetValue

func (r *RawDocument) SetValue(key string, value any) *RawDocument
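
AddImage, AddImages, and SetValue each return *RawDocument, which suggests they are meant to be chained. The sketch below shows that pattern with local stand-ins that copy the documented signatures. The method bodies are guesses, and Image is a hypothetical replacement for core.Image, whose definition is not shown here.

```go
package main

import "fmt"

// Image is a hypothetical stand-in for core.Image.
type Image struct {
	Data []byte
}

// RawDocument mirrors the documented struct, using the stand-in Image type.
type RawDocument struct {
	Text   string
	Images []Image
	Meta   map[string]any
}

// NewRawDoc creates a document with the given text and an empty metadata map.
func NewRawDoc(text string) *RawDocument {
	return &RawDocument{Text: text, Meta: map[string]any{}}
}

// AddImage appends one image and returns the receiver for chaining.
func (r *RawDocument) AddImage(data []byte) *RawDocument {
	r.Images = append(r.Images, Image{Data: data})
	return r
}

// AddImages appends several images and returns the receiver for chaining.
func (r *RawDocument) AddImages(data [][]byte) *RawDocument {
	for _, d := range data {
		r.AddImage(d)
	}
	return r
}

// SetValue stores a metadata entry and returns the receiver for chaining.
func (r *RawDocument) SetValue(key string, value any) *RawDocument {
	if r.Meta == nil {
		r.Meta = map[string]any{}
	}
	r.Meta[key] = value
	return r
}

func main() {
	doc := NewRawDoc("# Title").
		SetValue("title", "Title").
		AddImage([]byte{1}).
		AddImages([][]byte{{2}, {3}})
	fmt.Println(len(doc.Images), doc.Meta["title"])
}
```

GetID is left out of the sketch because the documentation does not say how the ID is derived.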
