Documentation ¶
Index ¶
- func New(content string, mime string) core.Document
- func Open(filePath string) (core.Document, error)
- type ParseFunc
- type RawDocument
- func NewRawDoc(text string) *RawDocument
- func ParseCSV(r io.Reader) (*RawDocument, error)
- func ParseDocx(r io.Reader) (*RawDocument, error)
- func ParseHTML(r io.Reader) (*RawDocument, error)
- func ParseImage(r io.Reader) (*RawDocument, error)
- func ParsePDF(r io.Reader) (*RawDocument, error)
- func ParsePPTX(r io.Reader) (*RawDocument, error)
- func ParseText(r io.Reader) (*RawDocument, error)
- func ParseXlsx(r io.Reader) (*RawDocument, error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type RawDocument ¶
func NewRawDoc ¶
func NewRawDoc(text string) *RawDocument
func ParseDocx ¶
func ParseDocx(r io.Reader) (*RawDocument, error)
ParseDocx reads a .docx file and converts it to Markdown. It uses only the standard library (archive/zip and encoding/xml).
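The standard-library approach named above can be illustrated with a minimal sketch. It is not the package's implementation: `extractDocxText` and `sampleDocxText` are hypothetical helpers, and the sketch only pulls plain paragraph text out of `word/document.xml`, with no Markdown conversion for headings, lists, or tables. Because `zip.NewReader` needs an `io.ReaderAt` and a size, the sketch assumes the whole `io.Reader` is buffered in memory first.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// extractDocxText is a hypothetical helper (not part of this package).
// It reads a .docx archive and returns the text of each <w:p> paragraph,
// separated by blank lines.
func extractDocxText(r io.Reader) (string, error) {
	// zip.NewReader needs random access, so buffer the stream first.
	data, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return "", err
	}
	f, err := zr.Open("word/document.xml")
	if err != nil {
		return "", err
	}
	defer f.Close()

	var paragraphs []string
	var cur strings.Builder
	inText := false
	dec := xml.NewDecoder(f)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "t" { // <w:t> holds a run of text
				inText = true
			}
		case xml.EndElement:
			switch t.Name.Local {
			case "t":
				inText = false
			case "p": // </w:p> closes a paragraph
				paragraphs = append(paragraphs, cur.String())
				cur.Reset()
			}
		case xml.CharData:
			if inText {
				cur.Write(t)
			}
		}
	}
	return strings.Join(paragraphs, "\n\n"), nil
}

// sampleDocxText builds a tiny in-memory archive that contains only
// word/document.xml and runs extractDocxText on it.
func sampleDocxText() (string, error) {
	const body = `<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">` +
		`<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p>` +
		`<w:p><w:r><w:t>World</w:t></w:r></w:p></w:body></w:document>`
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("word/document.xml")
	if err != nil {
		return "", err
	}
	if _, err := io.WriteString(w, body); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return extractDocxText(&buf)
}

func main() {
	text, err := sampleDocxText()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```

The sketch matches elements by local name only, so a real converter would likely also check the WordprocessingML namespace and handle styles, lists, and tables.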
func ParseHTML ¶
func ParseHTML(r io.Reader) (*RawDocument, error)
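ParseHTML converts HTML content into a RawDocument in Markdown format. It uses the html-to-markdown library to handle headings, lists, links, tables, images, code blocks, and other elements. The document title is extracted automatically from the <title> tag.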
func ParseImage ¶
func ParseImage(r io.Reader) (*RawDocument, error)
ParseImage reads an image file and returns a RawDocument. As an optimization, it loads only a 224x224 thumbnail to save memory, and it detects the image's real MIME type.
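The two steps described above, MIME sniffing and downscaling to 224x224, might look roughly like the following sketch. It is an illustration under stated assumptions, not this package's code: `detectMIME`, `thumbnail`, `samplePNG`, and `thumbnailFromBytes` are hypothetical names, the resize is a plain nearest-neighbor scale that ignores aspect ratio, and only PNG decoding is registered.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/png" // importing this also registers the PNG decoder with image.Decode
	"net/http"
)

// thumbSize mirrors the 224x224 size mentioned in the ParseImage docs.
const thumbSize = 224

// detectMIME sniffs the content type from the leading bytes of the data,
// so a wrong file extension does not matter.
func detectMIME(data []byte) string {
	return http.DetectContentType(data)
}

// thumbnail scales src to size x size using nearest-neighbor sampling.
// The aspect ratio is not preserved (an assumption of this sketch).
func thumbnail(src image.Image, size int) *image.RGBA {
	dst := image.NewRGBA(image.Rect(0, 0, size, size))
	b := src.Bounds()
	for y := 0; y < size; y++ {
		sy := b.Min.Y + y*b.Dy()/size
		for x := 0; x < size; x++ {
			sx := b.Min.X + x*b.Dx()/size
			dst.Set(x, y, src.At(sx, sy))
		}
	}
	return dst
}

// samplePNG encodes a synthetic gradient image so the sketch needs no files.
func samplePNG(w, h int) ([]byte, error) {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{R: uint8(x), G: uint8(y), B: 128, A: 255})
		}
	}
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// thumbnailFromBytes returns the sniffed MIME type and the thumbnail size.
func thumbnailFromBytes(data []byte) (mime string, w, h int, err error) {
	mime = detectMIME(data)
	img, _, err := image.Decode(bytes.NewReader(data))
	if err != nil {
		return mime, 0, 0, err
	}
	t := thumbnail(img, thumbSize)
	return mime, t.Bounds().Dx(), t.Bounds().Dy(), nil
}

func main() {
	data, err := samplePNG(640, 480)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	mime, w, h, err := thumbnailFromBytes(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mime, w, h)
}
```

Note that this sketch still decodes the full image before scaling, so it only reduces what is kept afterwards. How the package limits memory during decoding is not documented here.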
func ParsePPTX ¶
func ParsePPTX(r io.Reader) (*RawDocument, error)
ParsePPTX reads a .pptx file and converts it to Markdown. It uses only the standard library (archive/zip and encoding/xml).
func ParseXlsx ¶
func ParseXlsx(r io.Reader) (*RawDocument, error)
ParseXlsx reads an .xlsx file and converts it to Markdown tables.
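To make "Markdown tables" concrete, here is a minimal sketch of rendering already-extracted cell values as a Markdown table. It does not read .xlsx files and is not taken from this package: `rowsToMarkdown` is a hypothetical helper, and it assumes the first row is the header, short rows are padded with empty cells, and literal `|` characters are escaped.

```go
package main

import (
	"fmt"
	"strings"
)

// rowsToMarkdown is a hypothetical helper (not part of this package).
// It renders rows as a Markdown table, treating rows[0] as the header.
func rowsToMarkdown(rows [][]string) string {
	if len(rows) == 0 {
		return ""
	}
	// Use the widest row so every line has the same number of columns.
	cols := 0
	for _, r := range rows {
		if len(r) > cols {
			cols = len(r)
		}
	}

	var sb strings.Builder
	writeRow := func(cells []string) {
		sb.WriteString("|")
		for i := 0; i < cols; i++ {
			cell := ""
			if i < len(cells) {
				// Escape pipes so cell content cannot break the table.
				cell = strings.ReplaceAll(cells[i], "|", `\|`)
			}
			sb.WriteString(" " + cell + " |")
		}
		sb.WriteString("\n")
	}

	writeRow(rows[0])
	// Separator line between the header and the body.
	sb.WriteString("|")
	for i := 0; i < cols; i++ {
		sb.WriteString(" --- |")
	}
	sb.WriteString("\n")
	for _, r := range rows[1:] {
		writeRow(r)
	}
	return sb.String()
}

func main() {
	sheet := [][]string{
		{"name", "qty"},
		{"apple", "3"},
		{"pear"}, // short row, padded with an empty cell
	}
	fmt.Print(rowsToMarkdown(sheet))
}
```

A workbook with several sheets could be rendered by calling such a helper once per sheet, though how ParseXlsx lays out multiple sheets is not specified in this documentation.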
func (*RawDocument) AddImage ¶
func (r *RawDocument) AddImage(data []byte) *RawDocument
func (*RawDocument) AddImages ¶
func (r *RawDocument) AddImages(data [][]byte) *RawDocument
func (*RawDocument) GetID ¶
func (r *RawDocument) GetID() string
func (*RawDocument) SetValue ¶
func (r *RawDocument) SetValue(key string, value any) *RawDocument