rag

package
v1.6.1
Published: Jan 29, 2026 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package rag provides semantic chunking and export functionality for RAG (Retrieval-Augmented Generation) workflows. It implements hierarchical, context-aware chunking that respects document structure, ensuring chunks contain complete thoughts rather than breaking mid-sentence or mid-list.

The package prepares extracted document content for use with large language models by providing semantic chunking and a range of export formats for LLM integration.

Chunking

The Chunker splits documents into semantically meaningful chunks:

chunker := rag.NewChunkerWithConfig(config)
result, err := chunker.Chunk(document)

Chunking respects document structure, avoiding splits in the middle of:

  • Tables
  • Lists
  • Paragraphs
  • Headings with their following content

Chunk Configuration

Use ChunkerConfig to control chunking behavior:

  • TargetChunkSize - target chunk size in characters
  • MaxChunkSize - hard upper limit per chunk
  • MinChunkSize - minimum chunk size (avoids tiny chunks)
  • OverlapSize - overlap between consecutive chunks
  • PreserveListCoherence / PreserveTableCoherence - keep lists and tables intact
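
For example, starting from the defaults and tightening the limits (a sketch; the field values are illustrative):

config := rag.DefaultChunkerConfig()
config.TargetChunkSize = 800 // aim for ~800 characters per chunk
config.MaxChunkSize = 1500   // hard limit; oversized chunks split at sentence boundaries
config.MinChunkSize = 200    // smaller chunks are merged with adjacent content
config.OverlapSize = 100     // characters shared between consecutive chunks
chunker := rag.NewChunkerWithConfig(config)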

Chunk Metadata

Each Chunk includes metadata for retrieval:

  • Page numbers and positions
  • Section headings
  • Content type (paragraph, table, list, etc.)
  • Relationships to other chunks
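
A retrieval pipeline can read these fields directly. For example (a sketch over an already-chunked collection):

for _, c := range chunks.ToSlice() {
	m := c.Metadata
	fmt.Printf("%s: pages %d-%d, section %q, table=%v\n",
		c.ID, m.PageStart, m.PageEnd, m.SectionTitle, m.HasTable)
}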

Export Formats

Export chunks in various formats:

  • ToMarkdown() - Markdown with preserved structure
  • ToJSON() / ToJSONL() - structured JSON output
  • ToCSV() / ToTSV() - tabular output for spreadsheets and data pipelines
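
For example, producing JSON Lines for an ingestion pipeline and a combined markdown document (a sketch; error handling elided):

cc := rag.ChunkDocument(doc)
jsonl, _ := cc.ToJSONL() // one JSON object per line
md := cc.ToMarkdown()    // single combined markdown document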

Markdown Export

The MarkdownOptions control markdown generation:

  • IncludeMetadata - add front matter
  • PreserveTables - use markdown table syntax
  • HeadingStyle - ATX (#) or Setext (===) headings

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ApplyOverlap

func ApplyOverlap(currentText string, overlap *OverlapResult, sectionTitle string, includeContext bool) string

ApplyOverlap applies overlap from the previous chunk to the current chunk

func ConvertSize

func ConvertSize(value int, from, to SizeUnit) int

ConvertSize converts a size value from one unit to another (approximate)

func GetListMarkerType

func GetListMarkerType(line string) string

GetListMarkerType returns the marker type for a line

func IsCaptionElement

func IsCaptionElement(elementType model.ElementType) bool

IsCaptionElement checks if an element type is a caption

func IsFigureElement

func IsFigureElement(elementType model.ElementType) bool

IsFigureElement checks if an element type is a figure or image

func IsListMarker

func IsListMarker(text string) bool

IsListMarker checks if text starts with any list marker

func IsTableElement

func IsTableElement(elementType model.ElementType) bool

IsTableElement checks if an element type is a table

func IsWithinAtomicBlock

func IsWithinAtomicBlock(index int, atomicBlocks []AtomicBlock) bool

IsWithinAtomicBlock checks if an index is within any atomic block

func NormalizeListMarkers

func NormalizeListMarkers(text string, useNumbers bool) string

NormalizeListMarkers normalizes list markers to a consistent format

Types

type AtomicBlock

type AtomicBlock struct {
	StartIndex int
	EndIndex   int
	Type       string
	Reason     string
}

AtomicBlock represents a contiguous block that should not be split

func GetAtomicBlockAt

func GetAtomicBlockAt(index int, atomicBlocks []AtomicBlock) *AtomicBlock

GetAtomicBlockAt returns the atomic block containing the given index, if any

type BatchExporter

type BatchExporter struct {
	// contains filtered or unexported fields
}

BatchExporter handles exporting large collections in batches

func NewBatchExporter

func NewBatchExporter(batchSize int) *BatchExporter

NewBatchExporter creates a new batch exporter

func NewBatchExporterWithConfig

func NewBatchExporterWithConfig(batchSize int, config ExportConfig) *BatchExporter

NewBatchExporterWithConfig creates a batch exporter with custom config

func (*BatchExporter) Export

func (be *BatchExporter) Export(chunks []*Chunk, callback func(ExportBatch) error) error

Export exports chunks in batches, calling the callback for each batch
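
For example, streaming batches of 100 chunks to a sink (the upload function is hypothetical, not part of this package):

be := rag.NewBatchExporter(100)
err := be.Export(cc.ToSlice(), func(b rag.ExportBatch) error {
	fmt.Printf("batch %d: chunks %d-%d\n", b.BatchNumber, b.StartIndex, b.EndIndex)
	return upload(b.Data) // hypothetical sink
})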

func (*BatchExporter) ExportToFiles

func (be *BatchExporter) ExportToFiles(chunks []*Chunk, filenamePattern string) error

ExportToFiles exports chunks to numbered files

type Boundary

type Boundary struct {
	// Type is the kind of boundary
	Type BoundaryType

	// Position is the character offset in the text
	Position int

	// Score is the priority score for splitting here
	Score int

	// ElementIndex is the index of the element this boundary follows
	ElementIndex int

	// Context provides additional information about the boundary
	Context string
}

Boundary represents a potential chunk boundary in the content

type BoundaryConfig

type BoundaryConfig struct {
	// MinChunkSize is the minimum characters before considering a boundary
	MinChunkSize int

	// MaxChunkSize is the maximum characters before forcing a boundary
	MaxChunkSize int

	// PreferParagraphBreaks prefers paragraph boundaries over sentence boundaries
	PreferParagraphBreaks bool

	// KeepListsIntact tries to keep lists with their introductory text
	KeepListsIntact bool

	// KeepTablesIntact treats tables as atomic units
	KeepTablesIntact bool

	// KeepFiguresIntact keeps figures with their captions
	KeepFiguresIntact bool

	// LookAheadChars is how far to look ahead for better boundaries
	LookAheadChars int

	// ListIntroPatterns are patterns that indicate list introductions
	ListIntroPatterns []*regexp.Regexp
}

BoundaryConfig holds configuration for boundary detection

func DefaultBoundaryConfig

func DefaultBoundaryConfig() BoundaryConfig

DefaultBoundaryConfig returns sensible defaults for boundary detection

type BoundaryDetector

type BoundaryDetector struct {
	// contains filtered or unexported fields
}

BoundaryDetector detects semantic boundaries in content

func NewBoundaryDetector

func NewBoundaryDetector() *BoundaryDetector

NewBoundaryDetector creates a new boundary detector with default configuration

func NewBoundaryDetectorWithConfig

func NewBoundaryDetectorWithConfig(config BoundaryConfig) *BoundaryDetector

NewBoundaryDetectorWithConfig creates a boundary detector with custom configuration

func (*BoundaryDetector) DetectBoundaries

func (d *BoundaryDetector) DetectBoundaries(blocks []ContentBlock) []Boundary

DetectBoundaries finds all semantic boundaries in a sequence of content blocks

func (*BoundaryDetector) FindAtomicBlocks

func (d *BoundaryDetector) FindAtomicBlocks(blocks []ContentBlock) []AtomicBlock

FindAtomicBlocks identifies sequences of blocks that should stay together

func (*BoundaryDetector) FindBestBoundary

func (d *BoundaryDetector) FindBestBoundary(boundaries []Boundary, minPos, maxPos int) *Boundary

FindBestBoundary finds the best boundary within a range for splitting

func (*BoundaryDetector) FindBoundaryWithLookAhead

func (d *BoundaryDetector) FindBoundaryWithLookAhead(boundaries []Boundary, targetPos int) *Boundary

FindBoundaryWithLookAhead finds a boundary, looking ahead for better options

func (*BoundaryDetector) ShouldKeepTogether

func (d *BoundaryDetector) ShouldKeepTogether(block1, block2 ContentBlock) bool

ShouldKeepTogether determines if two blocks should be kept in the same chunk

type BoundaryType

type BoundaryType int

BoundaryType represents the type of semantic boundary

const (
	// BoundaryNone indicates no boundary (middle of content)
	BoundaryNone BoundaryType = iota
	// BoundarySentence indicates a sentence ending
	BoundarySentence
	// BoundaryParagraph indicates a paragraph break
	BoundaryParagraph
	// BoundaryList indicates end of a list
	BoundaryList
	// BoundaryListItem indicates end of a list item
	BoundaryListItem
	// BoundaryHeading indicates a heading (section break)
	BoundaryHeading
	// BoundaryTable indicates end of a table
	BoundaryTable
	// BoundaryFigure indicates end of a figure/image
	BoundaryFigure
	// BoundaryCodeBlock indicates end of a code block
	BoundaryCodeBlock
	// BoundaryPageBreak indicates a page break
	BoundaryPageBreak
)

func (BoundaryType) Score

func (bt BoundaryType) Score() int

Score returns a priority score for this boundary type (higher = better split point)

func (BoundaryType) String

func (bt BoundaryType) String() string

String returns a human-readable representation of the boundary type

type CaptionDetector

type CaptionDetector struct {
	// contains filtered or unexported fields
}

CaptionDetector helps find captions associated with tables and figures

func NewCaptionDetector

func NewCaptionDetector() *CaptionDetector

NewCaptionDetector creates a new caption detector

func NewCaptionDetectorWithConfig

func NewCaptionDetectorWithConfig(config TableFigureConfig) *CaptionDetector

NewCaptionDetectorWithConfig creates a caption detector with custom config

func (*CaptionDetector) FindFigureCaption

func (d *CaptionDetector) FindFigureCaption(blocks []ContentBlock, figureIndex int) string

FindFigureCaption searches for a caption near a figure

func (*CaptionDetector) FindTableCaption

func (d *CaptionDetector) FindTableCaption(blocks []ContentBlock, tableIndex int) string

FindTableCaption searches for a caption near a table

type Chunk

type Chunk struct {
	// ID is a unique identifier for this chunk
	ID string `json:"id"`

	// Text is the chunk content
	Text string `json:"text"`

	// TextWithContext is the text with section heading prepended for better retrieval
	TextWithContext string `json:"text_with_context,omitempty"`

	// Metadata contains rich contextual information
	Metadata ChunkMetadata `json:"metadata"`
}

Chunk represents a semantic unit of text extracted from a document for RAG

func NewChunk

func NewChunk(id, text string, metadata ChunkMetadata) *Chunk

NewChunk creates a new chunk with the given text and metadata

func (*Chunk) GenerateContextText

func (c *Chunk) GenerateContextText(config MetadataConfig) string

GenerateContextText generates context text based on configuration

func (*Chunk) GetSectionPathString

func (c *Chunk) GetSectionPathString() string

GetSectionPathString returns the section path as a formatted string

func (*Chunk) Summary

func (c *Chunk) Summary() string

Summary returns a brief summary of the chunk

func (*Chunk) ToEmbeddingFormat

func (c *Chunk) ToEmbeddingFormat() string

ToEmbeddingFormat returns text optimized for embedding generation

func (*Chunk) ToMarkdown

func (c *Chunk) ToMarkdown() string

ToMarkdown converts a chunk to markdown format

func (*Chunk) ToMarkdownWithOptions

func (c *Chunk) ToMarkdownWithOptions(opts MarkdownOptions) string

ToMarkdownWithOptions converts a chunk to markdown with custom options

func (*Chunk) ToSearchableText

func (c *Chunk) ToSearchableText() string

ToSearchableText returns text optimized for keyword search

type ChunkCollection

type ChunkCollection struct {
	Chunks []*Chunk
}

ChunkCollection provides filtering and search over chunks

func ChunkDocument

func ChunkDocument(doc *model.Document) *ChunkCollection

ChunkDocument is a convenience function to chunk a document with default settings
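
For example, chunking with defaults and then narrowing the collection (a sketch):

cc := rag.ChunkDocument(doc)
small := cc.FilterWithTables().FilterByMaxTokens(512)
stats := cc.Statistics()
fmt.Printf("%d of %d chunks kept\n", small.Count(), stats.TotalChunks)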

func ChunkDocumentWithConfig

func ChunkDocumentWithConfig(doc *model.Document, config ChunkerConfig, sizeConfig SizeConfig) *ChunkCollection

ChunkDocumentWithConfig chunks a document with custom configuration

func NewChunkCollection

func NewChunkCollection(chunks []*Chunk) *ChunkCollection

NewChunkCollection creates a new collection from chunks

func (*ChunkCollection) Count

func (cc *ChunkCollection) Count() int

Count returns the number of chunks in the collection

func (*ChunkCollection) ExportToFile

func (cc *ChunkCollection) ExportToFile(filename string, config ExportConfig) error

ExportToFile exports the collection to a file

func (*ChunkCollection) Filter

func (cc *ChunkCollection) Filter(predicate func(*Chunk) bool) *ChunkCollection

Filter returns chunks matching a predicate

func (*ChunkCollection) FilterByElementType

func (cc *ChunkCollection) FilterByElementType(elementType string) *ChunkCollection

FilterByElementType returns chunks containing a specific element type

func (*ChunkCollection) FilterByMaxTokens

func (cc *ChunkCollection) FilterByMaxTokens(maxTokens int) *ChunkCollection

FilterByMaxTokens returns chunks with at most N estimated tokens

func (*ChunkCollection) FilterByMinTokens

func (cc *ChunkCollection) FilterByMinTokens(minTokens int) *ChunkCollection

FilterByMinTokens returns chunks with at least N estimated tokens

func (*ChunkCollection) FilterByPage

func (cc *ChunkCollection) FilterByPage(page int) *ChunkCollection

FilterByPage returns chunks on a specific page

func (*ChunkCollection) FilterByPageRange

func (cc *ChunkCollection) FilterByPageRange(startPage, endPage int) *ChunkCollection

FilterByPageRange returns chunks within a page range

func (*ChunkCollection) FilterBySection

func (cc *ChunkCollection) FilterBySection(sectionTitle string) *ChunkCollection

FilterBySection returns chunks in a specific section

func (*ChunkCollection) FilterWithImages

func (cc *ChunkCollection) FilterWithImages() *ChunkCollection

FilterWithImages returns chunks containing images

func (*ChunkCollection) FilterWithLists

func (cc *ChunkCollection) FilterWithLists() *ChunkCollection

FilterWithLists returns chunks containing lists

func (*ChunkCollection) FilterWithTables

func (cc *ChunkCollection) FilterWithTables() *ChunkCollection

FilterWithTables returns chunks containing tables

func (*ChunkCollection) First

func (cc *ChunkCollection) First() *Chunk

First returns the first chunk or nil

func (*ChunkCollection) GetAllSections

func (cc *ChunkCollection) GetAllSections() []string

GetAllSections returns unique section titles

func (*ChunkCollection) GetByID

func (cc *ChunkCollection) GetByID(id string) *Chunk

GetByID returns a chunk by ID

func (*ChunkCollection) GetByIndex

func (cc *ChunkCollection) GetByIndex(index int) *Chunk

GetByIndex returns a chunk by index

func (*ChunkCollection) GetPageRange

func (cc *ChunkCollection) GetPageRange() (int, int)

GetPageRange returns the min and max page numbers

func (*ChunkCollection) GetTotalTokens

func (cc *ChunkCollection) GetTotalTokens() int

GetTotalTokens returns the sum of estimated tokens across all chunks

func (*ChunkCollection) GetTotalWords

func (cc *ChunkCollection) GetTotalWords() int

GetTotalWords returns the sum of words across all chunks

func (*ChunkCollection) Last

func (cc *ChunkCollection) Last() *Chunk

Last returns the last chunk or nil

func (*ChunkCollection) Search

func (cc *ChunkCollection) Search(keyword string) *ChunkCollection

Search returns chunks containing a keyword (case-insensitive)

func (*ChunkCollection) Statistics

func (cc *ChunkCollection) Statistics() CollectionStats

Statistics returns aggregate statistics about the collection

func (*ChunkCollection) ToCSV

func (cc *ChunkCollection) ToCSV() (string, error)

ToCSV exports the collection as CSV

func (*ChunkCollection) ToJSON

func (cc *ChunkCollection) ToJSON() (string, error)

ToJSON exports the collection as JSON array

func (*ChunkCollection) ToJSONL

func (cc *ChunkCollection) ToJSONL() (string, error)

ToJSONL exports the collection as JSON Lines

func (*ChunkCollection) ToMarkdown

func (cc *ChunkCollection) ToMarkdown() string

ToMarkdown converts all chunks to a combined markdown document

func (*ChunkCollection) ToMarkdownChunks

func (cc *ChunkCollection) ToMarkdownChunks() []string

ToMarkdownChunks returns each chunk as a separate markdown string. Useful when you need to process chunks individually but want markdown format

func (*ChunkCollection) ToMarkdownChunksWithOptions

func (cc *ChunkCollection) ToMarkdownChunksWithOptions(opts MarkdownOptions) []string

ToMarkdownChunksWithOptions returns each chunk as separate markdown strings

func (*ChunkCollection) ToMarkdownWithOptions

func (cc *ChunkCollection) ToMarkdownWithOptions(opts MarkdownOptions) string

ToMarkdownWithOptions converts all chunks to markdown with custom options

func (*ChunkCollection) ToSlice

func (cc *ChunkCollection) ToSlice() []*Chunk

ToSlice returns the underlying slice

func (*ChunkCollection) ToTSV

func (cc *ChunkCollection) ToTSV() (string, error)

ToTSV exports the collection as TSV

type ChunkLevel

type ChunkLevel int

ChunkLevel represents the hierarchical level of a chunk

const (
	// ChunkLevelDocument represents the entire document as one chunk
	ChunkLevelDocument ChunkLevel = iota
	// ChunkLevelSection represents a section defined by headings
	ChunkLevelSection
	// ChunkLevelParagraph represents a single paragraph
	ChunkLevelParagraph
	// ChunkLevelSentence represents a single sentence (used for oversized paragraphs)
	ChunkLevelSentence
)

func (ChunkLevel) String

func (cl ChunkLevel) String() string

String returns a human-readable representation of the chunk level

type ChunkMetadata

type ChunkMetadata struct {
	// DocumentTitle is the title of the source document
	DocumentTitle string `json:"document_title,omitempty"`

	// SectionPath is the hierarchical path of headings (e.g., ["Chapter 1", "Introduction", "Overview"])
	SectionPath []string `json:"section_path,omitempty"`

	// SectionTitle is the immediate section heading (last element of SectionPath)
	SectionTitle string `json:"section_title,omitempty"`

	// HeadingLevel is the level of the current section (1-6, 0 if no heading)
	HeadingLevel int `json:"heading_level,omitempty"`

	// PageStart is the starting page number (1-indexed)
	PageStart int `json:"page_start"`

	// PageEnd is the ending page number (1-indexed)
	PageEnd int `json:"page_end"`

	// ChunkIndex is the position of this chunk in the document (0-indexed)
	ChunkIndex int `json:"chunk_index"`

	// TotalChunks is the total number of chunks in the document
	TotalChunks int `json:"total_chunks,omitempty"`

	// Level is the hierarchical level of this chunk
	Level ChunkLevel `json:"level"`

	// ParentID is the ID of the parent chunk (empty for top-level chunks)
	ParentID string `json:"parent_id,omitempty"`

	// ChildIDs are the IDs of child chunks
	ChildIDs []string `json:"child_ids,omitempty"`

	// ElementTypes lists the types of elements contained (paragraph, list, table, etc.)
	ElementTypes []string `json:"element_types,omitempty"`

	// HasTable indicates if the chunk contains a table
	HasTable bool `json:"has_table,omitempty"`

	// HasList indicates if the chunk contains a list
	HasList bool `json:"has_list,omitempty"`

	// HasImage indicates if the chunk contains an image
	HasImage bool `json:"has_image,omitempty"`

	// CharCount is the number of characters in the chunk text
	CharCount int `json:"char_count"`

	// WordCount is the number of words in the chunk text
	WordCount int `json:"word_count"`

	// EstimatedTokens is an estimated token count (chars/4 as rough approximation)
	EstimatedTokens int `json:"estimated_tokens"`

	// BBox is the bounding box of the chunk content on the page
	BBox *model.BBox `json:"bbox,omitempty"`
}

ChunkMetadata contains rich metadata about a chunk's context within the document
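
The chars/4 heuristic behind EstimatedTokens can be reproduced outside the package (a rough approximation for English text, not a real tokenizer):

```go
package main

import "fmt"

// estimateTokens mirrors the chars/4 approximation used for EstimatedTokens.
func estimateTokens(text string) int {
	return len(text) / 4
}

func main() {
	fmt.Println(estimateTokens("hello world, this is a test")) // 6
}
```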

func (*ChunkMetadata) ContainsElementType

func (m *ChunkMetadata) ContainsElementType(elementType string) bool

ContainsElementType checks if the chunk contains a specific element type

func (*ChunkMetadata) GetPageRange

func (m *ChunkMetadata) GetPageRange() string

GetPageRange returns a formatted page range string

func (*ChunkMetadata) GetReadingTimeMinutes

func (m *ChunkMetadata) GetReadingTimeMinutes(wordsPerMinute int) float64

GetReadingTimeMinutes estimates reading time in minutes

func (*ChunkMetadata) GetReadingTimeString

func (m *ChunkMetadata) GetReadingTimeString(wordsPerMinute int) string

GetReadingTimeString returns a human-readable reading time

func (*ChunkMetadata) GetSectionPathString

func (m *ChunkMetadata) GetSectionPathString(separator string) string

GetSectionPathString returns the section path as a formatted string

func (*ChunkMetadata) IsInSection

func (m *ChunkMetadata) IsInSection(sectionTitle string) bool

IsInSection checks if the chunk is within a given section path

func (*ChunkMetadata) IsOnPage

func (m *ChunkMetadata) IsOnPage(page int) bool

IsOnPage checks if the chunk spans a given page

func (*ChunkMetadata) ToJSON

func (m *ChunkMetadata) ToJSON() ([]byte, error)

ToJSON serializes metadata to JSON

func (*ChunkMetadata) ToJSONIndent

func (m *ChunkMetadata) ToJSONIndent() ([]byte, error)

ToJSONIndent serializes metadata to indented JSON

func (*ChunkMetadata) ToMap

func (m *ChunkMetadata) ToMap() map[string]interface{}

ToMap converts metadata to a map for flexible access

type ChunkResult

type ChunkResult struct {
	// Chunks are the generated chunks in reading order
	Chunks []*Chunk

	// DocumentTitle is the document title if available
	DocumentTitle string

	// TotalPages is the total number of pages processed
	TotalPages int

	// Statistics about the chunking process
	Stats ChunkStats
}

ChunkResult contains the chunking output

type ChunkStats

type ChunkStats struct {
	TotalChunks     int
	TotalCharacters int
	TotalWords      int
	TotalTokensEst  int
	AvgChunkSize    int
	MinChunkSize    int
	MaxChunkSize    int
	SectionChunks   int
	ParagraphChunks int
	SentenceChunks  int
}

ChunkStats contains statistics about the chunking process

type ChunkWithOverlap

type ChunkWithOverlap struct {
	*Chunk

	// OverlapPrefix is the overlap content prepended from previous chunk
	OverlapPrefix string

	// OverlapSuffix is the overlap content that will be prepended to next chunk
	OverlapSuffix string

	// HasOverlapPrefix indicates if this chunk has overlap from previous
	HasOverlapPrefix bool

	// HasOverlapSuffix indicates if this chunk provides overlap to next
	HasOverlapSuffix bool
}

ChunkWithOverlap represents a chunk with its overlap information

func ApplyOverlapToChunks

func ApplyOverlapToChunks(chunks []*Chunk, config OverlapConfig) []*ChunkWithOverlap

ApplyOverlapToChunks adds overlap between consecutive chunks

func (*ChunkWithOverlap) GetOriginalText

func (c *ChunkWithOverlap) GetOriginalText() string

GetOriginalText returns the chunk text without overlap prefix

func (*ChunkWithOverlap) GetOverlapText

func (c *ChunkWithOverlap) GetOverlapText() string

GetOverlapText returns just the overlap portion of a chunk (for analysis)

type ChunkWithOverlapResult

type ChunkWithOverlapResult struct {
	// Chunks are the generated chunks with overlap information
	Chunks []*ChunkWithOverlap

	// DocumentTitle is the document title if available
	DocumentTitle string

	// TotalPages is the total number of pages processed
	TotalPages int

	// Statistics about the chunking process
	Stats ChunkStats

	// OverlapStats contains overlap-specific statistics
	OverlapStats OverlapStats
}

ChunkWithOverlapResult contains chunking output with overlap information

type Chunker

type Chunker struct {
	// contains filtered or unexported fields
}

Chunker performs semantic chunking of documents

func NewChunker

func NewChunker() *Chunker

NewChunker creates a new chunker with default configuration

func NewChunkerWithConfig

func NewChunkerWithConfig(config ChunkerConfig) *Chunker

NewChunkerWithConfig creates a chunker with custom configuration

func (*Chunker) Chunk

func (c *Chunker) Chunk(doc *model.Document) (*ChunkResult, error)

Chunk processes a document and returns semantic chunks

func (*Chunker) ChunkWithOverlapEnabled

func (c *Chunker) ChunkWithOverlapEnabled(doc *model.Document) (*ChunkWithOverlapResult, error)

ChunkWithOverlapEnabled processes a document and returns chunks with overlap
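
For example, inspecting which chunks received overlap (a sketch; error handling elided):

chunker := rag.NewChunker()
res, _ := chunker.ChunkWithOverlapEnabled(doc)
for _, c := range res.Chunks {
	if c.HasOverlapPrefix {
		fmt.Println(c.ID, "carries", len(c.OverlapPrefix), "overlap characters")
	}
}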

type ChunkerConfig

type ChunkerConfig struct {
	// TargetChunkSize is the target size for chunks in characters
	// Default: 1000
	TargetChunkSize int

	// MaxChunkSize is the hard limit for chunk size in characters
	// Chunks will be split at sentence boundaries if they exceed this
	// Default: 2000
	MaxChunkSize int

	// MinChunkSize is the minimum size for a chunk in characters
	// Smaller chunks may be merged with adjacent content
	// Default: 100
	MinChunkSize int

	// OverlapSize is the number of characters to overlap between chunks
	// Default: 100
	OverlapSize int

	// OverlapSentences when true, uses sentence-based overlap instead of character-based
	// Default: true
	OverlapSentences bool

	// PreserveListCoherence keeps list intros with their items
	// Default: true
	PreserveListCoherence bool

	// PreserveTableCoherence keeps tables as atomic units
	// Default: true
	PreserveTableCoherence bool

	// IncludeSectionContext prepends section heading to chunk text
	// Default: true
	IncludeSectionContext bool

	// SplitOnHeadings creates new chunks at heading boundaries
	// Default: true
	SplitOnHeadings bool

	// MinHeadingLevel is the minimum heading level to split on (1-6)
	// Lower numbers = split on more headings
	// Default: 3 (split on H1, H2, H3)
	MinHeadingLevel int

	// PreserveParagraphs tries to keep paragraphs intact
	// Default: true
	PreserveParagraphs bool

	// IDPrefix is a prefix for generated chunk IDs
	// Default: "chunk"
	IDPrefix string
}

ChunkerConfig holds configuration options for the chunker

func DefaultChunkerConfig

func DefaultChunkerConfig() ChunkerConfig

DefaultChunkerConfig returns sensible default configuration

type CollectionStats

type CollectionStats struct {
	TotalChunks      int
	TotalTokens      int
	TotalWords       int
	TotalChars       int
	AvgTokens        int
	MinTokens        int
	MaxTokens        int
	ChunksWithTables int
	ChunksWithLists  int
	ChunksWithImages int
	UniqueSections   int
	PageStart        int
	PageEnd          int
}

CollectionStats contains aggregate statistics about a chunk collection

func (*CollectionStats) ToJSON

func (cs *CollectionStats) ToJSON() ([]byte, error)

ToJSON serializes stats to JSON

type ContentBlock

type ContentBlock struct {
	Type     model.ElementType
	Text     string
	Page     int
	Index    int
	ListInfo *model.ListInfo
	IsIntro  bool // True if this appears to introduce the next element
}

ContentBlock represents a block of content for boundary detection

type ContentElement

type ContentElement struct {
	Type     model.ElementType
	Text     string
	Page     int
	BBox     model.BBox
	ListInfo *model.ListInfo
}

ContentElement represents a piece of content within a section

type ContextFormat

type ContextFormat int

ContextFormat defines how context is injected into chunk text

const (
	// ContextFormatNone adds no context
	ContextFormatNone ContextFormat = iota
	// ContextFormatBracket adds context in brackets: [Section Title]
	ContextFormatBracket
	// ContextFormatMarkdown adds context as markdown heading
	ContextFormatMarkdown
	// ContextFormatBreadcrumb adds full path as breadcrumb
	ContextFormatBreadcrumb
	// ContextFormatXML adds context in XML-style tags
	ContextFormatXML
)

func (ContextFormat) String

func (cf ContextFormat) String() string

String returns a human-readable representation of the context format

type DocumentChunkOptions

type DocumentChunkOptions struct {
	ChunkerConfig ChunkerConfig
	SizeConfig    SizeConfig
}

DocumentChunkOptions holds options for document chunking

func DefaultDocumentChunkOptions

func DefaultDocumentChunkOptions() DocumentChunkOptions

DefaultDocumentChunkOptions returns default chunking options

func RAGOptimizedOptions

func RAGOptimizedOptions() DocumentChunkOptions

RAGOptimizedOptions returns options optimized for RAG workflows

type DocumentChunker

type DocumentChunker struct {
	// contains filtered or unexported fields
}

DocumentChunker provides RAG chunking for Document objects

func NewDocumentChunker

func NewDocumentChunker() *DocumentChunker

NewDocumentChunker creates a new document chunker with default configuration

func NewDocumentChunkerWithConfig

func NewDocumentChunkerWithConfig(config ChunkerConfig, sizeConfig SizeConfig) *DocumentChunker

NewDocumentChunkerWithConfig creates a document chunker with custom configuration

func (*DocumentChunker) ChunkDocument

func (dc *DocumentChunker) ChunkDocument(doc *model.Document) *ChunkCollection

ChunkDocument chunks a Document into semantic units for RAG

type EmbeddingExporter

type EmbeddingExporter struct {
	// contains filtered or unexported fields
}

EmbeddingExporter exports chunks with embeddings for vector databases

func NewEmbeddingExporter

func NewEmbeddingExporter() *EmbeddingExporter

NewEmbeddingExporter creates an exporter optimized for embedding export

func (*EmbeddingExporter) ExportForChroma

func (ee *EmbeddingExporter) ExportForChroma(chunks []*Chunk, embeddings [][]float64, w io.Writer) error

ExportForChroma exports in Chroma-compatible format

func (*EmbeddingExporter) ExportForPinecone

func (ee *EmbeddingExporter) ExportForPinecone(chunks []*Chunk, embeddings [][]float64, w io.Writer) error

ExportForPinecone exports in Pinecone-compatible format

func (*EmbeddingExporter) ExportForWeaviate

func (ee *EmbeddingExporter) ExportForWeaviate(chunks []*Chunk, embeddings [][]float64, className string, w io.Writer) error

ExportForWeaviate exports in Weaviate-compatible format

func (*EmbeddingExporter) PrepareForVectorDB

func (ee *EmbeddingExporter) PrepareForVectorDB(chunks []*Chunk) []EmbeddingRecord

PrepareForVectorDB prepares chunks for vector database ingestion
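
For example, pairing chunks with externally computed embeddings (the embed call is hypothetical; any embedding API returning [][]float64 fits):

ee := rag.NewEmbeddingExporter()
records := ee.PrepareForVectorDB(cc.ToSlice())
fmt.Println(len(records), "records prepared")
embeddings := embed(cc.ToSlice()) // hypothetical: one vector per chunk
var buf bytes.Buffer
err := ee.ExportForPinecone(cc.ToSlice(), embeddings, &buf)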

type EmbeddingRecord

type EmbeddingRecord struct {
	ID        string                 `json:"id"`
	Text      string                 `json:"text"`
	Embedding []float64              `json:"embedding,omitempty"`
	Metadata  map[string]interface{} `json:"metadata,omitempty"`
}

EmbeddingRecord represents a single record for vector DB ingestion

type ExportBatch

type ExportBatch struct {
	// BatchNumber is the zero-indexed batch number
	BatchNumber int

	// StartIndex is the starting chunk index in the original collection
	StartIndex int

	// EndIndex is the ending chunk index (exclusive)
	EndIndex int

	// ChunkCount is the number of chunks in this batch
	ChunkCount int

	// Data contains the exported data
	Data string
}

ExportBatch represents a single exported batch

type ExportConfig

type ExportConfig struct {
	// Format specifies the export format
	Format ExportFormat

	// IncludeMetadata determines which metadata fields to include
	IncludeMetadata bool

	// MetadataFields specifies which metadata fields to include (nil = all)
	MetadataFields []string

	// IncludeText includes the chunk text content
	IncludeText bool

	// IncludeEmbeddings includes embedding vectors if present
	IncludeEmbeddings bool

	// FlattenMetadata flattens nested metadata into dot-notation keys
	FlattenMetadata bool

	// CSVDelimiter specifies the delimiter for CSV export (default: comma)
	CSVDelimiter rune

	// IncludeHeader includes header row in CSV/TSV exports
	IncludeHeader bool

	// PrettyPrint enables pretty printing for JSON formats
	PrettyPrint bool

	// TextColumnName specifies the column name for text content
	TextColumnName string

	// ChunkIDColumnName specifies the column name for chunk ID
	ChunkIDColumnName string
}

ExportConfig holds configuration options for export

func CSVExportConfig

func CSVExportConfig() ExportConfig

CSVExportConfig returns config optimized for CSV export

func DefaultExportConfig

func DefaultExportConfig() ExportConfig

DefaultExportConfig returns sensible defaults for export configuration

func JSONLExportConfig

func JSONLExportConfig() ExportConfig

JSONLExportConfig returns config optimized for JSON Lines export

func TSVExportConfig

func TSVExportConfig() ExportConfig

TSVExportConfig returns config optimized for TSV export

func VectorDBExportConfig

func VectorDBExportConfig() ExportConfig

VectorDBExportConfig returns config optimized for vector DB ingestion

type ExportFormat

type ExportFormat int

ExportFormat defines the available export formats

const (
	// ExportFormatJSONL exports as JSON Lines (one JSON object per line)
	ExportFormatJSONL ExportFormat = iota
	// ExportFormatJSON exports as a JSON array
	ExportFormatJSON
	// ExportFormatCSV exports as comma-separated values
	ExportFormatCSV
	// ExportFormatTSV exports as tab-separated values
	ExportFormatTSV
)

func (ExportFormat) FileExtension

func (ef ExportFormat) FileExtension() string

FileExtension returns the typical file extension for this format

func (ExportFormat) String

func (ef ExportFormat) String() string

String returns a human-readable representation of the export format

type ExportedChunk

type ExportedChunk struct {
	// ID is the unique identifier for the chunk
	ID string `json:"id,omitempty"`

	// Text is the chunk content
	Text string `json:"text,omitempty"`

	// Metadata holds all metadata fields as a map
	Metadata map[string]interface{} `json:"metadata,omitempty"`

	// Embeddings holds the embedding vector(s) if present
	Embeddings []float64 `json:"embeddings,omitempty"`

	// Source document information
	DocumentTitle string `json:"document_title,omitempty"`
	PageStart     int    `json:"page_start,omitempty"`
	PageEnd       int    `json:"page_end,omitempty"`

	// Position within the document
	ChunkIndex int `json:"chunk_index,omitempty"`

	// Section information
	SectionTitle string   `json:"section_title,omitempty"`
	SectionPath  []string `json:"section_path,omitempty"`

	// Content indicators
	HasTable bool `json:"has_table,omitempty"`
	HasList  bool `json:"has_list,omitempty"`
	HasImage bool `json:"has_image,omitempty"`
}

ExportedChunk represents a chunk prepared for export

type Exporter

type Exporter struct {
	// contains filtered or unexported fields
}

Exporter handles exporting chunks to various formats

func NewExporter

func NewExporter() *Exporter

NewExporter creates a new exporter with default configuration

func NewExporterWithConfig

func NewExporterWithConfig(config ExportConfig) *Exporter

NewExporterWithConfig creates an exporter with custom configuration

func (*Exporter) Export

func (e *Exporter) Export(chunks []*Chunk, w io.Writer) error

Export exports chunks to the specified writer

func (*Exporter) ExportToFile

func (e *Exporter) ExportToFile(chunks []*Chunk, filename string) error

ExportToFile exports chunks to a file

func (*Exporter) ExportToString

func (e *Exporter) ExportToString(chunks []*Chunk) (string, error)

ExportToString exports chunks to a string

type FigureChunk

type FigureChunk struct {
	// Image is the source image (if available)
	Image *model.Image

	// Caption is the associated caption text
	Caption string

	// HasCaption indicates if a caption was found
	HasCaption bool

	// AltText is alternative text for the image
	AltText string

	// Description is a generated description
	Description string

	// Format is the image format
	Format string

	// PageNumber is the source page
	PageNumber int
}

FigureChunk represents a figure/image as a chunk

func (*FigureChunk) ToChunk

func (fc *FigureChunk) ToChunk(chunkIndex int) *Chunk

ToChunk converts a FigureChunk to a generic Chunk

type LimitType

type LimitType int

LimitType defines whether a limit is soft or hard

const (
	// LimitTypeSoft is a preference - try not to exceed but allow if necessary
	LimitTypeSoft LimitType = iota
	// LimitTypeHard is a strict limit - must not exceed
	LimitTypeHard
)

func (LimitType) String

func (lt LimitType) String() string

String returns a human-readable representation of the limit type

type ListBlock

type ListBlock struct {
	// Type is the kind of list
	Type ListType

	// IntroText is the introductory paragraph (if any)
	IntroText string

	// HasIntro indicates if there's an introductory paragraph
	HasIntro bool

	// Items are the list items
	Items []*ListItem

	// MaxLevel is the deepest nesting level
	MaxLevel int

	// TotalItems is the total count including nested items
	TotalItems int

	// IsComplete indicates if the list is complete
	IsComplete bool
}

ListBlock represents a complete list with its context

type ListCoherenceAnalyzer

type ListCoherenceAnalyzer struct {
	// contains filtered or unexported fields
}

ListCoherenceAnalyzer analyzes and manages list coherence

func NewListCoherenceAnalyzer

func NewListCoherenceAnalyzer() *ListCoherenceAnalyzer

NewListCoherenceAnalyzer creates a new analyzer with default config

func NewListCoherenceAnalyzerWithConfig

func NewListCoherenceAnalyzerWithConfig(config ListCoherenceConfig) *ListCoherenceAnalyzer

NewListCoherenceAnalyzerWithConfig creates an analyzer with custom config

func (*ListCoherenceAnalyzer) AnalyzeListBlock

func (a *ListCoherenceAnalyzer) AnalyzeListBlock(listText string, precedingText string) *ListBlock

AnalyzeListBlock creates a complete ListBlock from text

func (*ListCoherenceAnalyzer) AnalyzeListCoherence

func (a *ListCoherenceAnalyzer) AnalyzeListCoherence(blocks []ContentBlock) *ListCoherenceResult

AnalyzeListCoherence analyzes list coherence in a sequence of text blocks

func (*ListCoherenceAnalyzer) DetectListType

func (a *ListCoherenceAnalyzer) DetectListType(text string) ListType

DetectListType identifies the type of list from its content

func (*ListCoherenceAnalyzer) FindListSplitPoints

func (a *ListCoherenceAnalyzer) FindListSplitPoints(block *ListBlock) []int

FindListSplitPoints finds safe points to split a large list

func (*ListCoherenceAnalyzer) FormatListBlock

func (a *ListCoherenceAnalyzer) FormatListBlock(block *ListBlock, preserveMarkers bool) string

FormatListBlock formats a list block back to text

func (*ListCoherenceAnalyzer) IsListIntro

func (a *ListCoherenceAnalyzer) IsListIntro(text string) bool

IsListIntro checks if text appears to introduce a list

func (*ListCoherenceAnalyzer) ParseListItems

func (a *ListCoherenceAnalyzer) ParseListItems(text string) []*ListItem

ParseListItems extracts structured list items from text

func (*ListCoherenceAnalyzer) ShouldKeepListTogether

func (a *ListCoherenceAnalyzer) ShouldKeepListTogether(block *ListBlock) bool

ShouldKeepListTogether determines if a list should be kept as one chunk

func (*ListCoherenceAnalyzer) SplitListBlock

func (a *ListCoherenceAnalyzer) SplitListBlock(block *ListBlock, atIndex int) (*ListBlock, *ListBlock)

SplitListBlock splits a list at the specified item index

type ListCoherenceConfig

type ListCoherenceConfig struct {
	// KeepIntroWithList keeps introductory text with the list
	KeepIntroWithList bool

	// MaxIntroDistance is max chars between intro and list
	MaxIntroDistance int

	// PreserveNesting keeps nested lists together
	PreserveNesting bool

	// MaxListSize is the maximum number of characters in a list before a split is considered
	MaxListSize int

	// MinItemsBeforeSplit is the minimum number of items a list must have before a split is considered
	MinItemsBeforeSplit int


	// AllowSplitAtLevel allows splitting only at this nesting level or higher
	AllowSplitAtLevel int

	// IntroPatterns are patterns that detect list introductions
	IntroPatterns []*regexp.Regexp
}

ListCoherenceConfig holds configuration for list coherence

func DefaultListCoherenceConfig

func DefaultListCoherenceConfig() ListCoherenceConfig

DefaultListCoherenceConfig returns sensible defaults

type ListCoherenceResult

type ListCoherenceResult struct {
	// Blocks are the identified list blocks
	Blocks []*ListBlock

	// IntroOrphans are introductions without following lists
	IntroOrphans []string

	// TotalLists is the number of lists found
	TotalLists int

	// ListsWithIntros is the number of lists with introductions
	ListsWithIntros int

	// NestedLists is the number of lists with nesting
	NestedLists int
}

ListCoherenceResult holds the result of list coherence analysis

type ListItem

type ListItem struct {
	// Text is the item content
	Text string

	// Marker is the bullet/number (e.g., "•", "1.", "a)")
	Marker string

	// Level is the nesting level (0 = top level)
	Level int

	// Index is the position in the list
	Index int

	// Children are nested list items
	Children []*ListItem

	// IsComplete indicates if the item text is complete
	IsComplete bool
}

ListItem represents a single item in a list

type ListType

type ListType int

ListType represents the type of list

const (
	// ListTypeUnordered is a bullet list
	ListTypeUnordered ListType = iota
	// ListTypeOrdered is a numbered list
	ListTypeOrdered
	// ListTypeDefinition is a definition list (term: definition)
	ListTypeDefinition
	// ListTypeChecklist is a checkbox list
	ListTypeChecklist
)

func (ListType) String

func (lt ListType) String() string

String returns a human-readable representation of the list type

type MarkdownOptions

type MarkdownOptions struct {
	// IncludeMetadata adds metadata comments at the start
	IncludeMetadata bool

	// IncludeTableOfContents generates a TOC from section headings
	IncludeTableOfContents bool

	// IncludeChunkSeparators adds horizontal rules between chunks
	IncludeChunkSeparators bool

	// IncludePageNumbers adds page references
	IncludePageNumbers bool

	// IncludeChunkIDs adds chunk IDs as HTML comments
	IncludeChunkIDs bool

	// HeadingLevelOffset adjusts heading levels (e.g., 1 makes H1 -> H2)
	HeadingLevelOffset int

	// MaxHeadingLevel caps heading depth (default: 6)
	MaxHeadingLevel int

	// SectionSeparator is text between major sections (default: "\n\n---\n\n")
	SectionSeparator string
}

MarkdownOptions configures markdown output generation

func DefaultMarkdownOptions

func DefaultMarkdownOptions() MarkdownOptions

DefaultMarkdownOptions returns sensible defaults for markdown generation

func RAGOptimizedMarkdownOptions

func RAGOptimizedMarkdownOptions() MarkdownOptions

RAGOptimizedMarkdownOptions returns options optimized for RAG ingestion

type MetadataConfig

type MetadataConfig struct {
	// ContextFormat determines how context is added to chunk text
	ContextFormat ContextFormat

	// IncludeDocumentTitle includes document title in context
	IncludeDocumentTitle bool

	// IncludePageNumbers includes page numbers in context
	IncludePageNumbers bool

	// IncludeSectionPath includes full section path (not just title)
	IncludeSectionPath bool

	// WordsPerMinute for reading time estimation (default: 200)
	WordsPerMinute int
}

MetadataConfig holds configuration for metadata handling

func DefaultMetadataConfig

func DefaultMetadataConfig() MetadataConfig

DefaultMetadataConfig returns sensible defaults

type OrphanedContentDetector

type OrphanedContentDetector struct {
	// MinOrphanSize is the minimum size for standalone content
	MinOrphanSize int
}

OrphanedContentDetector helps avoid creating orphaned content at chunk boundaries

func NewOrphanedContentDetector

func NewOrphanedContentDetector(minSize int) *OrphanedContentDetector

NewOrphanedContentDetector creates a new orphan detector

func (*OrphanedContentDetector) AdjustForOrphans

func (o *OrphanedContentDetector) AdjustForOrphans(text string, position int, boundaries []Boundary) int

AdjustForOrphans adjusts a split position to avoid orphaned content

func (*OrphanedContentDetector) WouldCreateOrphan

func (o *OrphanedContentDetector) WouldCreateOrphan(text string, position int) bool

WouldCreateOrphan checks if splitting at position would create orphaned content

type OverlapConfig

type OverlapConfig struct {
	// Strategy determines how overlap is computed
	Strategy OverlapStrategy

	// Size is the target overlap size in characters (for character-based)
	// or number of sentences/paragraphs (for sentence/paragraph-based)
	Size int

	// MinOverlap is the minimum overlap to include (avoids tiny overlaps)
	MinOverlap int

	// MaxOverlap is the maximum overlap allowed (prevents excessive duplication)
	MaxOverlap int

	// PreserveWords ensures character overlap doesn't break words
	PreserveWords bool

	// IncludeHeadingContext includes section heading in overlap for context
	IncludeHeadingContext bool
}

OverlapConfig holds configuration for chunk overlap

func DefaultOverlapConfig

func DefaultOverlapConfig() OverlapConfig

DefaultOverlapConfig returns sensible defaults for overlap

type OverlapGenerator

type OverlapGenerator struct {
	// contains filtered or unexported fields
}

OverlapGenerator generates overlap content between chunks

func NewOverlapGenerator

func NewOverlapGenerator() *OverlapGenerator

NewOverlapGenerator creates a new overlap generator with default configuration

func NewOverlapGeneratorWithConfig

func NewOverlapGeneratorWithConfig(config OverlapConfig) *OverlapGenerator

NewOverlapGeneratorWithConfig creates an overlap generator with custom configuration

func (*OverlapGenerator) GenerateOverlap

func (og *OverlapGenerator) GenerateOverlap(chunkText string) *OverlapResult

GenerateOverlap extracts overlap content from the end of a chunk

type OverlapResult

type OverlapResult struct {
	// Text is the overlap content to prepend to the next chunk
	Text string

	// CharCount is the number of characters in the overlap
	CharCount int

	// SentenceCount is the number of complete sentences in the overlap
	SentenceCount int

	// Strategy is the strategy that was used
	Strategy OverlapStrategy
}

OverlapResult contains the computed overlap text and metadata

type OverlapStats

type OverlapStats struct {
	// TotalOverlapChars is the total characters in overlap regions
	TotalOverlapChars int

	// AvgOverlapChars is the average overlap size
	AvgOverlapChars int

	// ChunksWithOverlap is the number of chunks that have overlap
	ChunksWithOverlap int

	// OverlapStrategy is the strategy used
	OverlapStrategy OverlapStrategy
}

OverlapStats contains statistics about overlap in chunks

type OverlapStrategy

type OverlapStrategy int

OverlapStrategy defines how overlap between chunks is computed

const (
	// OverlapNone disables overlap between chunks
	OverlapNone OverlapStrategy = iota
	// OverlapCharacter uses character-based overlap (simple but can break words/sentences)
	OverlapCharacter
	// OverlapSentence uses sentence-based overlap (preserves complete sentences)
	OverlapSentence
	// OverlapParagraph uses paragraph-based overlap (preserves complete paragraphs)
	OverlapParagraph
)

func (OverlapStrategy) String

func (os OverlapStrategy) String() string

String returns a human-readable representation of the overlap strategy

type Section

type Section struct {
	// Heading is the section heading (nil for content before first heading)
	Heading *model.HeadingInfo

	// HeadingLevel is the heading level (0 if no heading)
	HeadingLevel int

	// Title is the section title
	Title string

	// Path is the hierarchical path of parent section titles
	Path []string

	// Content is the text content of this section
	Content []ContentElement

	// PageStart is the starting page (1-indexed)
	PageStart int

	// PageEnd is the ending page (1-indexed)
	PageEnd int

	// Children are nested subsections
	Children []*Section

	// Parent is the parent section (nil for top-level)
	Parent *Section
}

Section represents a document section defined by a heading

type SizeAction

type SizeAction int

SizeAction suggests what action to take for size issues

const (
	// SizeActionNone - no action needed
	SizeActionNone SizeAction = iota
	// SizeActionSplit - chunk should be split
	SizeActionSplit
	// SizeActionMerge - chunk should be merged with neighbor
	SizeActionMerge
	// SizeActionTruncate - chunk must be truncated (hard limit exceeded)
	SizeActionTruncate
)

func (SizeAction) String

func (sa SizeAction) String() string

String returns a human-readable representation of the size action

type SizeCalculator

type SizeCalculator struct {
	// contains filtered or unexported fields
}

SizeCalculator calculates various size metrics for text

func NewSizeCalculator

func NewSizeCalculator() *SizeCalculator

NewSizeCalculator creates a new size calculator with default config

func NewSizeCalculatorWithConfig

func NewSizeCalculatorWithConfig(config SizeConfig) *SizeCalculator

NewSizeCalculatorWithConfig creates a size calculator with custom config

func (*SizeCalculator) Calculate

func (sc *SizeCalculator) Calculate(text string) SizeMetrics

Calculate computes all size metrics for the given text

func (*SizeCalculator) Check

func (sc *SizeCalculator) Check(text string) SizeCheckResult

Check performs a comprehensive size check on the text

func (*SizeCalculator) EstimateTokens

func (sc *SizeCalculator) EstimateTokens(text string) int

EstimateTokens estimates token count using the configured ratio

func (*SizeCalculator) ExceedsLimit

func (sc *SizeCalculator) ExceedsLimit(text string, limit SizeLimit) bool

ExceedsLimit checks if text exceeds a specific limit

func (*SizeCalculator) FindSplitPoint

func (sc *SizeCalculator) FindSplitPoint(text string, boundaries []Boundary) int

FindSplitPoint finds the best position to split text to meet size constraints

func (*SizeCalculator) FindSplitPointAt

func (sc *SizeCalculator) FindSplitPointAt(text string, boundaries []Boundary, targetSize int, targetUnit SizeUnit) int

FindSplitPointAt finds the best position to split text at a specific size limit

func (*SizeCalculator) GetSize

func (sc *SizeCalculator) GetSize(text string, unit SizeUnit) int

GetSize returns the size in the specified unit

func (*SizeCalculator) IsAboveMax

func (sc *SizeCalculator) IsAboveMax(text string) bool

IsAboveMax checks if text exceeds maximum size

func (*SizeCalculator) IsBelowMin

func (sc *SizeCalculator) IsBelowMin(text string) bool

IsBelowMin checks if text is below minimum size

func (*SizeCalculator) IsWithinTarget

func (sc *SizeCalculator) IsWithinTarget(text string) bool

IsWithinTarget checks if size is within target range

func (*SizeCalculator) SplitToSize

func (sc *SizeCalculator) SplitToSize(text string, boundaries []Boundary) []string

SplitToSize splits text into chunks that meet size constraints

type SizeCheckResult

type SizeCheckResult struct {
	// Metrics are the calculated size metrics
	Metrics SizeMetrics

	// IsValid indicates if the size is acceptable
	IsValid bool

	// Reason explains why the size is not valid (if applicable)
	Reason string

	// SuggestedAction suggests what to do if size is not valid
	SuggestedAction SizeAction

	// TargetDiff is the difference from target size
	TargetDiff int
}

SizeCheckResult contains the result of a size check

type SizeConfig

type SizeConfig struct {
	// Target is the ideal chunk size to aim for
	Target SizeLimit

	// Min is the minimum chunk size
	Min SizeLimit

	// Max is the maximum chunk size
	Max SizeLimit

	// TokensPerChar is the ratio of tokens to characters (default: 0.25)
	// Used for token estimation
	TokensPerChar float64

	// AllowExceedForAtomicContent allows exceeding max for tables/lists
	AllowExceedForAtomicContent bool

	// MergeSmallChunks merges chunks below min with neighbors
	MergeSmallChunks bool

	// SplitAtSemanticBoundaries prefers semantic boundaries over exact sizes
	SplitAtSemanticBoundaries bool
}

SizeConfig holds comprehensive size configuration for chunking

func ClaudeContextConfig

func ClaudeContextConfig() SizeConfig

ClaudeContextConfig returns config for Claude's context window

func CohereEmbeddingConfig

func CohereEmbeddingConfig() SizeConfig

CohereEmbeddingConfig returns config optimized for Cohere embeddings

func DefaultSizeConfig

func DefaultSizeConfig() SizeConfig

DefaultSizeConfig returns sensible defaults for size configuration

func LargeChunkConfig

func LargeChunkConfig() SizeConfig

LargeChunkConfig returns config for large chunks (good for context)

func MediumChunkConfig

func MediumChunkConfig() SizeConfig

MediumChunkConfig returns config for medium chunks (balanced)

func OpenAIEmbeddingConfig

func OpenAIEmbeddingConfig() SizeConfig

OpenAIEmbeddingConfig returns config optimized for OpenAI embeddings (8191 tokens max)

func SemanticSizeConfig

func SemanticSizeConfig(targetParagraphs, maxParagraphs int) SizeConfig

SemanticSizeConfig returns configuration for semantic unit-based chunking

func SmallChunkConfig

func SmallChunkConfig() SizeConfig

SmallChunkConfig returns config for small chunks (good for precise retrieval)

func TokenBasedSizeConfig

func TokenBasedSizeConfig(targetTokens, maxTokens int) SizeConfig

TokenBasedSizeConfig returns configuration optimized for token-based chunking

type SizeLimit

type SizeLimit struct {
	// Value is the limit value
	Value int

	// Unit is the unit of measurement
	Unit SizeUnit

	// Type determines if this is a soft or hard limit
	Type LimitType
}

SizeLimit represents a size limit with its type and value

func (SizeLimit) String

func (sl SizeLimit) String() string

String returns a human-readable representation of the size limit

type SizeMetrics

type SizeMetrics struct {
	Characters int
	Tokens     int
	Words      int
	Sentences  int
	Paragraphs int
}

SizeMetrics holds all size measurements for a piece of text

func (SizeMetrics) GetByUnit

func (m SizeMetrics) GetByUnit(unit SizeUnit) int

GetByUnit returns the metric value for the specified unit

type SizeUnit

type SizeUnit int

SizeUnit defines the unit of measurement for chunk sizes

const (
	// SizeUnitCharacters measures size in characters
	SizeUnitCharacters SizeUnit = iota
	// SizeUnitTokens measures size in estimated tokens (chars/4)
	SizeUnitTokens
	// SizeUnitWords measures size in words
	SizeUnitWords
	// SizeUnitSentences measures size in sentences
	SizeUnitSentences
	// SizeUnitParagraphs measures size in paragraphs
	SizeUnitParagraphs
)

func (SizeUnit) String

func (su SizeUnit) String() string

String returns a human-readable representation of the size unit

type StreamExporter

type StreamExporter struct {
	// contains filtered or unexported fields
}

StreamExporter handles streaming export for very large collections

func NewStreamExporter

func NewStreamExporter(w io.Writer) *StreamExporter

NewStreamExporter creates a new stream exporter

func NewStreamExporterWithConfig

func NewStreamExporterWithConfig(w io.Writer, config ExportConfig) *StreamExporter

NewStreamExporterWithConfig creates a stream exporter with custom config

func (*StreamExporter) Close

func (se *StreamExporter) Close() error

Close finalizes the stream export

func (*StreamExporter) WriteChunk

func (se *StreamExporter) WriteChunk(chunk *Chunk, index int) error

WriteChunk writes a single chunk to the stream

type TableChunk

type TableChunk struct {
	// Table is the source table
	Table *model.Table

	// Caption is the associated caption text
	Caption string

	// HasCaption indicates if a caption was found
	HasCaption bool

	// FormattedText is the table rendered as text
	FormattedText string

	// Summary is a brief description of the table
	Summary string

	// RowCount is the number of rows
	RowCount int

	// ColCount is the number of columns
	ColCount int

	// Headers are the column headers (if detected)
	Headers []string

	// IsSplit indicates if this is part of a split table
	IsSplit bool

	// SplitIndex is the index of this part (0-based)
	SplitIndex int

	// TotalSplits is the total number of parts
	TotalSplits int

	// PageNumber is the source page
	PageNumber int
}

TableChunk represents a table as a chunk

func (*TableChunk) ToChunk

func (tc *TableChunk) ToChunk(chunkIndex int) *Chunk

ToChunk converts a TableChunk to a generic Chunk

type TableFigureConfig

type TableFigureConfig struct {
	// TableFormat determines how tables are rendered in chunks
	TableFormat TableFormat

	// MaxTableSize is the maximum number of characters in a table before a split is considered
	MaxTableSize int

	// MaxTableRows is the maximum number of rows before a split is considered
	MaxTableRows int


	// SplitLargeTables allows splitting tables that exceed limits
	SplitLargeTables bool

	// IncludeTableCaption includes detected captions with tables
	IncludeTableCaption bool

	// IncludeFigureCaption includes detected captions with figures
	IncludeFigureCaption bool

	// CaptionSearchDistance is max chars to search for caption
	CaptionSearchDistance int

	// IncludeTableSummary adds a brief summary of table dimensions
	IncludeTableSummary bool

	// IncludeFigureAltText includes alt text for figures
	IncludeFigureAltText bool

	// PreserveTableStructure keeps structural info for RAG
	PreserveTableStructure bool
}

TableFigureConfig holds configuration for table and figure chunking

func DefaultTableFigureConfig

func DefaultTableFigureConfig() TableFigureConfig

DefaultTableFigureConfig returns sensible defaults

type TableFigureHandler

type TableFigureHandler struct {
	// contains filtered or unexported fields
}

TableFigureHandler handles table and figure chunking

func NewTableFigureHandler

func NewTableFigureHandler() *TableFigureHandler

NewTableFigureHandler creates a new handler with default config

func NewTableFigureHandlerWithConfig

func NewTableFigureHandlerWithConfig(config TableFigureConfig) *TableFigureHandler

NewTableFigureHandlerWithConfig creates a handler with custom config

func (*TableFigureHandler) ProcessBlocks

func (h *TableFigureHandler) ProcessBlocks(blocks []ContentBlock) *TableFigureResult

ProcessBlocks processes content blocks to extract tables and figures

func (*TableFigureHandler) ProcessFigure

func (h *TableFigureHandler) ProcessFigure(image *model.Image, caption string, pageNumber int) *FigureChunk

ProcessFigure converts a figure/image to a chunk

func (*TableFigureHandler) ProcessTable

func (h *TableFigureHandler) ProcessTable(table *model.Table, caption string, pageNumber int) []*TableChunk

ProcessTable converts a table to one or more chunks

type TableFigureResult

type TableFigureResult struct {
	// TableChunks are the processed table chunks
	TableChunks []*TableChunk

	// FigureChunks are the processed figure chunks
	FigureChunks []*FigureChunk

	// Stats contains processing statistics
	Stats TableFigureStats
}

TableFigureResult holds the result of processing tables and figures

type TableFigureStats

type TableFigureStats struct {
	TotalTables        int
	TotalFigures       int
	TablesWithCaption  int
	FiguresWithCaption int
	SplitTables        int
	TotalTableRows     int
	TotalTableCols     int
}

TableFigureStats contains statistics about table/figure processing

type TableFormat

type TableFormat int

TableFormat defines how tables are formatted in chunks

const (
	// TableFormatPlainText formats table as tab-separated text
	TableFormatPlainText TableFormat = iota
	// TableFormatMarkdown formats table as markdown
	TableFormatMarkdown
	// TableFormatCSV formats table as CSV
	TableFormatCSV
	// TableFormatHTML formats table as HTML
	TableFormatHTML
)

func (TableFormat) String

func (tf TableFormat) String() string

String returns a human-readable representation of the table format
