Documentation
Index
Constants
This section is empty.
Variables
var ErrContentProcessingFailed = errors.New("content processing failed")
ErrContentProcessingFailed indicates that content processing failed.
Functions
func InitializeProcessors
func InitializeProcessors(cfg ProcessorConfig) ([]Processor, []Options, error)
InitializeProcessors creates the processors named in cfg, using the per-processor configurations that cfg provides. It returns the Processor instances and their corresponding Options as parallel slices: the options at index i belong to the processor at index i in the processors slice.
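A caller typically walks the two returned slices together, passing opts[i] to processors[i]. The sketch below shows that pairing under stated assumptions: Article stands in for models.Article (its Content field is assumed), trimProcessor and runPipeline are hypothetical, and the slices are built by hand rather than by calling InitializeProcessors.

```go
package main

import (
	"fmt"
	"strings"
)

// Article is a stand-in for models.Article; the Content field is an assumption.
type Article struct{ Content string }

// Options is trimmed to a single field for brevity.
type Options struct{ MaxContentLength int }

// Processor mirrors the documented interface.
type Processor interface {
	Process(article *Article, opts *Options) error
	Name() string
}

// trimProcessor is a hypothetical processor that trims surrounding whitespace.
type trimProcessor struct{}

func (trimProcessor) Name() string { return "trim" }

func (trimProcessor) Process(a *Article, _ *Options) error {
	a.Content = strings.TrimSpace(a.Content)
	return nil
}

// runPipeline applies processors in order, pairing processors[i] with opts[i],
// and stops at the first error.
func runPipeline(a *Article, procs []Processor, opts []Options) error {
	if len(procs) != len(opts) {
		return fmt.Errorf("got %d processors but %d option sets", len(procs), len(opts))
	}
	for i, p := range procs {
		if err := p.Process(a, &opts[i]); err != nil {
			return fmt.Errorf("processor %q: %w", p.Name(), err)
		}
	}
	return nil
}

func main() {
	// In real code these slices would come from InitializeProcessors(cfg).
	procs := []Processor{trimProcessor{}}
	opts := []Options{{}}
	a := &Article{Content: "  hello  "}
	if err := runPipeline(a, procs, opts); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", a.Content) // "hello"
}
```

Checking the slice lengths up front turns a mismatch into an ordinary error instead of an index-out-of-range panic.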
Types
type Options
type Options struct {
	// Minimum content length to consider valid (in characters)
	MinContentLength int
	// Whether to include images in the processed content
	IncludeImages bool
	// Whether to include tables in the processed content
	IncludeTables bool
	// Whether to include videos in the processed content
	IncludeVideos bool
	// Maximum length for article content (0 means no limit)
	MaxContentLength int
	// Additional processor-specific options
	AdditionalOptions map[string]any
}
Options contains configuration for content processors.
func DefaultOptions
func DefaultOptions() Options
DefaultOptions returns the default processor options.
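A common pattern is to start from DefaultOptions and override only the fields you need. The actual default values are not listed in this documentation, so the stand-in DefaultOptions below uses placeholder values. withAdditional is a hypothetical helper, and "strip_comments" is an illustrative key, not a documented option.

```go
package main

import "fmt"

// Options mirrors the fields documented above.
type Options struct {
	MinContentLength  int
	IncludeImages     bool
	IncludeTables     bool
	IncludeVideos     bool
	MaxContentLength  int
	AdditionalOptions map[string]any
}

// DefaultOptions is a stand-in; these values are placeholders, not the
// package's real defaults.
func DefaultOptions() Options {
	return Options{MinContentLength: 100, IncludeImages: true, IncludeTables: true}
}

// withAdditional returns a copy of opts with key set in AdditionalOptions.
// It builds a fresh map, so the caller's map is never modified and a nil map
// (which would panic on a direct write) is handled.
func withAdditional(opts Options, key string, value any) Options {
	m := make(map[string]any, len(opts.AdditionalOptions)+1)
	for k, v := range opts.AdditionalOptions {
		m[k] = v
	}
	m[key] = value
	opts.AdditionalOptions = m
	return opts
}

func main() {
	opts := DefaultOptions()
	opts.IncludeVideos = true
	opts.MaxContentLength = 5000
	opts = withAdditional(opts, "strip_comments", true)
	fmt.Printf("%+v\n", opts)
}
```

Since Options is returned by value, overriding scalar fields never affects later calls to DefaultOptions; only the map needs explicit copying.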
type Processor
type Processor interface {
	// Process processes the raw content and updates the article with processed content
	Process(article *models.Article, opts *Options) error
	// Name returns the name of this processor
	Name() string
}
Processor defines the interface for content processors.
type ProcessorConfig
type ProcessorConfig struct {
	// Processors is a list of processors to apply in the order defined
	Processors []string `toml:"processors"`
	// ProcessorConfigs contains optional configuration for each processor. If
	// a processor is not configured, it will use its default settings.
	ProcessorConfigs map[string]any `toml:"processor_configs"`
}
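The struct tags map the fields to the TOML keys processors and processor_configs. The sketch below builds an equivalent ProcessorConfig in Go and shows the documented fallback: a processor with no entry in ProcessorConfigs uses its defaults. The processor names "readability" and "sanitizer", the allow_iframes key, and the configFor helper are assumptions for illustration only.

```go
package main

import "fmt"

// ProcessorConfig mirrors the documented struct.
type ProcessorConfig struct {
	Processors       []string       `toml:"processors"`
	ProcessorConfigs map[string]any `toml:"processor_configs"`
}

// configFor is a hypothetical helper returning the entry for name and whether
// one exists. Reading from a nil map is safe in Go and reports false.
func configFor(cfg ProcessorConfig, name string) (any, bool) {
	c, ok := cfg.ProcessorConfigs[name]
	return c, ok
}

func main() {
	// Roughly equivalent TOML (names and keys are assumptions):
	//
	//   processors = ["readability", "sanitizer"]
	//
	//   [processor_configs.sanitizer]
	//   allow_iframes = false
	cfg := ProcessorConfig{
		Processors: []string{"readability", "sanitizer"},
		ProcessorConfigs: map[string]any{
			"sanitizer": map[string]any{"allow_iframes": false},
		},
	}

	for _, name := range cfg.Processors {
		if c, ok := configFor(cfg, name); ok {
			fmt.Printf("%s: custom config %v\n", name, c)
		} else {
			fmt.Printf("%s: default settings\n", name)
		}
	}
}
```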
type ReadabilityProcessor
type ReadabilityProcessor struct{}
ReadabilityProcessor uses go-readability to extract the main content from HTML.
func NewReadabilityProcessor
func NewReadabilityProcessor() *ReadabilityProcessor
NewReadabilityProcessor creates a new readability-based content processor.
func (*ReadabilityProcessor) Name
func (p *ReadabilityProcessor) Name() string
Name returns the name of this processor.
type SanitizerProcessor (added in v0.2.0)
type SanitizerProcessor struct{}
SanitizerProcessor is a processor that sanitizes HTML content.
func NewSanitizerProcessor (added in v0.2.0)
func NewSanitizerProcessor() *SanitizerProcessor
NewSanitizerProcessor creates a new instance of SanitizerProcessor.
func (*SanitizerProcessor) Name (added in v0.2.0)
func (s *SanitizerProcessor) Name() string