package processor
v0.3.0
Published: May 1, 2025 License: MIT Imports: 10 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var ErrContentProcessingFailed = errors.New("content processing failed")

ErrContentProcessingFailed indicates that content processing failed.
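
Callers would typically test for this sentinel with errors.Is rather than comparing error strings. The sketch below is self-contained: it declares a local stand-in for ErrContentProcessingFailed, and the process helper is hypothetical. Whether the package's processors wrap the sentinel with %w is an assumption, not something this page confirms.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-in for processor.ErrContentProcessingFailed so the sketch
// compiles on its own.
var ErrContentProcessingFailed = errors.New("content processing failed")

// process is a hypothetical step that wraps the sentinel with extra context.
func process(html string) error {
	if html == "" {
		return fmt.Errorf("empty input: %w", ErrContentProcessingFailed)
	}
	return nil
}

// isProcessingFailure reports whether err is, or wraps, the sentinel.
func isProcessingFailure(err error) bool {
	return errors.Is(err, ErrContentProcessingFailed)
}

func main() {
	if err := process(""); isProcessingFailure(err) {
		fmt.Println("skipping article:", err)
	}
}
```

errors.Is walks the wrap chain, so the check still matches after a caller adds its own context to the error.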

Functions

func InitializeProcessors

func InitializeProcessors(cfg ProcessorConfig) ([]Processor, []Options, error)

InitializeProcessors builds the processors named in cfg.Processors, in the order listed, together with the options for each one. It returns a slice of Processor instances and a parallel slice of Options: the options at index i belong to the processor at index i.
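
The index pairing between the two returned slices can be sketched as a small pipeline loop. Everything below is a local stand-in so the block compiles on its own: Article is a hypothetical substitute for models.Article, Options is trimmed to one field, and trimProcessor and runAll are illustrative names that do not exist in the package.

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-ins for the package's types (assumptions, trimmed for brevity).
type Article struct{ Content string } // hypothetical; the real type is models.Article
type Options struct{ MinContentLength int }
type Processor interface {
	Process(article *Article, opts *Options) error
	Name() string
}

// trimProcessor is a toy processor used only for illustration.
type trimProcessor struct{}

func (trimProcessor) Name() string { return "trim" }
func (trimProcessor) Process(a *Article, _ *Options) error {
	a.Content = strings.TrimSpace(a.Content)
	return nil
}

// runAll applies each processor in order, passing the options stored at the
// same index.
func runAll(a *Article, procs []Processor, opts []Options) error {
	if len(procs) != len(opts) {
		return fmt.Errorf("got %d processors but %d option sets", len(procs), len(opts))
	}
	for i, p := range procs {
		if err := p.Process(a, &opts[i]); err != nil {
			return fmt.Errorf("processor %q: %w", p.Name(), err)
		}
	}
	return nil
}

func main() {
	a := &Article{Content: "  hello  "}
	if err := runAll(a, []Processor{trimProcessor{}}, []Options{{MinContentLength: 1}}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", a.Content)
}
```

With the real package, procs and opts would come from InitializeProcessors instead of being built by hand.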

func List

func List() []string

List returns the names of the available processors.

Types

type Options

type Options struct {
	// Minimum content length to consider valid (in characters)
	MinContentLength int

	// Whether to include images in the processed content
	IncludeImages bool

	// Whether to include tables in the processed content
	IncludeTables bool

	// Whether to include videos in the processed content
	IncludeVideos bool

	// Maximum length for article content (0 means no limit)
	MaxContentLength int

	// Additional processor-specific options
	AdditionalOptions map[string]any
}

Options contains configuration for content processors.
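
How a processor applies the two length fields is not specified here. The following sketch shows one plausible interpretation, under stated assumptions: content shorter than MinContentLength is reported as invalid, content longer than a non-zero MaxContentLength is truncated, and both are counted in runes. Options is a trimmed local copy and applyLengthLimits is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Trimmed local copy of the documented Options fields used by this sketch.
type Options struct {
	MinContentLength int
	MaxContentLength int // 0 means no limit
}

// applyLengthLimits reports whether content meets the minimum length and
// returns it truncated to the maximum length. Lengths are counted in runes
// so multi-byte characters count once (an assumption about "characters").
func applyLengthLimits(content string, opts Options) (string, bool) {
	if utf8.RuneCountInString(content) < opts.MinContentLength {
		return content, false
	}
	if opts.MaxContentLength > 0 {
		runes := []rune(content)
		if len(runes) > opts.MaxContentLength {
			return string(runes[:opts.MaxContentLength]), true
		}
	}
	return content, true
}

func main() {
	out, ok := applyLengthLimits("héllo wörld", Options{MinContentLength: 5, MaxContentLength: 5})
	fmt.Println(out, ok)
}
```

Truncating on a rune slice rather than a byte slice avoids cutting a multi-byte character in half.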

func DefaultOptions

func DefaultOptions() Options

DefaultOptions returns the default processor options.

func OptionsFromConfig

func OptionsFromConfig(config map[string]any) (Options, error)

type Processor

type Processor interface {
	// Process processes the raw content and updates the article with processed content
	Process(article *models.Article, opts *Options) error

	// Name returns the name of this processor
	Name() string
}

Processor defines the interface for content processors.
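
A custom processor only needs the two methods above. The sketch below is a minimal illustration that collapses runs of whitespace. Article is a hypothetical stand-in for models.Article (its Content field is an assumption), and whitespaceProcessor is an invented name; the interface is redeclared locally so the block compiles without the package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Stand-ins for the package's types (assumptions, trimmed for brevity).
type Article struct{ Content string } // hypothetical; the real type is models.Article
type Options struct{ MinContentLength int }
type Processor interface {
	Process(article *Article, opts *Options) error
	Name() string
}

// whitespaceProcessor collapses every run of whitespace into a single space.
type whitespaceProcessor struct{}

// Compile-time check that the type satisfies the interface.
var _ Processor = (*whitespaceProcessor)(nil)

func (p *whitespaceProcessor) Name() string { return "whitespace" }

func (p *whitespaceProcessor) Process(article *Article, opts *Options) error {
	if article == nil {
		return errors.New("whitespace: nil article")
	}
	// strings.Fields splits on any whitespace and drops empty fields.
	article.Content = strings.Join(strings.Fields(article.Content), " ")
	return nil
}

func main() {
	a := &Article{Content: "one\n\n two\t three "}
	p := &whitespaceProcessor{}
	if err := p.Process(a, &Options{}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %q\n", p.Name(), a.Content)
}
```

Process mutates the article in place and reports failure through its return value, matching the shape of the interface.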

func New

func New(name string) (Processor, error)

New returns a new instance of the specified processor.
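
This page does not say how New resolves a name or which names List reports. A common pattern for this kind of API is a map from name to constructor, sketched below as an assumption. The names "readability" and "sanitizer" are hypothetical, the Processor interface is trimmed to Name, and newProcessor and listProcessors are illustrative stand-ins for New and List.

```go
package main

import (
	"fmt"
	"sort"
)

// Trimmed stand-in for the package's Processor interface.
type Processor interface{ Name() string }

// namedProcessor is a placeholder implementation for illustration.
type namedProcessor struct{ name string }

func (n namedProcessor) Name() string { return n.name }

// registry maps a processor name to its constructor. The keys are
// hypothetical; the package's real names are not listed on this page.
var registry = map[string]func() Processor{
	"readability": func() Processor { return namedProcessor{name: "readability"} },
	"sanitizer":   func() Processor { return namedProcessor{name: "sanitizer"} },
}

// newProcessor looks up a constructor by name, mirroring the shape of New.
func newProcessor(name string) (Processor, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown processor %q", name)
	}
	return ctor(), nil
}

// listProcessors returns the registered names in sorted order, mirroring
// the shape of List.
func listProcessors() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(listProcessors())
	if _, err := newProcessor("missing"); err != nil {
		fmt.Println(err)
	}
}
```

Sorting the names keeps the listing stable, since Go map iteration order is randomized.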

type ProcessorConfig

type ProcessorConfig struct {
	// Processors is a list of processors to apply in the order defined
	Processors []string `toml:"processors"`

	// ProcessorConfigs contains optional configuration for each processor. If
	// a processor is not configured, it will use its default settings.
	ProcessorConfigs map[string]any `toml:"processor_configs"`
}

type ReadabilityProcessor

type ReadabilityProcessor struct{}

ReadabilityProcessor uses go-readability to extract the main content from HTML.

func NewReadabilityProcessor

func NewReadabilityProcessor() *ReadabilityProcessor

NewReadabilityProcessor creates a new readability-based content processor.

func (*ReadabilityProcessor) Name

func (p *ReadabilityProcessor) Name() string

Name returns the name of this processor.

func (*ReadabilityProcessor) Process

func (p *ReadabilityProcessor) Process(article *models.Article, opts *Options) error

Process extracts the main content from an article's HTML content.

type SanitizerProcessor added in v0.2.0

type SanitizerProcessor struct{}

SanitizerProcessor is a processor that sanitizes HTML content.

func NewSanitizerProcessor added in v0.2.0

func NewSanitizerProcessor() *SanitizerProcessor

NewSanitizerProcessor creates a new instance of SanitizerProcessor.

func (*SanitizerProcessor) Name added in v0.2.0

func (s *SanitizerProcessor) Name() string

Name returns the name of this processor.

func (*SanitizerProcessor) Process added in v0.2.0

func (p *SanitizerProcessor) Process(article *models.Article, opts *Options) error

Process sanitizes the HTML content of the article.
