Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type ExtractionStats
type ExtractionStats struct {
	TotalFound          int // Total anchor tags with href found
	Valid               int // Valid links extracted
	EmptyHrefs          int // Empty href attributes
	FilteredOut         int // Links filtered out (javascript:, mailto:, etc.)
	RelativeURLs        int // Relative URLs that were resolved
	ResolutionErrors    int // Errors during relative URL resolution
	InvalidURLs         int // Invalid URLs after resolution
	NormalizationErrors int // Errors during URL normalization
}
ExtractionStats holds statistics about link extraction.
func (*ExtractionStats) String
func (s *ExtractionStats) String() string
String returns a human-readable representation of the stats.
type LinkExtractor
type LinkExtractor struct {
	// contains filtered or unexported fields
}
LinkExtractor provides functionality to extract and filter links from HTML content.
func NewLinkExtractor
func NewLinkExtractor(logger *slog.Logger) *LinkExtractor
NewLinkExtractor creates a new LinkExtractor instance.
func (*LinkExtractor) ExtractLinks
func (le *LinkExtractor) ExtractLinks(baseURL, htmlContent string) ([]string, error)
ExtractLinks extracts and filters links from HTML content. baseURL is used to resolve relative URLs to absolute URLs, and htmlContent is the HTML content to parse. It returns a slice of valid, filtered absolute URLs.
func (*LinkExtractor) ExtractLinksWithStats
func (le *LinkExtractor) ExtractLinksWithStats(baseURL, htmlContent string) ([]string, *ExtractionStats, error)
ExtractLinksWithStats extracts links and returns statistics.
func (*LinkExtractor) ExtractSameDomainLinks
func (le *LinkExtractor) ExtractSameDomainLinks(baseURL, htmlContent string) ([]string, error)
ExtractSameDomainLinks extracts links that belong to the same domain as the base URL.
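One plausible reading of "same domain" is an exact, case-insensitive hostname match that ignores the port. The sketch below implements that reading with net/url; `sameHost` and `filterSameHost` are hypothetical names, and whether the package itself treats subdomains or differing ports as the same domain is not stated on this page.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sameHost is a hypothetical helper: it reports whether link has the same
// hostname as base, ignoring case and port.
func sameHost(base *url.URL, link string) bool {
	u, err := url.Parse(link)
	if err != nil {
		return false
	}
	return strings.EqualFold(u.Hostname(), base.Hostname())
}

// filterSameHost is a hypothetical helper that keeps only the absolute
// links whose hostname matches that of baseURL.
func filterSameHost(baseURL string, links []string) ([]string, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return nil, fmt.Errorf("parse base URL: %w", err)
	}
	var out []string
	for _, l := range links {
		if sameHost(base, l) {
			out = append(out, l)
		}
	}
	return out, nil
}

func main() {
	links := []string{
		"https://example.com/a",
		"https://EXAMPLE.com:8443/b",
		"https://blog.example.com/c", // subdomain: dropped under exact matching
		"https://other.org/d",
	}
	same, err := filterSameHost("https://example.com/", links)
	if err != nil {
		panic(err)
	}
	fmt.Println(same) // [https://example.com/a https://EXAMPLE.com:8443/b]
}
```

A crawler that should follow subdomains as well would need a looser comparison, such as a suffix check against the registrable domain.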