parser

package

v0.3.0 Published: Jun 29, 2025 License: MIT Imports: 5 Imported by: 0

Repository

github.com/aoshimash/urlmap

Documentation ¶

Index ¶

type ExtractionStats
- func (s *ExtractionStats) String() string
type LinkExtractor
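- func NewLinkExtractor(logger *slog.Logger) *LinkExtractor
- func (le *LinkExtractor) ExtractLinks(baseURL, htmlContent string) ([]string, error)
- func (le *LinkExtractor) ExtractLinksWithStats(baseURL, htmlContent string) ([]string, *ExtractionStats, error)
- func (le *LinkExtractor) ExtractSameDomainLinks(baseURL, htmlContent string) ([]string, error)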

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type ExtractionStats ¶

type ExtractionStats struct {
	TotalFound          int // Total anchor tags with href found
	Valid               int // Valid links extracted
	EmptyHrefs          int // Empty href attributes
	FilteredOut         int // Links filtered out (javascript:, mailto:, etc.)
	RelativeURLs        int // Relative URLs that were resolved
	ResolutionErrors    int // Errors during relative URL resolution
	InvalidURLs         int // Invalid URLs after resolution
	NormalizationErrors int // Errors during URL normalization
}

ExtractionStats holds statistics about link extraction

func (*ExtractionStats) String ¶

func (s *ExtractionStats) String() string

String returns a human-readable representation of the stats
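
Because the method has the signature String() string, *ExtractionStats satisfies fmt.Stringer, so fmt prints it through that method. The sketch below shows the pattern on a hypothetical stand-in type, linkStats, that mirrors a few of the fields. The output layout is an assumption and is not the package's actual format.

```go
package main

import "fmt"

// linkStats is a hypothetical stand-in that mirrors a few ExtractionStats fields.
type linkStats struct {
	TotalFound  int
	Valid       int
	EmptyHrefs  int
	FilteredOut int
}

// String implements fmt.Stringer. The layout is assumed for illustration only.
func (s *linkStats) String() string {
	return fmt.Sprintf("found=%d valid=%d empty=%d filtered=%d",
		s.TotalFound, s.Valid, s.EmptyHrefs, s.FilteredOut)
}

func main() {
	s := &linkStats{TotalFound: 10, Valid: 7, EmptyHrefs: 1, FilteredOut: 2}
	// fmt.Println calls String because *linkStats implements fmt.Stringer.
	fmt.Println(s)
}
```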

type LinkExtractor ¶

type LinkExtractor struct {
	// contains filtered or unexported fields
}

LinkExtractor extracts and filters links from HTML content.

func NewLinkExtractor ¶

func NewLinkExtractor(logger *slog.Logger) *LinkExtractor

NewLinkExtractor creates a new LinkExtractor instance

func (*LinkExtractor) ExtractLinks ¶

func (le *LinkExtractor) ExtractLinks(baseURL, htmlContent string) ([]string, error)

ExtractLinks extracts and filters links from HTML content. baseURL is used to resolve relative URLs to absolute URLs, and htmlContent is the HTML content to parse. It returns a slice of valid, filtered absolute URLs.
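
The resolution and filtering described above can be sketched with the standard library's net/url package. This is a minimal illustration, not the package's implementation: resolveLinks is a hypothetical helper, it takes already-extracted href values instead of parsing HTML, and the rule of keeping only http and https URLs is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveLinks is a hypothetical helper. It resolves raw href values against
// baseURL and keeps only absolute http/https URLs.
func resolveLinks(baseURL string, hrefs []string) ([]string, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return nil, fmt.Errorf("parse base URL: %w", err)
	}
	var links []string
	for _, href := range hrefs {
		href = strings.TrimSpace(href)
		if href == "" {
			continue // empty href attribute
		}
		ref, err := url.Parse(href)
		if err != nil {
			continue // unparseable href
		}
		// ResolveReference turns a relative reference into an absolute URL.
		abs := base.ResolveReference(ref)
		if abs.Scheme != "http" && abs.Scheme != "https" {
			continue // drops javascript:, mailto:, and similar schemes (assumed rule)
		}
		links = append(links, abs.String())
	}
	return links, nil
}

func main() {
	hrefs := []string{"/about", "guide.html", "mailto:hi@example.com", "", "https://other.org/x"}
	links, err := resolveLinks("https://example.com/docs/", hrefs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range links {
		fmt.Println(l)
	}
}
```

Here the relative path guide.html resolves under /docs/ because the base URL ends with a slash, while /about resolves from the site root.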

func (*LinkExtractor) ExtractLinksWithStats ¶

func (le *LinkExtractor) ExtractLinksWithStats(baseURL, htmlContent string) ([]string, *ExtractionStats, error)

ExtractLinksWithStats extracts links and returns statistics
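
To illustrate how counters like those in ExtractionStats might be filled during a pass, the sketch below tallies raw href values into a hypothetical hrefCounts struct. The classification rules are assumptions for illustration. The package's own rules, and its handling of resolution, validation, and normalization errors, are not modeled here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hrefCounts is a hypothetical subset of the ExtractionStats counters.
type hrefCounts struct {
	TotalFound   int
	Valid        int
	EmptyHrefs   int
	FilteredOut  int
	RelativeURLs int
}

// countHrefs classifies each raw href and increments the matching counter.
func countHrefs(hrefs []string) *hrefCounts {
	c := &hrefCounts{}
	for _, href := range hrefs {
		c.TotalFound++
		href = strings.TrimSpace(href)
		if href == "" {
			c.EmptyHrefs++
			continue
		}
		u, err := url.Parse(href)
		if err != nil {
			continue // unparseable href: counted in TotalFound only in this sketch
		}
		switch u.Scheme {
		case "":
			c.RelativeURLs++ // no scheme: would be resolved against the base URL
			c.Valid++
		case "http", "https":
			c.Valid++
		default:
			c.FilteredOut++ // javascript:, mailto:, and similar schemes
		}
	}
	return c
}

func main() {
	c := countHrefs([]string{"/a", "", "mailto:x@y.z", "https://example.com/b"})
	fmt.Printf("%+v\n", *c)
}
```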

func (*LinkExtractor) ExtractSameDomainLinks ¶

func (le *LinkExtractor) ExtractSameDomainLinks(baseURL, htmlContent string) ([]string, error)

ExtractSameDomainLinks extracts links that belong to the same domain as the base URL
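
A same-domain check can be sketched by comparing hostnames with net/url. The helpers sameHost and filterSameHost are hypothetical, and the exact, case-insensitive hostname comparison is an assumption. The package may treat subdomains, ports, or a leading "www." differently.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sameHost reports whether link has the same hostname as baseURL.
// Hostname() strips any port, and the comparison ignores case (assumed rule).
func sameHost(baseURL, link string) bool {
	b, err := url.Parse(baseURL)
	if err != nil {
		return false
	}
	l, err := url.Parse(link)
	if err != nil {
		return false
	}
	return b.Hostname() != "" && strings.EqualFold(b.Hostname(), l.Hostname())
}

// filterSameHost keeps only the links that share baseURL's hostname.
func filterSameHost(baseURL string, links []string) []string {
	var out []string
	for _, link := range links {
		if sameHost(baseURL, link) {
			out = append(out, link)
		}
	}
	return out
}

func main() {
	links := []string{
		"https://example.com/a",
		"https://other.org/b",
		"https://EXAMPLE.com:8443/c",
	}
	fmt.Println(filterSameHost("https://example.com/", links))
}
```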

Source Files ¶

parser.go