Documentation
Overview
Package htmlutil provides HTML processing utilities for social media scraping.
Index
- func ContactLinks(htmlContent, baseURL string) []string
- func Description(htmlContent string) string
- func EmailAddresses(htmlContent string) []string
- func ExtractEmailFromURL(urlStr string) (string, bool)
- func ExtractRedirectURL(htmlContent string) string
- func IsEmailURL(urlStr string) bool
- func SocialLinks(htmlContent string) []string
- func Title(htmlContent string) string
- func ToMarkdown(htmlContent string) string
Constants
This section is empty.
Variables
This section is empty.
Functions
func ContactLinks
ContactLinks extracts contact/about page URLs from HTML content. These pages often contain additional social media links.
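The following is a minimal sketch of how contact/about link extraction could work, not the package's actual implementation. The name `contactPageLinks`, the regex-based href scan, and the rule "path contains contact or about" are all assumptions made for illustration; the sketch also assumes `baseURL` is used to resolve relative links.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// hrefRE captures quoted href values. A regex is a simplification here;
// a real HTML parser would be more robust.
var hrefRE = regexp.MustCompile(`(?i)href\s*=\s*["']([^"']+)["']`)

// contactPageLinks is a hypothetical stand-in, for illustration only.
// It resolves each href against baseURL and keeps http(s) links whose
// path mentions "contact" or "about" (an assumed heuristic).
func contactPageLinks(htmlContent, baseURL string) []string {
	base, err := url.Parse(baseURL)
	if err != nil {
		return nil
	}
	seen := make(map[string]bool)
	var links []string
	for _, m := range hrefRE.FindAllStringSubmatch(htmlContent, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue
		}
		abs := base.ResolveReference(ref)
		if abs.Scheme != "http" && abs.Scheme != "https" {
			continue // skips mailto:, javascript:, and similar
		}
		p := strings.ToLower(abs.Path)
		if !strings.Contains(p, "contact") && !strings.Contains(p, "about") {
			continue
		}
		if s := abs.String(); !seen[s] {
			seen[s] = true
			links = append(links, s)
		}
	}
	return links
}

func main() {
	page := `<a href="/about">About</a> <a href="contact.html">Contact</a> <a href="/blog">Blog</a>`
	for _, l := range contactPageLinks(page, "https://example.com/team/") {
		fmt.Println(l)
	}
}
```

Resolving against the base URL first means relative and absolute hrefs are filtered by the same rule and deduplicated in their absolute form.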
func Description
Description extracts the meta description from HTML content.
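As a rough illustration of meta description extraction, the sketch below uses only the standard library. The helper `metaDescription` is hypothetical, and it assumes the `name` attribute precedes a double-quoted `content` attribute; the package itself may handle more variants.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// metaDescRE assumes name="description" appears before a double-quoted
// content attribute inside the same <meta> tag (a simplifying assumption).
var metaDescRE = regexp.MustCompile(
	`(?is)<meta\s[^>]*name\s*=\s*["']description["'][^>]*content\s*=\s*"([^"]*)"`)

// metaDescription is a hypothetical stand-in, for illustration only.
// It returns the unescaped, trimmed description, or "" when none is found.
func metaDescription(htmlContent string) string {
	m := metaDescRE.FindStringSubmatch(htmlContent)
	if m == nil {
		return ""
	}
	return strings.TrimSpace(html.UnescapeString(m[1]))
}

func main() {
	page := `<head><meta name="description" content=" Tools &amp; notes "></head>`
	fmt.Printf("%q\n", metaDescription(page)) // prints "Tools & notes"
}
```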
func EmailAddresses
EmailAddresses extracts email addresses from HTML content. It filters out common false positives such as noreply@ and example@ addresses.
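The filtering idea can be sketched as follows. This is not the package's code: `findEmails`, the regular expression, and the blocked local parts (`noreply`, `no-reply`, `example`) are assumptions chosen to mirror the description above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// emailRE is a deliberately simple pattern, not a full RFC 5322 matcher.
var emailRE = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// blockedLocalParts is an assumed list of false-positive local parts.
var blockedLocalParts = map[string]bool{
	"noreply":  true,
	"no-reply": true,
	"example":  true,
}

// findEmails is a hypothetical stand-in, for illustration only.
// It lowercases matches, drops duplicates, and skips blocked local parts.
func findEmails(htmlContent string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, m := range emailRE.FindAllString(htmlContent, -1) {
		e := strings.ToLower(m)
		if seen[e] {
			continue
		}
		seen[e] = true
		local := e[:strings.Index(e, "@")] // "@" is guaranteed by the regex
		if blockedLocalParts[local] {
			continue
		}
		out = append(out, e)
	}
	return out
}

func main() {
	page := `<a href="mailto:Jane@Corp.io">Jane</a> noreply@corp.io jane@corp.io`
	fmt.Println(findEmails(page)) // prints [jane@corp.io]
}
```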
func ExtractEmailFromURL
ExtractEmailFromURL extracts an email address from URLs like "https://user@domain.com" or "http://email@example.com". It returns the email address and true if one is found, or an empty string and false otherwise.
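One way to get this behavior is to treat the URL's userinfo as the local part and the host as the domain. The sketch below takes that approach with `net/url`; `emailFromURL` is a hypothetical name, and whether the package works this way is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// emailFromURL is a hypothetical stand-in, for illustration only.
// It reads "user@host" out of an http(s) URL's userinfo and hostname.
func emailFromURL(urlStr string) (string, bool) {
	u, err := url.Parse(urlStr)
	if err != nil || u.User == nil {
		return "", false
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", false
	}
	user, host := u.User.Username(), u.Hostname()
	if user == "" || host == "" {
		return "", false
	}
	return user + "@" + host, true
}

func main() {
	email, ok := emailFromURL("https://user@domain.com")
	fmt.Println(email, ok) // prints user@domain.com true

	_, ok = emailFromURL("https://example.com/about")
	fmt.Println(ok) // prints false
}
```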
func ExtractRedirectURL (added in v0.7.9)
ExtractRedirectURL checks HTML content for meta refresh or JavaScript redirects. It returns the redirect URL if one is found, or an empty string otherwise.
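A minimal sketch of the two checks, assuming regex matching is acceptable for illustration. `redirectTarget` and both patterns are hypothetical; in particular the sketch assumes `http-equiv` precedes `content` and that JavaScript redirects assign a string literal to `location` or `location.href`.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// metaRefreshRE matches <meta http-equiv="refresh" content="N; url=...">.
	metaRefreshRE = regexp.MustCompile(
		`(?i)<meta[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*content\s*=\s*["']?\s*\d+\s*;\s*url\s*=\s*([^"'>\s]+)`)
	// jsRedirectRE matches assignments such as window.location.href = "...".
	jsRedirectRE = regexp.MustCompile(
		`(?i)\b(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']`)
)

// redirectTarget is a hypothetical stand-in, for illustration only.
// It prefers a meta refresh target and falls back to a JavaScript one.
func redirectTarget(htmlContent string) string {
	if m := metaRefreshRE.FindStringSubmatch(htmlContent); m != nil {
		return m[1]
	}
	if m := jsRedirectRE.FindStringSubmatch(htmlContent); m != nil {
		return m[1]
	}
	return ""
}

func main() {
	fmt.Println(redirectTarget(`<meta http-equiv="refresh" content="0; url=https://example.com/new">`))
	fmt.Println(redirectTarget(`<script>window.location.href = "/login";</script>`))
}
```

Regexes miss redirects built from variables or string concatenation, so a sketch like this only covers the simplest cases.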
func IsEmailURL
IsEmailURL reports whether the URL is a mailto: link or an email address with an http:// or https:// prefix.
func SocialLinks
SocialLinks extracts social media URLs from HTML content.
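To make the idea concrete, here is a sketch that keeps only links whose host is on a known list. The helper `socialProfileLinks` and the host list are assumptions for illustration; the set of platforms the package recognizes is not stated here.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// linkRE captures quoted href values; a real HTML parser would be sturdier.
var linkRE = regexp.MustCompile(`(?i)href\s*=\s*["']([^"']+)["']`)

// socialHosts is an assumed, illustrative list of platforms.
var socialHosts = map[string]bool{
	"github.com":    true,
	"linkedin.com":  true,
	"twitter.com":   true,
	"x.com":         true,
	"facebook.com":  true,
	"instagram.com": true,
}

// socialProfileLinks is a hypothetical stand-in, for illustration only.
// It returns unique hrefs whose host (minus "www.") is in socialHosts.
func socialProfileLinks(htmlContent string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, m := range linkRE.FindAllStringSubmatch(htmlContent, -1) {
		u, err := url.Parse(m[1])
		if err != nil {
			continue
		}
		host := strings.TrimPrefix(strings.ToLower(u.Hostname()), "www.")
		if !socialHosts[host] || seen[m[1]] {
			continue
		}
		seen[m[1]] = true
		out = append(out, m[1])
	}
	return out
}

func main() {
	page := `<a href="https://github.com/octocat">gh</a>` +
		`<a href="https://www.linkedin.com/in/someone">li</a>` +
		`<a href="https://example.com/">home</a>`
	for _, l := range socialProfileLinks(page) {
		fmt.Println(l)
	}
}
```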
func ToMarkdown
ToMarkdown converts HTML content to markdown format.
Types
This section is empty.