profile

package
v0.9.0 Latest
Warning

This package is not in the latest version of its module.

Published: Dec 15, 2025 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Overview

Package profile defines the common types for social media profile extraction.

Constants

This section is empty.

Variables

var (
	ErrAuthRequired    = errors.New("authentication required")
	ErrNoCookies       = errors.New("no cookies available")
	ErrProfileNotFound = errors.New("profile not found")
	ErrRateLimited     = errors.New("rate limited")
)

Common errors returned by platform packages.

Functions

func Register added in v0.9.0

func Register(p Platform)

Register adds a platform to the global registry. This should be called from each platform package's init() function.

func RegisterWithFetcher added in v0.9.0

func RegisterWithFetcher(p Platform, fetch FetchFunc)

RegisterWithFetcher adds a platform with its fetch function to the global registry.

Types

type FetchFunc added in v0.9.0

type FetchFunc func(ctx context.Context, url string, cfg *FetcherConfig) (*Profile, error)

FetchFunc is a function that fetches a profile from a URL. This allows platforms to register their fetch logic without requiring a specific client type.

func LookupFetcher added in v0.9.0

func LookupFetcher(name string) FetchFunc

LookupFetcher returns the fetch function for the given platform name, or nil if not found.

type FetcherConfig added in v0.9.0

type FetcherConfig struct {
	Cache          any               // httpcache.Cacher - use any to avoid import cycles
	Cookies        map[string]string // Platform-specific cookies
	Logger         *slog.Logger
	GitHubToken    string // GitHub API token
	BrowserCookies bool   // Whether to read cookies from browser
}

FetcherConfig holds configuration for creating platform fetchers.

type Platform added in v0.9.0

type Platform interface {
	// Name returns the platform identifier (e.g., "github", "twitter").
	Name() string

	// Type returns the category of content this platform hosts.
	Type() PlatformType

	// Match returns true if the URL belongs to this platform.
	Match(url string) bool

	// AuthRequired returns true if authentication is needed to fetch profiles.
	AuthRequired() bool
}

Platform defines the interface that all platform implementations must satisfy. Each platform package registers itself via Register() in an init() function.
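
As an illustration of what satisfying the interface might look like, here is a minimal sketch of a platform for a made-up host. PlatformType and Platform are repeated locally so the example is self-contained; examplePlatform and code.example.com are hypothetical, and the URL rule in Match is only one possible choice. In a real platform package, an init() function would pass the value to Register.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// PlatformType and Platform are local copies of the declarations above.
type PlatformType string

const PlatformTypeCode PlatformType = "code"

type Platform interface {
	Name() string
	Type() PlatformType
	Match(url string) bool
	AuthRequired() bool
}

// examplePlatform is a hypothetical platform for code.example.com.
type examplePlatform struct{}

func (examplePlatform) Name() string       { return "example" }
func (examplePlatform) Type() PlatformType { return PlatformTypeCode }
func (examplePlatform) AuthRequired() bool { return false }

// Match reports whether rawURL is on code.example.com and has a
// non-empty path (treated here as the profile name).
func (examplePlatform) Match(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	return host == "code.example.com" && strings.Trim(u.Path, "/") != ""
}

// Compile-time check that examplePlatform satisfies Platform.
var _ Platform = examplePlatform{}

func main() {
	p := examplePlatform{}
	fmt.Println(p.Match("https://code.example.com/alice"))  // true
	fmt.Println(p.Match("https://other.example.org/alice")) // false
}
```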

func LookupPlatform added in v0.9.0

func LookupPlatform(name string) Platform

LookupPlatform returns the platform with the given name, or nil if not found.

func MatchURL added in v0.9.0

func MatchURL(url string) Platform

MatchURL returns the first platform that matches the given URL, or nil if none match. Platforms are checked in registration order, so order matters for overlapping patterns.
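
The effect of registration order can be seen in a small stand-alone sketch. It does not use the package's registry; matcher, register, and matchURL are hypothetical stand-ins, and the substring test is a simplification of whatever each real platform's Match does.

```go
package main

import (
	"fmt"
	"strings"
)

// matcher is a reduced stand-in for Platform with only the two
// methods this sketch needs.
type matcher struct {
	name   string
	substr string
}

func (m matcher) Name() string          { return m.name }
func (m matcher) Match(url string) bool { return strings.Contains(url, m.substr) }

// registry is a hypothetical ordered registry; a slice keeps
// registration order, which a map would not.
var registry []matcher

func register(m matcher) { registry = append(registry, m) }

// matchURL returns the first registered matcher that accepts url.
func matchURL(url string) (matcher, bool) {
	for _, m := range registry {
		if m.Match(url) {
			return m, true
		}
	}
	return matcher{}, false
}

func main() {
	// Both patterns accept docs.example.com URLs, so the more
	// specific one is registered first.
	register(matcher{name: "example-docs", substr: "docs.example.com"})
	register(matcher{name: "example", substr: "example.com"})

	m, _ := matchURL("https://docs.example.com/alice")
	fmt.Println(m.Name()) // prints "example-docs"
}
```

If the two register calls were swapped, the broader "example" pattern would win for the same URL, which is the situation the note about overlapping patterns warns against.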

func Platforms added in v0.9.0

func Platforms() []Platform

Platforms returns all registered platforms.

type PlatformType added in v0.9.0

type PlatformType string

PlatformType categorizes what kind of content a platform primarily hosts. This enables cross-platform matching bonuses (e.g., same username on GitHub and GitLab).

const (
	PlatformTypeCode      PlatformType = "code"      // Code hosting: GitHub, GitLab, Codeberg, etc.
	PlatformTypeBlog      PlatformType = "blog"      // Long-form writing: Medium, Substack, Dev.to, etc.
	PlatformTypeMicroblog PlatformType = "microblog" // Short posts: Twitter, Mastodon, Bluesky, etc.
	PlatformTypeVideo     PlatformType = "video"     // Video content: YouTube, TikTok, Twitch, etc.
	PlatformTypeForum     PlatformType = "forum"     // Discussion forums: Reddit, HN, Lobsters, etc.
	PlatformTypeGaming    PlatformType = "gaming"    // Gaming platforms: Steam, etc.
	PlatformTypeSocial    PlatformType = "social"    // General social: LinkedIn, Instagram, VK, etc.
	PlatformTypePackage   PlatformType = "package"   // Package registries: npm, PyPI, crates.io, etc.
	PlatformTypeSecurity  PlatformType = "security"  // Security platforms: HackerOne, Bugcrowd, etc.
	PlatformTypeOther     PlatformType = "other"     // Uncategorized platforms
)

Platform type constants for categorizing platforms by their primary content type.

func TypeOf added in v0.9.0

func TypeOf(name string) PlatformType

TypeOf returns the platform type for a given platform name. Returns PlatformTypeOther for unknown platforms.

type Post

type Post struct {
	Type     PostType `json:"type"`               // Type of content
	Title    string   `json:"title,omitempty"`    // Title (for videos, articles, posts)
	Content  string   `json:"content,omitempty"`  // Body text or description
	URL      string   `json:"url,omitempty"`      // Link to the original content
	Category string   `json:"category,omitempty"` // Category (subreddit, channel, topic, etc.)
}

Post represents a piece of user-generated content (post, comment, video, etc.).

type PostType

type PostType string

PostType indicates the type of user-generated content.

const (
	PostTypeComment    PostType = "comment"
	PostTypePost       PostType = "post"
	PostTypeVideo      PostType = "video"
	PostTypeArticle    PostType = "article"
	PostTypeQuestion   PostType = "question"
	PostTypeAnswer     PostType = "answer"
	PostTypeRepository PostType = "repository"
)

Post type constants for categorizing user-generated content.

type Profile

type Profile struct {
	// Metadata
	Platform      string `json:",omitempty"` // Platform name: "linkedin", "twitter", "mastodon", etc.
	URL           string `json:",omitempty"` // Original URL fetched
	Authenticated bool   `json:",omitempty"` // Whether login cookies were used
	Error         string `json:",omitempty"` // Error message if fetch failed (e.g., "login required")

	// Core profile data
	Username  string   `json:",omitempty"` // Handle/username (without @ prefix)
	Name      string   `json:",omitempty"` // Display name
	AvatarURL string   `json:",omitempty"` // Profile photo/avatar URL
	Bio       string   `json:",omitempty"` // Profile bio/description
	Location  string   `json:",omitempty"` // Geographic location
	Website   string   `json:",omitempty"` // Personal website URL
	CreatedAt string   `json:",omitempty"` // Account creation date (ISO timestamp)
	UpdatedAt string   `json:",omitempty"` // Most recent activity or profile update (ISO timestamp)
	UTCOffset *float64 `json:",omitempty"` // UTC offset in hours (e.g., -8 for PST, 5.5 for IST)

	// Platform-specific fields
	Fields map[string]string `json:",omitempty"` // Additional platform-specific data (headline, employer, etc.)

	// For further crawling
	SocialLinks []string `json:",omitempty"` // Other social media URLs detected on the profile

	// User-generated content (posts, comments, videos, etc.)
	Posts []Post `json:",omitempty"` // Structured content extracted from the profile

	// Fallback for unrecognized platforms
	Unstructured string `json:",omitempty"` // Raw markdown content (HTML->MD conversion)

	// Guess mode fields (omitted from JSON when empty)
	IsGuess    bool     `json:",omitempty"` // True if this profile was discovered via guessing
	Confidence float64  `json:",omitempty"` // Confidence score 0.0-1.0 for guessed profiles
	GuessMatch []string `json:",omitempty"` // Reasons for match (e.g., "username", "name", "location")
}

Profile represents extracted data from a social media profile.
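
Because the struct tags give only omitempty and no key name, encoding/json uses the Go field names as JSON keys and drops zero-valued fields. The sketch below demonstrates this with a reduced local copy of Profile (a subset of the fields above); encode is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Profile is a reduced local copy with a subset of the fields above.
type Profile struct {
	Platform   string            `json:",omitempty"`
	URL        string            `json:",omitempty"`
	Username   string            `json:",omitempty"`
	Name       string            `json:",omitempty"`
	UTCOffset  *float64          `json:",omitempty"`
	Fields     map[string]string `json:",omitempty"`
	IsGuess    bool              `json:",omitempty"`
	Confidence float64           `json:",omitempty"`
}

// encode is a hypothetical helper that marshals p to a JSON string.
func encode(p Profile) string {
	b, err := json.Marshal(p)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	offset := 5.5
	p := Profile{Platform: "mastodon", Username: "alice", UTCOffset: &offset}
	fmt.Println(encode(p))
	// prints {"Platform":"mastodon","Username":"alice","UTCOffset":5.5}

	// A non-nil pointer is kept even when it points at zero, so a
	// UTC offset of 0 stays distinguishable from "unknown".
	zero := 0.0
	fmt.Println(encode(Profile{UTCOffset: &zero}))
	// prints {"UTCOffset":0}
}
```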

func Fetch added in v0.9.0

func Fetch(ctx context.Context, url string, cfg *FetcherConfig) (*Profile, error)

Fetch finds the matching platform and fetches the profile. Returns ErrProfileNotFound if no platform matches or the platform has no fetcher.
