profile

package
v0.9.21 Latest
Warning

This package is not in the latest version of its module.

Published: Dec 19, 2025 | License: Apache-2.0 | Imports: 4 | Imported by: 0

Documentation

Overview

Package profile defines the common types for social media profile extraction.

Constants

This section is empty.

Variables

var (
	ErrAuthRequired    = errors.New("authentication required")
	ErrNoCookies       = errors.New("no cookies available")
	ErrProfileNotFound = errors.New("profile not found")
	ErrRateLimited     = errors.New("rate limited")
)

Common errors returned by platform packages.
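Because these are sentinel values, callers would normally compare against them with errors.Is rather than by message text. The sketch below is a minimal illustration, not the package's own code: it redeclares two of the errors locally so it compiles on its own, fetchStub and classify are hypothetical helpers, and whether platform packages wrap the sentinels with extra context is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of two sentinel errors so the sketch compiles on its own;
// real code would reference the package's exported variables instead.
var (
	ErrAuthRequired = errors.New("authentication required")
	ErrRateLimited  = errors.New("rate limited")
)

// fetchStub is a hypothetical fetcher that wraps a sentinel with context.
func fetchStub(url string) error {
	return fmt.Errorf("fetching %s: %w", url, ErrRateLimited)
}

// classify maps an error to a coarse action. errors.Is sees through
// wrapping done with %w, so the added context does not hide the sentinel.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrRateLimited):
		return "retry later"
	case errors.Is(err, ErrAuthRequired):
		return "supply cookies or a token"
	default:
		return "give up"
	}
}

func main() {
	fmt.Println(classify(fetchStub("https://example.com/alice"))) // retry later
	fmt.Println(classify(ErrAuthRequired))                        // supply cookies or a token
}
```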

Functions

func Register added in v0.9.0

func Register(p Platform)

Register adds a platform to the global registry. This should be called from each platform package's init() function.

func RegisterWithFetcher added in v0.9.0

func RegisterWithFetcher(p Platform, fetch FetchFunc)

RegisterWithFetcher adds a platform with its fetch function to the global registry.

Types

type AccountState added in v0.9.4

type AccountState string

AccountState indicates the current state of a user account.

const (
	AccountStateActive     AccountState = ""           // Account is active (default, omitted from JSON)
	AccountStateRenamed    AccountState = "renamed"    // Account was renamed to a new username
	AccountStateDeleted    AccountState = "deleted"    // Account was deleted but historical data recovered
	AccountStateUnverified AccountState = "unverified" // Profile exists but ownership could not be verified
)

Account state constants.
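Since AccountStateActive is the empty string and the Profile field carries an omitempty tag, an active account produces no AccountState key at all. The following sketch shows that behavior with a hypothetical two-field stand-in (record) in place of the full Profile struct; the type and constants are redeclared locally so the example is self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AccountState and its constants mirror the declarations above.
type AccountState string

const (
	AccountStateActive  AccountState = ""
	AccountStateRenamed AccountState = "renamed"
)

// record is a hypothetical two-field stand-in for Profile.
type record struct {
	Username     string       `json:",omitempty"`
	AccountState AccountState `json:",omitempty"`
}

// encode marshals a record and returns the JSON text.
func encode(r record) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(record{Username: "alice", AccountState: AccountStateActive}))
	// {"Username":"alice"}
	fmt.Println(encode(record{Username: "alice", AccountState: AccountStateRenamed}))
	// {"Username":"alice","AccountState":"renamed"}
}
```

A consumer of the JSON therefore has to treat a missing AccountState key as "active" rather than as "unknown".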

type FetchFunc added in v0.9.0

type FetchFunc func(ctx context.Context, url string, cfg *FetcherConfig) (*Profile, error)

FetchFunc is a function that fetches a profile from a URL. This allows platforms to register their fetch logic without requiring a specific client type.
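As a rough illustration of a function with this shape, the sketch below defines a fetcher for a made-up "example" platform. Everything in it is an assumption made for the example: Profile and FetcherConfig are trimmed local stand-ins, the "session" cookie name is invented, tolerating a nil cfg is a choice of this sketch rather than documented behavior, and no network request is made.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log/slog"
)

// Trimmed local stand-ins for the package's types; only the fields
// this sketch touches are kept.
type Profile struct {
	Platform      string
	URL           string
	Authenticated bool
}

type FetcherConfig struct {
	Cookies map[string]string
	Logger  *slog.Logger
}

type FetchFunc func(ctx context.Context, url string, cfg *FetcherConfig) (*Profile, error)

var ErrNoCookies = errors.New("no cookies available")

// fetchExample is a hypothetical fetcher that needs a "session" cookie.
// It performs no network I/O.
func fetchExample(ctx context.Context, url string, cfg *FetcherConfig) (*Profile, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // respect cancellation before doing any work
	}
	if cfg == nil || cfg.Cookies["session"] == "" {
		return nil, fmt.Errorf("example: %w", ErrNoCookies)
	}
	if cfg.Logger != nil {
		cfg.Logger.Debug("fetching profile", "url", url)
	}
	return &Profile{Platform: "example", URL: url, Authenticated: true}, nil
}

// Compile-time check that fetchExample has the FetchFunc shape.
var _ FetchFunc = fetchExample

// describe runs fetchExample with the given cookies and summarizes the outcome.
func describe(cookies map[string]string) string {
	cfg := &FetcherConfig{Cookies: cookies}
	p, err := fetchExample(context.Background(), "https://example.com/alice", cfg)
	switch {
	case errors.Is(err, ErrNoCookies):
		return "no cookies"
	case err != nil:
		return "error: " + err.Error()
	}
	return fmt.Sprintf("%s authenticated=%t", p.Platform, p.Authenticated)
}

func main() {
	fmt.Println(describe(nil))                                  // no cookies
	fmt.Println(describe(map[string]string{"session": "abc"})) // example authenticated=true
}
```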

func LookupFetcher added in v0.9.0

func LookupFetcher(name string) FetchFunc

LookupFetcher returns the fetch function for the given platform name, or nil if not found.

type FetcherConfig added in v0.9.0

type FetcherConfig struct {
	Cache          any               // httpcache.Cacher - use any to avoid import cycles
	Cookies        map[string]string // Platform-specific cookies
	Logger         *slog.Logger
	GitHubToken    string // GitHub API token
	BrowserCookies bool   // Whether to read cookies from browser
}

FetcherConfig holds configuration for creating platform fetchers.

type Platform added in v0.9.0

type Platform interface {
	// Name returns the platform identifier (e.g., "github", "twitter").
	Name() string

	// Type returns the category of content this platform hosts.
	Type() PlatformType

	// Match returns true if the URL belongs to this platform.
	Match(url string) bool

	// AuthRequired returns true if authentication is needed to fetch profiles.
	AuthRequired() bool
}

Platform defines the interface that all platform implementations must satisfy. Each platform package registers itself via Register() in an init() function.
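A minimal sketch of a type satisfying this interface might look like the following. The platform name, the host code.example.com, and the matching rule are all hypothetical, and the interface and PlatformType are redeclared locally so the example compiles without importing the package. In a real platform package, the value would presumably be passed to Register from init().

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Local copies of the declarations from this page.
type PlatformType string

const PlatformTypeCode PlatformType = "code"

type Platform interface {
	Name() string
	Type() PlatformType
	Match(url string) bool
	AuthRequired() bool
}

// examplePlatform is a hypothetical code-hosting platform.
type examplePlatform struct{}

func (examplePlatform) Name() string       { return "example" }
func (examplePlatform) Type() PlatformType { return PlatformTypeCode }
func (examplePlatform) AuthRequired() bool { return false }

// Match accepts URLs on code.example.com whose path is a single
// non-empty segment, i.e. a user page rather than a repository page.
func (examplePlatform) Match(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	if !strings.EqualFold(u.Hostname(), "code.example.com") {
		return false
	}
	user := strings.Trim(u.Path, "/")
	return user != "" && !strings.Contains(user, "/")
}

// Compile-time check that the type satisfies the interface.
var _ Platform = examplePlatform{}

func main() {
	p := examplePlatform{}
	fmt.Println(p.Match("https://code.example.com/alice"))      // true
	fmt.Println(p.Match("https://code.example.com/alice/repo")) // false
	fmt.Println(p.Match("https://other.example.com/alice"))     // false
}
```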

func LookupPlatform added in v0.9.0

func LookupPlatform(name string) Platform

LookupPlatform returns the platform with the given name, or nil if not found.

func MatchURL added in v0.9.0

func MatchURL(url string) Platform

MatchURL returns the first platform that matches the given URL, or nil if none match. Platforms are checked in registration order, so order matters for overlapping patterns.
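The effect of registration order can be illustrated with a small stand-alone registry. This is a sketch of first-match-wins lookup in general, not the package's implementation: matcher, registry, and both patterns are hypothetical, and matchURL returns a name where the real MatchURL returns a Platform.

```go
package main

import (
	"fmt"
	"strings"
)

// matcher is a hypothetical, reduced stand-in for Platform.
type matcher struct {
	name  string
	match func(url string) bool
}

// registry keeps matchers in the order they were added.
type registry struct{ items []matcher }

func newRegistry(ms ...matcher) *registry {
	return &registry{items: ms}
}

// matchURL returns the name of the first matcher that accepts the URL,
// or "" if none does.
func (r *registry) matchURL(url string) string {
	for _, m := range r.items {
		if m.match(url) {
			return m.name
		}
	}
	return ""
}

// specific matches one host only; generic matches any "/@user" URL.
func specific() matcher {
	return matcher{name: "mastodon-social", match: func(u string) bool {
		return strings.Contains(u, "mastodon.social/@")
	}}
}

func generic() matcher {
	return matcher{name: "fediverse", match: func(u string) bool {
		return strings.Contains(u, "/@")
	}}
}

func main() {
	u := "https://mastodon.social/@alice"
	fmt.Println(newRegistry(specific(), generic()).matchURL(u)) // mastodon-social
	fmt.Println(newRegistry(generic(), specific()).matchURL(u)) // fediverse
}
```

With overlapping patterns like these, the narrower matcher only wins when it is registered before the broader one.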

func Platforms added in v0.9.0

func Platforms() []Platform

Platforms returns all registered platforms.

type PlatformType added in v0.9.0

type PlatformType string

PlatformType categorizes what kind of content a platform primarily hosts. This enables cross-platform matching bonuses (e.g., same username on GitHub and GitLab).

const (
	PlatformTypeCode       PlatformType = "code"       // Code hosting: GitHub, GitLab, Codeberg, etc.
	PlatformTypeBlog       PlatformType = "blog"       // Long-form writing: Medium, Substack, Dev.to, etc.
	PlatformTypeMicroblog  PlatformType = "microblog"  // Short posts: Twitter, Mastodon, Bluesky, etc.
	PlatformTypeVideo      PlatformType = "video"      // Video content: YouTube, TikTok, Twitch, etc.
	PlatformTypeForum      PlatformType = "forum"      // Discussion forums: Reddit, HN, Lobsters, etc.
	PlatformTypeGaming     PlatformType = "gaming"     // Gaming platforms: Steam, etc.
	PlatformTypeSocial     PlatformType = "social"     // General social: LinkedIn, Instagram, VK, etc.
	PlatformTypePackage    PlatformType = "package"    // Package registries: npm, PyPI, crates.io, etc.
	PlatformTypeSecurity   PlatformType = "security"   // Security platforms: HackerOne, Bugcrowd, etc.
	PlatformTypeScheduling PlatformType = "scheduling" // Scheduling: Cal.com, Calendly, etc.
	PlatformTypeOther      PlatformType = "other"      // Uncategorized platforms
)

Platform type constants for categorizing platforms by their primary content type.

func TypeOf added in v0.9.0

func TypeOf(name string) PlatformType

TypeOf returns the platform type for a given platform name. Returns PlatformTypeOther for unknown platforms.

type Post

type Post struct {
	Type     PostType `json:"type"`               // Type of content
	Title    string   `json:"title,omitempty"`    // Title (for videos, articles, posts)
	Content  string   `json:"content,omitempty"`  // Body text or description
	URL      string   `json:"url,omitempty"`      // Link to the original content
	Category string   `json:"category,omitempty"` // Category (subreddit, channel, topic, etc.)
	Date     string   `json:"date,omitempty"`     // Date/timestamp of the post (ISO 8601 or human-readable)
}

Post represents a piece of user-generated content (post, comment, video, etc.).

type PostType

type PostType string

PostType indicates the type of user-generated content.

const (
	PostTypeComment    PostType = "comment"
	PostTypePost       PostType = "post"
	PostTypeVideo      PostType = "video"
	PostTypeArticle    PostType = "article"
	PostTypeQuestion   PostType = "question"
	PostTypeAnswer     PostType = "answer"
	PostTypeRepository PostType = "repository"
	PostTypeEvent      PostType = "event" // Calendar events, meetups, etc.
)

Post type constants for categorizing user-generated content.

type Profile

type Profile struct {
	// Metadata
	Platform      string `json:",omitempty"` // Platform name: "linkedin", "twitter", "mastodon", etc.
	URL           string `json:",omitempty"` // Original URL fetched
	Authenticated bool   `json:",omitempty"` // Whether login cookies were used
	Error         string `json:",omitempty"` // Error message if fetch failed (e.g., "login required")

	// Core profile data
	Username    string   `json:",omitempty"` // Handle/username (without @ prefix)
	DisplayName string   `json:",omitempty"` // Person's chosen display name on the platform (not page title or error messages)
	PageTitle   string   `json:",omitempty"` // HTML page title (may contain errors or site name)
	AvatarURL   string   `json:",omitempty"` // Profile photo/avatar URL
	AvatarHash  uint64   `json:",omitempty"` // Perceptual hash of avatar for cross-platform matching
	Bio         string   `json:",omitempty"` // Profile bio/description
	Location    string   `json:",omitempty"` // Geographic location
	Website     string   `json:",omitempty"` // Personal website URL
	CreatedAt   string   `json:",omitempty"` // Account creation date (ISO timestamp)
	UpdatedAt   string   `json:",omitempty"` // Most recent activity or profile update (ISO timestamp)
	UTCOffset   *float64 `json:",omitempty"` // UTC offset in hours (e.g., -8 for PST, 5.5 for IST)

	// Account state (for renamed/deleted accounts)
	AccountState AccountState `json:",omitempty"` // Current account state (renamed, deleted, unverified); empty means active
	Aliases      []string     `json:",omitempty"` // Alternative usernames (old names, aliases) for cross-platform matching
	DatabaseID   string       `json:",omitempty"` // Platform-specific unique ID (survives renames)
	ArchivedAt   string       `json:",omitempty"` // Timestamp of archived snapshot used (if deleted)

	// Platform-specific fields
	Fields map[string]string `json:",omitempty"` // Additional platform-specific data (headline, employer, etc.)
	Badges map[string]string `json:",omitempty"` // Achievements/badges with counts (e.g., "Pair Extraordinaire": "4")
	Groups []string          `json:",omitempty"` // Organizations, teams, or groups the user belongs to (sorted)

	// For further crawling
	SocialLinks []string `json:",omitempty"` // Other social media URLs detected on the profile

	// User-generated content (posts, comments, videos, etc.)
	Posts []Post `json:",omitempty"` // Structured content extracted from the profile

	// Code repositories (pinned/popular repos from GitHub, etc.)
	Repositories []Repository `json:",omitempty"`

	// Unstructured content (README, page content, etc.)
	Content string `json:",omitempty"` // Raw HTML content (README, page body)

	// Guess mode fields (omitted from JSON when empty)
	IsGuess    bool     `json:",omitempty"` // True if this profile was discovered via guessing
	Confidence float64  `json:",omitempty"` // Confidence score 0.0-1.0 for guessed profiles
	GuessMatch []string `json:",omitempty"` // Reasons for match (e.g., "username", "name", "location")
}

Profile represents extracted data from a social media profile.
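Every Profile field is tagged omitempty, which drops zero values from the JSON output. That is presumably why UTCOffset is a *float64: a nil pointer is omitted, while a pointer to 0 (UTC itself) is still emitted. The sketch below demonstrates the difference with a hypothetical stand-in struct (offsets) rather than Profile itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// offsets is a hypothetical stand-in contrasting a plain float64 with
// the *float64 that Profile uses for UTCOffset.
type offsets struct {
	Plain   float64  `json:",omitempty"`
	Pointer *float64 `json:",omitempty"`
}

// encode marshals the struct and returns the JSON text.
func encode(o offsets) string {
	b, err := json.Marshal(o)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	utc, ist := 0.0, 5.5

	fmt.Println(encode(offsets{}))                          // {}
	fmt.Println(encode(offsets{Plain: 0, Pointer: &utc}))   // {"Pointer":0}
	fmt.Println(encode(offsets{Plain: 5.5, Pointer: &ist})) // {"Plain":5.5,"Pointer":5.5}
}
```

The same rule means that plain fields such as Authenticated and Confidence vanish from the output when they are false or 0, so an absent key and a zero value cannot be told apart for them.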

func Fetch added in v0.9.0

func Fetch(ctx context.Context, url string, cfg *FetcherConfig) (*Profile, error)

Fetch finds the matching platform and fetches the profile. Returns ErrProfileNotFound if no platform matches or the platform has no fetcher.
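The documented contract (first matching platform wins, and both "no match" and "no fetcher" map to ErrProfileNotFound) can be sketched as a small dispatch loop. This is an assumption about how such a function could be written, not the package's source: entry, dispatch, and the two example hosts are hypothetical, and the cfg parameter of the real FetchFunc is dropped for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

var ErrProfileNotFound = errors.New("profile not found")

// Profile is a trimmed local stand-in.
type Profile struct{ Platform, URL string }

type fetchFunc func(ctx context.Context, url string) (*Profile, error)

// entry pairs a hypothetical matcher with an optional fetcher.
type entry struct {
	name  string
	match func(url string) bool
	fetch fetchFunc // nil when the platform was registered without a fetcher
}

// dispatch mimics the documented contract of Fetch over a slice of entries.
func dispatch(ctx context.Context, entries []entry, url string) (*Profile, error) {
	for _, e := range entries {
		if !e.match(url) {
			continue
		}
		if e.fetch == nil {
			return nil, ErrProfileNotFound // matched, but nothing can fetch it
		}
		return e.fetch(ctx, url)
	}
	return nil, ErrProfileNotFound // no platform matched
}

// demoEntries builds one entry with a fetcher and one without.
func demoEntries() []entry {
	host := func(h string) func(string) bool {
		return func(u string) bool { return strings.Contains(u, h) }
	}
	return []entry{
		{name: "a", match: host("a.example.com"), fetch: func(_ context.Context, u string) (*Profile, error) {
			return &Profile{Platform: "a", URL: u}, nil
		}},
		{name: "b", match: host("b.example.com")},
	}
}

// run dispatches a URL against the demo entries and summarizes the result.
func run(url string) string {
	p, err := dispatch(context.Background(), demoEntries(), url)
	if err != nil {
		return err.Error()
	}
	return "fetched from " + p.Platform
}

func main() {
	fmt.Println(run("https://a.example.com/alice")) // fetched from a
	fmt.Println(run("https://b.example.com/alice")) // profile not found
	fmt.Println(run("https://c.example.com/alice")) // profile not found
}
```

One consequence of this contract is that a caller cannot distinguish an unsupported URL from a platform that lacks a fetcher by the error alone; checking MatchURL and LookupFetcher separately would make that distinction.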

type Repository added in v0.9.2

type Repository struct {
	Name        string `json:"name"`                  // Repository name
	Description string `json:"description,omitempty"` // Repository description
	URL         string `json:"url,omitempty"`         // Repository URL
	Language    string `json:"language,omitempty"`    // Primary programming language
	Stars       string `json:"stars,omitempty"`       // Star count (as string, e.g. "1.2k")
	Forks       string `json:"forks,omitempty"`       // Fork count
}

Repository represents a code repository (pinned/popular on GitHub, etc.).
