proxyinabox

Version: v0.9.20
Published: Mar 19, 2026 | License: MIT | Imports: 6 | Imported by: 3

README

Proxy-in-a-Box

Automatic proxy pool for web scraping. Crawls proxies from YAML-defined sources, validates them, and provides HTTP/HTTPS proxy servers with automatic rotation and TLS fingerprint spoofing.

Chinese README (中文说明)

Features

  • YAML-driven sources — All proxy sources defined as YAML configs with Lua scripting for complex logic
  • Headless browser scraping — Integrated Lightpanda for JS-rendered pages (e.g. IPRoyal)
  • Auto-validation — Concurrent proxy verification with configurable worker pool
  • Smart rotation — Automatic proxy assignment based on domain and IP limits
  • TLS fingerprint spoofing — Uses uTLS to mimic Chrome browser fingerprints
  • MITM support — Built-in man-in-the-middle proxy for HTTPS traffic
  • SQLite storage — Lightweight embedded database, no external dependencies

Quick Start

# docker-compose.yml
services:
  proxy-in-a-box:
    image: ghcr.io/naiba/proxy-in-a-box
    restart: unless-stopped
    volumes:
      - ./data:/app/data
    ports:
      - "8080:8080"   # HTTP proxy
      - "8081:8081"   # HTTPS proxy
      - "8083:8083"   # Dashboard + API

From Source

go install github.com/naiba/proxyinabox/cmd/proxy-in-a-box@latest
mkdir -p data/sources
# Create data/pb.yaml and data/sources/*.yaml (see below)
proxy-in-a-box

Usage

Usage:
  proxy-in-a-box [flags]
  proxy-in-a-box [command]

Available Commands:
  test-source    Test a single proxy source YAML file (fetch + verify availability)

Flags:
  -c, --conf string   config file (default "./data/pb.yaml")
  -p, --ha string     http proxy server addr (default "0.0.0.0:8080")
  -s, --sa string     https proxy server addr (default "0.0.0.0:8081")
  -m, --ma string     management/dashboard addr (default "0.0.0.0:8083")
  -h, --help          help for proxy-in-a-box

Test a Source

proxy-in-a-box test-source data/sources/my-source.yaml [-w 20]

Fetches proxies from the specified source YAML file and verifies their availability. Use -w to set concurrent verification workers (default: 20).
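
The concurrent verification described above can be pictured as a fixed-size worker pool. The following is a minimal sketch under stated assumptions, not the project's actual verifier: verifyAll and tcpReachable are hypothetical names, and the check only tests TCP reachability rather than real proxy behavior.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// verifyAll runs check on every address using a fixed number of worker
// goroutines and returns the addresses for which check returned true.
// The order of the result is not guaranteed.
func verifyAll(addrs []string, workers int, check func(string) bool) []string {
	if workers < 1 {
		workers = 1 // avoid blocking forever when no worker would read jobs
	}
	jobs := make(chan string)
	var (
		mu    sync.Mutex
		alive []string
		wg    sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for addr := range jobs {
				if check(addr) {
					mu.Lock()
					alive = append(alive, addr)
					mu.Unlock()
				}
			}
		}()
	}
	for _, addr := range addrs {
		jobs <- addr
	}
	close(jobs)
	wg.Wait()
	return alive
}

// tcpReachable is a stand-in check (an assumption for this sketch): it only
// tests whether the address accepts a TCP connection within three seconds.
func tcpReachable(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, 3*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	// 20 mirrors the documented default for -w.
	alive := verifyAll([]string{"127.0.0.1:8080"}, 20, tcpReachable)
	fmt.Println(len(alive), "address(es) reachable")
}
```

A larger worker count shortens a verification pass at the cost of more simultaneous outbound connections.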

Configure your application to use the proxy:

HTTP Proxy:  http://127.0.0.1:8080
HTTPS Proxy: https://127.0.0.1:8081
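
For a Go application, pointing the standard library HTTP client at the proxy could look like the sketch below. It assumes the default HTTP listener address shown above; newProxyClient is a hypothetical helper name, and the target URL is the one used in the Benchmark section.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// newProxyClient returns an http.Client that sends every request through
// the proxy at proxyAddr.
func newProxyClient(proxyAddr string) (*http.Client, error) {
	proxyURL, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, fmt.Errorf("parse proxy address: %w", err)
	}
	return &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
		Timeout:   30 * time.Second,
	}, nil
}

func main() {
	client, err := newProxyClient("http://127.0.0.1:8080")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	resp, err := client.Get("http://api.ip.la/cn")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```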

Management Dashboard & API:

GET /             — Web dashboard (pool overview, proxy list, source status)
GET /stat         — Pool statistics (plain text)
GET /get          — Get one available proxy
GET /api/stats    — Pool statistics (JSON: totals, by protocol/source, blocked IPs, request stats)
GET /api/proxies  — Full proxy list (JSON)
GET /api/sources  — Source fetch statuses (JSON)
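
As an illustration of calling the management API from Go, the sketch below requests one proxy from GET /get. The README does not document the response format, so the code assumes the proxy is returned as text in the response body; getProxy is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// getProxy asks the management API at baseURL for one available proxy.
// Assumption: the endpoint answers 200 with the proxy as plain text.
func getProxy(baseURL string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(strings.TrimRight(baseURL, "/") + "/get")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	// Cap the read so an unexpected response cannot exhaust memory.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 4096))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(body)), nil
}

func main() {
	// 8083 is the default management port from the flags above.
	proxy, err := getProxy("http://127.0.0.1:8083")
	if err != nil {
		fmt.Println("no proxy:", err)
		return
	}
	fmt.Println("proxy:", proxy)
}
```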

Configuration

data/pb.yaml:

debug: true

sys:
  name: MyProxy
  proxy_verify_worker: 20    # concurrent verification workers

# HTTPS MITM decryption (default: false)
# When enabled, the proxy decrypts HTTPS traffic using a self-signed CA — clients must disable TLS verification or trust the CA.
# When disabled (default), HTTPS CONNECT requests are tunneled as-is — clients use standard TLS verification.
enable_mitm: false

# Headless browser for JS-rendered pages (optional)
# Requires lightpanda binary — included in Docker image
lightpanda:
  bin: lightpanda             # binary path (leave empty to disable)
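
When enable_mitm is true, a Go client can trust the proxy's CA instead of disabling TLS verification. The sketch below is one way to do that with the standard library. The README does not say where the CA certificate is stored, so the path data/ca.pem is purely hypothetical, as are the helper name clientTrustingCA and the choice of proxy address.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"os"
)

// clientTrustingCA builds a client that sends traffic through proxyAddr and
// trusts the certificates in caPEM in addition to the system roots.
func clientTrustingCA(proxyAddr string, caPEM []byte) (*http.Client, error) {
	proxyURL, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, err
	}
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		pool = x509.NewCertPool() // fall back to an empty pool
	}
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid certificate found in CA PEM data")
	}
	return &http.Client{Transport: &http.Transport{
		Proxy:           http.ProxyURL(proxyURL),
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}}, nil
}

func main() {
	// Hypothetical path: replace with wherever the CA certificate actually is.
	caPEM, err := os.ReadFile("data/ca.pem")
	if err != nil {
		fmt.Println("read CA:", err)
		return
	}
	client, err := clientTrustingCA("http://127.0.0.1:8080", caPEM)
	if err != nil {
		fmt.Println("build client:", err)
		return
	}
	resp, err := client.Get("https://example.com/")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

Trusting a specific CA keeps certificate checks active for everything else, which is usually preferable to setting InsecureSkipVerify.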

Proxy Sources

Sources are YAML files in data/sources/. Three types are supported:

text: plain-text IP:Port lists

name: thespeedx-http
type: text
url: "https://raw.githubusercontent.com/TheSpeedX/PROXY-List/master/http.txt"
protocol: http
interval: 5m
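
To make the text format concrete, here is a rough sketch of how a body of IP:Port lines might be turned into endpoints. It is not the project's parser: the endpoint type and parseTextList are hypothetical, and the choice to skip malformed lines silently is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// endpoint is a hypothetical minimal record for one parsed line.
type endpoint struct {
	IP   string
	Port string
}

// parseTextList extracts IP:Port pairs from a plain-text body, one per line.
// Blank lines, non-IP hosts and out-of-range ports are skipped.
func parseTextList(body string) []endpoint {
	var out []endpoint
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		host, port, err := net.SplitHostPort(line)
		if err != nil || net.ParseIP(host) == nil {
			continue
		}
		if n, err := strconv.Atoi(port); err != nil || n < 1 || n > 65535 {
			continue
		}
		out = append(out, endpoint{IP: host, Port: port})
	}
	return out
}

func main() {
	body := "1.2.3.4:8080\nnot-a-proxy\n9.9.9.9:3128\n"
	for _, e := range parseTextList(body) {
		fmt.Println(e.IP, e.Port)
	}
}
```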

json: JSON API with field paths

name: proxyscrape
type: json
url: "https://api.proxyscrape.com/v3/free-proxy-list/get?request=displayproxies&format=json"
ip_field: "proxies.*.ip"
port_field: "proxies.*.port"
protocol_field: "proxies.*.protocol"
interval: 5m
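
One plausible reading of a field path such as proxies.*.ip is: descend into the proxies key, iterate over every array element, and read ip from each. The sketch below implements that interpretation with encoding/json. The path semantics are an assumption inferred from the example above, and extractField and resolve are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// resolve walks node along path. A "*" segment fans out over array elements;
// any other segment is looked up as an object key.
func resolve(node any, path []string) []any {
	if len(path) == 0 {
		return []any{node}
	}
	switch v := node.(type) {
	case map[string]any:
		child, ok := v[path[0]]
		if !ok {
			return nil
		}
		return resolve(child, path[1:])
	case []any:
		if path[0] != "*" {
			return nil
		}
		var out []any
		for _, item := range v {
			out = append(out, resolve(item, path[1:])...)
		}
		return out
	}
	return nil
}

// extractField returns every value matched by fieldPath as a string.
// Elements that lack the field are skipped, so callers that zip several
// fields together should check that the result lengths agree.
func extractField(raw []byte, fieldPath string) ([]string, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber() // keep numeric ports exact instead of converting to float64
	var doc any
	if err := dec.Decode(&doc); err != nil {
		return nil, err
	}
	var out []string
	for _, v := range resolve(doc, strings.Split(fieldPath, ".")) {
		out = append(out, fmt.Sprint(v))
	}
	return out, nil
}

func main() {
	raw := []byte(`{"proxies":[{"ip":"1.2.3.4","port":8080}]}`)
	ips, _ := extractField(raw, "proxies.*.ip")
	ports, _ := extractField(raw, "proxies.*.port")
	fmt.Println(ips, ports)
}
```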

script: Lua scripts for complex logic

Lua globals: fetch(url, headers?), sleep(ms), json_decode(str), json_encode(table), browser_fetch(url), browser_eval(expression)

name: kuaidaili
type: script
interval: 10m
script: |
  local proxies = {}
  for page = 1, 5 do
    sleep(3000)
    local body = fetch("https://www.kuaidaili.com/free/inha/" .. page)
    if body then
      local match = string.match(body, "fpsList = (.-);%s*\n")
      if match then
        local list = json_decode(match)
        if list then
          for _, item in ipairs(list) do
            proxies[#proxies+1] = {ip = item.ip, port = item.port, protocol = "http"}
          end
        end
      end
    end
  end
  return proxies

Browser-powered scraping (for JS-rendered pages)

Requires the lightpanda section of data/pb.yaml to be configured. browser_fetch(url) navigates the headless browser to the URL and returns the rendered HTML. browser_eval(expression) executes JavaScript on the loaded page.

name: iproyal
type: script
interval: 30m
script: |
  local proxies = {}
  local html = browser_fetch("https://iproyal.com/free-proxy-list/")
  if not html then return proxies end
  local raw = browser_eval([[(function(){
    var rows = document.querySelectorAll('div.grid.min-w-\\[600px\\]');
    var r = [];
    for (var i = 0; i < rows.length; i++) {
      var ch = rows[i].children;
      if (ch.length >= 3) {
        var ip = ch[0].textContent.trim();
        if (/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(ip))
          r.push({ip: ip, port: ch[1].textContent.trim(), protocol: ch[2].textContent.trim().toLowerCase()});
      }
    }
    return JSON.stringify(r);
  })()]])
  if raw then
    local data = json_decode(raw)
    if data then
      for _, item in ipairs(data) do
        proxies[#proxies+1] = {ip = item.ip, port = item.port, protocol = item.protocol}
      end
    end
  end
  return proxies

Architecture

                    ┌─────────────────────────────────────────┐
                    │           Proxy-in-a-Box                │
                    ├─────────────────────────────────────────┤
 Your App ────────► │  HTTP Proxy :8080 / HTTPS Proxy :8081  │
                    ├─────────────────────────────────────────┤
                    │              Proxy Pool                 │
                    │   ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐      │
                    │   │ IP1 │ │ IP2 │ │ IP3 │ │ ... │      │
                    │   └─────┘ └─────┘ └─────┘ └─────┘      │
                    ├─────────────────────────────────────────┤
                    │  YAML Sources   │ Validators            │
                    │  text/json/lua  │ (concurrent workers)  │
                    ├─────────────────────────────────────────┤
                    │       Lightpanda (headless browser)      │
                    └─────────────────────────────────────────┘
                                     │
                                     ▼
                              ┌─────────────┐
                              │   SQLite    │
                              └─────────────┘

Benchmark

ab -v4 -n100 -c10 -X 127.0.0.1:8080 http://api.ip.la/cn

Tech Stack

  • Language: Go 1.25
  • Database: SQLite (via glebarez/sqlite + GORM)
  • Scripting: gopher-lua (Lua 5.1 VM)
  • Browser: Lightpanda
  • TLS: uTLS for fingerprint spoofing
  • HTTP: Standard library + custom MITM proxy

License

MIT

Documentation

Index

Constants

This section is empty.

Variables

var DB *gorm.DB

DB is the shared GORM database instance.

var DataDir string

Functions

func Init

func Init(configFilePath string)

Init initializes the system using the config file at configFilePath.

Types

type BlockedIP added in v0.8.0

type BlockedIP struct {
	IP                  string    `gorm:"type:varchar(15);primaryKey"`
	ConsecutiveFailures int       `gorm:"default:0"`
	LockedUntil         time.Time `gorm:"index"`
}

type Cache

type Cache interface {
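	// --- Proxy pool reads ---
	RandomProxy() (string, bool)
	GetProxy() (string, bool)
	ProxyLength() int
	PickProxy(req *http.Request) (string, error)
	HasProxy(p string) bool
	GetAllProxies() []Proxy

	// UpsertProxy is called when a new proxy passes verification for the first time.
	// It atomically clears any old blocked_ips record, inserts or updates the DB row,
	// and replaces the in-memory entry.
	UpsertProxy(p Proxy) error

	// MarkVerifySuccess is called when an existing proxy passes periodic verification.
	// It atomically clears blocked_ips, updates delay/last_verify in the DB, and syncs memory.
	MarkVerifySuccess(p Proxy, delay int64, verifyTime time.Time)

	// MarkVerifyFailed is called when an existing proxy fails periodic verification
	// without reaching the lock threshold.
	// It atomically removes the proxy from memory and updates last_verify in the DB
	// so the proxy is not picked again and again.
	MarkVerifyFailed(p Proxy)

	// RecordFailure is called when a proxied request or a verification fails,
	// and increments the failure count.
	// Once the threshold is reached, it locks the IP and deletes all proxies
	// of that IP from the DB and from memory.
	// It returns true if a lock was triggered.
	RecordFailure(ip string) bool

	// IsIPLocked reports whether the IP is currently within its lock period.
	IsIPLocked(ip string) bool

	// LoadLockedIPs loads lock state from the DB into memory at startup.
	LoadLockedIPs()

	// CleanupStaleProxies deletes stale proxies whose last_verify is older than
	// threshold and removes them from memory as well.
	CleanupStaleProxies(threshold time.Duration)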
}

var CI Cache

CI is the package-level Cache instance.
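
The failure-counting contract of RecordFailure and IsIPLocked can be illustrated with a small in-memory model. This is only a sketch of the documented behavior, not the package's implementation: failureTracker, the threshold value, and the lock duration are all hypothetical, and the real Cache also updates the DB and removes the IP's proxies.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// failureTracker is a hypothetical in-memory model of the lock behavior.
type failureTracker struct {
	mu        sync.Mutex
	threshold int                  // failures needed to trigger a lock (assumed)
	lockFor   time.Duration        // how long a lock lasts (assumed)
	failures  map[string]int       // IP -> accumulated failures
	locked    map[string]time.Time // IP -> locked until
}

func newFailureTracker(threshold int, lockFor time.Duration) *failureTracker {
	return &failureTracker{
		threshold: threshold,
		lockFor:   lockFor,
		failures:  make(map[string]int),
		locked:    make(map[string]time.Time),
	}
}

// RecordFailure counts one failure for ip and reports whether this call
// triggered a lock.
func (t *failureTracker) RecordFailure(ip string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.failures[ip]++
	if t.failures[ip] < t.threshold {
		return false
	}
	delete(t.failures, ip)
	t.locked[ip] = time.Now().Add(t.lockFor)
	return true
}

// IsIPLocked reports whether ip is still inside its lock period.
func (t *failureTracker) IsIPLocked(ip string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	until, ok := t.locked[ip]
	if !ok {
		return false
	}
	if time.Now().After(until) {
		delete(t.locked, ip) // lock expired
		return false
	}
	return true
}

func main() {
	tr := newFailureTracker(3, 10*time.Minute)
	for i := 1; i <= 3; i++ {
		fmt.Printf("failure %d locked=%v\n", i, tr.RecordFailure("1.2.3.4"))
	}
	fmt.Println("is locked:", tr.IsIPLocked("1.2.3.4"))
}
```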

type Conf

type Conf struct {
	Debug bool
	Redis struct {
		Host string
		Port string
		Pass string
		Db   int
	}
	Sys struct {
		Name              string
		ProxyVerifyWorker int `mapstructure:"proxy_verify_worker"`
	}
	Lightpanda struct {
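		// Path to the lightpanda binary; leave empty to disable browser scraping.
		Bin string
	}
	// EnableMITM enables HTTPS man-in-the-middle decryption (default false).
	// When disabled, HTTPS is passed through as a plain TCP tunnel and clients
	// do not need to turn off TLS verification.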
	EnableMITM bool `mapstructure:"enable_mitm"`
}

Conf is the configuration struct.

var Config Conf

Config holds the system configuration.

type Proxy

type Proxy struct {
	gorm.Model
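	// BUG-FIX: GORM v2 does not recognize unique_index (v1 syntax), so the unique
	// constraint on IP never took effect and proxies with the same IP but a different
	// port/protocol could be inserted into the DB repeatedly. Use a composite
	// uniqueIndex instead, with (IP, Port, Protocol) as the unit of uniqueness, so
	// that the same IP can legitimately coexist on different ports.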
	IP         string `gorm:"type:varchar(15);uniqueIndex:idx_proxy_endpoint"`
	Port       string `gorm:"type:varchar(5);uniqueIndex:idx_proxy_endpoint"`
	Country    string `gorm:"type:varchar(15)"`
	Provence   string `gorm:"type:varchar(15)"`
	Source     string
	Protocol   string `gorm:"uniqueIndex:idx_proxy_endpoint"`
	Delay      int64
	LastVerify time.Time
}

Proxy is the proxy model.

func (*Proxy) String

func (p *Proxy) String() string

func (*Proxy) URI

func (p *Proxy) URI() string
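
The page does not document what URI returns. As a guess at a typical shape, the sketch below builds protocol://ip:port from the same three fields that form the unique index. The endpoint type here is a hypothetical stand-in, not the package's Proxy, and the output format is an assumption.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// endpoint is a hypothetical stand-in carrying the three fields that
// identify a proxy: IP, Port and Protocol.
type endpoint struct {
	IP       string
	Port     string
	Protocol string
}

// URI formats the endpoint as scheme://host:port.
// Assumption: this mirrors what a proxy URI usually looks like; the real
// Proxy.URI may differ.
func (e endpoint) URI() string {
	u := url.URL{Scheme: e.Protocol, Host: net.JoinHostPort(e.IP, e.Port)}
	return u.String()
}

func main() {
	e := endpoint{IP: "1.2.3.4", Port: "8080", Protocol: "http"}
	fmt.Println(e.URI())
}
```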

type ProxyService

type ProxyService interface {
	GetUnVerified() ([]Proxy, error)
}

ProxyService is the proxy service interface.

Directories

Path: cmd/proxy-in-a-box (command)
