Automatic proxy pool for web scraping. Crawls proxies from YAML-defined sources, validates them, and provides HTTP/HTTPS proxy servers with automatic rotation, rate limiting, and TLS fingerprint spoofing.
- YAML-driven sources — All proxy sources defined as YAML configs with Lua scripting for complex logic
- Headless browser scraping — Integrated Lightpanda for JS-rendered pages (e.g. IPRoyal)
- Auto-validation — Concurrent proxy verification with configurable worker pool
- Smart rotation — Automatic proxy assignment based on domain and IP limits
- Rate limiting — Configurable requests per IP and domains per IP
- TLS fingerprint spoofing — Uses uTLS to mimic Chrome browser fingerprints
- MITM support — Built-in man-in-the-middle proxy for HTTPS traffic
- SQLite storage — Lightweight embedded database, no external dependencies
# docker-compose.yml
services:
proxy-in-a-box:
image: ghcr.io/naiba/proxy-in-a-box
restart: unless-stopped
volumes:
- ./data:/app/data
ports:
- "8080:8080" # HTTP proxy
- "8081:8081" # HTTPS proxy
- "8083:8083" # Dashboard + APIgo install github.com/naiba/proxyinabox/cmd/proxy-in-a-box@latest
mkdir -p data/sources
# Create data/pb.yaml and data/sources/*.yaml (see below)
proxy-in-a-boxUsage:
proxy-in-a-box [flags]
proxy-in-a-box [command]
Available Commands:
test-source Test a single proxy source YAML file (fetch + verify availability)
Flags:
-c, --conf string config file (default "./data/pb.yaml")
-p, --ha string http proxy server addr (default "0.0.0.0:8080")
-s, --sa string https proxy server addr (default "0.0.0.0:8081")
-m, --ma string management/dashboard addr (default "0.0.0.0:8083")
-h, --help help for proxy-in-a-box
proxy-in-a-box test-source data/sources/my-source.yaml [-w 20]Fetches proxies from the specified source YAML file and verifies their availability. Use -w to set concurrent verification workers (default: 20).
Configure your application to use the proxy:
HTTP Proxy: http://127.0.0.1:8080
HTTPS Proxy: https://127.0.0.1:8081
Management Dashboard & API:
GET / — Web dashboard (pool overview, proxy list, source status)
GET /stat — Pool statistics (plain text)
GET /get — Get one available proxy
GET /api/stats — Pool statistics (JSON: totals, by protocol/source, blocked IPs, request stats)
GET /api/proxies — Full proxy list (JSON)
GET /api/sources — Source fetch statuses (JSON)
data/pb.yaml:
debug: true
sys:
name: MyProxy
proxy_verify_worker: 20 # concurrent verification workers
domains_per_ip: 30 # max domains per IP in 30 minutes
request_limit_per_ip: 10 # max requests per IP per second
# HTTPS MITM decryption (default: false)
# When enabled, the proxy decrypts HTTPS traffic using a self-signed CA — clients must disable TLS verification or trust the CA.
# When disabled (default), HTTPS CONNECT requests are tunneled as-is — clients use standard TLS verification.
enable_mitm: false
# Headless browser for JS-rendered pages (optional)
# Requires lightpanda binary — included in Docker image
lightpanda:
bin: lightpanda # binary path (leave empty to disable)Sources are YAML files in data/sources/. Three types supported:
name: thespeedx-http
type: text
url: "https://raw.githubusercontent.com/TheSpeedX/PROXY-List/master/http.txt"
protocol: http
interval: 5mname: proxyscrape
type: json
url: "https://api.proxyscrape.com/v3/free-proxy-list/get?request=displayproxies&format=json"
ip_field: "proxies.*.ip"
port_field: "proxies.*.port"
protocol_field: "proxies.*.protocol"
interval: 5mLua globals: fetch(url, headers?), sleep(ms), json_decode(str), json_encode(table), browser_fetch(url), browser_eval(expression)
name: kuaidaili
type: script
interval: 10m
script: |
local proxies = {}
for page = 1, 5 do
sleep(3000)
local body = fetch("https://www.kuaidaili.com/free/inha/" .. page)
if body then
local match = string.match(body, "fpsList = (.-);%s*\n")
if match then
local list = json_decode(match)
if list then
for _, item in ipairs(list) do
proxies[#proxies+1] = {ip = item.ip, port = item.port, protocol = "http"}
end
end
end
end
end
return proxiesRequires lightpanda config. browser_fetch(url) navigates the headless browser and returns rendered HTML. browser_eval(expression) executes JavaScript on the loaded page.
name: iproyal
type: script
interval: 30m
script: |
local proxies = {}
local html = browser_fetch("https://iproyal.com/free-proxy-list/")
if not html then return proxies end
local raw = browser_eval([[(function(){
var rows = document.querySelectorAll('div.grid.min-w-\\[600px\\]');
var r = [];
for (var i = 0; i < rows.length; i++) {
var ch = rows[i].children;
if (ch.length >= 3) {
var ip = ch[0].textContent.trim();
if (/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(ip))
r.push({ip: ip, port: ch[1].textContent.trim(), protocol: ch[2].textContent.trim().toLowerCase()});
}
}
return JSON.stringify(r);
})()]])
if raw then
local data = json_decode(raw)
if data then
for _, item in ipairs(data) do
proxies[#proxies+1] = {ip = item.ip, port = item.port, protocol = item.protocol}
end
end
end
return proxies ┌─────────────────────────────────────────┐
│ Proxy-in-a-Box │
├─────────────────────────────────────────┤
Your App ────────► │ HTTP Proxy :8080 / HTTPS Proxy :8081 │
├─────────────────────────────────────────┤
│ Proxy Pool │
│ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │ IP1 │ │ IP2 │ │ IP3 │ │ ... │ │
│ └─────┘ └─────┘ └─────┘ └─────┘ │
├─────────────────────────────────────────┤
│ YAML Sources │ Validators │
│ text/json/lua │ (concurrent workers) │
├─────────────────────────────────────────┤
│ Lightpanda (headless browser) │
└─────────────────────────────────────────┘
│
▼
┌─────────────┐
│ SQLite │
└─────────────┘
ab -v4 -n100 -c10 -X 127.0.0.1:8080 http://api.ip.la/cn- Language: Go 1.25
- Database: SQLite (via
glebarez/sqlite+ GORM) - Scripting: gopher-lua (Lua 5.1 VM)
- Browser: Lightpanda
- TLS: uTLS for fingerprint spoofing
- HTTP: Standard library + custom MITM proxy
MIT