feat(dns): DNS-over-HTTPS resolver for mobile networks#151

Open
samosvalishe wants to merge 8 commits into cacggghp:main from samosvalishe:feat/doh

@samosvalishe
Contributor

On mobile data with some carriers, the client fails even before the
TURN stage, while obtaining VK credentials (#144):

  • Login/authentication against login.vk.ru, api.vk.ru, and
    calls.okcdn.ru fails.
  • In some cases the connection goes to the "wrong" IP and gets an RST
    during the TLS handshake.
  • On the same phone over Wi-Fi everything works normally.

Root cause

Before this patch, names were resolved in only one way: UDP/53 to a
set of public resolvers (Yandex/Google/Cloudflare) hardcoded in
getCustomNetDialer():

dnsServers := []string{"77.88.8.8:53", "77.88.8.1:53",
    "8.8.8.8:53", "8.8.4.4:53", "1.1.1.1:53", "1.0.0.1:53"}

On mobile networks this path fails for two reasons:

  1. UDP/53 is dropped or never returns answers. The connect to :53
    appears to succeed (UDP is connectionless, so SendTo does not fail),
    but no actual response arrives.
  2. DNS spoofing by the carrier. Even when the packet is addressed to a
    public resolver, the carrier transparently intercepts UDP/53 and
    substitutes its own answer. Changing the DNS server in the Android
    settings does not help, because the interception happens at the network level.

TCP/53 is blocked even more aggressively, so it is not usable as a fallback.
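
The silent failure described in item 1 can be reproduced with a probe along these lines. This is a minimal sketch, not code from the patch: `buildQuery` and `probeUDP` are hypothetical helper names, and the query name `example.com` is an arbitrary choice. The dial and the write succeed regardless of the network; only the read under a deadline reveals whether UDP/53 actually works.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
	"strings"
	"time"
)

// buildQuery encodes a minimal single-question DNS query (RFC 1035 wire format).
// Hypothetical helper for illustration; not part of the patch.
func buildQuery(id uint16, name string, qtype uint16) ([]byte, error) {
	msg := make([]byte, 12, 64)
	binary.BigEndian.PutUint16(msg[0:], id)     // transaction ID
	binary.BigEndian.PutUint16(msg[2:], 0x0100) // flags: recursion desired
	binary.BigEndian.PutUint16(msg[4:], 1)      // QDCOUNT = 1
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		if len(label) == 0 || len(label) > 63 {
			return nil, errors.New("invalid label in " + name)
		}
		msg = append(msg, byte(len(label)))
		msg = append(msg, label...)
	}
	msg = append(msg, 0)                             // root label ends the name
	msg = append(msg, byte(qtype>>8), byte(qtype))   // QTYPE
	msg = append(msg, 0, 1)                          // QCLASS = IN
	return msg, nil
}

// probeUDP sends one A query and waits for a matching reply under a deadline.
func probeUDP(server string, timeout time.Duration) error {
	conn, err := net.Dial("udp", server) // succeeds even when the path is blocked
	if err != nil {
		return err
	}
	defer conn.Close()
	const id = 0x1234
	query, err := buildQuery(id, "example.com", 1) // 1 = type A
	if err != nil {
		return err
	}
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	if _, err := conn.Write(query); err != nil {
		return err
	}
	buf := make([]byte, 512)
	n, err := conn.Read(buf) // a timeout here is the "UDP/53 is dead" signal
	if err != nil {
		return err
	}
	if n < 12 || binary.BigEndian.Uint16(buf) != id {
		return errors.New("malformed or mismatched DNS reply")
	}
	return nil
}

func main() {
	if err := probeUDP("77.88.8.8:53", 1500*time.Millisecond); err != nil {
		fmt.Println("UDP/53 unusable:", err)
		return
	}
	fmt.Println("UDP/53 answered")
}
```

Note that a probe like this detects only dropped traffic. A spoofed answer (item 2) still arrives in time and passes the ID check, so it cannot be told apart from a genuine one at this level.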


Solution: DNS-over-HTTPS (RFC 8484) with automatic switchover

What was done

  1. New module client/doh.go, a DoH resolver:

    • POST application/dns-message to preselected endpoints.
    • A bootstrap IP for each endpoint, so that the DoH transport itself
      does not depend on the system DNS.
    • Parallel A + AAAA lookups; IPv4 is sorted first (better for
      IPv4-only CGNAT).
    • TTL cache clamped to [10s, 1h].
    • TLS 1.2+, embedded Mozilla CA roots
      (golang.org/x/crypto/x509roots/fallback), needed for
      Android builds with CGO_ENABLED=0.
  2. A local UDP/TCP forwarder on 127.0.0.1: the Go resolver
    connects to it as to an ordinary DNS server; the forwarder wraps the
    incoming wire-format query in DoH and returns the answer.
    All Go-resolver edge cases (RESINFO, EDNS, TCP length prefix,
    retries) are handled as usual.

  3. A single entry point, appDialer() in main.go: all of the project's
    network clients now resolve names the same way:

    • tls-client for VK auth;
    • http.Transport for the Telemost conference;
    • websocket.Dialer for Telemost WSS;
    • http.Transport for the manual captcha proxy.
  4. The -dns=udp|doh|auto flag (default auto).
    In auto mode, one real DNS round trip over UDP/53 is made at startup
    under a 1.5 s deadline. If no answer arrives, the process switches
    to DoH for its entire lifetime.

  5. The endpoint list is deliberately in this order:

    common.dot.dns.yandex.net  →  Yandex
    secure.dot.dns.yandex.net  →  Yandex
    family.dot.dns.yandex.net  →  Yandex
    dns.google                 →  Google
    cloudflare-dns.com         →  Cloudflare

    Yandex goes first: it stays reachable from mobile carriers more
    reliably than Google/CF.

  6. Removed the bschaatsbergen/dnsdialer dependency and its
    transitive dependencies (hashicorp/golang-lru/v2, google.golang.org/grpc,
    google.golang.org/genproto/...): the DoH resolver fully
    covers the functionality we used.
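
To make the DoH pieces from item 1 concrete, here is a minimal sketch of three of them: the RFC 8484 POST request, a transport pinned to a bootstrap IP, and the TTL clamp. It is an illustration under assumptions, not the contents of client/doh.go: the names `newDoHRequest`, `bootstrapTransport`, and `clampTTL` are hypothetical, and the endpoint URL and bootstrap address in `main` are Google's public DoH service, chosen only as an example.

```go
package main

import (
	"bytes"
	"context"
	"crypto/tls"
	"fmt"
	"net"
	"net/http"
	"time"
)

// clampTTL bounds a record TTL to the [10s, 1h] window named in the PR.
func clampTTL(ttl time.Duration) time.Duration {
	const minTTL, maxTTL = 10 * time.Second, time.Hour
	if ttl < minTTL {
		return minTTL
	}
	if ttl > maxTTL {
		return maxTTL
	}
	return ttl
}

// newDoHRequest wraps an already-encoded DNS query in an RFC 8484 POST.
func newDoHRequest(endpoint string, wireQuery []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(wireQuery))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/dns-message")
	req.Header.Set("Accept", "application/dns-message")
	return req, nil
}

// bootstrapTransport always dials the pinned IP, so reaching the DoH host
// never involves the system resolver. TLS still verifies the certificate
// against the hostname taken from the request URL.
func bootstrapTransport(bootstrapIP string) *http.Transport {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	return &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			_, port, err := net.SplitHostPort(addr)
			if err != nil {
				return nil, err
			}
			return dialer.DialContext(ctx, network, net.JoinHostPort(bootstrapIP, port))
		},
		TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12},
	}
}

func main() {
	// A 12-byte header with QDCOUNT=0 stands in for a real encoded query.
	placeholder := make([]byte, 12)
	req, err := newDoHRequest("https://dns.google/dns-query", placeholder)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	client := &http.Client{Transport: bootstrapTransport("8.8.8.8"), Timeout: 5 * time.Second}
	_ = client // client.Do(req) would send the query over HTTPS
	fmt.Println(req.Method, req.URL.Host, clampTTL(3*time.Second))
}
```

The sketch omits the embedded CA roots, the parallel A/AAAA lookups, and the cache itself; it shows only how the transport avoids a chicken-and-egg dependency on the DNS path that is being replaced.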

@samosvalishe
Contributor Author

You can test it here: https://github.com/samosvalishe/turn-proxy-android/releases/tag/v1.8.0

@samosvalishe samosvalishe marked this pull request as draft April 18, 2026 22:26
@samosvalishe
Contributor Author

…nostics

- Pick turn URL by streamID % len(urls) instead of always urlsRaw[0]
- Add countingConn to track bytes written/read for TCP TURN connections
- Add classifyNetErr helper for structured error categorization
- Log TCP dial failures always; verbose logs gated behind isDebug
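
A byte-counting wrapper of the kind the commit calls countingConn could look roughly like this. It is a sketch under assumptions: only the type name comes from the commit message, while the field names and the `pipeDemo` helper are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"sync/atomic"
)

// countingConn wraps a net.Conn and tallies bytes in both directions.
// Field names are illustrative; the real type may differ.
type countingConn struct {
	net.Conn
	read    atomic.Int64
	written atomic.Int64
}

// Read forwards to the wrapped connection and records the bytes received.
func (c *countingConn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	c.read.Add(int64(n))
	return n, err
}

// Write forwards to the wrapped connection and records the bytes sent.
func (c *countingConn) Write(p []byte) (int, error) {
	n, err := c.Conn.Write(p)
	c.written.Add(int64(n))
	return n, err
}

// pipeDemo pushes five bytes through an in-memory pipe and reports
// what each side counted.
func pipeDemo() (written, read int64) {
	a, b := net.Pipe()
	defer a.Close()
	defer b.Close()
	left, right := &countingConn{Conn: a}, &countingConn{Conn: b}
	done := make(chan struct{})
	go func() {
		defer close(done)
		buf := make([]byte, 16)
		right.Read(buf)
	}()
	left.Write([]byte("hello"))
	<-done
	return left.written.Load(), right.read.Load()
}

func main() {
	w, r := pipeDemo()
	fmt.Printf("written=%d read=%d\n", w, r)
}
```

Counting on the n returned by the underlying call, rather than on len(p), keeps the totals accurate for short reads and partial writes.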
@samosvalishe samosvalishe marked this pull request as ready for review April 22, 2026 14:07
…oxy)

Brings the captcha-solver improvements from main into feat/doh while keeping
the flat client/ layout (no internal/* refactor pulled in).

- Persistent SavedProfile (UA + Sec-CH-UA + device JSON + browser_fp) captured
  during manual solve and replayed by auto/slider so VK sees a consistent
  fingerprint across runs. Stored under $VK_PROFILE_PATH | UserCacheDir |
  TempDir | CWD.
- callCaptchaNotRobot: per-session adFp, sha256 debug_info, jittered
  connectionRtt/connectionDownlink, cursor "[]" on first check, headers
  switched to Origin api.vk.ru / Referer not_robot_captcha.
- Slider session: per-session adFp + debugInfo, savedProfile injection,
  ApplyBrowserProfileFhttp + same captcha headers on every request,
  getContent fallback with/without captcha_settings, second componentDone
  before getContent (matches real widget lifecycle).
- Manual proxy: strip WebView identity headers (X-Requested-With and friends),
  server-side rewrite of src/href/action attributes (skipping <script>/<style>
  spans), inject helper script at <head> opening, sendBeacon + form fallback
  for token delivery on mobile WebView, /generic_proxy SSRF allowlist +
  scheme check + security-header strip + server-side success_token extract,
  loggingTransport that captures the real browser fingerprint and persists
  it as SavedProfile, best-effort 3s Shutdown, Windows rundll32 launcher,
  PII redaction in logs.
- solvePoW returns an error instead of an empty string.
- Manual captcha timeout bumped 60s -> 3m on context.Background so a human
  has time to solve regardless of the auth-level deadline; non-empty
  token/key from the manual goroutine is treated as success even if the
  server cleanup returned an error.
Two related changes ported from main, adapted to the flat client/ layout:

1. Identity caching + per-slot TURN creds (vkauth)
   - Split the monolithic getTokenChain into:
     * acquireVkIdentity — captcha-gated steps 1-3 (anonym_token,
       getCallPreview, getAnonymousToken). Cached per (link, client_id)
       for identityLifetime=8m, globally serialised via vkRequestMu +
       3-6s cooldown.
     * acquireVkTurnSlot — lightweight steps 4-5 (auth.anonymLogin
       with fresh device_id, vchat.joinConversationByLink). Each call
       returns a distinct (username, password) pair, so multiple
       streams under the same identity each get their own VK-side
       slot — bypasses per-username throttling without re-solving
       captcha.
   - vkCredentialsList trimmed from 5 to 2: VKVIDEO_* and VK_ID_AUTH_APP
     started returning error_code:3 "Unknown method" on
     calls.getAnonymousToken (observed 2026-04-28) and only burned
     throttle budget if kept in rotation.
   - streamsPerCache 10 -> 1: each stream now caches its own slot
     creds because slots are unique per call.
   - Credential rotation starts at streamID%n offset so concurrent
     streams spread across the credential list instead of all hitting
     the same client_id first.
   - identityStore + identityEntry give per-(link, client_id)
     serialisation: only one stream solves captcha per identity.
   - turn_server.urls picking is transport-aware (prefers urls whose
     ?transport= matches udpMode, falls back to the full list when
     nothing matches to preserve -port override) and round-robins
     within an identity via urlCounter — streamID%len(pool) collapses
     every stream of an identity onto the same parity.

2. Multiple TURN allocations per stream (oneTurnConnection)
   - New -allocs-per-stream flag (default 1).
   - dialTurn extracted as a helper that returns a turnAllocation
     (dialConn, turn.Client, relay PacketConn).
   - relayPool wraps the live relays with sync.RWMutex + atomic
     counter for round-robin pick on the outbound hot path.
   - Outbound goroutine (conn2 -> relay) uses pool.pick() round-robin.
     Per-relay inbound goroutine (relay -> conn2) is spawned via
     spawnInbound; they all feed the same conn2 keyed by
     internalPipeAddr.
   - Primary allocation opens immediately. Extras are deferred 3s so
     the DTLS handshake completes over the primary first, letting the
     server install the Connection ID; subsequent multi-path packets
     are then matched to the existing session via CID rather than
     5-tuple. Each extra is jittered 200ms apart.
   - Allocation tracking + deadline-on-cancel + close-on-exit handle
     clean shutdown of all relays.

A new udpMode global mirrors the -udp flag so acquireVkTurnSlot
(called from the credential layer, which doesn't have access to
turnParams) can filter URLs by transport.
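
The transport-aware URL selection described above can be sketched as follows. This is an assumption-laden illustration: `udpMode` and `urlCounter` are named in the commit message, but the `identity` struct, `transportOf`, and `pickTurnURL` are hypothetical, and the sample URLs are made up.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
	"sync/atomic"
)

// identity holds the per-identity round-robin counter (hypothetical layout).
type identity struct {
	urlCounter atomic.Uint64
}

// transportOf extracts the ?transport= value from a TURN URI such as
// "turn:host:3478?transport=udp". The query is split off by hand because
// TURN URIs are not hierarchical URLs.
func transportOf(raw string) string {
	_, query, found := strings.Cut(raw, "?")
	if !found {
		return ""
	}
	vals, err := url.ParseQuery(query)
	if err != nil {
		return ""
	}
	return strings.ToLower(vals.Get("transport"))
}

// pickTurnURL prefers URLs whose transport matches udpMode, falls back to the
// full list when nothing matches, and round-robins within the chosen pool.
func (id *identity) pickTurnURL(urls []string, udpMode bool) (string, bool) {
	if len(urls) == 0 {
		return "", false
	}
	want := "tcp"
	if udpMode {
		want = "udp"
	}
	pool := make([]string, 0, len(urls))
	for _, u := range urls {
		if transportOf(u) == want {
			pool = append(pool, u)
		}
	}
	if len(pool) == 0 {
		pool = urls // nothing matched: keep the full list
	}
	idx := (id.urlCounter.Add(1) - 1) % uint64(len(pool))
	return pool[idx], true
}

func main() {
	urls := []string{
		"turn:a.example:3478?transport=udp",
		"turn:b.example:3478?transport=tcp",
		"turn:c.example:3478?transport=udp",
	}
	id := &identity{}
	for i := 0; i < 3; i++ {
		u, _ := id.pickTurnURL(urls, true)
		fmt.Println(u)
	}
}
```

A shared counter per identity spreads consecutive streams across the pool regardless of their stream IDs, which is the property the commit contrasts with streamID%len(pool).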
- relayPool/sessionPool: atomic.Pointer copy-on-write, drop RWMutex from pick()
- DTLS read loop caches activeLocalPeer locally to skip type-assert per packet
- solvePoW parallelised across runtime.NumCPU() workers
- vkRequestMu replaced with per-client_id throttle so distinct client_ids run in parallel
- inboundChan 2000 -> 8192, periodic drop-counter logging
- listener caches addr.String() to avoid redundant atomic.Value stores
- handshakeSem 3 -> 8 default + new -handshake-concurrency flag
- per-stream startup jitter 100-500ms -> 30-130ms
- TURN dial ticker 200ms -> 100ms
- extra alloc deferral 3s -> 1s (DTLS handshake completes fast)
- VLESS maintainer stagger 300ms -> 100ms

For N=10 streams cold start drops from ~5-8s of pure scheduling overhead to ~1.5-2s; bottleneck is now the per-client_id throttle and VK API latency.
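
The copy-on-write pool mentioned in the first bullet can be sketched like this. It is an illustration, not the project's relayPool: the `relay` string type stands in for a TURN relay connection, and the method bodies are assumptions about one reasonable way to combine atomic.Pointer with a writer-only mutex.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// relay stands in for a TURN relay PacketConn; a string keeps the sketch small.
type relay string

// relayPool publishes an immutable snapshot that pick() reads without locking.
type relayPool struct {
	mu     sync.Mutex              // serialises writers only
	relays atomic.Pointer[[]relay] // current snapshot, never mutated in place
	next   atomic.Uint64           // round-robin cursor
}

// add publishes a new snapshot containing r.
func (p *relayPool) add(r relay) {
	p.mu.Lock()
	defer p.mu.Unlock()
	var old []relay
	if cur := p.relays.Load(); cur != nil {
		old = *cur
	}
	fresh := make([]relay, len(old), len(old)+1)
	copy(fresh, old)
	fresh = append(fresh, r)
	p.relays.Store(&fresh)
}

// remove publishes a new snapshot without r.
func (p *relayPool) remove(r relay) {
	p.mu.Lock()
	defer p.mu.Unlock()
	cur := p.relays.Load()
	if cur == nil {
		return
	}
	fresh := make([]relay, 0, len(*cur))
	for _, x := range *cur {
		if x != r {
			fresh = append(fresh, x)
		}
	}
	p.relays.Store(&fresh)
}

// pick is the hot path: one atomic load plus one atomic add, no mutex.
func (p *relayPool) pick() (relay, bool) {
	cur := p.relays.Load()
	if cur == nil || len(*cur) == 0 {
		return "", false
	}
	s := *cur
	return s[(p.next.Add(1)-1)%uint64(len(s))], true
}

func main() {
	p := &relayPool{}
	p.add("relay-0")
	p.add("relay-1")
	for i := 0; i < 3; i++ {
		r, _ := p.pick()
		fmt.Println(r)
	}
}
```

Writers pay for a full copy on every change, which is acceptable when allocations are added or dropped rarely and pick() runs once per outbound packet.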
…bound

- sleepCtx helper (NewTimer+Stop) replaces time.After in long backoff sites:
  DTLS reconnect (10-30s), captcha 60s ban backoff, lockout sleep. Avoids
  long-lived timer leak when ctx cancels mid-wait.
- startIdentityJanitor: prunes expired identityStore entries every 5 min.
  Two-phase with TryLock so an in-flight acquireVkIdentity (which can hold
  entry.mu for tens of seconds during captcha) never blocks the janitor or
  other acquires.
- getYandexCreds: cap ws read loop at 64 messages so a chatty/malformed peer
  cannot keep us reading until the 15s deadline burns down.
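
The sleepCtx helper from the first bullet is small enough to sketch in full. The name and the NewTimer+Stop approach come from the commit message; the exact signature and the `demo` function are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sleepCtx waits for d or until ctx is done, whichever comes first.
// Stopping the timer on the way out releases it as soon as the wait is abandoned.
func sleepCtx(ctx context.Context, d time.Duration) error {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-t.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo runs one wait that is cut short by a context deadline and one that
// runs to completion, returning both results.
func demo() (interrupted, completed error) {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	interrupted = sleepCtx(ctx, 10*time.Second)
	completed = sleepCtx(context.Background(), 10*time.Millisecond)
	return interrupted, completed
}

func main() {
	interrupted, completed := demo()
	fmt.Println("interrupted:", interrupted)
	fmt.Println("completed:", completed)
}
```

With `select { case <-time.After(d): ... case <-ctx.Done(): ... }`, Go releases before 1.23 keep the timer alive until it fires even after the context is cancelled, which matters for the 10-30 s and 60 s backoffs listed above.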