telegram bot reliability: silence noise, retry sends, multi-chat digest, persist nonces#130
Merged
Merged
Conversation
PR #127 stopped the bot from replying "unknown command" to other bots and to unknown commands in groups, but DMs still hit the hint when a user types a third-party command like /raid or /setup_raid because those messages come from a real user, not a bot. This adds two layers on top: 1. A vocabulary list of well-known third-party commands (raid bots, sniper bots, chat-mod bots, common slang typed with a slash) that we silently ignore regardless of chat type. 2. A "looks like ours" heuristic — only reply with a hint when the unknown command is purely alphabetic ascii between 2 and 12 chars, which matches the shape of every command we own. /setup_raid, /raid123, single-letter commands, and anything else falls through silently. Together they stop the bot from ever responding "unknown command" during raid setup, while still helping a user who types /statu in a dm.
1f2304f to
9d294a8
Compare
Both telegram clients (the alerter-side TelegramClient in telegram_format and the polling-side client in telegram_commands) used to log and drop any non-2xx response. Digest signals were silently lost on rate-limit; command replies vanished on transient 5xx. Adds a shared send_with_retry helper in telegram_format that: - Honours Telegram's parameters.retry_after on 429 responses - Backs off exponentially (1s, 2s, 4s, 8s, 16s, capped at 60s) for 5xx and network errors - Treats 4xx other than 429 as permanent — no retry, single error returned - Retries up to 5 attempts before giving up A pure classify_response function makes the policy testable without a live HTTP call. New unit tests cover 2xx/4xx/5xx/429 classification, retry-after extraction, and the backoff schedule. Both clients now route their sendMessage calls through the helper. The polling-side getUpdates path is unchanged because long-poll errors are already retried by the outer loop.
The digest scheduler used to drain the alert bus and post one message to
the env TELEGRAM_CHAT_ID. Per-chat overrides (/threshold, /mute,
/subscribe, /quiet) wrote rows into tg_chat_config but the alerter
ignored them — users were typing commands that did nothing.
This wires the digest tick to:
1. Drain the bus once per tick (preserves single-publisher semantics).
2. Resolve destination chats from the env default plus every row in
tg_chat_config (deduped, env chat first).
3. For each destination, apply the per-chat filter pipeline:
- quiet window — skip the chat for this tick
- subscribed_kinds — drop signals the chat opted out of
- muted_markets — drop signals for muted slug or market_key
- effective_threshold — drop signals whose move_size is below the
chat's threshold (env default if unset)
4. Run select_top_signals with a per-chat cooldown slice so a market
that was just sent to chat A doesn't suppress it for chat B.
5. Send via TelegramClient::send_to(chat_id, ...) — a new method that
targets an arbitrary chat without changing the existing send()
surface that other alerters depend on.
Per-chat cooldowns are tracked in a single HashMap<(i64, String), u64>
on the spawn task, projected down to the per-chat shape select_top_signals
already expects via scoped_cooldowns. New unit tests cover the projection
and confirm cooldowns isolate properly between chats.
list_chat_ids in tg_chat_config returns every configured chat. DB
errors return an empty vec so the env-default chat continues to receive
alerts even when the table query fails.
The /link nonce store was deliberately in-memory: a HashMap keyed on (chat_id, wallet_lower) on the long-poll task. Every API restart wiped it, forcing every user mid-flow to redo /link. With auto-deploy from main that meant a fresh nonce request after each merge. Replaces the type alias + free functions with a NonceStore struct that writes to Redis under tg:link_pending:<chat_id> with the configured NONCE_TTL_SECS as the key TTL. The value is a small JSON blob holding the wallet lowercase and the nonce hex so /verify can resolve both without a second key. Single-use semantics live at the consume step: peek is non-destructive so /verify can recover the wallet, run signature verification, and only then delete. A losing concurrent /verify call would re-fetch the same peek but the second tg_chat_config upsert is idempotent for the same wallet, so the race is safe. Redis hiccups fall back to an in-memory map shadow on the same task — /link still works during a Redis outage, just without persistence. The fallback is purged on every operation so expired entries do not leak. The chat-keyed scheme also cleans up the multi-pending-wallet case the old HashMap had ambiguously: when a user re-runs /link with a different wallet in the same chat, the latest one wins. Today the behavior was whichever HashMap iteration returned first, which was already non-deterministic across multiple pending wallets. Tests cover the in-memory fallback path directly. The Redis glue is a thin wrapper over RedisService::set/get/delete and is exercised in integration only.
209bf92 to
8f5b8a9
Compare
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Four telegram-bot reliability fixes, stacked on the same branch because
each builds on the previous (multi-chat needs the retry layer's
`send_to`; persistent nonces use the same Redis path the rest of the
service already depends on).
1. Silence third-party raid/sniper commands in DMs (`9d294a8`)
PR #127 stopped "unknown command" replies from going to other bots and
from groups, but DMs still hit the hint when a user types a third-party
command like `/raid` or `/setup_raid` themselves.
chat-mod bots, common slash-prefixed slang) — silenced regardless of
chat type.
when the unrecognized command is purely alphabetic ASCII between 2
and 12 chars (the shape of every command we own). `/setup_raid`,
`/raid123`, single letters all fall through silently.
2. Retry sends on 429 and transient 5xx (`88b17c8`)
Both telegram clients used to log and drop any non-2xx response. Digest
signals were silently lost on rate-limit; command replies vanished on
transient 5xx.
the alerter-side `TelegramClient` and the polling-side client in
`telegram_commands`.
5xx and network errors.
3. Fan the digest out per-chat (`a0c6e36`)
The digest scheduler used to drain the bus and post one message to the
env `TELEGRAM_CHAT_ID`. Per-chat overrides (`/threshold`, `/mute`,
`/subscribe`, `/quiet`) wrote rows into `tg_chat_config` but the
alerter ignored them — users were typing commands that did nothing.
effective_threshold.
to the existing `select_top_signals` shape — a market just sent to
chat A doesn't suppress it for chat B.
arbitrary chats without changing the existing `send()` surface that
other alerters depend on.
4. Persist /link nonces in Redis (`209bf92`)
The `/link` nonce store was deliberately in-memory: a HashMap on the
long-poll task. Every API restart wiped it, forcing every user mid-flow
to redo `/link`. With auto-deploy from main that meant a fresh nonce
request after every merge.
with the configured `NONCE_TTL_SECS` as the key TTL.
verification can run first, then `consume` deletes.
working through an outage, just without persistence.
HashMap-ordering behavior when multiple wallets had pending /link in
the same chat. Latest `/link` wins.
Test plan
`/threshold 10` in a non-env chat → next digest respects it;
restart api, then `/verify` an outstanding link nonce → still works
(nonce survived); digest survives a forced 429 from the bot api (manual)
Notes
threshold, no mutes, all kinds, no quiet window.
replies and digest sends both benefit.
dormant direct-send paths in `probability_alert`/`new_market_alert`,
digest idempotency under genuine retry-exhaust.