fix(telegram): resolve connection pool exhaustion causing silent message loss#8320

Open
whatevertogo wants to merge 2 commits into AstrBotDevs:master from whatevertogo:fix/issue-8314-telegram-pool-exhaustion
@whatevertogo whatevertogo commented May 24, 2026

Summary

Resolves #8314

The Telegram adapter stops receiving messages after running for a while, requiring a manual adapter restart. The root cause is connection pool exhaustion in the httpx client used by python-telegram-bot's long-polling mechanism.

Root Cause

ApplicationBuilder defaults create a get_updates_request with:

  • read_timeout=5.0s — far shorter than Telegram's long-poll timeout (~30s), causing legitimate waiting connections to be forcibly dropped
  • pool_timeout=1.0s — waits only one second for a free slot before giving up when the single connection pool slot is occupied
  • connection_pool_size=1 — only one connection for long-polling

When the single long-poll connection becomes stuck (network glitch, half-open TCP, proxy), the pool is exhausted. Subsequent getUpdates calls hit PoolTimeout and the adapter silently stops receiving messages.

On shutdown, the same exhaustion causes updater.stop() to hang indefinitely as it tries a final get_updates call (visible in the issue's stack trace).
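
The failure mode described above can be reproduced in miniature without any network code. The sketch below is only a model, not python-telegram-bot or httpx internals: it assumes the pool behaves like a semaphore with `pool_size` slots, and `simulate_polling` and `stuck_request` are hypothetical names introduced for illustration.

```python
import asyncio


async def simulate_polling(pool_size: int, pool_timeout: float) -> str:
    """Model the get_updates pool as a semaphore with `pool_size` slots."""
    pool = asyncio.Semaphore(pool_size)

    async def stuck_request() -> None:
        # Holds one slot forever, like a half-open TCP connection.
        async with pool:
            await asyncio.Event().wait()

    stuck = asyncio.create_task(stuck_request())
    await asyncio.sleep(0)  # let the stuck request take its slot

    try:
        # The next poll waits at most `pool_timeout` seconds for a free slot.
        await asyncio.wait_for(pool.acquire(), timeout=pool_timeout)
    except asyncio.TimeoutError:
        return "PoolTimeout"
    else:
        pool.release()
        return "ok"
    finally:
        # Clean up the simulated stuck connection.
        stuck.cancel()
        await asyncio.gather(stuck, return_exceptions=True)
```

With one slot, `asyncio.run(simulate_polling(1, 0.05))` reports the pool timeout; with two slots the second poll still finds a free connection. A longer `pool_timeout` alone does not help while the stuck request never releases its slot.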

Changes

_build_application() — Configure proper HTTP timeouts

| Parameter | Default | New | Reason |
| --- | --- | --- | --- |
| get_updates_read_timeout | 5.0s | 60.0s | Must exceed Telegram's ~30s long-poll |
| get_updates_connect_timeout | 5.0s | 15.0s | Generous connection phase timeout |
| get_updates_pool_timeout | 1.0s | 10.0s | Allow time for pool recovery when connection stalls |
| read_timeout | 5.0s | 30.0s | General API calls don't need ultra-short timeout |
| connect_timeout | 5.0s | 10.0s | General API connection timeout |
| pool_timeout | 1.0s | 5.0s | General API pool wait timeout |
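
A minimal sketch of how the values in the table above might be applied follows. It is not the PR's actual diff: `apply_timeouts` and `TIMEOUT_SETTINGS` are hypothetical names, and the sketch assumes each table row corresponds to a same-named `ApplicationBuilder` method that returns the builder (the python-telegram-bot v20+ convention).

```python
from typing import Any, Mapping

# Values from the table above, keyed by the builder method assumed to set
# each one (python-telegram-bot v20+ naming).
TIMEOUT_SETTINGS: Mapping[str, float] = {
    "get_updates_read_timeout": 60.0,
    "get_updates_connect_timeout": 15.0,
    "get_updates_pool_timeout": 10.0,
    "read_timeout": 30.0,
    "connect_timeout": 10.0,
    "pool_timeout": 5.0,
}


def apply_timeouts(
    builder: Any, settings: Mapping[str, float] = TIMEOUT_SETTINGS
) -> Any:
    """Call each named setter on the builder and return the final builder."""
    for method_name, seconds in settings.items():
        # Each setter is assumed to return the builder, so calls can chain.
        builder = getattr(builder, method_name)(seconds)
    return builder
```

Under those assumptions, usage would look like `apply_timeouts(ApplicationBuilder().token(token)).build()`. Keeping the values in one mapping makes it easy to compare them against the table in review.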

_shutdown_application() — Add timeout shield around updater.stop()

Wrap updater.stop() in asyncio.wait_for(..., timeout=10.0) so shutdown doesn't hang indefinitely when the pool is already exhausted.
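
The effect of the bounded wait can be seen in isolation with a stand-in updater. This is a sketch under stated assumptions: `HangingUpdater` and `stop_with_deadline` are hypothetical names, and the stand-in simply never returns from `stop()`, which approximates the hang described in the issue.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class HangingUpdater:
    """Hypothetical stand-in for an updater whose stop() never returns."""

    async def stop(self) -> None:
        await asyncio.Event().wait()


async def stop_with_deadline(updater, timeout: float = 10.0) -> bool:
    """Return True if stop() finished in time, False if it was abandoned."""
    try:
        # wait_for cancels the pending stop() call once the deadline passes.
        await asyncio.wait_for(updater.stop(), timeout=timeout)
    except asyncio.TimeoutError:
        logger.warning(
            "updater.stop() exceeded %.1fs; continuing shutdown", timeout
        )
        return False
    return True
```

Calling `asyncio.run(stop_with_deadline(HangingUpdater(), timeout=0.05))` returns `False` after the deadline instead of blocking forever, so the rest of the shutdown sequence can proceed.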

Verification

  • ruff check — All checks passed ✅
  • ruff format — Clean ✅
  • Module import — Works ✅

Summary by Sourcery

Adjust Telegram adapter HTTP client timeouts and shutdown behavior to prevent long‑polling lockups and ensure clean termination.

Bug Fixes:

  • Increase Telegram long‑polling and general HTTP timeouts, including connection pool timeouts, to avoid connection pool exhaustion that previously caused the adapter to silently stop receiving messages.
  • Guard Telegram updater shutdown with a bounded asyncio timeout so adapter shutdown no longer hangs when the HTTP connection pool is exhausted.

…age loss

Configure ApplicationBuilder with proper HTTP timeouts for long-polling:
- get_updates_read_timeout 60s (must exceed Telegram's ~30s long-poll)
- get_updates_connect_timeout 15s, pool_timeout 10s (allow pool recovery)
- General API read_timeout 30s, connect_timeout 10s, pool_timeout 5s

Also add asyncio.wait_for timeout shield around updater.stop() in
shutdown to prevent indefinite hang when pool is already exhausted.

Resolves AstrBotDevs#8314
Copilot AI review requested due to automatic review settings May 24, 2026 19:34
@auto-assign auto-assign Bot requested review from Raven95676 and anka-afk May 24, 2026 19:34
@dosubot dosubot Bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and area:platform (The bug / feature is about IM platform adapter, such as QQ, Lark, Telegram, WebChat and so on.) labels May 24, 2026

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The new HTTP timeout and pool settings in _build_application are hard-coded; consider wiring them to adapter config (with current values as defaults) so they can be tuned for different environments without code changes.
  • In _shutdown_application, the broad except Exception: pass around await asyncio.wait_for(updater.stop(), ...) will silently swallow unexpected errors; it would be safer to either reuse contextlib.suppress for specific exception types or at least log non-timeout exceptions.
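
One way the first suggestion could look is sketched below. It is illustrative only: the `telegram_timeouts` config key and the `resolve_timeouts` helper are hypothetical and not part of AstrBot's actual configuration schema. The defaults are the values proposed in the PR description.

```python
from typing import Any, Dict, Mapping

# Defaults taken from the PR description; overrides come from adapter config.
DEFAULT_TIMEOUTS: Dict[str, float] = {
    "get_updates_read_timeout": 60.0,
    "get_updates_connect_timeout": 15.0,
    "get_updates_pool_timeout": 10.0,
    "read_timeout": 30.0,
    "connect_timeout": 10.0,
    "pool_timeout": 5.0,
}


def resolve_timeouts(adapter_config: Mapping[str, Any]) -> Dict[str, float]:
    """Merge user overrides (hypothetical `telegram_timeouts` key) into defaults."""
    overrides = adapter_config.get("telegram_timeouts") or {}
    unknown = set(overrides) - set(DEFAULT_TIMEOUTS)
    if unknown:
        raise ValueError(f"Unknown timeout settings: {sorted(unknown)}")

    resolved = dict(DEFAULT_TIMEOUTS)
    for name, value in overrides.items():
        seconds = float(value)
        if seconds <= 0:
            raise ValueError(f"{name} must be positive, got {value!r}")
        resolved[name] = seconds
    return resolved
```

Rejecting unknown keys and non-positive values up front keeps a typo in the config from silently falling back to a default.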
## Individual Comments

### Comment 1
<location path="astrbot/core/platform/sources/telegram/tg_adapter.py" line_range="172-183" />
<code_context>
+        self._application_started = False
+
+        updater = self.application.updater
+        if updater is not None:
+            try:
+                await asyncio.wait_for(updater.stop(), timeout=10.0)
+            except asyncio.TimeoutError:
+                logger.warning(
+                    "Telegram updater stop timed out; connection pool may be exhausted."
+                )
+            except Exception:
+                pass
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Swallowing all exceptions from `updater.stop()` may hide shutdown issues.

Since you’re now handling timeouts explicitly, consider logging other exceptions instead of `except Exception: pass`. If `updater.stop()` starts failing after a library or dependency change, those failures will be invisible and hard to debug. Emitting at least a warning (e.g. `logger.warning(..., exc_info=True)`) would keep shutdown robust while making real issues detectable.

```suggestion
        self._application_started = False

        updater = self.application.updater
        if updater is not None:
            try:
                await asyncio.wait_for(updater.stop(), timeout=10.0)
            except asyncio.TimeoutError:
                logger.warning(
                    "Telegram updater stop timed out; connection pool may be exhausted."
                )
            except Exception:
                logger.warning(
                    "Error while stopping Telegram updater; shutdown may be incomplete.",
                    exc_info=True,
                )
```
</issue_to_address>

Comment thread astrbot/core/platform/sources/telegram/tg_adapter.py Outdated

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enhances the Telegram adapter's resilience by configuring explicit timeouts for long-polling and general API calls. It also adds a timeout mechanism when stopping the updater during shutdown to avoid resource exhaustion. Feedback suggests increasing the connection pool size, as a single stalled connection could still block the adapter despite the new timeout settings.

Comment thread astrbot/core/platform/sources/telegram/tg_adapter.py
- Increase get_updates_connection_pool_size from 1 to 2 so a single
  stalled connection doesn't exhaust the pool entirely
- Log non-timeout exceptions during updater.stop() instead of
  silently swallowing them (they may indicate real shutdown issues)

Resolves AstrBotDevs#8314
Development

Successfully merging this pull request may close these issues.

[Bug] Telegram stops receiving new messages after running for a while; it only works again after restarting the adapter