Skip to content

Fix cancelling in-flight LLM operations#175

Open
enderfirst wants to merge 3 commits into
AI-Shell-Team:mainfrom
enderfirst:fix-cancel-llm-timeout
Open

Fix cancelling in-flight LLM operations#175
enderfirst wants to merge 3 commits into
AI-Shell-Team:mainfrom
enderfirst:fix-cancel-llm-timeout

Conversation

@enderfirst
Copy link
Copy Markdown
Contributor

@enderfirst enderfirst commented May 13, 2026

Summary

  • Problem: 用户按 Ctrl+C 取消 AI 请求后,底层 LLM 请求有时仍会在后台继续运行,之后超时时又打印 LLM request timed out,容易让用户误以为是网络问题或取消没有生效。
  • Changes: 将取消信号传入实际执行 LLM 请求的异步任务;如果请求已被用户取消,后续 timeout 按取消处理,不再显示成 LLM 超时;补充取消行为的回归测试。
  • Related Issue: #

Change Type

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Other

Scope

  • Core shell / PTY
  • AI agent / LLM
  • Skills / Tools
  • Security
  • Configuration
  • CLI / Interface
  • Packaging / Installation
  • CI/CD
  • Documentation

User-visible Changes

  • 用户按 Ctrl+C 取消 AI 请求后,已取消的旧请求不会再延迟打印 LLM request timed out
  • 不改变工具执行中断时允许当前任务收尾的行为。
  • 真实的 LLM 请求超时仍会按原有逻辑提示。

Compatibility

  • Backward compatible? Yes
  • Config changes? No

Testing

  • uv --native-tls run --group dev python -m pytest tests/shell/runtime/test_shell_pty_core.py -q
    • Result: 69 passed
  • uv --native-tls run --group dev ruff check src/aish/shell/runtime/ai.py src/aish/llm/session.py tests/shell/runtime/test_shell_pty_core.py
  • python3 -m compileall -q src/aish/shell/runtime/ai.py src/aish/llm/session.py tests/shell/runtime/test_shell_pty_core.py
  • git diff --check

Additional related test run:

  • uv --native-tls run --group dev python -m pytest tests/llm tests/tools/test_final_answer.py tests/test_ask_user_tool.py -q
    • Result: 74 passed, 1 skipped, 1 failed
    • Failed test: tests/llm/test_litellm_langfuse.py::testshell_integration
    • Note: this is a live LLM integration test. The API call completed but returned an empty response, which appears unrelated to this cancellation change.

Checklist

  • Code follows project style
  • Tests added if needed
  • [ ] Documentation updated if needed

Summary by CodeRabbit

  • Bug Fixes

    • Improved cancellation handling for AI operations: cancels running work promptly, emits cancellation events when appropriate, and avoids misreporting timeouts.
  • Tests

    • Added tests verifying coroutine cancellation behavior and that timeout vs cancellation produce the correct event flow and responses.

Review Change Stack

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the pull request. A maintainer will review it when available.

Please keep the PR focused, explain the why in the description, and make sure local checks pass before requesting review.

Contribution guide: https://github.com/AI-Shell-Team/aish/blob/main/CONTRIBUTING.md

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

Template check passed. Thanks for updating the pull request description.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

Warning

Rate limit exceeded

@enderfirst has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 47 minutes and 11 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e56e496b-35f8-45a7-87ad-7dd95a915f53

📥 Commits

Reviewing files that changed from the base of the PR and between fde37a0 and 597584a.

📒 Files selected for processing (1)
  • src/aish/llm/session.py
📝 Walkthrough

Walkthrough

Adds early cancellation-token checks to LLMSession timeout handlers so actual cancellation emits cancelled events and raises AnyIO cancelled; adds tests verifying TimeoutError still reports status="timeout" and is not marked as cancelled.

Changes

LLM timeout & cancellation behavior

Layer / File(s) Summary
LLMSession timeout handlers check cancellation first
src/aish/llm/session.py
process_input and completion TimeoutError paths now test cancellation_token.is_cancelled() first; when cancelled they emit llm_cancelled + GENERATION_END(status="cancelled") and raise the AnyIO cancelled exception instead of running the timeout flow.
Tests: timeout remains timeout (not cancelled)
tests/llm/test_llm_events.py
Added two async tests that mock _get_acompletion to raise TimeoutError and assert process_input and completion return "LLM request timed out", emit OP_START/GENERATION_START/GENERATION_END/OP_END, set generation status=="timeout", and do not mark the final event as cancelled.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nudge the loop, I check the token tight,
Timeout stays a timeout, cancel flags take flight.
Events emitted clear, no mystery to mend,
The rabbit hops off—cleanup in the end.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 5.88% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: fixing cancellation handling for in-flight LLM operations. It directly reflects the core problem and solution described in the PR objectives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/aish/shell/runtime/ai.py`:
- Around line 169-175: The _cancel_running_task() helper currently calls
loop.call_soon_threadsafe(task.cancel) but may return before cancellation is
delivered and can raise RuntimeError if the loop closes; update
_cancel_running_task() and the _run_async_in_thread cancellation path so you:
(1) guard call_soon_threadsafe with loop.is_running() and catch RuntimeError,
(2) use asyncio.run_coroutine_threadsafe or schedule a small coroutine on the
target loop that cancels the task and awaits task completion, and (3) block the
caller (with a bounded timeout) until task.done() or the awaitable from
run_coroutine_threadsafe finishes, then only return/raise KeyboardInterrupt
after confirming task.cancelled() or task.done(); apply this logic to the
cancellation flows in _cancel_running_task() and the cancellation handling
inside _run_async_in_thread to avoid leaving the coroutine running in
background.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: fb79ec45-81d3-4e6d-935f-0275bfdaddf4

📥 Commits

Reviewing files that changed from the base of the PR and between fd3895f and bc01dc0.

📒 Files selected for processing (3)
  • src/aish/llm/session.py
  • src/aish/shell/runtime/ai.py
  • tests/shell/runtime/test_shell_pty_core.py

Comment on lines +169 to +175
def _cancel_running_task() -> None:
loop = loop_box[0]
task = task_box[0]
if loop is None or task is None or task.done() or loop.is_closed():
return
loop.call_soon_threadsafe(task.cancel)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure cancellation is actually delivered before returning cancellation to caller.

On Line 174/Line 186–189, _run_async_in_thread can raise KeyboardInterrupt even if task cancellation was never delivered (loop not ready / future still running), and call_soon_threadsafe can race with loop close and raise RuntimeError. That can leave the coroutine running in the background after the caller thinks it was cancelled.

Suggested hardening
-        def _cancel_running_task() -> None:
+        def _cancel_running_task() -> bool:
             loop = loop_box[0]
             task = task_box[0]
             if loop is None or task is None or task.done() or loop.is_closed():
-                return
-            loop.call_soon_threadsafe(task.cancel)
+                return False
+            try:
+                loop.call_soon_threadsafe(task.cancel)
+                return True
+            except RuntimeError:
+                # loop closed between check and scheduling
+                return False
...
                     if cancellation_token and cancellation_token.is_cancelled():
-                        if loop_ready.wait(timeout=0.2):
-                            _cancel_running_task()
+                        cancel_sent = False
+                        if loop_ready.wait(timeout=0.2):
+                            cancel_sent = _cancel_running_task()
                         try:
                             future.result(timeout=2.0)
                         except FutureTimeoutError:
-                            pass
+                            if cancel_sent and not future.done():
+                                continue
                         raise KeyboardInterrupt("AI operation cancelled by user")

Also applies to: 183-189

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/aish/shell/runtime/ai.py` around lines 169 - 175, The
_cancel_running_task() helper currently calls
loop.call_soon_threadsafe(task.cancel) but may return before cancellation is
delivered and can raise RuntimeError if the loop closes; update
_cancel_running_task() and the _run_async_in_thread cancellation path so you:
(1) guard call_soon_threadsafe with loop.is_running() and catch RuntimeError,
(2) use asyncio.run_coroutine_threadsafe or schedule a small coroutine on the
target loop that cancels the task and awaits task completion, and (3) block the
caller (with a bounded timeout) until task.done() or the awaitable from
run_coroutine_threadsafe finishes, then only return/raise KeyboardInterrupt
after confirming task.cancelled() or task.done(); apply this logic to the
cancellation flows in _cancel_running_task() and the cancellation handling
inside _run_async_in_thread to avoid leaving the coroutine running in
background.

@enderfirst enderfirst closed this May 13, 2026
@enderfirst enderfirst reopened this May 13, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/aish/llm/session.py`:
- Around line 1777-1784: The except TimeoutError handler that checks
self.cancellation_token and then raises anyio.get_cancelled_exc_class() loses
the original TimeoutError context; capture the caught exception (e.g., "except
TimeoutError as err:") and re-raise the cancellation exception using "raise
anyio.get_cancelled_exc_class() from err" so exception chaining is preserved;
apply the same change to the other identical spot that re-raises the
cancellation exception (the code that calls events.emit_cancelled /
events.emit_generation_end and raises the anyio cancellation class).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7d517254-0896-4b5b-9cd9-cf4c64582bed

📥 Commits

Reviewing files that changed from the base of the PR and between bc01dc0 and fde37a0.

📒 Files selected for processing (2)
  • src/aish/llm/session.py
  • tests/llm/test_llm_events.py

Comment thread src/aish/llm/session.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant