Use lower thinking/effort for cron and hook sessions #53

Closed
constkolesnyak wants to merge 7 commits into ClickHouse:main from constkolesnyak:fix/cron-effort-downgrade

@constkolesnyak

Problem

Every cron run was failing with:

API Error: 400 {"error":{"message":"level \"max\" not supported, valid levels: low, medium, high","type":"invalid_request_error"}}

Root cause: nerve passed the global agent.effort=max and agent.thinking=max to every session, including cron. Cron sessions run on cron_model (Sonnet by default), and under Claude OAuth (subscription) cli-proxy-api rejects level=max for non-flagship models — only low/medium/high are accepted. With a persistent cron session this poisons the SDK session id, so every subsequent run on the same cron job fails identically until the session is rotated.

This is the second occurrence (first incident: April 21, 2026, same error in inbox-processor).

Fix

  • Add agent.cron_thinking and agent.cron_effort fields (default high) — separate knobs for cron/hook sessions.
  • Extract _select_thinking_effort(agent_config, source) helper that returns (thinking, effort) based on session source — cron and hook get the cron overrides, everything else keeps the main settings.
  • Wire it into _build_options(...) so interactive sessions still get the full thinking budget on the flagship model, while cron jobs stay within what their proxy/provider accepts.

Test plan

  • tests/test_engine_options.py — 13 new tests covering source routing, default values, custom overrides, and back-compat for configs that don't define the new fields
  • Full suite passes: 344 passed, 2 skipped

constkolesnyak and others added 7 commits April 3, 2026 23:15
Gmail preprocessing was stripping all standalone URLs indiscriminately.
Now only removes boilerplate (unsubscribe, social media, tracking pixels)
and keeps actionable links (booking, messaging, payment, reply threads).
Condense prompts updated to explicitly preserve actionable URLs.
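
A rough sketch of the filtering idea, not the actual preprocessing code: drop a standalone URL line only when it looks like boilerplate. The hint list, the regex, and the function name `strip_boilerplate_urls` are all hypothetical; the commit names only the categories.

```python
import re

# Hypothetical substrings marking boilerplate links (unsubscribe, social, tracking).
BOILERPLATE_HINTS = (
    "unsubscribe",
    "facebook.com",
    "twitter.com",
    "instagram.com",
    "/track",
    "pixel",
)

# A line that consists of nothing but one URL.
URL_ONLY_LINE = re.compile(r"^\s*https?://\S+\s*$")


def strip_boilerplate_urls(text):
    """Remove standalone boilerplate URLs; keep every other line untouched."""
    kept = []
    for line in text.splitlines():
        is_standalone_url = URL_ONLY_LINE.match(line) is not None
        looks_boilerplate = any(hint in line.lower() for hint in BOILERPLATE_HINTS)
        if is_standalone_url and looks_boilerplate:
            continue  # drop only URLs that match a boilerplate hint
        kept.append(line)
    return "\n".join(kept)
```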

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Notifications sent via NotificationService._deliver_telegram() bypassed
TelegramChannel.send() and went directly through bot.send_message(),
so they were never stored in _message_cache. When a user reacted to a
notification, the reaction handler couldn't find the original text and
only showed "[Reaction: 👎]" without context.

Now _deliver_telegram caches sent messages via channel._cache_message()
so reactions on notifications include the message text.
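
A simplified sketch of the caching fix. The class and method names `NotificationService._deliver_telegram`, `_cache_message`, and `_message_cache` come from the commit message; every signature, the `FakeBot` class, and `describe_reaction` are assumptions for illustration.

```python
import asyncio


class FakeBot:
    """Hypothetical stand-in for the Telegram bot client."""

    def __init__(self):
        self._next_id = 0

    async def send_message(self, chat_id, text):
        self._next_id += 1
        return self._next_id  # pretend this is the sent message id


class TelegramChannel:
    def __init__(self, bot):
        self.bot = bot
        self._message_cache = {}

    def _cache_message(self, chat_id, message_id, text):
        self._message_cache[(chat_id, message_id)] = text

    def describe_reaction(self, chat_id, message_id, emoji):
        # Hypothetical reaction handler: include the text when it is cached.
        text = self._message_cache.get((chat_id, message_id))
        if text is None:
            return f"[Reaction: {emoji}]"
        return f"[Reaction: {emoji}] on: {text}"


class NotificationService:
    def __init__(self, channel):
        self.channel = channel

    async def _deliver_telegram(self, chat_id, text):
        message_id = await self.channel.bot.send_message(chat_id, text)
        # The fix: record the notification so later reactions can find its text.
        self.channel._cache_message(chat_id, message_id, text)
        return message_id
```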

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

anyio 4.13.0's CancelScope._deliver_cancellation sets should_retry=True
unconditionally for every task in self._tasks, then reschedules itself
via call_soon(). When every task in the scope is the *current* task,
nothing gets cancelled but the callback re-queues on every event-loop
tick — pinning one CPU core at 100% with ~45k epoll_pwait syscalls/sec.

Observed on April 22 and again on April 23 (24h+ of 97% CPU, no work
done). The existing _safe_disconnect() workaround in agent.engine only
clears the stuck scope during client.disconnect(), so spins triggered
by telegram polling / cron / live SDK requests weren't covered.

The patch sets should_retry=True only when we actually delivered a
cancel or the task is still waiting pickup (_must_cancel). Semantics
otherwise match upstream byte-for-byte. Applied via nerve/__init__.py
so any import path picks it up.

Includes a regression test that exercises the exact pathological shape
(scope whose only task is the current task) and asserts the scope
stops rescheduling itself.

The April 23 patch (ec0f8f7) fixed the 96%-CPU hot loop where
should_retry was set unconditionally, but left a narrower version of
the same bug: the `_must_cancel` branch still set should_retry=True.

When the current task itself sits with _must_cancel=True while running
the cancel callback (observed in production today, nerve/SDK path),
this re-queues _deliver_cancellation on every event-loop tick:

    before fix:  20% CPU, ~61k epoll_pwait/sec
    after fix:   should be idle (~5% CPU range)

Changes:
- Skip the current task entirely — it cannot cancel itself from inside
  the callback it is running.
- Drop should_retry=True in the _must_cancel branch — asyncio's
  Task.__step raises CancelledError when the task resumes, no retry
  needed from us.
- should_retry is now True only when we actually called task.cancel()
  in this pass.
- Add regression test that poisons the _must_cancel branch with a
  fake task. The existing "current-task-only" test passed without
  reproducing this variant because it didn't set _must_cancel.
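
The retry rule from the changes above can be modelled in isolation. This is a toy model, not anyio's code: `FakeTask`, its `waiting` flag (a simplification of whatever condition anyio uses to decide a task can be cancelled right now), and the free function `deliver_cancellation` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class FakeTask:
    """Hypothetical task reduced to the state the retry rule reads."""
    must_cancel: bool = False  # mirrors asyncio's Task._must_cancel
    waiting: bool = True       # assumed: task is suspended and can be cancelled now
    cancel_calls: int = 0

    def cancel(self):
        self.cancel_calls += 1


def deliver_cancellation(tasks, current_task):
    """Return should_retry: True only if task.cancel() was called in this pass."""
    should_retry = False
    for task in tasks:
        if task is current_task:
            continue  # the running task cannot cancel itself from this callback
        if task.must_cancel:
            continue  # asyncio raises CancelledError when the task resumes
        if task.waiting:
            task.cancel()
            should_retry = True
    return should_retry
```

Under this rule a scope whose only task is the current one, with or without `_must_cancel` set, returns `False` and so does not reschedule itself.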
Every cron run was failing with `API Error: 400 level "max" not
supported, valid levels: low, medium, high` because the global
`agent.effort=max` and `agent.thinking=max` were applied to cron
sessions too. Cron sessions run on `cron_model` (Sonnet by default),
which under Claude OAuth (subscription) caps non-flagship models at
`high` and rejects `max`. With a persistent cron session this also
poisons every subsequent run on the same SDK session id.

Add dedicated `agent.cron_thinking` / `agent.cron_effort` settings
(default `high`) and a `_select_thinking_effort` helper that picks
the right pair based on session source. `cron` and `hook` get the
overrides; everything else keeps the main settings, so interactive
sessions still get the full thinking budget on the flagship model.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@constkolesnyak

Wrong target repo, recreating against constkolesnyak/nerve
