fix(provider): restore native OpenRouter endpoint when switching back from a direct profile #280
Open
prateekjain-afk wants to merge 5 commits into
… from a direct profile
Selecting a built-in OpenAI-compatible profile (e.g. NVIDIA NIM via
"nvidia-nim:...") calls force_apply_openai_compatible_profile_env(Some(profile)),
which stamps that profile's endpoint and API key into the global
JCODE_OPENROUTER_* env. Switching back to a native OpenRouter catalog model
("openrouter/owl-alpha") never cleared those overrides, so the native model
was POSTed to the stale profile endpoint (https://integrate.api.nvidia.com/v1)
with the wrong key and returned 404. Because the leak lives in process-global
env, even brand-new sessions kept failing until the server was restarted.
Fix: in the OpenRouter set_model arm, when the previous selection was a built-in
direct profile (profile_id is Some) and the target is a native openrouter.ai
catalog model, reset the profile env to None and rebuild the provider so it
talks to the native endpoint again. Raw/custom JCODE_OPENROUTER_API_BASE
endpoints (profile_id == None), @-pinned ids, and locked named profiles are
deliberately left untouched.
Adds a regression test that reproduces the OpenRouter -> NVIDIA -> OpenRouter
switch-back and asserts the endpoint override is cleared (fails without the fix
with the exact integrate.api.nvidia.com URL).
Spawned swarm agents got stuck forever at 'startup queued' because the default spawn mode was Visible: the server forks a terminal launcher (e.g. 'open -a Terminal'), the fork succeeds, but on a server/headless host (jcode serve shared server, no GUI) no interactive client ever attaches to drive the agent loop. The member sits 'running / startup queued', DMs land in an unread mailbox, and wake/resume fail because no task ever ran.
Fixes:
- Auto mode now verifies a visible launch actually produced a live client attachment (SwarmMember.event_txs becomes non-empty) within a short timeout; if not, it tears down the orphaned visible session and falls back to the in-process headless runner, which always executes.
- register_visible_spawned_member no longer clobbers a member that a real client already attached to (avoids a race when a client connects during the Auto attach-wait window).
- Default swarm_spawn_mode changed Visible -> Auto so swarm works out of the box on both desktop and headless hosts.
Adds unit tests for attach detection, timeout fallback, and the non-clobber guard.
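The attach-wait-then-fallback behaviour described above can be sketched roughly as follows. This is a minimal blocking illustration using only the standard library; the names `wait_for_attachment`, `spawn_auto`, and `SpawnOutcome` are hypothetical, and the real server code may well be async and structured differently. The `is_attached` closure stands in for "SwarmMember.event_txs is non-empty".

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical result of an Auto-mode spawn attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum SpawnOutcome {
    /// A live client attached to the visible session in time.
    Visible,
    /// No client attached; the visible session was torn down
    /// and the headless runner should be used instead.
    HeadlessFallback,
}

/// Polls `is_attached` until it returns true or `timeout` elapses.
/// Returns true if an attachment was observed.
pub fn wait_for_attachment<F>(mut is_attached: F, timeout: Duration, poll: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if is_attached() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(poll);
    }
}

/// Assumed shape of the Auto decision: keep the visible session only if a
/// client attached; otherwise run `teardown_visible` and fall back.
pub fn spawn_auto<F, T>(is_attached: F, mut teardown_visible: T, timeout: Duration) -> SpawnOutcome
where
    F: FnMut() -> bool,
    T: FnMut(),
{
    if wait_for_attachment(is_attached, timeout, Duration::from_millis(5)) {
        SpawnOutcome::Visible
    } else {
        teardown_visible();
        SpawnOutcome::HeadlessFallback
    }
}
```

Checking attachment before the deadline on every iteration means a client that attaches immediately is never penalised by the poll interval.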
00f546a to 8e35c8d
…ng/master
Provides a safe, one-command way to pull upstream (origin/master) updates promptly while keeping local fix commits on top. Tags a backup before rewriting history and aborts cleanly on conflict.
…ing "action missing" label
Two swarm UX bugs surfaced when running a research swarm on a headless `jcode serve` shared server:
1. Auto spawn opened a useless bare-jcode Terminal window per child and then waited out the 8s attach timeout before falling back to headless. Now Auto checks up-front whether the requesting coordinator itself has a live interactive client (event_txs). If not (headless server), it skips the visible attempt entirely and spawns the child headless immediately, eliminating the orphan window and the per-spawn delay. The post-launch wait_for_live_attachment safety net is retained for the case where an attached coordinator opens a child window that fails to attach.
2. The TUI rendered swarm/memory/initiative/side_panel tool calls as "action missing" (with a warning logged) whenever the streamed tool input had not yet populated its arguments (empty object). This flashed "swarm action missing" for every spawned agent. Added tool_input_is_unpopulated + resolve_tool_action_for_display so an unpopulated/streaming call shows a neutral "…" without logging, while a genuinely malformed (populated-but-action-less) call still surfaces the diagnostic.
Adds 6 unit tests (3 in jcode-app-core, 3 in jcode-tui).
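The distinction between an unpopulated (still streaming) tool input and a genuinely malformed one might look something like the sketch below. The two function names mirror those in the commit message, but their signatures, the map-based `ToolInput` type, and the `ActionDisplay` enum are assumptions made for illustration; the real code presumably operates on a JSON value.

```rust
use std::collections::BTreeMap;

/// Assumed stand-in for the streamed tool arguments (really a JSON object).
pub type ToolInput = BTreeMap<String, String>;

/// Hypothetical display decision for a tool call's action label.
#[derive(Debug, PartialEq, Eq)]
pub enum ActionDisplay {
    /// Arguments have not streamed in yet: show a neutral "…", log nothing.
    Pending,
    /// A concrete action name is available.
    Action(String),
    /// Arguments are populated but carry no action: surface the diagnostic.
    Missing,
}

/// An empty object means the arguments have not been populated yet.
pub fn tool_input_is_unpopulated(input: &ToolInput) -> bool {
    input.is_empty()
}

/// Maps a (possibly partial) tool input to what the UI should show.
pub fn resolve_tool_action_for_display(input: &ToolInput) -> ActionDisplay {
    if tool_input_is_unpopulated(input) {
        return ActionDisplay::Pending;
    }
    match input.get("action") {
        Some(action) if !action.is_empty() => ActionDisplay::Action(action.clone()),
        _ => ActionDisplay::Missing,
    }
}

/// Text rendered for each case; only `Missing` would also log a warning.
pub fn display_label(display: &ActionDisplay) -> String {
    match display {
        ActionDisplay::Pending => "…".to_string(),
        ActionDisplay::Action(name) => name.clone(),
        ActionDisplay::Missing => "action missing".to_string(),
    }
}
```

Keeping the "unpopulated" check separate from the "missing action" check is what lets the streaming case stay silent while the malformed case still produces a diagnostic.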
…ed serve server
The previous Auto gate keyed off the coordinator session having a live event channel, but an interactive coordinator attached to a *detached* `jcode serve` shared server still reports as attached, so visible spawns were still attempted (orphan window + 8s attach-timeout per child). Switch the signal to whether THIS process has a controlling TTY: a detached `jcode serve` server has none (ps TTY `??`), while an interactive jcode/desktop run by a user does. When detached, Auto spawns children headless directly.
Add JCODE_SWARM_FORCE_VISIBLE=1 escape hatch. session_has_live_attachment is retained under #[cfg(test)].
Adds running_as_detached_server_respects_force_visible_override test.
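A rough sketch of the controlling-TTY gate with the force-visible override is shown below. The helper names are hypothetical, and probing by opening `/dev/tty` is an assumption about how such a check could be done on Unix; the commit does not say how the real implementation detects a controlling terminal. Only the environment variable name `JCODE_SWARM_FORCE_VISIBLE` comes from the commit message.

```rust
use std::env;
use std::fs::File;

/// Pure decision: treat the process as a detached server when it has no
/// controlling TTY, unless the override is set to "1".
pub fn is_detached_server(has_controlling_tty: bool, force_visible: Option<&str>) -> bool {
    if force_visible == Some("1") {
        // Escape hatch: the user explicitly asked for visible spawns.
        return false;
    }
    !has_controlling_tty
}

/// Assumed Unix probe: opening /dev/tty fails when the process has no
/// controlling terminal. This is an illustrative technique, not the
/// project's confirmed method.
pub fn probe_controlling_tty() -> bool {
    File::open("/dev/tty").is_ok()
}

/// Combines the probe with the real environment.
pub fn detached_from_environment() -> bool {
    let force = env::var("JCODE_SWARM_FORCE_VISIBLE").ok();
    is_detached_server(probe_controlling_tty(), force.as_deref())
}
```

Splitting the pure decision from the environment probe keeps the override logic testable without touching process-global state.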
Problem
Switching models OpenRouter → NVIDIA NIM → back to a native OpenRouter model (e.g. `openrouter/owl-alpha`) fails with a 404. The request is sent to the wrong endpoint: the stale NVIDIA NIM profile endpoint (https://integrate.api.nvidia.com/v1) rather than the native OpenRouter one.
Root cause
Selecting a built-in OpenAI-compatible profile (e.g. NVIDIA NIM via `nvidia-nim:...`) calls `force_apply_openai_compatible_profile_env(Some(profile))`, which stamps that profile's endpoint + API key into the process-global `JCODE_OPENROUTER_*` env vars. The `ActiveProvider::OpenRouter` arm of `set_model` never cleared those overrides when switching back, so a native OpenRouter model was POSTed to the stale profile endpoint with the wrong key.
Because the leak lives in process-global env, even brand-new sessions kept failing until the server was fully restarted. Other providers (Claude/OpenAI/Gemini/Copilot) are unaffected because they use self-contained providers and never touch this shared env.
Fix
In the OpenRouter `set_model` arm: when the previous selection was a built-in direct profile (`profile_id.is_some()`) and the target is a native `openrouter.ai` catalog model (id starts with `openrouter/`), reset the profile env to `None` and rebuild the provider so it talks to the native endpoint again.
Deliberately left untouched:
- Raw/custom `JCODE_OPENROUTER_API_BASE` endpoints (`profile_id == None`)
- @provider-pinned or opaque model ids on forced-OpenRouter providers
- Locked named profiles (`JCODE_PROVIDER_PROFILE_ACTIVE`)
Tests
Adds `test_switch_back_to_native_openrouter_restores_endpoint_after_nvidia`, which reproduces the OpenRouter → NVIDIA → OpenRouter switch-back and asserts the endpoint override is cleared. It fails without the fix with the exact `integrate.api.nvidia.com` URL, and passes with it. Verified no regressions in the provider test suite (the only remaining failures are pre-existing parallel-env-contamination flakes present on `master` that pass in isolation).
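For reviewers, the switch-back guard described under Fix can be approximated as a pure predicate. This is only a sketch: `SwitchContext` and `should_reset_profile_env` are hypothetical names, the `'@'` check is an assumed way of recognising provider-pinned ids, and the actual `set_model` arm also rebuilds the provider after resetting the env, which is not modelled here.

```rust
/// Hypothetical, simplified view of the state consulted when switching models.
/// Field names are illustrative and not taken from the jcode source.
#[derive(Debug, Clone, Copy)]
pub struct SwitchContext<'a> {
    /// `Some(..)` when the previous selection was a built-in direct profile;
    /// `None` for raw/custom JCODE_OPENROUTER_API_BASE endpoints.
    pub previous_profile_id: Option<&'a str>,
    /// Model id being switched to.
    pub target_model: &'a str,
    /// True when a named profile is locked and must not be changed.
    pub named_profile_locked: bool,
}

/// Returns true when the stale profile env should be reset to `None`
/// so that requests go to the native OpenRouter endpoint again.
pub fn should_reset_profile_env(ctx: SwitchContext<'_>) -> bool {
    let came_from_direct_profile = ctx.previous_profile_id.is_some();
    // Assumption: native catalog ids start with "openrouter/" and
    // provider-pinned ids are recognisable by an '@'.
    let targets_native_catalog =
        ctx.target_model.starts_with("openrouter/") && !ctx.target_model.contains('@');
    came_from_direct_profile && targets_native_catalog && !ctx.named_profile_locked
}
```

Under these assumptions, a switch from an NVIDIA NIM profile to `openrouter/owl-alpha` triggers the reset, while a custom endpoint (`previous_profile_id == None`), a pinned id, or a locked profile leaves the env alone.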