refactor(registry/coder-labs/modules/codex)!: remove agentapi, tasks and start logic#879
Conversation
…der-utils, remove start script Mirror the claude-code refactor from #861: - Replace module agentapi with module coder_utils - Replace install.sh with install.sh.tftpl (templatefile) - Delete start.sh entirely - Remove all AgentAPI, boundary, and start-script-only variables - Rename enable_aibridge to enable_ai_gateway - Make workdir optional (default null) - Output scripts list instead of task_app_id - Conditional env var resources with count - Update tests and README
…_policy, and sandbox_workspace_write from default config
…fault behavior with heredoc
…s from default config
…odel_reasoning_effort descriptions
…var, add test coverage - Remove codex_model variable (unused after model_migrations removal) - Add model_reasoning_effort assertion to AI gateway test - Add workdir-trusted-project and no-workdir-no-project-section tests - Run bun fmt
…de installation, and ARG_INSTALL Codex always installs via npm. Removed the install_codex toggle, the install_node/nvm bootstrap, and the ARG_INSTALL plumbing.
…k for non-root installs When NVM is not available, set npm prefix to ~/.npm-global so npm install -g works without root permissions.
…kdir sections from README
…_toml needs manual model_provider for AI Gateway
|
/coder-agents-review |
There was a problem hiding this comment.
Solid refactor that cleanly converges codex on the coder-utils pattern, removes substantial dead code (AgentAPI, Boundary, Tasks), and simplifies the install pipeline. The scripts output design and conditional coder_env resources are well-considered. The breaking change is documented and version-bumped correctly.
Severity breakdown: 3 P2, 11 P3, 2 Nit.
The P2s center on: (1) openai_api_key missing sensitive = true (5 reviewers flagged this), (2) ARG_WORKDIR/ARG_CODEX_VERSION passed unencoded through templatefile single-quote assignments, breaking or enabling injection when the value contains ' or ", and (3) zero test coverage for the enable_ai_gateway + base_config_toml combination, which is the most complex split behavior in the install script and the only path requiring manual user action.
Five reviewers independently flagged that enable_ai_gateway = true with a user-provided base_config_toml unconditionally appends [model_providers.aibridge], risking duplicate TOML headers if the user's config already defines that section.
Process note: the PR description lists install_codex as removed, but the final code retains it. The description should be updated to match the delivered code. Two "debug" commits remain unsquashed in the branch; consider squashing before merge.
P3 [DEREM-15] The PR description's "Removed variables" list includes install_codex, but main.tf:45 retains it. The description also claims "ARG_INSTALL toggle (Codex always installs via npm)" was removed, but install.sh.tftpl has ARG_INSTALL. A reviewer trusting the description would miss that these were restored after the initial removal. (Mafu-san)
P3 No test for the default install path. The deleted check-latest-codex-version-works was the only coverage for npm install -g "@openai/codex" (without version pin). The surviving install-codex-version exercises only the pinned branch. The most common real-world path now has zero test coverage. (Bisky) Note: -main.test.ts:78, outside current diff.
Pariston: "I tried to construct a scenario where the stated problem exists but this fix does not help. I could not. The refactor is structurally complete."
Mafu-san: "Despite the turbulent path to get there, the final install_codex function handles NVM, non-NVM, and non-root environments correctly."
🤖 This review was automatically generated with Coder Agents.
|
Addendum: findings incorrectly dropped from the initial review. The following were cut during cross-check with insufficient justification. Reopening them here. P3 [DEREM-1] P3 [DEREM-2] P3 [DEREM-3] Note [DEREM-4] Note [DEREM-5] Note [DEREM-24]
|
DEREM-6: mark openai_api_key as sensitive DEREM-7: base64-encode ARG_CODEX_VERSION and ARG_WORKDIR to prevent shell/TOML injection from user-controlled input DEREM-8: add test for AI gateway + custom base_config_toml, verifying provider section appended and no duplicates when user includes it DEREM-10: guard duplicate [model_providers.aibridge] with grep before appending DEREM-11: fail early with actionable error when npm is not available DEREM-13: add install-codex-latest test covering the unversioned path DEREM-16: mention npm requirement in v5 upgrade warning DEREM-17: restore allowed values and docs link on model_reasoning_effort, document empty-string behavior on codex_version DEREM-20: remove dead skipCodexMock from ai-gateway test DEREM-21: codex_version description now explains empty means latest
…cing NVM puts npm on PATH when sourced. The npm existence check must run after the NVM block so users with NVM-managed Node are not rejected.
|
@matifali, addressed the comments and tested. |
|
/coder-agents-review |
There was a problem hiding this comment.
Re-review blocked. 10 of 22 findings were addressed (thank you), but 12 remain without a substantive response. The panel cannot re-review until each open finding gets one of: a fix, a reasoned defense, or a linked deferral ticket.
Findings with content-free dismissals ("not a concern, do not address" with no reasoning):
- DEREM-9 (P3) Test boilerplate copied verbatim from claude-code. Author: "out of context of pr, do not address." The consequence (maintenance burden across six future module migrations) was not addressed.
- DEREM-12 (P3) minimal-default-config test has one positive assertion, no negative. Author: "not a concern, do not address." The consequence (silent regressions injecting aibridge or sandbox_mode into defaults) was not addressed.
- DEREM-14 (P3) TOML injection via ARG_WORKDIR in project trust section. Author: "Not a concern do not address." Note: DEREM-7 fix (base64-encoding) protects the shell assignment, but
ARG_WORKDIRis still interpolated unsanitized into a TOML heredoc at line 78. The TOML injection vector persists. - DEREM-18 (P3) Debug header prints internal template variable names. Author: "not a concern, do not address."
- DEREM-19 (P3) openai-api-key test spins up a container it never uses. Author: "do not address."
Findings with no response at all (posted in addendum comment):
- DEREM-1 (P3) model_reasoning_effort only tested with enable_ai_gateway, never standalone.
- DEREM-2 (P3) Display name "AI Bridge" while module calls feature "AI Gateway."
- DEREM-3 (P3) Tautological tftest assertions (var == input literal).
- DEREM-4 (Note) Unquoted path in README example.
- DEREM-5 (Note) model_reasoning_effort and workdir silently dropped when base_config_toml provided.
- DEREM-15 (P3) PR description claims install_codex removed but code retains it.
- DEREM-24 (Note) No migration section for v4 to v5 users.
For each: fix it, explain why the current behavior is correct, or file a ticket to defer. A bare "do not address" without engaging the stated consequence does not unblock. This needs a human decision.
🤖 This review was automatically generated with Coder Agents.
…in config, add pnpm/bun fallbacks - Rename model_provider and [model_providers.] section from 'aibridge' to 'aigateway' across main.tf, install template, tests, and README. The API path and env var stay unchanged (protocol-level). - Add pnpm and bun as fallback package managers for codex installation when npm is not available.
There was a problem hiding this comment.
Re-review of the delta since R1. 19 of 22 R1 findings were addressed with code changes; good progress. The PR description was updated to correctly list install_codex as retained. Migration guide and Node.js upgrade warning added.
New severity breakdown: 7 P3 open (3 Netero re-raises from R1 findings that regressed or were incompletely fixed, 4 new panel findings).
DEREM-14 contest (TOML injection via ARG_WORKDIR): rejected unanimously by all 7 panel reviewers. The author's defense conflates shell injection (fixed by base64 at assignment time) with TOML injection (decoded value interpolated raw into a heredoc at line 80). DEREM-25 tracks the active root cause. This needs a human decision: add a workdir validation constraint, or quote/escape the value before TOML interpolation.
DEREM-9/27 (test boilerplate duplication): the author extracted shared helpers in 41fa36c, then reverted in 2aa54d4 with no explanation. The duplication remains. This also needs a human decision: extract again (with claude-code updated too), or accept the duplication.
Mafu-san: "The commit messages identify the finding number, state what was done, and for contested or accepted findings explain the reasoning. This is traceable, deliberate work. The process failures are specific to verification depth and correction persistence, not to the volume or organization of work."
Hisoka on DEREM-14: "A workdir with an embedded newline decodes cleanly into the shell variable, and heredoc expansion writes the newline verbatim into config.toml. The result is a new [malicious] table injected above trust_level."
registry/coder-labs/modules/codex/main.tf:7
P3 [DEREM-30] The codex module declares coder >= 2.12 but its dependency module.coder_utils requires >= 2.13. A user who provisions with coder provider 2.12.x passes the codex constraint but fails during coder_utils resolution with a confusing error pointing at the wrong module.
Fix: change to version = ">= 2.13".
🤖
🤖 This review was automatically generated with Coder Agents.
matifali
left a comment
There was a problem hiding this comment.
@DevelopmentCats can you give it a test too?
Yeah I will retest and post with my results. I ran into some issues before and was trying to find the exact issue, but had issues getting things to launch properly on my end. |
|
Hey Jay, did a thorough test of this module and have some findings. Issues #1 and #3 are merge-blockers from my perspective — they're regressions vs both the old codex v4 install script and the merged claude-code v5 precedent. The rest are doc nits. Issue 1:
|
| # | Issue | Severity | Action |
|---|---|---|---|
| 1 | codex not symlinked / PATH not persisted |
Medium | Code fix needed — port ensure_codex_in_path pattern from claude-code v5 |
| 2 | PR description vs install_codex |
Low | Verify PR body is up to date |
| 3 | OPENAI_API_KEY env-var auth broken |
High | Code fix needed — restore auth.json seeding from old install.sh |
| 4 | AI Gateway + custom config | Low | Already documented, no fix |
| 6 | coder_app reconnect re-runs |
Low | README note |
Issues #1 and #3 are regressions vs both the old codex v4 script and the merged claude-code v5 module. Happy to help with the implementation if useful.
This comment was generated with the help of Coder Agents on behalf of @DevelopmentCats.
…auth.json seeding Issue 1: Port ensure_codex_in_path and add_path_to_shell_profiles from claude-code v5. Symlinks codex into CODER_SCRIPT_BIN_DIR and persists the binary dir to shell profiles (bash, zsh, fish). Called after both the install and skip-install branches. Issue 3: Restore add_auth_json from v4. Writes OPENAI_API_KEY to ~/.codex/auth.json when AI Gateway is not enabled, fixing 401s on Codex versions where the env var alone is insufficient. Issue 6: Add README note warning that coder_app commands re-execute on pane reconnect; recommend coder_script for one-shot prompts.
|
Addressed all findings from the testing review: Issue #1 (codex not on PATH): Fixed. Ported Issue #2 (PR description): Already fixed in an earlier commit; the PR body lists Issue #3 (OPENAI_API_KEY auth broken): Fixed. Restored Issue #4 (AI Gateway + custom config): Acknowledged, already documented in the README's Issue #6 (coder_app reconnect): Fixed. Added a All 15 bun tests and 11 terraform tests pass. This comment was generated by Coder Agents on behalf of @35C4n0r. |
I will go ahead and test again, but knock on wood we should be good to merge this, this morning. |
DevelopmentCats
left a comment
There was a problem hiding this comment.
Tested and Approved
matifali
left a comment
There was a problem hiding this comment.
a few nits i think we shoudla ddress and then good to merge.
…entCats review comments - Move migration guide from README to PR body, link from warning - Update codex_version example to 0.128.0 (latest) - Update config docs URL to developers.openai.com/codex/config-advanced - Restore AI Gateway docs link in References - Remove remaining tautological tftest assertions (DEREM-26) - Add custom-config-drops-reasoning-effort test (DEREM-29) - DEREM-31 already fixed by ensure_codex_in_path - DEREM-27/28 acknowledged, no change needed
…ude-code v5 (#885) Aligns codex module variable names with the claude-code v5 conventions established in #861 and #879. - Rename `additional_mcp_servers` to `mcp` to match claude-code's variable name. - Change `codex_version` default from `""` to `"latest"` to match `claude_code_version`. ## Type of Change - [ ] New module - [ ] New template - [x] Bug fix - [ ] Feature/enhancement - [ ] Documentation - [ ] Other ## Module Information **Path:** `registry/coder-labs/modules/codex` **Breaking change:** [x] Yes [ ] No > [!WARNING] > Breaking change for anyone referencing `additional_mcp_servers` by name. Since v5.0.0 was released and deleted on the same day (#879), this should have zero downstream impact. ## Testing & Validation - [x] Tests pass (`bun test`) - [x] Code formatted (`bun fmt`) - [x] Changes tested locally ## Related Issues - Follow-up to #879 - Filed #886 to track adding `mcp_config_remote_path` support to codex --- *This PR was authored by Coder Agents.*
Closes #878
What
Major refactor of the
coder-labs/codexmodule to mirror thecoder/claude-codev5 changes from #861.Changes
Structural
module "agentapi"withmodule "coder_utils"(registry.coder.com/coder/coder-utils/coder v0.0.1)scripts/install.shwithscripts/install.sh.tftpl(Terraform templatefile)scripts/start.sh.codex-moduleto.coder-modules/coder-labs/codextask_app_idtoscripts(ordered list of coder exp sync names)collectScripts,runScripts) intoagentapi/coder-utils-test-helpers.tsRemoved variables
All AgentAPI pass-throughs, boundary, and start-script-only variables:
order,group,report_tasks,subdomain,cli_app,web_app_display_name,cli_app_display_name,install_agentapi,agentapi_version,ai_prompt,continue,enable_state_persistence,codex_system_prompt,enable_boundary,boundary_config_path,boundary_version,compile_boundary_from_source,use_boundary_directly,codex_modelRetained
install_codex(toggle for skipping npm install when CLI is pre-installed)Renamed
enable_aibridge->enable_ai_gatewayChanged
workdir: now optional (default = null)openai_api_key: conditional env var withcount, markedsensitive = truebase_config_toml: heredoc description documenting generated defaults; notes thatmodel_reasoning_effortand workdir trust are only applied in default configconfig.toml: strippedsandbox_mode,approval_policy,sandbox_workspace_write,notice.model_migrationsARG_CODEX_VERSIONandARG_WORKDIRbase64-encoded to prevent shell/TOML injection[model_providers.aibridge]guarded with grep before appendingTests
collectScripts/runScriptspattern)model-reasoning-effort-standalone,ai-gateway-with-custom-base-config,ai-gateway-custom-config-no-duplicate-provider,install-codex-latest,workdir-trusted-project,no-workdir-no-project-sectionminimal-default-configDocs
base_config_tomlrequiring manualmodel_providerWarning
Breaking change. Drops support for Coder Tasks and Boundary. Keep using v4.x.x if you depend on them.
This PR was authored by Coder Agents.