🏥 CI Failure Investigation - Run #35342
Summary
make recompile and the integration suites started failing after commit ec99734, fix: pass shared temporary ID map (#15371), because the new strict-mode firewall validation rejects any custom network.allowed entries when the engine is copilot (which has no LLM gateway support).
Failure Details
- Run: 35342
- Commit: ec99734 (fix: pass shared temporary ID map (#15371))
- Trigger: push
Root Cause Analysis
The commit rewrote pkg/workflow/strict_mode_validation.go with a validateStrictFirewall helper that now enforces the firewall sandbox for strict mode copilot/codex: it rejects network.allowed entries unless they map to built-in ecosystems (lines ~337‑378). The strict-mode tests in pkg/workflow/strict_mode_test.go still configure network.allowed: ["api.example.com"] while compiler strict mode is enabled, so compiler.CompileWorkflow now returns the new error: strict mode: engine 'copilot' does not support LLM gateway and requires network domains to be from known ecosystems (e.g., 'defaults', 'python', 'node'). Custom domains are not allowed for security.
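The rejection described above can be sketched roughly as follows. This is an illustrative stand-in, not the code in pkg/workflow/strict_mode_validation.go: the function name checkStrictFirewall, the knownEcosystems set (limited here to the three names quoted in the error message), and the enginesWithoutGateway map are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// knownEcosystems is a hypothetical allow-list. The real set lives in gh-aw
// and is probably larger than the three names quoted in the error message.
var knownEcosystems = map[string]bool{
	"defaults": true,
	"python":   true,
	"node":     true,
}

// enginesWithoutGateway lists engines assumed to lack LLM gateway support.
var enginesWithoutGateway = map[string]bool{"copilot": true}

// checkStrictFirewall is an illustrative stand-in for validateStrictFirewall.
// It returns an error when strict mode is on, the engine has no gateway,
// and any network.allowed entry is not a known ecosystem name.
func checkStrictFirewall(strict bool, engine string, allowed []string) error {
	if !strict || !enginesWithoutGateway[engine] {
		return nil
	}
	var custom []string
	for _, entry := range allowed {
		if !knownEcosystems[entry] {
			custom = append(custom, entry)
		}
	}
	if len(custom) > 0 {
		return fmt.Errorf(
			"strict mode: engine '%s' requires network domains from known ecosystems; custom entries: %s",
			engine, strings.Join(custom, ", "))
	}
	return nil
}

func main() {
	// Mirrors the failing fixture: a custom domain under strict mode.
	fmt.Println(checkStrictFirewall(true, "copilot", []string{"api.example.com"}))
	// Ecosystem names only, which the sketch accepts.
	fmt.Println(checkStrictFirewall(true, "copilot", []string{"defaults", "python"}))
}
```

Under these assumptions, the fixture value network.allowed: ["api.example.com"] is rejected, while a list made up only of ecosystem names passes.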
Failed Jobs and Errors
- build → ./gh-aw compile --validate --verbose --purge --stats exits with compilation failed because the workflow validation now enforces the new strict-mode firewall rules.
- Integration: Workflow GitHub & Git → TestStrictModeAllowsGitHubWorkflowExpression variants all fail with the same copilot firewall rejection (see strict_mode_test.go:865-935).
- Integration: Workflow Misc Part 2 → TestStrictModeNetwork and the TestStrictModeBashTools variants fail for the same reason (logs around strict_mode_test.go:364 and strict_mode_test.go:593).
- Integration: Workflow Tools & MCP → identical copilot strict-mode network/domain failures.
Investigation Findings
Every failure traces back to the new firewall validation rejecting network.allowed lists that include api.example.com when compiler.SetStrictMode(true) runs. The integration suites exercise those test helpers, so the new security guard rails are currently incompatible with the existing strict-mode regression tests.
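Recommended Actions
- Update the strict-mode tests (pkg/workflow/strict_mode_test.go) to use known ecosystems/defaults instead of a custom api.example.com domain so that the tests stay green under the new firewall rules.
- Add a test asserting that the validateStrictFirewall path rejects custom domains so the behavior stays visible when the policy changes again.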
Prevention Strategies
Any future modification to strict-mode firewall policy needs to include an audit of network.allowed test fixtures and sample workflows to ensure they still comply with the updated list of allowed ecosystems. Adding a targeted test for the new error message will also keep the regression visible on the next run.
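One way to make such an audit mechanical is a small helper that flags every fixture whose network.allowed list contains something other than an ecosystem name. The sketch below is hypothetical throughout: the fixture type, the auditFixtures function, the fixture names, and the three-entry known set are assumptions for illustration and do not correspond to existing gh-aw code.

```go
package main

import (
	"fmt"
	"sort"
)

// fixture is a hypothetical pairing of a test fixture or sample workflow
// with the network.allowed entries it declares.
type fixture struct {
	Name    string
	Allowed []string
}

// auditFixtures returns, keyed by fixture name, the entries that are not
// in the known ecosystem set. Compliant fixtures do not appear in the result.
func auditFixtures(fixtures []fixture, known map[string]bool) map[string][]string {
	offenders := make(map[string][]string)
	for _, f := range fixtures {
		for _, entry := range f.Allowed {
			if !known[entry] {
				offenders[f.Name] = append(offenders[f.Name], entry)
			}
		}
	}
	return offenders
}

func main() {
	// Assumed ecosystem names, taken from the error message in this report.
	known := map[string]bool{"defaults": true, "python": true, "node": true}
	fixtures := []fixture{
		{Name: "strict-network", Allowed: []string{"api.example.com"}},
		{Name: "strict-defaults", Allowed: []string{"defaults", "node"}},
	}

	offenders := auditFixtures(fixtures, known)

	// Sort names so the report is stable across runs.
	names := make([]string, 0, len(offenders))
	for name := range offenders {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: custom entries %v\n", name, offenders[name])
	}
}
```

Running a check of this kind before changing the policy would surface fixtures like the api.example.com one ahead of the CI run.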
AI Team Self-Improvement
When editing strict-mode validation logic, future AI prompts should include a reminder to cross-check pkg/workflow/strict_mode_test.go and any example workflows for custom network.allowed entries, and to update them to the known ecosystems (defaults, python, node, etc.) before landing the change.
Historical Context
This is the first time this validation has failed in CI, because the check itself landed in fix: pass shared temporary ID map (#15371); there were no prior failures of this pattern on main.
AI generated by CI Failure Doctor
To add this workflow in your repository, run gh aw add githubnext/agentics/workflows/ci-doctor.md@ea350161ad5dcc9624cf510f134c6a9e39a6f94d.