feat: add chaos() API for HTTP-layer fault injection #2119
- Add ChaosRule and ChaosRegistry classes in src/server/chaos.ts
- Integrate ChaosRegistry into Dispatcher (applies rules post-response)
- Create ChaosRegistry in ApiRunner and pass it to Dispatcher
- Expose chaos() global in REPL via startRepl parameter
- Thread chaosRegistry from app.ts -> startReplServer
- Add 48 tests covering all required scenarios in test/server/chaos.test.ts
- Document the chaos API in docs/reference.md

Agent-Logs-Url: https://github.com/counterfact/api-simulator/sessions/1b9bc7aa-b2e1-4090-9a24-b1165b72d855

- Improve CHAOS_TIMEOUT_DELAY_MS comment explaining Node.js 32-bit int constraint
- Replace `any` body type in ChaosRule.tryApply with `unknown` + explicit cast
- Extract PROBABILITY_TEST_ITERATIONS constant in test file

Agent-Logs-Url: https://github.com/counterfact/api-simulator/sessions/1b9bc7aa-b2e1-4090-9a24-b1165b72d855
Pull request overview
Adds a new HTTP-layer fault-injection facility (“chaos rules”) to Counterfact, wiring it through the runtime dispatcher and exposing it as a REPL global for interactive testing of failure scenarios.
Changes:
- Added `ChaosRule`/`ChaosRegistry` with fluent configuration (status, delay/timeout, headers, body transforms) and deterministic “most recently updated” precedence.
- Integrated chaos rule application into `Dispatcher.request()` and plumbed a registry from `ApiRunner` through to the REPL.
- Added comprehensive unit/integration tests and documented the new Chaos API in the reference docs.
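
To make the “most recently updated” precedence concrete, here is a minimal sketch, not the PR's actual implementation. The class names ending in `Sketch`, the counter-based clock, and the prefix-matching rule are all assumptions made for illustration; only `createRule`, `findBestMatch`, and `touch` are names that appear in the diff.

```typescript
// Hypothetical, simplified stand-ins for ChaosRule and ChaosRegistry.
// The prefix match and the counter-based clock are assumptions.
let clock = 0; // monotonic counter used instead of wall-clock time

class ChaosRuleSketch {
  public updatedAt = 0;
  public statusCode: number | undefined;
  public delayMs: number | undefined;

  public constructor(public readonly pathPrefix: string) {
    this.touch();
  }

  // Each fluent call returns `this` and marks the rule as freshly updated.
  public status(code: number): this {
    this.statusCode = code;
    return this.touch();
  }

  public delay(ms: number): this {
    this.delayMs = ms;
    return this.touch();
  }

  private touch(): this {
    clock += 1;
    this.updatedAt = clock;
    return this;
  }
}

class ChaosRegistrySketch {
  private readonly rules: ChaosRuleSketch[] = [];

  public createRule(pathPrefix = ""): ChaosRuleSketch {
    const rule = new ChaosRuleSketch(pathPrefix);
    this.rules.push(rule);
    return rule;
  }

  // Among rules whose prefix matches, pick the most recently updated one.
  public findBestMatch(path: string): ChaosRuleSketch | undefined {
    return this.rules
      .filter((rule) => path.startsWith(rule.pathPrefix))
      .sort((a, b) => b.updatedAt - a.updatedAt)[0];
  }
}
```

In this sketch, reconfiguring an older rule moves it back to the front, so the selection depends only on the order of updates and never on insertion order alone.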
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test/server/chaos.test.ts | New test suite covering rule behavior, registry selection, and dispatcher integration. |
| src/server/dispatcher.ts | Accepts an optional ChaosRegistry and applies a selected rule to responses (with optional delay). |
| src/server/chaos.ts | New implementation of ChaosRule + ChaosRegistry and exported timeout delay constant. |
| src/repl/repl.ts | Adds optional ChaosRegistry to REPL startup and exposes chaos() global when present. |
| src/app.ts | Passes the runner’s ChaosRegistry into REPL startup. |
| src/api-runner.ts | Instantiates a ChaosRegistry and injects it into the Dispatcher. |
| docs/reference.md | Adds a Chaos API reference section, usage, semantics, and examples. |

    // Apply chaos rules after normal response processing.
    if (this.chaosRegistry !== undefined) {
      const rule = this.chaosRegistry.findBestMatch(path);

      if (rule !== undefined) {
        const chaosResult = rule.tryApply(normalizedResponse);

        if (chaosResult !== null) {
          if (chaosResult.delayMs !== undefined && chaosResult.delayMs > 0) {
            await new Promise<void>((resolve) => {
              setTimeout(resolve, chaosResult.delayMs);
            });
          }

          return chaosResult.response;
        }
      }
    }

@copilot to be safe let's just have a special case that contentType cannot be modified or removed.
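
One way to read this request is sketched below. It is an illustration under stated assumptions rather than the code that landed: `isProtectedHeader` and `applyHeaderChanges` are hypothetical names, and the case-insensitive comparison is an assumption about how the protection would be applied.

```typescript
// Hypothetical helpers: names and the case-insensitive check are assumptions.
type HeaderMap = Record<string, string | number>;

function isProtectedHeader(name: string): boolean {
  return name.toLowerCase() === "content-type";
}

function applyHeaderChanges(
  original: HeaderMap,
  toSet: ReadonlyMap<string, string>,
  toRemove: ReadonlySet<string>,
): HeaderMap {
  // Copy first so the original response headers are never mutated.
  const headers: HeaderMap = { ...original };

  for (const [name, value] of toSet) {
    if (!isProtectedHeader(name)) headers[name] = value;
  }

  for (const name of toRemove) {
    if (!isProtectedHeader(name)) delete headers[name];
  }

  return headers;
}
```

Silently skipping the protected header keeps the fluent chain usable; throwing instead would be an equally defensible design.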

      // Apply header modifications.
      const headers: CounterfactResponseObject["headers"] = {
        ...(response.headers ?? {}),
      };

      for (const [name, value] of this._headers) {
        headers[name] = value;
      }

      for (const name of this._removedHeaders) {
        delete headers[name];
      }

      // Apply body modifications.
      // The body is typed as unknown here because chaos rules may replace it
      // with any value (including plain objects that Koa will serialize to JSON).
      // The cast to CounterfactResponseObject['body'] is applied below when
      // setting the result property.
      let body: unknown = response.body;

      if (this._body !== UNSET) {
        body = this._body;
      } else if (this._transformBody !== undefined) {
        body = this._transformBody(body);
      }

      const result: CounterfactResponseObject = {
        ...response,
        // Cast is safe: Koa serializes object bodies to JSON at the middleware level.
        body: body as CounterfactResponseObject["body"],
        headers,
      };

      if (this._status !== undefined) {
        result.status = this._status;
      }

      const delayMs = this._isTimeout ? CHAOS_TIMEOUT_DELAY_MS : this._delay;

      return { response: result, delayMs };
    }


      ? Object.fromEntries(groupedBindings.map((binding) => [binding.key, {}]))
      : {};

    if (chaosRegistry !== undefined) {

@copilot, again there should be a single chaosRegistry at the server level.

      primaryRunner.scenarioRegistry,
      runners.map((runner) => ({
        contextRegistry: runner.contextRegistry,
        group: runner.group,
        openApiDocument: runner.openApiDocument,
        registry: runner.registry,
        scenarioRegistry: runner.scenarioRegistry,
      })),
      primaryRunner.chaosRegistry,
    ),

@copilot the chaosRegistry shouldn't be tied to a runner. There should be one registry at the server level.
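
The wiring being asked for can be sketched roughly as follows. Everything here is a hypothetical stand-in (`SharedRegistrySketch`, `RunnerSketch`, `createServerSketch`); the only point it illustrates is that the registry is created once at the server level and the same instance is handed to every runner.

```typescript
// Hypothetical minimal shapes; only the "one shared registry" wiring matters.
class SharedRegistrySketch {
  public readonly rulePaths: string[] = [];

  public createRule(path = ""): void {
    this.rulePaths.push(path);
  }
}

class RunnerSketch {
  public constructor(
    public readonly group: string,
    public readonly chaosRegistry: SharedRegistrySketch,
  ) {}
}

function createServerSketch(groups: readonly string[]) {
  // Created exactly once per server, not once per runner.
  const chaosRegistry = new SharedRegistrySketch();
  const runners = groups.map((group) => new RunnerSketch(group, chaosRegistry));

  return { chaosRegistry, runners };
}
```

With this shape, a rule registered through the server-level registry is visible to every API group, which matches the shared behavior described later in the PR summary.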

    | `.header(name, value)` | Set or replace a response header. |
    | `.removeHeader(name)` | Remove a response header if present. |
    | `.body(value)` | Replace the response body. |
    | `.transformBody(fn)` | Transform the response body: `fn` receives the current body and returns the new one. |

pmcelhaney left a comment:
See Copilot's comments and my responses.

      this._removedHeaders.add(name);
      this.touch();
      return this;
    }

    public probability(value: number): this {
      this._probability = value;
      this.touch();
      return this;
    }

@copilot throw an error if it's outside of range
Implemented in 8e7af9d: chaos().probability(value) now throws a RangeError when value is outside [0, 1] (including NaN), and I added unit coverage for invalid inputs in test/server/chaos.test.ts.
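
A range check of this kind might look like the sketch below. The helper name `assertProbability` and the error message are assumptions; the source only states that a `RangeError` is thrown for values outside `[0, 1]`, including `NaN`.

```typescript
// Hypothetical validation helper; the name and message text are assumptions.
function assertProbability(value: number): number {
  // The negated form also rejects NaN, because every comparison with NaN is false.
  if (!(value >= 0 && value <= 1)) {
    throw new RangeError(`probability must be between 0 and 1, got ${value}`);
  }

  return value;
}
```

Writing the condition as `!(value >= 0 && value <= 1)` rather than `value < 0 || value > 1` is what makes the `NaN` case fail without a separate `Number.isNaN` branch.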

    const cr = new ChaosRegistry();
    cr.createRule("/orders").always().delay(2_000);
    const dispatcher = makeDispatcher(cr);

    const response = await get(dispatcher, "/orders");
    expect(response.status).toBe(200);
    expect(delays).toContain(2_000);

    jest.restoreAllMocks();

pmcelhaney left a comment:
@copilot get rid of timeout(). It could leave CI processes hanging for days. Users can use delay() with a large enough value to simulate timeout scenarios.
Removed in 4f2a2d6: |
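
When a large `delay()` stands in for a timeout, it helps to keep in mind the 32-bit timer limit mentioned in the earlier commit message: Node.js `setTimeout` treats delays above 2,147,483,647 ms as 1 ms. The sketch below is not part of the PR; `MAX_TIMER_DELAY_MS` and `clampDelay` are hypothetical names showing one way a caller could keep a delay inside that limit.

```typescript
// Hypothetical guard; the PR does not state that it clamps delays this way.
const MAX_TIMER_DELAY_MS = 2 ** 31 - 1; // 2_147_483_647 ms, roughly 24.8 days

function clampDelay(ms: number): number {
  // Treat negative, NaN, and infinite inputs as "no delay".
  if (!Number.isFinite(ms) || ms < 0) return 0;

  return Math.min(Math.floor(ms), MAX_TIMER_DELAY_MS);
}
```

For timeout-like tests, a delay slightly longer than the client's own timeout is usually enough and avoids leaving a process waiting far longer than intended.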
pmcelhaney left a comment:
@copilot Add a pattern to the docs for simulating faults using chaos().
Agent-Logs-Url: https://github.com/counterfact/api-simulator/sessions/29a1c8e4-af2b-414e-b266-0b4dfcefcfc5 Co-authored-by: pmcelhaney <51504+pmcelhaney@users.noreply.github.com>
Added in 08e3556: the Chaos API docs now include a dedicated “Fault simulation pattern” section showing how to simulate intermittent faults with |
@copilot add a new pattern under docs/patterns explaining how to test fault scenarios using the chaos() API
Added in e24f997: I created a new pattern at |
Summary
Implements the `chaos()` API for HTTP-layer fault injection and updates runtime wiring so chaos rules are managed by a single server-level `ChaosRegistry` shared across all API runners (including multi-API mode). The REPL `chaos()` global now affects all configured API groups consistently.

Follow-up feedback updates were also applied:
- `Content-Type` is explicitly protected from chaos header mutation APIs (`header()`/`removeHeader()` ignore it).
- `probability(value)` now throws a `RangeError` when the value is outside `0..1` (including `NaN`).
- `timeout()` was removed from the chaos API to avoid long-running timeout behavior; use `delay(ms)` for timeout-like scenarios instead.
- A new pattern under `docs/patterns` explains how to test fault scenarios with `chaos()`.

Original Prompt
Implement a `chaos()` API for injecting HTTP-layer faults into simulated API responses, expose it in the REPL, and support rule matching/lifecycle behavior (next/always/probability/status/delay/timeout/header/removeHeader/body/transformBody/stop/start) with deterministic rule selection and comprehensive test coverage.

Manual acceptance tests
- Run `yarn go:example`, run `chaos("/pets").next(3).status(503)`, and verify the next 3 `/pets*` requests return HTTP 503.
- Run `chaos().always().delay(2000)` and verify responses are delayed by about 2 seconds.
- Run `const f = chaos("/pets").always().status(500)`, verify `/pets*` returns 500, then run `f.stop()` and verify `/pets*` returns normal status again.
- Run `chaos().always().status(503)` once in the REPL and verify requests across different API groups all return 503 (shared server-level registry behavior).
- Run `chaos("/pets").always().probability(1.5)` (and `chaos("/pets").always().probability(NaN)`) and verify an out-of-range probability error is thrown.
- Open `docs/patterns/index.md` and verify it links to Test Fault Scenarios with Chaos Rules, then open that page and verify it documents intermittent failures (`probability`), bounded failures (`next(count)`), and stopping rules (`stop()`).

Tasks
- Added `ChaosRule` and `ChaosRegistry` in `src/server/chaos.ts` with fluent rule configuration and deterministic rule selection.
- Integrated chaos rule application into `Dispatcher.request()` after normal response processing.
- `counterfact()` creates one `ChaosRegistry` and injects it into all `ApiRunner` instances and the REPL.
- `header("content-type", ...)` and `removeHeader("content-type")` are ignored.
- Added range validation (`0..1`) in `ChaosRule.probability()` with error-throwing behavior for invalid inputs.
- Removed `timeout()` from `ChaosRule` and deleted internal max-delay timeout behavior.
- Added `docs/patterns/test-fault-scenarios-with-chaos.md` and linked it from `docs/patterns/index.md`.
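
The lifecycle behavior exercised by the acceptance tests (`next(count)`, `always()`, `stop()`, `start()`) can be approximated with a small state machine. This is a sketch under assumptions, not the PR's `ChaosRule`: the class name `LifecycleRuleSketch`, the `shouldFire` method, and the default count of 1 are hypothetical.

```typescript
// Hypothetical model of rule lifecycle; not the PR's ChaosRule class.
class LifecycleRuleSketch {
  private remaining: number | "always" = 0;
  private stopped = false;

  // Fire for the next `count` matching requests (default of 1 is an assumption).
  public next(count = 1): this {
    this.remaining = count;
    return this;
  }

  public always(): this {
    this.remaining = "always";
    return this;
  }

  public stop(): this {
    this.stopped = true;
    return this;
  }

  public start(): this {
    this.stopped = false;
    return this;
  }

  // Returns true when the rule should fire, consuming one use if bounded.
  public shouldFire(): boolean {
    if (this.stopped) return false;
    if (this.remaining === "always") return true;
    if (this.remaining <= 0) return false;

    this.remaining -= 1;
    return true;
  }
}
```

Under this model, a rule configured with `next(3)` fires for exactly three requests and then goes quiet, while an `always()` rule keeps firing until `stop()` is called and resumes after `start()`.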