
feat: add chaos() API for HTTP-layer fault injection #2119

Open
Copilot wants to merge 14 commits into main from copilot/add-chaos-api-for-http-layer

Conversation

Contributor

Copilot AI commented May 25, 2026

Summary

Implements the chaos() API for HTTP-layer fault injection and updates runtime wiring so chaos rules are managed by a single server-level ChaosRegistry shared across all API runners (including multi-API mode). The REPL chaos() global now affects all configured API groups consistently.

Follow-up feedback updates were also applied:

  • Content-Type is explicitly protected from chaos header mutation APIs (header() / removeHeader() ignore it).
  • probability(value) now throws a RangeError when the value is outside 0..1 (including NaN).
  • timeout() was removed from the chaos API to avoid long-running timeout behavior; use delay(ms) for timeout-like scenarios instead.
  • Chaos docs now include:
    • a dedicated fault simulation pattern in the API reference, and
    • a new usage pattern page under docs/patterns explaining how to test fault scenarios with chaos().

Original Prompt

Implement a chaos() API for injecting HTTP-layer faults into simulated API responses, expose it in the REPL, and support rule matching/lifecycle behavior (next/always/probability/status/delay/timeout/header/removeHeader/body/transformBody/stop/start) with deterministic rule selection and comprehensive test coverage.

Manual acceptance tests

  • Start the REPL with yarn go:example, run chaos("/pets").next(3).status(503), and verify the next 3 /pets* requests return HTTP 503.
  • In REPL, run chaos().always().delay(2000) and verify responses are delayed by about 2 seconds.
  • In REPL, create const f = chaos("/pets").always().status(500), verify /pets* returns 500, then run f.stop() and verify /pets* returns normal status again.
  • In a multi-API setup, run chaos().always().status(503) once in REPL and verify requests across different API groups all return 503 (shared server-level registry behavior).
  • In REPL, run chaos("/pets").always().probability(1.5) (and chaos("/pets").always().probability(NaN)) and verify an out-of-range probability error is thrown.
  • Open docs/patterns/index.md and verify it links to Test Fault Scenarios with Chaos Rules, then open that page and verify it documents intermittent failures (probability), bounded failures (next(count)), and stopping rules (stop()).

Tasks

  • Added ChaosRule and ChaosRegistry in src/server/chaos.ts with fluent rule configuration and deterministic rule selection.
  • Integrated chaos application into Dispatcher.request() after normal response processing.
  • Refactored runtime wiring so counterfact() creates one ChaosRegistry and injects it into all ApiRunner instances and the REPL.
  • Updated tests to validate shared-registry behavior across multi-API groups, including REPL-driven chaos affecting all groups.
  • Added guards so header("content-type", ...) and removeHeader("content-type") are ignored.
  • Added probability range validation (0..1) in ChaosRule.probability() with error-throwing behavior for invalid inputs.
  • Removed timeout() from ChaosRule and deleted internal max-delay timeout behavior.
  • Added/updated chaos tests covering invalid probability values and removed timeout-specific tests.
  • Updated docs with Chaos API reference changes, including the Content-Type mutation limitation, timeout API removal, and a reusable fault simulation pattern.
  • Added a new docs pattern page at docs/patterns/test-fault-scenarios-with-chaos.md and linked it from docs/patterns/index.md.

Copilot AI added 2 commits May 25, 2026 00:51
- Add ChaosRule and ChaosRegistry classes in src/server/chaos.ts
- Integrate ChaosRegistry into Dispatcher (applies rules post-response)
- Create ChaosRegistry in ApiRunner and pass it to Dispatcher
- Expose chaos() global in REPL via startRepl parameter
- Thread chaosRegistry from app.ts -> startReplServer
- Add 48 tests covering all required scenarios in test/server/chaos.test.ts
- Document the chaos API in docs/reference.md

Agent-Logs-Url: https://github.com/counterfact/api-simulator/sessions/1b9bc7aa-b2e1-4090-9a24-b1165b72d855
- Improve CHAOS_TIMEOUT_DELAY_MS comment explaining Node.js 32-bit int constraint
- Replace `any` body type in ChaosRule.tryApply with `unknown` + explicit cast
- Extract PROBABILITY_TEST_ITERATIONS constant in test file

Agent-Logs-Url: https://github.com/counterfact/api-simulator/sessions/1b9bc7aa-b2e1-4090-9a24-b1165b72d855
Copilot AI requested review from Copilot and removed request for Copilot May 25, 2026 00:55
Copilot AI requested a review from pmcelhaney May 25, 2026 00:56
@pmcelhaney pmcelhaney marked this pull request as ready for review May 25, 2026 00:59
Copilot AI review requested due to automatic review settings May 25, 2026 00:59
Contributor

Copilot AI left a comment

Pull request overview

Adds a new HTTP-layer fault-injection facility (“chaos rules”) to Counterfact, wiring it through the runtime dispatcher and exposing it as a REPL global for interactive testing of failure scenarios.

Changes:

  • Added ChaosRule / ChaosRegistry with fluent configuration (status, delay/timeout, headers, body transforms) and deterministic “most recently updated” precedence.
  • Integrated chaos rule application into Dispatcher.request() and plumbed a registry from ApiRunner through to the REPL.
  • Added comprehensive unit/integration tests and documented the new Chaos API in the reference docs.
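
The "most recently updated" precedence mentioned above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in rather than the PR's actual `ChaosRule` / `ChaosRegistry` code: the class names, the shared counter, and the assumption that a rule matches any path starting with its prefix are all illustrative.

```typescript
// Hypothetical stand-ins; not the PR's actual implementation.
let clock = 0; // monotonically increasing counter used to order updates

class SketchRule {
  public readonly prefix: string;
  public updatedAt = 0;
  public statusCode: number | undefined;

  public constructor(prefix: string) {
    this.prefix = prefix;
    this.touch();
  }

  // Every configuration call refreshes the rule's position in the ordering.
  public touch(): void {
    clock += 1;
    this.updatedAt = clock;
  }

  public status(code: number): this {
    this.statusCode = code;
    this.touch();
    return this;
  }

  // Assumption: a rule matches any path that starts with its prefix.
  public matches(path: string): boolean {
    return path.startsWith(this.prefix);
  }
}

class SketchRegistry {
  private readonly rules: SketchRule[] = [];

  public createRule(prefix = ""): SketchRule {
    const rule = new SketchRule(prefix);
    this.rules.push(rule);
    return rule;
  }

  // Among matching rules, the one touched most recently wins.
  public findBestMatch(path: string): SketchRule | undefined {
    let best: SketchRule | undefined;
    for (const rule of this.rules) {
      if (rule.matches(path) && (best === undefined || rule.updatedAt > best.updatedAt)) {
        best = rule;
      }
    }
    return best;
  }
}

const registry = new SketchRegistry();
const older = registry.createRule("/pets").status(500);
const newer = registry.createRule("/pets").status(503);
console.log(registry.findBestMatch("/pets/1") === newer); // true

older.status(502); // reconfiguring touches the older rule again
console.log(registry.findBestMatch("/pets/1") === older); // true
```

Because selection depends only on the update order, the same sequence of calls always selects the same rule, which is what makes the behavior deterministic in this sketch.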

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| test/server/chaos.test.ts | New test suite covering rule behavior, registry selection, and dispatcher integration. |
| src/server/dispatcher.ts | Accepts an optional ChaosRegistry and applies a selected rule to responses (with optional delay). |
| src/server/chaos.ts | New implementation of ChaosRule + ChaosRegistry and exported timeout delay constant. |
| src/repl/repl.ts | Adds optional ChaosRegistry to REPL startup and exposes chaos() global when present. |
| src/app.ts | Passes the runner’s ChaosRegistry into REPL startup. |
| src/api-runner.ts | Instantiates a ChaosRegistry and injects it into the Dispatcher. |
| docs/reference.md | Adds a Chaos API reference section, usage, semantics, and examples. |

Comment thread src/server/dispatcher.ts
Comment on lines +708 to +725
// Apply chaos rules after normal response processing.
if (this.chaosRegistry !== undefined) {
  const rule = this.chaosRegistry.findBestMatch(path);

  if (rule !== undefined) {
    const chaosResult = rule.tryApply(normalizedResponse);

    if (chaosResult !== null) {
      if (chaosResult.delayMs !== undefined && chaosResult.delayMs > 0) {
        await new Promise<void>((resolve) => {
          setTimeout(resolve, chaosResult.delayMs);
        });
      }

      return chaosResult.response;
    }
  }
}

Collaborator

@copilot to be safe let's just have a special case that contentType cannot be modified or removed.

Contributor Author

Implemented: chaos().header("content-type", ...) and chaos().removeHeader("content-type") are now ignored, with tests and docs updated accordingly. Addressed in commits 20fc1df and 831a623.
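
A guard of this kind might look roughly like the sketch below. The helper names (`isProtected`, `HeaderEdits`) are hypothetical, and the case-insensitive comparison is an assumption; the PR only states that `header("content-type", ...)` and `removeHeader("content-type")` are ignored.

```typescript
// Hypothetical helper names; in the PR the guard lives inside ChaosRule.
const PROTECTED_HEADER = "content-type";

function isProtected(name: string): boolean {
  // Assumption: the comparison is case-insensitive, as HTTP header names are.
  return name.toLowerCase() === PROTECTED_HEADER;
}

class HeaderEdits {
  private readonly assigned = new Map<string, string>();
  private readonly removed = new Set<string>();

  // Silently ignores attempts to set the protected header.
  public header(name: string, value: string): this {
    if (!isProtected(name)) {
      this.assigned.set(name, value);
    }
    return this;
  }

  // Silently ignores attempts to remove the protected header.
  public removeHeader(name: string): this {
    if (!isProtected(name)) {
      this.removed.add(name);
    }
    return this;
  }

  // Same order as the reviewed diff: set headers first, then remove.
  public applyTo(headers: Record<string, string>): Record<string, string> {
    const result: Record<string, string> = { ...headers };
    this.assigned.forEach((value, name) => {
      result[name] = value;
    });
    this.removed.forEach((name) => {
      delete result[name];
    });
    return result;
  }
}

const edits = new HeaderEdits()
  .header("Retry-After", "5")
  .header("Content-Type", "text/plain") // ignored
  .removeHeader("content-type"); // ignored

// The original content-type survives; Retry-After is added.
console.log(edits.applyTo({ "content-type": "application/json" }));
```

Ignoring the call rather than throwing keeps fluent chains working, which appears to match the "ignored" wording in the reply above.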

Comment thread src/server/chaos.ts
Comment on lines +264 to +304
  // Apply header modifications.
  const headers: CounterfactResponseObject["headers"] = {
    ...(response.headers ?? {}),
  };

  for (const [name, value] of this._headers) {
    headers[name] = value;
  }

  for (const name of this._removedHeaders) {
    delete headers[name];
  }

  // Apply body modifications.
  // The body is typed as unknown here because chaos rules may replace it
  // with any value (including plain objects that Koa will serialize to JSON).
  // The cast to CounterfactResponseObject['body'] is applied below when
  // setting the result property.
  let body: unknown = response.body;

  if (this._body !== UNSET) {
    body = this._body;
  } else if (this._transformBody !== undefined) {
    body = this._transformBody(body);
  }

  const result: CounterfactResponseObject = {
    ...response,
    // Cast is safe: Koa serializes object bodies to JSON at the middleware level.
    body: body as CounterfactResponseObject["body"],
    headers,
  };

  if (this._status !== undefined) {
    result.status = this._status;
  }

  const delayMs = this._isTimeout ? CHAOS_TIMEOUT_DELAY_MS : this._delay;

  return { response: result, delayMs };
}

Comment thread src/repl/repl.ts
? Object.fromEntries(groupedBindings.map((binding) => [binding.key, {}]))
: {};

if (chaosRegistry !== undefined) {

Collaborator

@copilot, again there should be a single chaosRegistry at the server level.

Contributor Author

Confirmed and already wired this way: there is a single server-level ChaosRegistry shared across all runners and the REPL. Addressed in commit 395d262, with cross-group behavior covered in 81e0ece.

Comment thread src/app.ts
Comment on lines 389 to 398
    primaryRunner.scenarioRegistry,
    runners.map((runner) => ({
      contextRegistry: runner.contextRegistry,
      group: runner.group,
      openApiDocument: runner.openApiDocument,
      registry: runner.registry,
      scenarioRegistry: runner.scenarioRegistry,
    })),
    primaryRunner.chaosRegistry,
  ),

Collaborator

@copilot the chaosRegistry shouldn't be tied to a runner. There should be one registry at the server level.

Contributor Author

Implemented this by moving chaos registry ownership to the server level in counterfact() and sharing a single instance across all ApiRunner dispatchers plus the REPL binding. Also added a multi-API test proving REPL chaos() affects all groups. Commits: 395d262 and 81e0ece.
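
The ownership change described here can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `counterfact()`: `SharedRegistry`, `GroupRunner`, `createServer`, and `replChaos` are hypothetical names, and rules are reduced to bare path prefixes.

```typescript
// Hypothetical stand-ins for illustration; not the PR's actual classes.
class SharedRegistry {
  public readonly activeRules: string[] = [];
}

class GroupRunner {
  public readonly group: string;
  private readonly chaos: SharedRegistry;

  public constructor(group: string, chaos: SharedRegistry) {
    this.group = group;
    this.chaos = chaos;
  }

  // A request is affected when any rule prefix in the shared registry matches.
  public isAffected(path: string): boolean {
    return this.chaos.activeRules.some((prefix) => path.startsWith(prefix));
  }
}

// Server-level wiring: one registry, created once, passed to every consumer.
function createServer(groups: string[]) {
  const chaos = new SharedRegistry();
  const runners = groups.map((group) => new GroupRunner(group, chaos));

  // The REPL global is a thin function over the same instance.
  const replChaos = (prefix = ""): void => {
    chaos.activeRules.push(prefix);
  };

  return { runners, replChaos };
}

const server = createServer(["billing", "inventory"]);
server.replChaos(); // one call from the REPL, with no path filter
console.log(server.runners.map((runner) => runner.isAffected("/anything"))); // [ true, true ]
```

Because no runner constructs its own registry, there is no "primary" runner whose rules the REPL would otherwise be tied to.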

Comment thread docs/reference.md Outdated
Comment on lines +402 to +406
| `.header(name, value)` | Set or replace a response header. |
| `.removeHeader(name)` | Remove a response header if present. |
| `.body(value)` | Replace the response body. |
| `.transformBody(fn)` | Transform the response body: `fn` receives the current body and returns the new one. |

Collaborator

@pmcelhaney pmcelhaney left a comment

See copilot's comments and my responses.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comment thread src/server/chaos.ts
Comment on lines +186 to +189
  this._removedHeaders.add(name);
  this.touch();
  return this;
}

Comment thread src/server/chaos.ts
Comment on lines +120 to +124
public probability(value: number): this {
  this._probability = value;
  this.touch();
  return this;
}

Collaborator

@copilot throw an error if it's outside of range

Contributor Author

Implemented in 8e7af9d: chaos().probability(value) now throws a RangeError when value is outside [0, 1] (including NaN), and I added unit coverage for invalid inputs in test/server/chaos.test.ts.
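
For reference, a range check that also rejects NaN can be written as a single negated comparison. The sketch below is a hypothetical free function; in the PR the check lives in `ChaosRule.probability()`, and the error message shown here is an assumption.

```typescript
// Hypothetical free function; the message text is illustrative only.
function validateProbability(value: number): number {
  // NaN makes both comparisons false, so negating the "in range" test
  // rejects it without a separate Number.isNaN check.
  const inRange = value >= 0 && value <= 1;
  if (!inRange) {
    throw new RangeError(`probability must be between 0 and 1, got ${value}`);
  }
  return value;
}

for (const candidate of [0, 0.2, 1, 1.5, -0.1, Number.NaN]) {
  try {
    validateProbability(candidate);
    console.log(candidate, "accepted");
  } catch (error) {
    console.log(candidate, error instanceof RangeError ? "rejected" : "unexpected error");
  }
}
```

Validating at configuration time means a typo such as `probability(20)` fails immediately in the REPL instead of silently producing a rule that always fires.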

Comment thread test/server/chaos.test.ts
Comment on lines +556 to +564
const cr = new ChaosRegistry();
cr.createRule("/orders").always().delay(2_000);
const dispatcher = makeDispatcher(cr);

const response = await get(dispatcher, "/orders");
expect(response.status).toBe(200);
expect(delays).toContain(2_000);

jest.restoreAllMocks();
Copilot AI requested a review from pmcelhaney May 25, 2026 15:12
Collaborator

@pmcelhaney pmcelhaney left a comment

@copilot get rid of timeout(). It could leave CI processes hanging for days. Users can use delay() with a large enough value to simulate timeout scenarios.

Contributor Author

Copilot AI commented May 25, 2026

@copilot get rid of timeout(). It could leave CI processes hanging for days. Users can use delay() with a large enough value to simulate timeout scenarios.

Removed in 4f2a2d6: ChaosRule.timeout() and its internal max-delay behavior are gone, related tests were removed, and the reference docs now describe using delay(ms) only. No UI changes in this update.
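
With `timeout()` gone, a timeout-like scenario reduces to a delay longer than the client's own deadline. The sketch below shows one way such a delay could be applied and observed in a test without actually waiting. `respondAfterDelay`, the injectable `Sleep` type, and `CLIENT_DEADLINE_MS` are hypothetical names introduced for illustration; they are not part of the PR.

```typescript
type Sleep = (ms: number) => Promise<void>;

// Default sleeper backed by a real timer.
const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });

// Hypothetical helper: hold a response for `delayMs`, then release it unchanged.
async function respondAfterDelay<T>(
  response: T,
  delayMs: number | undefined,
  sleep: Sleep = realSleep,
): Promise<T> {
  if (delayMs !== undefined && delayMs > 0) {
    await sleep(delayMs);
  }
  return response;
}

// In a test, a recording sleeper observes the requested delay without waiting.
const recorded: number[] = [];
const fakeSleep: Sleep = (ms) => {
  recorded.push(ms);
  return Promise.resolve();
};

// A "timeout-like" scenario: delay past the client's deadline, but by a bounded amount.
const CLIENT_DEADLINE_MS = 5_000; // hypothetical client setting
void respondAfterDelay({ status: 200 }, CLIENT_DEADLINE_MS + 1_000, fakeSleep);
console.log(recorded); // [ 6000 ]
```

The delay is always finite and chosen by the caller, so a forgotten rule can slow a CI run down but cannot hang it indefinitely.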

Copilot AI requested a review from pmcelhaney May 25, 2026 15:23
Collaborator

@pmcelhaney pmcelhaney left a comment

@copilot Add a pattern to the docs for simulating faults using chaos().

Contributor Author

Copilot AI commented May 25, 2026

@copilot Add a pattern to the docs for simulating faults using chaos().

Added in 08e3556: the Chaos API docs now include a dedicated “Fault simulation pattern” section showing how to simulate intermittent faults with chaos() using always(), probability(0.2), status(503), and Retry-After. No UI changes in this update.

Copilot AI requested a review from pmcelhaney May 25, 2026 22:15
@pmcelhaney
Collaborator

@copilot add a new pattern under docs/patterns explaining how to test fault scenarios using the chaos() API

Contributor Author

Copilot AI commented May 25, 2026

@copilot add a new pattern under docs/patterns explaining how to test fault scenarios using the chaos() API

Added in e24f997: I created a new pattern at docs/patterns/test-fault-scenarios-with-chaos.md explaining how to test fault scenarios with chaos() (intermittent failures, bounded outages, and stopping rules), and linked it from docs/patterns/index.md. No UI changes in this update.
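
The three scenarios named above (intermittent failures, bounded outages, and stopping rules) can be tried out with a small self-contained stand-in. `SketchChaosRule` is hypothetical and only mimics the fluent method names from this PR; the firing condition (`random() < probability`), the way `next(count)` is decremented, and the injectable `random` parameter are assumptions made so the behavior can be shown deterministically.

```typescript
// Hypothetical stand-in for a chaos rule; semantics are assumptions.
type Res = { status: number; headers: Record<string, string> };

class SketchChaosRule {
  private remaining: number | undefined; // undefined means "always"
  private chance = 1;
  private forcedStatus: number | undefined;
  private readonly extraHeaders: Record<string, string> = {};
  private active = true;

  public always(): this {
    this.remaining = undefined;
    return this;
  }

  public next(count: number): this {
    this.remaining = count;
    return this;
  }

  public probability(value: number): this {
    if (!(value >= 0 && value <= 1)) {
      throw new RangeError("probability must be in 0..1");
    }
    this.chance = value;
    return this;
  }

  public status(code: number): this {
    this.forcedStatus = code;
    return this;
  }

  public header(name: string, value: string): this {
    this.extraHeaders[name] = value;
    return this;
  }

  public stop(): this {
    this.active = false;
    return this;
  }

  // Returns the faulted response, or the original when the rule does not fire.
  public apply(response: Res, random: () => number = Math.random): Res {
    if (!this.active) return response;
    if (this.remaining !== undefined && this.remaining <= 0) return response;
    if (random() >= this.chance) return response;
    if (this.remaining !== undefined) this.remaining -= 1;
    return {
      status: this.forcedStatus ?? response.status,
      headers: { ...response.headers, ...this.extraHeaders },
    };
  }
}

const ok: Res = { status: 200, headers: {} };

// Intermittent failures: roughly 20% of requests get a 503 with Retry-After.
const flaky = new SketchChaosRule().always().probability(0.2).status(503).header("Retry-After", "5");
console.log(flaky.apply(ok, () => 0.1).status); // 503
console.log(flaky.apply(ok, () => 0.9).status); // 200

// Bounded outage: the next 3 requests fail, then traffic recovers.
const outage = new SketchChaosRule().next(3).status(503);
console.log([1, 2, 3, 4].map(() => outage.apply(ok).status)); // [ 503, 503, 503, 200 ]

// Stopping a rule restores normal responses.
const broken = new SketchChaosRule().always().status(500);
broken.stop();
console.log(broken.apply(ok).status); // 200
```

In the real REPL the equivalent calls would start from `chaos("/pets")`, as in the manual acceptance tests earlier in this thread.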

3 participants