From eb92b9d75fa8841746c6c7e81c0c86046c91d466 Mon Sep 17 00:00:00 2001
From: Christopher Tso
Date: Tue, 17 Mar 2026 23:50:07 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20rename=20assert=20=E2=86=92=20assertion?=
 =?UTF-8?q?s=20across=20all=20documentation?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Aligns documentation with the assert: → assertions: YAML key rename
completed in PR #604. Updates prose references, YAML examples, table
entries, SDK code samples, skill docs, and agent prompts across 17
files.

Co-Authored-By: Claude Opus 4.6
---
 README.md                                          |  2 +-
 apps/cli/README.md                                 |  2 +-
 .../content/docs/evaluation/eval-cases.mdx         | 38 +++++++++----------
 .../content/docs/evaluation/eval-files.mdx         | 12 +++---
 .../src/content/docs/evaluation/examples.mdx       |  2 +-
 .../src/content/docs/evaluation/rubrics.mdx        |  4 +-
 .../content/docs/evaluation/running-evals.mdx      |  2 +-
 apps/web/src/content/docs/evaluation/sdk.mdx       |  2 +-
 .../src/content/docs/evaluators/composite.mdx      |  4 +-
 .../docs/evaluators/custom-evaluators.mdx          |  6 +--
 .../content/docs/evaluators/llm-graders.mdx        |  6 +--
 .../docs/guides/agent-skills-evals.mdx             |  2 +-
 .../src/content/docs/guides/human-review.mdx       |  2 +-
 apps/web/src/content/docs/tools/convert.mdx        |  2 +-
 plugins/agentv-dev/agents/eval-analyzer.md         |  2 +-
 .../skills/agentv-eval-analyzer/SKILL.md           |  2 +-
 .../skills/agentv-eval-writer/SKILL.md             |  8 ++--
 17 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/README.md b/README.md
index 887765a98..0a160be47 100644
--- a/README.md
+++ b/README.md
@@ -309,7 +309,7 @@ const { results, summary } = await evaluate({
     {
       id: 'greeting',
       input: 'Say hello',
-      assert: [{ type: 'contains', value: 'Hello' }],
+      assertions: [{ type: 'contains', value: 'Hello' }],
     },
   ],
 });
diff --git a/apps/cli/README.md b/apps/cli/README.md
index 887765a98..0a160be47 100644
--- a/apps/cli/README.md
+++ b/apps/cli/README.md
@@ -309,7 +309,7 @@ const { results, summary } = await evaluate({
     {
       id: 'greeting',
       input: 'Say hello',
-      assert: [{ type: 'contains', value: 'Hello' }],
+      assertions: [{ type: 'contains', value: 'Hello' }],
     },
   ],
 });
diff --git a/apps/web/src/content/docs/evaluation/eval-cases.mdx b/apps/web/src/content/docs/evaluation/eval-cases.mdx
index a6ca3c705..ae24420c2 100644
--- a/apps/web/src/content/docs/evaluation/eval-cases.mdx
+++ b/apps/web/src/content/docs/evaluation/eval-cases.mdx
@@ -31,7 +31,7 @@ tests:
 | `workspace` | No | Per-case workspace config (overrides suite-level) |
 | `metadata` | No | Arbitrary key-value pairs passed to evaluators and workspace scripts |
 | `rubrics` | No | Structured evaluation criteria |
-| `assert` | No | Per-test evaluators |
+| `assertions` | No | Per-test evaluators |
 
 ## Input
 
@@ -87,7 +87,7 @@ tests:
     prompt: ./graders/depth.md
 ```
 
-Per-case `assert` evaluators are **merged** with root-level `assert` evaluators — test-specific evaluators run first, then root-level defaults are appended. To opt out of root-level defaults for a specific test, set `execution.skip_defaults: true`:
+Per-case `assertions` evaluators are **merged** with root-level `assertions` evaluators — test-specific evaluators run first, then root-level defaults are appended. To opt out of root-level defaults for a specific test, set `execution.skip_defaults: true`:
 
 ```yaml
 assertions:
@@ -99,7 +99,7 @@ tests:
   - id: normal-case
     criteria: Returns correct answer
     input: What is 2+2?
-    # Gets latency_check from root-level assert
+    # Gets latency_check from root-level assertions
 
   - id: special-case
     criteria: Handles edge case
@@ -161,7 +161,7 @@ The `metadata` field is included in the stdin JSON passed to lifecycle commands
 
 ## Per-Test Assertions
 
-The `assert` field defines evaluators directly on a test. It supports both deterministic assertion types and LLM-based rubric evaluation.
+The `assertions` field defines evaluators directly on a test. It supports both deterministic assertion types and LLM-based rubric evaluation.
 
 ### Deterministic Assertions
 
@@ -217,7 +217,7 @@ tests:
 
 ### Required Gates
 
-Any evaluator in `assert` can be marked as `required`. When a required evaluator fails, the overall test verdict is `fail` regardless of the aggregate score.
+Any evaluator in `assertions` can be marked as `required`. When a required evaluator fails, the overall test verdict is `fail` regardless of the aggregate score.
 
 | Value | Behavior |
 |-------|----------|
@@ -239,39 +239,39 @@ assertions:
 
 Required gates are evaluated after all evaluators run. If any required evaluator falls below its threshold, the verdict is forced to `fail`.
 
-### Assert Merge Behavior
+### Assertions Merge Behavior
 
-`assert` can be defined at both suite and test levels:
+`assertions` can be defined at both suite and test levels:
 
-- Per-test `assert` evaluators run first.
-- Suite-level `assert` evaluators are appended automatically.
+- Per-test `assertions` evaluators run first.
+- Suite-level `assertions` evaluators are appended automatically.
 - Set `execution.skip_defaults: true` on a test to skip suite-level defaults.
 
-## How `criteria` and `assert` Interact
+## How `criteria` and `assertions` Interact
 
-The `criteria` field is a **data field** that describes what the response should accomplish. It is not an evaluator itself — how it gets used depends on whether `assert` is present.
+The `criteria` field is a **data field** that describes what the response should accomplish. It is not an evaluator itself — how it gets used depends on whether `assertions` is present.
 
-### No `assert` — implicit LLM grader
+### No `assertions` — implicit LLM grader
 
-When a test has no `assert` field, a default `llm-grader` evaluator runs automatically and uses `criteria` as the evaluation prompt:
+When a test has no `assertions` field, a default `llm-grader` evaluator runs automatically and uses `criteria` as the evaluation prompt:
 
 ```yaml
 tests:
   - id: simple-eval
     criteria: Assistant correctly explains the bug and proposes a fix
     input: "Debug this function..."
-    # No assert → default llm-grader evaluates against criteria
+    # No assertions → default llm-grader evaluates against criteria
 ```
 
-### `assert` present — explicit evaluators only
+### `assertions` present — explicit evaluators only
 
-When `assert` is defined, only the declared evaluators run. No implicit grader is added. Graders that are declared (such as `llm-grader`, `code-grader`, or `rubrics`) receive `criteria` as input automatically.
+When `assertions` is defined, only the declared evaluators run. No implicit grader is added. Graders that are declared (such as `llm-grader`, `code-grader`, or `rubrics`) receive `criteria` as input automatically.
 
-If `assert` contains only deterministic evaluators (like `contains` or `regex`), the `criteria` field is not evaluated and a warning is emitted:
+If `assertions` contains only deterministic evaluators (like `contains` or `regex`), the `criteria` field is not evaluated and a warning is emitted:
 
 ```
-Warning: Test 'my-test': criteria is defined but no evaluator in assert
-will evaluate it. Add 'type: llm-grader' to assert, or remove criteria
+Warning: Test 'my-test': criteria is defined but no evaluator in assertions
+will evaluate it. Add 'type: llm-grader' to assertions, or remove criteria
 if it is documentation-only.
 ```
diff --git a/apps/web/src/content/docs/evaluation/eval-files.mdx b/apps/web/src/content/docs/evaluation/eval-files.mdx
index 4e9c010e5..0869163e9 100644
--- a/apps/web/src/content/docs/evaluation/eval-files.mdx
+++ b/apps/web/src/content/docs/evaluation/eval-files.mdx
@@ -37,7 +37,7 @@ tests:
 | `execution` | Default execution config (`target`, `fail_on_error`, etc.) |
 | `workspace` | Suite-level workspace config — inline object or string path to an [external workspace file](/guides/workspace-pool/#external-workspace-config) |
 | `tests` | Array of individual tests, or a string path to an external file |
-| `assert` | Suite-level evaluators appended to each test unless `execution.skip_defaults: true` is set on the test |
+| `assertions` | Suite-level evaluators appended to each test unless `execution.skip_defaults: true` is set on the test |
 | `input` | Suite-level input messages prepended to each test's input unless `execution.skip_defaults: true` is set on the test |
 
 ### Metadata Fields
@@ -70,9 +70,9 @@ tests:
     input: Screen "Acme Corp" against denied parties list
 ```
 
-### Suite-level Assert
+### Suite-level Assertions
 
-The `assert` field is the canonical way to define suite-level evaluators. Suite-level assertions are appended to every test's evaluators unless a test sets `execution.skip_defaults: true`.
+The `assertions` field is the canonical way to define suite-level evaluators. Suite-level assertions are appended to every test's evaluators unless a test sets `execution.skip_defaults: true`.
 
 ```yaml
 description: API response validation
@@ -88,11 +88,11 @@ tests:
     input: Check API health
 ```
 
-`assert` supports all evaluator types, including deterministic assertion types (`contains`, `regex`, `is_json`, `equals`) and `rubrics`. See [Tests](/evaluation/eval-cases/#per-test-assertions) for per-test assert usage.
+`assertions` supports all evaluator types, including deterministic assertion types (`contains`, `regex`, `is_json`, `equals`) and `rubrics`. See [Tests](/evaluation/eval-cases/#per-test-assertions) for per-test assertions usage.
 
 ### Suite-level Input
 
-The `input` field defines messages that are **prepended** to every test's input. This avoids repeating the same prompt or system context in each test case — following the same pattern as suite-level `assert`.
+The `input` field defines messages that are **prepended** to every test's input. This avoids repeating the same prompt or system context in each test case — following the same pattern as suite-level `assertions`.
 
 ```yaml
 description: Travel assistant evaluation
@@ -119,7 +119,7 @@ Suite-level `input` accepts the same formats as test-level `input`:
 - **String** — wrapped as `[{ role: "user", content: "..." }]`
 - **Message array** — used as-is, including file references
 
-To opt out for a specific test, set `execution.skip_defaults: true` (same flag that skips suite-level `assert`).
+To opt out for a specific test, set `execution.skip_defaults: true` (same flag that skips suite-level `assertions`).
 
 ### Suite-level Input Files
diff --git a/apps/web/src/content/docs/evaluation/examples.mdx b/apps/web/src/content/docs/evaluation/examples.mdx
index 655955418..d7c33e7e6 100644
--- a/apps/web/src/content/docs/evaluation/examples.mdx
+++ b/apps/web/src/content/docs/evaluation/examples.mdx
@@ -343,7 +343,7 @@ tests:
 
 ## Suite-level Input
 
-Share a common prompt or system instruction across all tests. Suite-level `input` messages are prepended to each test's input — like suite-level `assert` for evaluators:
+Share a common prompt or system instruction across all tests. Suite-level `input` messages are prepended to each test's input — like suite-level `assertions` for evaluators:
 
 ```yaml
 description: Travel assistant evaluation
diff --git a/apps/web/src/content/docs/evaluation/rubrics.mdx b/apps/web/src/content/docs/evaluation/rubrics.mdx
index 362a50d00..6f03d7c8c 100644
--- a/apps/web/src/content/docs/evaluation/rubrics.mdx
+++ b/apps/web/src/content/docs/evaluation/rubrics.mdx
@@ -5,11 +5,11 @@ sidebar:
   order: 3
 ---
 
-Rubrics are defined with `assert` entries and support binary checklist grading and score-range analytic grading.
+Rubrics are defined with `assertions` entries and support binary checklist grading and score-range analytic grading.
 
 ## Basic Usage
 
-The simplest form — list plain strings in `assert` and each one becomes a required criterion:
+The simplest form — list plain strings in `assertions` and each one becomes a required criterion:
 
 ```yaml
 tests:
diff --git a/apps/web/src/content/docs/evaluation/running-evals.mdx b/apps/web/src/content/docs/evaluation/running-evals.mdx
index 924c7ded3..9a565cfbd 100644
--- a/apps/web/src/content/docs/evaluation/running-evals.mdx
+++ b/apps/web/src/content/docs/evaluation/running-evals.mdx
@@ -262,7 +262,7 @@ The `--file` option reads a JSON file with `{ "output": "...", "input": "..." }`
 
 **Exit codes:** 0 if score >= 0.5 (pass), 1 if score < 0.5 (fail).
 
-This is the same interface that agent-orchestrated evals use — the EVAL.yaml transpiler emits `assert` instructions for code graders so external grading agents can execute them directly.
+This is the same interface that agent-orchestrated evals use — the EVAL.yaml transpiler emits `assertions` instructions for code graders so external grading agents can execute them directly.
 
 ## Agent-Orchestrated Evals
 
diff --git a/apps/web/src/content/docs/evaluation/sdk.mdx b/apps/web/src/content/docs/evaluation/sdk.mdx
index e52c0862c..f8fa2a139 100644
--- a/apps/web/src/content/docs/evaluation/sdk.mdx
+++ b/apps/web/src/content/docs/evaluation/sdk.mdx
@@ -105,7 +105,7 @@ const { results, summary } = await evaluate({
     {
       id: 'greeting',
       input: 'Say hello',
-      assert: [{ type: 'contains', value: 'Hello' }],
+      assertions: [{ type: 'contains', value: 'Hello' }],
     },
   ],
 });
diff --git a/apps/web/src/content/docs/evaluators/composite.mdx b/apps/web/src/content/docs/evaluators/composite.mdx
index 668e27ddb..9f7340a41 100644
--- a/apps/web/src/content/docs/evaluators/composite.mdx
+++ b/apps/web/src/content/docs/evaluators/composite.mdx
@@ -30,9 +30,9 @@ assertions:
 ```
 
 Each sub-evaluator runs independently, then the aggregator combines their results.
-Use `assert` for composite members. `evaluators` is still accepted for backward compatibility.
+Use `assertions` for composite members. `evaluators` is still accepted for backward compatibility.
 
-If you only need weighted-average aggregation, a plain test-level `assert` list already computes a weighted mean across evaluators. Use `composite` when you need a custom aggregation strategy (`threshold`, `code_grader`, `llm_grader`) or nested evaluator groups.
+If you only need weighted-average aggregation, a plain test-level `assertions` list already computes a weighted mean across evaluators. Use `composite` when you need a custom aggregation strategy (`threshold`, `code_grader`, `llm_grader`) or nested evaluator groups.
 
 ## Aggregator Types
 
diff --git a/apps/web/src/content/docs/evaluators/custom-evaluators.mdx b/apps/web/src/content/docs/evaluators/custom-evaluators.mdx
index 73ee796e9..b7c847acf 100644
--- a/apps/web/src/content/docs/evaluators/custom-evaluators.mdx
+++ b/apps/web/src/content/docs/evaluators/custom-evaluators.mdx
@@ -13,11 +13,11 @@ AgentV supports multiple evaluator types that can be combined for comprehensive
 |------|-------------|----------|
 | `code_grader` | Deterministic command (Python/TS/any) | Exact matching, format validation, programmatic checks |
 | `llm_grader` | LLM-based evaluation with custom prompt | Semantic evaluation, nuance, subjective quality |
-| `rubrics` | Structured rubric evaluator via `assert` | Multi-criterion grading with weights |
+| `rubrics` | Structured rubric evaluator via `assertions` | Multi-criterion grading with weights |
 
 ## Referencing Evaluators
 
-Evaluators are configured using `assert` — either top-level (applies to all tests) or per-test:
+Evaluators are configured using `assertions` — either top-level (applies to all tests) or per-test:
 
 ### Top-Level (Default for All Tests)
 
@@ -72,7 +72,7 @@ tests:
 
 Each evaluator produces its own score. Results appear in `scores[]` in the output JSONL.
 
-For multiple evaluators in `assert`, the test score is the weighted mean:
+For multiple evaluators in `assertions`, the test score is the weighted mean:
 
 ```
 final_score = sum(score_i * weight_i) / sum(weight_i)
diff --git a/apps/web/src/content/docs/evaluators/llm-graders.mdx b/apps/web/src/content/docs/evaluators/llm-graders.mdx
index 246cffacc..2651fb8c2 100644
--- a/apps/web/src/content/docs/evaluators/llm-graders.mdx
+++ b/apps/web/src/content/docs/evaluators/llm-graders.mdx
@@ -9,17 +9,17 @@ LLM graders (also accepts `llm-judge` for backward compatibility) use a language
 
 ## Default Grader
 
-When a test defines `criteria` but has **no `assert` field**, a default `llm-grader` runs automatically. The built-in prompt evaluates the response against your `criteria` and `expected_output`:
+When a test defines `criteria` but has **no `assertions` field**, a default `llm-grader` runs automatically. The built-in prompt evaluates the response against your `criteria` and `expected_output`:
 
 ```yaml
 tests:
   - id: simple-eval
     criteria: Correctly explains the bug and proposes a fix
     input: "Debug this function..."
-    # No assert needed — default llm-grader evaluates against criteria
+    # No assertions needed — default llm-grader evaluates against criteria
 ```
 
-When `assert` **is** present, no default grader is added. To use an LLM grader alongside other evaluators, declare it explicitly. See [How criteria and assert interact](/evaluation/eval-cases/#how-criteria-and-assert-interact).
+When `assertions` **is** present, no default grader is added. To use an LLM grader alongside other evaluators, declare it explicitly. See [How criteria and assertions interact](/evaluation/eval-cases/#how-criteria-and-assertions-interact).
 
 ## Configuration
 
diff --git a/apps/web/src/content/docs/guides/agent-skills-evals.mdx b/apps/web/src/content/docs/guides/agent-skills-evals.mdx
index 6954360cc..a288711b2 100644
--- a/apps/web/src/content/docs/guides/agent-skills-evals.mdx
+++ b/apps/web/src/content/docs/guides/agent-skills-evals.mdx
@@ -56,7 +56,7 @@ When AgentV loads `evals.json`, it promotes fields to its internal representatio
 |---|---|---|
 | `prompt` | `input` | Wrapped as `[{role: "user", content: prompt}]` |
 | `expected_output` | `expected_output` + `criteria` | Used as reference answer and evaluation criteria |
-| `assertions[]` | `assert[]` | Each string becomes `{type: llm-grader, prompt: text}` |
+| `assertions[]` | `assertions[]` | Each string becomes `{type: llm-grader, prompt: text}` |
 | `files[]` | `file_paths` | Resolved relative to evals.json, copied into workspace |
 | `skill_name` | `metadata.skill_name` | Carried as metadata |
 | `id` (number) | `id` (string) | Converted via `String(id)` |
diff --git a/apps/web/src/content/docs/guides/human-review.mdx b/apps/web/src/content/docs/guides/human-review.mdx
index d8da87701..cff452ff9 100644
--- a/apps/web/src/content/docs/guides/human-review.mdx
+++ b/apps/web/src/content/docs/guides/human-review.mdx
@@ -147,7 +147,7 @@ For workspace evaluations with multiple evaluators (code graders, LLM graders, t
 }
 ```
 
-Keys use the format `evaluator-type:evaluator-name` to match the evaluators defined in `assert` blocks.
+Keys use the format `evaluator-type:evaluator-name` to match the evaluators defined in `assertions` blocks.
 
 ## Storing feedback across iterations
 
diff --git a/apps/web/src/content/docs/tools/convert.mdx b/apps/web/src/content/docs/tools/convert.mdx
index 417d0fe4c..aeac6640f 100644
--- a/apps/web/src/content/docs/tools/convert.mdx
+++ b/apps/web/src/content/docs/tools/convert.mdx
@@ -35,7 +35,7 @@ Converts an [Agent Skills `evals.json`](/guides/agent-skills-evals) file into an
 
 - Maps `prompt` → `input` message array
 - Maps `expected_output` → `expected_output`
-- Maps `assertions` → `assert` evaluators (llm-grader)
+- Maps `assertions` → `assertions` evaluators (llm-grader)
 - Resolves `files[]` paths relative to the evals.json directory
 - Adds TODO comments for AgentV-specific features (workspace setup, code graders, rubrics)
 
diff --git a/plugins/agentv-dev/agents/eval-analyzer.md b/plugins/agentv-dev/agents/eval-analyzer.md
index 31660128e..ad84ff377 100644
--- a/plugins/agentv-dev/agents/eval-analyzer.md
+++ b/plugins/agentv-dev/agents/eval-analyzer.md
@@ -48,7 +48,7 @@ For each evaluator entry in `scores` where `type` is `"llm-judge"` or `"rubrics"
 
 ### Step 3: Weak Assertion Detection
 
-Scan the EVAL.yaml `assert` entries (if `eval-path` provided) and the `reasoning` fields in results for weak assertions:
+Scan the EVAL.yaml `assertions` entries (if `eval-path` provided) and the `reasoning` fields in results for weak assertions:
 
 | Weakness | Detection | Improvement |
 |----------|-----------|-------------|
diff --git a/plugins/agentv-dev/skills/agentv-eval-analyzer/SKILL.md b/plugins/agentv-dev/skills/agentv-eval-analyzer/SKILL.md
index d059dc8d3..cf92fda0a 100644
--- a/plugins/agentv-dev/skills/agentv-eval-analyzer/SKILL.md
+++ b/plugins/agentv-dev/skills/agentv-eval-analyzer/SKILL.md
@@ -75,7 +75,7 @@ When results span multiple targets, flags evaluators with > 0.3 score variance a
 The analyzer report includes concrete YAML snippets for each suggestion. To apply:
 
 1. Open the EVAL.yaml referenced in the report
-2. Find the `assert` entry for the flagged evaluator (matched by `name` and `test_id`)
+2. Find the `assertions` entry for the flagged evaluator (matched by `name` and `test_id`)
 3. Replace or supplement the evaluator config with the suggested deterministic assertion
 4. Re-run `agentv eval` to verify the change produces equivalent scores
 
diff --git a/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md b/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md
index 5ae6275a3..a32746d6e 100644
--- a/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md
+++ b/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md
@@ -35,7 +35,7 @@ agentv prompt eval --input evals.json --test-id 1
 agentv prompt eval --expected-output evals.json --test-id 1
 ```
 
-The converter maps `prompt` → `input`, `expected_output` → `expected_output`, `assertions` → `assert` (llm-grader), and resolves `files[]` paths. The generated YAML includes TODO comments for AgentV features to add (workspace setup, code judges, rubrics, required gates).
+The converter maps `prompt` → `input`, `expected_output` → `expected_output`, `assertions` → `assertions` (llm-grader), and resolves `files[]` paths. The generated YAML includes TODO comments for AgentV features to add (workspace setup, code judges, rubrics, required gates).
 
 If you're running the lifecycle through `agentv-bench`, use `agentv convert` and `agentv prompt eval` directly — the Python scripts in `agentv-bench/scripts/` orchestrate these same commands.
 
@@ -158,7 +158,7 @@ requires:
 
 ## Suite-level Input
 
-Prepend shared input messages to every test (like suite-level `assert`). Avoids repeating the same prompt file in each test:
+Prepend shared input messages to every test (like suite-level `assertions`). Avoids repeating the same prompt file in each test:
 
 ```yaml
 input:
@@ -505,7 +505,7 @@ Binary check: is the output valid JSON?
 LLM-judged structured evaluation with weighted criteria. Criteria items support `id`, `outcome`, `weight`, and `required` fields.
 
 ### rubrics (inline, deprecated)
-Top-level `rubrics:` field is deprecated. Use `type: rubrics` under `assert` instead.
+Top-level `rubrics:` field is deprecated. Use `type: rubrics` under `assertions` instead.
 See `references/rubric-evaluator.md` for score-range mode and scoring formula.
 
 ## Execution Error Tolerance
@@ -607,7 +607,7 @@ const { results, summary } = await evaluate({
     {
       id: 'greeting',
       input: 'Say hello',
-      assert: [{ type: 'contains', value: 'hello' }],
+      assertions: [{ type: 'contains', value: 'hello' }],
     },
   ],
   target: { provider: 'mock_agent' },