diff --git a/.gitignore b/.gitignore
index fbb9182..1985ac9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -31,3 +31,6 @@ datasets/
 
 # Playwright MCP scratch output (screenshots, console logs, downloads)
 .playwright-mcp/
+
+# Full-run log from `sqlbench regen` (tee'd locally, not an artifact)
+regen.log
diff --git a/Cargo.toml b/Cargo.toml
index f813457..5943d3a 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -59,6 +59,18 @@ harness = false
 name = "sqlbench"
 path = "src/bin/sqlbench.rs"
 
+[[bin]]
+name = "build_sqlite_suite"
+path = "src/bin/build_sqlite_suite.rs"
+
+[[bin]]
+name = "build_proc_suites"
+path = "src/bin/build_proc_suites.rs"
+
+[[bin]]
+name = "repair_corpus"
+path = "src/bin/repair_corpus.rs"
+
 # Strip only DWARF debug info from release builds. The WASM viewer is built with
 # `dx build --web --release`; rustc's DWARF tripped wasm-opt ("unsupported
 # version of DWARF"), so removing it lets wasm-opt succeed and shrinks the wasm.
diff --git a/README.md b/README.md
index d9d9fa4..1c3a10f 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@ Choosing a SQL parser for a Rust project means weighing dialect coverage, correc
 
 We evaluated nine parser libraries: [sqlparser-rs](https://github.com/sqlparser-rs/sqlparser-rs) (Apache DataFusion), [pg_query.rs](https://github.com/pganalyze/pg_query.rs) and its faster summary mode (Rust bindings to [libpg_query](https://github.com/pganalyze/libpg_query), PostgreSQL's own parser), [databend-common-ast](https://crates.io/crates/databend-common-ast), [polyglot-sql](https://github.com/tobilg/polyglot), [sqlglot-rust](https://crates.io/crates/sqlglot-rust), [qusql-parse](https://crates.io/crates/qusql-parse), [sqlite3-parser](https://crates.io/crates/sqlite3-parser) (lemon-rs), and [turso_parser](https://crates.io/crates/turso_parser) (the SQLite parser from Turso), plus [orql](https://codeberg.org/xitep/orql) on Oracle. We ran them against a corpus of 340,938 statements spanning 13 dialects, drawn from each engine's own regression suites and official samples and committed compressed so every run is reproducible.
 
-We exercised each parser in the dialect that matches the corpus under test. Where a dialect has a runnable engine, we labelled each statement valid or invalid with the real database engine itself, run in Docker via [testcontainers](https://github.com/testcontainers/testcontainers-rs): a statement counts as valid unless the engine reports a syntax error, so a missing table or column still counts as parsed. Against that ground truth we scored the parsers on recall (valid statements accepted), false positives (invalid statements wrongly accepted), display round-trip stability, and canonical-form fidelity. The other dialects have no runnable engine, so their statements count as provenance-valid and the metric is simply the acceptance rate. Across all dialects, we captured speed as a per-statement parse-time distribution over every accepted statement, and memory as the peak and retained bytes per statement under a counting allocator. A batch axis additionally parses each parser's whole accepted set as a single script, showing what bulk parsing amortizes, and a time machine benchmarks the historical releases of every pure-Rust parser (59 versions in total, including every sqlparser-rs minor since January 2023), so each parser page also charts how coverage, speed, and memory evolved across releases.
+We exercised each parser in the dialect that matches the corpus under test. Where a dialect has a runnable engine, we labelled each statement valid or invalid with the real database engine itself, run in Docker via [testcontainers](https://github.com/testcontainers/testcontainers-rs): a statement counts as valid unless the engine reports a syntax error, so a missing table or column still counts as parsed. Against that ground truth we scored the parsers on recall (valid statements accepted), false positives (invalid statements wrongly accepted), and display round-trip stability. The other dialects have no runnable engine, so their statements count as provenance-valid and the metric is simply the acceptance rate. Across all dialects, we captured speed as a per-statement parse-time distribution over every accepted statement, and memory as the peak and retained bytes per statement under a counting allocator. A batch axis additionally parses each parser's whole accepted set as a single script, showing what bulk parsing amortizes, and a time machine benchmarks the historical releases of every pure-Rust parser (59 versions in total, including every sqlparser-rs minor since January 2023), so each parser page also charts how coverage, speed, and memory evolved across releases.
 
 On their home dialect the reference bindings are exact by construction, so the more telling comparison is among the pure-Rust parsers. There, [sqlparser-rs](https://github.com/sqlparser-rs/sqlparser-rs) is the most broadly capable, the permissive parsers such as [polyglot-sql](https://github.com/tobilg/polyglot) accept the most statements but pay for it with a high false-positive rate, and the stricter parsers reject more in exchange for precision. Speed spans more than an order of magnitude, from well under a microsecond per statement for the fastest parsers to the low single-digit microseconds for most, with [polyglot-sql](https://github.com/tobilg/polyglot) a clear outlier at roughly fifteen. No parser leads on every axis, so the right choice comes down to what a given project values most: broad coverage, few false positives, or raw speed.
 
@@ -38,7 +38,7 @@ Per-parser repository metadata (stars, contributors, fuzzing, test and benchmark
 
 340,938 statements across 32 files and 13 dialects, committed compressed as `datasets.tar.zst` (5.6 MB) and unpacked to `datasets/{dialect}/{name}.txt`, one statement per line. The commands below extract it automatically on first use. All sources are openly licensed (Apache-2.0, MIT, BSD, public domain or CC-BY), drawn from each engine's own regression suites and official samples. The SQLite corpus includes the SQLite project's own official test suite (public domain), which exercises SQLite-specific grammar such as PRAGMAs, virtual tables, recursive CTEs, and upsert. Natural-language-with-embedded-SQL datasets are intentionally excluded.
 
-Correctness is defined per dialect. Dialects with a runnable engine are graded against that real database engine, run in Docker via testcontainers by the `oracle` crate: a statement is valid unless the engine reports a syntax error (a missing table or column still counts as parsed). The validity labels are computed once and committed under `oracle/labels`, so grading and CI need no Docker. That reference splits the corpus into valid and invalid and scores recall, false positives, round-trip, and fidelity. Dialects with no runnable engine (cloud services, heavy JVM engines) have no reference, so their statements count as provenance-valid (sourced from each engine's own suites) and the metric is acceptance rate. Speed is a per-statement parse-time distribution over every accepted statement, timed with an adaptive iteration count on a no-`catch_unwind` path. Memory is measured separately with a counting allocator, as peak live bytes and retained (AST) bytes per statement. A companion batch axis parses each parser's whole accepted set as one script and normalizes the time and memory by the statement count, showing what bulk parsing amortizes against parsing one statement at a time. A batch that does not parse the whole set (a parser that bails out partway) is dropped rather than reported, and parsers without a multi-statement entry point (databend-common-ast) sit out the batch axis.
+Correctness is defined per dialect. Dialects with a runnable engine are graded against that real database engine, run in Docker via testcontainers by the `oracle` crate: a statement is valid unless the engine reports a syntax error (a missing table or column still counts as parsed). The validity labels are computed once and committed under `oracle/labels`, so grading and CI need no Docker. That reference splits the corpus into valid and invalid and scores recall, false positives, and round-trip. Dialects with no runnable engine (cloud services, heavy JVM engines) have no reference, so their statements count as provenance-valid (sourced from each engine's own suites) and the metric is acceptance rate. Speed is a per-statement parse-time distribution over every accepted statement, timed with an adaptive iteration count on a no-`catch_unwind` path. Memory is measured separately with a counting allocator, as peak live bytes and retained (AST) bytes per statement. A companion batch axis parses each parser's whole accepted set as one script and normalizes the time and memory by the statement count, showing what bulk parsing amortizes against parsing one statement at a time. A batch that does not parse the whole set (a parser that bails out partway) is dropped rather than reported, and parsers without a multi-statement entry point (databend-common-ast) sit out the batch axis.
 
 ## Running
 
diff --git a/benches/batch_parsing.rs b/benches/batch_parsing.rs
index 68c6650..51a5383 100644
--- a/benches/batch_parsing.rs
+++ b/benches/batch_parsing.rs
@@ -1,45 +1,39 @@
-//! Multi-dialect BATCH (whole-script) parse-time benchmark over the full
+//! Multi-dialect BATCH (multi-statement script) parse benchmark over the full
 //! `datasets/` corpus.
 //!
 //! Companion to `benches/parsing.rs`. Where `parsing` times each statement in
-//! isolation, this concatenates every statement a parser accepts in a dialect
-//! into one script and times parsing that whole script in a single call, then
-//! divides by the statement count to get a normalized per-statement cost. The
-//! contrast between this and the per-statement median isolates what a batch API
-//! pays or amortizes, the effect raised in issue #15: `Parser::parse_sql` grows
-//! a `Vec` of large `Statement` values, so bulk parsing can behave differently
-//! from many single-statement calls.
+//! isolation, this draws random fixed-size batches of statements a parser can
+//! individually digest, joins each into one script, and parses it in a single
+//! call. It reports two things per (parser, dialect): batch accuracy, the share
+//! of batches that reparse to exactly the expected statement count, and the
+//! per-statement parse time averaged over the batches that did. Sampling instead
+//! of concatenating the whole accepted set keeps one statement that mishandles
+//! the terminator (a real but narrow bug) from voiding the entire measurement
+//! under the all-or-nothing `parse_sql`.
 //!
-//! Both axes are measured over the SAME accepted set (statements the parser
-//! parses in that dialect), so the two numbers are directly comparable.
+//! The sampling, joining, and accuracy live in `sql_ast_benchmark::batch` so the
+//! memory bench (`membench -- batch`) and the time machine sample identically.
+//! Only parsers with a multi-statement entry point take part (`can_batch`).
 //!
-//! Only parsers with a multi-statement entry point take part (see
-//! `BenchParser::can_batch`). `databend-common-ast` parses one statement per
-//! call and is simply skipped here.
-//!
-//! Output (under `target/batch_dist/`), self-contained for now (not yet wired
-//! into the web export):
-//!   - `summary.csv` : per-pair statement count, statements the parser saw,
-//!     batch size in bytes, whole-script time, and time normalized per
-//!     statement.
+//! Output (`target/batch_dist/summary.csv`): for each pair, the eligible count,
+//! the number of batches, how many were correct, the accuracy percentage, and
+//! the per-statement time over correct batches.
 //!
 //! Full run:        `cargo bench --bench batch_parsing`
 //! Smoke (default): `cargo test` or `cargo bench --bench batch_parsing -- --test`
-//!
-//! The full run unpacks `datasets.tar.zst` automatically if `datasets/` is
-//! missing. The smoke path needs no corpus, so `cargo test` stays fast.
 
-use sql_ast_benchmark::batch::join_batch;
+use sql_ast_benchmark::batch::{evaluate_batches, reports_statement_count, BATCH_K, BATCH_M};
 use sql_ast_benchmark::datasets::Dialect;
 use sql_ast_benchmark::report::load_dialect;
 use sql_ast_benchmark::BenchParser;
 use std::fs;
 use std::hint::black_box;
 use std::io::Write as _;
+use std::panic::AssertUnwindSafe;
 use std::time::Instant;
 
 /// Deep statements can exhaust the default stack inside recursive-descent
-/// parsers, and a stack overflow aborts the process, so time on a large stack.
+/// parsers, and a stack overflow aborts the process, so run on a large stack.
 const WORKER_STACK: usize = 1024 * 1024 * 1024;
 
 const OUT_DIR: &str = "target/batch_dist";
@@ -60,14 +54,13 @@ const DIALECTS: &[Dialect] = &[
     Dialect::Multi,
 ];
 
-/// Whole-script parse time (ns/batch): adaptive iteration count so a short
-/// script still accumulates enough work per round, capped low because one batch
-/// call already does a lot. Best (min) of `ROUNDS` rounds.
-fn time_batch(mut f: impl FnMut() -> usize) -> f64 {
-    const TARGET_NS: u128 = 2_000_000; // aim for ~2 ms of work per round
+/// Whole-sweep parse time (ns): adaptive iteration count so a short sweep still
+/// accumulates enough work per round, best (min) of `ROUNDS` rounds.
+fn time_sweep(mut f: impl FnMut() -> usize) -> f64 {
+    const TARGET_NS: u128 = 2_000_000;
     const ROUNDS: usize = 5;
 
-    black_box(f()); // warm up
+    black_box(f());
     let probe = Instant::now();
     black_box(f());
     let single = probe.elapsed().as_nanos().max(1);
@@ -85,56 +78,74 @@ fn time_batch(mut f: impl FnMut() -> usize) -> f64 {
     best
 }
 
+/// Parse one script to a statement count, treating a caught panic as 0 so a
+/// single pathological input does not abort the whole (parser, dialect) pair.
+fn safe_count(parser: BenchParser, sql: &str, dialect: Dialect) -> usize {
+    std::panic::catch_unwind(AssertUnwindSafe(|| {
+        parser.parse_batch(sql, dialect).unwrap_or(0)
+    }))
+    .unwrap_or(0)
+}
+
 struct Row {
     dialect: &'static str,
     parser: &'static str,
-    /// Statements fed into the batch (the parser's accepted set).
-    n_accepted: usize,
-    /// Statements the parser reported parsing from the batch (coverage).
-    n_parsed: usize,
-    batch_bytes: usize,
-    /// Whole-script parse time (ns).
-    batch_ns: f64,
-    /// `batch_ns / n_accepted`: time per statement in batch context.
-    ns_per_stmt: f64,
+    n_eligible: usize,
+    k: usize,
+    n_correct: usize,
+    accuracy_pct: Option<f64>,
+    /// Per-statement parse time over the correct batches (ns), `None` when none.
+    ns_per_stmt: Option<f64>,
 }
 
-/// Time one (parser, dialect) pair: build the accepted set, concatenate it into
-/// one script, time the whole-script parse, and normalize per statement.
+/// Evaluate one (parser, dialect) pair: build the eligible set, sample batches,
+/// measure accuracy, and time the batches that parsed correctly.
 fn run_pair(parser: BenchParser, dialect: Dialect, stmts: &[String]) -> Row {
-    let accepted: Vec<&str> = stmts
+    // Eligible = accepted, parses to exactly one statement alone, and safe to
+    // batch (not COPY ... FROM STDIN). The single==1 check makes the expected
+    // per-batch count exactly the batch size.
+    let eligible: Vec<&str> = stmts
         .iter()
-        .filter(|s| parser.accepts(s, dialect) == Some(true))
+        .filter(|s| {
+            parser.accepts(s, dialect) == Some(true)
+                && sql_ast_benchmark::batch::batch_eligible(s)
+                && safe_count(parser, s, dialect) == 1
+        })
         .map(String::as_str)
         .collect();
 
-    let mut row = Row {
+    let label = format!("{}/{}", dialect.dir_name(), parser.name());
+    let eval = evaluate_batches(&eligible, &label, |s| safe_count(parser, s, dialect));
+
+    let ns_per_stmt = if eval.n_correct == 0 {
+        None
+    } else {
+        let denom = (eval.n_correct * eval.effective_m) as f64;
+        let sweep = time_sweep(|| {
+            eval.correct_scripts
+                .iter()
+                .map(|s| safe_count(parser, s, dialect))
+                .sum()
+        });
+        Some(sweep / denom)
+    };
+
+    Row {
         dialect: dialect.dir_name(),
         parser: parser.name(),
-        n_accepted: accepted.len(),
-        n_parsed: 0,
-        batch_bytes: 0,
-        batch_ns: 0.0,
-        ns_per_stmt: 0.0,
-    };
-    if accepted.is_empty() {
-        return row;
+        n_eligible: eval.n_eligible,
+        k: eval.k,
+        n_correct: eval.n_correct,
+        accuracy_pct: eval.accuracy_pct(),
+        ns_per_stmt,
     }
-
-    let batch = join_batch(&accepted);
-    row.batch_bytes = batch.len();
-    row.n_parsed = parser.parse_batch(&batch, dialect).unwrap_or(0);
-    row.batch_ns = time_batch(|| parser.parse_batch(&batch, dialect).unwrap_or(0));
-    row.ns_per_stmt = row.batch_ns / accepted.len() as f64;
-    row
 }
 
 /// Quick smoke check used by `cargo test`: every batch-capable parser parses a
-/// tiny multi-statement script per supported dialect without panicking. Needs
-/// no corpus, so it stays instant.
+/// tiny multi-statement script per supported dialect without panicking.
 fn smoke() {
     std::panic::set_hook(Box::new(|_| {}));
-    let script = "SELECT 1;\nSELECT 2;\nSELECT 3";
+    let script = "SELECT 1\n;\nSELECT 2\n;\nSELECT 3";
     for &dialect in DIALECTS {
         for parser in BenchParser::all() {
             if !parser.can_batch() || !parser.supports(dialect) {
@@ -147,9 +158,6 @@ fn smoke() {
 }
 
 fn main() {
-    // Match `benches/parsing.rs`: only an explicit `cargo bench` (which passes
-    // `--bench` and not `--test`) does the full, datasets-backed run. `cargo
-    // test` and a bare run take the fast smoke path, which needs no corpus.
     let args: Vec<String> = std::env::args().collect();
     let full_run = args.iter().any(|a| a == "--bench") && !args.iter().any(|a| a == "--test");
     if !full_run {
@@ -157,8 +165,6 @@ fn main() {
         return;
     }
 
-    // Acceptance checks are panic-guarded. Suppress the default panic message so
-    // a caught panic does not spam stderr.
     std::panic::set_hook(Box::new(|_| {}));
 
     if let Err(e) = sql_ast_benchmark::datasets::ensure_corpus() {
@@ -170,12 +176,13 @@ fn main() {
     let mut summary = fs::File::create(format!("{OUT_DIR}/summary.csv")).expect("summary.csv");
     writeln!(
         summary,
-        "dialect,parser,n_accepted,n_parsed,batch_bytes,batch_ns,ns_per_stmt"
+        "dialect,parser,n_eligible,k,n_correct,accuracy_pct,ns_per_stmt"
     )
     .unwrap();
 
     let parsers = BenchParser::all();
     let start_all = Instant::now();
+    println!("batch sampling: m={BATCH_M} statements, k={BATCH_K} batches per pair");
 
     for &dialect in DIALECTS {
         let stmts = load_dialect(dialect);
@@ -187,9 +194,12 @@ fn main() {
             if !parser.can_batch() || !parser.supports(dialect) {
                 continue;
             }
+            // Skip parsers whose batch entry point does not report a true
+            // statement count (e.g. pg_query summary returns distinct types).
+            if !reports_statement_count(|s| safe_count(parser, s, dialect)) {
+                continue;
+            }
             let job_start = Instant::now();
-            // Run on a large stack: deeply nested accepted statements can
-            // otherwise overflow the default stack and abort the process.
             let result = std::thread::scope(|scope| {
                 std::thread::Builder::new()
                     .stack_size(WORKER_STACK)
@@ -206,33 +216,31 @@ fn main() {
                 continue;
             };
 
+            let acc = row
+                .accuracy_pct
+                .map_or_else(String::new, |a| format!("{a:.3}"));
+            let ns = row
+                .ns_per_stmt
+                .map_or_else(String::new, |n| format!("{n:.1}"));
             writeln!(
                 summary,
-                "{},{},{},{},{},{:.1},{:.1}",
-                row.dialect,
-                row.parser,
-                row.n_accepted,
-                row.n_parsed,
-                row.batch_bytes,
-                row.batch_ns,
-                row.ns_per_stmt,
+                "{},{},{},{},{},{acc},{ns}",
+                row.dialect, row.parser, row.n_eligible, row.k, row.n_correct,
             )
             .unwrap();
             summary.flush().unwrap();
 
-            let coverage = if row.n_accepted == 0 {
-                0.0
-            } else {
-                100.0 * row.n_parsed as f64 / row.n_accepted as f64
-            };
             println!(
-                "{:<11} {:<24} n={:>6} seen={:>6} ({:>3.0}%) batch={:>9.0}ns/stmt  ({:.1}s)",
+                "{:<11} {:<24} elig={:>6} ok={:>3}/{:<3} acc={:>6} batch={:>9}ns/stmt  ({:.1}s)",
                 row.dialect,
                 row.parser,
-                row.n_accepted,
-                row.n_parsed,
-                coverage,
-                row.ns_per_stmt,
+                row.n_eligible,
+                row.n_correct,
+                row.k,
+                row.accuracy_pct
+                    .map_or_else(|| "n/a".to_string(), |a| format!("{a:.1}%")),
+                row.ns_per_stmt
+                    .map_or_else(|| "n/a".to_string(), |n| format!("{n:.0}")),
                 job_start.elapsed().as_secs_f64(),
             );
         }
diff --git a/datasets.tar.zst b/datasets.tar.zst
index 2f8ce01..1de3a6e 100644
Binary files a/datasets.tar.zst and b/datasets.tar.zst differ
diff --git a/membench/src/main.rs b/membench/src/main.rs
index ab62f6e..4ba900d 100644
--- a/membench/src/main.rs
+++ b/membench/src/main.rs
@@ -26,7 +26,7 @@ use std::fs;
 use std::io::Write as _;
 use std::path::Path;
 
-use sql_ast_benchmark::batch::join_batch;
+use sql_ast_benchmark::batch::{batch_eligible, evaluate_batches, reports_statement_count};
 use sql_ast_benchmark::datasets::{ensure_corpus, Dialect};
 use sql_ast_benchmark::stats::slug;
 use sql_ast_benchmark::BenchParser;
@@ -172,16 +172,27 @@ fn run() {
     }
 }
 
-/// Whole-script (batch) memory: one (peak, retained) pair per (parser, dialect),
-/// normalized per statement, written to a single summary file. Only parsers with
-/// a batch entry point whose memory is visible to the Rust allocator take part.
+/// Parse one script to a statement count under panic protection, so a single
+/// pathological input cannot abort the whole batch run.
+fn safe_count(parser: BenchParser, sql: &str, dialect: Dialect) -> usize {
+    std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
+        parser.parse_batch(sql, dialect).unwrap_or(0)
+    }))
+    .unwrap_or(0)
+}
+
+/// Batch memory: for each (parser, dialect) pair, sample the same random batches
+/// as the time bench (deterministic seed), measure peak/retained bytes over the
+/// batches that reparse correctly, and record them normalized per statement in
+/// `target/batch_mem_dist/summary.csv`. Only parsers whose memory is visible to
+/// the Rust allocator take part (the libpg_query bindings report `None`).
 fn run_batch() {
     fs::create_dir_all(BATCH_OUT_DIR).expect("create batch_mem_dist dir");
     let mut summary =
         fs::File::create(format!("{BATCH_OUT_DIR}/summary.csv")).expect("create summary.csv");
     writeln!(
         summary,
-        "dialect,parser,n_accepted,n_parsed,peak_bytes,retained_bytes,peak_per_stmt,retained_per_stmt"
+        "dialect,parser,n_eligible,k,n_correct,accuracy_pct,peak_per_stmt,retained_per_stmt"
     )
     .expect("write header");
 
@@ -194,46 +205,66 @@ fn run_batch() {
             if !parser.can_batch() || !parser.supports(dialect) {
                 continue;
             }
-            let accepted: Vec<&str> = stmts
-                .iter()
-                .filter(|s| parser.accepts(s, dialect) == Some(true))
-                .map(String::as_str)
-                .collect();
-            if accepted.is_empty() {
+            // Skip parsers whose memory is invisible to the Rust allocator (the
+            // libpg_query bindings parse in C and report None).
+            if parser.measure_mem_batch("SELECT 1", dialect).is_none() {
                 continue;
             }
-            let batch = join_batch(&accepted);
-            // Warm up: let one-time caches/lazy statics allocate first, so they
-            // raise the baseline rather than this measurement. Also skips
-            // parsers whose memory is invisible to the Rust allocator (None).
-            if parser.measure_mem_batch(&batch, dialect).is_none() {
+            // Skip parsers whose batch entry point does not report a true count.
+            if !reports_statement_count(|s| safe_count(parser, s, dialect)) {
                 continue;
             }
-            let Some((peak, retained)) = parser.measure_mem_batch(&batch, dialect) else {
-                continue;
+            let eligible: Vec<&str> = stmts
+                .iter()
+                .filter(|s| {
+                    parser.accepts(s, dialect) == Some(true)
+                        && batch_eligible(s)
+                        && safe_count(parser, s, dialect) == 1
+                })
+                .map(String::as_str)
+                .collect();
+            let label = format!("{}/{}", dialect.dir_name(), parser.name());
+            let eval = evaluate_batches(&eligible, &label, |s| safe_count(parser, s, dialect));
+
+            let (peak_per_stmt, retained_per_stmt) = if eval.n_correct == 0 {
+                (String::new(), String::new())
+            } else {
+                let mut peak_sum = 0u128;
+                let mut ret_sum = 0u128;
+                for s in &eval.correct_scripts {
+                    if let Some((peak, retained)) = parser.measure_mem_batch(s, dialect) {
+                        peak_sum += peak as u128;
+                        ret_sum += retained as u128;
+                    }
+                }
+                let denom = (eval.n_correct * eval.effective_m) as f64;
+                (
+                    format!("{:.1}", peak_sum as f64 / denom),
+                    format!("{:.1}", ret_sum as f64 / denom),
+                )
             };
-            // Statements the parser actually consumed from the script, so the
-            // export can drop a pair whose batch parse bailed out early.
-            let n_parsed = parser.parse_batch(&batch, dialect).unwrap_or(0);
 
-            let n = accepted.len() as f64;
+            let acc = eval
+                .accuracy_pct()
+                .map_or_else(String::new, |a| format!("{a:.3}"));
             writeln!(
                 summary,
-                "{},{},{},{n_parsed},{peak},{retained},{:.1},{:.1}",
+                "{},{},{},{},{},{acc},{peak_per_stmt},{retained_per_stmt}",
                 dialect.dir_name(),
                 parser.name(),
-                accepted.len(),
-                peak as f64 / n,
-                retained as f64 / n,
+                eval.n_eligible,
+                eval.k,
+                eval.n_correct,
             )
             .expect("write row");
             summary.flush().expect("flush summary");
-            let coverage = 100.0 * n_parsed as f64 / n;
             eprintln!(
-                "batch-mem {} {}: n={} seen={n_parsed} ({coverage:.0}%) peak={peak} retained={retained}",
+                "batch-mem {} {}: elig={} ok={}/{} peak/stmt={peak_per_stmt} ret/stmt={retained_per_stmt}",
                 dialect.dir_name(),
                 parser.name(),
-                accepted.len(),
+                eval.n_eligible,
+                eval.n_correct,
+                eval.k,
             );
         }
     }
diff --git a/oracle/labels/clickhouse.tsv.zst b/oracle/labels/clickhouse.tsv.zst
index 730a01b..42b489f 100644
Binary files a/oracle/labels/clickhouse.tsv.zst and b/oracle/labels/clickhouse.tsv.zst differ
diff --git a/oracle/labels/duckdb.tsv.zst b/oracle/labels/duckdb.tsv.zst
index fb26d46..a5d9e80 100644
Binary files a/oracle/labels/duckdb.tsv.zst and b/oracle/labels/duckdb.tsv.zst differ
diff --git a/oracle/labels/mysql.tsv.zst b/oracle/labels/mysql.tsv.zst
index 4aac84e..74c149a 100644
Binary files a/oracle/labels/mysql.tsv.zst and b/oracle/labels/mysql.tsv.zst differ
diff --git a/oracle/labels/postgresql.tsv.zst b/oracle/labels/postgresql.tsv.zst
index 11e9d47..847ecc9 100644
Binary files a/oracle/labels/postgresql.tsv.zst and b/oracle/labels/postgresql.tsv.zst differ
diff --git a/oracle/labels/sqlite.tsv.zst b/oracle/labels/sqlite.tsv.zst
index ad69657..7c89cf0 100644
Binary files a/oracle/labels/sqlite.tsv.zst and b/oracle/labels/sqlite.tsv.zst differ
diff --git a/oracle/labels/tsql.tsv.zst b/oracle/labels/tsql.tsv.zst
index edd6fcb..1ff35a9 100644
Binary files a/oracle/labels/tsql.tsv.zst and b/oracle/labels/tsql.tsv.zst differ
diff --git a/oracle/src/main.rs b/oracle/src/main.rs
index 88305c6..ea2b3ea 100644
--- a/oracle/src/main.rs
+++ b/oracle/src/main.rs
@@ -117,32 +117,114 @@ async fn label_postgresql(stmts: &[String]) -> Result<Vec<bool>> {
     let port = node.get_host_port_ipv4(5432).await?;
     let conn_str =
         format!("host={host} port={port} user=postgres password=postgres dbname=postgres");
-    let (client, connection) = tokio_postgres::connect(&conn_str, NoTls)
-        .await
-        .context("connect postgres")?;
-    tokio::spawn(async move {
-        let _ = connection.await;
-    });
 
     let mut valid = Vec::with_capacity(stmts.len());
-    for (i, s) in stmts.iter().enumerate() {
-        // Make sure no aborted transaction is left from a prior error.
-        let _ = client.batch_execute("ROLLBACK").await;
-        let _ = client.batch_execute("BEGIN").await;
-        let res = client.batch_execute(s).await;
-        let _ = client.batch_execute("ROLLBACK").await;
-        let v = match res {
-            Ok(()) => true,
-            Err(e) => e.code() != Some(&SqlState::SYNTAX_ERROR),
-        };
-        valid.push(v);
-        if i % 2000 == 0 {
-            eprintln!("  postgresql {i}/{}", stmts.len());
+    let mut reconnects = 0usize;
+    // A statement that terminates the backend twice at the same index is a
+    // confirmed "poison" (some pg_regress statements crash/kill the connection);
+    // it is marked invalid and skipped, mirroring the ClickHouse handling.
+    let mut death_idx: Option<usize> = None;
+    let mut death_count = 0usize;
+
+    'session: while valid.len() < stmts.len() {
+        let (client, connection) = tokio_postgres::connect(&conn_str, NoTls)
+            .await
+            .context("connect postgres")?;
+        tokio::spawn(async move {
+            let _ = connection.await;
+        });
+
+        let mut unreachable_streak = 0usize;
+        while valid.len() < stmts.len() {
+            let i = valid.len();
+            // Make sure no aborted transaction is left from a prior error.
+            let _ = client.batch_execute("ROLLBACK").await;
+            let _ = client.batch_execute("BEGIN").await;
+            let res = client.batch_execute(&stmts[i]).await;
+            let _ = client.batch_execute("ROLLBACK").await;
+            // A verdict only counts if the server actually answered: an error with
+            // a SQLSTATE is a real result (syntax error -> invalid; any other code
+            // means it parsed -> valid). An error with no code is a transport or
+            // connection failure, which must never be recorded as "valid".
+            let verdict = match res {
+                Ok(()) => Some(true),
+                Err(e) => e.code().map(|code| code != &SqlState::SYNTAX_ERROR),
+            };
+            match verdict {
+                Some(v) => {
+                    unreachable_streak = 0;
+                    valid.push(v);
+                    if i.is_multiple_of(2000) {
+                        eprintln!("  postgresql {i}/{}", stmts.len());
+                    }
+                }
+                None if is_copy_to_stdout(&stmts[i]) => {
+                    // `COPY ... TO STDOUT` parses fine but then streams rows over
+                    // the COPY sub-protocol, which `batch_execute` cannot consume,
+                    // so it breaks the connection with no SQLSTATE. Reaching that
+                    // stage proves it parsed (a syntax error would carry code
+                    // 42601 and be a real verdict above), so it is valid. Record it
+                    // and reconnect to replace the now-broken connection.
+                    valid.push(true);
+                    death_idx = None;
+                    death_count = 0;
+                    reconnects += 1;
+                    anyhow::ensure!(
+                        reconnects <= 50,
+                        "postgres reconnected {reconnects} times (last at statement {i}); aborting without writing a label cache"
+                    );
+                    continue 'session;
+                }
+                None if unreachable_streak + 1 < 6 => {
+                    unreachable_streak += 1;
+                    tokio::time::sleep(std::time::Duration::from_millis(200)).await;
+                }
+                None => {
+                    // Connection is gone: the backend died (often killed by the
+                    // statement itself). Reconnect and resume; if the same index
+                    // kills it twice, treat that statement as poison.
+                    if death_idx == Some(i) {
+                        death_count += 1;
+                    } else {
+                        death_idx = Some(i);
+                        death_count = 1;
+                    }
+                    if death_count >= 2 {
+                        eprintln!(
+                            "  postgresql: statement {i} repeatedly kills the backend; marking invalid and skipping: {}",
+                            stmts[i].chars().take(120).collect::<String>()
+                        );
+                        valid.push(false);
+                        death_idx = None;
+                        death_count = 0;
+                    } else {
+                        eprintln!(
+                            "  postgresql backend died at {i}/{}; reconnecting",
+                            stmts.len()
+                        );
+                    }
+                    reconnects += 1;
+                    anyhow::ensure!(
+                        reconnects <= 50,
+                        "postgres backend crashed {reconnects} times (last at statement {i}); aborting without writing a label cache"
+                    );
+                    continue 'session;
+                }
+            }
         }
     }
     Ok(valid)
 }
 
+/// Whether a statement looks like `COPY ... TO STDOUT`: SQL whose result is streamed
+/// over the COPY sub-protocol, which the simple-query probe cannot consume (it
+/// breaks the connection with no SQLSTATE). This is a textual check only; the caller
+/// consults it solely when no SQLSTATE came back, and a malformed COPY returns 42601.
+fn is_copy_to_stdout(stmt: &str) -> bool {
+    let up = stmt.trim_start().to_ascii_uppercase();
+    up.starts_with("COPY") && up.contains("TO STDOUT")
+}
+
 /// MySQL: real server in a container. We use `PREPARE`, MySQL's parse-only path:
 /// it parses (and name-resolves) without executing, so there are no side effects
 /// and nothing blocks. Invalid iff `PREPARE` fails with error 1064
@@ -164,23 +246,43 @@ async fn label_mysql(stmts: &[String]) -> Result<Vec<bool>> {
     let mut conn = pool.get_conn().await.context("connect mysql")?;
 
     let mut valid = Vec::with_capacity(stmts.len());
-    for (i, s) in stmts.iter().enumerate() {
-        let stmt = s.trim().trim_end_matches(';');
+    let mut unreachable_streak = 0usize;
+    let mut i = 0;
+    while i < stmts.len() {
+        let stmt = stmts[i].trim().trim_end_matches(';');
         // Bind the statement text as a parameter (no injection), then PREPARE it.
-        let v = match conn.exec_drop("SET @q = ?", (stmt,)).await {
+        // Only a `Server` response is a real verdict (error 1064 = syntax ->
+        // invalid, any other server error parsed -> valid). A non-server error is
+        // a transport/connection failure and must never be recorded as "valid".
+        let verdict = match conn.exec_drop("SET @q = ?", (stmt,)).await {
             Ok(()) => match conn.query_drop("PREPARE _ck FROM @q").await {
                 Ok(()) => {
                     let _ = conn.query_drop("DEALLOCATE PREPARE _ck").await;
-                    true
+                    Some(true)
                 }
-                Err(mysql_async::Error::Server(e)) => e.code != 1064,
-                Err(_) => true,
+                Err(mysql_async::Error::Server(e)) => Some(e.code != 1064),
+                Err(_) => None,
             },
-            Err(_) => true,
+            Err(mysql_async::Error::Server(e)) => Some(e.code != 1064),
+            Err(_) => None,
         };
-        valid.push(v);
-        if i % 2000 == 0 {
-            eprintln!("  mysql {i}/{}", stmts.len());
+        match verdict {
+            Some(v) => {
+                unreachable_streak = 0;
+                valid.push(v);
+                i += 1;
+                if i.is_multiple_of(2000) {
+                    eprintln!("  mysql {i}/{}", stmts.len());
+                }
+            }
+            None => {
+                unreachable_streak += 1;
+                anyhow::ensure!(
+                    unreachable_streak < 10,
+                    "mysql became unreachable at statement {i}; aborting without writing a label cache"
+                );
+                tokio::time::sleep(std::time::Duration::from_millis(500)).await;
+            }
         }
     }
     drop(conn);
@@ -192,44 +294,153 @@ async fn label_mysql(stmts: &[String]) -> Result<Vec<bool>> {
 /// parses only (no execution, no tables needed). Invalid iff the exception code
 /// is 62 (SYNTAX_ERROR). Any other code (unknown table/identifier, not
 /// implemented) means it parsed, so it is valid.
+///
+/// Hardened on two fronts:
+///
+///  * Correctness: every response body is fully consumed before the connection is
+///    reused (the undrained error body was what desynced later responses and
+///    silently mislabeled statements valid), transport blips are retried, and a
+///    result that cannot be classified is never assumed valid.
+///  * Resilience: the pinned ClickHouse image segfaults nondeterministically under
+///    the sustained full-corpus load. When the engine stops responding the
+///    container is restarted and labeling resumes from the same statement (each
+///    `EXPLAIN AST` is independent, so a fresh engine yields identical verdicts).
+///    A restart cap stops an unrecoverable engine from looping forever.
 async fn label_clickhouse(stmts: &[String]) -> Result<Vec<bool>> {
     use testcontainers_modules::clickhouse::ClickHouse;
     use testcontainers_modules::testcontainers::runners::AsyncRunner;
 
-    let node = ClickHouse::default()
-        .start()
-        .await
-        .context("start clickhouse container")?;
-    let host = node.get_host().await?;
-    let port = node.get_host_port_ipv4(8123).await?;
-    let url = format!("http://{host}:{port}/");
-    let client = reqwest::Client::new();
+    let mut valid: Vec<bool> = Vec::with_capacity(stmts.len());
+    let mut restarts = 0usize;
+    let mut poisoned: Vec<usize> = Vec::new();
+    // Track repeated deaths at one index: a statement that crashes the engine twice
+    // in a row is a confirmed parser-crash ("poison") and is skipped as invalid.
+    let mut death_idx: Option<usize> = None;
+    let mut death_count = 0usize;
 
-    let mut valid = Vec::with_capacity(stmts.len());
-    for (i, s) in stmts.iter().enumerate() {
-        let query = format!("EXPLAIN AST {}", s.trim().trim_end_matches(';'));
-        let v = match client.post(&url).body(query).send().await {
-            Ok(resp) if resp.status().is_success() => true,
+    'engine: while valid.len() < stmts.len() {
+        let node = ClickHouse::default()
+            .start()
+            .await
+            .context("start clickhouse container")?;
+        let host = node.get_host().await?;
+        let port = node.get_host_port_ipv4(8123).await?;
+        let url = format!("http://{host}:{port}/");
+        let client = reqwest::Client::builder()
+            .timeout(std::time::Duration::from_secs(30))
+            .build()
+            .context("build clickhouse http client")?;
+
+        let mut consecutive_unreachable = 0usize;
+        while valid.len() < stmts.len() {
+            let i = valid.len();
+            let query = format!("EXPLAIN AST {}", stmts[i].trim().trim_end_matches(';'));
+            match clickhouse_classify(&client, &url, &query).await {
+                Some(v) => {
+                    consecutive_unreachable = 0;
+                    valid.push(v);
+                    if i.is_multiple_of(5000) {
+                        eprintln!("  clickhouse {i}/{}", stmts.len());
+                    }
+                }
+                None if consecutive_unreachable + 1 < 6 => {
+                    // A transient blip: wait and retry the SAME statement (do not
+                    // advance, do not guess a verdict).
+                    consecutive_unreachable += 1;
+                    tokio::time::sleep(std::time::Duration::from_millis(500)).await;
+                }
+                None => {
+                    // Engine unreachable: it has crashed. Was the crash provoked by
+                    // this exact statement (it died here last restart too)?
+                    if death_idx == Some(i) {
+                        death_count += 1;
+                    } else {
+                        death_idx = Some(i);
+                        death_count = 1;
+                    }
+                    drop(node);
+                    if death_count >= 2 {
+                        eprintln!(
+                            "  clickhouse: statement {i} repeatedly crashes the engine; marking invalid and skipping: {}",
+                            stmts[i].chars().take(120).collect::<String>()
+                        );
+                        valid.push(false);
+                        poisoned.push(i);
+                        death_idx = None;
+                        death_count = 0;
+                    } else {
+                        eprintln!(
+                            "  clickhouse unreachable at {i}/{}; restarting engine",
+                            stmts.len()
+                        );
+                    }
+                    restarts += 1;
+                    anyhow::ensure!(
+                        restarts <= 50,
+                        "ClickHouse crashed {restarts} times (last at statement {i}); aborting without writing a label cache"
+                    );
+                    continue 'engine;
+                }
+            }
+        }
+    }
+    if !poisoned.is_empty() {
+        eprintln!(
+            "  clickhouse: {} statement(s) crashed the engine and were marked invalid (indices: {:?})",
+            poisoned.len(),
+            poisoned
+        );
+    }
+    Ok(valid)
+}
+
+/// Classify one ClickHouse `EXPLAIN AST` request, retrying transient transport
+/// failures. The response body is always fully read before returning, so a
+/// connection is never left mid-stream (the bug that desynced reused connections).
+/// `Some(true)` if the request succeeded (2xx) or failed with a non-syntax
+/// exception code (the statement parsed, the engine just could not resolve or
+/// execute it); `Some(false)` for exception code 62 (`SYNTAX_ERROR`) or an
+/// unclassifiable response; `None` if the engine was unreachable after retries.
+async fn clickhouse_classify(client: &reqwest::Client, url: &str, query: &str) -> Option<bool> {
+    for attempt in 0..3 {
+        match client.post(url).body(query.to_string()).send().await {
             Ok(resp) => {
-                let code = resp
+                let success = resp.status().is_success();
+                let header_code = resp
                     .headers()
                     .get("x-clickhouse-exception-code")
                     .and_then(|h| h.to_str().ok())
                     .and_then(|s| s.parse::<i32>().ok());
-                match code {
+                let body = resp.text().await.unwrap_or_default();
+                if success {
+                    return Some(true);
+                }
+                return Some(match header_code.or_else(|| parse_clickhouse_code(&body)) {
                     Some(62) => false,
                     Some(_) => true,
-                    None => !resp.text().await.unwrap_or_default().contains("Code: 62."),
-                }
+                    None => false,
+                });
             }
-            Err(_) => true,
-        };
-        valid.push(v);
-        if i % 5000 == 0 {
-            eprintln!("  clickhouse {i}/{}", stmts.len());
+            Err(_) if attempt < 2 => {
+                tokio::time::sleep(std::time::Duration::from_millis(200)).await;
+            }
+            Err(_) => return None,
         }
     }
-    Ok(valid)
+    None
+}
+
+/// Parse the leading exception code from a ClickHouse error body, e.g.
+/// `"Code: 62. DB::Exception: ..."` -> `Some(62)`.
+fn parse_clickhouse_code(body: &str) -> Option<i32> {
+    let digits: String = body
+        .trim_start()
+        .strip_prefix("Code:")?
+        .trim_start()
+        .chars()
+        .take_while(char::is_ascii_digit)
+        .collect();
+    digits.parse().ok()
 }
 
 /// SQL Server (T-SQL): real server in a container. `SET PARSEONLY ON` parses
@@ -273,14 +484,38 @@ async fn label_tsql(stmts: &[String]) -> Result<Vec<bool>> {
         .await?;
 
     let mut valid = Vec::with_capacity(stmts.len());
-    for (i, s) in stmts.iter().enumerate() {
-        let v = match client.simple_query(s.as_str()).await {
-            Ok(stream) => stream.into_results().await.is_ok(),
-            Err(_) => false,
+    let mut unreachable_streak = 0usize;
+    let mut i = 0;
+    while i < stmts.len() {
+        // Under PARSEONLY the only `Server` error is a syntax error (invalid). A
+        // non-server error is a transport/connection failure: never record it as a
+        // verdict, retry, and abort if the engine stays unreachable.
+        let verdict = match client.simple_query(stmts[i].as_str()).await {
+            Ok(stream) => match stream.into_results().await {
+                Ok(_) => Some(true),
+                Err(tiberius::error::Error::Server(_)) => Some(false),
+                Err(_) => None,
+            },
+            Err(tiberius::error::Error::Server(_)) => Some(false),
+            Err(_) => None,
         };
-        valid.push(v);
-        if i % 2000 == 0 {
-            eprintln!("  tsql {i}/{}", stmts.len());
+        match verdict {
+            Some(v) => {
+                unreachable_streak = 0;
+                valid.push(v);
+                i += 1;
+                if i.is_multiple_of(2000) {
+                    eprintln!("  tsql {i}/{}", stmts.len());
+                }
+            }
+            None => {
+                unreachable_streak += 1;
+                anyhow::ensure!(
+                    unreachable_streak < 10,
+                    "sql server became unreachable at statement {i}; aborting without writing a label cache"
+                );
+                tokio::time::sleep(std::time::Duration::from_millis(500)).await;
+            }
         }
     }
     Ok(valid)
@@ -346,6 +581,18 @@ fn label_sqlite(stmts: &[String]) -> Result<Vec<bool>> {
     });
     let out = child.wait_with_output().context("sqlite3 wait")?;
     let _ = writer.join();
+
+    // sqlite3 normally exits 0 (clean) or 1 (some statement errored, `.bail off`
+    // keeps going). A crash or container failure surfaces as a container exit
+    // code >= 128 (128 + signal, e.g. 139 = SIGSEGV), a docker error (125-127),
+    // or no code at all. In those cases the script stopped early, so the unscanned
+    // tail would silently default to "valid" -- abort instead of writing garbage.
+    match out.status.code() {
+        Some(0 | 1) => {}
+        other => anyhow::bail!(
+            "sqlite3 ended abnormally (exit {other:?}); a statement likely crashed the CLI. Aborting without writing a label cache"
+        ),
+    }
     let stderr = String::from_utf8_lossy(&out.stderr);
 
     let mut valid = vec![true; stmts.len()];
@@ -391,7 +638,35 @@ fn is_sqlite_invalid(msg: &str) -> bool {
 
 #[cfg(test)]
 mod tests {
-    use super::{is_sqlite_invalid, parse_sqlite_err};
+    use super::{is_copy_to_stdout, is_sqlite_invalid, parse_clickhouse_code, parse_sqlite_err};
+
+    #[test]
+    fn copy_to_stdout_is_recognized() {
+        assert!(is_copy_to_stdout("COPY (SELECT 1) TO STDOUT"));
+        assert!(is_copy_to_stdout("copy (select 1) to stdout"));
+        assert!(is_copy_to_stdout("COPY (SELECT 1) TO STDOUT WITH CSV"));
+        assert!(is_copy_to_stdout("  COPY t TO STDOUT"));
+        // Not COPY-to-stdout: a real syntax verdict handles these, or they differ.
+        assert!(!is_copy_to_stdout("SELECT 'COPY x TO STDOUT'"));
+        assert!(!is_copy_to_stdout("COPY t FROM STDIN"));
+        assert!(!is_copy_to_stdout("SELECT 1"));
+    }
+
+    #[test]
+    fn parse_clickhouse_code_reads_leading_code() {
+        assert_eq!(
+            parse_clickhouse_code("Code: 62. DB::Exception: Syntax error: ..."),
+            Some(62)
+        );
+        assert_eq!(
+            parse_clickhouse_code("Code: 47. DB::Exception: Unknown identifier"),
+            Some(47)
+        );
+        assert_eq!(parse_clickhouse_code("  Code: 999. foo"), Some(999));
+        // No parseable code -> None, which the caller treats as invalid.
+        assert_eq!(parse_clickhouse_code("totally unexpected body"), None);
+        assert_eq!(parse_clickhouse_code(""), None);
+    }
 
     #[test]
     fn missing_object_errors_are_valid() {
diff --git a/sqlparser-create-user-terminator-bug.md b/sqlparser-create-user-terminator-bug.md
new file mode 100644
index 0000000..a53da3f
--- /dev/null
+++ b/sqlparser-create-user-terminator-bug.md
@@ -0,0 +1,144 @@
+# `CREATE USER` and `ALTER USER ... SET` consume the statement terminator, breaking any following statement
+
+## Summary
+
+In sqlparser, `CREATE USER <name>` (and `ALTER USER <name> SET ...`) parse correctly on their own, but when one is followed by another statement in the same script the parse fails. The shared helper `parse_key_value_options`, used to read the trailing option list, consumes the `;` terminator. The top-level statement loop then no longer sees a separator before the next statement and returns `Expected: end of statement, found: <next token>`, pointing at the first token after the semicolon.
+
+The defect is in `parse_key_value_options` itself, so it affects every statement that ends by calling it in unparenthesized mode. In 0.62.0 there are three such call sites: `CREATE USER`, `ALTER USER ... SET <props>`, and `ALTER USER ... SET TAG ...`. The bug is dialect-independent (`GenericDialect`, `MySqlDialect`, `PostgreSqlDialect`, and `SnowflakeDialect` all behave identically). Statements that do not reach the helper (for example `CREATE ROLE`, `DROP USER`, and `ALTER USER ... RENAME`) are unaffected.
+
+## Affected versions
+
+Reproduced on `sqlparser` 0.62.0 (crates.io) and on current `main`.
+
+## Reproduction
+
+`Cargo.toml`:
+
+```toml
+[dependencies]
+sqlparser = "0.62.0"
+```
+
+`src/main.rs`:
+
+```rust
+use sqlparser::dialect::GenericDialect;
+use sqlparser::parser::Parser;
+
+fn check(sql: &str) {
+    match Parser::parse_sql(&GenericDialect {}, sql) {
+        Ok(v) => println!("{sql:<46} -> Ok({} statements)", v.len()),
+        Err(e) => println!("{sql:<46} -> {e}"),
+    }
+}
+
+fn main() {
+    // Affected: each ends in an unparenthesized key-value option list.
+    check("CREATE USER user1; SELECT 1");
+    check("ALTER USER user1 SET x = 'y'; SELECT 1");
+    check("ALTER USER user1 SET TAG t = 'v'; SELECT 1");
+
+    // Fine on their own (the option list ends at EOF, so no `;` is consumed).
+    check("CREATE USER user1");
+    check("ALTER USER user1 SET x = 'y'");
+
+    // Unaffected: never reach parse_key_value_options.
+    check("SELECT 1; CREATE USER user1");
+    check("CREATE ROLE role1; SELECT 1");
+    check("DROP USER user1; SELECT 1");
+    check("ALTER USER user1 RENAME TO user2; SELECT 1");
+    check("SELECT 1; SELECT 2");
+}
+```
+
+## Observed behavior
+
+```text
+CREATE USER user1; SELECT 1                    -> Expected: end of statement, found: SELECT at Line: 1, Column: 20
+ALTER USER user1 SET x = 'y'; SELECT 1         -> Expected: end of statement, found: SELECT at Line: 1, Column: 31
+ALTER USER user1 SET TAG t = 'v'; SELECT 1     -> Expected: end of statement, found: SELECT at Line: 1, Column: 35
+CREATE USER user1                              -> Ok(1 statements)
+ALTER USER user1 SET x = 'y'                   -> Ok(1 statements)
+SELECT 1; CREATE USER user1                    -> Ok(2 statements)
+CREATE ROLE role1; SELECT 1                    -> Ok(2 statements)
+DROP USER user1; SELECT 1                      -> Ok(2 statements)
+ALTER USER user1 RENAME TO user2; SELECT 1     -> Ok(2 statements)
+SELECT 1; SELECT 2                             -> Ok(2 statements)
+```
+
+The first three inputs fail. Each affected statement parses alone, and the following statement parses alone, yet the two together fail. The affected statement even works when it is the last statement (`SELECT 1; CREATE USER user1` returns `Ok(2 statements)`), because it then ends at EOF, so no terminator is consumed and nothing is left to mis-parse. The reported column is always the position of the token immediately after the `;`, which shows the terminator has already been consumed by the time the error is raised.
+
+## Expected behavior
+
+`CREATE USER user1; SELECT 1` (and the two `ALTER USER ... SET` forms) should parse as two statements, the same way `CREATE ROLE role1; SELECT 1` and `SELECT 1; SELECT 2` do.
+
+## Root cause
+
+`parse_key_value_options` (src/parser/mod.rs, around line 20449) drives its loop with `self.next_token()`, which advances past the token it returns. Its terminator arm (around line 20468) breaks on a semicolon that has already been consumed:
+
+```rust
+loop {
+    match self.next_token().token {
+        // ...
+        Token::EOF | Token::SemiColon => break, // the ';' is consumed, then we break
+        // ...
+    }
+}
+```
+
+So when the option list is unparenthesized and ends at a `;`, the `;` is eaten and discarded. Control returns to the top-level statement loop, which expects a `;` separator (or EOF) before the next statement. Because the separator is gone, it sees the next statement's first token directly and fails with `Expected: end of statement, found: <token>`.
+
+The three unparenthesized call sites in 0.62.0 are:
+
+- `parse_create_user`, src/parser/mod.rs around line 5224: `self.parse_key_value_options(false, &[Keyword::WITH, Keyword::TAG])`.
+- `parse_alter_user`, src/parser/alter.rs around line 262 (the `SET TAG` branch): `self.parse_key_value_options(false, &[])`.
+- `parse_alter_user`, src/parser/alter.rs around line 280 (the `SET <props>` branch): `self.parse_key_value_options(false, &[])`.
+
+This explains every case above:
+
+- `CREATE USER user1` alone: the loop reads `EOF` and breaks. Fine.
+- `CREATE USER user1; SELECT 1`: the loop consumes `;` and breaks, then the top level sees `SELECT` with no preceding separator. Error.
+- `SELECT 1; CREATE USER user1`: `SELECT` is parsed normally and does not eat its trailing `;`, then `CREATE USER` is parsed last and ends at `EOF`. Fine.
+- `CREATE ROLE`, `DROP USER`, `ALTER USER ... RENAME`: they do not route through `parse_key_value_options`, so they terminate correctly.
+
+The parenthesized callers (`parse_key_value_options(true, ...)`, used by the Snowflake `FILE_FORMAT`, `COPY`, and similar option lists) are unaffected, because they end on `)` rather than on the statement terminator.
+
+## Suggested fix
+
+Do not consume the terminator. Put the semicolon back before breaking so the caller and the top-level statement loop can see it, for example:
+
+```rust
+loop {
+    match self.next_token().token {
+        // ...
+        Token::EOF => break,
+        Token::SemiColon => {
+            self.prev_token();
+            break;
+        }
+        // ...
+    }
+}
+```
+
+(The `EOF` case needs no `prev_token`.) Peeking instead of consuming would work as well. This mirrors how the `end_words` arm already calls `self.prev_token()` before breaking. Fixing the single helper repairs all three statements at once.
+
+## Impact
+
+Any multi-statement script that contains `CREATE USER` or `ALTER USER ... SET` in a non-final position fails to parse in full. This was found while benchmarking sqlparser on whole-script (multi-statement) parsing of real-world SQL corpora, where a single such statement voids the entire script because `parse_sql` is all-or-nothing.
+
+## Suggested regression test
+
+```rust
+#[test]
+fn key_value_option_statements_do_not_swallow_following_statement() {
+    for sql in [
+        "CREATE USER user1; SELECT 1",
+        "ALTER USER user1 SET x = 'y'; SELECT 1",
+        "ALTER USER user1 SET TAG t = 'v'; SELECT 1",
+    ] {
+        let stmts = Parser::parse_sql(&GenericDialect {}, sql).unwrap();
+        assert_eq!(stmts.len(), 2, "{sql}");
+    }
+}
+```
diff --git a/src/batch.rs b/src/batch.rs
index 4a169ed..a784662 100644
--- a/src/batch.rs
+++ b/src/batch.rs
@@ -1,51 +1,287 @@
-//! Shared construction of a multi-statement script for the batch benchmarks.
+//! Shared construction and sampling for the batch benchmarks.
 //!
-//! Both the batch time bench (`benches/batch_parsing.rs`) and the batch memory
-//! bench (`membench -- batch`) must feed parsers byte-identical input, so the
-//! join lives here in one place rather than in each binary.
+//! The batch axis measures how a parser handles a multi-statement script. Rather
+//! than concatenating a parser's whole accepted set (where one statement that
+//! mishandles the terminator makes the all-or-nothing `parse_sql` return zero and
+//! voids the entire measurement), we draw `BATCH_K` random batches of `BATCH_M`
+//! statements from the set the parser can individually digest, parse each as one
+//! script, and report the share that reparse to the exact expected count plus the
+//! time and memory over the batches that did. The time bench
+//! (`benches/batch_parsing.rs`), the memory bench (`membench -- batch`), and the
+//! time machine (`timemachine`) all use the helpers here so they sample and join
+//! identically.
 
-/// Join accepted statements into a single multi-statement script.
+/// Statements per sampled batch.
+pub const BATCH_M: usize = 128;
+
+/// Number of sampled batches per (parser, dialect).
+pub const BATCH_K: usize = 200;
+
+/// Base seed for the deterministic sampler. Mixed per (parser, dialect) so each
+/// pair samples reproducibly but distinctly.
+pub const BATCH_SEED: u64 = 0x5108_5A17_B47C_0DE5;
+
+/// A probe of three distinct statements, used to check that a parser reports a true statement count.
+///
+/// A parser whose batch entry point returns something other than 3 here (for
+/// example `pg_query` summary mode, which returns the number of distinct
+/// statement types) cannot be scored on batch accuracy and is left out of the
+/// batch axis.
+pub const COUNT_PROBE: &str = "SELECT 1\n;\nSELECT 2\n;\nSELECT 3";
+
+/// Whether `count` (a parser's whole-script statement count) reports a true
+/// statement count, checked against [`COUNT_PROBE`].
+pub fn reports_statement_count(mut count: impl FnMut(&str) -> usize) -> bool {
+    count(COUNT_PROBE) == 3
+}
+
+/// Join statements into a single multi-statement script.
 ///
-/// Each corpus statement is one line, so a `;`-and-newline separator yields an
-/// unambiguous script. A trailing `;` on a statement is stripped first to avoid
-/// an empty statement between terminators. The last statement gets no terminator
-/// (none is required at end of input).
+/// The separator is a newline, then the `;` terminator, then a newline. The
+/// leading newline is essential: a corpus statement is a single line and may end
+/// in a `--` (or `#`) line comment, which runs to end of line, so a terminator
+/// placed on the same line would be swallowed by that comment and silently merge
+/// two statements into one. Putting the terminator on its own line closes any
+/// trailing line comment first. A trailing `;` is stripped to avoid an empty
+/// statement between terminators, and the last statement gets no terminator.
 #[must_use]
-pub fn join_batch(accepted: &[&str]) -> String {
-    let mut out = String::with_capacity(accepted.iter().map(|s| s.len() + 2).sum());
-    for (i, s) in accepted.iter().enumerate() {
+pub fn join_batch(stmts: &[&str]) -> String {
+    let mut out = String::with_capacity(stmts.iter().map(|s| s.len() + 3).sum());
+    for (i, s) in stmts.iter().enumerate() {
         if i > 0 {
-            out.push_str(";\n");
+            out.push_str("\n;\n");
         }
         out.push_str(s.trim().trim_end_matches(';').trim_end());
     }
     out
 }
 
+/// Whether a statement is safe to place in a concatenated batch script.
+///
+/// `COPY ... FROM STDIN` reads the lines that follow it as inline data until a
+/// `\.` terminator, so in a single script it swallows every statement after it.
+/// It parses fine on its own, so it stays in the per-statement benchmarks, but it
+/// must be excluded from the batch. Statements are single-line, so a token scan
+/// is enough.
+#[must_use]
+pub fn batch_eligible(stmt: &str) -> bool {
+    let toks: Vec<String> = stmt
+        .split_whitespace()
+        .map(str::to_ascii_lowercase)
+        .collect();
+    let is_copy_from_stdin = toks.iter().any(|t| t == "copy")
+        && toks.windows(2).any(|w| w[0] == "from" && w[1] == "stdin");
+    !is_copy_from_stdin
+}
+
+/// Deterministic `SplitMix64`. Used to sample batches reproducibly without
+/// pulling in an RNG dependency (the rest of the benchmark is deterministic).
+struct SplitMix64(u64);
+
+impl SplitMix64 {
+    const fn new(seed: u64) -> Self {
+        Self(seed)
+    }
+
+    const fn next_u64(&mut self) -> u64 {
+        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
+        let mut z = self.0;
+        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
+        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
+        z ^ (z >> 31)
+    }
+
+    /// Near-uniform index in `0..n` (requires `n > 0`). Modulo bias is negligible because `n` is tiny relative to 2^64.
+    const fn below(&mut self, n: usize) -> usize {
+        (self.next_u64() % n as u64) as usize
+    }
+}
+
+/// A reproducible per-pair seed derived from the base seed and a label.
+#[must_use]
+pub fn seed_for(label: &str) -> u64 {
+    let mut h = BATCH_SEED;
+    for &b in label.as_bytes() {
+        h ^= u64::from(b);
+        h = h.wrapping_mul(0x0000_0100_0000_01b3);
+    }
+    h
+}
+
+/// Sample `k` batches of distinct indices from `0..n`.
+///
+/// Each batch holds `min(m, n)` distinct indices (partial Fisher-Yates), batches
+/// may overlap, and the result is deterministic for a given `seed`. Returns an
+/// empty vec when `n == 0`.
+#[must_use]
+pub fn sample_batches(n: usize, m: usize, k: usize, seed: u64) -> Vec<Vec<usize>> {
+    if n == 0 {
+        return Vec::new();
+    }
+    let take = m.min(n);
+    let mut rng = SplitMix64::new(seed);
+    let mut pool: Vec<usize> = (0..n).collect();
+    let mut out = Vec::with_capacity(k);
+    for _ in 0..k {
+        // Partial Fisher-Yates: swap `take` random picks to the front, then read.
+        for i in 0..take {
+            let j = i + rng.below(n - i);
+            pool.swap(i, j);
+        }
+        out.push(pool[..take].to_vec());
+    }
+    out
+}
+
+/// Result of measuring a parser on `k` sampled batches.
+pub struct BatchEval {
+    /// Statements eligible for batching (accepted, single, not input-consuming).
+    pub n_eligible: usize,
+    /// Distinct statements per batch actually used (`min(BATCH_M, n_eligible)`).
+    pub effective_m: usize,
+    /// Number of batches attempted.
+    pub k: usize,
+    /// Batches that reparsed to exactly `effective_m` statements.
+    pub n_correct: usize,
+    /// The joined scripts of the correct batches, for timing or memory probing.
+    pub correct_scripts: Vec<String>,
+}
+
+impl BatchEval {
+    /// Accuracy as a percentage, or `None` when no batch was sampled.
+    #[must_use]
+    pub fn accuracy_pct(&self) -> Option<f64> {
+        (self.k > 0).then(|| 100.0 * self.n_correct as f64 / self.k as f64)
+    }
+}
+
+/// Sample batches from `eligible` and find those that reparse to the full count.
+///
+/// Draws `BATCH_K` batches of up to `BATCH_M` statements (seeded reproducibly by
+/// `label`), joins each, and uses `count` (the parser's whole-script statement
+/// count) to keep the batches that reparse to exactly `effective_m`. `eligible`
+/// must already be filtered to statements the parser accepts, that parse to
+/// exactly one statement alone, and that satisfy [`batch_eligible`].
+pub fn evaluate_batches(
+    eligible: &[&str],
+    label: &str,
+    mut count: impl FnMut(&str) -> usize,
+) -> BatchEval {
+    let n = eligible.len();
+    let batches = sample_batches(n, BATCH_M, BATCH_K, seed_for(label));
+    let effective_m = BATCH_M.min(n);
+    let mut correct_scripts = Vec::new();
+    for idxs in &batches {
+        let stmts: Vec<&str> = idxs.iter().map(|&i| eligible[i]).collect();
+        let script = join_batch(&stmts);
+        if count(&script) == effective_m {
+            correct_scripts.push(script);
+        }
+    }
+    BatchEval {
+        n_eligible: n,
+        effective_m,
+        k: batches.len(),
+        n_correct: correct_scripts.len(),
+        correct_scripts,
+    }
+}
+
 #[cfg(test)]
 mod tests {
-    use super::join_batch;
+    use super::*;
 
     #[test]
-    fn joins_with_terminators_and_strips_trailing_semicolons() {
+    fn joins_with_terminator_on_its_own_line() {
         assert_eq!(
             join_batch(&["SELECT 1;", "SELECT 2"]),
-            "SELECT 1;\nSELECT 2"
+            "SELECT 1\n;\nSELECT 2"
         );
-        // Already-terminated and whitespace-padded statements normalize cleanly.
         assert_eq!(
             join_batch(&["  SELECT 1 ;  ", "SELECT 2 ;"]),
-            "SELECT 1;\nSELECT 2"
+            "SELECT 1\n;\nSELECT 2"
         );
     }
 
+    #[test]
+    fn terminator_survives_a_trailing_line_comment() {
+        let joined = join_batch(&["SELECT 1 -- note", "SELECT 2"]);
+        assert_eq!(joined, "SELECT 1 -- note\n;\nSELECT 2");
+        assert!(joined.contains("\n;\n"));
+    }
+
     #[test]
     fn single_statement_has_no_terminator() {
         assert_eq!(join_batch(&["SELECT 1"]), "SELECT 1");
+        assert_eq!(join_batch(&[]), "");
     }
 
     #[test]
-    fn empty_input_is_empty() {
-        assert_eq!(join_batch(&[]), "");
+    fn copy_from_stdin_is_excluded() {
+        assert!(!batch_eligible("COPY t FROM STDIN"));
+        assert!(!batch_eligible("copy t  from   stdin null 'x'"));
+        assert!(batch_eligible("SELECT 1"));
+        assert!(batch_eligible("INSERT INTO t SELECT * FROM other"));
+    }
+
+    #[test]
+    fn sampler_is_deterministic_distinct_and_sized() {
+        let a = sample_batches(1000, 128, 200, 42);
+        let b = sample_batches(1000, 128, 200, 42);
+        assert_eq!(a, b, "same seed gives same batches");
+        assert_ne!(
+            a,
+            sample_batches(1000, 128, 200, 43),
+            "seed changes batches"
+        );
+        assert_eq!(a.len(), 200);
+        for batch in &a {
+            assert_eq!(batch.len(), 128);
+            let mut sorted = batch.clone();
+            sorted.sort_unstable();
+            sorted.dedup();
+            assert_eq!(sorted.len(), 128, "indices within a batch are distinct");
+            assert!(batch.iter().all(|&i| i < 1000));
+        }
+    }
+
+    #[test]
+    fn sampler_handles_small_and_empty_pools() {
+        assert!(sample_batches(0, 128, 200, 1).is_empty());
+        let small = sample_batches(10, 128, 5, 1);
+        assert_eq!(small.len(), 5);
+        for batch in &small {
+            assert_eq!(batch.len(), 10, "effective_m caps at the pool size");
+        }
+    }
+
+    #[test]
+    fn accuracy_drops_when_a_swallower_is_present() {
+        // A toy "parser": counts parts separated by "\n;\n", but any script that
+        // contains SWALLOW collapses to a count of 1. Mirrors how a real
+        // terminator bug swallows the rest of the script.
+        let count = |script: &str| {
+            if script.contains("SWALLOW") {
+                1
+            } else {
+                script.split("\n;\n").count()
+            }
+        };
+        let mut clean: Vec<&str> = Vec::new();
+        let owned: Vec<String> = (0..500).map(|i| format!("SELECT {i}")).collect();
+        for s in &owned {
+            clean.push(s);
+        }
+        let ok = evaluate_batches(&clean, "clean", count);
+        assert_eq!(ok.accuracy_pct(), Some(100.0));
+
+        let mut withbug = clean.clone();
+        withbug.push("SWALLOW");
+        let bug = evaluate_batches(&withbug, "bug", count);
+        let acc = bug.accuracy_pct().unwrap();
+        assert!(
+            acc > 0.0 && acc < 100.0,
+            "accuracy {acc} should be between 0 and 100"
+        );
     }
 }
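
Review note on `accuracy_drops_when_a_swallower_is_present`: the test only asserts that accuracy lands strictly between 0 and 100. The value to expect can be estimated with a little arithmetic. The sketch below is not part of the patch; `expected_accuracy_pct` is a hypothetical helper, and it assumes batches are uniform samples of `min(m, n)` distinct statements, as `sample_batches` documents. A batch is correct only if it contains none of the swallowing statements, which is a hypergeometric "zero hits" probability.

```rust
/// Expected share (in percent) of batches that contain no swallowing
/// statement, when each batch is `min(m, n)` distinct statements drawn
/// uniformly from `n` eligible ones and `bad` of those `n` are swallowers.
///
/// Hypothetical helper for illustration only. Assumes `n > 0`.
fn expected_accuracy_pct(n: usize, m: usize, bad: usize) -> f64 {
    let take = m.min(n);
    // More swallowers than slots left outside a batch: every batch hits one.
    if bad > n - take {
        return 0.0;
    }
    // P(no swallower in the batch) = C(n - bad, take) / C(n, take),
    // expanded as a product so no large binomials are needed.
    let mut p = 1.0_f64;
    for i in 0..bad {
        p *= (n - take - i) as f64 / (n - i) as f64;
    }
    100.0 * p
}
```

With the shape used in that test (500 clean statements plus one swallower, so `n = 501`) and an assumed batch size of 128 (the size the sampler test passes; the real `BATCH_M` is defined outside this hunk), `expected_accuracy_pct(501, 128, 1)` is roughly 74.5, comfortably inside the asserted open interval.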
diff --git a/src/bin/build_proc_suites.rs b/src/bin/build_proc_suites.rs
new file mode 100644
index 0000000..2708b57
--- /dev/null
+++ b/src/bin/build_proc_suites.rs
@@ -0,0 +1,549 @@
+//! Rebuild the Spark SQL and Oracle corpus files from their original sources,
+//! keeping compound statements (`BEGIN ... END`, PL/SQL blocks) intact. The
+//! original extractor split on every `;`, shredding Spark SQL scripting blocks
+//! and Oracle PL/SQL blocks into invalid fragments (issue #22, provenance side).
+//!
+//! Spark source: apache/spark `sql/core/src/test/resources/sql-tests/inputs`.
+//! Spark's own harness wraps any statement that contains inner `;` (the scripting
+//! `BEGIN ... END` blocks) in `--QUERY-DELIMITER-START` / `--QUERY-DELIMITER-END`
+//! markers, so we honor those: text between a marker pair is one statement,
+//! everything else splits on `;`.
+//!
+//! Oracle source: oracle-samples/db-sample-schemas. These are SQL*Plus scripts:
+//! a PL/SQL block (`DECLARE`/`BEGIN`/`CREATE ... PROCEDURE|FUNCTION|PACKAGE|
+//! TRIGGER|TYPE`) runs until a line containing only `/`; every other statement
+//! ends at `;`.
+//!
+//!   cargo run --release --bin build_proc_suites -- <spark inputs dir> <oracle schemas dir>
+//!
+//! Then repack `datasets.tar.zst` (provenance-only suites: no oracle to re-run).
+
+#![allow(
+    clippy::doc_markdown,
+    clippy::too_many_lines,
+    clippy::items_after_statements
+)]
+
+use std::collections::HashSet;
+use std::fs;
+use std::path::{Path, PathBuf};
+
+/// Collapse a raw statement to one trimmed line (comments are already removed).
+fn normalize(s: &str) -> String {
+    s.split_whitespace().collect::<Vec<_>>().join(" ")
+}
+
+/// Copy a quoted literal verbatim from `chars[i..]` into `buf`, returning the
+/// index just past the closing quote (or `chars.len()` if it never closes).
+/// Handles `'`, `"`, backtick (doubling escape) and `[` (closed by `]`, no escape).
+fn copy_quote(chars: &[char], mut i: usize, buf: &mut String) -> usize {
+    let open = chars[i];
+    let close = if open == '[' { ']' } else { open };
+    buf.push(open);
+    i += 1;
+    while i < chars.len() {
+        let d = chars[i];
+        if d == close {
+            if close != ']' && chars.get(i + 1) == Some(&close) {
+                buf.push(d);
+                buf.push(d);
+                i += 2;
+                continue;
+            }
+            buf.push(d);
+            return i + 1;
+        }
+        buf.push(d);
+        i += 1;
+    }
+    i
+}
+
+/// Split Spark golden-test SQL into statements, honoring `--QUERY-DELIMITER`
+/// regions (one statement each) and otherwise splitting on top-level `;`. Lines
+/// that are pure directive comments (`--CONFIG`, `--SET`, `--IMPORT`, ...) are
+/// dropped; trailing `--` and `/* */` comments are stripped.
+fn split_spark(input: &str) -> Vec<String> {
+    let mut out = Vec::new();
+    let mut buf = String::new();
+    let mut region = false;
+
+    for raw_line in input.lines() {
+        let trimmed = raw_line.trim_start();
+        if trimmed.starts_with("--QUERY-DELIMITER-START") {
+            region = true;
+            continue;
+        }
+        if trimmed.starts_with("--QUERY-DELIMITER-END") {
+            let s = normalize(&buf);
+            if !s.is_empty() {
+                out.push(s);
+            }
+            buf.clear();
+            region = false;
+            continue;
+        }
+        if region {
+            // Whole region is one statement; keep code, drop full-line comments.
+            if !trimmed.starts_with("--") {
+                strip_line_into(raw_line, &mut buf, &mut Vec::new(), true);
+                buf.push(' ');
+            }
+            continue;
+        }
+        if trimmed.starts_with("--") {
+            continue; // directive / comment line
+        }
+        // Normal line: split on `;`, strip inline comments, copy quotes verbatim.
+        strip_line_into(raw_line, &mut buf, &mut out, false);
+        buf.push(' ');
+    }
+    let s = normalize(&buf);
+    if !s.is_empty() {
+        out.push(s);
+    }
+    out
+}
+
+/// Append `line` to `buf`, stripping comments and copying quotes verbatim. When
+/// `region_only` is false, a top-level `;` flushes `buf` (normalized) into `out`.
+fn strip_line_into(line: &str, buf: &mut String, out: &mut Vec<String>, region_only: bool) {
+    let chars: Vec<char> = line.chars().collect();
+    let mut i = 0;
+    while i < chars.len() {
+        let c = chars[i];
+        if c == '-' && chars.get(i + 1) == Some(&'-') {
+            break; // rest of line is a comment
+        }
+        if c == '/' && chars.get(i + 1) == Some(&'*') {
+            i += 2;
+            while i < chars.len() && !(chars[i] == '*' && chars.get(i + 1) == Some(&'/')) {
+                i += 1;
+            }
+            i += 2;
+            continue;
+        }
+        if matches!(c, '\'' | '"' | '`' | '[') {
+            i = copy_quote(&chars, i, buf);
+            continue;
+        }
+        if c == ';' && !region_only {
+            let s = normalize(buf);
+            if !s.is_empty() {
+                out.push(s);
+            }
+            buf.clear();
+            i += 1;
+            continue;
+        }
+        buf.push(c);
+        i += 1;
+    }
+}
+
+/// Split a comment-free string on top-level `;`, respecting quoted literals.
+fn split_semicolons(s: &str) -> Vec<String> {
+    let chars: Vec<char> = s.chars().collect();
+    let mut out = Vec::new();
+    let mut buf = String::new();
+    let mut i = 0;
+    while i < chars.len() {
+        let c = chars[i];
+        if matches!(c, '\'' | '"' | '`' | '[') {
+            i = copy_quote(&chars, i, &mut buf);
+            continue;
+        }
+        if c == ';' {
+            out.push(std::mem::take(&mut buf));
+            i += 1;
+            continue;
+        }
+        buf.push(c);
+        i += 1;
+    }
+    out.push(buf);
+    out
+}
+
+/// Harvest the standalone DML statements from inside a PL/SQL block, so the bulk
+/// `INSERT`/`UPDATE`/... that the block wraps remain individual corpus entries. A
+/// leading `BEGIN` glued to the first inner statement is stripped. Non-DML pieces
+/// (declarations, control flow, BEGIN/END) are dropped.
+fn harvest_dml(block: &str) -> Vec<String> {
+    let mut out = Vec::new();
+    for piece in split_semicolons(block) {
+        let mut p = normalize(&piece);
+        if p
+            .get(..6)
+            .is_some_and(|head| head.eq_ignore_ascii_case("BEGIN "))
+        {
+            p = p[6..].trim().to_string();
+        }
+        let first = p
+            .split_whitespace()
+            .next()
+            .unwrap_or("")
+            .to_ascii_uppercase();
+        if matches!(
+            first.as_str(),
+            "INSERT" | "UPDATE" | "DELETE" | "SELECT" | "MERGE" | "WITH"
+        ) {
+            out.push(p);
+        }
+    }
+    out
+}
+
+/// Split Oracle SQL*Plus script text into `(normal, special)`: normal per-statement
+/// corpus entries, and special whole PL/SQL anonymous blocks (kept once, isolated
+/// from the per-statement metrics). A `/` line ends a block; `;` ends other
+/// statements. Anonymous `DECLARE`/`BEGIN` blocks go to `special`, and their inner
+/// DML is also harvested into `normal`; `CREATE ... PROCEDURE/...` blocks are kept
+/// whole in `normal` (real DDL statements).
+fn split_oracle(input: &str) -> (Vec<String>, Vec<String>) {
+    let mut normal = Vec::new();
+    let mut special = Vec::new();
+    let mut buf = String::new();
+    let mut in_block = false;
+    let mut anon = false;
+    let mut started = false;
+
+    for raw_line in input.lines() {
+        let trimmed = raw_line.trim();
+        // SQL*Plus block terminator: end the current PL/SQL block.
+        if trimmed == "/" {
+            let s = normalize(&buf);
+            if !s.is_empty() {
+                if anon {
+                    special.push(s);
+                    normal.extend(harvest_dml(&buf));
+                } else {
+                    normal.push(s);
+                }
+            }
+            buf.clear();
+            in_block = false;
+            anon = false;
+            started = false;
+            continue;
+        }
+        // Skip pure comment lines and SQL*Plus client directives (REM, PROMPT,
+        // SET, ACCEPT, etc.) when no statement is in progress. These are not SQL
+        // and, left in the buffer, would also set `started` and mask a following
+        // `BEGIN` block opener (the ACCEPT ... HIDE / BEGIN IF ... pattern).
+        if buf.trim().is_empty() {
+            let up = trimmed.to_ascii_uppercase();
+            // Leading SQL*Plus command words (skip the whole line when one starts it).
+            const DIRECTIVES: &[&str] = &[
+                "PROMPT",
+                "SET ",
+                "DEFINE",
+                "UNDEFINE",
+                "SPOOL",
+                "WHENEVER",
+                "CONNECT",
+                "ALTER SESSION",
+                "COLUMN ",
+                "ACCEPT ",
+                "PAUSE",
+                "EXEC ",
+                "EXECUTE ",
+                "VARIABLE ",
+                "VAR ",
+                "PRINT ",
+                "SHOW ",
+                "BREAK",
+                "COMPUTE ",
+                "TTITLE",
+                "BTITLE",
+                "STORE ",
+                "SAVE ",
+                "HOST",
+                "CLEAR ",
+                "TIMING",
+                "START ",
+                "ACCEPT",
+            ];
+            if trimmed.is_empty()
+                || trimmed.starts_with("--")
+                || trimmed.starts_with('@')
+                || up.starts_with("REM ")
+                || up == "REM"
+                || DIRECTIVES.iter().any(|d| up.starts_with(d))
+            {
+                continue;
+            }
+        }
+
+        let chars: Vec<char> = raw_line.chars().collect();
+        let mut i = 0;
+        while i < chars.len() {
+            let c = chars[i];
+            if c == '-' && chars.get(i + 1) == Some(&'-') {
+                break;
+            }
+            if c == '/' && chars.get(i + 1) == Some(&'*') {
+                i += 2;
+                while i < chars.len() && !(chars[i] == '*' && chars.get(i + 1) == Some(&'/')) {
+                    i += 1;
+                }
+                i += 2;
+                continue;
+            }
+            if matches!(c, '\'' | '"' | '`' | '[') {
+                i = copy_quote(&chars, i, &mut buf);
+                started = true;
+                continue;
+            }
+            if !started && c.is_alphabetic() {
+                // Detect the leading keyword to decide block vs simple.
+                let mut j = i;
+                while j < chars.len() && (chars[j].is_alphanumeric() || chars[j] == '_') {
+                    j += 1;
+                }
+                let w: String = chars[i..j].iter().collect::<String>().to_ascii_uppercase();
+                if w == "DECLARE" || w == "BEGIN" {
+                    in_block = true;
+                    anon = true;
+                }
+                started = true;
+                // fall through to copy chars normally below
+            }
+            // Once inside a CREATE statement, promote to block on a body keyword.
+            if c.is_alphabetic() {
+                let mut j = i;
+                while j < chars.len() && (chars[j].is_alphanumeric() || chars[j] == '_') {
+                    j += 1;
+                }
+                let w: String = chars[i..j].iter().collect::<String>().to_ascii_uppercase();
+                if matches!(
+                    w.as_str(),
+                    "PROCEDURE" | "FUNCTION" | "PACKAGE" | "TRIGGER" | "TYPE"
+                ) && buf.to_ascii_uppercase().trim_start().starts_with("CREATE")
+                {
+                    in_block = true;
+                }
+                buf.push_str(&chars[i..j].iter().collect::<String>());
+                i = j;
+                continue;
+            }
+            if c == ';' && !in_block {
+                let s = normalize(&buf);
+                if !s.is_empty() {
+                    normal.push(s);
+                }
+                buf.clear();
+                started = false;
+                i += 1;
+                continue;
+            }
+            buf.push(c);
+            i += 1;
+        }
+        buf.push(' ');
+    }
+    let s = normalize(&buf);
+    if !s.is_empty() {
+        if anon {
+            special.push(s);
+            normal.extend(harvest_dml(&buf));
+        } else {
+            normal.push(s);
+        }
+    }
+    (normal, special)
+}
+
+fn sql_files(dir: &Path) -> Vec<PathBuf> {
+    let mut out = Vec::new();
+    let mut stack = vec![dir.to_path_buf()];
+    while let Some(d) = stack.pop() {
+        let Ok(entries) = fs::read_dir(&d) else {
+            continue;
+        };
+        for e in entries.flatten() {
+            let p = e.path();
+            if p.is_dir() {
+                stack.push(p);
+            } else if p.extension().is_some_and(|x| x == "sql") {
+                out.push(p);
+            }
+        }
+    }
+    out.sort();
+    out
+}
+
+/// Load the lines of `datasets/<dialect>/<file>` into `seen` (for cross-file dedup).
+fn seed_seen(seen: &mut HashSet<String>, rel: &str) {
+    if let Ok(c) = fs::read_to_string(Path::new("datasets").join(rel)) {
+        for l in c.lines() {
+            if !l.trim().is_empty() {
+                seen.insert(l.trim().to_string());
+            }
+        }
+    }
+}
+
+fn build_spark(src: &Path) {
+    let mut seen = HashSet::new();
+    seed_seen(&mut seen, "spark_sql/clickbench_spark.txt");
+    seed_seen(&mut seen, "spark_sql/databricks_perf.txt");
+    let mut kept = Vec::new();
+    let mut total = 0usize;
+    for f in sql_files(src) {
+        for s in split_spark(&fs::read_to_string(&f).unwrap_or_default()) {
+            total += 1;
+            if seen.insert(s.clone()) {
+                kept.push(s);
+            }
+        }
+    }
+    fs::write(
+        "datasets/spark_sql/spark_sql_tst.txt",
+        format!("{}\n", kept.join("\n")),
+    )
+    .expect("write spark corpus");
+    println!(
+        "spark_sql: {total} parsed, {} kept. wrote datasets/spark_sql/spark_sql_tst.txt",
+        kept.len()
+    );
+}
+
+fn build_oracle(src: &Path) {
+    let mut seen = HashSet::new();
+    seed_seen(&mut seen, "oracle/oracle_examples.txt");
+    let mut normal_kept = Vec::new();
+    let mut special_seen = HashSet::new();
+    let mut special_kept = Vec::new();
+    let (mut n_total, mut s_total) = (0usize, 0usize);
+    for f in sql_files(src) {
+        let (normal, special) = split_oracle(&fs::read_to_string(&f).unwrap_or_default());
+        for s in normal {
+            n_total += 1;
+            if seen.insert(s.clone()) {
+                normal_kept.push(s);
+            }
+        }
+        for s in special {
+            s_total += 1;
+            if special_seen.insert(s.clone()) {
+                special_kept.push(s);
+            }
+        }
+    }
+    fs::write(
+        "datasets/oracle/oracle_schemas.txt",
+        format!("{}\n", normal_kept.join("\n")),
+    )
+    .expect("write oracle corpus");
+    // Special PL/SQL blocks live outside any dialect directory, so the
+    // per-statement benchmark never loads them (they would be huge outliers); they
+    // are kept once as whole-block test cases.
+    fs::create_dir_all("datasets/special").expect("create datasets/special");
+    fs::write(
+        "datasets/special/oracle_plsql_blocks.txt",
+        format!("{}\n", special_kept.join("\n")),
+    )
+    .expect("write oracle blocks");
+    println!(
+        "oracle: {n_total} normal parsed, {} kept; {s_total} blocks, {} special kept (datasets/special/oracle_plsql_blocks.txt)",
+        normal_kept.len(),
+        special_kept.len(),
+    );
+}
+
+fn main() {
+    if let Some(s) = std::env::args().nth(1) {
+        build_spark(Path::new(&s));
+    }
+    if let Some(o) = std::env::args().nth(2) {
+        build_oracle(Path::new(&o));
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::{split_oracle, split_spark};
+
+    #[test]
+    fn spark_region_is_one_statement() {
+        let sql = "SELECT 1;\n--QUERY-DELIMITER-START\nBEGIN\n  DECLARE x INT;\n  SET x = 1;\nEND;\n--QUERY-DELIMITER-END\nSELECT 2;";
+        assert_eq!(
+            split_spark(sql),
+            vec![
+                "SELECT 1".to_string(),
+                "BEGIN DECLARE x INT; SET x = 1; END;".to_string(),
+                "SELECT 2".to_string(),
+            ]
+        );
+    }
+
+    #[test]
+    fn spark_strips_directives_and_comments() {
+        let sql = "--CONFIG dim\n--SET spark.x=1\nSELECT 1 -- trailing\n;\nSET spark.y = 2;";
+        assert_eq!(
+            split_spark(sql),
+            vec!["SELECT 1".to_string(), "SET spark.y = 2".to_string()]
+        );
+    }
+
+    #[test]
+    fn oracle_anon_block_is_special_and_inner_dml_harvested() {
+        // The whole anonymous block is kept once as a special entry, and its inner
+        // INSERTs also become individual normal corpus statements (in order).
+        let sql = "INSERT INTO t VALUES (1);\nBEGIN\n  INSERT INTO t VALUES (2);\n  INSERT INTO t VALUES (3);\nEND;\n/\nINSERT INTO t VALUES (4);";
+        let (normal, special) = split_oracle(sql);
+        assert_eq!(
+            normal,
+            vec![
+                "INSERT INTO t VALUES (1)".to_string(),
+                "INSERT INTO t VALUES (2)".to_string(),
+                "INSERT INTO t VALUES (3)".to_string(),
+                "INSERT INTO t VALUES (4)".to_string(),
+            ]
+        );
+        assert_eq!(
+            special,
+            vec!["BEGIN INSERT INTO t VALUES (2); INSERT INTO t VALUES (3); END;".to_string()]
+        );
+    }
+
+    #[test]
+    fn oracle_declare_block_keeps_inner_semicolons() {
+        // Declarations and assignments are not DML, so only the INSERT is harvested.
+        let sql = "DECLARE v NUMBER;\nBEGIN\n  v := 1;\n  INSERT INTO t VALUES (v);\nEND;\n/";
+        let (normal, special) = split_oracle(sql);
+        assert_eq!(normal, vec!["INSERT INTO t VALUES (v)".to_string()]);
+        assert_eq!(
+            special,
+            vec!["DECLARE v NUMBER; BEGIN v := 1; INSERT INTO t VALUES (v); END;".to_string()]
+        );
+    }
+
+    #[test]
+    fn oracle_plain_statements_split_on_semicolon() {
+        let sql = "CREATE TABLE t (a NUMBER);\nINSERT INTO t VALUES (1);";
+        let (normal, special) = split_oracle(sql);
+        assert_eq!(
+            normal,
+            vec![
+                "CREATE TABLE t (a NUMBER)".to_string(),
+                "INSERT INTO t VALUES (1)".to_string(),
+            ]
+        );
+        assert!(special.is_empty());
+    }
+
+    #[test]
+    fn oracle_create_procedure_block_stays_whole_in_normal() {
+        // A CREATE PROCEDURE block is real DDL: kept whole, in normal, not special.
+        let sql = "CREATE PROCEDURE p IS\nBEGIN\n  INSERT INTO t VALUES (1);\nEND;\n/";
+        let (normal, special) = split_oracle(sql);
+        assert_eq!(
+            normal,
+            vec!["CREATE PROCEDURE p IS BEGIN INSERT INTO t VALUES (1); END;".to_string()]
+        );
+        assert!(special.is_empty());
+    }
+}
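
Review note on the module docs of `build_proc_suites.rs`: the claim that splitting on every `;` shreds scripting blocks is easy to see side by side. The sketch below is a deliberately simplified illustration, not the patch's `split_spark`: `naive_split` and `marker_aware_split` are hypothetical names, and both ignore quoted literals and inline comments, which the real splitter handles.

```rust
/// Collapse whitespace runs and push the statement if anything is left.
fn flush(buf: &mut String, out: &mut Vec<String>) {
    let s = buf.split_whitespace().collect::<Vec<_>>().join(" ");
    if !s.is_empty() {
        out.push(s);
    }
    buf.clear();
}

/// Naive splitter: drop full-line `--` comments, then end a statement at
/// every `;` (the behavior the module docs attribute to the old extractor).
fn naive_split(input: &str) -> Vec<String> {
    input
        .lines()
        .filter(|l| !l.trim_start().starts_with("--"))
        .collect::<Vec<_>>()
        .join(" ")
        .split(';')
        .map(|s| s.split_whitespace().collect::<Vec<_>>().join(" "))
        .filter(|s| !s.is_empty())
        .collect()
}

/// Marker-aware splitter: text between `--QUERY-DELIMITER-START` and
/// `--QUERY-DELIMITER-END` is one statement (inner `;` kept); everything
/// else still splits on `;`.
fn marker_aware_split(input: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut buf = String::new();
    let mut in_region = false;
    for line in input.lines() {
        let t = line.trim_start();
        if t.starts_with("--QUERY-DELIMITER-START") {
            in_region = true;
            continue;
        }
        if t.starts_with("--QUERY-DELIMITER-END") {
            flush(&mut buf, &mut out);
            in_region = false;
            continue;
        }
        if t.starts_with("--") {
            continue; // directive or comment line
        }
        for c in line.chars() {
            if c == ';' && !in_region {
                flush(&mut buf, &mut out);
            } else {
                buf.push(c);
            }
        }
        buf.push(' ');
    }
    flush(&mut buf, &mut out);
    out
}
```

On the input used by `spark_region_is_one_statement`, the naive version yields five fragments (`BEGIN DECLARE x INT`, `SET x = 1`, `END`, and the two `SELECT`s), none of the middle three being valid on its own, while the marker-aware version yields the three statements that test expects.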
diff --git a/src/bin/build_sqlite_suite.rs b/src/bin/build_sqlite_suite.rs
new file mode 100644
index 0000000..d2d3764
--- /dev/null
+++ b/src/bin/build_sqlite_suite.rs
@@ -0,0 +1,354 @@
+//! Rebuild `datasets/sqlite/sqlite_official_suite.txt` from the original SQLite
+//! official test suite, with a SQLite-aware statement splitter that keeps
+//! compound `CREATE TRIGGER ... BEGIN ...; ... END` statements intact.
+//!
+//! The corpus is one statement per line. The original extractor (removed from the
+//! repo) split on every `;`, which shredded trigger bodies on their inner
+//! semicolons and produced invalid fragments (issue #22). This rebuilds the suite
+//! correctly: it splits only on top-level `;` (outside string/identifier quotes,
+//! comments, `BEGIN ... END` trigger bodies, and `CASE ... END`), normalizes each
+//! statement to one line, strips comments, and dedupes within the suite and
+//! against the other committed SQLite corpus files.
+//!
+//! Source: the SQLite project's own tests, public domain, as bundled in
+//! codeschool/sqlite-parser under `test/sql/official-suite/*.sql`. Clone that repo
+//! and pass the directory:
+//!
+//!   git clone --depth 1 https://github.com/codeschool/sqlite-parser /tmp/sp
+//!   cargo run --release --bin build_sqlite_suite -- /tmp/sp/test/sql/official-suite
+//!
+//! Then repack (`tar --zstd -cf datasets.tar.zst datasets`) and re-run the SQLite
+//! oracle (`cargo run --release -p oracle -- sqlite`).
+
+#![allow(
+    clippy::doc_markdown,
+    clippy::too_many_lines,
+    clippy::items_after_statements
+)]
+
+use std::collections::HashSet;
+use std::fs;
+use std::path::Path;
+
+/// Split raw SQLite script text into normalized one-line statements.
+///
+/// Splits on top-level `;` only: semicolons inside single/double/backtick/bracket
+/// quotes, `--` and block comments, a `CREATE TRIGGER` `BEGIN ... END` body, or a
+/// `CASE ... END` are not statement terminators. Each statement is normalized to a
+/// single line (whitespace runs collapsed) with comments removed.
+#[must_use]
+fn split_sql(input: &str) -> Vec<String> {
+    let mut out = Vec::new();
+    let mut buf = String::new();
+    let mut word = String::new();
+    let mut case_depth = 0usize;
+    let mut block_depth = 0usize;
+    let mut is_trigger = false;
+
+    // Apply a completed word's effect on block/case tracking.
+    fn classify(
+        word: &mut String,
+        case_depth: &mut usize,
+        block_depth: &mut usize,
+        is_trigger: &mut bool,
+    ) {
+        if word.is_empty() {
+            return;
+        }
+        match word.to_ascii_uppercase().as_str() {
+            "TRIGGER" => *is_trigger = true,
+            "CASE" => *case_depth += 1,
+            "END" => {
+                if *case_depth > 0 {
+                    *case_depth -= 1;
+                } else if *block_depth > 0 {
+                    *block_depth -= 1;
+                }
+            }
+            // The only BEGIN inside a CREATE TRIGGER is the body opener. A bare
+            // BEGIN (transaction) is not a trigger, so it does not open a block.
+            "BEGIN" if *is_trigger => *block_depth += 1,
+            _ => {}
+        }
+        word.clear();
+    }
+
+    // Push a single normalizing space (collapse runs, skip leading).
+    fn push_space(buf: &mut String) {
+        if !buf.is_empty() && !buf.ends_with(' ') {
+            buf.push(' ');
+        }
+    }
+
+    let end_statement = |buf: &mut String,
+                         out: &mut Vec<String>,
+                         case_depth: &mut usize,
+                         block_depth: &mut usize,
+                         is_trigger: &mut bool| {
+        let s = buf.trim().to_string();
+        if !s.is_empty() {
+            // Final pass: collapse any whitespace that survived inside quoted
+            // literals so the statement is one line (string contents do not
+            // affect parse benchmarking).
+            let normalized = s.split_whitespace().collect::<Vec<_>>().join(" ");
+            out.push(normalized);
+        }
+        buf.clear();
+        *case_depth = 0;
+        *block_depth = 0;
+        *is_trigger = false;
+    };
+
+    let chars: Vec<char> = input.chars().collect();
+    let mut i = 0;
+    while i < chars.len() {
+        let c = chars[i];
+
+        // Comments: strip to a single space.
+        if c == '-' && chars.get(i + 1) == Some(&'-') {
+            classify(
+                &mut word,
+                &mut case_depth,
+                &mut block_depth,
+                &mut is_trigger,
+            );
+            while i < chars.len() && chars[i] != '\n' {
+                i += 1;
+            }
+            push_space(&mut buf);
+            continue;
+        }
+        if c == '/' && chars.get(i + 1) == Some(&'*') {
+            classify(
+                &mut word,
+                &mut case_depth,
+                &mut block_depth,
+                &mut is_trigger,
+            );
+            i += 2;
+            while i < chars.len() && !(chars[i] == '*' && chars.get(i + 1) == Some(&'/')) {
+                i += 1;
+            }
+            i += 2;
+            push_space(&mut buf);
+            continue;
+        }
+
+        // Quoted string / identifier: copy verbatim, honoring doubling escapes.
+        if matches!(c, '\'' | '"' | '`' | '[') {
+            classify(
+                &mut word,
+                &mut case_depth,
+                &mut block_depth,
+                &mut is_trigger,
+            );
+            let close = if c == '[' { ']' } else { c };
+            buf.push(c);
+            i += 1;
+            loop {
+                if i >= chars.len() {
+                    break;
+                }
+                let d = chars[i];
+                if d == close {
+                    // Doubling escape ('' "" ``) keeps the quote open. Brackets
+                    // have no escape in SQLite.
+                    if close != ']' && chars.get(i + 1) == Some(&close) {
+                        buf.push(d);
+                        buf.push(d);
+                        i += 2;
+                        continue;
+                    }
+                    buf.push(d);
+                    i += 1;
+                    break;
+                }
+                buf.push(d);
+                i += 1;
+            }
+            continue;
+        }
+
+        if c.is_alphanumeric() || c == '_' {
+            word.push(c);
+            buf.push(c);
+            i += 1;
+            continue;
+        }
+
+        // Non-word character: settle the pending word first.
+        classify(
+            &mut word,
+            &mut case_depth,
+            &mut block_depth,
+            &mut is_trigger,
+        );
+
+        if c == ';' && case_depth == 0 && block_depth == 0 {
+            end_statement(
+                &mut buf,
+                &mut out,
+                &mut case_depth,
+                &mut block_depth,
+                &mut is_trigger,
+            );
+            i += 1;
+            continue;
+        }
+
+        if c.is_whitespace() {
+            push_space(&mut buf);
+        } else {
+            buf.push(c);
+        }
+        i += 1;
+    }
+    classify(
+        &mut word,
+        &mut case_depth,
+        &mut block_depth,
+        &mut is_trigger,
+    );
+    end_statement(
+        &mut buf,
+        &mut out,
+        &mut case_depth,
+        &mut block_depth,
+        &mut is_trigger,
+    );
+    out
+}
+
+fn main() {
+    let src = std::env::args().nth(1).unwrap_or_else(|| {
+        eprintln!("usage: build_sqlite_suite <official-suite dir>");
+        std::process::exit(2);
+    });
+    let src = Path::new(&src);
+
+    // Statements already in the other committed SQLite corpus files, to dedupe
+    // against (keep the suite from duplicating Spider / sql-create-context).
+    let mut seen: HashSet<String> = HashSet::new();
+    for other in ["spider_sqlite.txt", "sql_create_ctx.txt"] {
+        let p = Path::new("datasets/sqlite").join(other);
+        if let Ok(content) = fs::read_to_string(&p) {
+            for line in content.lines() {
+                let l = line.trim();
+                if !l.is_empty() {
+                    seen.insert(l.to_string());
+                }
+            }
+        }
+    }
+    let existing = seen.len();
+
+    let mut files: Vec<_> = fs::read_dir(src)
+        .expect("read official-suite dir")
+        .filter_map(Result::ok)
+        .map(|e| e.path())
+        .filter(|p| p.extension().is_some_and(|x| x == "sql"))
+        .collect();
+    files.sort();
+
+    let mut out_lines: Vec<String> = Vec::new();
+    let mut total = 0usize;
+    for f in &files {
+        let content = fs::read_to_string(f).expect("read sql file");
+        for stmt in split_sql(&content) {
+            total += 1;
+            if seen.insert(stmt.clone()) {
+                out_lines.push(stmt);
+            }
+        }
+    }
+
+    let dest = Path::new("datasets/sqlite/sqlite_official_suite.txt");
+    fs::write(dest, format!("{}\n", out_lines.join("\n"))).expect("write suite");
+    println!(
+        "{} source files, {total} statements parsed, {} kept after dedup ({} were dupes of the existing {existing} SQLite statements or each other).",
+        files.len(),
+        out_lines.len(),
+        total - out_lines.len(),
+    );
+    println!("wrote {}", dest.display());
+}
+
+#[cfg(test)]
+mod tests {
+    use super::split_sql;
+
+    #[test]
+    fn keeps_trigger_body_intact() {
+        let sql = "CREATE TRIGGER r1 AFTER INSERT ON t2 BEGIN\n  SELECT 'hello';\nEND;\nSELECT 1;";
+        assert_eq!(
+            split_sql(sql),
+            vec![
+                "CREATE TRIGGER r1 AFTER INSERT ON t2 BEGIN SELECT 'hello'; END".to_string(),
+                "SELECT 1".to_string(),
+            ]
+        );
+    }
+
+    #[test]
+    fn multi_statement_trigger_body_stays_one_statement() {
+        let sql = "CREATE TRIGGER t AFTER UPDATE ON x BEGIN UPDATE a SET b=1; DELETE FROM c; END; DROP TABLE x;";
+        assert_eq!(
+            split_sql(sql),
+            vec![
+                "CREATE TRIGGER t AFTER UPDATE ON x BEGIN UPDATE a SET b=1; DELETE FROM c; END"
+                    .to_string(),
+                "DROP TABLE x".to_string(),
+            ]
+        );
+    }
+
+    #[test]
+    fn leading_semicolons_and_newlines() {
+        // The suite often puts the terminator at the start of the next line.
+        let sql = "CREATE TABLE abc(a, b, c)\n;ALTER TABLE abc ADD d INTEGER\n;SELECT 1\n";
+        assert_eq!(
+            split_sql(sql),
+            vec![
+                "CREATE TABLE abc(a, b, c)".to_string(),
+                "ALTER TABLE abc ADD d INTEGER".to_string(),
+                "SELECT 1".to_string(),
+            ]
+        );
+    }
+
+    #[test]
+    fn semicolons_in_strings_and_comments_do_not_split() {
+        let sql = "SELECT ';' AS x -- ; not a split\n; SELECT /* ; */ 2;";
+        assert_eq!(
+            split_sql(sql),
+            vec!["SELECT ';' AS x".to_string(), "SELECT 2".to_string()]
+        );
+    }
+
+    #[test]
+    fn case_end_does_not_close_a_trigger() {
+        let sql =
+            "CREATE TRIGGER t AFTER INSERT ON x BEGIN SELECT CASE WHEN 1 THEN 2 ELSE 3 END; END; SELECT 9;";
+        assert_eq!(
+            split_sql(sql),
+            vec![
+                "CREATE TRIGGER t AFTER INSERT ON x BEGIN SELECT CASE WHEN 1 THEN 2 ELSE 3 END; END"
+                    .to_string(),
+                "SELECT 9".to_string(),
+            ]
+        );
+    }
+
+    #[test]
+    fn bare_begin_transaction_is_its_own_statement() {
+        let sql = "BEGIN; INSERT INTO t VALUES(1); COMMIT;";
+        assert_eq!(
+            split_sql(sql),
+            vec![
+                "BEGIN".to_string(),
+                "INSERT INTO t VALUES(1)".to_string(),
+                "COMMIT".to_string(),
+            ]
+        );
+    }
+}
diff --git a/src/bin/repair_corpus.rs b/src/bin/repair_corpus.rs
new file mode 100644
index 0000000..e0b889e
--- /dev/null
+++ b/src/bin/repair_corpus.rs
@@ -0,0 +1,387 @@
+//! Clean residual corpus artifacts left by the original `;`-only extractor in the
+//! corpus files that have no upstream reconstruction tool (issue #22, the long
+//! tail). The reconstructed SQLite/Spark/Oracle suites are rebuilt by
+//! `build_sqlite_suite` / `build_proc_suites`; this pass repairs the rest in
+//! place on the unpacked `datasets/`.
+//!
+//! Two transforms, both conservative (they never invent SQL and only ever drop a
+//! line that cannot be a valid standalone statement):
+//!
+//!  1. T-SQL `GO` batch separators. `GO` is a sqlcmd/SSMS client directive, not
+//!     T-SQL grammar. The extractor split on `;`, so `GO` lines with no semicolon
+//!     were glued onto the next statement (`GO SELECT ...`) or sat between two
+//!     statements on one line (`... GO ...`). The real SQL Server oracle accepts
+//!     `GO <stmt>`, so every parser that correctly rejects `GO` was charged a
+//!     false recall failure. We split each line on top-level `GO` tokens,
+//!     recovering the real statements. Applied to the `tsql` corpus and the mixed
+//!     `multi` corpus (which also carries T-SQL GO batches).
+//!
+//!  2. Pure procedural fragments (all dialects). Lines that are only a block
+//!     keyword (`END IF`, `END LOOP`, `END TRY`, `BEGIN CATCH`, ...) or that start
+//!     with a clause keyword that can never begin a statement (`ELSE`, `ELSIF`,
+//!     `ELSEIF`, `WHEN`, `THEN`, `AND`, `OR`, `LOOP`) are body pieces of a split
+//!     `CREATE FUNCTION`/`PROCEDURE`/batch and are dropped. Bare `END`/`END;` is
+//!     kept for SQLite, where `END` is a COMMIT synonym, but dropped elsewhere.
+//!     `DELIMITER` directives (any dialect), MySQL/`multi` `//` routine-delimiter
+//!     fragments, and leaked Python `print(...)` lines (all but T-SQL) are dropped
+//!     too. The prefix rule also catches string-literal fragments that begin
+//!     mid-prose (`And then my heart ...`, `loop will exit ...`).
+//!
+//! A general multi-line string-literal repair was considered and rejected: the
+//! corpus mixes `''`-doubling and backslash-escaping dialects (plus PG `E'...'`
+//! and dollar-quoting), so a quote scanner mislabels valid statements wholesale.
+//! The few genuine string fragments that remain are mostly provenance-only noise.
+//!
+//! Run `--apply` to write; otherwise it is a dry run reporting counts and samples.
+//! After applying, repack `datasets.tar.zst` and re-run the T-SQL oracle (the
+//! `GO` split produces new statement strings that need fresh labels).
+
+#![allow(
+    clippy::doc_markdown,
+    clippy::too_many_lines,
+    clippy::items_after_statements
+)]
+
+use std::collections::HashSet;
+use std::fs;
+use std::path::Path;
+
+use sql_ast_benchmark::datasets::{ensure_corpus, Dialect};
+
+/// Split a T-SQL line on top-level `GO` batch separators (a `GO` token bounded by
+/// whitespace or line edges, outside any quote). Returns the recovered statement
+/// pieces (callers normalize/drop empties). Quote tracking (`'...'`, `"..."`,
+/// `[...]`) keeps a `GO` inside a literal from splitting. T-SQL has no backslash
+/// escapes; only `''`/`""` doubling is handled here (the `]]` escape is not).
+fn split_go(line: &str) -> Vec<String> {
+    let chars: Vec<char> = line.chars().collect();
+    let mut pieces = Vec::new();
+    let mut buf = String::new();
+    let mut i = 0;
+    while i < chars.len() {
+        let c = chars[i];
+        // Consume a quoted literal verbatim so a `GO` inside it is not a split.
+        if matches!(c, '\'' | '"' | '[') {
+            let close = if c == '[' { ']' } else { c };
+            buf.push(c);
+            i += 1;
+            while i < chars.len() {
+                let d = chars[i];
+                if d == close {
+                    if close != ']' && chars.get(i + 1) == Some(&close) {
+                        buf.push(d);
+                        buf.push(d);
+                        i += 2;
+                        continue;
+                    }
+                    buf.push(d);
+                    i += 1;
+                    break;
+                }
+                buf.push(d);
+                i += 1;
+            }
+            continue;
+        }
+        // A `GO` token: preceded by start-or-space, followed by space-or-end.
+        let at_boundary = i == 0 || chars[i - 1].is_whitespace();
+        if at_boundary
+            && (c == 'G' || c == 'g')
+            && matches!(chars.get(i + 1), Some('O' | 'o'))
+            && chars.get(i + 2).is_none_or(|n| n.is_whitespace())
+        {
+            pieces.push(std::mem::take(&mut buf));
+            i += 2;
+            continue;
+        }
+        buf.push(c);
+        i += 1;
+    }
+    pieces.push(buf);
+    pieces
+}
+
+fn normalize(s: &str) -> String {
+    s.split_whitespace().collect::<Vec<_>>().join(" ")
+}
+
+/// Whether a corpus line is a pure procedural fragment that should be dropped.
+fn is_procedural_fragment(line: &str, dialect: Dialect) -> bool {
+    let up = normalize(line).to_ascii_uppercase();
+    let bare = up.trim_end_matches(';').trim_end();
+
+    // `END` and `END <one token>` (END IF / END LOOP / END CASE / END <label>)
+    // are block closers, not standalone statements. SQLite keeps them all (bare
+    // `END` is a COMMIT synonym); other dialects keep only `END TRANSACTION`.
+    let end_word = bare.strip_prefix("END ").map(str::trim);
+    if bare == "END" || end_word.is_some_and(|w| !w.is_empty() && !w.contains(' ')) {
+        if dialect == Dialect::Sqlite {
+            return false;
+        }
+        return end_word != Some("TRANSACTION");
+    }
+
+    // Clause / control-flow keywords that can never begin a statement. `END TRY`
+    // and `BEGIN TRY`/`CATCH` are matched as prefixes to also catch the glued
+    // `END TRY BEGIN CATCH SELECT ...` chunks of a split TRY/CATCH batch.
+    const PREFIX: &[&str] = &[
+        "ELSE ",
+        "ELSIF ",
+        "ELSEIF ",
+        "WHEN ",
+        "THEN ",
+        "AND ",
+        "OR ",
+        "LOOP ",
+        "END TRY",
+        "END CATCH",
+        "BEGIN TRY",
+        "BEGIN CATCH",
+        "END IF",
+        "END LOOP",
+        "END WHILE",
+        "END FOR",
+        "END CASE",
+    ];
+    if PREFIX.iter().any(|p| up.starts_with(p)) {
+        return true;
+    }
+
+    // `DELIMITER` is a client directive, never valid SQL in any dialect.
+    if up.starts_with("DELIMITER ") || up == "DELIMITER" {
+        return true;
+    }
+    // MySQL `//` custom-delimiter routine wreckage (also present in the mixed
+    // `multi` corpus). `//` is not an operator in these dialects.
+    if matches!(dialect, Dialect::Mysql | Dialect::Multi)
+        && (up == "//" || up.ends_with(" //") || up.contains(" // "))
+    {
+        return true;
+    }
+    // Leaked Python `print(...)` lines (from shell/Python test fixtures, e.g. the
+    // ClickHouse `print(xxhash...)` snippets). Never valid SQL, except T-SQL, where
+    // `PRINT(expr)` is a real statement, so that dialect is excluded.
+    if dialect != Dialect::Tsql && up.starts_with("PRINT(") {
+        return true;
+    }
+    false
+}
+
+/// Result of repairing one corpus file.
+struct FileStat {
+    name: String,
+    kept: usize,
+    go_split: usize,
+    dropped_proc: usize,
+    dropped_dup: usize,
+    sample_go: Vec<String>,
+    sample_drop: Vec<String>,
+}
+
+fn repair_file(path: &Path, dialect: Dialect) -> (Vec<String>, FileStat) {
+    let content = fs::read_to_string(path).unwrap_or_default();
+    let mut out = Vec::new();
+    let mut seen = HashSet::new();
+    let mut stat = FileStat {
+        name: path.file_name().unwrap().to_string_lossy().into_owned(),
+        kept: 0,
+        go_split: 0,
+        dropped_proc: 0,
+        dropped_dup: 0,
+        sample_go: Vec::new(),
+        sample_drop: Vec::new(),
+    };
+
+    for raw in content.lines() {
+        if raw.trim().is_empty() {
+            continue;
+        }
+        // 1. T-SQL GO batch-separator split. Also applied to the mixed `multi`
+        // corpus, which carries T-SQL GO batches. Other dialects: identity.
+        let pieces = if matches!(dialect, Dialect::Tsql | Dialect::Multi) {
+            let p = split_go(raw);
+            if p.len() > 1 {
+                stat.go_split += 1;
+                if stat.sample_go.len() < 6 {
+                    stat.sample_go
+                        .push(raw.chars().take(90).collect::<String>());
+                }
+            }
+            p
+        } else {
+            vec![raw.to_string()]
+        };
+
+        // 2. Procedural-fragment drop on each resulting piece.
+        for piece in pieces {
+            let n = normalize(&piece);
+            if n.is_empty() {
+                continue;
+            }
+            if is_procedural_fragment(&n, dialect) {
+                stat.dropped_proc += 1;
+                if stat.sample_drop.len() < 12 {
+                    stat.sample_drop
+                        .push(n.chars().take(90).collect::<String>());
+                }
+                continue;
+            }
+            // Preserve the corpus's one-occurrence invariant (GO-splitting can
+            // re-introduce duplicate SET/USE statements).
+            if !seen.insert(n.clone()) {
+                stat.dropped_dup += 1;
+                continue;
+            }
+            out.push(n);
+            stat.kept += 1;
+        }
+    }
+    (out, stat)
+}
+
+fn main() {
+    if let Err(e) = ensure_corpus() {
+        eprintln!("ERROR: could not prepare datasets/: {e}");
+        std::process::exit(1);
+    }
+    let apply = std::env::args().any(|a| a == "--apply");
+    println!(
+        "repair_corpus: {} (pass --apply to write)\n",
+        if apply { "APPLYING" } else { "DRY RUN" }
+    );
+
+    for dialect in Dialect::ALL {
+        let dir = Path::new("datasets").join(dialect.dir_name());
+        let Ok(entries) = fs::read_dir(&dir) else {
+            continue;
+        };
+        let mut files: Vec<_> = entries
+            .filter_map(Result::ok)
+            .map(|e| e.path())
+            .filter(|p| p.extension().is_some_and(|x| x == "txt"))
+            .collect();
+        files.sort();
+        for f in files {
+            let (out, stat) = repair_file(&f, dialect);
+            let changed = stat.go_split + stat.dropped_proc + stat.dropped_dup;
+            if changed == 0 {
+                continue;
+            }
+            println!(
+                "{}/{}: kept {}, GO-split {}, dropped {} proc + {} dup",
+                dialect.dir_name(),
+                stat.name,
+                stat.kept,
+                stat.go_split,
+                stat.dropped_proc,
+                stat.dropped_dup
+            );
+            for s in &stat.sample_go {
+                println!("    GO  | {s}");
+            }
+            for s in &stat.sample_drop {
+                println!("    DROP| {s}");
+            }
+            if apply {
+                fs::write(&f, format!("{}\n", out.join("\n"))).expect("write repaired corpus");
+            }
+        }
+    }
+    if !apply {
+        println!("\nDry run only. Re-run with --apply to write, then repack and re-run the T-SQL oracle.");
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::{is_procedural_fragment, split_go};
+    use sql_ast_benchmark::datasets::Dialect;
+
+    #[test]
+    fn go_splits_leading_and_midline() {
+        assert_eq!(
+            split_go("SET XACT_ABORT ON GO SELECT 1 GO"),
+            vec!["SET XACT_ABORT ON ", " SELECT 1 ", ""]
+        );
+        assert_eq!(split_go("GO SELECT 1"), vec!["", " SELECT 1"]);
+    }
+
+    #[test]
+    fn go_inside_a_string_or_identifier_is_not_a_separator() {
+        assert_eq!(split_go("SELECT 'A GO B'"), vec!["SELECT 'A GO B'"]);
+        assert_eq!(
+            split_go("SELECT 'it''s GO time'"),
+            vec!["SELECT 'it''s GO time'"]
+        );
+        assert_eq!(split_go("SELECT [GO]"), vec!["SELECT [GO]"]);
+        // A column or alias literally named GO is left intact only when quoted;
+        // a bare GO token is always treated as the batch separator.
+    }
+
+    #[test]
+    fn procedural_fragments_are_detected() {
+        for s in [
+            "END IF",
+            "end loop",
+            "END mylabel",
+            "END if_function",
+            "ELSE result := 1",
+            "ELSIF x THEN y",
+            "AND b = 2",
+            "loop will exit after 30 seconds",
+            "BEGIN CATCH",
+            "END TRY BEGIN CATCH SELECT ERROR_LINE()",
+        ] {
+            assert!(is_procedural_fragment(s, Dialect::Postgresql), "{s:?}");
+        }
+    }
+
+    #[test]
+    fn real_statements_are_kept() {
+        for s in [
+            "SELECT 1",
+            "CREATE TABLE t (a int)",
+            "ALTER TABLE t RENAME CONSTRAINT c TO d",
+            "WITH x AS (SELECT 1) SELECT * FROM x",
+        ] {
+            assert!(!is_procedural_fragment(s, Dialect::Postgresql), "{s:?}");
+        }
+    }
+
+    #[test]
+    fn bare_end_is_a_commit_in_sqlite_only() {
+        assert!(!is_procedural_fragment("END", Dialect::Sqlite));
+        assert!(!is_procedural_fragment("END;", Dialect::Sqlite));
+        assert!(is_procedural_fragment("END", Dialect::Tsql));
+    }
+
+    #[test]
+    fn leaked_python_print_is_dropped_except_tsql() {
+        assert!(is_procedural_fragment(
+            "print(xxhash.xxh3_128_hexdigest(b'ClickHouse').upper())",
+            Dialect::Clickhouse
+        ));
+        assert!(is_procedural_fragment("print(t1_row.c2)", Dialect::Multi));
+        // T-SQL PRINT(expr) is a real statement -> kept.
+        assert!(!is_procedural_fragment("PRINT('hello')", Dialect::Tsql));
+        // A normal SELECT is never matched.
+        assert!(!is_procedural_fragment(
+            "SELECT print_col FROM t",
+            Dialect::Clickhouse
+        ));
+    }
+
+    #[test]
+    fn mysql_delimiter_wreckage_is_dropped() {
+        assert!(is_procedural_fragment("delimiter //", Dialect::Mysql));
+        assert!(is_procedural_fragment(
+            "end // create function f() returns int",
+            Dialect::Mysql
+        ));
+        // A `//` line in another dialect is not delimiter wreckage.
+        assert!(!is_procedural_fragment(
+            "SELECT 1 // comment",
+            Dialect::Postgresql
+        ));
+    }
+}
diff --git a/src/bin/sqlbench.rs b/src/bin/sqlbench.rs
index d4b9595..3b24282 100644
--- a/src/bin/sqlbench.rs
+++ b/src/bin/sqlbench.rs
@@ -76,10 +76,10 @@ fn print_report(r: &DialectReport) {
             reference, r.valid_total, r.invalid_total
         );
         println!(
-            "{:<nw$}  {:>7}  {:>7}  {:>7}  {:>8}",
-            "parser", "Recall", "FalseP", "RTrip", "Fidelity"
+            "{:<nw$}  {:>7}  {:>7}  {:>7}",
+            "parser", "Recall", "FalseP", "RTrip"
         );
-        println!("{}", "-".repeat(nw + 2 + 7 + 2 + 7 + 2 + 7 + 2 + 8));
+        println!("{}", "-".repeat(nw + 2 + 7 + 2 + 7 + 2 + 7));
         for (p, a) in r.parsers.iter().zip(r.stats.iter()) {
             let recall = cell(pct(a.accepted_valid, r.valid_total));
             let fp = if r.invalid_total > 0 {
@@ -92,12 +92,7 @@ fn print_report(r: &DialectReport) {
             } else {
                 NA.to_string()
             };
-            let fid = if a.can_reprint {
-                cell(pct(a.fidelity_ok, a.accepted_valid))
-            } else {
-                NA.to_string()
-            };
-            println!("{:<nw$}  {recall:>7}  {fp:>7}  {rt:>7}  {fid:>8}", p.family);
+            println!("{:<nw$}  {recall:>7}  {fp:>7}  {rt:>7}", p.family);
         }
     } else {
         println!(
diff --git a/src/datasets.rs b/src/datasets.rs
index da71f71..39511a2 100644
--- a/src/datasets.rs
+++ b/src/datasets.rs
@@ -155,6 +155,44 @@ mod tests {
         Dialect::Multi,
     ];
 
+    /// Guard against issue #22: the corpus must keep `CREATE TRIGGER ... BEGIN
+    /// ... END` bodies intact. A trigger split on its inner semicolons is
+    /// incomplete and fails to parse. Skips when the corpus is not unpacked.
+    #[test]
+    fn sqlite_create_triggers_parse_as_complete_statements() {
+        if super::ensure_corpus().is_err() {
+            return;
+        }
+        let dir = std::path::Path::new("datasets/sqlite");
+        let Ok(entries) = std::fs::read_dir(dir) else {
+            return;
+        };
+        let mut incomplete = Vec::new();
+        for entry in entries.flatten() {
+            let p = entry.path();
+            if p.extension().and_then(|e| e.to_str()) != Some("txt") {
+                continue;
+            }
+            let content = std::fs::read_to_string(&p).unwrap_or_default();
+            for line in content.lines() {
+                let l = line.trim();
+                let lower = l.to_ascii_lowercase();
+                if lower.starts_with("create")
+                    && lower.contains("trigger")
+                    && crate::BenchParser::Sqlite3.accepts(l, Dialect::Sqlite) != Some(true)
+                {
+                    incomplete.push(l.chars().take(90).collect::<String>());
+                }
+            }
+        }
+        assert!(
+            incomplete.is_empty(),
+            "{} CREATE TRIGGER statements do not parse (truncated?):\n{}",
+            incomplete.len(),
+            incomplete.join("\n")
+        );
+    }
+
     #[test]
     fn dir_name_roundtrips_for_every_variant() {
         for d in ALL {
diff --git a/src/export.rs b/src/export.rs
index c44bb41..f9fff06 100644
--- a/src/export.rs
+++ b/src/export.rs
@@ -130,11 +130,6 @@ fn metrics(report: &DialectReport) -> Vec<ParserMetrics> {
             } else {
                 None
             },
-            fidelity_pct: if crate::has_canonical(report.dialect) && s.can_reprint {
-                pct(s.fidelity_ok, s.accepted_valid)
-            } else {
-                None
-            },
             accept_pct: if reference {
                 None
             } else {
@@ -216,32 +211,30 @@ fn mem_for(dir: &str, parsers: &[BenchParser]) -> Vec<ParserMem> {
 }
 
 /// One row of the batch time summary (`batch_dist/summary.csv`):
-/// `dialect,parser,n_accepted,n_parsed,batch_bytes,batch_ns,ns_per_stmt`.
+/// `dialect,parser,n_eligible,k,n_correct,accuracy_pct,ns_per_stmt`. The last two
+/// may be blank (no batch parsed correctly).
 struct BatchPerfRow {
     dialect: String,
     parser: String,
-    n_accepted: usize,
-    n_parsed: usize,
-    ns_per_stmt: f64,
+    n_eligible: usize,
+    accuracy_pct: Option<f64>,
+    ns_per_stmt: Option<f64>,
 }
 
 /// One row of the batch memory summary (`batch_mem_dist/summary.csv`):
-/// `dialect,parser,n_accepted,n_parsed,peak_bytes,retained_bytes,peak_per_stmt,retained_per_stmt`.
+/// `dialect,parser,n_eligible,k,n_correct,accuracy_pct,peak_per_stmt,retained_per_stmt`.
+/// The per-statement figures may be blank (no batch parsed correctly).
 struct BatchMemRow {
     dialect: String,
     parser: String,
-    n_accepted: usize,
-    n_parsed: usize,
-    peak_per_stmt: f64,
-    retained_per_stmt: f64,
+    peak_per_stmt: Option<f64>,
+    retained_per_stmt: Option<f64>,
 }
 
-/// Whether a batch parse consumed the whole accepted set, so its normalized cost
-/// can be trusted. A fail-fast parser that errors partway yields `n_parsed`
-/// below `n_accepted`. Statements with internal `;` only push `n_parsed` higher,
-/// so `>=` is the right "fully consumed" test.
-const fn batch_complete(n_parsed: usize, n_accepted: usize) -> bool {
-    n_accepted > 0 && n_parsed >= n_accepted
+/// Parse a possibly-blank CSV float field into `Option<f64>`.
+fn opt_f64(s: &str) -> Option<f64> {
+    let s = s.trim();
+    (!s.is_empty()).then(|| s.parse().ok()).flatten()
 }
 
 fn parse_batch_perf(content: &str) -> Vec<BatchPerfRow> {
@@ -256,9 +249,9 @@ fn parse_batch_perf(content: &str) -> Vec<BatchPerfRow> {
             Some(BatchPerfRow {
                 dialect: f[0].to_string(),
                 parser: f[1].to_string(),
-                n_accepted: f[2].trim().parse().ok()?,
-                n_parsed: f[3].trim().parse().ok()?,
-                ns_per_stmt: f[6].trim().parse().ok()?,
+                n_eligible: f[2].trim().parse().ok()?,
+                accuracy_pct: opt_f64(f[5]),
+                ns_per_stmt: opt_f64(f[6]),
             })
         })
         .collect()
@@ -276,10 +269,8 @@ fn parse_batch_mem(content: &str) -> Vec<BatchMemRow> {
             Some(BatchMemRow {
                 dialect: f[0].to_string(),
                 parser: f[1].to_string(),
-                n_accepted: f[2].trim().parse().ok()?,
-                n_parsed: f[3].trim().parse().ok()?,
-                peak_per_stmt: f[6].trim().parse().ok()?,
-                retained_per_stmt: f[7].trim().parse().ok()?,
+                peak_per_stmt: opt_f64(f[6]),
+                retained_per_stmt: opt_f64(f[7]),
             })
         })
         .collect()
@@ -296,16 +287,16 @@ fn read_batch_mem() -> Vec<BatchMemRow> {
 }
 
 /// Merge batch time and batch memory rows for one dialect into per-parser
-/// `ParserBatch`. A parser appears only if at least one axis parsed the whole
-/// accepted set (see [`batch_complete`]). An axis whose batch bailed out early
-/// is dropped to `None` so the explorer never shows a misleading number. Pure,
-/// so the merge and the guard are testable.
+/// `ParserBatch`. Every batch-capable parser has a time row carrying its accuracy
+/// (the time bench runs them all), so it appears here; memory is added where the
+/// parser's allocations are Rust-visible. Pure, so the merge is testable.
 fn batch_for(dir: &str, perf: &[BatchPerfRow], mem: &[BatchMemRow]) -> Vec<ParserBatch> {
     use std::collections::BTreeMap;
     let mut map: BTreeMap<&str, ParserBatch> = BTreeMap::new();
     let blank = |parser: &str, n: usize| ParserBatch {
         parser: parser.to_string(),
         n_accepted: n,
+        accuracy_pct: None,
         ns_per_stmt: None,
         peak_per_stmt: None,
         retained_per_stmt: None,
@@ -313,25 +304,25 @@ fn batch_for(dir: &str, perf: &[BatchPerfRow], mem: &[BatchMemRow]) -> Vec<Parse
     for r in perf.iter().filter(|r| r.dialect == dir) {
         let e = map
             .entry(r.parser.as_str())
-            .or_insert_with(|| blank(&r.parser, r.n_accepted));
-        e.n_accepted = r.n_accepted;
-        if batch_complete(r.n_parsed, r.n_accepted) {
-            e.ns_per_stmt = Some(r.ns_per_stmt);
-        }
+            .or_insert_with(|| blank(&r.parser, r.n_eligible));
+        e.n_accepted = r.n_eligible;
+        e.accuracy_pct = r.accuracy_pct;
+        e.ns_per_stmt = r.ns_per_stmt;
     }
     for r in mem.iter().filter(|r| r.dialect == dir) {
         let e = map
             .entry(r.parser.as_str())
-            .or_insert_with(|| blank(&r.parser, r.n_accepted));
-        if batch_complete(r.n_parsed, r.n_accepted) {
-            e.peak_per_stmt = Some(r.peak_per_stmt);
-            e.retained_per_stmt = Some(r.retained_per_stmt);
-        }
+            .or_insert_with(|| blank(&r.parser, 0));
+        e.peak_per_stmt = r.peak_per_stmt;
+        e.retained_per_stmt = r.retained_per_stmt;
     }
-    // Drop parsers whose every axis was incomplete (nothing trustworthy to show).
+    // Keep a parser if it carries any batch signal.
     map.into_values()
         .filter(|b| {
-            b.ns_per_stmt.is_some() || b.peak_per_stmt.is_some() || b.retained_per_stmt.is_some()
+            b.accuracy_pct.is_some()
+                || b.ns_per_stmt.is_some()
+                || b.peak_per_stmt.is_some()
+                || b.retained_per_stmt.is_some()
         })
         .collect()
 }
@@ -616,8 +607,8 @@ pub fn run() -> Result<(), Box<dyn std::error::Error>> {
 #[cfg(test)]
 mod tests {
     use super::{
-        batch_complete, batch_for, build_coverage_matrix, format_failure_tsv, git_short, metrics,
-        now_utc, parse_batch_mem, parse_batch_perf, parse_summary, pct, perf_row_to_perf, PerfRow,
+        batch_for, build_coverage_matrix, format_failure_tsv, git_short, metrics, now_utc,
+        parse_batch_mem, parse_batch_perf, parse_summary, pct, perf_row_to_perf, PerfRow,
     };
     use crate::datasets::Dialect;
     use crate::report::{DialectReport, FileCoverage};
@@ -774,56 +765,54 @@ mod tests {
     }
 
     #[test]
-    fn batch_perf_parses_and_skips_short_lines() {
-        let csv = "dialect,parser,n_accepted,n_parsed,batch_bytes,batch_ns,ns_per_stmt\n\
-                   postgresql,sqlparser-rs,100,100,5000,400000.0,4000.0\n\
+    fn batch_perf_parses_accuracy_and_blank_fields() {
+        let csv = "dialect,parser,n_eligible,k,n_correct,accuracy_pct,ns_per_stmt\n\
+                   postgresql,sqlparser-rs,5000,200,180,90.000,4000.0\n\
+                   postgresql,worst,5000,200,0,0.000,\n\
                    short,row\n";
         let rows = parse_batch_perf(csv);
-        assert_eq!(rows.len(), 1);
+        assert_eq!(rows.len(), 2);
         assert_eq!(rows[0].parser, "sqlparser-rs");
-        assert_eq!(rows[0].n_accepted, 100);
-        assert_eq!(rows[0].n_parsed, 100);
-        assert!((rows[0].ns_per_stmt - 4000.0).abs() < 1e-9);
+        assert_eq!(rows[0].n_eligible, 5000);
+        assert_eq!(rows[0].accuracy_pct, Some(90.0));
+        assert_eq!(rows[0].ns_per_stmt, Some(4000.0));
+        // No batch parsed correctly: accuracy 0, time blank.
+        assert_eq!(rows[1].accuracy_pct, Some(0.0));
+        assert_eq!(rows[1].ns_per_stmt, None);
     }
 
     #[test]
     fn batch_mem_parses_peak_and_retained_columns() {
-        let csv = "dialect,parser,n_accepted,n_parsed,peak_bytes,retained_bytes,peak_per_stmt,retained_per_stmt\n\
-                   sqlite,turso_parser,50,50,100000,40000,2000.0,800.0\n";
+        let csv =
+            "dialect,parser,n_eligible,k,n_correct,accuracy_pct,peak_per_stmt,retained_per_stmt\n\
+                   sqlite,turso_parser,5000,200,200,100.000,2000.0,800.0\n";
         let rows = parse_batch_mem(csv);
         assert_eq!(rows.len(), 1);
-        assert_eq!(rows[0].n_parsed, 50);
-        assert!((rows[0].peak_per_stmt - 2000.0).abs() < 1e-9);
-        assert!((rows[0].retained_per_stmt - 800.0).abs() < 1e-9);
-    }
-
-    #[test]
-    fn batch_complete_requires_full_consumption() {
-        assert!(batch_complete(10, 10)); // exactly consumed
-        assert!(batch_complete(12, 10)); // internal-semicolon inflation is fine
-        assert!(!batch_complete(9, 10)); // bailed out early
-        assert!(!batch_complete(0, 0)); // nothing accepted
+        assert_eq!(rows[0].peak_per_stmt, Some(2000.0));
+        assert_eq!(rows[0].retained_per_stmt, Some(800.0));
     }
 
     #[test]
-    fn batch_merge_combines_time_and_memory_and_filters_by_dialect() {
+    fn batch_merge_combines_accuracy_time_and_memory_by_dialect() {
         let perf = parse_batch_perf(
             "h,h,h,h,h,h,h\n\
-             postgresql,sqlparser-rs,10,10,1,100.0,10.0\n\
-             postgresql,pg_query.rs,10,10,1,80.0,8.0\n",
+             postgresql,sqlparser-rs,10,200,180,90.0,10.0\n\
+             postgresql,pg_query.rs,10,200,200,100.0,8.0\n",
         );
         let mem = parse_batch_mem(
             "h,h,h,h,h,h,h,h\n\
-             postgresql,sqlparser-rs,10,10,1,1,500.0,200.0\n",
+             postgresql,sqlparser-rs,10,200,180,90.0,500.0,200.0\n",
         );
         let merged = batch_for("postgresql", &perf, &mem);
         assert_eq!(merged.len(), 2);
         let sp = merged.iter().find(|x| x.parser == "sqlparser-rs").unwrap();
+        assert_eq!(sp.accuracy_pct, Some(90.0));
         assert_eq!(sp.ns_per_stmt, Some(10.0));
         assert_eq!(sp.peak_per_stmt, Some(500.0));
         assert_eq!(sp.retained_per_stmt, Some(200.0));
-        // pg_query has batch time but no Rust-visible batch memory.
+        // pg_query has batch time and accuracy but no Rust-visible batch memory.
         let pg = merged.iter().find(|x| x.parser == "pg_query.rs").unwrap();
+        assert_eq!(pg.accuracy_pct, Some(100.0));
         assert_eq!(pg.ns_per_stmt, Some(8.0));
         assert_eq!(pg.peak_per_stmt, None);
         // A different dialect yields nothing from the same rows.
@@ -831,22 +820,15 @@ mod tests {
     }
 
     #[test]
-    fn batch_merge_drops_incomplete_parses() {
-        // Time bailed out early (5 of 10): its ns_per_stmt is untrustworthy and
-        // dropped. Memory parsed fully, so it survives and keeps the entry.
-        let perf = parse_batch_perf("h,h,h,h,h,h,h\npostgresql,sqlparser-rs,10,5,1,100.0,10.0\n");
-        let mem =
-            parse_batch_mem("h,h,h,h,h,h,h,h\npostgresql,sqlparser-rs,10,10,1,1,500.0,200.0\n");
-        let merged = batch_for("postgresql", &perf, &mem);
+    fn batch_merge_keeps_accuracy_even_when_time_is_blank() {
+        // A parser whose batches never parsed correctly still appears, with its
+        // accuracy (0) recorded and time/memory left absent.
+        let perf = parse_batch_perf("h,h,h,h,h,h,h\npostgresql,worst,10,200,0,0.0,\n");
+        let merged = batch_for("postgresql", &perf, &[]);
         assert_eq!(merged.len(), 1);
-        assert_eq!(merged[0].ns_per_stmt, None); // dropped: incomplete
-        assert_eq!(merged[0].peak_per_stmt, Some(500.0));
-
-        // Both axes incomplete -> the parser is omitted entirely.
-        let perf2 = parse_batch_perf("h,h,h,h,h,h,h\npostgresql,sqlparser-rs,10,2,1,100.0,10.0\n");
-        let mem2 =
-            parse_batch_mem("h,h,h,h,h,h,h,h\npostgresql,sqlparser-rs,10,3,1,1,500.0,200.0\n");
-        assert!(batch_for("postgresql", &perf2, &mem2).is_empty());
+        assert_eq!(merged[0].accuracy_pct, Some(0.0));
+        assert_eq!(merged[0].ns_per_stmt, None);
+        assert_eq!(merged[0].peak_per_stmt, None);
     }
 
     #[test]
diff --git a/src/lib.rs b/src/lib.rs
index 4c0397b..d3f31b4 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -16,10 +16,6 @@ use sqlparser::dialect::{
     RedshiftSqlDialect, SQLiteDialect,
 };
 
-fn pg_query_canonical(sql: &str) -> Option<String> {
-    pg_query::parse(sql).ok()?.deparse().ok()
-}
-
 // Multi-dialect benchmark layer. Each parser runs in its best-matching dialect.
 // One it does not model returns `None` (N/A). Correctness uses reference where
 // one exists (pg_query for PostgreSQL, lemon-rs for SQLite), else acceptance rate.
@@ -286,16 +282,6 @@ fn databend_reprint(sql: &str, d: DatabendDialect) -> Option<String> {
 
 // Reference.
 
-/// Canonical form for a reference-backed dialect, used for fidelity checks.
-/// `None` for dialects with no reference (or when the relevant feature is off).
-fn reference_canonical(sql: &str, d: Dialect) -> Option<String> {
-    match d {
-        Dialect::Postgresql => pg_query_canonical(sql),
-        Dialect::Sqlite => sqlite3_reprint(sql),
-        _ => None,
-    }
-}
-
 /// Does the reference accept this statement?
 ///
 /// The reference is the real database engine, validated offline and read from
@@ -314,20 +300,10 @@ pub fn has_reference(d: Dialect) -> bool {
     oracle_cache::has_reference(d)
 }
 
-/// Dialects with a library canonicalizer for the fidelity metric.
-///
-/// `PostgreSQL` via `pg_query`, `SQLite` via `lemon-rs`. This is independent of
-/// the validity reference (the real engine), which only labels statements
-/// valid/invalid.
-#[must_use]
-pub const fn has_canonical(d: Dialect) -> bool {
-    matches!(d, Dialect::Postgresql | Dialect::Sqlite)
-}
-
 // BenchParser.
 
 /// A parser under test. The single source of truth for dialect support,
-/// acceptance, round-trip stability and reference fidelity.
+/// acceptance, and round-trip stability.
 #[derive(Clone, Copy, Debug, PartialEq, Eq)]
 pub enum BenchParser {
     Sqlparser,
@@ -769,8 +745,7 @@ impl BenchParser {
         }
     }
 
-    /// Whether this parser has a pretty-printer (can round-trip / be graded for
-    /// fidelity) for `dialect`.
+    /// Whether this parser has a pretty-printer (can round-trip) for `dialect`.
     #[must_use]
     pub fn can_reprint(self, dialect: Dialect) -> bool {
         match self {
@@ -797,26 +772,6 @@ impl BenchParser {
                 .is_some_and(|second| first == second),
         )
     }
-
-    /// Reference fidelity: the reference's canonical form of this parser's output
-    /// equals the reference's canonical form of the input. `None` if the parser
-    /// cannot reprint or the dialect has no reference.
-    #[must_use]
-    pub fn fidelity(self, sql: &str, dialect: Dialect) -> Option<bool> {
-        if !self.can_reprint(dialect) || !has_canonical(dialect) {
-            return None;
-        }
-        let Some(out) = self.reprint(sql, dialect) else {
-            return Some(false);
-        };
-        match (
-            reference_canonical(sql, dialect),
-            reference_canonical(&out, dialect),
-        ) {
-            (Some(a), Some(b)) => Some(a == b),
-            _ => Some(false),
-        }
-    }
 }
 
 /// Identity of one benchmarked parser build: which library and which version.
@@ -840,7 +795,7 @@ pub struct ParserId {
 /// drivers operate over `&dyn Parser`, so one implementation serves both.
 ///
 /// Implementors provide the required methods. `accepts`, `measure_mem_batch`,
-/// `roundtrips`, and `fidelity` have default implementations built on them
+/// and `roundtrips` have default implementations built on them
 /// (mirroring [`BenchParser`]'s inherent methods), so a historical version only
 /// needs the core parse hooks.
 pub trait Parser: Sync {
@@ -904,24 +859,6 @@ pub trait Parser: Sync {
                 .is_some_and(|second| first == second),
         )
     }
-
-    /// Reference fidelity: the reference canonical form of the parser's output
-    /// equals that of the input. `None` without a printer or reference.
-    fn fidelity(&self, sql: &str, dialect: Dialect) -> Option<bool> {
-        if !self.can_reprint(dialect) || !has_canonical(dialect) {
-            return None;
-        }
-        let Some(out) = self.reprint(sql, dialect) else {
-            return Some(false);
-        };
-        match (
-            reference_canonical(sql, dialect),
-            reference_canonical(&out, dialect),
-        ) {
-            (Some(a), Some(b)) => Some(a == b),
-            _ => Some(false),
-        }
-    }
 }
 
 /// Delegation shim: every method forwards to [`BenchParser`]'s inherent method,
@@ -973,9 +910,6 @@ impl Parser for BenchParser {
     fn roundtrips(&self, sql: &str, dialect: Dialect) -> Option<bool> {
         (*self).roundtrips(sql, dialect)
     }
-    fn fidelity(&self, sql: &str, dialect: Dialect) -> Option<bool> {
-        (*self).fidelity(sql, dialect)
-    }
 }
 
 pub mod batch;
@@ -990,7 +924,7 @@ pub mod stats;
 #[cfg(test)]
 mod tests {
     use super::ParseOutcome;
-    use super::{catch_outcome, has_canonical, has_reference, reference_accepts, BenchParser};
+    use super::{catch_outcome, has_reference, reference_accepts, BenchParser};
     use crate::datasets::Dialect;
 
     #[test]
@@ -1179,14 +1113,10 @@ mod tests {
             assert!(!has_reference(d), "{d:?} should never be reference-graded");
             assert_eq!(reference_accepts("SELECT 1", d), None);
         }
-        // PostgreSQL and SQLite keep a library canonicalizer for fidelity,
-        // independent of the validity cache.
-        assert!(has_canonical(Dialect::Postgresql));
-        assert!(has_canonical(Dialect::Sqlite));
     }
 
     #[test]
-    fn roundtrip_and_fidelity_gating() {
+    fn roundtrip_gating() {
         // No pretty-printer => round-trip is N/A.
         assert_eq!(
             BenchParser::Orql.roundtrips("SELECT 1 FROM dual", Dialect::Oracle),
@@ -1196,18 +1126,12 @@ mod tests {
             BenchParser::Qusql.roundtrips("SELECT 1", Dialect::Postgresql),
             None
         );
-        // Fidelity needs a library canonicalizer: None on a dialect without one
-        // even for a parser that can reprint.
-        assert_eq!(
-            BenchParser::Sqlparser.fidelity("SELECT 1", Dialect::Mysql),
-            None
-        );
-        // Reprintable parser on a reference dialect => a verdict (Some).
+        // Reprintable parser => a verdict (Some).
         assert!(BenchParser::Sqlparser
             .roundtrips("SELECT 1", Dialect::Postgresql)
             .is_some());
         assert!(BenchParser::Sqlparser
-            .fidelity("SELECT 1", Dialect::Sqlite)
+            .roundtrips("SELECT 1", Dialect::Sqlite)
             .is_some());
     }
 
diff --git a/src/report.rs b/src/report.rs
index f37e3fd..f2a4f16 100644
--- a/src/report.rs
+++ b/src/report.rs
@@ -4,9 +4,9 @@
 //!
 //! Shared by the `sqlbench` tool and unit-tested here. `grade_chunk` is the
 //! correctness core: it splits a dialect's statements by reference verdict (where
-//! one exists) and tallies per parser recall, false-positive, round-trip and
-//! fidelity. It is deterministic, so callers may chunk the corpus and `merge`
-//! partial reports for speed.
+//! one exists) and tallies recall, false positives, and round-trip stability per
+//! parser. It is deterministic, so callers may chunk the corpus and `merge`
+//! partial reports for speed.
 
 use crate::datasets::Dialect;
 use crate::{has_reference, reference_accepts, Parser, ParserId};
@@ -21,7 +21,7 @@ pub const WORKER_STACK: usize = 512 * 1024 * 1024;
 /// Per-parser tallies within one dialect.
 #[derive(Clone, Default)]
 pub struct ParserStat {
-    /// Whether the parser can pretty-print in this dialect (round-trip/fidelity).
+    /// Whether the parser can pretty-print in this dialect (round-trip).
     pub can_reprint: bool,
     /// Accepted among reference-valid statements (recall numerator). For a
     /// provenance dialect (no reference) every statement is treated as valid, so
@@ -31,8 +31,6 @@ pub struct ParserStat {
     pub accepted_invalid: usize,
     /// Round-trip-stable among accepted-valid.
     pub roundtrip_ok: usize,
-    /// Reference-fidelity-preserving among accepted-valid.
-    pub fidelity_ok: usize,
     /// Statements the parser attempted in this dialect (the panic-rate
     /// denominator): every graded statement, since a supporting parser is run on
     /// all of them. Zero for a parser that does not model the dialect.
@@ -47,7 +45,6 @@ impl ParserStat {
         self.accepted_valid += other.accepted_valid;
         self.accepted_invalid += other.accepted_invalid;
         self.roundtrip_ok += other.roundtrip_ok;
-        self.fidelity_ok += other.fidelity_ok;
         self.attempted += other.attempted;
         self.panicked += other.panicked;
     }
@@ -137,13 +134,8 @@ pub fn grade_chunk(stmts: &[String], dialect: Dialect, parsers: &[&dyn Parser])
             }
             if is_valid {
                 report.stats[i].accepted_valid += 1;
-                if report.stats[i].can_reprint {
-                    if p.roundtrips(sql, dialect) == Some(true) {
-                        report.stats[i].roundtrip_ok += 1;
-                    }
-                    if p.fidelity(sql, dialect) == Some(true) {
-                        report.stats[i].fidelity_ok += 1;
-                    }
+                if report.stats[i].can_reprint && p.roundtrips(sql, dialect) == Some(true) {
+                    report.stats[i].roundtrip_ok += 1;
                 }
             } else {
                 report.stats[i].accepted_invalid += 1;
diff --git a/timemachine/src/families/polyglot.rs b/timemachine/src/families/polyglot.rs
index 20d6d01..3614d2c 100644
--- a/timemachine/src/families/polyglot.rs
+++ b/timemachine/src/families/polyglot.rs
@@ -1,5 +1,5 @@
 //! Historical polyglot-sql versions. Models every dialect and regenerates SQL,
-//! so it is graded for round-trip and fidelity.
+//! so it is graded for round-trip.
 
 use sql_ast_benchmark::datasets::Dialect;
 use sql_ast_benchmark::{Parser, ParserId};
diff --git a/timemachine/src/families/qusql.rs b/timemachine/src/families/qusql.rs
index edf23fb..e0c14f3 100644
--- a/timemachine/src/families/qusql.rs
+++ b/timemachine/src/families/qusql.rs
@@ -1,6 +1,6 @@
 //! Historical qusql-parse versions. Models PostgreSQL, MariaDB, and SQLite, is
 //! resilient (collects ranked issues rather than failing on the first error),
-//! and has no pretty-printer (so round-trip and fidelity are N/A).
+//! and has no pretty-printer (so round-trip is N/A).
 
 use sql_ast_benchmark::datasets::Dialect;
 use sql_ast_benchmark::{Parser, ParserId};
diff --git a/timemachine/src/families/sqlglot.rs b/timemachine/src/families/sqlglot.rs
index e65522f..ac097ad 100644
--- a/timemachine/src/families/sqlglot.rs
+++ b/timemachine/src/families/sqlglot.rs
@@ -1,5 +1,5 @@
 //! Historical sqlglot-rust versions. Models every dialect and pretty-prints, so
-//! it is graded for round-trip and fidelity like the current build.
+//! it is graded for round-trip like the current build.
 
 use sql_ast_benchmark::datasets::Dialect;
 use sql_ast_benchmark::{Parser, ParserId};
diff --git a/timemachine/src/run.rs b/timemachine/src/run.rs
index b283d0a..1219fbb 100644
--- a/timemachine/src/run.rs
+++ b/timemachine/src/run.rs
@@ -8,10 +8,10 @@
 //! history. The timing binary merges in the memory sidecar and writes the final
 //! per-family file.
 
-use sql_ast_benchmark::batch::join_batch;
+use sql_ast_benchmark::batch::{batch_eligible, evaluate_batches, reports_statement_count};
 use sql_ast_benchmark::datasets::Dialect;
 use sql_ast_benchmark::report::{self, load_dialect};
-use sql_ast_benchmark::{has_canonical, stats, Parser};
+use sql_ast_benchmark::{stats, Parser};
 use std::collections::BTreeMap;
 use std::hint::black_box;
 use std::path::PathBuf;
@@ -125,7 +125,7 @@ fn load_corpus(full: bool) -> BTreeMap<&'static str, Vec<String>> {
 }
 
 /// `ParserMetrics` from a one-parser grading report.
-fn metrics_of(report: &report::DialectReport, dialect: Dialect) -> ParserMetrics {
+fn metrics_of(report: &report::DialectReport) -> ParserMetrics {
     let s = &report.stats[0];
     let id = report.parsers[0];
     let reference = report.has_reference;
@@ -149,11 +149,6 @@ fn metrics_of(report: &report::DialectReport, dialect: Dialect) -> ParserMetrics
         } else {
             None
         },
-        fidelity_pct: if has_canonical(dialect) && s.can_reprint {
-            pct(s.fidelity_ok, s.accepted_valid)
-        } else {
-            None
-        },
         accept_pct: if reference {
             None
         } else {
@@ -207,28 +202,45 @@ fn timing_dialect_run(p: &dyn Parser, d: Dialect, stmts: &[String]) -> DialectRu
         ))
     };
 
-    let batch = if accepted.is_empty() || !p.can_batch() {
+    let count = |s: &str| {
+        std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
+            p.parse_batch(s, d).unwrap_or(0)
+        }))
+        .unwrap_or(0)
+    };
+    let eligible: Vec<&str> = if !p.can_batch() || !reports_statement_count(|s| count(s)) {
+        Vec::new()
+    } else {
+        accepted
+            .iter()
+            .copied()
+            .filter(|s| batch_eligible(s) && count(s) == 1)
+            .collect()
+    };
+    let batch = if eligible.is_empty() {
         None
     } else {
-        let script = join_batch(&accepted);
-        let n_parsed = p.parse_batch(&script, d).unwrap_or(0);
-        // Only trust the batch number if the whole accepted set parsed.
-        if n_parsed >= accepted.len() {
-            let ns = time_batch(|| p.parse_batch(&script, d).unwrap_or(0));
-            Some(ParserBatch {
-                parser: p.id().family.to_string(),
-                n_accepted: accepted.len(),
-                ns_per_stmt: Some(ns / accepted.len() as f64),
-                peak_per_stmt: None,
-                retained_per_stmt: None,
-            })
-        } else {
+        let label = format!("{}/{}", d.dir_name(), p.id().family);
+        let eval = evaluate_batches(&eligible, &label, count);
+        let ns_per_stmt = if eval.n_correct == 0 {
             None
-        }
+        } else {
+            let denom = (eval.n_correct * eval.effective_m) as f64;
+            let ns = time_batch(|| eval.correct_scripts.iter().map(|s| count(s)).sum());
+            Some(ns / denom)
+        };
+        Some(ParserBatch {
+            parser: p.id().family.to_string(),
+            n_accepted: eval.n_eligible,
+            accuracy_pct: eval.accuracy_pct(),
+            ns_per_stmt,
+            peak_per_stmt: None,
+            retained_per_stmt: None,
+        })
     };
 
     let report = report::grade_chunk(stmts, d, &[p]);
-    let correctness = Some(metrics_of(&report, d));
+    let correctness = Some(metrics_of(&report));
 
     DialectRun {
         dir_name: d.dir_name().to_string(),
diff --git a/viz/src/schema.rs b/viz/src/schema.rs
index c2015b5..7d30f4c 100644
--- a/viz/src/schema.rs
+++ b/viz/src/schema.rs
@@ -171,15 +171,22 @@ pub struct DialectData {
 #[derive(Serialize, Deserialize, Clone, Debug)]
 pub struct ParserBatch {
     pub parser: String,
-    /// Statements fed into the batch (the parser's accepted set).
+    /// Statements eligible for batching (accepted, parse to one statement alone,
+    /// and not input-consuming), the pool the random batches were drawn from.
     pub n_accepted: usize,
-    /// Whole-script parse time divided by statement count (ns).
+    /// Batch accuracy: the share of sampled multi-statement scripts the parser
+    /// reparsed to exactly the expected statement count, as a percent. Lower than
+    /// 100 means the parser mishandles a statement boundary in some scripts.
+    #[serde(default)]
+    pub accuracy_pct: Option<f64>,
+    /// Per-statement parse time averaged over the batches that parsed correctly
+    /// (ns). `None` when no batch parsed correctly.
     #[serde(default)]
     pub ns_per_stmt: Option<f64>,
-    /// Peak live bytes during the whole-script parse, per statement.
+    /// Peak live bytes per statement over the correctly parsed batches.
     #[serde(default)]
     pub peak_per_stmt: Option<f64>,
-    /// Retained bytes after the whole-script parse, per statement.
+    /// Retained bytes per statement over the correctly parsed batches.
     #[serde(default)]
     pub retained_per_stmt: Option<f64>,
 }
@@ -263,8 +270,6 @@ pub struct ParserMetrics {
     pub false_positive_pct: Option<f64>,
     /// Display round-trip rate among accepted (None without a printer).
     pub roundtrip_pct: Option<f64>,
-    /// Reference dialects: canonical-form fidelity among accepted.
-    pub fidelity_pct: Option<f64>,
     /// Provenance dialects: fraction of the corpus accepted.
     pub accept_pct: Option<f64>,
     /// Statements the parser attempted in this dialect (the panic-rate
diff --git a/web/assets/bench.json.zst b/web/assets/bench.json.zst
index cab8c08..568f182 100644
Binary files a/web/assets/bench.json.zst and b/web/assets/bench.json.zst differ
diff --git a/web/assets/history.json.zst b/web/assets/history.json.zst
index 315d9a5..1f4348f 100644
Binary files a/web/assets/history.json.zst and b/web/assets/history.json.zst differ
diff --git a/web/src/components.rs b/web/src/components.rs
index 97cb0ef..04b9888 100644
--- a/web/src/components.rs
+++ b/web/src/components.rs
@@ -6,11 +6,12 @@ use crate::Route;
 use dioxus::prelude::*;
 use dioxus_free_icons::icons::fa_brands_icons::{FaGit, FaGithub, FaRust};
 use dioxus_free_icons::icons::fa_solid_icons::{
-    FaArrowLeftLong, FaArrowsRotate, FaBan, FaBomb, FaBox, FaBug, FaBuilding, FaCalendarDays,
-    FaChartColumn, FaChartLine, FaCircleXmark, FaCode, FaCodeCommit, FaCodeFork, FaCopy, FaCube,
-    FaDatabase, FaDna, FaDownload, FaFileShield, FaFlaskVial, FaHeartPulse, FaLayerGroup,
-    FaMicrochip, FaMobileScreen, FaScaleBalanced, FaServer, FaShieldHalved, FaSitemap, FaStar,
-    FaStopwatch, FaTableCells, FaTag, FaTriangleExclamation, FaUsers, FaVial,
+    FaArrowLeftLong, FaArrowsRotate, FaBan, FaBomb, FaBox, FaBug, FaBuilding, FaBullseye,
+    FaCalendarDays, FaChartColumn, FaChartLine, FaCircleXmark, FaCode, FaCodeCommit, FaCodeFork,
+    FaCopy, FaCube, FaDatabase, FaDna, FaDownload, FaFileShield, FaFlaskVial, FaGaugeHigh,
+    FaHeartPulse, FaLayerGroup, FaMicrochip, FaMobileScreen, FaRankingStar, FaScaleBalanced,
+    FaServer, FaShieldHalved, FaSitemap, FaStar, FaStopwatch, FaTableCells, FaTag,
+    FaTriangleExclamation, FaUsers, FaVial,
 };
 use dioxus_free_icons::Icon;
 use std::cmp::Ordering;
@@ -350,12 +351,23 @@ pub fn Overview() -> Element {
                 {rich_text(&format!("We evaluated nine parser libraries: [sqlparser-rs](https://github.com/sqlparser-rs/sqlparser-rs) (Apache DataFusion), [pg_query.rs](https://github.com/pganalyze/pg_query.rs) and its faster summary mode (Rust bindings to [libpg_query](https://github.com/pganalyze/libpg_query), PostgreSQL's own parser), [databend-common-ast](https://crates.io/crates/databend-common-ast), [polyglot-sql](https://github.com/tobilg/polyglot), [sqlglot-rust](https://crates.io/crates/sqlglot-rust), [qusql-parse](https://crates.io/crates/qusql-parse), [sqlite3-parser](https://crates.io/crates/sqlite3-parser) (lemon-rs), and [turso_parser](https://crates.io/crates/turso_parser) (the SQLite parser from Turso), plus [orql](https://codeberg.org/xitep/orql) on Oracle. We ran them against a corpus of 340,938 statements spanning these {} dialects, drawn from each engine's own regression suites and official samples and committed compressed so every run is reproducible.", b.dialects.len())).into_iter()}
             }
             p { class: "blurb",
-                {rich_text("We exercised each parser in the dialect that matches the corpus under test. Where a dialect has a runnable engine, we labelled each statement valid or invalid with the real database engine itself, run in Docker via [testcontainers](https://github.com/testcontainers/testcontainers-rs): a statement counts as valid unless the engine reports a syntax error, so a missing table or column still counts as parsed. Against that ground truth we scored the parsers on recall (valid statements accepted), false positives (invalid statements wrongly accepted), display round-trip stability, and canonical-form fidelity. The other dialects have no runnable engine, so their statements count as provenance-valid and the metric is simply the acceptance rate. Across all dialects, we captured speed as a per-statement parse-time distribution over every accepted statement, and memory as the peak and retained bytes per statement under a counting allocator. A batch axis additionally parses each parser's whole accepted set as a single script, showing what bulk parsing amortizes, and a time machine benchmarks the historical releases of every pure-Rust parser (59 versions in total, including every sqlparser-rs minor since January 2023), so each parser page also charts how coverage, speed, and memory evolved across releases.").into_iter()}
+                {rich_text("We exercised each parser in the dialect that matches the corpus under test. Where a dialect has a runnable engine, we labelled each statement valid or invalid with the real database engine itself, run in Docker via [testcontainers](https://github.com/testcontainers/testcontainers-rs): a statement counts as valid unless the engine reports a syntax error, so a missing table or column still counts as parsed. Against that ground truth we scored the parsers on recall (valid statements accepted), false positives (invalid statements wrongly accepted), and display round-trip stability. The other dialects have no runnable engine, so their statements count as provenance-valid and the metric is simply the acceptance rate. Across all dialects, we captured speed as a per-statement parse-time distribution over every accepted statement, and memory as the peak and retained bytes per statement under a counting allocator. A batch axis additionally parses random multi-statement scripts built from the statements each parser accepts, scoring how often the statement count comes back exact and showing what bulk parsing amortizes, and a time machine benchmarks the historical releases of every pure-Rust parser (59 versions in total, including every sqlparser-rs minor since January 2023), so each parser page also charts how coverage, speed, and memory evolved across releases.").into_iter()}
             }
             p { class: "blurb",
                 {rich_text("On their home dialect the reference bindings are exact by construction, so the more telling comparison is among the pure-Rust parsers. There, [sqlparser-rs](https://github.com/sqlparser-rs/sqlparser-rs) is the most broadly capable, the permissive parsers such as [polyglot-sql](https://github.com/tobilg/polyglot) accept the most statements but pay for it with a high false-positive rate, and the stricter parsers reject more in exchange for precision. Speed spans more than an order of magnitude, from well under a microsecond per statement for the fastest parsers to the low single-digit microseconds for most, with [polyglot-sql](https://github.com/tobilg/polyglot) a clear outlier at roughly fifteen. No parser leads on every axis, so the right choice comes down to what a given project values most: broad coverage, few false positives, or raw speed.").into_iter()}
             }
         }
+        div { class: "section-head",
+            h2 {
+                Icon { width: 18, height: 18, fill: "currentColor".to_string(), class: "h2-ico".to_string(), icon: FaRankingStar }
+                "Overall ranking"
+            }
+        }
+        p { class: "table-cap",
+            "A single composite score, 0 to 100, blending every dimension: correctness 45 percent, robustness 20, project health 15, speed 12, and memory 8. Each parser is judged only on the dialects it models, never penalised for dialects it never claimed, and breadth is not itself rewarded. Correctness and health are absolute; speed and memory are ranked against the field within each dialect, then averaged. Click any column to sort. A dimension that does not apply (memory for the FFI bindings) shows n/a and its weight is redistributed."
+        }
+        {score_leaderboard()}
+
         div { class: "section-head",
             h2 {
                 Icon { width: 18, height: 18, fill: "currentColor".to_string(), class: "h2-ico".to_string(), icon: FaDatabase }
@@ -666,10 +678,10 @@ pub fn ParserView(name: String) -> Element {
         "accept / recall",
         "false pos",
         "round-trip",
-        "fidelity",
         "median ns",
         "p90 ns",
         "mean ns",
+        "batch ok%",
         "batch ns/stmt",
     ]
     .iter()
@@ -693,10 +705,10 @@ pub fn ParserView(name: String) -> Element {
                 })),
                 Cell::pct(m.and_then(|m| m.false_positive_pct)),
                 Cell::pct(m.and_then(|m| m.roundtrip_pct)),
-                Cell::pct(m.and_then(|m| m.fidelity_pct)),
                 Cell::ns(p.map(|p| p.median)),
                 Cell::ns(p.map(|p| p.p90)),
                 Cell::ns(p.map(|p| p.mean)),
+                Cell::pct(batch_of(d, &parser).and_then(|x| x.accuracy_pct)),
                 Cell::ns(batch_of(d, &parser).and_then(|x| x.ns_per_stmt)),
             ],
         })
@@ -710,6 +722,9 @@ pub fn ParserView(name: String) -> Element {
                     h1 { "{parser}" }
                     p { class: "hero-stats",
                         span { class: "stat", strong { "{rows.len()}" } " of {b.dialects.len()} dialects" }
+                        if let Some(sc) = crate::score::parser_score(&parser) {
+                            span { class: "stat", strong { "{sc.overall:.0}" } " / 100 score" }
+                        }
                     }
                 }
                 {parser_meta_pills(&parser)}
@@ -718,6 +733,8 @@ pub fn ParserView(name: String) -> Element {
 
         {blurb(crate::descriptions::parser_blurb(&parser))}
 
+        {parser_score_section(&parser)}
+
         if has_charts {
             section { class: "block",
                 h2 {
@@ -749,7 +766,7 @@ pub fn ParserView(name: String) -> Element {
                 "Results by dialect"
             }
             p { class: "table-cap",
-                "One row per dialect. \"accept / recall\" is recall where a reference parser exists, otherwise the acceptance rate. \"false pos\" is the share of invalid statements wrongly accepted (lower is better). \"round-trip\" is the share of accepted statements that re-parse unchanged, \"fidelity\" the share whose printed form matches the original. \"median ns\" and \"p90 ns\" are per-statement parse times (lower is faster), \"mean ns\" the per-statement average, and \"batch ns/stmt\" the whole accepted set parsed as one script divided by its statement count, so compare it to the adjacent mean (blank where not measured or no batch entry point)."
+                "One row per dialect. \"accept / recall\" is recall where a reference parser exists, otherwise the acceptance rate. \"false pos\" is the share of invalid statements wrongly accepted (lower is better). \"round-trip\" is the share of accepted statements that re-parse unchanged. \"median ns\" and \"p90 ns\" are per-statement parse times (lower is faster), \"mean ns\" the per-statement average. \"batch ok%\" is the share of 200 random 128-statement scripts (built from statements the parser accepts) that reparse to the exact count, and \"batch ns/stmt\" is the per-statement time over the scripts that did, so compare it to the adjacent mean."
             }
             SortTable {
                 caption: format!("Per-dialect results for {}", parser),
@@ -971,8 +988,8 @@ fn VersionHistory(parser: String) -> Element {
         "accept / recall",
         "false pos",
         "round-trip",
-        "fidelity",
         "mean ns",
+        "batch ok%",
         "batch ns/stmt",
     ]
     .iter()
@@ -999,8 +1016,8 @@ fn VersionHistory(parser: String) -> Element {
                     })),
                     Cell::pct(m.and_then(|m| m.false_positive_pct)),
                     Cell::pct(m.and_then(|m| m.roundtrip_pct)),
-                    Cell::pct(m.and_then(|m| m.fidelity_pct)),
                     Cell::ns(d.perf.as_ref().map(|p| p.mean)),
+                    Cell::pct(d.batch.as_ref().and_then(|b| b.accuracy_pct)),
                     Cell::ns(d.batch.as_ref().and_then(|b| b.ns_per_stmt)),
                 ],
             }
@@ -1147,7 +1164,7 @@ fn parser_memory_section(b: &viz::Bundle, parser: &str) -> Element {
                 "Memory by dialect"
             }
             p { class: "table-cap",
-                "One row per dialect, bytes per statement. \"peak\" is the high-water mark of live memory during a parse, \"retained\" what the produced AST keeps alive afterwards. \"peak mean\" and \"retained mean\" are the per-statement averages, and \"batch peak/stmt\" and \"batch ret/stmt\" are the same over the whole accepted set parsed as one script divided by its statement count, so compare each batch column to the adjacent mean (blank where not measured or no batch entry point)."
+                "One row per dialect, bytes per statement. \"peak\" is the high-water mark of live memory during a parse, \"retained\" what the produced AST keeps alive afterwards. \"peak mean\" and \"retained mean\" are the per-statement averages, and \"batch peak/stmt\" and \"batch ret/stmt\" are the same over random 128-statement scripts, averaged over the ones that reparsed correctly, so compare each batch column to the adjacent mean."
             }
             div { class: "charts",
                 {chart_figure(&format!("chart-{}-mempeak-ecdf", slug(parser)), &peak_ecdf, &format!("Empirical CDF of {parser} peak memory, one curve per dialect."), "Peak live memory per parse, one curve per dialect. Further left is leaner (log scale).", &format!("{}-peak-memory-ecdf", slug(parser)))}
@@ -1426,6 +1443,90 @@ fn dialect_meta_pills(dir: &str) -> Element {
 /// Repository and crate metadata pills for a parser, shown inside the parser
 /// hero banner. Renders nothing for a parser with no recorded metadata. Figures
 /// are a dated snapshot (see `metadata::SNAPSHOT`).
+/// The overall-score leaderboard for the overview: one row per parser, ranked
+/// by the composite score, with the five sub-scores as sortable columns.
+fn score_leaderboard() -> Element {
+    let columns = vec![
+        "overall".to_string(),
+        "correctness".to_string(),
+        "robustness".to_string(),
+        "speed".to_string(),
+        "memory".to_string(),
+        "health".to_string(),
+    ];
+    let cell = |v: Option<f64>| {
+        v.map_or_else(
+            || Cell::with("n/a".to_string(), None),
+            |x| Cell::with(format!("{x:.0}"), Some(x)),
+        )
+    };
+    let mut scored: Vec<(&String, &crate::score::ParserScore)> =
+        crate::score::all_scores().iter().collect();
+    scored.sort_by(|a, b| {
+        b.1.overall
+            .partial_cmp(&a.1.overall)
+            .unwrap_or(Ordering::Equal)
+    });
+    let rows: Vec<Row> = scored
+        .into_iter()
+        .map(|(name, s)| Row {
+            key: name.clone(),
+            head: Head::Parser(name.clone()),
+            cells: vec![
+                Cell::with(format!("{:.0}", s.overall), Some(s.overall)),
+                cell(s.correctness),
+                cell(s.robustness),
+                cell(s.speed),
+                cell(s.memory),
+                cell(s.health),
+            ],
+        })
+        .collect();
+    rsx! {
+        SortTable {
+            caption: "Overall parser score, ranked".to_string(),
+            corner: "parser".to_string(),
+            columns,
+            rows,
+            footer: None,
+        }
+    }
+}
+
+/// The per-parser score block on the parser page: the overall number plus the
+/// five sub-scores as pills, so the composite is never an opaque figure.
+fn parser_score_section(parser: &str) -> Element {
+    let Some(s) = crate::score::parser_score(parser) else {
+        return rsx! {};
+    };
+    rsx! {
+        section { class: "block",
+            h2 {
+                Icon { width: 17, height: 17, fill: "currentColor".to_string(), class: "h2-ico".to_string(), icon: FaRankingStar }
+                "Overall score"
+            }
+            p { class: "table-cap",
+                "A composite of every dimension, 0 to 100, weighting correctness 45 percent, robustness 20, project health 15, speed 12, and memory 8. Computed only over the dialects this parser models. Any sub-score shown as n/a is dropped and the remaining weights are renormalized. Speed and memory are scored relative to the other parsers on each dialect; correctness, robustness, and health are absolute."
+            }
+            p { class: "hero-stats",
+                span { class: "stat", strong { "{s.overall:.0}" } " / 100 overall" }
+            }
+            div { class: "meta-grid",
+                {meta_item(rsx! { Icon { width: 12, height: 12, fill: "currentColor".to_string(), icon: FaBullseye } }, "correctness", fmt_score(s.correctness), "Correctness sub-score (0 to 100): recall or acceptance, false-positive avoidance, and round-trip, averaged over the dialects this parser models.".to_string())}
+                {meta_item(rsx! { Icon { width: 12, height: 12, fill: "currentColor".to_string(), icon: FaShieldHalved } }, "robustness", fmt_score(s.robustness), "Robustness sub-score (0 to 100): empirical panic rate on the real corpus, recursion-depth guarding, unsafe surface, and static panic discipline.".to_string())}
+                {meta_item(rsx! { Icon { width: 12, height: 12, fill: "currentColor".to_string(), icon: FaGaugeHigh } }, "speed", fmt_score(s.speed), "Speed sub-score (0 to 100): median parse time ranked against the other parsers within each dialect on a log scale, then averaged.".to_string())}
+                {meta_item(rsx! { Icon { width: 12, height: 12, fill: "currentColor".to_string(), icon: FaMicrochip } }, "memory", fmt_score(s.memory), "Memory sub-score (0 to 100): peak and retained per-statement footprints ranked against the field within each dialect. Shown n/a for FFI parsers, whose C-side allocations are not measured.".to_string())}
+                {meta_item(rsx! { Icon { width: 12, height: 12, fill: "currentColor".to_string(), icon: FaHeartPulse } }, "health", fmt_score(s.health), "Project-health sub-score (0 to 100): maintenance, tests, benches, fuzzing, sanitizers, supply-chain gates, licensing, release cadence, and contributor depth. Excludes popularity proxies.".to_string())}
+            }
+        }
+    }
+}
+
+/// Format a sub-score as a rounded number, or "n/a" when it does not apply.
+fn fmt_score(v: Option<f64>) -> String {
+    v.map_or_else(|| "n/a".to_string(), |x| format!("{x:.0}"))
+}
+
 fn parser_meta_pills(parser: &str) -> Element {
     use crate::metadata::{parser_meta, SNAPSHOT};
     let Some(m) = parser_meta(parser) else {
@@ -1890,12 +1991,17 @@ fn col_help(name: &str) -> Option<&'static str> {
     Some(match name {
         "parser" => "The SQL parser library under test.",
         "dialect" => "The SQL dialect the row reports on.",
+        "overall" => "Overall score (0 to 100): correctness 45 percent, robustness 20, project health 15, speed 12, memory 8, computed only over the dialects the parser models. Higher is better.",
+        "correctness" => "Correctness sub-score (0 to 100): recall or acceptance, false-positive avoidance, and round-trip, averaged over the parser's dialects. Higher is better.",
+        "robustness" => "Robustness sub-score (0 to 100): empirical panic rate, recursion-depth guarding, unsafe surface, and static panic discipline. Higher is better.",
+        "speed" => "Speed sub-score (0 to 100): median parse time ranked against the field within each dialect on a log scale, then averaged. Higher is better.",
+        "memory" => "Memory sub-score (0 to 100): peak and retained per-statement footprints ranked against the field within each dialect. n/a for FFI parsers. Higher is better.",
+        "health" => "Project-health sub-score (0 to 100): maintenance, tests, fuzzing, sanitizers, supply-chain gates, licensing, cadence, and contributor depth. Higher is better.",
         "recall" => "Recall: of the statements the reference parser treats as valid, the share this parser also accepted. It measures agreement with the reference on what counts as valid SQL, not whether the parser runs. Higher is better.",
         "accept" => "Acceptance rate: the share of the corpus this parser accepted. Used where there is no reference parser, so every corpus statement is treated as expected-valid.",
         "accept / recall" => "Recall where a reference parser exists (agreement with it on valid statements), otherwise the plain acceptance rate. Higher is better.",
         "false pos" => "False positives: of the statements the reference parser rejects as invalid, the share this parser wrongly accepted. Lower is better.",
         "round-trip" | "RT %" => "Round-trip rate: of the statements it accepted, the share that print back to SQL and re-parse unchanged. Shown as n/a when the parser cannot print. Higher is better.",
-        "fidelity" => "Fidelity: of the accepted statements, the share whose printed form is semantically identical to the original under the reference parser's canonical form. Higher is better.",
         "missed %" => "Missed: the share of statements the parser was expected to accept but did not. On reference dialects this is one minus recall, elsewhere the unaccepted fraction. Lower is better.",
         "median ns" => "Median parse time per accepted statement, in nanoseconds: half of statements parse faster than this.",
         "p90 ns" => "90th-percentile parse time per accepted statement, in nanoseconds: nine in ten statements parse faster than this.",
@@ -1903,6 +2009,8 @@ fn col_help(name: &str) -> Option<&'static str> {
         "peak p90" => "90th-percentile peak live memory per statement: nine in ten statements stay under this high-water mark.",
         "retained p50" => "Median retained memory per statement: the bytes the produced AST (plus the scaffolding it keeps alive) holds after parsing. Half of statements retain less.",
         "retained p90" => "90th-percentile retained memory per statement: the AST footprint nine in ten statements stay under.",
+        "batch ok%" => "Batch parse rate: of 200 random 128-statement scripts built from statements this parser accepts individually, the share it reparsed to the exact statement count. Below 100% means it mishandles a statement boundary (for example swallowing the terminator) in some multi-statement scripts. Higher is better.",
+        "batch ns/stmt" => "Per-statement parse time inside a multi-statement script, averaged over the batches that parsed correctly. Compare with mean ns to see what bulk parsing amortizes. Blank only when no sampled batch parsed correctly.",
         _ => return None,
     })
 }
@@ -2029,6 +2137,7 @@ fn perf_table(d: &DialectData) -> Element {
         "median ns",
         "p90 ns",
         "mean ns",
+        "batch ok%",
         "batch ns/stmt",
         "missed %",
         "RT %",
@@ -2046,6 +2155,7 @@ fn perf_table(d: &DialectData) -> Element {
                 Cell::ns(Some(p.median)),
                 Cell::ns(Some(p.p90)),
                 Cell::ns(Some(p.mean)),
+                Cell::pct(batch_of(d, &p.parser).and_then(|x| x.accuracy_pct)),
                 Cell::ns(batch_of(d, &p.parser).and_then(|x| x.ns_per_stmt)),
                 Cell::with(missed_pct(d, p), missed_val(d, p)),
                 Cell::pct(p.roundtrip_pct),
@@ -2059,7 +2169,7 @@ fn perf_table(d: &DialectData) -> Element {
                 "Speed"
             }
             p { class: "table-cap",
-                "One row per parser. \"median ns\" and \"p90 ns\" are per-statement parse times in nanoseconds (lower is faster). \"mean ns\" is the per-statement average, and \"batch ns/stmt\" is the whole accepted set parsed as one script divided by its statement count, so comparing those two adjacent averages shows what bulk parsing saves or costs (batch blank where not measured or no batch entry point). \"missed %\" is the share of expected statements not accepted, \"RT %\" the round-trip rate, the share of accepted statements that re-parse unchanged."
+                "One row per parser. \"median ns\" and \"p90 ns\" are per-statement parse times in nanoseconds (lower is faster). \"mean ns\" is the per-statement average. \"batch ok%\" is the share of 200 random 128-statement scripts (built from statements the parser accepts) that reparse to the exact count, and \"batch ns/stmt\" is the per-statement time over the scripts that did, so comparing it to the mean shows what bulk parsing saves or costs. \"missed %\" is the share of expected statements not accepted, \"RT %\" the round-trip rate, the share of accepted statements that re-parse unchanged."
             }
             SortTable {
                 caption: format!("Per-parser parse time in nanoseconds for {}", d.display_name),
@@ -2134,7 +2244,7 @@ fn memory_table(d: &DialectData) -> Element {
                 "Memory"
             }
             p { class: "table-cap",
-                "Bytes per statement, measured with a counting allocator. \"peak\" is the high-water mark of live memory during the parse, \"retained\" what the produced AST keeps alive afterwards. \"peak mean\" and \"retained mean\" are the per-statement averages, and \"batch peak/stmt\" and \"batch ret/stmt\" are the same over the whole accepted set parsed as one script divided by its statement count, so compare each batch column to the adjacent mean (batch retained is higher when every statement's AST is held at once, blank where not measured or no batch entry point). The libpg_query bindings are omitted (they parse in C, invisible to the Rust allocator)."
+                "Bytes per statement, measured with a counting allocator. \"peak\" is the high-water mark of live memory during the parse, \"retained\" what the produced AST keeps alive afterwards. \"peak mean\" and \"retained mean\" are the per-statement averages, and \"batch peak/stmt\" and \"batch ret/stmt\" are the same over random 128-statement scripts, averaged over the ones that reparsed correctly, so compare each batch column to the adjacent mean (batch retained is higher when every statement's AST is held at once). The libpg_query bindings are omitted (they parse in C, invisible to the Rust allocator)."
             }
             div { class: "charts",
                 {chart_figure(&format!("chart-{}-mempeak-ecdf", d.dir_name), &peak_ecdf, &format!("Empirical CDF of peak memory for {}, one curve per parser.", d.display_name), "Peak live memory per parse, one curve per parser. Further left is leaner (log scale).", &format!("{}-peak-memory-ecdf", d.dir_name))}
@@ -2156,7 +2266,7 @@ fn memory_table(d: &DialectData) -> Element {
 fn correctness_table(d: &DialectData) -> Element {
     let reference = d.has_reference;
     let columns: Vec<String> = if reference {
-        ["recall", "false pos", "round-trip", "fidelity"]
+        ["recall", "false pos", "round-trip"]
             .iter()
             .map(ToString::to_string)
             .collect()
@@ -2177,7 +2287,6 @@ fn correctness_table(d: &DialectData) -> Element {
                     Cell::pct(m.recall_pct),
                     Cell::pct(m.false_positive_pct),
                     Cell::pct(m.roundtrip_pct),
-                    Cell::pct(m.fidelity_pct),
                 ]
             } else {
                 vec![Cell::pct(m.accept_pct), Cell::pct(m.roundtrip_pct)]
@@ -2192,7 +2301,7 @@ fn correctness_table(d: &DialectData) -> Element {
             }
             p { class: "table-cap",
                 if reference {
-                    "One row per parser, graded against this dialect's reference parser. \"recall\" is the share of reference-valid statements accepted (agreement with the reference on valid SQL, not whether the parser runs). \"false pos\" is the share of invalid statements wrongly accepted (lower is better). \"round-trip\" is the share of accepted statements that re-parse unchanged, \"fidelity\" the share whose printed form matches the original."
+                    "One row per parser, graded against this dialect's reference parser. \"recall\" is the share of reference-valid statements accepted (agreement with the reference on valid SQL, not whether the parser runs). \"false pos\" is the share of invalid statements wrongly accepted (lower is better). \"round-trip\" is the share of accepted statements that re-parse unchanged."
                 } else {
                     "One row per parser. With no reference parser here, every statement counts as expected-valid. \"accept\" is the share of the corpus accepted, \"round-trip\" the share of accepted statements that re-parse unchanged."
                 }
diff --git a/web/src/descriptions.rs b/web/src/descriptions.rs
index fedbdb2..9b44f0a 100644
--- a/web/src/descriptions.rs
+++ b/web/src/descriptions.rs
@@ -33,14 +33,14 @@ pub fn dialect_blurb(dir: &str) -> &'static str {
 #[must_use]
 pub fn parser_blurb(name: &str) -> &'static str {
     match name {
-        "sqlparser-rs" => "[sqlparser-rs](https://github.com/sqlparser-rs/sqlparser-rs) (crate [sqlparser](https://crates.io/crates/sqlparser)) is a pure-Rust, hand-written parser maintained under [Apache DataFusion](https://github.com/apache/datafusion). It models many dialects via a pluggable Dialect trait and prints its AST back to SQL, so it is graded for round-trip and fidelity. It is the most widely used SQL parser in Rust.",
+        "sqlparser-rs" => "[sqlparser-rs](https://github.com/sqlparser-rs/sqlparser-rs) (crate [sqlparser](https://crates.io/crates/sqlparser)) is a pure-Rust, hand-written parser maintained under [Apache DataFusion](https://github.com/apache/datafusion). It models many dialects via a pluggable Dialect trait and prints its AST back to SQL, so it is graded for round-trip. It is the most widely used SQL parser in Rust.",
         "pg_query.rs" => "[pg_query.rs](https://github.com/pganalyze/pg_query.rs) (crate [pg_query](https://crates.io/crates/pg_query)) wraps [libpg_query](https://github.com/pganalyze/libpg_query), the PostgreSQL server's own C parser, through Rust FFI. As the very code PostgreSQL runs, it is the reference for the PostgreSQL dialect, and it can deparse back to SQL. It models only PostgreSQL.",
         "pg_query (summary)" => "pg_query (summary) runs the same [libpg_query](https://github.com/pganalyze/libpg_query) parse as [pg_query.rs](https://github.com/pganalyze/pg_query.rs) (crate [pg_query](https://crates.io/crates/pg_query)) but returns a compact C-side summary instead of the full parse tree, which is much faster. It shows the parse-only throughput of the reference parser. It models only PostgreSQL.",
         "qusql-parse" => "[qusql-parse](https://crates.io/crates/qusql-parse) is a pure-Rust, zero-copy parser with dialect-aware options and ranked diagnostics rather than first-error failure. It runs here in PostgreSQL, MariaDB, and SQLite modes. Its coverage of complex SELECT statements is partial.",
-        "polyglot-sql" => "[polyglot-sql](https://github.com/tobilg/polyglot) (crate [polyglot-sql](https://crates.io/crates/polyglot-sql)) is a young pure-Rust parser and transpiler covering many dialects from one grammar. It regenerates its AST as SQL, so it is graded for round-trip and fidelity. Its per-call setup cost amortizes over large batches.",
+        "polyglot-sql" => "[polyglot-sql](https://github.com/tobilg/polyglot) (crate [polyglot-sql](https://crates.io/crates/polyglot-sql)) is a young pure-Rust parser and transpiler covering many dialects from one grammar. It regenerates its AST as SQL, so it is graded for round-trip. Its per-call setup cost amortizes over large batches.",
         "databend-common-ast" => "[databend-common-ast](https://crates.io/crates/databend-common-ast) is the SQL front end of [Databend](https://github.com/datafuselabs/databend), a Rust cloud data warehouse. It pairs a zero-copy tokenizer with a custom Pratt parser, offers PostgreSQL, MySQL, and Hive modes, and prints its AST back to SQL.",
-        "orql" => "[orql](https://codeberg.org/xitep/orql) is a pure-Rust Oracle SQL parser focused on SELECT statements, added at its author's request. It models only the Oracle dialect, so it appears on the Oracle results alone. It does not pretty-print, so round-trip and fidelity are n/a.",
-        "sqlglot-rust" => "[sqlglot-rust](https://crates.io/crates/sqlglot-rust) is a Rust parser and transpiler in the spirit of Python's [SQLGlot](https://github.com/tobymao/sqlglot), covering many dialects. It regenerates its AST as SQL, so it is graded for round-trip and fidelity across the dialects it models.",
+        "orql" => "[orql](https://codeberg.org/xitep/orql) is a pure-Rust Oracle SQL parser focused on SELECT statements, added at its author's request. It models only the Oracle dialect, so it appears on the Oracle results alone. It does not pretty-print, so round-trip is n/a.",
+        "sqlglot-rust" => "[sqlglot-rust](https://crates.io/crates/sqlglot-rust) is a Rust parser and transpiler in the spirit of Python's [SQLGlot](https://github.com/tobymao/sqlglot), covering many dialects. It regenerates its AST as SQL, so it is graded for round-trip across the dialects it models.",
         "sqlite3-parser" => "[sqlite3-parser](https://crates.io/crates/sqlite3-parser) (also known as [lemon-rs](https://github.com/gwenn/lemon-rs)) is a pure-Rust streaming lexer and LALR parser reimplementing SQLite's grammar. It models only SQLite and provides the SQLite reference. It can reprint statements, so it is graded for round-trip on SQLite.",
         "turso_parser" => "[turso_parser](https://crates.io/crates/turso_parser) is the SQL front end of [Turso](https://github.com/tursodatabase/turso), a from-scratch Rust rewrite of SQLite (formerly Limbo). It pairs a lemon-generated token table with a hand-written recursive-descent parser for SQLite's grammar, so unlike sqlite3-parser's LALR tables the parsing is hand-rolled. It models only SQLite and can reprint statements, so it is graded for round-trip on SQLite.",
         _ => "",
diff --git a/web/src/main.rs b/web/src/main.rs
index 901cdc3..7366c74 100644
--- a/web/src/main.rs
+++ b/web/src/main.rs
@@ -14,6 +14,7 @@ mod descriptions;
 mod dialect_meta;
 mod logos;
 mod metadata;
+mod score;
 
 use components::{DialectView, Overview, ParserView, Shell};
 
diff --git a/web/src/score.rs b/web/src/score.rs
new file mode 100644
index 0000000..001794d
--- /dev/null
+++ b/web/src/score.rs
@@ -0,0 +1,347 @@
+//! A single composite "overall score" per parser, on a 0 to 100 scale, that
+//! folds every dimension the benchmark measures into one correctness-first
+//! number, alongside the five sub-scores it is built from.
+//!
+//! Methodology (kept deliberately transparent: the sub-scores are always shown):
+//!
+//! - In-scope only. A parser is judged solely on the dialects it actually
+//!   models. It is never zeroed for a dialect it never claimed to support, and
+//!   breadth is not itself rewarded: a focused parser that masters its scope can
+//!   outrank a broad one that is mediocre everywhere.
+//! - Five sub-scores, each 0 to 100: correctness, robustness, speed, memory, and
+//!   project health. The overall is their weighted blend,
+//!   `0.45 correctness + 0.20 robustness + 0.15 health + 0.12 speed + 0.08 memory`.
+//!   Any sub-score that does not apply to a parser (for example memory for an FFI
+//!   binding, whose allocations are invisible to the Rust allocator) is dropped
+//!   and the remaining weights are renormalized, so nothing is penalized for a
+//!   dimension that cannot be measured.
+//! - Correctness, robustness, and health are absolute: they read straight off
+//!   the measured rates and the recorded project facts. Speed and memory are
+//!   relative to the field within each dialect (parse times and footprints span
+//!   orders of magnitude), so a parser is placed on a log scale between the best
+//!   and worst of its peers on each dialect, then averaged over its dialects.
+
+use crate::cadence::Cadence;
+use crate::data::{bundle, panic_totals, parser_depth, parser_features};
+use crate::metadata::{license_ok, maintained, parser_meta, Fuzz};
+use std::collections::BTreeMap;
+use std::sync::OnceLock;
+
+/// Weights of the five sub-scores in the overall blend; they sum to 1.0.
+/// Correctness leads, then robustness, project health, speed, and memory.
+const W_CORRECTNESS: f64 = 0.45;
+const W_ROBUSTNESS: f64 = 0.20;
+const W_HEALTH: f64 = 0.15;
+const W_SPEED: f64 = 0.12;
+const W_MEMORY: f64 = 0.08;
+
+/// One parser's composite score and the sub-scores behind it, each 0 to 100.
+/// A sub-score is `None` when the dimension does not apply to the parser.
+#[derive(Clone, Copy, PartialEq)]
+pub struct ParserScore {
+    /// Weighted blend of the available sub-scores, 0 to 100.
+    pub overall: f64,
+    pub correctness: Option<f64>,
+    pub robustness: Option<f64>,
+    pub speed: Option<f64>,
+    pub memory: Option<f64>,
+    pub health: Option<f64>,
+}
+
+/// The composite score for one parser by display name, if it can be scored.
+#[must_use]
+pub fn parser_score(name: &str) -> Option<&'static ParserScore> {
+    all_scores().get(name)
+}
+
+/// Every parser's score, computed once. Speed and memory need the whole field
+/// (they are relative within each dialect), so all parsers are scored together.
+#[must_use]
+pub fn all_scores() -> &'static BTreeMap<String, ParserScore> {
+    static CACHE: OnceLock<BTreeMap<String, ParserScore>> = OnceLock::new();
+    CACHE.get_or_init(compute_all)
+}
+
+fn compute_all() -> BTreeMap<String, ParserScore> {
+    let b = bundle();
+    let mut out = BTreeMap::new();
+    for parser in &b.parsers {
+        let correctness = correctness_score(parser);
+        let robustness = robustness_score(parser);
+        let speed = speed_score(parser);
+        let memory = memory_score(parser);
+        let health = health_score(parser);
+
+        // Weighted blend over the sub-scores that apply, weights renormalized.
+        let parts = [
+            (correctness, W_CORRECTNESS),
+            (robustness, W_ROBUSTNESS),
+            (health, W_HEALTH),
+            (speed, W_SPEED),
+            (memory, W_MEMORY),
+        ];
+        let mut sum = 0.0;
+        let mut wsum = 0.0;
+        for (v, w) in parts {
+            if let Some(v) = v {
+                sum += v * w;
+                wsum += w;
+            }
+        }
+        let overall = if wsum > 0.0 { sum / wsum } else { 0.0 };
+
+        out.insert(
+            parser.clone(),
+            ParserScore {
+                overall,
+                correctness,
+                robustness,
+                speed,
+                memory,
+                health,
+            },
+        );
+    }
+    out
+}
+
+/// Correctness, 0 to 100: per dialect, blend the measured rates (primary
+/// recall or acceptance, plus false-positive avoidance and round-trip where
+/// they exist), then average over the dialects the parser models.
+fn correctness_score(parser: &str) -> Option<f64> {
+    let mut per_dialect = Vec::new();
+    for d in &bundle().dialects {
+        let Some(m) = d.correctness.iter().find(|m| m.parser == parser) else {
+            continue;
+        };
+        // Primary signal: recall on reference dialects, acceptance elsewhere.
+        let Some(primary) = m.recall_pct.or(m.accept_pct) else {
+            continue;
+        };
+        let mut num = 0.5 * (primary / 100.0);
+        let mut den = 0.5;
+        if let Some(fp) = m.false_positive_pct {
+            num += 0.2 * (1.0 - fp / 100.0);
+            den += 0.2;
+        }
+        if let Some(rt) = m.roundtrip_pct {
+            num += 0.15 * (rt / 100.0);
+            den += 0.15;
+        }
+        per_dialect.push(num / den);
+    }
+    mean(&per_dialect).map(|v| v * 100.0)
+}
+
+/// Robustness, 0 to 100: how safely the parser behaves. Blends the observed
+/// panic rate on the real corpus (weighted most, it is behavior not a proxy),
+/// recursion-depth guarding, the unsafe surface, and static panic discipline.
+fn robustness_score(parser: &str) -> Option<f64> {
+    let mut num = 0.0;
+    let mut den = 0.0;
+
+    // Empirical panic safety: 1 minus the share of statements that panicked.
+    if let Some((panicked, attempted)) = panic_totals(parser) {
+        if attempted > 0 {
+            num += 0.40 * (1.0 - panicked as f64 / attempted as f64);
+            den += 0.40;
+        }
+    }
+
+    // Recursion depth: full credit when the parser never overflows the stack up
+    // to the probe ceiling, otherwise at most half credit, scaled by how deep it
+    // got before crashing (a crash at 5000 is far less alarming than one at 200).
+    if let Some(depth) = parser_depth(parser) {
+        let v = match depth.crash_depth {
+            None => 1.0,
+            Some(c) => 0.5 * (c as f64 / depth.ceil.max(1) as f64).min(1.0),
+        };
+        num += 0.25 * v;
+        den += 0.25;
+    }
+
+    if let Some(f) = parser_features(parser) {
+        // Unsafe surface: full credit when it forbids unsafe or has none, else
+        // linear decay in the count of unsafe blocks, fns, and impls (0 at 50).
+        let unsafe_total = f.counts.unsafe_total();
+        let unsafe_v = if f.forbids_unsafe || unsafe_total == 0 {
+            1.0
+        } else {
+            (1.0 - unsafe_total as f64 / 50.0).max(0.0)
+        };
+        num += 0.20 * unsafe_v;
+        den += 0.20;
+
+        // Static panic discipline: full credit when the crate bans any of the
+        // panicking lints (unwrap_used, panic, expect_used), otherwise linear decay
+        // in panic-prone constructs per thousand non-test lines (0 at 20 per kLOC).
+        let banned = f.lints.is_banned("unwrap_used")
+            || f.lints.is_banned("panic")
+            || f.lints.is_banned("expect_used");
+        let disc_v = if banned {
+            1.0
+        } else {
+            let prone = (f.counts.hard_panics() + f.counts.unwrap + f.counts.expect) as f64;
+            let per_kloc = prone / f.counts.code_loc.max(1) as f64 * 1000.0;
+            (1.0 - per_kloc / 20.0).max(0.0)
+        };
+        num += 0.15 * disc_v;
+        den += 0.15;
+    }
+
+    (den > 0.0).then(|| num / den * 100.0)
+}
+
+/// Speed, 0 to 100: the parser's median parse time ranked against the field
+/// within each dialect on a log scale, averaged over its dialects.
+fn speed_score(parser: &str) -> Option<f64> {
+    relative_score(parser, |perf| (perf.n_accepted > 0).then_some(perf.median))
+}
+
+/// Memory, 0 to 100: peak and retained per-statement footprints, each ranked
+/// against the field within each dialect on a log scale and averaged. `None`
+/// for FFI parsers, whose C-side allocations the Rust allocator never sees.
+fn memory_score(parser: &str) -> Option<f64> {
+    let b = bundle();
+    let mut per_dialect = Vec::new();
+    for d in &b.dialects {
+        // Skip dialects where this parser has no memory measurement.
+        let Some(mine) = d.memory.iter().find(|m| m.parser == parser) else {
+            continue;
+        };
+        let peak: Vec<f64> = d.memory.iter().map(|m| m.peak.median).collect();
+        let retained: Vec<f64> = d.memory.iter().map(|m| m.retained.median).collect();
+        let rp = relative_log(mine.peak.median, &peak);
+        let rr = relative_log(mine.retained.median, &retained);
+        match (rp, rr) {
+            (Some(a), Some(c)) => per_dialect.push((a + c) / 2.0),
+            (Some(a), None) | (None, Some(a)) => per_dialect.push(a),
+            (None, None) => {}
+        }
+    }
+    mean(&per_dialect).map(|v| v * 100.0)
+}
+
+/// Project health, 0 to 100: an unweighted average of engineering-practice
+/// indicators (maintenance, tests, benches, fuzzing, sanitizers, supply-chain
+/// and mutation-testing gates, licensing, crates.io publication, release cadence,
+/// contributor depth). Excludes popularity proxies like stars and downloads.
+fn health_score(parser: &str) -> Option<f64> {
+    let m = parser_meta(parser)?;
+    let fuzz = match m.fuzz {
+        Fuzz::Yes => 1.0,
+        Fuzz::Upstream => 0.7,
+        Fuzz::No => 0.0,
+    };
+    let cadence = match m.cadence {
+        Cadence::Rolling | Cadence::Monthly => 1.0,
+        Cadence::Quarterly => 0.8,
+        Cadence::Yearly => 0.5,
+        Cadence::Irregular => 0.4,
+        Cadence::Multiyear => 0.3,
+        Cadence::Dormant => 0.0,
+    };
+    let indicators = [
+        f64::from(maintained(m.last_release)),
+        fuzz,
+        f64::from(m.tests),
+        f64::from(m.benches),
+        f64::from(license_ok(m.license)),
+        f64::from(m.crates_io),
+        f64::from(!m.sanitizers.is_empty()),
+        f64::from(m.cargo_audit),
+        f64::from(m.cargo_deny),
+        f64::from(m.cargo_mutants),
+        cadence,
+        // Bus-factor proxy: ten or more distinct contributors is full credit.
+        (f64::from(m.contributors) / 10.0).min(1.0),
+    ];
+    mean(&indicators).map(|v| v * 100.0)
+}
+
+/// Average a parser's per-dialect relative rank for a timing field (lower is
+/// better), over the dialects where it and at least one peer have the field.
+fn relative_score(parser: &str, pick: impl Fn(&viz::ParserPerf) -> Option<f64>) -> Option<f64> {
+    let b = bundle();
+    let mut per_dialect = Vec::new();
+    for d in &b.dialects {
+        let field: Vec<f64> = d.perf.iter().filter_map(&pick).collect();
+        let Some(mine) = d.perf.iter().find(|p| p.parser == parser).and_then(&pick) else {
+            continue;
+        };
+        if let Some(v) = relative_log(mine, &field) {
+            per_dialect.push(v);
+        }
+    }
+    mean(&per_dialect).map(|v| v * 100.0)
+}
+
+/// Position of `value` within `field` on a log scale, where the smallest value
+/// in the field scores 1.0 and the largest 0.0 (lower is better). Values are
+/// floored at 1.0 before the log. Returns `None` when the field has no spread
+/// (e.g. a single competitor), so it cannot skew the average with a free 1.0.
+fn relative_log(value: f64, field: &[f64]) -> Option<f64> {
+    let l = |x: f64| x.max(1.0).ln();
+    let lo = field.iter().copied().fold(f64::INFINITY, f64::min);
+    let hi = field.iter().copied().fold(f64::NEG_INFINITY, f64::max);
+    // Compare after flooring: a field entirely at or below 1.0 has zero log
+    // spread, and dividing by it would yield NaN.
+    if !(lo.is_finite() && hi.is_finite()) || l(hi) <= l(lo) { return None; }
+    Some(((l(hi) - l(value)) / (l(hi) - l(lo))).clamp(0.0, 1.0))
+}
+
+/// Mean of a slice, or `None` when empty.
+fn mean(xs: &[f64]) -> Option<f64> {
+    (!xs.is_empty()).then(|| xs.iter().sum::<f64>() / xs.len() as f64)
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn every_parser_scores_in_range() {
+        for (name, s) in all_scores() {
+            assert!(
+                (0.0..=100.0).contains(&s.overall),
+                "{name} overall {} out of range",
+                s.overall
+            );
+            for v in [s.correctness, s.robustness, s.speed, s.memory, s.health]
+                .into_iter()
+                .flatten()
+            {
+                assert!(
+                    (0.0..=100.0).contains(&v),
+                    "{name} sub-score {v} out of range"
+                );
+            }
+            // Correctness and health apply to every benchmarked parser.
+            assert!(s.correctness.is_some(), "{name} has no correctness score");
+            assert!(s.health.is_some(), "{name} has no health score");
+        }
+    }
+
+    #[test]
+    fn print_leaderboard() {
+        let mut rows: Vec<_> = all_scores().iter().collect();
+        rows.sort_by(|a, b| b.1.overall.total_cmp(&a.1.overall));
+        let f = |o: Option<f64>| o.map_or_else(|| "   n/a".to_string(), |v| format!("{v:6.1}"));
+        println!(
+            "\n{:<22} {:>7} {:>6} {:>6} {:>6} {:>6} {:>6}",
+            "parser", "overall", "corr", "robust", "speed", "mem", "health"
+        );
+        for (name, s) in rows {
+            println!(
+                "{:<22} {:>7.1} {} {} {} {} {}",
+                name,
+                s.overall,
+                f(s.correctness),
+                f(s.robustness),
+                f(s.speed),
+                f(s.memory),
+                f(s.health)
+            );
+        }
+    }
+}
diff --git a/web/static/failures/clickhouse__polyglot_sql.tsv.zst b/web/static/failures/clickhouse__polyglot_sql.tsv.zst
index 608ee18..c1a6cd1 100644
Binary files a/web/static/failures/clickhouse__polyglot_sql.tsv.zst and b/web/static/failures/clickhouse__polyglot_sql.tsv.zst differ
diff --git a/web/static/failures/clickhouse__sqlglot_rust.tsv.zst b/web/static/failures/clickhouse__sqlglot_rust.tsv.zst
index db85330..e2b4b46 100644
Binary files a/web/static/failures/clickhouse__sqlglot_rust.tsv.zst and b/web/static/failures/clickhouse__sqlglot_rust.tsv.zst differ
diff --git a/web/static/failures/hive__sqlglot_rust.tsv.zst b/web/static/failures/hive__sqlglot_rust.tsv.zst
index 50560fd..8b78ffd 100644
Binary files a/web/static/failures/hive__sqlglot_rust.tsv.zst and b/web/static/failures/hive__sqlglot_rust.tsv.zst differ
diff --git a/web/static/failures/multi__polyglot_sql.tsv.zst b/web/static/failures/multi__polyglot_sql.tsv.zst
index e5727a2..2fcf40d 100644
Binary files a/web/static/failures/multi__polyglot_sql.tsv.zst and b/web/static/failures/multi__polyglot_sql.tsv.zst differ
diff --git a/web/static/failures/multi__sqlparser_rs.tsv.zst b/web/static/failures/multi__sqlparser_rs.tsv.zst
index b347e5a..2623839 100644
Binary files a/web/static/failures/multi__sqlparser_rs.tsv.zst and b/web/static/failures/multi__sqlparser_rs.tsv.zst differ
diff --git a/web/static/failures/oracle__orql.tsv.zst b/web/static/failures/oracle__orql.tsv.zst
index 7098179..52f48e9 100644
Binary files a/web/static/failures/oracle__orql.tsv.zst and b/web/static/failures/oracle__orql.tsv.zst differ
diff --git a/web/static/failures/oracle__polyglot_sql.tsv.zst b/web/static/failures/oracle__polyglot_sql.tsv.zst
index 3026bef..9cb0972 100644
Binary files a/web/static/failures/oracle__polyglot_sql.tsv.zst and b/web/static/failures/oracle__polyglot_sql.tsv.zst differ
diff --git a/web/static/failures/oracle__sqlglot_rust.tsv.zst b/web/static/failures/oracle__sqlglot_rust.tsv.zst
index fe720af..1a6fcbc 100644
Binary files a/web/static/failures/oracle__sqlglot_rust.tsv.zst and b/web/static/failures/oracle__sqlglot_rust.tsv.zst differ
diff --git a/web/static/failures/oracle__sqlparser_rs.tsv.zst b/web/static/failures/oracle__sqlparser_rs.tsv.zst
index 133f105..1c97162 100644
Binary files a/web/static/failures/oracle__sqlparser_rs.tsv.zst and b/web/static/failures/oracle__sqlparser_rs.tsv.zst differ
diff --git a/web/static/failures/postgresql__databend_common_ast.tsv.zst b/web/static/failures/postgresql__databend_common_ast.tsv.zst
index 4e60086..a8c0fdc 100644
Binary files a/web/static/failures/postgresql__databend_common_ast.tsv.zst and b/web/static/failures/postgresql__databend_common_ast.tsv.zst differ
diff --git a/web/static/failures/postgresql__pg_query__summary_.tsv.zst b/web/static/failures/postgresql__pg_query__summary_.tsv.zst
index b13ae2d..2e09d1e 100644
Binary files a/web/static/failures/postgresql__pg_query__summary_.tsv.zst and b/web/static/failures/postgresql__pg_query__summary_.tsv.zst differ
diff --git a/web/static/failures/postgresql__pg_query_rs.tsv.zst b/web/static/failures/postgresql__pg_query_rs.tsv.zst
index b13ae2d..2e09d1e 100644
Binary files a/web/static/failures/postgresql__pg_query_rs.tsv.zst and b/web/static/failures/postgresql__pg_query_rs.tsv.zst differ
diff --git a/web/static/failures/postgresql__polyglot_sql.tsv.zst b/web/static/failures/postgresql__polyglot_sql.tsv.zst
index 35c4875..dc41d33 100644
Binary files a/web/static/failures/postgresql__polyglot_sql.tsv.zst and b/web/static/failures/postgresql__polyglot_sql.tsv.zst differ
diff --git a/web/static/failures/postgresql__qusql_parse.tsv.zst b/web/static/failures/postgresql__qusql_parse.tsv.zst
index 6681fbb..7f2f591 100644
Binary files a/web/static/failures/postgresql__qusql_parse.tsv.zst and b/web/static/failures/postgresql__qusql_parse.tsv.zst differ
diff --git a/web/static/failures/postgresql__sqlglot_rust.tsv.zst b/web/static/failures/postgresql__sqlglot_rust.tsv.zst
index 8050096..d6cb5b4 100644
Binary files a/web/static/failures/postgresql__sqlglot_rust.tsv.zst and b/web/static/failures/postgresql__sqlglot_rust.tsv.zst differ
diff --git a/web/static/failures/postgresql__sqlparser_rs.tsv.zst b/web/static/failures/postgresql__sqlparser_rs.tsv.zst
index ec37e0f..e956e2b 100644
Binary files a/web/static/failures/postgresql__sqlparser_rs.tsv.zst and b/web/static/failures/postgresql__sqlparser_rs.tsv.zst differ
diff --git a/web/static/failures/redshift__polyglot_sql.tsv.zst b/web/static/failures/redshift__polyglot_sql.tsv.zst
index b7dc2ea..72a3959 100644
Binary files a/web/static/failures/redshift__polyglot_sql.tsv.zst and b/web/static/failures/redshift__polyglot_sql.tsv.zst differ
diff --git a/web/static/failures/redshift__sqlparser_rs.tsv.zst b/web/static/failures/redshift__sqlparser_rs.tsv.zst
index 9da85de..3bd0d92 100644
Binary files a/web/static/failures/redshift__sqlparser_rs.tsv.zst and b/web/static/failures/redshift__sqlparser_rs.tsv.zst differ
diff --git a/web/static/failures/spark_sql__polyglot_sql.tsv.zst b/web/static/failures/spark_sql__polyglot_sql.tsv.zst
index f4f1ff5..0966a60 100644
Binary files a/web/static/failures/spark_sql__polyglot_sql.tsv.zst and b/web/static/failures/spark_sql__polyglot_sql.tsv.zst differ
diff --git a/web/static/failures/spark_sql__sqlglot_rust.tsv.zst b/web/static/failures/spark_sql__sqlglot_rust.tsv.zst
index 1a1ec19..2b4748e 100644
Binary files a/web/static/failures/spark_sql__sqlglot_rust.tsv.zst and b/web/static/failures/spark_sql__sqlglot_rust.tsv.zst differ
diff --git a/web/static/failures/spark_sql__sqlparser_rs.tsv.zst b/web/static/failures/spark_sql__sqlparser_rs.tsv.zst
index 76f014f..0899370 100644
Binary files a/web/static/failures/spark_sql__sqlparser_rs.tsv.zst and b/web/static/failures/spark_sql__sqlparser_rs.tsv.zst differ
diff --git a/web/static/failures/sqlite__polyglot_sql.tsv.zst b/web/static/failures/sqlite__polyglot_sql.tsv.zst
index 79e0009..4b434a1 100644
Binary files a/web/static/failures/sqlite__polyglot_sql.tsv.zst and b/web/static/failures/sqlite__polyglot_sql.tsv.zst differ
diff --git a/web/static/failures/sqlite__qusql_parse.tsv.zst b/web/static/failures/sqlite__qusql_parse.tsv.zst
index ab1c619..96664a4 100644
Binary files a/web/static/failures/sqlite__qusql_parse.tsv.zst and b/web/static/failures/sqlite__qusql_parse.tsv.zst differ
diff --git a/web/static/failures/sqlite__sqlglot_rust.tsv.zst b/web/static/failures/sqlite__sqlglot_rust.tsv.zst
index 514a5ca..c814825 100644
Binary files a/web/static/failures/sqlite__sqlglot_rust.tsv.zst and b/web/static/failures/sqlite__sqlglot_rust.tsv.zst differ
diff --git a/web/static/failures/sqlite__sqlite3_parser.tsv.zst b/web/static/failures/sqlite__sqlite3_parser.tsv.zst
index ac3a0f0..66de257 100644
Binary files a/web/static/failures/sqlite__sqlite3_parser.tsv.zst and b/web/static/failures/sqlite__sqlite3_parser.tsv.zst differ
diff --git a/web/static/failures/sqlite__sqlparser_rs.tsv.zst b/web/static/failures/sqlite__sqlparser_rs.tsv.zst
index 4fae0d7..bde381a 100644
Binary files a/web/static/failures/sqlite__sqlparser_rs.tsv.zst and b/web/static/failures/sqlite__sqlparser_rs.tsv.zst differ
diff --git a/web/static/failures/sqlite__turso_parser.tsv.zst b/web/static/failures/sqlite__turso_parser.tsv.zst
index a32b62f..9522cdc 100644
Binary files a/web/static/failures/sqlite__turso_parser.tsv.zst and b/web/static/failures/sqlite__turso_parser.tsv.zst differ
diff --git a/web/static/failures/tsql__polyglot_sql.tsv.zst b/web/static/failures/tsql__polyglot_sql.tsv.zst
index a2e8fd5..2f97fe6 100644
Binary files a/web/static/failures/tsql__polyglot_sql.tsv.zst and b/web/static/failures/tsql__polyglot_sql.tsv.zst differ
diff --git a/web/static/failures/tsql__sqlglot_rust.tsv.zst b/web/static/failures/tsql__sqlglot_rust.tsv.zst
index 848859f..bab4acb 100644
Binary files a/web/static/failures/tsql__sqlglot_rust.tsv.zst and b/web/static/failures/tsql__sqlglot_rust.tsv.zst differ
diff --git a/web/static/failures/tsql__sqlparser_rs.tsv.zst b/web/static/failures/tsql__sqlparser_rs.tsv.zst
index 274b024..4ecac18 100644
Binary files a/web/static/failures/tsql__sqlparser_rs.tsv.zst and b/web/static/failures/tsql__sqlparser_rs.tsv.zst differ