bug(cli): --threshold compares mean score instead of per-test score #882

## Problem

`--threshold` compares the **mean score** against the threshold, while the `RESULT:` line uses per-test pass/fail (score >= 0.8). The two checks can disagree, which produces contradictory output and exit codes:

```
RESULT: FAIL  (28/31 passed, mean score: 0.927)
Suite score: 0.93 (threshold: 0.80) — PASS     ← exit code 0
```

The output says FAIL, but the exit code is 0: the mean (0.927) clears the threshold even though 3 of the 31 tests scored below 0.8. Users expect `--threshold 0.8` to mean "each test must score >= 0.8", matching the per-test requirement.

### Observed in

[WiseTechGlobal/sdd CI run](https://github.com/WiseTechGlobal/sdd/actions/runs/23787163785/job/69314553318) — 28/31 passed, mean 0.927, `--threshold 0.8`

### Root cause

- `formatEvaluationSummary()` — per-test pass/fail (score >= hardcoded 0.8)
- `formatThresholdSummary()` — mean score comparison
- Exit code follows threshold (mean-based), not RESULT (per-test)
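
A minimal sketch of how the two checks can disagree on the same scores. The function names, the score values, and the assumption that `RESULT:` is FAIL whenever any test falls below the bar are all hypothetical, chosen for illustration; this is not the project's actual code.

```typescript
// Hypothetical reconstruction of the two independent checks.
// Names, shapes, and sample scores are assumptions, not the real implementation.
const HARDCODED_PASS_SCORE = 0.8;

// Per-test check (what the RESULT line reports): every test must clear the bar.
function perTestVerdict(scores: number[]): { passed: number; total: number; ok: boolean } {
  const passed = scores.filter((s) => s >= HARDCODED_PASS_SCORE).length;
  return { passed, total: scores.length, ok: passed === scores.length };
}

// Mean-based check (what --threshold used, and what drove the exit code).
function meanVerdict(scores: number[], threshold: number): { mean: number; ok: boolean } {
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return { mean, ok: mean >= threshold };
}

// Illustrative scores: one low test is masked by three high ones.
const scores = [1.0, 1.0, 0.95, 0.5];
const result = perTestVerdict(scores); // 3 of 4 pass, so RESULT would be FAIL
const suite = meanVerdict(scores, 0.8); // mean is above 0.8, so threshold check passes
const exitCode = suite.ok ? 0 : 1; // exit code follows the mean-based check

console.log(`RESULT ok: ${result.ok}, threshold ok: ${suite.ok}, exit code: ${exitCode}`);
```

With these sample scores the per-test verdict fails while the mean-based verdict passes, which is the same contradiction seen in the CI run above.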

## Fix

PR #885 — `--threshold` now overrides the per-test score requirement:

- `calculateEvaluationSummary()` recomputes passed/failed using the threshold
- RESULT line shows the threshold: `28/31 scored >= 0.8`
- Exit code matches RESULT verdict
- Removed separate `formatThresholdSummary()` — one unified output line
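
The behaviour described above could look roughly like the following sketch, where a single summary drives both the RESULT line and the exit code. `summarizeWithThreshold`, `formatResultLine`, the `EvaluationSummary` shape, and the exact line format are assumptions for illustration; see PR #885 for the actual change.

```typescript
// Hypothetical sketch: one summary object is the single source of truth
// for the printed verdict and the process exit code.
interface EvaluationSummary {
  passed: number;
  failed: number;
  total: number;
  mean: number;
  threshold: number;
  verdict: "PASS" | "FAIL";
  exitCode: 0 | 1;
}

// Assumption: when --threshold is not given, the per-test bar defaults to 0.8.
function summarizeWithThreshold(scores: number[], threshold = 0.8): EvaluationSummary {
  const total = scores.length;
  const passed = scores.filter((s) => s >= threshold).length;
  const mean = total > 0 ? scores.reduce((sum, s) => sum + s, 0) / total : 0;
  const verdict = passed === total ? "PASS" : "FAIL";
  return {
    passed,
    failed: total - passed,
    total,
    mean,
    threshold,
    verdict,
    exitCode: verdict === "PASS" ? 0 : 1,
  };
}

// The RESULT line states the threshold it was judged against.
function formatResultLine(s: EvaluationSummary): string {
  return `RESULT: ${s.verdict}  (${s.passed}/${s.total} scored >= ${s.threshold}, mean score: ${s.mean.toFixed(3)})`;
}

const summary = summarizeWithThreshold([1.0, 1.0, 0.95, 0.5], 0.8);
const line = formatResultLine(summary);
console.log(line);
```

Because the exit code is derived from the same `verdict` field that is printed, the two cannot diverge in this sketch.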
