
bug(cli): --threshold compares mean score instead of per-test score #882

@christso

Description


Problem

--threshold compares the mean score against the threshold, but the RESULT: line uses per-test pass/fail (score >= 0.8). The two checks can disagree, which produces contradictory output and exit codes:

RESULT: FAIL  (28/31 passed, mean score: 0.927)
Suite score: 0.93 (threshold: 0.80) — PASS     ← exit code 0

The RESULT line says FAIL, but the exit code is 0, so CI treats the run as passing. Users expect --threshold 0.8 to mean "each test must score >= 0.8", matching the per-test requirement.
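Why the two checks can disagree: for scores s_1, ..., s_n and threshold t, requiring every test to reach t is strictly stronger than requiring the mean to reach it. The counterexample values below are illustrative, not taken from the CI run.

```latex
\left(\forall i:\ s_i \ge t\right) \implies \frac{1}{n}\sum_{i=1}^{n} s_i \ge t,
\qquad\text{but not conversely: } n = 2,\ (s_1, s_2) = (1.0,\ 0.7),\ t = 0.8
\ \Rightarrow\ \tfrac{1}{2}(1.0 + 0.7) = 0.85 \ge t \ \text{while}\ s_2 < t.
```

So a mean-based gate can pass a suite that the per-test RESULT line fails, which is exactly the 28/31 vs. 0.927 situation shown in the output.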

Observed in

WiseTechGlobal/sdd CI run: 28/31 passed, mean score 0.927, --threshold 0.8

Root cause

  • formatEvaluationSummary() decides per-test pass/fail against a hardcoded bar (score >= 0.8)
  • formatThresholdSummary() compares the mean score against --threshold
  • The exit code follows the threshold check (mean-based), not the RESULT verdict (per-test)
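A minimal sketch of the pre-fix control flow as described under Root cause, assuming the RESULT verdict is FAIL whenever any test misses the hardcoded 0.8 bar (consistent with the 28/31 output above). preFixOutcome and its fields are hypothetical stand-ins, not the CLI's actual functions; the point is that the verdict and the exit code come from two independent checks.

```typescript
// Hypothetical reconstruction of the pre-fix logic; not the project's code.
// Assumption: the per-test bar is a hardcoded constant, as the issue states.
const HARDCODED_PASS_SCORE = 0.8;

interface PreFixOutcome {
  passed: number;
  total: number;
  mean: number;
  verdict: "PASS" | "FAIL"; // what the RESULT line reports (per-test)
  thresholdPass: boolean;   // what the threshold line reports (mean-based)
  exitCode: number;         // follows thresholdPass, ignoring verdict
}

function preFixOutcome(scores: number[], threshold: number): PreFixOutcome {
  // Per-test check: counts tests at or above the hardcoded bar.
  const passed = scores.filter((s) => s >= HARDCODED_PASS_SCORE).length;
  const total = scores.length;
  const mean = total === 0 ? 0 : scores.reduce((sum, s) => sum + s, 0) / total;
  const verdict = passed === total ? "PASS" : "FAIL";
  // Mean check: the only input to the exit code before the fix.
  const thresholdPass = mean >= threshold;
  return { passed, total, mean, verdict, thresholdPass, exitCode: thresholdPass ? 0 : 1 };
}

// Illustrative scores only: two tests clear 0.8, one does not.
const outcome = preFixOutcome([1.0, 1.0, 0.7], 0.8);
console.log(outcome.verdict, outcome.exitCode); // FAIL 0
```

The mean here is 0.9, so the threshold check passes and the process would exit 0 even though the per-test verdict is FAIL.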

Fix

PR #885: --threshold now overrides the per-test score requirement:

  • calculateEvaluationSummary() recomputes passed/failed using the threshold
  • RESULT line shows the threshold: 28/31 scored >= 0.8
  • Exit code matches RESULT verdict
  • Removed the separate formatThresholdSummary(), leaving one unified output line
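A minimal sketch of the fixed flow, assuming --threshold simply replaces the hardcoded bar and that both the RESULT verdict and the exit code are derived from the same recomputed counts. summarize, resultLine and exitCodeFor are hypothetical names, and the exact RESULT wording is an assumption; the real calculateEvaluationSummary() in PR #885 may differ in shape and output format.

```typescript
// Hypothetical sketch of the fix; names and output format are assumptions.
interface EvaluationSummary {
  total: number;
  passed: number;
  failed: number;
  mean: number;
  threshold: number;
}

// Recompute pass/fail using the user's threshold instead of a hardcoded 0.8.
function summarize(scores: number[], threshold: number): EvaluationSummary {
  const passed = scores.filter((s) => s >= threshold).length;
  const mean = scores.length === 0 ? 0 : scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return { total: scores.length, passed, failed: scores.length - passed, mean, threshold };
}

// One unified line that states the threshold it was judged against.
function resultLine(s: EvaluationSummary): string {
  const verdict = s.failed === 0 ? "PASS" : "FAIL";
  return `RESULT: ${verdict}  (${s.passed}/${s.total} scored >= ${s.threshold}, mean score: ${s.mean.toFixed(3)})`;
}

// The exit code reads the same counts as the RESULT line, so they cannot disagree.
function exitCodeFor(s: EvaluationSummary): number {
  return s.failed === 0 ? 0 : 1;
}

const summary = summarize([1.0, 1.0, 0.7], 0.8);
console.log(resultLine(summary)); // RESULT: FAIL  (2/3 scored >= 0.8, mean score: 0.900)
console.log(exitCodeFor(summary)); // 1
```

Deriving both outputs from one summary object is what removes the contradiction. Note that in this sketch an empty suite reports PASS with exit code 0; how the real CLI treats zero tests is not stated in the issue.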

Metadata


Assignees

No one assigned

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
