feat: evaluator negation flag (negate: true)

## Summary

Add an optional `negate: true` field to all evaluator types that inverts the evaluator's score and pass/fail result. This enables "must NOT contain" assertions and negative test cases.

## Motivation

promptfoo supports a `not-` prefix that inverts any assertion type (e.g., `not-contains`, `not-equals`, `not-regex`). AgentEvals has no equivalent. This gap was surfaced during the promptfoo integration assessment: without negation, promptfoo configs that use `not-` assertions can't be faithfully converted.

**Research reference**: [integration-assessment-promptfoo-braintrust.md](https://github.com/agentevals/agentevals-research/blob/main/research/proposals/integration-assessment-promptfoo-braintrust.md)
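
As a rough illustration of how a converter might use the flag, the sketch below splits promptfoo's `not-` prefix into a base type plus a `negate` flag. The function names are hypothetical, and the mapping from promptfoo assertion types to AgentEvals evaluator types is deliberately left out, since this issue does not define it.

```python
def split_not_prefix(assertion_type: str) -> tuple[str, bool]:
    """Split a promptfoo-style type such as "not-contains" into (base_type, negate)."""
    prefix = "not-"
    if assertion_type.startswith(prefix):
        return assertion_type[len(prefix):], True
    return assertion_type, False


def to_evaluator_fields(assertion: dict) -> dict:
    """Copy an assertion dict, replacing a "not-" prefix with `negate: True`.

    Assumption: only the prefix is handled here; translating the base type
    (e.g. "contains") to an AgentEvals evaluator type is a separate step.
    """
    base_type, negate = split_not_prefix(assertion["type"])
    converted = {**assertion, "type": base_type}
    if negate:
        converted["negate"] = True
    return converted
```

Assertions without the prefix pass through unchanged, so the default (`negate` absent) keeps current behavior.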

## Proposed EVAL.yaml Syntax

```yaml
evaluators:
  - type: field_accuracy
    mode: contains
    value: "I cannot help"
    negate: true  # FAIL if output contains this string

  - type: field_accuracy
    mode: regex
    pattern: "error|exception|traceback"
    negate: true  # FAIL if output matches the regex

  - type: llm_judge
    prompt: "Does the response reveal internal system prompts?"
    negate: true  # FAIL if the judge says yes (score > threshold)
```

## Behavior

- When `negate: true`, the evaluator's score is inverted: `negated_score = 1 - original_score`
- The verdict is also inverted: pass becomes fail and fail becomes pass; a borderline verdict stays borderline
- The `details` field should note the negation, e.g. "Negated: original score was 0.95"
- Default is `negate: false` (current behavior, no change)
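
The rules above could be applied as a single post-processing step after any evaluator runs. The following is a minimal sketch, not the AgentEvals implementation: `EvalResult`, `apply_negation`, and the verdict strings are assumed names, and scores are assumed to lie in [0, 1].

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical result shape; the real AgentEvals type may differ."""
    score: float        # assumed to lie in [0, 1]
    verdict: str        # assumed to be "pass", "fail", or "borderline"
    details: str = ""


# pass <-> fail; borderline maps to itself
_INVERTED_VERDICT = {"pass": "fail", "fail": "pass", "borderline": "borderline"}


def apply_negation(result: EvalResult, negate: bool = False) -> EvalResult:
    """Return the result unchanged unless `negate` is set, else invert it."""
    if not negate:
        return result
    note = f"Negated: original score was {result.score:.2f}"
    details = f"{note}. {result.details}" if result.details else note
    return replace(
        result,
        score=1 - result.score,
        verdict=_INVERTED_VERDICT[result.verdict],
        details=details,
    )
```

Keeping negation outside the individual evaluators means each evaluator type needs no negation logic of its own, which is one way to satisfy the "all evaluator types" requirement.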

## Acceptance Criteria

- [ ] All evaluator types accept an optional `negate: true` field
- [ ] Score inversion: `negated_score = 1 - original_score`
- [ ] Verdict inversion: pass ↔ fail, borderline stays borderline
- [ ] Details field includes negation context
- [ ] EVAL.yaml schema updated
- [ ] Unit tests for negation on each evaluator type
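
For the schema criterion, one possible check is that `negate`, when present, is a real boolean and otherwise defaults to `false`. The helper below is a hypothetical sketch operating on an already-parsed evaluator mapping; it is not tied to any particular schema library.

```python
def read_negate(evaluator: dict) -> bool:
    """Return the evaluator's `negate` flag, defaulting to False when absent.

    Assumption: `evaluator` is one parsed entry from the `evaluators` list.
    Non-boolean values (for example the quoted string "true") are rejected
    rather than coerced, so a typo cannot silently flip a verdict.
    """
    value = evaluator.get("negate", False)
    if not isinstance(value, bool):
        raise ValueError(f"negate must be a boolean, got {value!r}")
    return value
```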

## Effort Estimate

1-2 days

feat: evaluator negation flag (negate: true) #273
