Merged
4 changes: 2 additions & 2 deletions .github/workflows/daily-agentrx-trace-optimizer.lock.yml

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion .github/workflows/daily-agentrx-trace-optimizer.md
@@ -16,7 +16,7 @@ experiments:
description: "Test whether delegating trajectory-builder, artifacts-summarizer, and failure-pattern-classifier to small-model sub-agents improves recommendation quality vs. inline analysis by the main agent"
hypothesis: "H0: no change in issue quality or run success rate. H1: sub_agents variant yields higher evidence completeness score with equal or lower token cost"
metric: issue_evidence_completeness
- secondary_metrics: [run_success_rate, effective_tokens_total, run_duration_ms]
+ secondary_metrics: [run_success_rate, ai_credits_total, run_duration_ms]
guardrail_metrics:
- name: empty_output_rate
threshold: "<=0.10"
4 changes: 2 additions & 2 deletions .github/workflows/daily-cache-strategy-analyzer.lock.yml

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions .github/workflows/daily-cache-strategy-analyzer.md
@@ -21,8 +21,8 @@ experiments:
model_size:
variants: [gpt-5.4, gpt-5-codex]
description: "Compares codex-compatible models for cache issue detection quality and efficiency."
- hypothesis: "H0: no change in issue creation rate or run success rate. H1: gpt-5-codex reduces effective tokens while keeping run success rate >=0.90."
- metric: effective_tokens_total
+ hypothesis: "H0: no change in issue creation rate or run success rate. H1: gpt-5-codex reduces AI Credits while keeping run success rate >=0.90."
+ metric: ai_credits_total
secondary_metrics: [run_success_rate, run_duration_ms]
guardrail_metrics:
- name: run_success_rate
4 changes: 2 additions & 2 deletions .github/workflows/daily-caveman-optimizer.lock.yml

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion .github/workflows/daily-caveman-optimizer.md
@@ -23,7 +23,7 @@ experiments:
variants: [claude-sonnet-4.6, claude-haiku-4.5]
description: "Tests whether Claude Haiku produces equivalent instruction conciseness improvements at lower token cost versus Claude Sonnet."
hypothesis: "H0: no change in PR creation rate or run success rate. H1: Claude Haiku reduces AI credit usage >=30% with equivalent run success rate (>=0.90)."
- metric: effective_tokens_total
+ metric: ai_credits_total
secondary_metrics: [run_success_rate, run_duration_ms]
guardrail_metrics:
- name: run_success_rate
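
The `guardrail_metrics` entries in these hunks pair a metric name with a threshold string such as `"<=0.10"`, and the hypotheses state bounds such as `>=0.90`. This diff does not show how the workflow tooling evaluates those strings. The following is only a minimal sketch of one way such a check could work, assuming a threshold is a comparison operator followed by a decimal number; the function name `guardrail_ok` and the accepted operator set are hypothetical and not taken from the repository.

```python
import operator
import re

# Comparison operators a threshold string may start with (assumed grammar,
# not confirmed by the diff above).
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
}

# Operator, optional whitespace, then a non-negative decimal number.
_THRESHOLD_RE = re.compile(r"^\s*(<=|>=|==|<|>)\s*([0-9]*\.?[0-9]+)\s*$")


def guardrail_ok(observed: float, threshold: str) -> bool:
    """Return True if `observed` satisfies a threshold string like "<=0.10"."""
    match = _THRESHOLD_RE.match(threshold)
    if match is None:
        raise ValueError(f"unrecognized threshold: {threshold!r}")
    op, limit = match.group(1), float(match.group(2))
    return _OPS[op](observed, limit)
```

Under these assumptions, `guardrail_ok(0.05, "<=0.10")` would pass an `empty_output_rate` of 5%, while a malformed threshold raises `ValueError` instead of silently passing.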