AI diff evaluation and merge gate for code written by AI coding agents.
PRForge is not a general coding chatbot. It is a review board that evaluates whether an AI-generated patch should be approved, revised, or blocked before merge.
Requirement + acceptance criteria + AI diff
-> Intent-Match Agent
-> Diff-Scope Agent
-> Fake Test Detector
-> Security Regression Gate
-> System Test Planner
-> GitHub PR Bot output
-> AI Coder Scoreboard
-> merge verdict
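The flow above can be pictured as a chain of stages that each read a shared context and contribute new fields. The following is a minimal sketch under stated assumptions, not PRForge's actual engine: the stage functions, the `expected_prefixes` input, the `flagged_files` field, and the risky-file patterns are all hypothetical, and only two of the stages are shown.

```python
from fnmatch import fnmatchcase
from typing import Callable

# A stage reads the shared context and returns the new fields it contributes.
Stage = Callable[[dict], dict]

# Hypothetical patterns for files that deserve extra scrutiny.
RISKY_PATTERNS = [".env", "*.pem", ".github/workflows/*"]


def diff_scope_stage(ctx: dict) -> dict:
    """Flag changed files that are risky or outside the expected paths."""
    flagged = [
        path
        for path in ctx["changed_files"]
        if any(fnmatchcase(path, pattern) for pattern in RISKY_PATTERNS)
        or not any(path.startswith(prefix) for prefix in ctx["expected_prefixes"])
    ]
    return {"diff_scope_risk": "high" if flagged else "low", "flagged_files": flagged}


def merge_judge_stage(ctx: dict) -> dict:
    """Turn earlier findings into a verdict (illustrative policy only)."""
    verdict = "request_changes" if ctx["diff_scope_risk"] == "high" else "approve"
    return {"merge_verdict": verdict}


def run_pipeline(ctx: dict, stages: list[Stage]) -> dict:
    """Run the stages in order, merging each stage's output into the context."""
    result = dict(ctx)
    for stage in stages:
        result.update(stage(result))
    return result
```

Because later stages see everything earlier stages produced, the verdict stage must come last in the list passed to `run_pipeline`.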
PRForge has completed the AI-code-evaluation MVP through the scoreboard layer.
- Phase 1: AI Diff Evaluation Core complete.
- Phase 2: Fake Test Detector complete.
- Phase 3: Security Regression Gate complete.
- Phase 4: System Test Planner complete.
- Phase 6: GitHub PR Bot / CI Gate output complete.
- Phase 7: AI Coder Scoreboard complete.
The recommended product route is still:
AI diff evaluation
-> fake test detection
-> security regression gate
-> real command evidence
-> GitHub CI gate
-> long-term AI coder scoreboard
Inputs:
- original requirement,
- acceptance criteria,
- AI-generated diff or patch,
- changed files,
- repo path,
- AI tool/source,
- requirement type.
Outputs:
- intent_match_score,
- diff_scope_risk,
- ai_code_score,
- merge_verdict,
- required_fixes,
- test_truth_score,
- fake_test_risks,
- missing_test_cases,
- required_test_fixes,
- security_score,
- security_findings,
- risk_level,
- safe_validation_plan,
- block_merge_reason,
- system_test_matrix,
- must_run_checks,
- missing_system_tests,
- release_risk,
- pr_review_comment,
- check_run_summary,
- inline_annotations,
- merge_gate_result,
- ai_coder_scoreboard.
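As an illustration of how the inputs might be carried through the system, the sketch below models them as a dataclass with a small completeness check. The field names follow the request fields used elsewhere in this README; the class name, the `problems` helper, and the default values are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationRequest:
    """Hypothetical container for the evaluation inputs."""

    original_requirement: str
    acceptance_criteria: list[str]
    ai_diff: str
    requirement_type: str
    changed_files: list[str] = field(default_factory=list)
    repo_path: str = "."              # assumed default
    ai_tool_source: str = "unknown"   # assumed default

    def problems(self) -> list[str]:
        """Return human-readable reasons the request cannot be evaluated."""
        issues = []
        if not self.original_requirement.strip():
            issues.append("original_requirement is empty")
        if not self.acceptance_criteria:
            issues.append("acceptance_criteria is empty")
        if not self.ai_diff.strip():
            issues.append("ai_diff is empty")
        return issues
```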
- Intent-Match Agent: checks whether the patch actually solves the request.
- Diff-Scope Agent: flags unrelated or risky file changes.
- Merge-Judge Agent: produces approve, request_changes, or block_merge.
- Evidence Board: stores structured findings used by the verdict.
- Test-Truth Agent: identifies tests without meaningful assertions.
- Coverage-Gap Agent: detects missing tests for new logic.
- Regression-Test Agent: flags happy-path-only, over-mocked, snapshot-heavy, or misleading tests.
- Security-Regression Agent: catches secrets, injection risks, unsafe files, logging leaks.
- Dependency-Risk Agent: flags dependency changes and package risk.
- Auth/Permission Agent: catches auth bypass and permission regressions.
- Safe-Validation Agent: outputs defensive validation steps only.
High or critical security regressions can directly block merge.
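To make the blocking rule concrete, here is a minimal sketch of a scan over the added lines of a unified diff, followed by a check that blocks on high or critical findings. The three regex rules and their severities are illustrative assumptions, far smaller than a real rule set, and the function names are hypothetical.

```python
import re

# (severity, title, pattern): illustrative rules only.
RULES = [
    ("critical", "hardcoded AWS access key", re.compile(r"AKIA[0-9A-Z]{16}")),
    ("high", "hardcoded credential",
     re.compile(r"(?i)(password|secret|api_key)\s*=\s*['\"][^'\"]+['\"]")),
    ("medium", "sensitive value written to logs",
     re.compile(r"(?i)\b(print|log\w*(\.\w+)?)\(.*(password|token|secret)")),
]


def scan_added_lines(diff: str) -> list[dict]:
    """Check only lines the patch adds, skipping the '+++' file header."""
    findings = []
    for lineno, line in enumerate(diff.splitlines(), start=1):
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for severity, title, pattern in RULES:
            if pattern.search(line):
                findings.append({"severity": severity, "title": title, "diff_line": lineno})
    return findings


def blocks_merge(findings: list[dict]) -> bool:
    """High or critical findings block the merge outright."""
    return any(f["severity"] in {"high", "critical"} for f in findings)
```

Scanning only added lines keeps the gate focused on regressions introduced by the patch rather than problems already present in the file.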
The System Test Planner generates merge-readiness checks:
- API contract test,
- E2E happy path,
- E2E negative path,
- database migration check,
- backward compatibility check,
- permission/auth regression test,
- performance smoke test,
- rollback readiness check,
- observability/logging check.
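One way to choose among the checks listed above is to key them off the requirement type and the changed file paths. The mapping below is an assumed rule set for illustration; the function name and the path substrings it looks for are hypothetical.

```python
def plan_system_tests(requirement_type: str, changed_files: list[str]) -> list[str]:
    """Pick merge-readiness checks from simple, assumed rules."""
    # Every change gets both end-to-end paths.
    checks = ["E2E happy path", "E2E negative path"]
    if requirement_type == "api":
        checks += ["API contract test", "backward compatibility check"]
    if any("migration" in path for path in changed_files):
        checks += ["database migration check", "rollback readiness check"]
    if any("auth" in path or "permission" in path for path in changed_files):
        checks.append("permission/auth regression test")
    return checks
```

For example, an API change that touches `backend/app/engine.py` would get the two end-to-end checks plus the API contract and backward compatibility checks.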
The GitHub PR Bot / CI Gate output produces:
- PR review comment,
- check run summary,
- inline annotations,
- merge gate result.
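The four outputs above could be assembled from a verdict and a list of findings roughly as follows. This is a sketch under assumptions: the finding shape (`severity`, `path`, `line`, `message`), the function name, and the mapping from verdict to a check conclusion are hypothetical. The annotation keys mirror those used by GitHub check-run annotations.

```python
# Assumed mapping from PRForge verdicts to check-run conclusions.
CONCLUSIONS = {
    "approve": "success",
    "request_changes": "action_required",
    "block_merge": "failure",
}
BLOCKING_SEVERITIES = {"high", "critical"}


def build_pr_output(verdict: str, findings: list[dict]) -> dict:
    """Render one verdict and its findings into the four PR-facing outputs."""
    comment_lines = [f"PRForge verdict: {verdict}"]
    annotations = []
    for finding in findings:
        comment_lines.append(
            f"- [{finding['severity']}] {finding['path']}:{finding['line']} {finding['message']}"
        )
        annotations.append({
            "path": finding["path"],
            "start_line": finding["line"],
            "end_line": finding["line"],
            "annotation_level": (
                "failure" if finding["severity"] in BLOCKING_SEVERITIES else "warning"
            ),
            "message": finding["message"],
        })
    return {
        "pr_review_comment": "\n".join(comment_lines),
        "check_run_summary": f"{len(findings)} finding(s), verdict {verdict}",
        "inline_annotations": annotations,
        "merge_gate_result": CONCLUSIONS[verdict],
    }
```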
The AI Coder Scoreboard tracks quality by AI source:
- Claude Code,
- Codex,
- Cursor Agent,
- OpenHands,
- unknown/custom tools.
Tracked dimensions include pass rate, rework rate, security findings, fake-test rate, average fix time, and final merge rate.
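A few of these dimensions can be derived by grouping stored evaluation records by source. The sketch below assumes each record is a dict carrying `ai_tool_source`, `merge_verdict`, `fake_test_risks`, and `security_findings` (the last two as lists); that record shape and the function name are assumptions, and time-based dimensions such as average fix time are left out.

```python
from collections import defaultdict


def build_scoreboard(records: list[dict]) -> dict:
    """Aggregate per-source quality metrics from evaluation records."""
    grouped = defaultdict(list)
    for record in records:
        # Records with no source fall into the "unknown" bucket.
        grouped[record.get("ai_tool_source") or "unknown"].append(record)

    board = {}
    for source, rows in grouped.items():
        total = len(rows)
        board[source] = {
            "evaluations": total,
            "pass_rate": sum(r["merge_verdict"] == "approve" for r in rows) / total,
            "fake_test_rate": sum(bool(r["fake_test_risks"]) for r in rows) / total,
            "security_findings": sum(len(r["security_findings"]) for r in rows),
        }
    return board
```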
Run from:
cd D:\ir\PRForge-main\PRForge-main
Create a venv if dependencies are needed:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install fastapi uvicorn pydantic
Start the API:
$env:PYTHONPATH="backend"
python -m uvicorn app.main:app --reload --host 127.0.0.1 --port 8000
Start the frontend:
cd D:\ir\PRForge-main\PRForge-main\frontend
npm install
npm run dev
The UI is a review workbench for pasting AI diffs, criteria, changed files, GitHub PR metadata, and AI tool source.
API endpoints:
- GET /health
- POST /api/v1/run
- POST /api/v1/github/pr-review
- GET /api/v1/scoreboard
Example request body:
{
"source_type": "ai_diff",
"title": "Evaluate AI generated diff",
"description": "Judge whether an AI patch matches the request and is safe to merge.",
"original_requirement": "Add replay timeline to council output.",
"acceptance_criteria": ["Return timeline", "Return evidence"],
"changed_files": ["backend/app/engine.py"],
"repo_path": ".",
"ai_tool_source": "Codex",
"requirement_type": "api",
"ai_diff": "diff --git a/backend/app/engine.py b/backend/app/engine.py ..."
}
The scoreboard module keeps historical AI-coder quality records under backend data storage when evaluations are run.
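The example request body shown earlier can be submitted from a script using only the standard library. This sketch assumes the body is meant for POST /api/v1/run, which this README does not state explicitly, and that the API is listening on 127.0.0.1:8000 as in the start command. The helper names are hypothetical.

```python
import json
import urllib.request


def build_run_request(payload: dict, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to the assumed /api/v1/run endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response; needs the API to be running."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend started, `send(build_run_request(payload))` would return the decoded evaluation result; the response fields are not shown here because the README does not include a sample response.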
Verified locally with backend compilation and manual council runs. pytest was not installed in the local environment during validation, so unit tests require installing test dependencies first.