PRForge

AI diff evaluation and merge gate for code written by AI coding agents.

PRForge is not a general coding chatbot. It is a review board that evaluates whether an AI-generated patch should be approved, revised, or blocked before merge.

Requirement + acceptance criteria + AI diff
  -> Intent-Match Agent
  -> Diff-Scope Agent
  -> Fake Test Detector
  -> Security Regression Gate
  -> System Test Planner
  -> GitHub PR Bot output
  -> AI Coder Scoreboard
  -> merge verdict
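The pipeline above reads as a sequence of stages, each adding evidence toward a single verdict. The Python sketch below shows that flow. It rests on several assumptions that are not PRForge's actual code: the `run_review_board` function, the `(name, verdict, notes)` stage shape, the two toy stages, and the rule that the most severe stage verdict wins are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Verdicts ordered from least to most severe (names from the Merge-Judge Agent).
SEVERITY = {"approve": 0, "request_changes": 1, "block_merge": 2}

# Hypothetical stage signature: submission in, (stage name, verdict, notes) out.
Stage = Callable[[Dict[str, object]], Tuple[str, str, List[str]]]


def run_review_board(submission: Dict[str, object], stages: List[Stage]) -> Dict[str, object]:
    """Run every stage in order; the most severe stage verdict becomes the merge verdict."""
    evidence: List[Dict[str, object]] = []  # stands in for the Evidence Board
    verdict = "approve"
    for stage in stages:
        name, stage_verdict, notes = stage(submission)
        evidence.append({"stage": name, "verdict": stage_verdict, "notes": notes})
        if SEVERITY[stage_verdict] > SEVERITY[verdict]:
            verdict = stage_verdict
    return {"merge_verdict": verdict, "evidence": evidence}


def intent_match(submission: Dict[str, object]) -> Tuple[str, str, List[str]]:
    # Toy rule: without acceptance criteria, intent cannot be checked.
    if not submission.get("acceptance_criteria"):
        return "intent_match", "request_changes", ["no acceptance criteria supplied"]
    return "intent_match", "approve", []


def security_gate(submission: Dict[str, object]) -> Tuple[str, str, List[str]]:
    # Toy rule: any private key material in the diff blocks the merge.
    if "PRIVATE KEY-----" in str(submission.get("ai_diff", "")):
        return "security_gate", "block_merge", ["private key material in diff"]
    return "security_gate", "approve", []


result = run_review_board(
    {"acceptance_criteria": ["Return timeline"], "ai_diff": "+timeline = []"},
    [intent_match, security_gate],
)
print(result["merge_verdict"])  # approve
```

Taking the most severe verdict means a single blocking stage cannot be outvoted by stages that approve. This matches the rule, stated below, that high or critical security regressions can block a merge on their own.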

Current Status

PRForge has completed the AI-code-evaluation MVP through the scoreboard layer.

Phase 1: AI Diff Evaluation Core complete.
Phase 2: Fake Test Detector complete.
Phase 3: Security Regression Gate complete.
Phase 4: System Test Planner complete.
Phase 6: GitHub PR Bot / CI Gate output complete.
Phase 7: AI Coder Scoreboard complete.

The recommended product route remains:

AI diff evaluation
  -> fake test detection
  -> security regression gate
  -> real command evidence
  -> GitHub CI gate
  -> long-term AI coder scoreboard

What PRForge Evaluates

Inputs:

original requirement,
acceptance criteria,
AI-generated diff or patch,
changed files,
repo path,
AI tool/source,
requirement type.
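One way to model these inputs in Python is shown in the minimal sketch below. The field names come from the example payload further down. Several choices are assumptions, not PRForge's schema: which fields are mandatory, the `"unknown"` and `"general"` defaults, and the `EvaluationRequest` and `missing_fields` names. The backend may validate differently (for instance with Pydantic).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Assumption: these three inputs are treated as mandatory; the rest get defaults.
REQUIRED = ("original_requirement", "acceptance_criteria", "ai_diff")


@dataclass
class EvaluationRequest:
    original_requirement: str
    acceptance_criteria: List[str]
    ai_diff: str
    changed_files: List[str] = field(default_factory=list)
    repo_path: str = "."
    ai_tool_source: str = "unknown"
    requirement_type: str = "general"  # hypothetical default

    @staticmethod
    def missing_fields(payload: Dict[str, Any]) -> List[str]:
        """List required keys that are absent or empty."""
        return [key for key in REQUIRED if not payload.get(key)]

    @classmethod
    def from_payload(cls, payload: Dict[str, Any]) -> "EvaluationRequest":
        missing = cls.missing_fields(payload)
        if missing:
            raise ValueError(f"missing required fields: {', '.join(missing)}")
        return cls(
            original_requirement=payload["original_requirement"],
            acceptance_criteria=list(payload["acceptance_criteria"]),
            ai_diff=payload["ai_diff"],
            changed_files=list(payload.get("changed_files", [])),
            repo_path=payload.get("repo_path", "."),
            ai_tool_source=payload.get("ai_tool_source", "unknown"),
            requirement_type=payload.get("requirement_type", "general"),
        )
```

Rejecting an empty `acceptance_criteria` list up front is deliberate in this sketch: without criteria, the Intent-Match Agent has nothing to check the patch against.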

Outputs:

intent_match_score
diff_scope_risk
ai_code_score
merge_verdict
required_fixes
test_truth_score
fake_test_risks
missing_test_cases
required_test_fixes
security_score
security_findings
risk_level
safe_validation_plan
block_merge_reason
system_test_matrix
must_run_checks
missing_system_tests
release_risk
pr_review_comment
check_run_summary
inline_annotations
merge_gate_result
ai_coder_scoreboard

Agent Modules

AI Diff Evaluation Core

Intent-Match Agent: checks whether the patch actually solves the request.
Diff-Scope Agent: flags unrelated or risky file changes.
Merge-Judge Agent: produces approve, request_changes, or block_merge.
Evidence Board: stores structured findings used by the verdict.
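The Diff-Scope Agent's job can be approximated by comparing the files a diff actually touches with the files the submission declared. The following sketch reads paths from `diff --git a/... b/...` headers and flags undeclared and risky paths. The `RISKY_PATTERNS` list and both function names are hypothetical, and paths containing spaces are not handled.

```python
import fnmatch
import re
from typing import Dict, List

# Hypothetical paths whose modification deserves extra scrutiny.
RISKY_PATTERNS = [".github/workflows/*", "*.lock", "package-lock.json",
                  "*requirements*.txt", "*auth*", "*migrations/*"]

# Matches "diff --git a/<old> b/<new>" header lines anywhere in the diff.
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def files_in_diff(diff: str) -> List[str]:
    """Return the post-change path of every file named in a diff header."""
    return [m.group(2) for m in DIFF_HEADER.finditer(diff)]


def diff_scope_report(diff: str, declared_files: List[str]) -> Dict[str, List[str]]:
    touched = files_in_diff(diff)
    # Files the patch changed but the submission never mentioned.
    out_of_scope = [f for f in touched if f not in declared_files]
    # fnmatch's "*" also matches "/", so patterns apply to full relative paths.
    risky = [f for f in touched if any(fnmatch.fnmatch(f, p) for p in RISKY_PATTERNS)]
    return {"touched": touched, "out_of_scope": out_of_scope, "risky": risky}
```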

Fake Test Detector

Test-Truth Agent: identifies tests without meaningful assertions.
Coverage-Gap Agent: detects missing tests for new logic.
Regression-Test Agent: flags happy-path-only, over-mocked, snapshot-heavy, or misleading tests.

Security Regression Gate

Security-Regression Agent: catches committed secrets, injection risks, unsafe files, and logging leaks.
Dependency-Risk Agent: flags dependency changes and package risk.
Auth/Permission Agent: catches auth bypass and permission regressions.
Safe-Validation Agent: outputs defensive validation steps only.

High or critical security regressions can directly block merge.
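The blocking rule above is sketched below as a scan over the lines a patch adds. The `RULES` table, its severities, and the function names are illustrative assumptions. A real gate would rely on a maintained secret and static-analysis scanner with many more patterns. Only `+` lines are examined, so code the patch removes never triggers a finding.

```python
import re
from typing import Dict, List

# Illustrative rules only: (severity, label, pattern).
RULES = [
    ("critical", "private key material", re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")),
    ("high", "AWS access key id", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("high", "hard-coded password", re.compile(r"(?i)password\s*=\s*['\"][^'\"]+['\"]")),
    ("medium", "SQL built with f-string", re.compile(r"(?i)execute\(\s*f['\"]")),
]

BLOCKING = {"high", "critical"}


def scan_added_lines(diff: str) -> List[Dict[str, object]]:
    findings: List[Dict[str, object]] = []
    for lineno, line in enumerate(diff.splitlines(), start=1):
        # Only lines the patch adds; "+++" is the new-file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for severity, label, pattern in RULES:
            if pattern.search(line):
                findings.append({"severity": severity, "rule": label, "diff_line": lineno})
    return findings


def should_block(findings: List[Dict[str, object]]) -> bool:
    """High or critical findings block the merge outright."""
    return any(f["severity"] in BLOCKING for f in findings)
```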

System Test Planner

Generates merge-readiness checks:

API contract test,
E2E happy path,
E2E negative path,
database migration check,
backward compatibility check,
permission/auth regression test,
performance smoke test,
rollback readiness check,
observability/logging check.
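The planner could select from the checks listed above using the requirement type and the changed paths. The sketch below shows one possible selection scheme. The check names are taken from the list above. The rules themselves (`ALWAYS`, `PATH_RULES`, `TYPE_RULES`, and the `"performance"` requirement type) are hypothetical and are not PRForge's actual logic.

```python
from typing import List

# Hypothetical: checks planned for every patch.
ALWAYS = ["E2E happy path", "E2E negative path", "rollback readiness check"]

# Hypothetical rules: (substring of a changed path, check it triggers).
PATH_RULES = [
    ("migrations/", "database migration check"),
    ("auth", "permission/auth regression test"),
    ("logging", "observability/logging check"),
]

# Hypothetical rules keyed by requirement_type.
TYPE_RULES = {
    "api": ["API contract test", "backward compatibility check"],
    "performance": ["performance smoke test"],
}


def plan_system_tests(requirement_type: str, changed_files: List[str]) -> List[str]:
    checks = list(ALWAYS)
    checks += TYPE_RULES.get(requirement_type, [])
    for needle, check in PATH_RULES:
        if any(needle in path for path in changed_files):
            checks.append(check)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(checks))
```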

GitHub PR Bot / CI Gate

Produces:

PR review comment,
check run summary,
inline annotations,
merge gate result.
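The check-run summary and inline annotations correspond to fields of GitHub's check-run object: `conclusion`, plus an `output` holding `title`, `summary`, and `annotations`. Each annotation carries `path`, `start_line`, `end_line`, `annotation_level`, and `message`. The sketch below builds that body without calling GitHub. The verdict-to-conclusion mapping, the severity-to-level mapping, and the finding dictionary shape are assumptions.

```python
from typing import Dict, List

# GitHub annotation levels are "notice", "warning", and "failure".
LEVEL_BY_SEVERITY = {"low": "notice", "medium": "warning", "high": "failure", "critical": "failure"}

# Hypothetical mapping from PRForge verdicts to check-run conclusions.
CONCLUSION_BY_VERDICT = {"approve": "success", "request_changes": "action_required", "block_merge": "failure"}

MAX_ANNOTATIONS_PER_REQUEST = 50  # GitHub accepts at most 50 annotations per request


def build_check_run_output(verdict: str, summary: str, findings: List[Dict[str, object]]) -> Dict[str, object]:
    annotations = [
        {
            "path": f["path"],
            "start_line": f["line"],
            "end_line": f["line"],
            "annotation_level": LEVEL_BY_SEVERITY.get(str(f["severity"]), "notice"),
            "message": f["message"],
        }
        for f in findings
    ]
    return {
        "conclusion": CONCLUSION_BY_VERDICT[verdict],
        "output": {
            "title": f"PRForge: {verdict}",
            "summary": summary,
            # Extra annotations would need follow-up update requests.
            "annotations": annotations[:MAX_ANNOTATIONS_PER_REQUEST],
        },
    }
```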

AI Coder Scoreboard

Tracks quality by AI source:

Claude Code,
Codex,
Cursor Agent,
OpenHands,
unknown/custom tools.

Tracked dimensions include pass rate, rework rate, security findings, fake-test rate, average fix time, and final merge rate.
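These dimensions can be aggregated per AI source from individual evaluation records. The sketch below is a minimal version of that aggregation. Several things are assumptions: the `EvaluationRecord` fields, reading "pass rate" as "passed the first review", reporting `0.0` hours when no fixes were recorded, and bucketing a blank source as `unknown/custom`.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class EvaluationRecord:  # hypothetical record shape
    ai_tool_source: str
    passed_first_review: bool
    needed_rework: bool
    security_findings: int
    fake_tests_found: bool
    merged: bool
    fix_hours: Optional[float] = None  # None when no fix was needed


def scoreboard(records: List[EvaluationRecord]) -> Dict[str, Dict[str, float]]:
    by_tool: Dict[str, List[EvaluationRecord]] = defaultdict(list)
    for r in records:
        by_tool[r.ai_tool_source or "unknown/custom"].append(r)
    board: Dict[str, Dict[str, float]] = {}
    for tool, rs in by_tool.items():
        n = len(rs)
        fix_times = [r.fix_hours for r in rs if r.fix_hours is not None]
        board[tool] = {
            "evaluations": n,
            "pass_rate": sum(r.passed_first_review for r in rs) / n,
            "rework_rate": sum(r.needed_rework for r in rs) / n,
            "security_findings": sum(r.security_findings for r in rs),
            "fake_test_rate": sum(r.fake_tests_found for r in rs) / n,
            "avg_fix_hours": mean(fix_times) if fix_times else 0.0,
            "merge_rate": sum(r.merged for r in rs) / n,
        }
    return board
```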

Backend

Run from the repository root (Windows PowerShell; adjust the path to your checkout):

cd D:\ir\PRForge-main\PRForge-main

Create and activate a virtual environment, then install the dependencies:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install fastapi uvicorn pydantic

Start the API. Setting PYTHONPATH to backend lets uvicorn import app.main:

$env:PYTHONPATH="backend"
python -m uvicorn app.main:app --reload --host 127.0.0.1 --port 8000
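To confirm the server came up, you can probe `GET /health` using only the standard library. This sketch checks the HTTP status code alone, because the response body of `/health` is not documented here. The `api_is_up` name is hypothetical.

```python
import urllib.error
import urllib.request


def api_is_up(base_url: str = "http://127.0.0.1:8000", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response (HTTPError).
        return False
```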

Frontend

cd D:\ir\PRForge-main\PRForge-main\frontend
npm install
npm run dev

The UI is a review workbench for pasting AI diffs, criteria, changed files, GitHub PR metadata, and AI tool source.

API

GET /health
POST /api/v1/run
POST /api/v1/github/pr-review
GET /api/v1/scoreboard

Example Payload

{
  "source_type": "ai_diff",
  "title": "Evaluate AI generated diff",
  "description": "Judge whether an AI patch matches the request and is safe to merge.",
  "original_requirement": "Add replay timeline to council output.",
  "acceptance_criteria": ["Return timeline", "Return evidence"],
  "changed_files": ["backend/app/engine.py"],
  "repo_path": ".",
  "ai_tool_source": "Codex",
  "requirement_type": "api",
  "ai_diff": "diff --git a/backend/app/engine.py b/backend/app/engine.py ..."
}
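A payload like the one above could be sent to `POST /api/v1/run` from Python as follows. This is a minimal sketch that assumes the backend is running at the address used in the Backend section and returns a JSON object. The function names are hypothetical, and the response fields are not documented here, so inspect the result rather than relying on specific keys.

```python
import json
import urllib.request
from typing import Any, Dict


def build_run_request(payload: Dict[str, Any], base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Prepare POST /api/v1/run with a JSON body (nothing is sent yet)."""
    return urllib.request.Request(
        f"{base_url}/api/v1/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(payload: Dict[str, Any], base_url: str = "http://127.0.0.1:8000") -> Dict[str, Any]:
    """Send the request and decode the JSON response; needs the backend running."""
    with urllib.request.urlopen(build_run_request(payload, base_url), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending means the request can be checked without a live server.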

Persistence

When evaluations run, the scoreboard module persists historical AI-coder quality records in the backend's data storage.
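The storage format is not documented. The sketch below assumes an append-only JSON Lines file, one record per evaluation. This choice makes writes cheap and keeps history intact across runs. The file location and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def append_record(store: Path, record: Dict[str, Any]) -> None:
    """Append one evaluation record as a single JSON line, creating folders as needed."""
    store.parent.mkdir(parents=True, exist_ok=True)
    with store.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_records(store: Path) -> List[Dict[str, Any]]:
    """Read every stored record in insertion order; a missing file means no history yet."""
    if not store.exists():
        return []
    with store.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```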

Validation

Verified locally by compiling the backend and running the council manually. pytest was not installed in the validation environment. To run the unit tests, install the test dependencies first (for example, pip install pytest).

