A LangGraph-based multi-agent system for researching benefit programs, validating eligibility criteria, generating test cases, and creating implementation tickets.
This tool automates the research phase of adding new benefit programs to the MyFriendBen platform. It follows the process outlined in plans/qa/AI_PROGRAM_QA_PROCESS.md but runs before implementation begins, ensuring research drives development.
┌─────────────────────────────────────────────────────────────────────────────┐
│ LangGraph State Machine │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Gather Links → Read Fields → Extract Criteria → QA Validate Research │
│ ↓ │
│ [Fix Loop if needed] │
│ ↓ │
│ Generate Tests → QA Validate Tests → [Fix Loop] → Convert JSON │
│ ↓ │
│ QA Validate JSON → [Fix Loop] → Generate Program Config │
│ ↓ │
│ Create Ticket → END │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
- Researcher Agent: Gathers documentation, extracts eligibility criteria, maps to screener fields, generates test cases
- PDF Vision Processing: Automatically converts PDFs to images and uses Claude's vision to extract structured data (headings, bullets, dollar amounts) with 10x better accuracy than text extraction
- QA Agent: Adversarial reviewer that validates research accuracy and test coverage
- Iterative Loops: QA issues trigger fixes until quality threshold met (max 3 iterations)
- Program Config Generation: Auto-generates Django admin import configuration
- JSON Output: Test cases formatted for the benefits-api test_case_schema.json
- Linear Integration: Creates implementation tickets with acceptance criteria (optional)
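The three QA fix loops share one control-flow pattern: produce an artifact, validate it, and apply fixes until QA passes or the iteration budget runs out. A minimal stdlib-only sketch of that pattern (function names are hypothetical, not the tool's actual code):

```python
from typing import Callable

MAX_QA_ITERATIONS = 3  # matches the documented default

def run_with_qa_loop(
    produce: Callable[[], dict],
    validate: Callable[[dict], list[str]],
    fix: Callable[[dict, list[str]], dict],
) -> dict:
    """Produce an artifact, then loop validate -> fix until QA passes
    or the iteration budget is exhausted."""
    artifact = produce()
    for _ in range(MAX_QA_ITERATIONS):
        issues = validate(artifact)
        if not issues:
            return artifact  # quality threshold met
        artifact = fix(artifact, issues)
    return artifact  # best effort after max iterations
```

In the real graph the produce/validate/fix roles are LangGraph nodes and the loop is expressed as conditional edges, but the termination logic is the same.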
# Navigate into this repo (whatever you named it locally)
cd program-researcher
# Install Python dependencies
pip install -r requirements.txt
# Install system dependency for PDF processing (macOS)
brew install poppler
# For Linux:
# sudo apt-get install poppler-utils

Or install as an editable package:
pip install -e ".[dev]"
brew install poppler  # Still need system dependency

Core dependencies:
- langgraph, langchain, langchain-anthropic - AI framework
- pydantic - Data validation
- httpx, beautifulsoup4 - Web scraping
- pdf2image, PyPDF2 - NEW: PDF vision processing
- click, rich - CLI interface
Note: pdf2image requires the poppler system library for PDF rendering. Install it with your system package manager before running the tool.
Set your Anthropic API key as an environment variable:
export RESEARCH_AGENT_ANTHROPIC_API_KEY=sk-ant-...

Optional: Override the schema URL (defaults to the GitHub raw URL for benefits-api):
export RESEARCH_AGENT_SCHEMA_URL=https://raw.githubusercontent.com/MyFriendBen/benefits-api/main/validations/management/commands/import_validations/test_case_schema.json

Optional: For Linear ticket creation, also set:
export RESEARCH_AGENT_LINEAR_API_KEY=lin_api_...
export RESEARCH_AGENT_LINEAR_TEAM_ID=your-team-id
export RESEARCH_AGENT_LINEAR_PROJECT_ID=your-project-id

You can also create a .env file in the repo root with these values.
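For example, a .env file in the repo root might look like this (values are placeholders):

```ini
RESEARCH_AGENT_ANTHROPIC_API_KEY=sk-ant-...
RESEARCH_AGENT_LINEAR_API_KEY=lin_api_...
RESEARCH_AGENT_LINEAR_TEAM_ID=your-team-id
RESEARCH_AGENT_LINEAR_PROJECT_ID=your-project-id
```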
Important: Run all commands from within this repo directory using run.py. This ensures imports work regardless of what you named your local clone.
# Navigate into the repo first
cd /path/to/your-repo-name # whatever you named it locally
# Research a program
python run.py research \
--program "CSFP" \
--state "il" \
--white-label "il" \
--source-url "https://www.fns.usda.gov/csfp" \
--source-url "https://www.dhs.state.il.us/page.aspx?item=30513"
# Preview workflow without executing (no API key required)
python run.py research --dry-run \
--program "CSFP" \
--state "il" \
--white-label "il" \
--source-url "https://www.fns.usda.gov/csfp"
# Show graph structure
python run.py graph
# Get help
python run.py --help
python run.py research --help

Run the example script from within the repo:
cd /path/to/your-repo-name
python examples/research_csfp.py

Or create your own script (must be run from within the repo directory):
import asyncio
from program_research_agent.graph import run_research
async def main():
    state = await run_research(
        program_name="CSFP",
        state_code="il",
        white_label="il",
        source_urls=[
            "https://www.fns.usda.gov/csfp",
            "https://www.dhs.state.il.us/page.aspx?item=30513",
        ],
        max_iterations=3,
    )
    print(f"Status: {state.status}")
    print(f"Test cases generated: {len(state.json_test_cases)}")
    print(f"Linear ticket: {state.linear_ticket_url}")

asyncio.run(main())

The tool produces outputs at each step and saves them to timestamped directories for debugging and auditing.
- Link Catalog: All documentation URLs discovered and categorized
- Screener Fields: Available fields from Django models
- Field Mapping: Eligibility criteria mapped to screener fields, with data gaps identified
- QA Results: Validation results at each QA step (with iteration numbers)
- Human Test Cases: 10-15 scenarios for manual QA testing
- JSON Test Cases: Test data in benefits-api test_case_schema.json format
- Program Config: Django admin import configuration (ready to use)
- Linear Ticket: Implementation ticket with acceptance criteria (if Linear configured)
- Workflow Log: Complete execution log
- Summary: Markdown summary of the research run
Each research run creates a timestamped directory:
output/
└── il_csfp_20240115_143022/ # Timestamped run directory
├── ticket_content/ # Files for ticket/review
│ ├── il_csfp_initial_config.json # Django admin config
│ ├── il_csfp_test_cases.json # JSON test cases
│ └── il_csfp_ticket.md # Ticket markdown
├── SUMMARY.md # High-level summary with metrics
├── workflow_log.txt # Complete execution log
├── gather_links.json # Link catalog
├── screener_fields.json # Available screener fields
├── extract_criteria.json # Eligibility criteria and field mapping
├── qa_research_iter1.json # QA validation results (per iteration)
├── generate_tests.json # Human-readable test scenarios
├── qa_tests_iter1.json # Test case QA results
├── convert_json.json # JSON test cases
├── qa_json_iter1.json # JSON QA results
└── generate_program_config.json # Program config generation output
If the workflow fails, SUMMARY.md will include:
- Error details: The specific error message and context
- Next steps: Actionable guidance on how to fix the issue
- Last 15 workflow messages: Full context of what happened
The summary is always saved, even when the workflow fails, so you can diagnose issues.
To run without saving outputs (e.g., for quick testing):
python run.py research --no-save \
--program "CSFP" \
--state "il" \
--white-label "il" \
--source-url "https://www.fns.usda.gov/csfp"

The researcher uses Claude's vision capabilities to read PDFs, which dramatically improves accuracy for extracting structured data.
Traditional text extraction loses structure:
"ASSET Household assets may not exceed $75,000 Households..."
Everything becomes a text blob with no formatting cues.
Vision-based extraction preserves layout:
- Sees section headings (ALL CAPS, bold, large text)
- Understands bullet points and indentation
- Identifies emphasized values (bold dollar amounts)
- Reads tables and structured lists naturally
- ✅ Asset limits: "$75,000" (not "$100,000" or missed entirely)
- ✅ Preference criteria: "12 points for residency, 5 points for employment"
- ✅ Age exceptions: "$150,000 for households where all members are 62+"
- ✅ Screening requirements: "credit check, CORI background check, references"
- Text extraction: ~$0.015 per 5-page PDF
- Vision extraction: ~$0.035 per 5-page PDF (+$0.02)
- Result: +133% cost but ~10x better accuracy
For typical research (1-2 PDFs): adds ~$0.04 per program run.
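To illustrate what vision-based PDF handling involves, here is a sketch of wrapping rendered page images as image content blocks in the shape the Anthropic Messages API expects. It assumes pdf2image has already rendered the PDF pages to PNG bytes; the tool's actual vision_helper.py may differ:

```python
import base64

def png_to_vision_block(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes as an Anthropic-style image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": base64.b64encode(png_bytes).decode("ascii"),
        },
    }

def build_page_messages(pages: list[bytes], prompt: str) -> list[dict]:
    """One user message containing every rendered page plus the prompt,
    so the model sees the whole document's layout at once."""
    content = [png_to_vision_block(page) for page in pages]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]
```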
- Fetches provided source URLs
- For PDFs: Converts to PNG images for vision processing
- Extracts all hyperlinks from HTML content
- Identifies legislative citations (U.S. Code, CFR, state statutes)
- Categorizes and titles each link
- Parses Django models from benefits-be/screener/models.py
- Extracts available fields, types, and valid values
- Identifies helper methods for calculations
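A field scan of this kind can be sketched with Python's ast module. The model source below is a toy example; the real step reads benefits-be/screener/models.py and handles many more cases:

```python
import ast

MODEL_SOURCE = '''
class Screen(models.Model):
    zipcode = models.CharField(max_length=5)
    household_size = models.IntegerField()
'''

def extract_model_fields(source: str) -> dict[str, str]:
    """Map field name -> Django field class for each models.* assignment."""
    fields = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            func = node.value.func
            # Match assignments of the form `name = models.SomeField(...)`
            if (isinstance(func, ast.Attribute)
                    and isinstance(func.value, ast.Name)
                    and func.value.id == "models"):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        fields[target.id] = func.attr
    return fields
```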
- Reviews all source documentation
- For PDFs: Uses Claude's vision to read structure (headings, bullets, emphasis)
- Extracts eligibility criteria with citations (including dollar amounts, point values)
- Maps each criterion to screener fields
- Identifies data gaps
- Independent review by QA agent
- Verifies criteria accuracy against sources
- Checks for missed requirements
- Validates field mappings
- Generates test cases one at a time to prevent response truncation
- Creates 14 scenarios across categories (happy path, income thresholds, age thresholds, exclusions, multi-member)
- Includes exact form values and expected outcomes
- Resilient to individual failures - continues generating remaining test cases
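The "resilient to individual failures" behavior amounts to isolating each generation call; a sketch where generate_one stands in for the per-scenario LLM call (names hypothetical):

```python
def generate_test_cases(scenarios, generate_one):
    """Generate each test case independently so one failure does not
    abort the whole batch; failures are collected for reporting."""
    cases, failures = [], []
    for scenario in scenarios:
        try:
            cases.append(generate_one(scenario))
        except Exception as exc:  # keep going on any per-case failure
            failures.append((scenario, str(exc)))
    return cases, failures
```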
- Reviews test coverage
- Validates boundary conditions
- Checks expected outcomes
- Transforms to benefits-api test_case_schema.json format (fetched from GitHub)
- Validates against schema
- Calculates ages from birth dates
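Computing age from a birth date is subtler than subtracting years, because the birthday may not have occurred yet in the target year. A sketch of the calculation:

```python
from datetime import date

def age_on(birth_date: date, as_of: date) -> int:
    """Whole years elapsed, accounting for whether the birthday
    has already occurred in the as_of year."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)
```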
- Compares JSON to human-readable source
- Verifies schema compliance
- Checks for data mismatches
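A minimal structural check in the spirit of this step (field names here are illustrative only; the real validation runs against test_case_schema.json fetched from GitHub):

```python
# Hypothetical required fields and types for a test case record.
EXPECTED_TYPES = {"name": str, "household": list, "expected_eligible": bool}

def check_case(case: dict) -> list[str]:
    """Return a list of human-readable issues; empty means the case passes."""
    issues = []
    for key, typ in EXPECTED_TYPES.items():
        if key not in case:
            issues.append(f"missing field: {key}")
        elif not isinstance(case[key], typ):
            issues.append(f"{key}: expected {typ.__name__}")
    return issues
```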
- Creates Django admin import configuration
- Extracts official program name and description from research
- Identifies application links and required documents
- Generates config ready for human review
- Formats acceptance criteria
- Includes source documentation
- Embeds program configuration
- Attaches test case files
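Ticket creation goes through Linear's GraphQL API; a sketch of building the request body for the issueCreate mutation (this shows only the shape of the payload; consult Linear's API docs for the authoritative input schema):

```python
def build_issue_payload(title: str, description_md: str, team_id: str) -> dict:
    """GraphQL request body for Linear's issueCreate mutation."""
    query = """
    mutation IssueCreate($input: IssueCreateInput!) {
      issueCreate(input: $input) { success issue { url } }
    }
    """
    return {
        "query": query,
        "variables": {
            "input": {
                "title": title,
                "description": description_md,  # Linear renders markdown
                "teamId": team_id,
            }
        },
    }
```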
Tests must be run from the parent directory (e.g., mfb/), not from inside program-researcher/. This is required because pyproject.toml sets where = [".."], so Python resolves the program_research_agent package relative to the parent.
# From mfb/ (parent directory)
python -m pytest program-researcher/tests/
ruff check .
mypy .

program-researcher/
├── run.py # Entry point script (handles module aliasing)
├── __init__.py # Package exports
├── graph.py # Main LangGraph definition
├── state.py # Pydantic state models
├── config.py # Configuration management
├── cli.py # CLI commands
├── nodes/ # Graph node implementations
│ ├── gather_links.py
│ ├── read_screener_fields.py
│ ├── extract_criteria.py
│ ├── qa_research.py
│ ├── generate_tests.py
│ ├── qa_tests.py
│ ├── convert_json.py
│ ├── qa_json.py
│ └── linear_ticket.py
├── tools/ # Utility tools
│ ├── web_research.py # Web fetching and PDF handling
│ ├── pdf_vision.py # NEW: PDF to image conversion
│ ├── vision_helper.py # NEW: Vision message formatting
│ ├── screener_fields.py
│ ├── schema_validator.py
│ └── output_saver.py # Step output and summary generation
├── prompts/ # Agent system prompts
│ ├── researcher.py
│ └── qa_agent.py
├── tests/ # Test suite
├── examples/ # Example scripts
│ └── research_csfp.py
└── output/ # Generated files (gitignored)
| Environment Variable | Description | Default |
|---|---|---|
| RESEARCH_AGENT_ANTHROPIC_API_KEY | Anthropic API key | Required |
| RESEARCH_AGENT_LINEAR_API_KEY | Linear API key | Optional |
| RESEARCH_AGENT_LINEAR_TEAM_ID | Linear team ID | Optional |
| RESEARCH_AGENT_LINEAR_PROJECT_ID | Linear project ID | Optional |
| RESEARCH_AGENT_RESEARCHER_MODEL | Model for researcher agent | claude-sonnet-4-20250514 |
| RESEARCH_AGENT_QA_MODEL | Model for QA agent | claude-sonnet-4-20250514 |
| RESEARCH_AGENT_MAX_QA_ITERATIONS | Default max QA iterations | 3 |
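Settings resolution can be pictured as environment-first with the defaults above; a stdlib sketch (the tool's config.py may use pydantic settings or similar instead):

```python
import os

# Defaults from the table above; required settings have no default.
DEFAULTS = {
    "RESEARCH_AGENT_RESEARCHER_MODEL": "claude-sonnet-4-20250514",
    "RESEARCH_AGENT_QA_MODEL": "claude-sonnet-4-20250514",
    "RESEARCH_AGENT_MAX_QA_ITERATIONS": "3",  # parsed to int by the caller
}

def load_setting(name: str) -> str:
    """Environment variable wins; fall back to the default, else fail."""
    value = os.environ.get(name, DEFAULTS.get(name))
    if value is None:
        raise RuntimeError(f"{name} is required but not set")
    return value
```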
Internal tool for MyFriendBen platform.