A general-purpose multi-agent research system built as a Claude Code plugin. It focuses on decomposing open-ended research tasks, executing sub-agent research, propagating failure information, and synthesizing traceable final reports.
The core design principle is trustworthiness over completeness: the system is designed to express evidence boundaries honestly rather than always producing a confident-sounding answer.
The system is composed of the following layers:
| Layer | Description |
|---|---|
| Orchestrator / Skill | Dynamically plans sub-agent roles and task boundaries based on the research goal and constraints |
| Hooks | Validate sub-agent input/output and persist failure context before and after each execution |
| Schemas | Define the structure for research briefs, agent results, and final reports |
| Runtime state | Persist failure context so the orchestrator can incorporate it into the final synthesis |
| Tests | Cover dynamic role planning, empty result handling, failure propagation, and conflict preservation |
- Dynamic role planning: Sub-agent roles and count are determined per task, not from a fixed template.
- Explicit context passing: Each sub-agent receives `question`, `purpose`, `scope`, and `search_strategy` explicitly; no implicit context sharing.
- Web-first evidence: Research is grounded in real web search and page retrieval, not local mock data.
- Structured failure propagation: Failure, partial success, and empty results are distinguished and surfaced upstream.
- Conflict preservation: Contradictions between sources are preserved in `conflicting_findings`; the final report may return `insufficient_evidence`.
metasearch/
├── README.md  # This file (English)
├── README_CN.md  # Chinese version
├── requirements.txt
├── .claude-plugin/  # Claude Code plugin manifest
├── hooks/
│   ├── hooks.json  # Hook configuration
│   └── scripts/
│       ├── pre-run-subagent.py  # Validates sub-agent input context
│       ├── post-run-subagent.py  # Validates sub-agent output structure
│       └── persist-failure-context.py  # Persists failure context to runtime
├── runtime/
│   └── failures.json  # Runtime failure log
├── schemas/
│   ├── research-brief.json  # Input schema for a research task
│   ├── agent-result.json  # Sub-agent output schema
│   └── final-report.json  # Final synthesized report schema
├── skills/
│   └── research-orchestrator/
│       └── SKILL.md  # Orchestrator workflow definition
└── test/
    ├── acceptance.md  # Manual acceptance scenarios
    ├── functional_test.py
    └── test_hooks.py
Provide a research question, decision goal, and constraints (e.g. allowed source types, time range, region):
{
"question": "Does service X meet enterprise compliance requirements?",
"decision_goal": "Decide whether to integrate X into our internal platform.",
"constraints": {
"allowed_source_types": ["official", "policy", "pricing"],
"time_range": "2024-01-01/2025-12-31",
"region": "CN"
},
"planned_agents": [...]
}

The orchestrator reads the brief and generates a set of non-overlapping sub-agent roles, each with a distinct `scope` and `search_strategy`.
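The requirement that roles carry explicit context and do not overlap can be checked mechanically before any sub-agent runs. The following is a minimal sketch, not the project's actual planner: the function name `check_planned_agents` is hypothetical, it assumes each entry in `planned_agents` is a dict whose `scope` is a plain string, and it treats two scopes as overlapping only when they are textually identical after whitespace and case normalization.

```python
REQUIRED_AGENT_FIELDS = ("agent", "purpose", "scope", "search_strategy")


def check_planned_agents(planned_agents):
    """Return a list of problems found in a planned_agents array (empty if none)."""
    problems = []
    seen_names = set()
    seen_scopes = {}  # normalized scope -> name of the agent that claimed it first
    for index, spec in enumerate(planned_agents):
        missing = [field for field in REQUIRED_AGENT_FIELDS if not spec.get(field)]
        if missing:
            problems.append(f"agent #{index}: missing {', '.join(missing)}")
            continue
        name = spec["agent"]
        if name in seen_names:
            problems.append(f"duplicate agent name: {name}")
        seen_names.add(name)
        # Assumption: scope is a string; collapse case and whitespace so that
        # trivially different spellings of the same scope still collide.
        scope_key = " ".join(str(spec["scope"]).lower().split())
        if scope_key in seen_scopes:
            problems.append(f"{name} duplicates the scope of {seen_scopes[scope_key]}")
        else:
            seen_scopes[scope_key] = name
    return problems
```

A textual comparison only catches exact duplicates; deciding whether two differently worded scopes cover the same ground still needs judgment from the orchestrator.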
Each sub-agent receives its full task context explicitly and performs real web searches. It records:
- Actual queries executed
- Sources retrieved (with retrieval status)
- Structured findings (with source URL, evidence quote, confidence)
- Gaps within scope that could not be found
- Errors (timeout, access blocked, validation failure)
pre-run-subagent.py checks that required context fields are present before execution.
post-run-subagent.py checks that the result structure is valid after execution.
persist-failure-context.py writes failure summaries to runtime/failures.json.
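To illustrate the persistence step, here is a minimal sketch of appending a failure summary to a JSON log. It is not the contents of `persist-failure-context.py`: the function name `persist_failure`, the entry keys (`agent`, `affected_scope`, `errors`), and the assumption that the log is a JSON array are all illustrative.

```python
import json
from pathlib import Path


def persist_failure(log_path, agent, scope, errors):
    """Append one failure summary to a JSON log and return the updated list."""
    path = Path(log_path)
    try:
        entries = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        entries = []  # Missing, empty, or unreadable log: start a fresh list.
    if not isinstance(entries, list):
        entries = []  # Assumption: the log is a JSON array of entries.
    entries.append({"agent": agent, "affected_scope": scope, "errors": list(errors)})
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8")
    return entries
```

Called as `persist_failure("runtime/failures.json", "pricing-agent", "pricing pages", ["timeout"])`, this leaves a record the orchestrator can read back during synthesis.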
The orchestrator reads all agent results and the failure log, then produces a final report that explicitly includes:
- `confirmed_findings`: findings backed by retrievable evidence
- `conflicting_findings`: contradictions between sources, preserved as-is
- `open_questions`: questions raised but not resolved
- `failed_agents`: agents that errored, with affected scope
- `empty_agents`: agents that returned no results
- `recommendation`: one of `adopt` / `pilot` / `defer` / `reject` / `insufficient_evidence`
- `coverage`: an assessment of how well the research covered the question
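The synthesis step can be pictured as folding per-agent results into the report fields listed above while keeping failed and empty agents visible. This is a sketch under assumptions, not the orchestrator's real logic: `synthesize` is a hypothetical name, agent `gaps` are mapped to `open_questions` for illustration, the shape of `coverage` is invented, and conflict detection is left out so `conflicting_findings` stays empty here.

```python
def synthesize(agent_results):
    """Fold per-agent results into a report skeleton without dropping failures."""
    report = {
        "confirmed_findings": [],
        "conflicting_findings": [],  # conflict detection is out of scope for this sketch
        "open_questions": [],
        "failed_agents": [],
        "empty_agents": [],
    }
    for result in agent_results:
        status = result["status"]
        if status == "error":
            report["failed_agents"].append(
                {"agent": result["agent"], "errors": result.get("errors", [])}
            )
        elif status == "empty":
            report["empty_agents"].append(result["agent"])
        else:  # "ok" or "partial"
            report["confirmed_findings"].extend(result.get("findings", []))
        # Assumption: unresolved gaps from any agent become open questions.
        report["open_questions"].extend(result.get("gaps", []))
    usable = len(agent_results) - len(report["failed_agents"]) - len(report["empty_agents"])
    report["coverage"] = {
        "agents_total": len(agent_results),
        "agents_ok_or_partial": usable,
    }
    # With nothing confirmed, the only honest decision is insufficient_evidence;
    # otherwise the decision is left for the orchestrator to make.
    if not report["confirmed_findings"]:
        report["recommendation"] = {"decision": "insufficient_evidence"}
    return report
```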
Research Brief — schemas/research-brief.json
| Field | Required | Description |
|---|---|---|
| `question` | ✓ | The research question |
| `decision_goal` | ✓ | The decision this research supports |
| `constraints` | ✓ | Source types, time range, region |
| `planned_agents` | ✓ | Array of agent definitions (`agent`, `purpose`, `scope`, `search_strategy`) |
Agent Result — schemas/agent-result.json
| Field | Required | Description |
|---|---|---|
| `agent` | ✓ | Agent identifier |
| `status` | ✓ | `ok` / `partial` / `error` / `empty` |
| `queries` | ✓ | Search queries actually executed |
| `sources` | ✓ | Retrieved sources with type and retrieval status |
| `findings` | ✓ | Structured findings with evidence and confidence |
| `gaps` | ✓ | Topics within scope that could not be found |
| `errors` | ✓ | Structured error entries when status is `error` |
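A post-run check against this table might look like the sketch below. It is not the code in `post-run-subagent.py`: `validate_agent_result` is a hypothetical name, and the two consistency rules (an `empty` result carries no findings, an `error` result carries at least one error entry) are assumptions inferred from the field descriptions rather than rules stated by the schema.

```python
REQUIRED_FIELDS = ("agent", "status", "queries", "sources", "findings", "gaps", "errors")
STATUSES = {"ok", "partial", "error", "empty"}


def validate_agent_result(result):
    """Return a list of problems with an agent result (empty if it looks valid)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in result]
    status = result.get("status")
    if status not in STATUSES:
        problems.append(f"unknown status: {status!r}")
    # Assumed consistency rules that keep "empty" and "error" distinguishable.
    if status == "empty" and result.get("findings"):
        problems.append("status is empty but findings are present")
    if status == "error" and not result.get("errors"):
        problems.append("status is error but no error entries are recorded")
    return problems
```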
Final Report — schemas/final-report.json
| Field | Required | Description |
|---|---|---|
| `confirmed_findings` | ✓ | Cross-agent confirmed findings |
| `conflicting_findings` | ✓ | Contradictions preserved without forced resolution |
| `open_questions` | ✓ | Unresolved questions |
| `failed_agents` | ✓ | Agents that failed, with missing scope |
| `empty_agents` | ✓ | Agents that returned no results |
| `recommendation.decision` | ✓ | `adopt` / `pilot` / `defer` / `reject` / `insufficient_evidence` |
| `coverage` | ✓ | Evidence coverage assessment |
- No mock fallback: If no evidence is found, the system must return an empty or insufficient-evidence result, not fabricated content.
- Empty ≠ Error: `empty` means no results were found; `error` means execution failed. These must not be conflated.
- Failure must propagate: Failure context is persisted to `runtime/failures.json` and must be reflected in the final report.
- Conflicts must be preserved: When sources disagree, both sides enter `conflicting_findings`; the system must not silently resolve them.
- Uncertainty is a valid output: `insufficient_evidence` is an explicit and acceptable recommendation when coverage is too low.
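The conflict-preservation rule can be sketched as a grouping step that never picks a winner. The finding fields `topic` and `claim` used here are hypothetical (the schema only states that findings carry a source URL, an evidence quote, and a confidence), and `split_conflicts` is an illustrative name rather than a function from this repository.

```python
from collections import defaultdict


def split_conflicts(findings):
    """Separate findings into agreed and conflicting groups, keeping every side."""
    by_topic = defaultdict(list)
    for finding in findings:
        by_topic[finding["topic"]].append(finding)
    confirmed, conflicting = [], []
    for topic, group in by_topic.items():
        claims = {finding["claim"] for finding in group}
        if len(claims) > 1:
            # Sources disagree: keep all positions, do not resolve.
            conflicting.append({"topic": topic, "positions": group})
        else:
            confirmed.extend(group)
    return confirmed, conflicting
```

Comparing claims as exact strings is a simplification; real findings would likely need the orchestrator to judge whether two differently worded claims actually contradict each other.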
Install test dependencies and run:
python3 -m pip install -r requirements.txt
python3 -m pytest ./test/test_hooks.py ./test/functional_test.py -v

- `test/test_hooks.py`: Tests pre/post hook behavior and failure persistence.
- `test/functional_test.py`: Tests end-to-end system behavior across key scenarios; it now supports both pytest collection and direct script execution.
test/acceptance.md defines 8 acceptance scenarios covering:
| ID | Scenario |
|---|---|
| AC-01 | Single-agent research with only official sources |
| AC-02 | Multi-agent research for an integration decision |
| AC-03 | No search results — insufficient public evidence |
| AC-04 | Page retrieval failure — source found but content unavailable |
| AC-05 | Conflicting sources — different claims about the same feature |
| AC-06 | Failure propagation — a sub-agent failure must appear in the final report |
| AC-07 | Mixed outcome — one agent succeeds, one is empty, one fails |
| AC-08 | Role boundary — sub-agents must not duplicate research coverage |
The acceptance bar requires at least 6 of 8 scenarios to pass, covering all major failure modes.
- Runtime: Python 3.11+ standard library for the project implementation
- Testing: `pytest>=8.0`
See requirements.txt for the declared test dependency.