English Reference

These notes explain how to use the templates as a working harness instead of a loose pile of files.

Internal Reference Notes

method-map.md: map common long-running failure modes to the artifact or policy that addresses them first
initializer-agent-playbook.md: what the initializer should leave behind before feature work starts
coding-agent-startup-flow.md: fixed session-start flow for later coding runs
prompt-calibration.md: how to keep root instructions sharp without making them bloated and brittle

Primary Articles

This list is intentionally narrow. A harness means the execution system around the model: the agent loop, tool execution, sandboxing, state, context, verification, termination, orchestration, and observability. General prompt engineering or broad agent-framework articles do not belong in the primary list.

The original three articles remain the backbone of the course:

OpenAI: Harness engineering: leveraging Codex in an agent-first world (2026-02-11): agent-first repositories, repo-local context, custom linting, and structural guardrails.
Anthropic: Effective harnesses for long-running agents (2025-11-26): initializer agent, coding agent, feature list, progress log, and handoff across context windows.
Anthropic: Harness design for long-running application development (2026-03-24): planner / generator / evaluator roles, context resets, harness simplification, and stale assumptions.

Only a few highly relevant 2026 articles are added:

OpenAI: Unrolling the Codex agent loop (2026-01-23): the Codex runtime harness, tool calls, context growth, and loop termination.
Anthropic: Demystifying evals for AI agents (2026-01-09): evaluating the model and harness together, and distinguishing evaluation harnesses from agent harnesses.
LangChain: Improving Deep Agents with harness engineering (2026-02-17): holding the model fixed while improving system prompts, tools, middleware, tracing, and self-verification to move a coding agent from Top 30 to Top 5 on Terminal Bench 2.0.
Thoughtworks / Martin Fowler: Harness engineering for coding agent users (2026-04-02): coding-agent user harnesses as feedforward guides and feedback sensors, with deterministic and inferential controls.
Cursor: Continually improving our agent harness (2026-04-30): treating the harness as a continuously improved product system with offline evals, online metrics, tool-error taxonomy, model-specific tuning, and mid-chat model switching.

2026 Extended References

These are not core course sources, but they are useful when designing specific harness modules. This section only keeps sources whose body directly covers the agent loop, tool execution, context management, verification, sandboxing, control layers, or regression governance. Pure agent products, platform announcements, team case studies, and benchmarks are excluded.

OpenAI: Unlocking the Codex harness: how we built the App Server (2026-02-04): the harness as a reusable App Server protocol with thread lifecycle, resume, fork, diffs, and client integrations.
OpenAI Developers: Run long horizon tasks with Codex (2026-02-23): durable project memory, milestone validation, and done-when examples for long-running tasks.
OpenAI: The next evolution of the Agents SDK (2026-04-15): model-native harnesses, sandbox execution, and file/command execution.
OpenAI: An open-source spec for Codex orchestration: Symphony (2026-04-27): turning an issue tracker or Linear board into a multi-agent control plane.
Anthropic: Building a C compiler with a team of parallel Claudes (2026-02-05): parallel agent teams, task locks, git synchronization, container isolation, and autonomous loops.
Anthropic: Scaling Managed Agents: Decoupling the brain from the hands (2026-04-08): a meta-harness view that separates session, harness, and sandbox as swappable interfaces.
Anthropic: An update on recent Claude Code quality reports (2026-04-23): reasoning effort, context pruning, and system prompts as harness changes that need regression governance.
LangChain: Context Management for Deep Agents (2026-01-28): filesystem offloading, tool-call truncation, summarization, and targeted evals for context-management harnesses.
LangChain: Tuning Deep Agents to Work Well with Different Models (2026-04-29): model-specific harness profiles for prompts, tool names, middleware, and subagent configuration.
LangChain: Continual learning for AI agents (2026-04-05): splitting agent improvement into model, harness, and context layers, powered by traces.
Microsoft: Agent Harness in Agent Framework (2026-03-12): shell/filesystem harnesses, approval flow, hosted shell execution, and context compaction.
Google: Announcing ADK for Java 1.0.0 (2026-03-30): plugins, event compaction, HITL, session/memory services, and A2A as reusable harness primitives.
GitHub: Automate repository tasks with GitHub Agentic Workflows (2026-02-13): GitHub Actions as an agentic workflow runner with safe outputs, sandboxing, permissions, and review.
AWS: AI agents in enterprises: Best practices with Amazon Bedrock AgentCore (2026-02-03): enterprise harness layers across Runtime, Memory, Gateway, Identity/Policy, Observability, and Evaluations.
Stripe: Minions: Stripe's one-shot, end-to-end coding agents (2026-02-09) and Part 2 (2026-02-19): devbox isolation, custom agent harnesses, blueprint state machines, rule files, MCP tool curation, security controls, and pre-push/CI feedback loops.
Cognition: What We Learned Building Cloud Agents (2026-04-23): VM isolation, session snapshot/resume, orchestration, governance, audit logging, and integrations for cloud-agent runtimes.
Cognition: Multi-Agents: What's Actually Working (2026-04-22): generator-verifier loops, clean-context reviewers, smart-friend routing, manager-child coordination, and cross-agent communication boundaries.
Replit: Decision-Time Guidance: Keeping Replit Agent Reliable (2026-01-20, updated 2026-01-23): a lightweight classifier injects short situational guidance at the decision point instead of stuffing all rules into the system prompt.
Vercel: How we made v0 an effective coding agent (2026-01-07): dynamic system prompts, a streaming rewrite layer, and deterministic/model-driven autofixers.
Vercel: Introducing deepsec (2026-05-04): a security-focused coding-agent harness with scan, investigate, revalidate, enrich, export, plugin, and refusal-checker steps.
Sourcegraph: CodeScaleBench (2026-03-03): an eval/tooling harness reference covering MCP tool adoption, tool-use transcripts, benchmark QA, verifier/reproducibility gates, and prompt/preamble iteration.

Strictly 2025-only general references are excluded from the primary list. The original 2025 Anthropic harness article remains because it is a foundation source for the course.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

English Reference

Internal Reference Notes

Primary Articles

2026 Extended References

Suggested Reading Order

FilesExpand file tree

index.md

Latest commit

History

index.md

File metadata and controls

English Reference

Internal Reference Notes

Primary Articles

2026 Extended References

Suggested Reading Order