The open-source platform to develop, test, and observe your AI agents.
AgentMark is a complete platform for building reliable AI agents. Define prompts in Markdown, run them with any SDK, evaluate quality with datasets, and trace every call in production.
- **Prompt management** — Write prompts as `.prompt.mdx` files with type-safe inputs, tool definitions, structured outputs, conditionals, loops, and reusable components.
- **Observability** — Trace every LLM call with OpenTelemetry. View traces locally or forward them to AgentMark Cloud for dashboards, alerts, and collaboration.
- **Evaluations** — Test prompts against datasets with built-in evals. Run experiments from the CLI and gate deployments on quality thresholds.
Here's what a complete prompt file looks like:

```mdx
---
name: customer-support-agent
text_config:
  model_name: anthropic/claude-sonnet-4-20250514
  max_calls: 2
  tools:
    search_knowledgebase:
      description: Search the knowledge base for shipping, warranty, and returns info.
      parameters:
        type: object
        properties:
          query:
            type: string
        required: [query]
test_settings:
  props:
    customer_question: "How long does shipping take?"
input_schema:
  type: object
  properties:
    customer_question:
      type: string
  required: [customer_question]
---
<System>
You are a helpful customer service agent. Use the search_knowledgebase tool
when customers ask about shipping, warranty, or returns.
</System>
<User>{props.customer_question}</User>
```

Run it:

```shell
agentmark run-prompt customer-support.prompt.mdx
```

That's it. The prompt is version-controlled, type-checked, and traceable.
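Prompt bodies aren't limited to static messages. A hypothetical sketch of a dynamic prompt using conditionals and loops (the `<If>` and `<ForEach>` component names come from the feature list; the attribute names `condition` and `arr` here are assumptions, not verified templatedx API — check its docs before copying):

```mdx
<User>
  Help the customer choose between these options:
  {/* NOTE: `condition` and `arr` are assumed attribute names for illustration */}
  <If condition={props.options.length > 0}>
    <ForEach arr={props.options}>
      {(option) => <>- {option}</>}
    </ForEach>
  </If>
</User>
```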
```shell
# Scaffold a new project (interactive — picks your language and adapter)
npm create agentmark@latest

# Start the dev server (API + trace UI + hot reload)
agentmark dev

# Run a single prompt
agentmark run-prompt my-prompt.prompt.mdx

# Run an experiment against a dataset
agentmark run-experiment my-prompt.prompt.mdx
```

| Feature | Description |
|---|---|
| Multimodal Generation | Generate text, structured objects, images, and speech from a single prompt file. |
| Tools and Agents | Define tools inline and build agentic loops with `max_calls`. |
| Structured Output | Type-safe JSON output via JSON Schema definitions. |
| Datasets & Evals | Test prompts against JSONL datasets with built-in and custom evaluators. |
| Tracing | OpenTelemetry-based tracing for every LLM call — local and cloud. |
| Type Safety | Auto-generate TypeScript types from your prompts. JSON Schema validation in your IDE. |
| Reusable Components | Import and compose prompt fragments across files. |
| Conditionals & Loops | Dynamic prompts with `<If>`, `<ForEach>`, props, and filter functions. |
| File Attachments | Attach images and documents for vision and document processing tasks. |
| MCP Servers | Call Model Context Protocol tools directly from prompts. |
| MCP Trace Server | Debug traces from Claude Code, Cursor, or any MCP client. |
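Several of the rows above (Structured Output, Type Safety) rest on JSON Schema. To make that concrete, here is a tiny standalone validator for the `input_schema` from the example prompt. This is an illustration of what schema-driven validation buys you, not AgentMark's actual implementation, which presumably uses a full JSON Schema validator:

```typescript
// Minimal sketch: validate props against a tiny subset of JSON Schema,
// mirroring the input_schema from the customer-support example.
// Illustrative only -- not AgentMark's real validation code.
type Schema = {
  type: "object";
  properties: Record<string, { type: string }>;
  required?: string[];
};

const inputSchema: Schema = {
  type: "object",
  properties: { customer_question: { type: "string" } },
  required: ["customer_question"],
};

function validate(props: Record<string, unknown>, schema: Schema): string[] {
  const errors: string[] = [];
  // Every required key must be present.
  for (const key of schema.required ?? []) {
    if (!(key in props)) errors.push(`missing required prop: ${key}`);
  }
  // Every supplied prop must match its declared primitive type.
  for (const [key, value] of Object.entries(props)) {
    const expected = schema.properties[key]?.type;
    if (expected && typeof value !== expected) {
      errors.push(`prop ${key}: expected ${expected}, got ${typeof value}`);
    }
  }
  return errors;
}

console.log(validate({ customer_question: "How long does shipping take?" }, inputSchema)); // -> no errors
console.log(validate({}, inputSchema)); // -> one "missing required prop" error
```

The same schema that drives this runtime check is what the CLI can turn into generated TypeScript types, so the editor catches a bad `props` shape before the prompt ever runs.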
AgentMark doesn't call LLM APIs directly. Instead, adapters format your prompt for the SDK you already use.
| Adapter | Language | Package |
|---|---|---|
| Vercel AI SDK v5 | TypeScript | @agentmark-ai/ai-sdk-v5-adapter |
| Vercel AI SDK v4 | TypeScript | @agentmark-ai/ai-sdk-v4-adapter |
| Mastra | TypeScript | @agentmark-ai/mastra-v0-adapter |
| Claude Agent SDK | TypeScript | @agentmark-ai/claude-agent-sdk-adapter |
| Claude Agent SDK | Python | agentmark-claude-agent-sdk |
| Pydantic AI | Python | agentmark-pydantic-ai |
| Fallback | TypeScript | @agentmark-ai/fallback-adapter |
Want another adapter? Open an issue.
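To make the adapter idea concrete, here is a self-contained TypeScript sketch of the pattern. The type and function names below are invented for illustration and are not AgentMark's real interfaces: an adapter simply converts a parsed prompt into the input shape a particular SDK expects.

```typescript
// Sketch of the adapter pattern (invented names, not AgentMark's actual API):
// the adapter formats a parsed prompt; the SDK you already use makes the call.
type ParsedPrompt = { model: string; system: string; user: string };
type ChatMessage = { role: "system" | "user"; content: string };

interface Adapter<SdkInput> {
  format(prompt: ParsedPrompt): SdkInput;
}

// A hypothetical adapter targeting a chat-completion-style SDK input.
const chatAdapter: Adapter<{ model: string; messages: ChatMessage[] }> = {
  format: (p) => ({
    model: p.model,
    messages: [
      { role: "system", content: p.system },
      { role: "user", content: p.user },
    ],
  }),
};

const formatted = chatAdapter.format({
  model: "anthropic/claude-sonnet-4-20250514",
  system: "You are a helpful customer service agent.",
  user: "How long does shipping take?",
});
console.log(formatted.messages[0].role); // "system"
```

In practice you would hand the adapter's output to your SDK's own call (for example, the AI SDK's `generateText`), which is why AgentMark never needs to sit in the request path itself.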
| Language | Status |
|---|---|
| TypeScript / JavaScript | Supported |
| Python | Supported |
| Others | Open an issue |
| Package | Description |
|---|---|
| `@agentmark-ai/cli` | CLI for local development, prompt running, experiments, and building. |
| `@agentmark-ai/sdk` | SDK for tracing and cloud platform integration. |
| `@agentmark-ai/prompt-core` | Core prompt parsing and formatting engine. |
| `@agentmark-ai/templatedx` | MDX-based template engine with JSX components, conditionals, and loops. |
| `@agentmark-ai/mcp-server` | MCP server for trace debugging in Claude Code, Cursor, and more. |
| `@agentmark-ai/model-registry` | Centralized LLM model metadata and pricing. |
| `create-agentmark` | Project scaffolding tool. |
See the `examples/` directory for complete, runnable examples:
- Hello World — Simplest possible prompt
- Structured Output — Extract typed JSON with a schema
- Tool Use — Agent with tool calling
- Reusable Components — Import and compose prompts
- Evaluations — Test prompts against datasets
- Production Tracing — Trace LLM calls with the SDK
AgentMark Cloud extends the open-source project with:
- Collaborative prompt editing and version history
- Persistent trace storage with search and filtering
- Dashboards for cost, latency, and quality metrics
- Annotations and human evaluation workflows
- Alerts for quality regressions, cost spikes, and latency
- Two-way Git sync
We welcome contributions! See our contribution guidelines.