
# AgentMark


The open-source platform to develop, test, and observe your AI agents.

Homepage | Docs


AgentMark is a complete platform for building reliable AI agents. Define prompts in Markdown, run them with any SDK, evaluate quality with datasets, and trace every call in production.

- **Prompt management**: Write prompts as `.prompt.mdx` files with type-safe inputs, tool definitions, structured outputs, conditionals, loops, and reusable components.
- **Observability**: Trace every LLM call with OpenTelemetry. View traces locally or forward them to AgentMark Cloud for dashboards, alerts, and collaboration.
- **Evaluations**: Test prompts against datasets with built-in evals. Run experiments from the CLI and gate deployments on quality thresholds.

## What a prompt looks like

```mdx
---
name: customer-support-agent
text_config:
  model_name: anthropic/claude-sonnet-4-20250514
  max_calls: 2
  tools:
    search_knowledgebase:
      description: Search the knowledge base for shipping, warranty, and returns info.
      parameters:
        type: object
        properties:
          query:
            type: string
        required: [query]
test_settings:
  props:
    customer_question: "How long does shipping take?"
input_schema:
  type: object
  properties:
    customer_question:
      type: string
  required: [customer_question]
---

<System>
You are a helpful customer service agent. Use the search_knowledgebase tool
when customers ask about shipping, warranty, or returns.
</System>

<User>{props.customer_question}</User>
```

Run it:

```bash
agentmark run-prompt customer-support.prompt.mdx
```

That's it. The prompt is version-controlled, type-checked, and traceable.

## Quick Start

```bash
# Scaffold a new project (interactive — picks your language and adapter)
npm create agentmark@latest

# Start the dev server (API + trace UI + hot reload)
agentmark dev

# Run a single prompt
agentmark run-prompt my-prompt.prompt.mdx

# Run an experiment against a dataset
agentmark run-experiment my-prompt.prompt.mdx
```
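
The `run-experiment` command tests a prompt against a dataset. A minimal JSONL sketch of what such a dataset might look like — the field names here (`input`, `expected_output`) are illustrative assumptions, not the confirmed schema; the `input` object mirrors the prompt's `input_schema` props:

```jsonl
{"input": {"customer_question": "How long does shipping take?"}, "expected_output": "Standard shipping takes 3-5 business days."}
{"input": {"customer_question": "What is your return policy?"}, "expected_output": "Returns are accepted within 30 days of delivery."}
```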

## Features

| Feature | Description |
| --- | --- |
| Multimodal Generation | Generate text, structured objects, images, and speech from a single prompt file. |
| Tools and Agents | Define tools inline and build agentic loops with `max_calls`. |
| Structured Output | Type-safe JSON output via JSON Schema definitions. |
| Datasets & Evals | Test prompts against JSONL datasets with built-in and custom evaluators. |
| Tracing | OpenTelemetry-based tracing for every LLM call, both local and cloud. |
| Type Safety | Auto-generate TypeScript types from your prompts. JSON Schema validation in your IDE. |
| Reusable Components | Import and compose prompt fragments across files. |
| Conditionals & Loops | Dynamic prompts with `<If>`, `<ForEach>`, props, and filter functions. |
| File Attachments | Attach images and documents for vision and document processing tasks. |
| MCP Servers | Call Model Context Protocol tools directly from prompts. |
| MCP Trace Server | Debug traces from Claude Code, Cursor, or any MCP client. |
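
The conditional and loop components make prompt bodies dynamic. A hypothetical sketch of how they might compose — attribute names (`condition`, `arr`) and the render-prop form are illustrative assumptions, not confirmed `templatedx` syntax; see the docs for the real API:

```mdx
<System>
You are a support agent.
<If condition={props.is_premium}>
  This customer is on the premium plan; prioritize their request.
</If>
</System>

<ForEach arr={props.questions}>
  {(question) => <User>{question}</User>}
</ForEach>
```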

## SDK Adapters

AgentMark doesn't call LLM APIs directly. Instead, adapters format your prompt for the SDK you already use.

| Adapter | Language | Package |
| --- | --- | --- |
| Vercel AI SDK v5 | TypeScript | `@agentmark-ai/ai-sdk-v5-adapter` |
| Vercel AI SDK v4 | TypeScript | `@agentmark-ai/ai-sdk-v4-adapter` |
| Mastra | TypeScript | `@agentmark-ai/mastra-v0-adapter` |
| Claude Agent SDK | TypeScript | `@agentmark-ai/claude-agent-sdk-adapter` |
| Claude Agent SDK | Python | `agentmark-claude-agent-sdk` |
| Pydantic AI | Python | `agentmark-pydantic-ai` |
| Fallback | TypeScript | `@agentmark-ai/fallback-adapter` |

Want another adapter? Open an issue.

## Language Support

| Language | Status |
| --- | --- |
| TypeScript / JavaScript | Supported |
| Python | Supported |
| Others | Open an issue |

## Packages

| Package | Description |
| --- | --- |
| `@agentmark-ai/cli` | CLI for local development, prompt running, experiments, and building. |
| `@agentmark-ai/sdk` | SDK for tracing and cloud platform integration. |
| `@agentmark-ai/prompt-core` | Core prompt parsing and formatting engine. |
| `@agentmark-ai/templatedx` | MDX-based template engine with JSX components, conditionals, and loops. |
| `@agentmark-ai/mcp-server` | MCP server for trace debugging in Claude Code, Cursor, and more. |
| `@agentmark-ai/model-registry` | Centralized LLM model metadata and pricing. |
| `create-agentmark` | Project scaffolding tool. |

## Examples

See the `examples/` directory for complete, runnable examples.

## Cloud Platform

AgentMark Cloud extends the open-source project with:

- Collaborative prompt editing and version history
- Persistent trace storage with search and filtering
- Dashboards for cost, latency, and quality metrics
- Annotations and human evaluation workflows
- Alerts for quality regressions, cost spikes, and latency
- Two-way Git sync

## Contributing

We welcome contributions! See our contribution guidelines.

## Community

## License

MIT License