
# AgentMark


The open-source platform to develop, test, and observe your AI agents.

Homepage | Docs


AgentMark is a complete platform for building reliable AI agents. Define prompts in Markdown, run them with any SDK, evaluate quality with datasets, and trace every call in production.

- **Prompt management**: Write prompts as `.prompt.mdx` files with type-safe inputs, tool definitions, structured outputs, conditionals, loops, and reusable components.
- **Observability**: Trace every LLM call with OpenTelemetry. View traces locally or forward them to AgentMark Cloud for dashboards, alerts, and collaboration.
- **Evaluations**: Test prompts against datasets with built-in evals. Run experiments from the CLI and gate deployments on quality thresholds.

## What a prompt looks like

```mdx
---
name: customer-support-agent
text_config:
  model_name: anthropic/claude-sonnet-4-20250514
  max_calls: 2
  tools:
    search_knowledgebase:
      description: Search the knowledge base for shipping, warranty, and returns info.
      parameters:
        type: object
        properties:
          query:
            type: string
        required: [query]
test_settings:
  props:
    customer_question: "How long does shipping take?"
input_schema:
  type: object
  properties:
    customer_question:
      type: string
  required: [customer_question]
---

<System>
You are a helpful customer service agent. Use the search_knowledgebase tool
when customers ask about shipping, warranty, or returns.
</System>

<User>{props.customer_question}</User>
```

Run it:

```bash
agentmark run-prompt customer-support.prompt.mdx
```

That's it. The prompt is version-controlled, type-checked, and traceable.

## Quick Start

```bash
# Scaffold a new project (interactive — picks your language and adapter)
npm create agentmark@latest

# Start the dev server (API + trace UI + hot reload)
agentmark dev

# Run a single prompt
agentmark run-prompt my-prompt.prompt.mdx

# Run an experiment against a dataset
agentmark run-experiment my-prompt.prompt.mdx
```
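
The `run-experiment` command tests a prompt against a dataset. A minimal JSONL sketch of what such a dataset might look like — the field names here (`input`, `expected_output`) are illustrative assumptions, not the confirmed schema; the `input` object mirrors the prompt's `input_schema` props:

```jsonl
{"input": {"customer_question": "How long does shipping take?"}, "expected_output": "Standard shipping takes 3-5 business days."}
{"input": {"customer_question": "What is your return policy?"}, "expected_output": "Returns are accepted within 30 days of delivery."}
```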

## Features

| Feature | Description |
| --- | --- |
| Multimodal Generation | Generate text, structured objects, images, and speech from a single prompt file. |
| Tools and Agents | Define tools inline and build agentic loops with `max_calls`. |
| Structured Output | Type-safe JSON output via JSON Schema definitions. |
| Datasets & Evals | Test prompts against JSONL datasets with built-in and custom evaluators. |
| Tracing | OpenTelemetry-based tracing for every LLM call, both local and cloud. |
| Type Safety | Auto-generate TypeScript types from your prompts. JSON Schema validation in your IDE. |
| Reusable Components | Import and compose prompt fragments across files. |
| Conditionals & Loops | Dynamic prompts with `<If>`, `<ForEach>`, props, and filter functions. |
| File Attachments | Attach images and documents for vision and document processing tasks. |
| MCP Servers | Call Model Context Protocol tools directly from prompts. |
| MCP Trace Server | Debug traces from Claude Code, Cursor, or any MCP client. |
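
The conditional and loop components make prompt bodies dynamic. A hypothetical sketch of how they might compose — attribute names (`condition`, `arr`) and the render-prop form are illustrative assumptions, not confirmed `templatedx` syntax; see the docs for the real API:

```mdx
<System>
You are a support agent.
<If condition={props.is_premium}>
  This customer is on the premium plan; prioritize their request.
</If>
</System>

<ForEach arr={props.questions}>
  {(question) => <User>{question}</User>}
</ForEach>
```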

## SDK Adapters

AgentMark doesn't call LLM APIs directly. Instead, adapters format your prompt for the SDK you already use.

| Adapter | Language | Package |
| --- | --- | --- |
| Vercel AI SDK v5 | TypeScript | `@agentmark-ai/ai-sdk-v5-adapter` |
| Vercel AI SDK v4 | TypeScript | `@agentmark-ai/ai-sdk-v4-adapter` |
| Mastra | TypeScript | `@agentmark-ai/mastra-v0-adapter` |
| Claude Agent SDK | TypeScript | `@agentmark-ai/claude-agent-sdk-adapter` |
| Claude Agent SDK | Python | `agentmark-claude-agent-sdk` |
| Pydantic AI | Python | `agentmark-pydantic-ai` |
| Fallback | TypeScript | `@agentmark-ai/fallback-adapter` |

Want another adapter? Open an issue.

## Language Support

| Language | Status |
| --- | --- |
| TypeScript / JavaScript | Supported |
| Python | Supported |
| Others | Open an issue |

## Packages

| Package | Description |
| --- | --- |
| `@agentmark-ai/cli` | CLI for local development, prompt running, experiments, and building. |
| `@agentmark-ai/sdk` | SDK for tracing and cloud platform integration. |
| `@agentmark-ai/prompt-core` | Core prompt parsing and formatting engine. |
| `@agentmark-ai/templatedx` | MDX-based template engine with JSX components, conditionals, and loops. |
| `@agentmark-ai/mcp-server` | MCP server for trace debugging in Claude Code, Cursor, and more. |
| `@agentmark-ai/model-registry` | Centralized LLM model metadata and pricing. |
| `create-agentmark` | Project scaffolding tool. |

## Examples

See the `examples/` directory for complete, runnable examples.

## Cloud Platform

AgentMark Cloud extends the open-source project with:

- Collaborative prompt editing and version history
- Persistent trace storage with search and filtering
- Dashboards for cost, latency, and quality metrics
- Annotations and human evaluation workflows
- Alerts for quality regressions, cost spikes, and latency
- Two-way Git sync

## Contributing

We welcome contributions! See our contribution guidelines.

## Community

## License

MIT License