Commit 3c0aabc

feat(init): LiveMCPBench init
0 parents  commit 3c0aabc

File tree

96 files changed

+918851
-0
lines changed

.env_template

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# MCP Copilot Agent Configuration
BASE_URL=
OPENAI_API_KEY=
MODEL=

# Tool Retrieval Configuration
EMBEDDING_MODEL=
EMBEDDING_BASE_URL=
EMBEDDING_API_KEY=
EMBEDDING_DIMENSIONS=1024
TOP_SERVERS=5
TOP_TOOLS=3
# Abstract API Configuration (optional)
ABSTRACT_MODEL=
ABSTRACT_API_KEY=
ABSTRACT_BASE_URL=

# lark report (optional)
LARK_WEBHOOK_URL=
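
The `TOP_SERVERS` and `TOP_TOOLS` settings suggest that tool retrieval first narrows to a few servers and then to a few tools per server. The retrieval code is not shown in this part of the commit, so the following is only a minimal sketch of that idea under stated assumptions: cosine similarity as the ranking score, toy two-dimensional embeddings, and hypothetical helper names (`cosine`, `top_k`).

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], candidates: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k candidate names most similar to the query."""
    ranked = sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)
    return ranked[:k]


# Toy data (hypothetical): server embeddings and per-server tool embeddings.
servers = {"files": [1.0, 0.0], "web": [0.0, 1.0], "mixed": [1.0, 1.0]}
tools = {
    "files": {"read": [1.0, 0.1], "write": [0.9, -0.5]},
    "mixed": {"search": [0.5, 0.5]},
    "web": {"fetch": [0.0, 1.0]},
}
query = [1.0, 0.2]

# Stage 1: keep the best servers (TOP_SERVERS); stage 2: best tools in each (TOP_TOOLS).
best_servers = top_k(query, servers, k=2)
best_tools = {name: top_k(query, tools[name], k=1) for name in best_servers}
print(best_servers, best_tools)
```

In a real setup the embeddings would come from the model named in `EMBEDDING_MODEL`, and `k` would be read from `TOP_SERVERS` and `TOP_TOOLS`.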

.gitignore

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv
logs/
.env

readme/
*.log
.gradio/
test/*
not_use/
/annotated_data/git/

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
3.11

README.md

Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
<a id="readme-top"></a>

<!-- PROJECT -->
<br />
<div align="center">
<h3 align="center">LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?</h3>

<p align="center">
Benchmarking agents on real-world tasks within a large-scale MCP toolset.
</p>
</div>
<p align="center">
<a href="https://www.python.org/downloads/release/python-31113/"><img src="https://img.shields.io/badge/python-3.11-blue.svg" alt="Python 3.11"></a>
<a href="https://github.com/astral-sh/ruff"><img src="https://img.shields.io/badge/code%20style-ruff-000000.svg" alt="Code style: ruff"></a>
</p>

<p align="center">
🌐 <a href="https://icip-cas.github.io/LiveMCPBench" target="_blank">Website</a> &nbsp; | &nbsp;
<!-- 📄 <a href="" target="_blank">Paper</a> &nbsp; | &nbsp; -->
🤗 <a href="https://huggingface.co/datasets/hysdhlx/LiveMCPBench" target="_blank">Dataset</a> &nbsp; | &nbsp;
🏆 <a href="https://docs.google.com/spreadsheets/d/1EXpgXq1VKw5A7l7-N2E9xt3w0eLJ2YPVPT-VrRxKZBw/edit?usp=sharing" target="_blank">Leaderboard</a>
<!-- &nbsp; | &nbsp; -->
<!-- 🙏 <a href="#citation" target="_blank">Citation</a> -->
</p>


![Overview](media/LiveMCPBench.png)
## News
* [8/3/2025] We release LiveMCPBench.
## Getting Started
31+
32+
### Prerequisites
33+
We will release our docker image soon, but if you want to run the code locally, you will need to install the following tools:
34+
* npm
35+
* uv
36+
### Installation
37+
1. sync python env
38+
39+
```bash
40+
uv sync
41+
```
42+
2. check the MCP tools
43+
44+
```bash
45+
bash ./tools/scripts/tool_check.sh
46+
```
47+
After running this command, you can check ./tools/test/tools.json to see the tools.
48+
49+
3. prepare the .env file
50+
51+
```bash
52+
cp .env_template .env
53+
```
54+
You can modify the .env file to set your own environment variables.
55+
```bash
56+
# MCP Copilot Agent Configuration
57+
BASE_URL=
58+
OPENAI_API_KEY=
59+
MODEL=
60+
61+
# Tool Retrieval Configuration
62+
EMBEDDING_MODEL=
63+
EMBEDDING_BASE_URL=
64+
EMBEDDING_API_KEY=
65+
EMBEDDING_DIMENSIONS=1024
66+
TOP_SERVERS=5
67+
TOP_TOOLS=3
68+
# Abstract API Configuration (optional)
69+
ABSTRACT_MODEL=
70+
ABSTRACT_API_KEY=
71+
ABSTRACT_BASE_URL=
72+
73+
# lark report (optional)
74+
LARK_WEBHOOK_URL=
75+
```
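
Before running the agent, it can help to confirm that the `.env` file is filled in. The following is a minimal sketch using only the standard library. Treating `BASE_URL`, `OPENAI_API_KEY`, and `MODEL` as mandatory is an assumption of this sketch, and `parse_env` and `missing_keys` are hypothetical helpers, not functions from this repository.

```python
from typing import Dict, List

# Keys come from .env_template; which ones are mandatory is an assumption here.
REQUIRED_KEYS = ("BASE_URL", "OPENAI_API_KEY", "MODEL")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or left empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Sample content with placeholder values (not real endpoints or models).
sample = """\
# MCP Copilot Agent Configuration
BASE_URL=https://example.invalid/v1
OPENAI_API_KEY=
MODEL=my-model
TOP_SERVERS=5
TOP_TOOLS=3
"""

config = parse_env(sample)
print("missing:", missing_keys(config))
print("top servers/tools:", int(config["TOP_SERVERS"]), int(config["TOP_TOOLS"]))
```

To check a real file, pass the contents of `.env` to `parse_env`, for example via `pathlib.Path(".env").read_text()`.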

## Quick Start
### MCP Copilot Agent
#### Example Run
You can run the MCP Copilot Agent with the following command:

```bash
bash ./baseline/scripts/run_example.sh
```
This runs the agent on a simple example and saves the results in `./baseline/output/`.

#### Full Run
By default, the benchmark data is stored under `/root`.

1. Move the code repo and create a symbolic link

Move this code repo to `/LiveMCPBench/`, because the script links `/LiveMCPBench/annotated_data` to `/root/`.

```bash
bash scripts/link_path.sh
```

This will create a symbolic link from `/LiveMCPBench/annotated_data/dirs` to `/root/annotated_data`.

2. Run the MCP Copilot Agent

Make sure you have set the environment variables in the `.env` file.

```bash
bash ./baseline/scripts/run_baselines.sh
```
3. Check the results

After running the agent, you can check the trajectories in `./baseline/output`.
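
For step 1 of the full run, a quick check that the symbolic link exists and resolves to the intended directory can save a failed run. The following is a minimal sketch. `link_points_to` is a hypothetical helper, and the demonstration uses a temporary directory so that nothing under `/root` is touched.

```python
import os
import tempfile


def link_points_to(link_path: str, expected_target: str) -> bool:
    """Return True if link_path is a symlink that resolves to expected_target."""
    if not os.path.islink(link_path):
        return False
    return os.path.realpath(link_path) == os.path.realpath(expected_target)


# Demonstration with throwaway paths instead of /root and /LiveMCPBench.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "annotated_data_src")
    link = os.path.join(tmp, "annotated_data")
    os.mkdir(target)
    os.symlink(target, link, target_is_directory=True)
    demo_ok = link_points_to(link, target)           # a real symlink
    demo_plain_dir = link_points_to(target, target)  # a directory, not a link

print(demo_ok, demo_plain_dir)
```

After running `scripts/link_path.sh`, the same helper could be called with the link and target paths that the script uses.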

### Evaluation using LiveMCPEval
1. Modify the `.env` file to change the evaluation models

2. Run the evaluation script

```bash
bash ./evaluator/scripts/run_baseline.sh
```

3. Check the results

After running the evaluation, you can check the results in `./evaluator/output`.

4. Calculate the human agreement

```bash
uv run ./evaluator/human_agreement.py
```

This will calculate the human agreement for the evaluation results and save it in `./evaluator/output/human_agreement.json`.
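
The README does not say how human agreement is computed. As an illustration only, the following sketch assumes the simplest definition: the fraction of tasks on which the evaluator's verdict matches the human verdict. `agreement_rate` is a hypothetical helper, and `human_agreement.py` may use a different metric.

```python
from typing import Sequence


def agreement_rate(evaluator: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of items on which evaluator and human labels match."""
    if len(evaluator) != len(human):
        raise ValueError("label lists must have the same length")
    if not evaluator:
        raise ValueError("at least one label is required")
    matches = sum(e == h for e, h in zip(evaluator, human))
    return matches / len(evaluator)


# Toy labels (hypothetical): True means the task was judged successful.
evaluator_labels = [True, False, True, True]
human_labels = [True, False, False, True]
print(agreement_rate(evaluator_labels, human_labels))  # 3 of 4 match -> 0.75
```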

## Project Structure
```
LiveMCPBench/
├── annotated_data/     # Tasks and task files
├── baseline/           # MCP Copilot Agent
│   ├── scripts/        # Scripts for running the agent
│   ├── output/         # Output for the agent
│   └── mcp_copilot/    # Source code for the agent
├── evaluator/          # LiveMCPEval
│   ├── scripts/        # Scripts for evaluation
│   └── output/         # Output for evaluation
├── tools/              # LiveMCPTool
│   ├── LiveMCPTool/    # Tool data
│   └── scripts/        # Scripts for the tools
├── scripts/            # Path preparation scripts
├── utils/              # Utility functions
└── .env_template       # Template for environment variables
```
<!-- ## Citation

If you find this project helpful, please use the following to cite it:
```bibtex

``` -->

__init__.py

Whitespace-only changes.
