* docs: add MLE-bench details to README
* docs: update README with revised MLE-bench description and leaderboard
* docs: update RD-Agent text and add trial info in README
* Update README.md
* Update README.md
* update by M
* update format
* Add documents
* docs: update RD-Agent references to R&D-Agent
* docs: update README with MLE-Bench complexity level details
| MLE-Bench Results Released | R&D-Agent currently leads as the [top-performing machine learning engineering agent](#-the-best-machine-learning-engineering-agent) on MLE-bench |
| Support LiteLLM Backend | We now fully support **[LiteLLM](https://github.com/BerriAI/litellm)** as a backend for integration with multiple LLM providers. |
| More General Data Science Agent | 🚀Coming soon! |
| Kaggle Scenario release | We release **[Kaggle Agent](https://rdagent.readthedocs.io/en/latest/scens/kaggle_agent.html)**, try the new features! |
| Official WeChat group release | We created a WeChat group, welcome to join! (🗪[QR Code](docs/WeChat_QR_code.jpg)) |
| Official Discord release | We launch our first chatting channel in Discord (🗪[](https://discord.gg/ybQ97B6Jjy)) |
-
| First release |**RDAgent** is released on GitHub |
+
| First release |**R&D-Agent** is released on GitHub |
+
+
# 🏆 The Best Machine Learning Engineering Agent!
+
+
[MLE-bench](https://github.com/openai/mle-bench) is a comprehensive benchmark evaluating the performance of AI agents on machine learning engineering tasks. Utilizing datasets from 75 Kaggle competitions, MLE-bench provides robust assessments of AI systems' capabilities in real-world ML engineering scenarios.
+
+
R&D-Agent currently leads as the top-performing machine learning engineering agent on MLE-bench:
+
+
| Agent | Low == Lite (%) | Medium (%) | High (%) | All (%) |
- **o3(R)+GPT-4.1(D)**: Combines Research Agent (o3) and Development Agent (GPT-4.1).
+
- **AIDE o1-preview**: Represents the previously best public result on MLE-bench as reported in the original MLE-bench paper.
+
- Results for R&D-Agent are based on single trials due to limited resources. We plan to provide more comprehensive, multi-trial results soon.
+
- According to MLE-Bench, the 75 competitions are categorized into three levels of complexity: **Low==Lite** if we estimate that an experienced ML engineer can produce a sensible solution in under 2 hours, excluding the time taken to train any models; **Medium** if it takes between 2 and 10 hours; and **High** if it takes more than 10 hours.
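The thresholds described above can be sketched as a small helper. This is only an illustration of the stated cut-offs: the function name `complexity_level` is hypothetical, and treating exactly 2 hours and exactly 10 hours as Medium is an assumption about how "between 2 and 10 hours" is read, not something MLE-bench specifies here.

```python
def complexity_level(estimated_hours: float) -> str:
    """Map an estimated solution time (hours, excluding model training) to a level.

    Hypothetical helper: the name and the boundary handling are assumptions.
    """
    if estimated_hours < 0:
        raise ValueError("estimated_hours must be non-negative")
    if estimated_hours < 2:
        return "Low"      # also referred to as "Lite"
    if estimated_hours <= 10:
        return "Medium"   # between 2 and 10 hours
    return "High"         # more than 10 hours


if __name__ == "__main__":
    for hours in (0.5, 2, 10, 24):
        print(hours, complexity_level(hours))
```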
+
+
You can inspect the detailed runs of the above results online.
RDAgent aims to automate the most critical and valuable aspects of the industrial R&D process, and we begin with focusing on the data-driven scenarios to streamline the development of models and data.
+
R&D-Agent aims to automate the most critical and valuable aspects of the industrial R&D process, and we begin with focusing on the data-driven scenarios to streamline the development of models and data.
Methodologically, we have identified a framework with two key components: 'R' for proposing new ideas and 'D' for implementing them.
We believe that the automatic evolution of R&D will lead to solutions of significant industrial value.
<!-- Tag Cloud -->
-
R&D is a very general scenario. The advent of RDAgent can be your
+
R&D is a very general scenario. The advent of R&D-Agent can be your
- 🤖 **Data Mining Agent:** Iteratively proposing data & models ([🎥Demo Video 1](https://rdagent.azurewebsites.net/model_loop)|[▶️YouTube](https://www.youtube.com/watch?v=dm0dWL49Bc0&t=104s)) ([🎥Demo Video 2](https://rdagent.azurewebsites.net/dmm)|[▶️YouTube](https://www.youtube.com/watch?v=VIaSTZuoZg4)) and implementing them by gaining knowledge from data.
- 🦾 **Research Copilot:** Automatically read research papers ([🎥Demo Video](https://rdagent.azurewebsites.net/report_model)|[▶️YouTube](https://www.youtube.com/watch?v=BiA2SfdKQ7o)) / financial reports ([🎥Demo Video](https://rdagent.azurewebsites.net/report_factor)|[▶️YouTube](https://www.youtube.com/watch?v=ECLTXVcSx-c)) and implement model structures or build datasets.
@@ -85,8 +110,8 @@ Ensure the current user can run Docker commands **without using sudo**. You can
conda activate rdagent
```
-
### 🛠️ Install the RDAgent
-
- You can directly install the RDAgent package from PyPI:
+
### 🛠️ Install the R&D-Agent
+
- You can directly install the R&D-Agent package from PyPI:
```sh
pip install rdagent
```
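After installation, one way to confirm that the package is visible to the current interpreter is to query its installed version. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical, and it assumes the distribution name matches the `rdagent` used in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("rdagent")
    print(f"rdagent version: {found}" if found else "rdagent is not installed")
```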
@@ -233,7 +258,7 @@ The **[🖥️ Live Demo](https://rdagent.azurewebsites.net/)** is implemented b
# 🏭 Scenarios
-
We have applied RD-Agent to multiple valuable data-driven industrial scenarios.
+
We have applied R&D-Agent to multiple valuable data-driven industrial scenarios.
## 🎯 Goal: Agent for Data-driven R&D
@@ -330,13 +355,13 @@ For more detail, please refer to our **[🖥️ Live Demo page](https://rdagent.
# 🤝 Contributing
-
We welcome contributions and suggestions to improve RD-Agent. Please refer to the [Contributing Guide](CONTRIBUTING.md) for more details on how to contribute.
+
We welcome contributions and suggestions to improve R&D-Agent. Please refer to the [Contributing Guide](CONTRIBUTING.md) for more details on how to contribute.
Before submitting a pull request, ensure that your code passes the automatic CI checks.
## 📝 Guidelines
This project welcomes contributions and suggestions.
-
Contributing to this project is straightforward and rewarding. Whether it's solving an issue, addressing a bug, enhancing documentation, or even correcting a typo, every contribution is valuable and helps improve RDAgent.
+
Contributing to this project is straightforward and rewarding. Whether it's solving an issue, addressing a bug, enhancing documentation, or even correcting a typo, every contribution is valuable and helps improve R&D-Agent.
To get started, you can explore the issues list, or search for `TODO:` comments in the codebase by running the command `grep -r "TODO:"`.
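For environments without `grep`, the same search can be approximated in Python. This is a minimal sketch and not part of the project: the function name `find_todos` is hypothetical, and it simply reports every line containing the literal string `TODO:` under a directory, skipping files that cannot be read as UTF-8 text.

```python
import os
from typing import Iterator, Tuple


def find_todos(root: str, marker: str = "TODO:") -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line under root containing marker."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        if marker in line:
                            yield path, number, line.rstrip("\n")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file: skip it


if __name__ == "__main__":
    for path, number, line in find_todos("."):
        print(f"{path}:{number}: {line.strip()}")
```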
@@ -346,7 +371,7 @@ To get started, you can explore the issues list, or search for `TODO:` comments
Before we released RD-Agent as an open-source project on GitHub, it was an internal project within our group. Unfortunately, the internal commit history was not preserved when we removed some confidential code. As a result, some contributions from our group members, including Haotian Chen, Wenjun Feng, Haoxue Wang, Zeqi Ye, Xinjie Shen, and Jinhui Li, were not included in the public commits.
+
Before we released R&D-Agent as an open-source project on GitHub, it was an internal project within our group. Unfortunately, the internal commit history was not preserved when we removed some confidential code. As a result, some contributions from our group members, including Haotian Chen, Wenjun Feng, Haoxue Wang, Zeqi Ye, Xinjie Shen, and Jinhui Li, were not included in the public commits.
# ⚖️ Legal disclaimer
<p style="line-height: 1; font-style: italic;">The RD-agent is provided “as is”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. The RD-agent is aimed to facilitate research and development process in the financial industry and not ready-to-use for any financial investment or advice. Users shall independently assess and test the risks of the RD-agent in a specific use scenario, ensure the responsible use of AI technology, including but not limited to developing and integrating risk mitigation measures, and comply with all applicable laws and regulations in all applicable jurisdictions. The RD-agent does not provide financial opinions or reflect the opinions of Microsoft, nor is it designed to replace the role of qualified financial professionals in formulating, assessing, and approving finance products. The inputs and outputs of the RD-agent belong to the users and users shall assume all liability under any theory of liability, whether in contract, torts, regulatory, negligence, products liability, or otherwise, associated with use of the RD-agent and any inputs and outputs thereof.</p>