Skip to content

Commit 275f507

Browse files
authored
Update README.md (microsoft#875)
1 parent b3cc863 commit 275f507

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,7 @@ R&D-Agent currently leads as the top-performing machine learning engineering age
3636
| AIDE o1-preview | 34.3 ± 2.4 | 8.8 ± 1.1 | 10.0 ± 1.9 | 16.9 ± 1.1 |
3737

3838
**Notes:**
39-
- **o3(R)+GPT-4.1(D)**: Combines Research Agent (o3) and Development Agent (GPT-4.1).
39+
- **o3(R)+GPT-4.1(D)**: o3(R)+GPT-4.1(D): This version is designed to both reduce average time per loop and leverage a cost-effective combination of backend LLMs by seamlessly integrating Research Agent (o3) with Development Agent (GPT-4.1).
4040
- **AIDE o1-preview**: Represents the previously best public result on MLE-bench as reported in the original MLE-bench paper.
4141
- Average and standard deviation results for R&D-Agent o1-preview is based on a independent of 5 seeds and for R&D-Agent o3(R)+GPT-4.1(D) is based on 6 seeds.
4242
- According to MLE-Bench, the 75 competitions are categorized into three levels of complexity: **Low==Lite** if we estimate that an experienced ML engineer can produce a sensible solution in under 2 hours, excluding the time taken to train any models; **Medium** if it takes between 2 and 10 hours; and **High** if it takes more than 10 hours.

0 commit comments

Comments
 (0)