Commit 2e6b149

docs: update guidance for running mle-bench (microsoft#896)
* update guidance for running mle-bench
* ci issue
* Update guidance for setting kaggle api
1 parent cabbbcd commit 2e6b149

3 files changed, +89 -0 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -51,6 +51,8 @@ You can inspect the detailed runs of the above results online.
- [R&D-Agent o1-preview detailed runs](https://aka.ms/RD-Agent_MLE-Bench_O1-preview)
- [R&D-Agent o3(R)+GPT-4.1(D) detailed runs](https://aka.ms/RD-Agent_MLE-Bench_O3_GPT41)

For running R&D-Agent on MLE-bench, refer to **[MLE-bench Guide: Running ML Engineering via MLE-bench](https://rdagent.readthedocs.io/en/latest/scens/data_science.html)**

# 📰 News
| 🗞️ News | 📝 Description |

docs/scens/catalog.rst

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,8 @@ The supported scenarios are listed below:
:ref:`🤖Auto Kaggle Model Tuning <kaggle_agent>`
- :ref:`🤖Auto Kaggle feature Engineering <kaggle_agent>`

:ref:`🤖 Data Science <data_science_agent>`

.. toctree::
    :maxdepth: 1

docs/scens/data_science.rst

Lines changed: 85 additions & 0 deletions
@@ -56,5 +56,90 @@ The Data Science Agent is an agent that can automatically perform feature engine

- ``ds_data/eval/custom_data/submission_test.csv``: (Optional) Competition test label file.

🔍 MLE-bench Guide: Running ML Engineering via MLE-bench
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- 📝 **MLE-bench Overview**

  - MLE-bench is a benchmark that evaluates the ML engineering capabilities of AI systems on real-world tasks. It comprises 75 Kaggle competitions. Because Kaggle does not provide held-out test sets for these competitions, the benchmark includes preparation scripts that split the publicly available training data into new training and test sets, along with a grading script for each competition that scores submissions.
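
To illustrate the idea behind the preparation and grading scripts, here is a minimal Python sketch of a seeded hold-out split followed by grading against hidden labels. It is not MLE-bench's code: the helper names (``split_holdout``, ``grade_accuracy``), the 10% test fraction, and the accuracy metric are assumptions made for illustration, and the real scripts are specific to each competition.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def split_holdout(
    rows: Sequence[Row], label_key: str, test_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Row], List[Row], List[object]]:
    """Split labelled rows into (train, unlabelled test, hidden test answers).

    Hypothetical helper: the fraction and seed are illustrative defaults.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    # The agent sees test features only; labels are kept aside for grading.
    test_features = [{k: v for k, v in r.items() if k != label_key} for r in test]
    answers = [r[label_key] for r in test]
    return train, test_features, answers


def grade_accuracy(predictions: Sequence[object], answers: Sequence[object]) -> float:
    """Score a submission against the hidden answers (accuracy as a stand-in metric)."""
    if len(predictions) != len(answers):
        raise ValueError("submission length does not match the test set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Toy data standing in for a competition's public training file.
rows = [{"id": i, "x": i % 3, "label": i % 2} for i in range(20)]
train, test_features, answers = split_holdout(rows, label_key="label")
print(len(train), len(test_features))
print(grade_accuracy(answers, answers))
```

Because the split is seeded, every run of the benchmark is graded against the same hidden test set, which is what makes scores comparable across agents.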

- 🔧 **Set up Environment for MLE-bench**

  - Running R&D-Agent on MLE-bench is fully automated: no manual download or data preparation is required. Simply set the environment variable ``DS_IF_USING_MLE_DATA`` to ``True``.

  - At runtime, R&D-Agent automatically builds the Docker image specified at ``rdagent/scenarios/kaggle/docker/mle_bench_docker/Dockerfile``. This image downloads the datasets and grading files required for MLE-bench.

  - Note: The first run may take longer than subsequent runs because the Docker image is built and the data is downloaded for the first time.

  .. code-block:: sh

      dotenv set DS_LOCAL_DATA_PATH <your local directory>/ds_data
      dotenv set DS_IF_USING_MLE_DATA True

- 🔨 **Configuring the Kaggle API**

  - Downloading Kaggle competition data requires the Kaggle API. Set it up as follows:

    - Register and log in on the `Kaggle <https://www.kaggle.com/>`_ website.

    - Click your avatar (usually in the top-right corner of the page) -> ``Settings`` -> ``Create New Token``. A file named ``kaggle.json`` will be downloaded.

    - Move ``kaggle.json`` to ``~/.config/kaggle/``.

    - Restrict the permissions of the ``kaggle.json`` file so that only your user can read and write it.

      .. code-block:: sh

          chmod 600 ~/.config/kaggle/kaggle.json

  - For more information about Kaggle API settings, refer to the `Kaggle API <https://github.com/Kaggle/kaggle-api>`_ documentation.
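
The token setup above can be checked before starting a long run. The following Python sketch is one possible check, assuming a POSIX system and a token file that holds the ``username`` and ``key`` fields; the function name ``check_kaggle_token`` is hypothetical and is not part of R&D-Agent or the Kaggle API.

```python
import json
import stat
from pathlib import Path
from typing import List

# Location used in the steps above.
DEFAULT_TOKEN = Path.home() / ".config" / "kaggle" / "kaggle.json"


def check_kaggle_token(path: Path = DEFAULT_TOKEN) -> List[str]:
    """Return a list of problems with the token file; an empty list means none found."""
    if not path.is_file():
        return [f"{path} does not exist"]
    problems = []
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:  # any group/other permission bit is set
        problems.append(f"{path} has mode {mode:o}; run: chmod 600 {path}")
    try:
        token = json.loads(path.read_text())
    except json.JSONDecodeError:
        problems.append(f"{path} is not valid JSON")
        return problems
    if not isinstance(token, dict):
        problems.append(f"{path} does not contain a JSON object")
        return problems
    for field in ("username", "key"):
        if not token.get(field):
            problems.append(f"{path} is missing the '{field}' field")
    return problems


if __name__ == "__main__":
    for problem in check_kaggle_token():
        print("WARNING:", problem)
```

The check only reports problems and never modifies the file, so it is safe to run repeatedly.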

- 🔩 **Setting the Environment Variables for MLE-bench**

  - In addition to the automatic download of the benchmark data, you must configure the runtime environment in which the competition code is executed.
  - Use the environment variable ``DS_CODER_COSTEER_ENV_TYPE`` to select the execution mode:

    • When set to ``docker`` (the default), R&D-Agent uses the official Kaggle Docker image (``gcr.io/kaggle-gpu-images/python:latest``) so that all required packages are available.
    • To use a custom Docker setup, change the configuration with ``DS_DOCKER_IMAGE`` or ``DS_DOCKERFILE_FOLDER_PATH``.
    • Alternatively, if your competition code needs only basic libraries, set ``DS_CODER_COSTEER_ENV_TYPE`` to ``conda``. In this mode, you must create a local conda environment named ``kaggle`` and pre-install the necessary packages. R&D-Agent will execute the competition code within this ``kaggle`` conda environment.

  .. code-block:: sh

      # Configure the runtime environment: either 'docker' (default) or 'conda'
      dotenv set DS_CODER_COSTEER_ENV_TYPE docker
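
As a hedged illustration of the two execution modes, the Python sketch below validates the chosen mode and confirms that the matching executable is on ``PATH`` before a run. It is not R&D-Agent's own logic: ``resolve_env_type`` is a hypothetical helper, and it reads the process environment, so a value stored only in ``.env`` has to be loaded into the environment first.

```python
import os
import shutil
from typing import Callable, Optional

VALID_ENV_TYPES = ("docker", "conda")


def resolve_env_type(
    value: Optional[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the execution mode, falling back to 'docker' when the variable is unset.

    `which` is injectable so the lookup can be replaced in tests.
    """
    env_type = value or "docker"
    if env_type not in VALID_ENV_TYPES:
        raise ValueError(f"unsupported mode {env_type!r}; expected one of {VALID_ENV_TYPES}")
    if which(env_type) is None:
        raise RuntimeError(f"'{env_type}' executable not found on PATH")
    return env_type


if __name__ == "__main__":
    try:
        mode = resolve_env_type(os.environ.get("DS_CODER_COSTEER_ENV_TYPE"))
        print("execution mode:", mode)
    except (ValueError, RuntimeError) as exc:
        print("pre-flight check failed:", exc)
```

For the ``conda`` mode, this sketch only confirms that ``conda`` is installed; whether the ``kaggle`` environment exists and has the needed packages still has to be verified separately.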

- 🚀 **Run the Application**

  - Run the application directly with the following command:

  .. code-block:: sh

      rdagent kaggle --competition <Competition ID>

- 📥 **Visualize the R&D Process**

  - A web UI is provided for visualizing the log. To start it, run:

  .. code-block:: sh

      streamlit run rdagent/log/ui/dsapp.py

  - Then enter the log path in the UI to visualize the R&D process.

- **Additional Guidance**

  - **Combine different LLM Models at R&D Stage**

    - You can use different LLMs for the research and development stages.

    - By default, the model set in the environment variable ``CHAT_MODEL`` is used for both stages. To customize the model for the development stage, set:

    .. code-block:: sh

        # This example sets the model to "o3-mini". For some models, the reasoning effort should be set to "None".
        dotenv set LITELLM_CHAT_MODEL_MAP '{"coding":{"model":"o3-mini","reasoning_effort":"high"},"running":{"model":"o3-mini","reasoning_effort":"high"}}'
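
Because the map is a JSON string passed through shell quoting, a malformed value is easy to produce. The sketch below shows one way to validate it in Python before a run. The helper ``parse_model_map`` is hypothetical, and the check is structural only: it assumes each stage maps to an object with a ``model`` entry, as in the example above, and does not know which stage names R&D-Agent accepts.

```python
import json
from typing import Dict


def parse_model_map(raw: str) -> Dict[str, dict]:
    """Parse a LITELLM_CHAT_MODEL_MAP value and check its basic shape."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LITELLM_CHAT_MODEL_MAP is not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object mapping stage names to settings")
    for stage, settings in parsed.items():
        if not isinstance(settings, dict) or not settings.get("model"):
            raise ValueError(f"stage {stage!r} must be an object with a 'model' entry")
    return parsed


# The value from the command above, without the surrounding shell quotes.
example = (
    '{"coding":{"model":"o3-mini","reasoning_effort":"high"},'
    '"running":{"model":"o3-mini","reasoning_effort":"high"}}'
)
for stage, settings in parse_model_map(example).items():
    print(stage, "->", settings["model"], settings.get("reasoning_effort"))
```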
