docs: update guidance for running mle-bench (microsoft#896)

TPLin22 · web-flow · commit 2e6b1499200d · 2025-05-26T17:43:43.000+08:00
* update guidance for running mle-bench

* ci issue

* Update guidance for setting kaggle api
diff --git a/README.md b/README.md
@@ -51,6 +51,8 @@ You can inspect the detailed runs of the above results online.
 - [R&D-Agent o1-preview detailed runs](https://aka.ms/RD-Agent_MLE-Bench_O1-preview)
 - [R&D-Agent o3(R)+GPT-4.1(D) detailed runs](https://aka.ms/RD-Agent_MLE-Bench_O3_GPT41)
 
+To run R&D-Agent on MLE-bench, refer to **[MLE-bench Guide: Running ML Engineering via MLE-bench](https://rdagent.readthedocs.io/en/latest/scens/data_science.html)**.
+
 
 # 📰 News
 | 🗞️ News        | 📝 Description                 |
diff --git a/docs/scens/catalog.rst b/docs/scens/catalog.rst
@@ -34,6 +34,8 @@ The supported scenarios are listed below:
         :ref:`🤖Auto Kaggle Model Tuning <kaggle_agent>`
       - :ref:`🤖Auto Kaggle feature Engineering <kaggle_agent>`
 
+        :ref:`🤖 Data Science <data_science_agent>`
+
 
 .. toctree::
     :maxdepth: 1
diff --git a/docs/scens/data_science.rst b/docs/scens/data_science.rst
@@ -56,5 +56,90 @@ The Data Science Agent is an agent that can automatically perform feature engine
 
       - ``ds_data/eval/custom_data/submission_test.csv:`` (Optional) Competition test label file.
 
+🔍 MLE-bench Guide: Running ML Engineering via MLE-bench
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- 📝 **MLE-bench Overview**
+
+  - MLE-bench is a benchmark for evaluating the ML engineering capabilities of AI systems on real-world tasks. It comprises 75 Kaggle competitions. Because Kaggle does not provide held-out test sets for these competitions, the benchmark includes preparation scripts that split each competition's publicly available training data into new training and test sets, as well as per-competition grading scripts that score submissions.
+
+- 🔧 **Set up Environment for MLE-bench**
+
+  - Running R&D-Agent on MLE-bench is fully automated: there is no need to download or prepare the data manually. Set ``DS_LOCAL_DATA_PATH`` to the local directory where the data should be stored, and set the environment variable ``DS_IF_USING_MLE_DATA`` to ``True``:
+
+    .. code-block:: sh
+
+        dotenv set DS_LOCAL_DATA_PATH <your local directory>/ds_data
+        dotenv set DS_IF_USING_MLE_DATA True
+
+  - At runtime, R&D-Agent automatically builds the Docker image defined in ``rdagent/scenarios/kaggle/docker/mle_bench_docker/Dockerfile``. This image downloads the datasets and grading files required by MLE-bench.
+
+  - Note: The first run takes longer than subsequent runs because the Docker image has to be built and the data downloaded and prepared.
+
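+  - Because the benchmark data is downloaded by a Docker image that R&D-Agent builds, it can help to confirm before the first run that Docker is installed and that your user can reach the Docker daemon. The commands below are a suggested sanity check using standard Docker CLI commands, not a step R&D-Agent itself requires:
+
+    .. code-block:: sh
+
+        # Prints daemon details; fails if Docker is not running or your user lacks permission
+        docker info
+        # Pulls and runs Docker's small test image, then removes the container
+        docker run --rm hello-world
+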
+- 🔨 **Configuring the Kaggle API**
+
+  - Downloading Kaggle competition data requires the Kaggle API. Set it up as follows:
+
+    - Register an account and log in on the `Kaggle <https://www.kaggle.com/>`_ website.
+
+    - Click your avatar (usually in the top-right corner of the page) -> ``Settings`` -> ``Create New Token``. A file named ``kaggle.json`` will be downloaded.
+
+    - Move ``kaggle.json`` to ``~/.config/kaggle/``, creating the directory first if it does not exist.
+
+    - Restrict the permissions of ``kaggle.json`` so that only your user can read and write it:
+
+      .. code-block:: sh
+
+        chmod 600 ~/.config/kaggle/kaggle.json
+
+  - For more details on configuring the Kaggle API, see the `Kaggle API <https://github.com/Kaggle/kaggle-api>`_ repository.
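+
+  - The token steps can also be done from a terminal. This is a minimal sketch that assumes your browser saved the token to ``~/Downloads/kaggle.json``; adjust that path if it did not. The last command is an optional check with the ``kaggle`` CLI, which should succeed only once a valid token is in place:
+
+    .. code-block:: sh
+
+        # Assumption: the token was downloaded to ~/Downloads; change this path if needed
+        mkdir -p ~/.config/kaggle
+        mv ~/Downloads/kaggle.json ~/.config/kaggle/kaggle.json
+        # Make the token readable and writable by your user only
+        chmod 600 ~/.config/kaggle/kaggle.json
+        # Optional: list competitions to confirm that authentication works
+        kaggle competitions list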
+
+
+- 🔩 **Setting the Environment Variables for MLE-bench**
+
+  - Besides the automatically downloaded benchmark data, you must also configure the runtime environment in which the competition code runs.
+  - Use the environment variable ``DS_CODER_COSTEER_ENV_TYPE`` to select the execution mode:
+
+    - When set to ``docker`` (the default), R&D-Agent uses the official Kaggle Docker image (``gcr.io/kaggle-gpu-images/python:latest``) so that all required packages are available.
+    - To use a custom Docker setup instead, change the configuration with ``DS_DOCKER_IMAGE`` or ``DS_DOCKERFILE_FOLDER_PATH``.
+    - Alternatively, if your competition code needs only basic libraries, you can set ``DS_CODER_COSTEER_ENV_TYPE`` to ``conda``. In this mode, you must create a local conda environment named ``kaggle`` and pre-install the necessary packages; R&D-Agent then executes the competition code within this ``kaggle`` environment.
+
+    .. code-block:: sh
+
+      # Select the runtime environment: 'docker' (default) or 'conda'
+      dotenv set DS_CODER_COSTEER_ENV_TYPE docker
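+
+  - In ``conda`` mode, the ``kaggle`` environment has to exist before R&D-Agent runs. The commands below are a minimal sketch: the Python version and the package list are assumptions, so install whatever libraries your competitions actually need:
+
+    .. code-block:: sh
+
+        # Create the conda environment that R&D-Agent expects to be named "kaggle"
+        # (Python 3.10 is an assumed version, not a documented requirement)
+        conda create -y -n kaggle python=3.10
+        # Illustrative package list; adjust to the competitions you plan to run
+        conda run -n kaggle pip install pandas numpy scikit-learn
+        # Tell R&D-Agent to execute competition code in that environment
+        dotenv set DS_CODER_COSTEER_ENV_TYPE conda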
+
+- 🚀 **Run the Application**
+
+  - Run the application with the following command, replacing ``<Competition ID>`` with the ID of the MLE-bench competition to run:
+
+    .. code-block:: sh
+
+        rdagent kaggle --competition <Competition ID>
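+
+  - For example, the command below runs on a single competition. ``aerial-cactus-identification`` is used here only as an illustrative ID assumed to be part of MLE-bench; substitute the competition you want to evaluate. Kaggle may refuse data downloads for competitions whose rules you have not accepted on the competition page, so accepting them first can avoid a failed download:
+
+    .. code-block:: sh
+
+        # Illustrative competition ID; replace it with the MLE-bench competition you want to run
+        rdagent kaggle --competition aerial-cactus-identification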
+
+- 📥 **Visualize the R&D Process**
+
+  - We provide a web UI for visualizing the logs. To launch it, run:
+
+    .. code-block:: sh
+
+        streamlit run rdagent/log/ui/dsapp.py
+
+  - Then enter the log path in the UI to visualize the R&D process.
+
+- **Additional Guidance**
+
+  - **Combining Different LLMs Across R&D Stages**
+
+    - You can use different LLMs for the research and development stages.
+
+    - By default, the model specified by the environment variable ``CHAT_MODEL`` is used for both stages. To customize the model used in the development stage, set ``LITELLM_CHAT_MODEL_MAP``:
+
+      .. code-block:: sh
+
+        # Sets "o3-mini" for both the "coding" and "running" steps. For some models, the reasoning effort should be set to "None".
+        dotenv set LITELLM_CHAT_MODEL_MAP '{"coding":{"model":"o3-mini","reasoning_effort":"high"},"running":{"model":"o3-mini","reasoning_effort":"high"}}'
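+
+    - Because the model map is passed as a single JSON string, a quoting mistake is easy to make. One way to catch it, sketched below with Python's standard ``json.tool`` module (an assumption about your tooling, not an R&D-Agent feature), is to validate the string before saving it:
+
+      .. code-block:: sh
+
+        # Same example map as above, stored in a shell variable
+        MODEL_MAP='{"coding":{"model":"o3-mini","reasoning_effort":"high"},"running":{"model":"o3-mini","reasoning_effort":"high"}}'
+        # json.tool exits non-zero on invalid JSON, so dotenv set only runs for a valid map
+        printf '%s' "$MODEL_MAP" | python3 -m json.tool > /dev/null && dotenv set LITELLM_CHAT_MODEL_MAP "$MODEL_MAP"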
+
+