
Commit acc97a8

fix: add ensemble test, change to "use cross-validation if possible" in workflow spec (microsoft#634)

* change to "use cross-validation if possible" in workflow spec
* Limit the evaluation indicator to only one
* add metric tips
* string change
1 parent edb552e commit acc97a8

File tree

2 files changed (+3, −2 lines)


rdagent/components/coder/data_science/ensemble/eval_tests/ensemble_test.txt

Lines changed: 1 addition & 0 deletions

@@ -95,5 +95,6 @@ assert model_set_in_scores == set({{model_names}}).union({"ensemble"}), (
     f"The scores dataframe does not contain the correct model names as index.\ncorrect model names are: {{model_names}} + ['ensemble']\nscore_df is:\n{score_df}"
 )
 assert score_df.index.is_unique, "The scores dataframe has duplicate model names."
+assert len(score_df.columns) == 1, f"The scores dataframe should have exactly one column for the scores of the evaluation indicator, but has these columns: {score_df.columns.tolist()}"
 
 print("Ensemble test end.")
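The new assertion tightens the contract for `scores.csv`: one row per model plus an `"ensemble"` row, unique index, and exactly one metric column. A minimal sketch of a `score_df` that satisfies these checks, using hypothetical model names and AUC values (both are assumptions, not from the repo):

```python
import pandas as pd

# Hypothetical model names and metric values; the real ones come from the workflow.
model_names = ["lgbm", "xgboost"]
score_df = pd.DataFrame(
    {"AUC": [0.91, 0.89, 0.93]},          # exactly one column: the evaluation indicator
    index=model_names + ["ensemble"],      # one row per model, plus the ensemble
)

# The same checks the updated ensemble test performs.
assert set(score_df.index) == set(model_names).union({"ensemble"})
assert score_df.index.is_unique, "The scores dataframe has duplicate model names."
assert len(score_df.columns) == 1, (
    f"Expected exactly one score column, got: {score_df.columns.tolist()}"
)
score_df.to_csv("scores.csv", index=True)
```

A `score_df` with extra columns (e.g. one column per fold) would now fail the test, so per-fold results have to be aggregated before saving.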

rdagent/components/coder/data_science/raw_data_loader/prompts.yaml

Lines changed: 2 additions & 2 deletions

@@ -204,7 +204,7 @@ spec:
     - Verify that `val_label` is provided and matches the length of `val_preds_dict` predictions.
     - Handle empty or invalid inputs gracefully with appropriate error messages.
   - Metric Calculation and Storage:
-    - Calculate the metric for each model and ensemble strategy, and save the results in `scores.csv`, e.g.:
+    - Calculate the metric (mentioned in the evaluation section of the competition information) for each model and ensemble strategy, and save the results in `scores.csv`, e.g.:
      ```python
      scores = {}
      for model_name, val_pred in val_preds_dict.items():
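The snippet embedded in the spec is truncated by the diff context. A minimal completion under stated assumptions: the labels, predictions, and the accuracy metric below are illustrative placeholders, since the actual metric is whatever the competition's evaluation section names:

```python
import pandas as pd

# Hypothetical validation labels and per-model predictions (stand-ins for
# the real val_label / val_preds_dict produced by the workflow).
val_label = [0, 1, 1, 0]
val_preds_dict = {
    "lgbm": [0, 1, 1, 1],
    "xgboost": [0, 1, 0, 0],
    "ensemble": [0, 1, 1, 0],
}

scores = {}
for model_name, val_pred in val_preds_dict.items():
    # Accuracy is an assumed metric; swap in the competition's actual metric.
    scores[model_name] = sum(p == y for p, y in zip(val_pred, val_label)) / len(val_label)

# One row per model (including "ensemble"), one column for the single metric.
score_df = pd.Series(scores, name="accuracy").to_frame()
score_df.to_csv("scores.csv", index=True)
```

Keeping a single metric column here is what makes the output pass the new one-column assertion in `ensemble_test.txt`.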
@@ -259,7 +259,7 @@ spec:
 
 3. Dataset Splitting
    - The dataset returned by `load_data` is not split into training and testing sets, so the dataset splitting should happen after calling `feat_eng`.
-   - Decide whether to use a **static train-test split** or **cross-validation**, based on what is most suitable given the `Competition Information`.
+   - Use cross-validation if possible, as it provides a more robust evaluation of the model's performance.
 
 4. Submission File:
    - Save the final predictions as `submission.csv`, ensuring the format matches the competition requirements (refer to `sample_submission` in the Folder Description for the correct structure).
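The spec change prefers cross-validation over a single static split because every sample then serves in a validation fold, giving a lower-variance estimate of model performance. A minimal sketch of a K-fold split that could run after `feat_eng`; `kfold_indices` is a hypothetical helper, not part of the repo:

```python
import numpy as np

def kfold_indices(n_samples: int, n_splits: int = 5, seed: int = 0):
    """Yield (train_idx, val_idx) pairs for a shuffled K-fold split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, n_splits)
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        yield train_idx, val_idx

# Every sample lands in exactly one validation fold.
all_val = np.concatenate([val for _, val in kfold_indices(10, 5)])
assert sorted(all_val.tolist()) == list(range(10))
```

For competitions with grouped or time-ordered data a plain shuffled K-fold leaks information, which is presumably why the spec says "if possible" rather than mandating it.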

0 commit comments