fix: add metric name check for valid scores (microsoft#724)
* update metric_name
* fix some bugs
* add an evaluation in workflow
* add an evaluation in runner
* fix ci
* test change
* fix CI
---------
Co-authored-by: TPLin22 <tplin2@163.com>
Co-authored-by: yuanteli <1957922024@qq.com>
             f"The scores dataframe does not contain the correct model names as index.\ncorrect model names are: {{model_names}} + ['ensemble']\nscore_df is:\n{score_df}"
         )
         assert score_df.index.is_unique, "The scores dataframe has duplicate model names."
-        assert len(score_df.columns) == 1, f"The scores dataframe should have exactly one column for the scores of the evaluation indicator, but has these columns: {score_df.columns.tolist()}"
+        assert score_df.columns.tolist() == ["{{metric_name}}"], f"The column names of the scores dataframe should be ['{{metric_name}}'], but is '{score_df.columns.tolist()}'"
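The change above tightens the scores check: instead of only counting columns, it asserts the single column is named exactly after the competition metric. A minimal sketch of the two checks against a toy scores DataFrame (the metric name and model names here are placeholder values, not from the PR):

```python
import pandas as pd

# Hypothetical scores.csv content: one row per model plus an "ensemble" row,
# one column named after the competition's metric.
metric_name = "rmse"  # assumed example value
score_df = pd.DataFrame(
    {metric_name: [0.42, 0.40]},
    index=["model_a", "ensemble"],
)

assert score_df.index.is_unique, "The scores dataframe has duplicate model names."

# Old check: only counted columns, so a mis-named column slipped through.
assert len(score_df.columns) == 1

# New check: the single column must be named exactly after the metric.
assert score_df.columns.tolist() == [metric_name], (
    f"The column names of the scores dataframe should be ['{metric_name}'], "
    f"but is '{score_df.columns.tolist()}'"
)
```

A DataFrame with a column named, say, `score` would pass the old length check but fail the new name check, which is the bug class this PR targets.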
+            score_check_text += f"\n[Error] The scores dataframe does not contain the correct model names as index.\ncorrect model names are: {model_set_in_folder.union({'ensemble'})}\nscore_df is:\n{score_df}"
+            score_check_text += f"\n[Error] The scores dataframe does not contain the correct column names.\nCorrect columns is: ['{self.scen.metric_name}']\nBut got: {score_df.columns.tolist()}"
+            score_ret_code = 1
         except Exception as e:
             score_check_text += f"\n[Error] in checking the scores.csv file: {e}\nscores.csv's content:\n-----\n{score_fp.read_text()}\n-----"
             score_ret_code = 1
@@ -101,17 +111,6 @@ def evaluate(
             )
             stdout += "\n" + submission_check_out

-        # MLEBench Check
-        # !!! Since we are running on a sampled dataset, mlebench check is not required.
+            score_check_text += f"\n[Error] The scores dataframe does not contain the correct model names as index.\ncorrect model names are: {model_set_in_folder.union({'ensemble'})}\nscore_df is:\n{score_df}"
+            score_check_text += f"\n[Error] The scores dataframe does not contain the correct column names.\nCorrect columns is: ['{self.scen.metric_name}']\nBut got: {score_df.columns.tolist()}"
+            score_ret_code = 1
         except Exception as e:
             logger.error(f"Error in checking the scores.csv file: {e}")
             score_check_text += f"\n[Error] in checking the scores.csv file: {e}\nscores.csv's content:\n-----\n{score_fp.read_text()}\n-----"
         "Data Type": "The type of competition data, e.g., 'Tabular', 'Time Series', 'Text (Natural Language Processing)', 'Image (Computer Vision)', 'Audio', 'Video'",
         "Brief Description": "A brief description of the competition",
         "Dataset Description": "The dataset utilized in the competition is described based on two sources: the Competition Description, which provides contextual details about the original files, and the Processed Data folder description, which outlines the structure of the dataset after processing. While there may be differences—for instance, original files mentioned in the Competition Description (e.g., .zip files) may have been extracted or restructured—your task is to interpret the new file structure accurately (do not contain any file or folder that is not in Processed Data folder description) and reconcile it with the contextual information from the Competition Description to provide a clear and updated explanation.",
-        "Evaluation Description": "A description of the evaluation used in the competition.",
         "Submission Specifications": "The submission specification & sample submission file descriptions for the model to output."
         "Submission channel number to each sample": "The number of channels in the output for each sample, e.g., 1 for regression, N for N class classification with probabilities, etc. A Integer. If not specified, it is 1."
+        "Metric Evaluation Description": "A precise explanation of how the submissions are scored in this competition, including how the metric is calculated and any specific considerations.",
+        "Metric Name": "The name of the metric which this competition use for scoring the submission."
         "Metric direction": True or False as True means bigger metric number is better, False means smaller is better.
         }
     user: |-
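The "Metric direction" flag in the template above (True means a bigger metric value is better, False means smaller is better) is the kind of boolean a downstream comparison would consume. A minimal sketch of that idea, with a hypothetical helper name not taken from the diff:

```python
def is_improvement(new_score: float, best_score: float, direction: bool) -> bool:
    """Return True if new_score beats best_score under the metric direction.

    direction=True  -> higher is better (e.g., accuracy, AUC)
    direction=False -> lower is better (e.g., RMSE, log loss)
    """
    return new_score > best_score if direction else new_score < best_score


# Accuracy-like metric: higher wins.
print(is_improvement(0.90, 0.85, True))    # True
# Loss-like metric: lower wins.
print(is_improvement(0.10, 0.15, False))   # True
```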
@@ -57,7 +58,7 @@ competition_background: |-
         The data type used in this competition is {{ data_type }}.
         Briefly, the competition involves: {{ brief_description }}.
         The dataset used in this competition is: {{ dataset_description }}.
-        Your goal in this competition is to: {{ target_description }}.
+        The evaluation metric of this competition is: {{ metric_description }}.

     rich_style_description: |-
         ### {{ name }} Agent: Automated Feature Engineering & Model Tuning Evolution
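The `{{ ... }}` placeholders in the `competition_background` template above are filled in at prompt-build time. A stdlib-only sketch of that substitution step (a stand-in for a real template engine; the `render` helper is illustrative, not from the repo):

```python
import re


def render(template: str, **vars) -> str:
    # Tiny stand-in for {{ var }} substitution as used in the prompt templates.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(vars[m.group(1)]), template)


fragment = (
    "The data type used in this competition is {{ data_type }}.\n"
    "The evaluation metric of this competition is: {{ metric_description }}."
)
print(render(fragment, data_type="Tabular", metric_description="RMSE (lower is better)"))
```

In the repo these templates are YAML-embedded prompts, so the real rendering path is whatever template engine the framework uses; this only illustrates the placeholder mechanics.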
rdagent/scenarios/kaggle/experiment/prompts.yaml (1 addition, 1 deletion)
@@ -11,7 +11,7 @@ kg_description_template:
         "Competition Features": "Two-line description of the overall features involved within the competition as background."
         "Submission Specifications": "The submission specification & sample submission csv descriptions for the model to output."
         "Submission channel number to each sample": "The number of channels in the output for each sample, e.g., 1 for regression, N for N class classification with probabilities, etc. A Integer. If not specified, it is 1."
-        "Evaluation Description": "A brief description of the metrics used in the evaluation. Please note that if `evaluation_metric_direction` is True, it indicates that higher values are better; if False, lower values are preferred."
+        "Metric Evaluation Description": "A brief description of the metrics used in the evaluation. Please note that if `evaluation_metric_direction` is True, it indicates that higher values are better; if False, lower values are preferred."
         }
         Since these might be very similar column names in data like one_hot_encoded columns, you can use some regex to group them together.