Commit 9d6feed

fix: refine prompt to generate the most simple task in init stage (microsoft#546)
* refine prompt to generate the most simple task in init stage
* feature test dtype check improve
1 parent 712d94a commit 9d6feed

2 files changed: +13 -6 lines changed

rdagent/components/coder/data_science/feature/eval_tests/feature_test.txt

Lines changed: 8 additions & 3 deletions
@@ -57,9 +57,14 @@ if isinstance(X, pd.DataFrame) and isinstance(X_test, pd.DataFrame):
     assert get_column_list(X) == get_column_list(X_test), "Mismatch in column names of training and test data."
 
 if isinstance(X, pd.DataFrame):
-    assert sorted(X.dtypes.unique().tolist()) == sorted(
-        X_loaded.dtypes.unique().tolist()
-    ), f"feature engineering has produced new data types which is not allowed, data loader data types are {X_loaded.dtypes.unique().tolist()} and feature engineering data types are {X.dtypes.unique().tolist()}"
+    X_dtypes_unique_sorted = sorted(X.dtypes.unique().tolist())
+    X_loaded_dtypes_unique_sorted = sorted(X_loaded.dtypes.unique().tolist())
+    assert (
+        len(X_loaded_dtypes_unique_sorted) == 1
+        and (X_loaded_dtypes_unique_sorted[0] == np.float64 or X_loaded_dtypes_unique_sorted[0] == np.float32)
+    ) or (
+        X_dtypes_unique_sorted == X_loaded_dtypes_unique_sorted
+    ), f"feature engineering has produced new data types which is not allowed, data loader data types are {X_loaded_dtypes_unique_sorted} and feature engineering data types are {X_dtypes_unique_sorted}"
 
 print(
     "Feature Engineering test passed successfully. All checks including length, width, and data types have been validated."

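The relaxed assertion above can be read as: the check passes outright when the data loader produced a single float dtype (`float64` or `float32`); otherwise the set of dtypes after feature engineering must equal the loader's set. The following is a minimal sketch of that logic, not the project's code. The helper name `dtypes_allowed` and the sample frames are hypothetical, and sorting with `key=str` is a choice made for this sketch (the commit sorts the dtype objects directly).

```python
import numpy as np
import pandas as pd


def dtypes_allowed(X_loaded: pd.DataFrame, X: pd.DataFrame) -> bool:
    """Hypothetical helper mirroring the relaxed dtype assertion in the diff."""
    # Sort by dtype name so the list comparison is order-independent.
    # (key=str is this sketch's choice, not taken from the commit.)
    loaded = sorted(X_loaded.dtypes.unique().tolist(), key=str)
    produced = sorted(X.dtypes.unique().tolist(), key=str)
    # Branch 1: the loader output is purely float64 or purely float32.
    loader_is_all_float = len(loaded) == 1 and loaded[0] in (np.float64, np.float32)
    # Branch 2: feature engineering introduced no new dtypes and dropped none.
    return loader_is_all_float or produced == loaded


# Made-up example data.
raw_float = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
raw_mixed = pd.DataFrame({"a": [1.0, 2.0], "n": [1, 2]})

float_plus_int = raw_float.assign(flag=[1, 0])       # adds an integer column
mixed_plus_int = raw_mixed.assign(m=[5, 6])          # same dtype set as the input
mixed_plus_text = raw_mixed.assign(s=["x", "y"])     # adds a text column

print(dtypes_allowed(raw_float, float_plus_int))
print(dtypes_allowed(raw_mixed, mixed_plus_int))
print(dtypes_allowed(raw_mixed, mixed_plus_text))
```

Note that under the first branch any dtype produced by feature engineering is accepted, which is the behavior of the new assertion as written in the diff.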
rdagent/scenarios/data_science/proposal/prompts.yaml

Lines changed: 5 additions & 3 deletions
@@ -73,7 +73,7 @@ task_gen: # It is deprecated now, please refer to direct_exp_gen
 {% if hypothesis is not none %}
 The user is trying to generate new {{ targets }} based on the hypothesis generated in the previous step.
 {% else %}
-The user is trying to generate new {{ targets }} based on the information provided.
+The user is trying to generate a very simple new {{ targets }} based on the information provided.
 {% endif %}
 The {{ targets }} are used in certain scenario, the scenario is as follows:
 {{ scenario }}
@@ -84,7 +84,9 @@ task_gen: # It is deprecated now, please refer to direct_exp_gen
 Your task should adhere to the specification above.
 {% endif %}
 
-{% if hypothesis is not none %}
+{% if hypothesis is none %}
+Since we are at the very beginning stage, we plan to start from a very simple task. To each component, please only generate the task to implement the most simple and basic function of the component. For example, the feature engineering should only implement the function which output the raw data without any transformation. The model component only uses the most basic and easy to implement model without any tuning. The ensemble component only uses the simplest ensemble method. The main focus at this stage is to build the first runnable version of the solution.
+{% else %}
 The user will use the {{ targets }} generated to do some experiments. The user will provide this information to you:
 1. The target hypothesis you are targeting to generate {{ targets }} for.
 2. The hypothesis generated in the previous steps and their corresponding feedbacks.
@@ -260,7 +262,7 @@ component_gen:
 
 Please select the component you are going to improve the latest implementation or sota implementation.
 
-Please generate the output following the format below:
+Please generate the output in JSON format following the format below:
 {{ component_output_format }}
 
 user: |-
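
The second hunk of prompts.yaml turns a one-sided `{% if hypothesis is not none %}` into a two-way branch: with no hypothesis the prompt asks for the simplest possible task, otherwise it keeps the experiment instructions. A rough sketch of how such a branch renders, assuming the `jinja2` package is available; the template text is a shortened stand-in for the real prompt, and the `targets` and `hypothesis` values are made up for illustration.

```python
from jinja2 import Template

# Shortened stand-in for the task_gen system prompt; not the full prompt text.
PROMPT = Template(
    "{% if hypothesis is none %}"
    "Since we are at the very beginning stage, we plan to start from a very simple task."
    "{% else %}"
    "The user will use the {{ targets }} generated to do some experiments."
    "{% endif %}"
)

# Init stage: no hypothesis yet, so the "very simple task" branch is rendered.
init_stage = PROMPT.render(hypothesis=None, targets="Feature Engineering")

# Later stage: a hypothesis exists, so the experiment branch is rendered.
later_stage = PROMPT.render(hypothesis="hypothetical example hypothesis", targets="Feature Engineering")

print(init_stage)
print(later_stage)
```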
