Commit 7850b80

fix: update new feature engineering code format (microsoft#272)
* update new feature engineering code format
* fix CI
1 parent c4895de commit 7850b80

4 files changed: 51 additions and 24 deletions

rdagent/components/coder/factor_coder/factor_execution_template.txt

Lines changed: 4 additions & 2 deletions
@@ -2,12 +2,14 @@ import os
 
 import numpy as np
 import pandas as pd
-from factor import feat_eng
+from factor import feature_engineering_cls
 
 if os.path.exists("valid.pkl"):
     valid_df = pd.read_pickle("valid.pkl")
 else:
     raise FileNotFoundError("No valid data found.")
 
-new_feat = feat_eng(valid_df)
+cls = feature_engineering_cls()
+cls.fit(valid_df)
+new_feat = cls.transform(valid_df)
 new_feat.to_hdf("result.h5", key="data", mode="w")
Lines changed: 17 additions & 7 deletions
@@ -1,13 +1,23 @@
 import pandas as pd
 
 """
-Here is the feature engineering code for each task, with the function name specified as feat_eng.
-The file name should start with feat_, followed by the specific task name.
+Here is the feature engineering code for each task, with a class that has a fit and transform method.
+Remember
 """
 
 
-def feat_eng(X: pd.DataFrame):
-    """
-    return the selected features
-    """
-    return X
+class IdentityFeature:
+    def fit(self, train_df: pd.DataFrame):
+        """
+        Fit the feature engineering model to the training data.
+        """
+        pass
+
+    def transform(self, X: pd.DataFrame):
+        """
+        Transform the input data.
+        """
+        return X
+
+
+feature_engineering_cls = IdentityFeature
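
For comparison with the identity template above, here is a minimal sketch of a feature file whose `fit` actually learns something from the training data. The class name `ZScoreFeature`, the `_z` column suffix, and the sample frames are hypothetical and not part of this commit. The only parts taken from the diff are the `fit`/`transform` method pair and the module-level `feature_engineering_cls` variable.

```python
import pandas as pd


class ZScoreFeature:
    """Hypothetical example: standardize numeric columns with training statistics."""

    def fit(self, train_df: pd.DataFrame):
        numeric = train_df.select_dtypes(include="number")
        self.columns_ = list(numeric.columns)
        self.mean_ = numeric.mean()
        # Replace zero standard deviations with 1 to avoid division by zero.
        self.std_ = numeric.std(ddof=0).replace(0, 1.0)
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Use the statistics learned in fit, never those of X itself.
        scaled = (X[self.columns_] - self.mean_) / self.std_
        # Rename so that only new columns are returned, not the originals.
        return scaled.add_suffix("_z")


# The loader looks up this module-level name (as in the diff above).
feature_engineering_cls = ZScoreFeature

# Illustrative data only.
train = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [5.0, 5.0, 5.0]})
valid = pd.DataFrame({"a": [2.0, 4.0], "b": [5.0, 6.0]})

fe = feature_engineering_cls()
fe.fit(train)
train_z = fe.transform(train)
valid_z = fe.transform(valid)
```

Because the mean and standard deviation come from `train` alone, the validation rows are scaled with the same parameters as the training rows. The old single-function `feat_eng(X)` interface had no place to store such state.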

rdagent/scenarios/kaggle/experiment/meta_tpl/train.py

Lines changed: 5 additions & 4 deletions
@@ -44,10 +44,11 @@ def import_module_from_path(module_name, module_path):
 X_test_l = []
 
 for f in DIRNAME.glob("feature/feat*.py"):
-    m = import_module_from_path(f.stem, f)
-    X_train_f = m.feat_eng(X_train)
-    X_valid_f = m.feat_eng(X_valid)
-    X_test_f = m.feat_eng(X_test)
+    cls = import_module_from_path(f.stem, f).feature_engineering_cls()
+    cls.fit(X_train)
+    X_train_f = cls.transform(X_train)
+    X_valid_f = cls.transform(X_valid)
+    X_test_f = cls.transform(X_test)
 
     X_train_l.append(X_train_f)
     X_valid_l.append(X_valid_f)
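
The loop above can be exercised end to end with the self-contained sketch below. The body of `import_module_from_path` is not shown in this diff, so the version here is an assumed implementation built on the standard `importlib.util` calls. The `RowMean` class, the temporary `feature/` directory, and the final `pd.concat` step are also illustrative assumptions.

```python
import importlib.util
import tempfile
from pathlib import Path

import pandas as pd


def import_module_from_path(module_name, module_path):
    # Assumed implementation; the real helper's body is not part of this diff.
    spec = importlib.util.spec_from_file_location(module_name, module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# A hypothetical feature file following the new class-based format.
FEATURE_SOURCE = '''
import pandas as pd

class RowMean:
    def fit(self, train_df: pd.DataFrame):
        return self  # nothing to learn for a row-wise mean

    def transform(self, X: pd.DataFrame):
        return X.mean(axis=1).to_frame("mean_feature")

feature_engineering_cls = RowMean
'''

X_train = pd.DataFrame({"a": [1.0, 3.0], "b": [3.0, 5.0]})
X_valid = pd.DataFrame({"a": [0.0], "b": [4.0]})

with tempfile.TemporaryDirectory() as tmp:
    feature_dir = Path(tmp) / "feature"
    feature_dir.mkdir()
    (feature_dir / "feat_row_mean.py").write_text(FEATURE_SOURCE)

    X_train_l, X_valid_l = [], []
    for f in sorted(feature_dir.glob("feat*.py")):
        # Instantiate the class exposed by the module, fit once on train,
        # then reuse the fitted object for every split.
        cls = import_module_from_path(f.stem, f).feature_engineering_cls()
        cls.fit(X_train)
        X_train_l.append(cls.transform(X_train))
        X_valid_l.append(cls.transform(X_valid))

# Assumed follow-up step: combine the per-file feature frames column-wise.
X_train_new = pd.concat(X_train_l, axis=1)
X_valid_new = pd.concat(X_valid_l, axis=1)
```

Fitting once on `X_train` and reusing the fitted object for the other splits keeps validation and test data out of any learned parameters.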

rdagent/scenarios/kaggle/experiment/prompts.yaml

Lines changed: 25 additions & 11 deletions
@@ -66,11 +66,13 @@ kg_background: |-
 kg_feature_interface: |-
   Your code should contain several parts:
   1. The import part: import the necessary libraries.
-  2. A feat_eng() function that handles feature engineering for each task.
-    The function should take the following arguments:
-      - X: The features as a pandas DataFrame.
-    The function should return the new features as a pandas DataFrame.
-  The input to `feat_eng` will be a pandas DataFrame, which should be processed to return a new DataFrame containing only the engineered features.
+  2. A class that contains the feature engineering logic.
+    The class should have the following methods:
+      - fit: This method should fit the feature engineering model to the training data.
+      - transform: This method should transform the input data and return it.
+    For some tasks like generating new features, the fit method may not be necessary. Please pass this function as a no-op.
+  3. A variable called feature_engineering_cls that contains the class name.
+  The input to 'fit' is the training data in pandas dataframe, and the input to 'transform' is the data to be transformed in pandas dataframe.
   The original columns should be excluded from the returned DataFrame.
 
   Exception handling will be managed externally, so avoid using try-except blocks in your code. The user will handle any exceptions that arise and provide feedback as needed.
@@ -83,12 +85,24 @@ kg_feature_interface: |-
   ```python
   import pandas as pd
 
-  def feat_eng(X: pd.DataFrame):
-      """
-      return the selected features
-      """
-      return X.mean(axis=1).to_frame("mean_feature") # Example feature engineering
-      return X.fillna(0) # Example feature processing
+  class FeatureEngineeringName:
+      def fit(self, train_df: pd.DataFrame):
+          """
+          Fit the feature engineering model to the training data.
+          For example, for one hot encoding, this would involve fitting the encoder to the training data.
+          For feature scaling, this would involve fitting the scaler to the training data.
+          """
+          return self
+
+      def transform(self, X: pd.DataFrame):
+          """
+          Transform the input data.
+          """
+          return X
+          return X.mean(axis=1).to_frame("mean_feature") # Example feature engineering
+          return X.fillna(0) # Example feature processing
+
+  feature_engineering_cls = FeatureEngineeringName
   ```
 
   To Note:
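
The prompt's docstring mentions one-hot encoding as a case where `fit` matters. The sketch below shows one way such a class could look under the interface described in the prompt. The class name `OneHotColor`, the `color` column, and the sample data are hypothetical, and this is only an illustration of the `fit`/`transform` contract, not code from the repository.

```python
import pandas as pd


class OneHotColor:
    """Hypothetical example: one-hot encode a column named "color"."""

    def fit(self, train_df: pd.DataFrame):
        # Remember the categories seen in training so that every split
        # is encoded into the same set of columns.
        self.categories_ = sorted(train_df["color"].dropna().unique())
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Values not seen during fit become NaN and encode as all zeros.
        dtype = pd.CategoricalDtype(categories=self.categories_)
        values = X["color"].astype(dtype)
        # Only the encoded columns are returned; the original is excluded.
        return pd.get_dummies(values, prefix="color", dtype=int)


feature_engineering_cls = OneHotColor

# Illustrative data only.
train = pd.DataFrame({"color": ["red", "blue", "red"]})
test = pd.DataFrame({"color": ["blue", "green"]})

fe = feature_engineering_cls().fit(train)
train_f = fe.transform(train)
test_f = fe.transform(test)
```

Here `test_f` has the same two columns as `train_f` even though `test` contains a colour that never appeared in training. This column consistency across splits is the practical reason for learning the categories in `fit` rather than recomputing them inside `transform`.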
