fix: fix model input shape bug and costeer_model bug (microsoft#821)

WinstonLiyt · web-flow · commit ff99e4dc2c3e · 2025-04-24T12:48:25.000+08:00
* fix model input shape bug and costeer_model bug

* fix a bug
diff --git a/rdagent/components/coder/model_coder/evolving_strategy.py b/rdagent/components/coder/model_coder/evolving_strategy.py
@@ -52,7 +52,6 @@ def implement_one_task(
             if isinstance(queried_knowledge, CoSTEERQueriedKnowledgeV2)
             else queried_former_failed_knowledge
         )
-
         system_prompt = (
             Environment(undefined=StrictUndefined)
             .from_string(
@@ -61,7 +60,7 @@ def implement_one_task(
             .render(
                 scenario=self.scen.get_scenario_all_desc(filtered_tag=target_task.model_type),
                 queried_former_failed_knowledge=queried_former_failed_knowledge_to_render,
-                current_code=target_task.base_code,
+                current_code=workspace.file_dict.get("model.py"),
             )
         )
 
diff --git a/rdagent/components/coder/model_coder/model_execute_template_v1.txt b/rdagent/components/coder/model_coder/model_execute_template_v1.txt
@@ -16,8 +16,8 @@ if MODEL_TYPE == "Tabular":
     m = model_cls(num_features=input_shape[1])
     data = torch.full(input_shape, INPUT_VALUE)
 elif MODEL_TYPE == "TimeSeries":
-    input_shape = (BATCH_SIZE, NUM_FEATURES, NUM_TIMESTEPS)
-    m = model_cls(num_features=input_shape[1], num_timesteps=input_shape[2])
+    input_shape = (BATCH_SIZE, NUM_TIMESTEPS, NUM_FEATURES)
+    m = model_cls(num_features=input_shape[2], num_timesteps=input_shape[1])
     data = torch.full(input_shape, INPUT_VALUE)
 elif MODEL_TYPE == "Graph":
     node_feature = torch.randn(BATCH_SIZE, NUM_FEATURES)
diff --git a/rdagent/scenarios/qlib/experiment/prompts.yaml b/rdagent/scenarios/qlib/experiment/prompts.yaml
@@ -156,7 +156,7 @@ qlib_model_background: |-
   3. Architecture: The detailed architecture of the model, such as neural network layers or tree structures.
   4. Hyperparameters: The hyperparameters used in the model, such as learning rate, number of epochs, etc.
   5. ModelType: The type of the model, "Tabular" for tabular model and "TimeSeries" for time series model.
-  The model should provide clear and detailed documentation of its architecture and hyperparameters. One model should statically define one output with a fixed architecture and hyperparameters. For example, a model with an two GRU layer and a model with three GRU layer should be considered two different models.
+  The model should provide clear and detailed documentation of its architecture and hyperparameters. One model should statically define one output with a fixed architecture and hyperparameters.
 
 qlib_model_interface: |-
   Your python code should follow the interface to better interact with the user's system.
@@ -176,7 +176,7 @@ qlib_model_interface: |-
   model_cls = XXXModel
   ```
 
-  The model has two types, "Tabular" for tabular model and "TimeSeries" for time series model. The input shape to a tabular model is (batch_size, num_features) and the input shape to a time series model is (batch_size, num_features, num_timesteps). The output shape of the model should be (batch_size, 1).
+  The model has two types, "Tabular" for tabular model and "TimeSeries" for time series model. The input shape to a tabular model is (batch_size, num_features) and the input shape to a time series model is (batch_size, num_timesteps, num_features). The output shape of the model should be (batch_size, 1).
   The "batch_size" is a dynamic value which is determined by the input of forward function.
   The "num_features" and "num_timesteps" are static which will be provided to the model through init function.
   User will initialize the tabular model with the following code:
@@ -189,8 +189,6 @@ qlib_model_interface: |-
   ```
   No other parameters will be passed to the model so give other parameters a default value or just make them static.
 
-  When dealing with TimeSeries model, remember to permute the input tensor since the input tensor is in the shape of (batch_size, num_features, num_timesteps) and a normal time series model is expecting the input tensor in the shape of (batch_size, num_timesteps, num_features).
-
   Don't write any try-except block in your python code. The user will catch the exception message and provide the feedback to you. Also, don't write main function in your python code. The user will call the forward method in the model_cls to get the output tensor.
 
   Please notice that your model should only use current features as input. The user will provide the input tensor to the model's forward function.
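
Note: the shape convention this commit settles on can be sketched in plain Python. This is an illustrative sketch only, not code from the repository: `build_input_spec` is a hypothetical helper that mirrors the Tabular and TimeSeries branches of `model_execute_template_v1.txt` without requiring torch, and the sizes in the usage lines are made-up values.

```python
from typing import Dict, Optional, Tuple


def build_input_spec(
    model_type: str,
    batch_size: int,
    num_features: int,
    num_timesteps: Optional[int] = None,
) -> Tuple[Tuple[int, ...], Dict[str, int]]:
    """Hypothetical helper: return (input_shape, init_kwargs) per model type."""
    if model_type == "Tabular":
        # Tabular input: (batch_size, num_features)
        input_shape = (batch_size, num_features)
        init_kwargs = {"num_features": input_shape[1]}
    elif model_type == "TimeSeries":
        if num_timesteps is None:
            raise ValueError("TimeSeries models need num_timesteps")
        # After this commit: (batch_size, num_timesteps, num_features),
        # so features are read from axis 2 and timesteps from axis 1.
        input_shape = (batch_size, num_timesteps, num_features)
        init_kwargs = {
            "num_features": input_shape[2],
            "num_timesteps": input_shape[1],
        }
    else:
        raise ValueError(f"unsupported model type: {model_type}")
    return input_shape, init_kwargs


# Illustrative values only.
shape, kwargs = build_input_spec(
    "TimeSeries", batch_size=8, num_features=20, num_timesteps=40
)
print(shape)   # (8, 40, 20)
print(kwargs)  # {'num_features': 20, 'num_timesteps': 40}
```

With this layout the generated model no longer needs to permute its input, which is presumably why the prompt's permute instruction is removed in the same commit.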
