rdagent/scenarios/data_science/dev/prompts.yaml
36 additions & 13 deletions
@@ -80,7 +80,7 @@ exp_feedback:
 
   user: |-
     We are currently in the process of validating hypotheses to iteratively improve our models for Kaggle competitions. Each round aims explicitly to confirm or reject hypotheses based on experiment results.
-
+
     ## SOTA Solution
     {{ sota_desc }}
 
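The double-brace placeholders such as `{{ sota_desc }}` and, later, `{{ scenario }}` are template variables filled in when the prompt is rendered. As a rough stdlib-only illustration of that substitution (a toy stand-in for demonstration, not RD-Agent's actual Jinja-based rendering; note it cannot evaluate expressions like `{{ feedback_desc or "..." }}`):

```python
import re

def render(template: str, context: dict) -> str:
    """Substitute {{ name }} placeholders from a context dict.

    A minimal sketch: unknown placeholders are left untouched, and only
    bare variable names (no Jinja expressions) are supported.
    """
    def repl(match: re.Match) -> str:
        key = match.group(1)
        return str(context.get(key, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", repl, template)

prompt = "## SOTA Solution\n{{ sota_desc }}"
print(render(prompt, {"sota_desc": "LightGBM baseline, CV 0.812"}))
```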
@@ -126,21 +126,22 @@ exp_feedback:
     {{ feedback_desc or "There has not been any experiments yet." }}
     Please refer to these hypotheses and feedback to help you recommend a new experiment and hypothesis.
 
+
     Tips:
     - Step 1: If submission format has issues, prioritize fixing them before proceeding. If the format is correct and it's the first valid submission ever (there have never been valid submissions in the past), set `"Replace Best Result": "yes"`. If the format is correct and this is not the first valid submission, proceed to Step 2.
     - Step 2: If evaluation alignment issues are identified (validation approach does not follow competition requirements), address these methodological discrepancies immediately.
     - Step 3: If new results are significantly worse than SOTA, or repeated hyperparameter adjustments yield no improvement, it might be time to rethink or shift focus.
 
-exp_feedback_v3:
+exp_feedback_draft:
   system: |-
     You are an advanced assistant analyzing results in data-driven R&D.
 
     Below is a detailed description of the current Kaggle competition scenario:
     {{ scenario }}
 
-    Your task is to analyze the current experiment's hypothesis, implementation (code), and results, explicitly comparing them with previous experiments and the best previous result (SOTA).
+    Your task is to analyze the current experiment's hypothesis, implementation (code and its changes), and results, explicitly comparing them with the previous best SOTA result step by step.
 
-    Step-by-step Analysis Process:
+    # Step-by-step Analysis Process:
 
     Step 1: Verify Submission Format
     - If the submission format check fails:
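Step 1's submission-format gate is essentially a mechanical comparison against the competition's sample submission. A hedged sketch of what such a check might look like (the function name, column layout, and return convention are hypothetical, not RD-Agent's actual checker):

```python
import csv
import io

def check_submission_format(sample_csv: str, submission_csv: str) -> tuple[bool, str]:
    """Compare a submission's header and row count against the sample layout."""
    sample = list(csv.reader(io.StringIO(sample_csv)))
    sub = list(csv.reader(io.StringIO(submission_csv)))
    if not sample or not sub:
        return False, "empty file"
    if sub[0] != sample[0]:
        return False, f"header mismatch: {sub[0]} != {sample[0]}"
    if len(sub) != len(sample):
        return False, f"row count mismatch: {len(sub)} != {len(sample)}"
    return True, "ok"

ok, msg = check_submission_format("id,target\n1,0\n2,0\n", "id,target\n1,0.7\n2,0.1\n")
print(ok, msg)
```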
@@ -159,9 +160,11 @@ exp_feedback_v3:
       - Consistent prediction methodologies between validation and test datasets.
       - No shortcuts or fold-specific strategies applied inconsistently.
       - Rigorous checks for corner-case consistency.
+      - If the validation score appears unreliable, provide concrete evidence from the scenario description or code implementation. Do not rely on assumptions without direct supporting evidence.
       - Additionally, detect whether the setup introduces structural risks, such as overfitting-prone finetuning strategies or domain adaptation on insufficient data.
+      - If overfitting is detected, provide a detailed analysis explaining how and why it occurs, referencing the scenario description, code implementation, and validation scores to support your findings.
     - If such discrepancies or risks are found:
-      - Clearly document these issues in `Reasoning`.
+      - Clearly document these issues in `Reasoning`, referencing both the scenario description and code implementation, not just validation scores.
       - Set `"Evaluation Aligned With Task": "no"` and `"Replace Best Result": "no"`.
       - Begin your `reasoning` with `[Evaluation error]`, explicitly stating the evaluation alignment issues causing experiment failure.
     - If evaluation alignment passes, set `"Evaluation Aligned With Task": "yes"`, and then proceed to Step 3.
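The "structural risks" bullet above is qualitative, but one cheap quantitative proxy for an overfitting-prone setup is a large train/validation gap. A minimal sketch under that assumption (the threshold and function name are illustrative only; real alignment checks must read the scenario description and code, as the prompt insists):

```python
def overfit_risk(train_score: float, valid_score: float,
                 higher_is_better: bool = True, max_gap: float = 0.05) -> bool:
    """Flag a possible structural overfitting risk when the train/validation
    gap exceeds a tolerance. A crude heuristic, not a substitute for reading
    the code and scenario."""
    gap = (train_score - valid_score) if higher_is_better else (valid_score - train_score)
    return gap > max_gap

print(overfit_risk(0.99, 0.80))  # large gap: likely overfitting
```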
@@ -177,6 +180,7 @@ exp_feedback_v3:
     - NOTES:
       - The experiments focus on the comparison of the final ensemble results (don't reject the results because they are still not perfect).
       - If the `ensemble` score does not exceed the best individual model or single fold, it is still acceptable unless the gap is significant.
+
     Step 4: Analyze Code With Similar Validation Results
     - If the current `ensemble` validation score is similar to the SOTA `ensemble` validation score, give the decision based on the comparison between the current experiment and SOTA.
     - The current code should replace the best result if the code is:
@@ -185,23 +189,39 @@ exp_feedback_v3:
       - Interpretable and domain-aligned. The code should be tied to solid domain knowledge and be interpretable.
       - More resource-efficient. The code should be more efficient in terms of time and space complexity.
     - Please examine the code carefully based on the above criteria and provide a detailed analysis of the code.
-    - Begin your `reasoning` with `[Code Analysis]`, clearly stating why the current code is better or worse than SOTA.
+    - Begin your `reasoning` with `[Code Analysis]`, clearly stating why the current code is better or worse than SOTA, based on the analysis of the code implementation.
     - If the current code is not better than SOTA, set `"Replace Best Result": "no"`. Otherwise, set `"Replace Best Result": "yes"`.
-
-    Provide detailed and constructive feedback structured as follows:
-    Example JSON Structure for Result Analysis:
+
+
+    Step 5: EDA improvement analysis (if needed)
+    - The user might provide a Data Overview in EDA format, which is the output of the EDA code. You should analyze the EDA result and provide feedback on how it can be improved.
+    - The improvement might include add-ons, modifications, or deletions to some parts of the EDA code.
+    - You should provide your feedback based on the current code and the SOTA code. Focus especially on the feature engineering part.
+    - For example, if the code truncates lines at N words, you can suggest printing the mean, median, or quantiles of line length for a better understanding of the data in the next rounds of experiments.
+
+    Provide detailed and constructive feedback structured as follows, without anything else:
     {
       "Submission Format Check": "yes or no",
       "First Valid Submission": "yes or no",
-      "Observations": "Clearly summarize current and SOTA ensemble results with exact scores and notable patterns. Limit to no more than three concise, data-focused sentences.",
+      "Code Change Summary": "Clearly summarize the changes made to the code (cover the most important changes while being concise); during development, extra modifications may be made beyond the intent of the hypothesis, so these changes should also be included to provide complete information.",
+      "Observations": "Clearly summarize current and SOTA ensemble results with exact scores and notable patterns. Limit to no more than three concise, data-focused sentences. Your observation must be grounded in explicit evidence from the scenario description or code implementation, not just validation scores.",
       "Feedback for Hypothesis": "Explicitly confirm or refute the hypothesis based on specific data points or performance trends. Limit to two sentences.",
       "Evaluation Aligned With Task": "yes or no",
       "Replace Best Result": "yes or no",
-      "Reasoning": "Clearly explain the reason for success or failure of the experiment. Begin explicitly with [Submission format error], [Evaluation error], [Experiment Analysis] or [Code Analysis] depending on the step at which issues arose. Reference specific scores and methodological differences with SOTA. Limit to three sentences."
+      "Refine Decision": "yes or no",
+      "Reasoning": "Clearly explain the reason for success or failure of the experiment. Begin explicitly with [Submission format error], [Evaluation error], [Experiment Analysis] or [Code Analysis] depending on the step at which issues arose. Reference specific scores and methodological differences with SOTA. Limit to three sentences.",
+      "EDA Improvement": "Improvement suggestion for the EDA code, if needed; otherwise set to 'no'. If there is no EDA code, set to 'no'."
     }
 
   user: |-
     We are currently in the process of validating hypotheses to iteratively improve our models for Kaggle competitions. Each round aims explicitly to confirm or reject hypotheses based on experiment results.
+    **We prioritize minimal, incremental code changes that lead to measurable improvements.**
+    - Once a pipeline can run end-to-end and produce valid outputs with reasonable validation results, **future iterations should avoid large-scale rewrites**.
+    - Increasing `max_epoch` or adjusting early stopping to allow better convergence.
+    - Slightly modifying model architecture (e.g., unfreezing layers, switching backbone).
+    - Tuning hyperparameters like learning rate, batch size, or dropout.
+    - Introducing one new augmentation or feature at a time.
+    - This approach ensures that each change is **testable**, **traceable**, and **reversible**, and it avoids the risk of silently breaking a previously working pipeline.
 
     ## SOTA Solution
     {{ sota_desc }}
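Because the model is instructed to reply with the JSON object above "without anything else", the caller can parse and sanity-check the response mechanically. A hedged sketch of such a validator (the key list is copied from the structure above; the function name and error handling are illustrative, not RD-Agent's actual code):

```python
import json

REQUIRED_KEYS = {
    "Submission Format Check", "First Valid Submission", "Code Change Summary",
    "Observations", "Feedback for Hypothesis", "Evaluation Aligned With Task",
    "Replace Best Result", "Refine Decision", "Reasoning", "EDA Improvement",
}
# Keys constrained to the literal strings "yes" or "no".
YES_NO_KEYS = {
    "Submission Format Check", "First Valid Submission",
    "Evaluation Aligned With Task", "Replace Best Result", "Refine Decision",
}

def parse_feedback(raw: str) -> dict:
    """Parse the model's JSON reply and enforce the expected schema."""
    fb = json.loads(raw)
    missing = REQUIRED_KEYS - fb.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for key in YES_NO_KEYS:
        if fb[key] not in ("yes", "no"):
            raise ValueError(f"{key!r} must be 'yes' or 'no', got {fb[key]!r}")
    return fb
```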
@@ -227,8 +247,9 @@ exp_feedback_v3:
     1. Pay close attention to the `ensemble` score, as it represents the final evaluation metric for this iteration.
     2. If any individual model significantly outperforms the ensemble, this may indicate an issue in the ensemble method. But if the final `ensemble` score surpasses the current SOTA, you should update the SOTA record. However, if there seem to be noticeable issues in the ensemble component, be sure to highlight them explicitly.
 
-    Below are the results for this experiment:
-    {{ cur_exp.result }}
+    Below are the results and running time for this experiment:
     Below is the comparison of the current `ensemble` performance with the SOTA results:
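The warning in point 2 above, that an individual model outperforming the ensemble may signal an ensemble problem, can be checked programmatically once per-model scores are available. A minimal sketch (score dictionary shape and function name are assumptions for illustration):

```python
def ensemble_warnings(scores: dict, margin: float = 0.0,
                      higher_is_better: bool = True) -> list:
    """Return the names of individual models that beat the `ensemble` entry
    by more than `margin`, which may indicate an ensemble-method issue."""
    ens = scores["ensemble"]
    flagged = []
    for name, score in scores.items():
        if name == "ensemble":
            continue
        diff = (score - ens) if higher_is_better else (ens - score)
        if diff > margin:
            flagged.append(name)
    return flagged

print(ensemble_warnings({"ensemble": 0.80, "model_a": 0.85, "model_b": 0.78}))
```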
@@ -247,7 +268,9 @@ exp_feedback_v3:
     {{ feedback_desc or "There has not been any experiments yet." }}
     Please refer to these hypotheses and feedback to help you recommend a new experiment and hypothesis.
 
+
     Tips:
     - Step 1: If submission format has issues, prioritize fixing them before proceeding. If the format is correct and it's the first valid submission ever (there have never been valid submissions in the past), set `"Replace Best Result": "yes"`. If the format is correct and this is not the first valid submission, proceed to Step 2.
     - Step 2: If evaluation alignment issues are identified (validation approach does not follow competition requirements), address these methodological discrepancies immediately.
     - Step 3: If new results are significantly worse than SOTA, or repeated hyperparameter adjustments yield no improvement, it might be time to rethink or shift focus.
+    - Step 4: If the result is only slightly better than the SOTA, but the code modifications are extensive (e.g., a low modification score or too many critical changes), reject the update. Prefer small-step improvements with minimal changes. Set `"Replace Best Result": "no"` and explain in `"Reasoning"`, starting with `[Code Change Too Large]`.
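The new Step 4 tip rejects updates whose code modifications are extensive relative to the gain. One way to quantify "modification size" is a text-similarity ratio between the old and new source files, sketched below with the stdlib's `difflib` (a heuristic for illustration; the "modification score" the tip refers to is not defined in this diff):

```python
import difflib

def change_too_large(old_src: str, new_src: str, min_similarity: float = 0.8) -> bool:
    """True when the rewrite is extensive, i.e. the sources' similarity
    ratio (0.0..1.0) falls below the threshold."""
    ratio = difflib.SequenceMatcher(None, old_src, new_src).ratio()
    return ratio < min_similarity

print(change_too_large("model.fit(X, y)", "pipeline = rebuild_everything()"))
```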