rdagent/scenarios/data_science/dev/prompts.yaml (6 additions, 3 deletions)
@@ -24,9 +24,11 @@ exp_feedback:
 - Consistent prediction methodologies between validation and test datasets.
 - No shortcuts or fold-specific strategies applied inconsistently.
 - Rigorous checks for corner-case consistency.
+- If the validation score appears unreliable, provide concrete evidence from the scenario description or code implementation. Do not rely on assumptions without direct supporting evidence.
 - Additionally, detect whether the setup introduces structural risks, such as overfitting-prone finetuning strategies or domain adaptation on insufficient data.
+- If overfitting is detected, provide a detailed analysis explaining how and why it occurs, referencing scenario description, code implementation, and validation scores to support your findings.
 - If such discrepancies or risks are found:
-- Clearly document these issues in `Reasoning`.
+- Clearly document these issues in `Reasoning`, referencing both scenario description and code implementation—not just validation scores.
 - Set `"Evaluation Aligned With Task": "no"` and `"Replace Best Result": "no"`.
 - Begin your `reasoning` with `[Evaluation error]`, explicitly stating the evaluation alignment issues causing experiment failure.
 - If evaluation alignment passes, set `"Evaluation Aligned With Task": "yes"`, and then proceed to Step 3.
@@ -42,6 +44,7 @@ exp_feedback:
 - NOTES:
 - The experiments focus on the comparison of the final ensemble results (Don't reject the results because they are still not perfect)
 - If the `ensemble` score does not exceed the best individual mode or single fold, it is still acceptable unless the gap is significant.
+
 Step 4: Analyze Code With Similar validation Results
 - If the current `ensemble` validation score is similar to the SOTA `ensemble` validation score, give the decision based on the comparison between the current experiment and SOTA.
 - The current code should replace the best result if the code is:
@@ -50,13 +53,13 @@ exp_feedback:
 - Interpretable and domain alignment. The code should be tied to solid domain knowledge and be interpretable.
 - More resource efficiency. The code should be more efficient in terms of time and space complexity.
 - Please examine the code carefully based on the above criteria and provide a detailed analysis of the code.
-- Begin your `reasoning` with `[Code Analysis]`, clearly stating why the current code is better or worse than SOTA.
+- Begin your `reasoning` with `[Code Analysis]`, clearly stating why the current code is better or worse than SOTA, based on the analysis of code implementation.
 - If the current code is not better than SOTA, set `"Replace Best Result": "no"`. Otherwise, set `"Replace Best Result": "yes"`.
 
 Provide detailed and constructive feedback structured as follows:
 Example JSON Structure for Result Analysis:
 {
-"Observations": "Clearly summarize current and SOTA ensemble results with exact scores and notable patterns. Limit to no more than three concise, data-focused sentences.",
+"Observations": "Clearly summarize current and SOTA ensemble results with exact scores and notable patterns. Limit to no more than three concise, data-focused sentences. Your observation must be grounded by explicit evidence from scenario description or code implementation, not just validation scores.",
 "Feedback for Hypothesis": Explicitly confirm or refute the hypothesis based on specific data points or performance trends. Limit to two sentences.",
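The prompt edited in this diff asks the evaluator to emit a JSON feedback object with `"Observations"`, `"Feedback for Hypothesis"`, a `reasoning` field, and the two yes/no decision flags. A minimal sketch of a response that would satisfy those instructions, with all scores and wording as hypothetical placeholder values (only the keys and the `[Code Analysis]` prefix come from the prompt above):

```python
import json

# Illustrative example, not output from the actual system: one possible
# feedback object matching the schema described in the prompt. The ensemble
# scores below are made-up placeholders.
feedback = {
    "Observations": (
        "Current ensemble validation score 0.873 vs SOTA 0.869; the gain is "
        "consistent across folds rather than concentrated in one split."
    ),
    "Feedback for Hypothesis": (
        "The hypothesis is confirmed: the change improved the ensemble score "
        "without fold-specific shortcuts."
    ),
    "Reasoning": (
        "[Code Analysis] The current code keeps the SOTA preprocessing and "
        "adds an interpretable, more resource-efficient model; the validation "
        "methodology is consistent between validation and test."
    ),
    "Evaluation Aligned With Task": "yes",
    "Replace Best Result": "yes",
}

# The prompt requires exact yes/no strings for the two decision flags.
assert feedback["Evaluation Aligned With Task"] in ("yes", "no")
assert feedback["Replace Best Result"] in ("yes", "no")

print(json.dumps(feedback, indent=2))
```

Per the diff, a `"no"` on either flag would instead carry a `Reasoning` beginning with `[Evaluation error]` that cites the scenario description or code implementation, not just the validation scores.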
0 commit comments