Commit 370adbf

Authored by RolandMinruiXu
fix: refine feedback prompt (microsoft#901)
* feedback observation must base on evidence
* avoid too strong constrain

---------

Co-authored-by: Xu <v-xuminrui@microsoft.com>
1 parent dbaf70c commit 370adbf

1 file changed: +6 -3 lines changed

rdagent/scenarios/data_science/dev/prompts.yaml

Lines changed: 6 additions & 3 deletions
@@ -24,9 +24,11 @@ exp_feedback:
  - Consistent prediction methodologies between validation and test datasets.
  - No shortcuts or fold-specific strategies applied inconsistently.
  - Rigorous checks for corner-case consistency.
+ - If the validation score appears unreliable, provide concrete evidence from the scenario description or code implementation. Do not rely on assumptions without direct supporting evidence.
  - Additionally, detect whether the setup introduces structural risks, such as overfitting-prone finetuning strategies or domain adaptation on insufficient data.
+ - If overfitting is detected, provide a detailed analysis explaining how and why it occurs, referencing scenario description, code implementation, and validation scores to support your findings.
  - If such discrepancies or risks are found:
- - Clearly document these issues in `Reasoning`.
+ - Clearly document these issues in `Reasoning`, referencing both scenario description and code implementation—not just validation scores.
  - Set `"Evaluation Aligned With Task": "no"` and `"Replace Best Result": "no"`.
  - Begin your `reasoning` with `[Evaluation error]`, explicitly stating the evaluation alignment issues causing experiment failure.
  - If evaluation alignment passes, set `"Evaluation Aligned With Task": "yes"`, and then proceed to Step 3.
@@ -42,6 +44,7 @@ exp_feedback:
  - NOTES:
  - The experiments focus on the comparison of the final ensemble results (Don't reject the results because they are still not perfect)
  - If the `ensemble` score does not exceed the best individual mode or single fold, it is still acceptable unless the gap is significant.
+
  Step 4: Analyze Code With Similar validation Results
  - If the current `ensemble` validation score is similar to the SOTA `ensemble` validation score, give the decision based on the comparison between the current experiment and SOTA.
  - The current code should replace the best result if the code is:
@@ -50,13 +53,13 @@ exp_feedback:
  - Interpretable and domain alignment. The code should be tied to solid domain knowledge and be interpretable.
  - More resource efficiency. The code should be more efficient in terms of time and space complexity.
  - Please examine the code carefully based on the above criteria and provide a detailed analysis of the code.
- - Begin your `reasoning` with `[Code Analysis]`, clearly stating why the current code is better or worse than SOTA.
+ - Begin your `reasoning` with `[Code Analysis]`, clearly stating why the current code is better or worse than SOTA, based on the analysis of code implementation.
  - If the current code is not better than SOTA, set `"Replace Best Result": "no"`. Otherwise, set `"Replace Best Result": "yes"`.
 
  Provide detailed and constructive feedback structured as follows:
  Example JSON Structure for Result Analysis:
  {
- "Observations": "Clearly summarize current and SOTA ensemble results with exact scores and notable patterns. Limit to no more than three concise, data-focused sentences.",
+ "Observations": "Clearly summarize current and SOTA ensemble results with exact scores and notable patterns. Limit to no more than three concise, data-focused sentences. Your observation must be grounded by explicit evidence from scenario description or code implementation, not just validation scores.",
  "Feedback for Hypothesis": Explicitly confirm or refute the hypothesis based on specific data points or performance trends. Limit to two sentences.",
  "Evaluation Aligned With Task": "yes or no",
  "Replace Best Result": "yes or no",
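
The prompt above states a few hard rules about how the fields of the feedback JSON relate to each other, and those rules can be checked mechanically once the model's reply has been parsed. The following is only a minimal sketch, not part of this commit or of RD-Agent: the helper name `check_feedback` is hypothetical, and it assumes the reply is a flat dict of strings in which the reasoning lives under a `"Reasoning"` key (the prompt writes both `Reasoning` and `reasoning`, so the exact casing is an assumption).

```python
from typing import Dict, List


def check_feedback(feedback: Dict[str, str]) -> List[str]:
    """Return rule violations found in a parsed feedback dict (hypothetical helper).

    An empty list means the dict is consistent with the rules checked here.
    """
    problems: List[str] = []

    aligned = feedback.get("Evaluation Aligned With Task", "").strip().lower()
    replace = feedback.get("Replace Best Result", "").strip().lower()
    # Assumption: the reasoning text is stored under the key "Reasoning".
    reasoning = feedback.get("Reasoning", "").strip()

    # Both decision fields are described in the prompt as "yes or no".
    for key, value in (
        ("Evaluation Aligned With Task", aligned),
        ("Replace Best Result", replace),
    ):
        if value not in ("yes", "no"):
            problems.append(f'"{key}" must be "yes" or "no"')

    # Rule from the prompt: a failed alignment check forces
    # "Replace Best Result" to "no" and an "[Evaluation error]" prefix.
    if aligned == "no":
        if replace != "no":
            problems.append(
                '"Replace Best Result" must be "no" when evaluation is not aligned'
            )
        if not reasoning.startswith("[Evaluation error]"):
            problems.append('reasoning must begin with "[Evaluation error]"')

    return problems


if __name__ == "__main__":
    inconsistent = {
        "Evaluation Aligned With Task": "no",
        "Replace Best Result": "yes",
        "Reasoning": "The validation score looks unreliable.",
    }
    for problem in check_feedback(inconsistent):
        print(problem)
```

A check like this only covers the structural rules. Whether the observations are actually grounded in the scenario description or the code, which is the point of this commit, still has to be judged by reading the reasoning itself.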
