- Did you update? (pip install --upgrade unsloth unsloth_zoo): Yes
- Colab, Kaggle, or local / cloud: Colab
- Number of GPUs used (use nvidia-smi): 1 A100 80GB
- Which notebook? Please link! Code
- Which Unsloth version, TRL version, transformers version, PyTorch version?
  - unsloth==2025.12.6
  - unsloth_zoo==2025.12.5
  - trl==0.25.0, 0.24.0, 0.23.1 (all tested)
  - torch==2.8.0
  - torchao==0.13.0
  - vllm==0.11.0
  - flashinfer-python==0.3.1.post1
- Which trainer? (SFTTrainer, GRPOTrainer, etc.): GRPO
This basic GRPO script reproduces the issue
Hi. When using GRPO with wandb, the eval rewards are logged under the train metrics instead of the eval metrics. This makes it extremely hard to track evaluation progress.
Running the script, the eval log is:
{'eval_loss': -0.0030808576848357916, .. 'rewards/dummy_reward/mean': 0.5854228019714356, ...}
But it is logged with the train reward:
Rewards are not logged in the evaluation pane:
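A possible explanation, which I have not traced through the Unsloth or TRL code: in the eval log above, `eval_loss` carries the `eval_` prefix but `rewards/dummy_reward/mean` does not. As far as I understand, the wandb integration in transformers decides the pane from that prefix alone. The sketch below is a standalone imitation of that routing, not the library code, and `route_log_keys` and `add_eval_prefix` are hypothetical names used only for illustration.

```python
def route_log_keys(logs):
    """Imitate prefix-based routing: eval_* -> eval/, test_* -> test/, everything else -> train/.

    Assumption: this mirrors how the logging callback picks a wandb section.
    """
    routed = {}
    for key, value in logs.items():
        if key.startswith("eval_"):
            routed["eval/" + key[len("eval_"):]] = value
        elif key.startswith("test_"):
            routed["test/" + key[len("test_"):]] = value
        else:
            routed["train/" + key] = value
    return routed


def add_eval_prefix(logs):
    """Hypothetical workaround: prefix every eval-time key that lacks 'eval_'."""
    return {k if k.startswith("eval_") else f"eval_{k}": v for k, v in logs.items()}


# The two keys visible in the eval log from this report.
eval_logs = {
    "eval_loss": -0.0030808576848357916,
    "rewards/dummy_reward/mean": 0.5854228019714356,
}

as_logged = route_log_keys(eval_logs)                 # reward key falls under train/
fixed = route_log_keys(add_eval_prefix(eval_logs))    # reward key falls under eval/

print(sorted(as_logged))
print(sorted(fixed))
```

If this is the cause, prefixing the reward keys with `eval_` when the trainer logs in evaluation mode should be enough to move them to the evaluation pane.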
Thanks!