Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents:
The Curious Case of LLMs as Your Coding Tutors
📃arXiv • 🤗 Huggingface
This work explores the potential of LLMs as coding tutors. We propose Trace-and-Verify (Traver), an effective agent workflow that incorporates knowledge tracing and turn-by-turn verification, to tackle key challenges in coding tutoring. While this work focuses on coding tutoring as an example, the proposed method extends beyond coding to other task-tutoring scenarios, where the tutor must adapt content to users' varying levels of background knowledge. We further introduce Dialogue for Coding Tutoring (DICT), a novel evaluation protocol combining student simulation and coding tests to assess tutor performance. Such automated evaluation is critical for developing task-tutoring agents as it supports a systematic development and evaluation cycle.
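To make the idea of turn-by-turn verification concrete, here is a minimal sketch of how a tutor might sample several candidate utterances per turn and keep the one a verifier scores highest. It is not the released Traver implementation: `TutorState`, `select_utterance`, `toy_propose`, and `toy_verify` are hypothetical names, and the toy proposer and verifier are simple stand-ins for an LLM generator and a trained verifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class TutorState:
    """Dialogue history plus a running estimate of what the student knows."""
    history: List[str] = field(default_factory=list)
    # Hypothetical knowledge-tracing belief: component name -> believed known?
    known: Dict[str, bool] = field(default_factory=dict)


def select_utterance(
    state: TutorState,
    propose: Callable[[TutorState, int], List[str]],
    verify: Callable[[TutorState, str], float],
    num_candidates: int = 4,
) -> Tuple[str, float]:
    """Sample candidate tutor turns and return the highest-scoring one."""
    candidates = propose(state, num_candidates)
    if not candidates:
        raise ValueError("propose() returned no candidates")
    scored = [(verify(state, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_propose(state: TutorState, n: int) -> List[str]:
    """Stand-in generator (assumption): one hint per unknown component."""
    unknown = [k for k, v in state.known.items() if not v]
    pool = [f"Let's look at {k}." for k in unknown] + ["Good job so far!"]
    return pool[:n]


def toy_verify(state: TutorState, utterance: str) -> float:
    """Stand-in verifier (assumption): reward mentioning unknown components."""
    return float(sum(k in utterance for k, v in state.known.items() if not v))


state = TutorState(known={"recursion": True, "base case": False})
best, score = select_utterance(state, toy_propose, toy_verify, num_candidates=4)
print(best, score)
```

In this sketch, raising `num_candidates` is the knob that spends more inference-time compute per turn; whether that helps depends entirely on how well the verifier's scores track tutoring quality.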
Under a controlled setup, simulated students at different levels demonstrate distinct abilities in completing target coding tasks. Our DICT protocol serves as a feasible proxy for human evaluation, offering scalability and cost-effectiveness for evaluating tutor agents.
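The combination of student simulation and coding tests can be pictured roughly as follows. This is an illustrative sketch only, not the evaluation code in this repository: `run_dict_episode`, `pass_rate`, and the three callables (`tutor_turn`, `student_turn`, `student_solve`) are hypothetical placeholders for the tutor agent, the simulated student's replies, and the student's final code submission.

```python
from typing import Callable, List, Tuple

Dialogue = List[Tuple[str, str]]  # (speaker, utterance) pairs


def pass_rate(solution_src: str, entry_point: str, tests: List[tuple]) -> float:
    """Fraction of (args, expected) test cases the submitted code passes."""
    namespace: dict = {}
    try:
        # NOTE: exec on untrusted code is unsafe; a real harness should sandbox it.
        exec(solution_src, namespace)
        func = namespace[entry_point]
    except Exception:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test case counts as a failure
    return passed / len(tests) if tests else 0.0


def run_dict_episode(
    tutor_turn: Callable[[Dialogue], str],
    student_turn: Callable[[Dialogue], str],
    student_solve: Callable[[Dialogue], str],
    entry_point: str,
    tests: List[tuple],
    max_turns: int = 3,
) -> float:
    """Alternate tutor and student turns, then test the student's final code."""
    dialogue: Dialogue = []
    for _ in range(max_turns):
        dialogue.append(("tutor", tutor_turn(dialogue)))
        dialogue.append(("student", student_turn(dialogue)))
    return pass_rate(student_solve(dialogue), entry_point, tests)


# Toy stand-ins (assumptions) so the sketch runs end to end.
score = run_dict_episode(
    tutor_turn=lambda d: "Remember to return the sum of both arguments.",
    student_turn=lambda d: "Okay, I think I understand.",
    student_solve=lambda d: "def add(a, b):\n    return a + b",
    entry_point="add",
    tests=[((1, 2), 3), ((0, 0), 0)],
)
print(score)
```

Scoring the tutor by what the student can do after the dialogue, rather than by the dialogue text itself, is what lets the evaluation run automatically and at scale.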
Our proposed Traver agent workflow with the trained verifier shows inference-time scaling for coding tutoring.
TODO:
- Add detailed instructions for quick start
- Add shell scripts for training and evaluation
- Release checkpoints for the verifiers
Please refer to the output directory for the released data and evaluation results.
Please refer to scripts/eval/ for the evaluation scripts.
If you find the resources in this repository useful for your work, please cite our work as:
@article{wang2025training,
  title={Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors},
  author={Wang, Jian and Dai, Yinpei and Zhang, Yichi and Ma, Ziqiao and Li, Wenjie and Chai, Joyce},
  journal={arXiv preprint arXiv:2502.13311},
  url={https://arxiv.org/abs/2502.13311},
  year={2025}
}


