Practical question: for people running this with a cheaper model as the task agent (for cost reasons), does the teacher need to be from the same model family to get the diagnostic benefit, or does any strong frontier model work equally well as teacher? The MarkTechPost writeup hints that same-model pairing matters, but the two runs used different harnesses, so it's hard to tell. Would love to know the actual teacher models used and whether you've tested cross-family pairings.