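Bobo Li<sup>1</sup>, Rui Wu<sup>2</sup>, Zibo Ji<sup>3</sup>, Meishan Zhang<sup>4</sup>, Hao Fei<sup>5*</sup>, Min Zhang<sup>4</sup>, Mong-Li Lee<sup>1</sup>, Wynne Hsu<sup>1</sup>

<sup>1</sup>National University of Singapore, <sup>2</sup>Sichuan University, <sup>3</sup>University of Minnesota Twin Cities, <sup>4</sup>Harbin Institute of Technology, Shenzhen, <sup>5</sup>University of Oxford
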
LLM agents exhibit Actor-Observer Asymmetry (AOA): as actors they blame external factors; as observers they blame internal faults. ReTAS (Reasoning via Thesis-Antithesis-Synthesis) mitigates this via dialectical SFT + GRPO.

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh   # if uv not installed
git clone https://github.com/unikcc/ReTAS.git && cd ReTAS
uv venv --python 3.10
source .venv/bin/activate
uv pip install -e ".[train]"         # SFT + GRPO training + AFB + SalesArena
uv pip install -e ".[train,serve]"   # also vLLM for local serving
```

`requires-python = ">=3.10"`. Blackwell / CUDA 13: public vLLM wheels are cu12 and will downgrade torch to CPU, so skip `.[serve]` and serve via SGLang nightly cu13 or NGC instead.

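The Blackwell / CUDA 13 note warns that a cu12 vLLM wheel can silently replace torch with a CPU build. A minimal sketch for checking which torch build is installed in the environment, assuming only the standard `torch.version.cuda` and `torch.cuda.is_available()` attributes. The helper names `classify_torch_build` and `probe_torch` are hypothetical and not part of ReTAS:

```python
import importlib.util
from typing import Optional


def classify_torch_build(cuda_version: Optional[str], cuda_available: bool) -> str:
    """Label a torch install from its compiled CUDA version and a runtime check."""
    if cuda_version is None:
        # torch.version.cuda is None when the wheel was built without CUDA,
        # which is what a CPU-only wheel looks like.
        return "no CUDA support compiled in"
    if not cuda_available:
        return f"CUDA {cuda_version} build, but no usable GPU/driver detected"
    return f"CUDA {cuda_version} build with a usable GPU"


def probe_torch() -> str:
    """Inspect the torch installed in the current environment, if any."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the script still runs without torch

    return classify_torch_build(torch.version.cuda, torch.cuda.is_available())


if __name__ == "__main__":
    print(probe_torch())
```

Running it before and after installing the `serve` extra shows whether that install changed the torch build.
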
Dataset lives on HuggingFace (gated, auto-approved).
- Visit the page above, click Agree and access.
- `hf auth login` with a Read token from https://huggingface.co/settings/tokens.
- `bash scripts/download_data.sh` to fetch and lay out FinQA + Spider under the paths the trainers expect.

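Because the dataset is gated, the download step only works once a token is visible to the Hugging Face tooling. A minimal sketch of a pre-flight check, assuming the usual `huggingface_hub` conventions: the `HF_TOKEN` environment variable, otherwise a `token` file under `HF_HOME`, which is assumed to default to `~/.cache/huggingface`. `find_hf_token_source` is a hypothetical helper. It does not verify that the token is valid or that access to the dataset was granted:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_hf_token_source(env: Mapping[str, str], home: Path) -> Optional[str]:
    """Return where a Hugging Face token would be read from, or None."""
    if env.get("HF_TOKEN"):
        return "env:HF_TOKEN"
    # Assumed default layout: $HF_HOME/token, else ~/.cache/huggingface/token.
    hf_home = Path(env.get("HF_HOME") or home / ".cache" / "huggingface")
    token_file = hf_home / "token"
    if token_file.is_file() and token_file.read_text().strip():
        return f"file:{token_file}"
    return None


if __name__ == "__main__":
    source = find_hf_token_source(os.environ, Path.home())
    print(source or "no token found; run `hf auth login` first")
```

Passing the environment and home directory as arguments keeps the check easy to exercise against a temporary directory.
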
Each directory has its own README with training / evaluation commands:
| Path | Description |
|---|---|
| `FinQA/` | TAS-SFT + GRPO on FinQA (financial reasoning) |
| `Spider/` | TAS-SFT + GRPO on Spider (text-to-SQL, 166 SQLite DBs) |
| `misc/AFB/` | Ambiguous Failure Benchmark, 10 domains × 2 scenarios |
| `misc/SalesArena/` | Multi-agent negotiation under 4 review mechanisms |

```bibtex
@inproceedings{li2026taming,
  title={Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment},
  author={Li, Bobo and Wu, Rui and Ji, Zibo and Zhang, Meishan and Fei, Hao and Zhang, Min and Lee, Mong-Li and Hsu, Wynne},
  booktitle={Proceedings of the Annual Meeting of the Association for Computational Linguistics},
  year={2026}
}
```

License: MIT.



