Taming Actor-Observer Asymmetry in Agents
via Dialectical Alignment

Bobo Li¹ Rui Wu² Zibo Ji³ Meishan Zhang⁴ Hao Fei⁵*
Min Zhang⁴ Mong-Li Lee¹ Wynne Hsu¹

¹National University of Singapore ²Sichuan University ³University of Minnesota Twin Cities
⁴Harbin Institute of Technology, Shenzhen ⁵University of Oxford

Overview

LLM agents exhibit Actor-Observer Asymmetry (AOA): as actors, they attribute failures to external factors; as observers, they attribute them to internal faults. ReTAS (Reasoning via Thesis-Antithesis-Synthesis) mitigates this bias with dialectical SFT + GRPO.
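To make the asymmetry concrete, the toy sketch below compares how often the same set of failures is attributed to external causes under an actor framing versus an observer framing. The function names `external_rate` and `aoa_gap`, the `"external"` / `"internal"` label scheme, and the sample labels are all hypothetical illustrations; this is not the paper's metric or the repository's evaluation code.

```python
from typing import Sequence


def external_rate(labels: Sequence[str]) -> float:
    """Fraction of attributions labelled 'external' (hypothetical label scheme)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(1 for label in labels if label == "external") / len(labels)


def aoa_gap(actor_labels: Sequence[str], observer_labels: Sequence[str]) -> float:
    """Difference in external-attribution rate between the two framings.

    A positive value means the agent blames external factors more often
    when it is the actor than when it is the observer.
    """
    return external_rate(actor_labels) - external_rate(observer_labels)


# Made-up attributions for the same four failures under each framing.
actor = ["external", "external", "external", "internal"]
observer = ["internal", "internal", "external", "internal"]

gap = aoa_gap(actor, observer)
print(f"toy AOA gap: {gap:+.2f}")
```

A gap near zero would indicate that attributions do not depend on the role the agent is placed in, which is the behavior an unbiased agent should show on this toy measure.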

Setup

curl -LsSf https://astral.sh/uv/install.sh | sh   # if uv not installed

git clone https://github.com/unikcc/ReTAS.git && cd ReTAS
uv venv --python 3.10
source .venv/bin/activate

uv pip install -e ".[train]"         # SFT + GRPO training + AFB + SalesArena
uv pip install -e ".[train,serve]"   # also vLLM for local serving

Requires Python 3.10 or later (`requires-python = ">=3.10"`). On Blackwell GPUs with CUDA 13, the public vLLM wheels are built for cu12 and will downgrade torch to a CPU-only build, so skip `.[serve]` and serve via SGLang nightly cu13 or NGC instead.
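After installing, it may be worth confirming that torch was not silently replaced by a CPU-only build. The sketch below is one way to check, assuming torch is importable in the active environment; the helper name `describe_torch_build` is hypothetical and not part of the repository.

```python
import importlib.util
from typing import Optional


def describe_torch_build(cuda_version: Optional[str], cuda_available: bool) -> str:
    """Summarize a torch build from its compiled CUDA version and GPU visibility."""
    if cuda_version is None:
        # torch.version.cuda is None when the wheel was built without CUDA.
        return "CPU-only torch build (no CUDA support compiled in)"
    if not cuda_available:
        return f"torch built for CUDA {cuda_version}, but no usable GPU was detected"
    return f"torch built for CUDA {cuda_version} with a usable GPU"


if importlib.util.find_spec("torch") is None:
    print("torch is not installed in this environment")
else:
    import torch

    print(describe_torch_build(torch.version.cuda, torch.cuda.is_available()))
```

If the first message appears after installing an extra, that extra most likely pulled in a CPU wheel, and reinstalling torch from the appropriate CUDA index is the usual remedy.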

Data

Dataset lives on HuggingFace (gated, auto-approved).

1. Visit the dataset page on HuggingFace and click "Agree and access".
2. Run `hf auth login` with a Read token from https://huggingface.co/settings/tokens.
3. Run `bash scripts/download_data.sh` to fetch FinQA and Spider and lay them out under the paths the trainers expect.

Modules

Each directory has its own README with training / evaluation commands:

| Path | Description |
| --- | --- |
| `FinQA/` | TAS-SFT + GRPO on FinQA (financial reasoning) |
| `Spider/` | TAS-SFT + GRPO on Spider (text-to-SQL, 166 SQLite DBs) |
| `misc/AFB/` | Ambiguous Failure Benchmark, 10 domains × 2 scenarios |
| `misc/SalesArena/` | Multi-agent negotiation under 4 review mechanisms |

Results

Citation

@inproceedings{li2026taming,
  title={Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment},
  author={Li, Bobo and Wu, Rui and Ji, Zibo and Zhang, Meishan and Fei, Hao and Zhang, Min and Lee, Mong-Li and Hsu, Wynne},
  booktitle={Proceedings of the Annual Meeting of the Association for Computational Linguistics},
  year={2026}
}

License

MIT.
