EntroCoT

A modular, scalable algorithm for detecting and filtering unreliable reasoning paths in large-scale reasoning datasets.

What it does

Entropy-guided segmentation
Uses token-level entropy to locate the most ambiguous places inside a chain-of-thought.
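As a rough illustration of the idea, the sketch below computes Shannon entropy from a token's top-k log-probabilities and cuts the chain-of-thought before the highest-entropy tokens. It is not the code in entropy.py: the helper names `token_entropy` and `segment_by_entropy` are hypothetical, and it assumes the entropy model returns top-k log-probabilities per token (for example through the `logprobs` option of an OpenAI-compatible endpoint).

```python
import math


def token_entropy(top_logprobs):
    """Shannon entropy (in nats) of one token position.

    Assumption: only the top-k log-probs are available, so the
    distribution is renormalized over those k candidates.
    """
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def segment_by_entropy(tokens, entropies, max_segments=5):
    """Split tokens into at most max_segments pieces.

    Cuts are placed before the highest-entropy tokens; index 0 is
    excluded so that no segment is empty.
    """
    ranked = sorted(range(1, len(tokens)), key=lambda i: entropies[i], reverse=True)
    cuts = sorted(ranked[: max_segments - 1])
    bounds = [0] + cuts + [len(tokens)]
    return ["".join(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]
```

With `max_segments=5`, as in the Quick Start, this yields at most five segments per sample.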
Rollout-based accuracy curve
For each incremental prefix of segments, it prompts the model n times and measures the fraction of rollouts that reach the correct answer.
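A minimal sketch of how such a curve could be computed is shown below. The `generate` callable is a hypothetical stand-in for the rollout client (it takes the question and a partial chain-of-thought and returns a final answer string), and exact string matching stands in for whatever answer-matching logic metrics.py actually uses.

```python
def accuracy_curve(question, segments, reference, generate, num_rollouts=8):
    """Return one accuracy value per incremental prefix of the chain-of-thought."""
    curve = []
    for k in range(1, len(segments) + 1):
        # Prefix made of the first k segments.
        prefix = "".join(segments[:k])
        # Sample num_rollouts continuations from the same prefix.
        answers = [generate(question, prefix) for _ in range(num_rollouts)]
        correct = sum(a.strip() == reference.strip() for a in answers)
        curve.append(correct / num_rollouts)
    return curve
```

In practice the rollouts would be issued concurrently; the sequential loop here only keeps the logic easy to follow.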
Automatic triage
Every sample is classified into exactly one bucket:
- ✅ reliable – accuracy is non-decreasing or recovers after a drop
- ❌ rejected – accuracy drops and never recovers (the partial CoT is stored for later recovery)
- ⚠️ all-zero – every segment gives 0% accuracy (candidate for a full rewrite, or for checking whether the problem is too difficult)
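The three buckets can be sketched as a small function over an accuracy curve. This is one possible reading of the rules above, not the logic in core.py: here a drop counts as "recovered" when some later prefix reaches at least the accuracy seen just before the drop, and the function name `triage` and the returned `keep` count are assumptions made for illustration.

```python
def triage(curve):
    """Classify an accuracy curve.

    Returns (bucket, keep), where keep is the number of leading
    segments to retain for the sample.
    """
    if all(acc == 0 for acc in curve):
        return "all_zero", 0
    for i in range(1, len(curve)):
        dropped = curve[i] < curve[i - 1]
        # Assumed recovery rule: a later prefix gets back to the pre-drop level.
        recovered = any(later >= curve[i - 1] for later in curve[i + 1:])
        if dropped and not recovered:
            # Keep only the segments before the unrecovered drop.
            return "rejected", i
    return "reliable", len(curve)
```

Because the all-zero check runs first and the function returns on the first unrecovered drop, each curve lands in exactly one bucket.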
Ready for recovery (optional)
Outputs three clean JSONL files so you can immediately run the companion DataRecovery process to fix rejected & all-zero samples.

File Layout

method_code/          
├── core.py              # ReliabilityFilter class 
├── entropy.py           # Token-level entropy calculation
├── rollout.py           # Concurrent rollout client
├── metrics.py           # Accuracy / answer-matching logic
├── prompts.py           # Prompt templates
├── answer_parser.py     # Answer extraction
├── data_io.py           # JSONL I/O helpers
├── models.py            # OpenAI client
├── constants.py         # Default hyper-parameter settings
└── logging_config.py    # Logging configuration

data_recovery.py         # fixes rejected & all-zero samples (for reference)

Requirements

pip install numpy requests openai tqdm

Quick Start

from method_code import QwenReliabilityFilter
from method_code.data_io import load_jsonl

# Named reliability_filter to avoid shadowing Python's built-in filter()
reliability_filter = QwenReliabilityFilter(
    api_url="http://your-qwen-endpoint/v1/chat/completions",
    api_key="your-key",                       # rollout API
    entropy_api_base="http://deepseek-endpoint/v1",
    entropy_api_key="your-key",               # entropy API
    entropy_model="deepseek-ai/DeepSeek-R1",
    max_workers=1000,
    max_segments=5,
    request_timeout=10000,
)

# Any math JSONL file in the "conversations" format
data = load_jsonl("numina_train.jsonl")

reliability_filter.process_dataset_concurrent(
    data_list=data,
    output_file="entropy_report.json",
    reliable_file="reliable.jsonl",      # reliable samples
    rejected_file="rejected.jsonl",      # truncated reasoning paths
    all_zero_file="all_zero.jsonl",      # rewrite later
    num_rollouts=8,      # accuracy estimated with 8 rollouts
    batch_size=1000,
)
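Once a run finishes, a quick sanity check is to count how many samples landed in each bucket. The helper below is a hypothetical sketch that is not part of method_code; it only assumes that each output file holds one JSON object per line, and it reuses the file names from the call above.

```python
import json
from pathlib import Path


def bucket_counts(paths):
    """Count JSONL records per bucket; a missing file counts as 0."""
    counts = {}
    for bucket, path in paths.items():
        p = Path(path)
        if not p.exists():
            counts[bucket] = 0
            continue
        n = 0
        with p.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    json.loads(line)  # raises ValueError on a malformed record
                    n += 1
        counts[bucket] = n
    return counts


print(bucket_counts({
    "reliable": "reliable.jsonl",
    "rejected": "rejected.jsonl",
    "all_zero": "all_zero.jsonl",
}))
```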

Performance

Model	Dataset	Size	Method	GSM8K	MATH	GaoKao	Odyssey	Olympiad	AMC23	Avg.
Meta-Llama3.1-8B	MetaMathQA	395k	Direct SFT	77.03	33.80	23.64	7.46	6.22	7.50	25.94
		332k	EntroCoT-random	74.83	31.40	23.90	6.68	5.93	5.00	24.62
		358k	EntroCoT-w/o-greedy	75.01	32.60	23.22	6.78	6.84	8.50	25.49
		344k	EntroCoT-full	76.89	35.80	27.01	7.97	6.81	15.00	28.25
Meta-Llama3.1-8B	NuminaMath	859k	Direct SFT	72.10	37.20	32.73	20.82	13.04	19.00	32.48
		515k	EntroCoT-random	71.34	39.24	36.67	19.69	12.86	19.00	33.13
		395k	EntroCoT-w/o-greedy	70.96	39.80	38.96	17.48	12.00	17.50	32.78
		480k	EntroCoT-full	76.00	41.20	40.00	19.54	14.37	20.00	35.19
Qwen2.5-Math-1.5B	MetaMathQA	395k	Direct SFT	48.60	33.84	33.72	17.12	10.28	7.50	25.18
		332k	EntroCoT-random	45.40	33.92	35.58	16.45	12.12	8.50	25.33
		358k	EntroCoT-w/o-greedy	47.43	34.44	35.12	16.20	10.67	8.50	25.39
		344k	EntroCoT-full	50.19	34.56	37.35	17.23	11.14	15.00	27.58
Qwen2.5-Math-1.5B	NuminaMath	859k	Direct SFT	70.90	54.64	46.07	21.44	19.73	32.50	40.88
		515k	EntroCoT-random	71.01	52.12	44.21	22.67	21.48	32.50	40.67
		395k	EntroCoT-w/o-greedy	73.09	48.20	40.52	20.31	18.07	35.00	39.20
		480k	EntroCoT-full	74.65	59.60	48.80	23.40	24.35	45.50	46.05
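To read the table at a glance, the snippet below recomputes the gain in the Avg. column of EntroCoT-full over Direct SFT, together with the share of training samples kept. The numbers are copied from the rows above; the `summarize` helper is only an illustration.

```python
# (model, dataset): (Direct SFT size in k, Direct SFT Avg., EntroCoT-full size in k, EntroCoT-full Avg.)
results = {
    ("Meta-Llama3.1-8B", "MetaMathQA"): (395, 25.94, 344, 28.25),
    ("Meta-Llama3.1-8B", "NuminaMath"): (859, 32.48, 480, 35.19),
    ("Qwen2.5-Math-1.5B", "MetaMathQA"): (395, 25.18, 344, 27.58),
    ("Qwen2.5-Math-1.5B", "NuminaMath"): (859, 40.88, 480, 46.05),
}


def summarize(results):
    """Return (model, dataset, Avg. gain in points, % of samples kept) per setting."""
    rows = []
    for (model, dataset), (n_direct, avg_direct, n_full, avg_full) in results.items():
        gain = round(avg_full - avg_direct, 2)
        kept = round(100 * n_full / n_direct, 1)
        rows.append((model, dataset, gain, kept))
    return rows


for model, dataset, gain, kept in summarize(results):
    print(f"{model} / {dataset}: +{gain:.2f} Avg. with {kept:.1f}% of the data")
```

Across the four settings, EntroCoT-full improves the average by 2.31 to 5.17 points while training on roughly 56% to 87% of the original samples.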

Optional: Data Recovery

After filtering, you can run the recovery process:

python data_recovery.py

It will:

- continue truncated CoTs (rejected)
- rewrite completely wrong solutions (all_zero)
