CoT-Watchdog

Real-time chain-of-thought oversight for autonomous AI agents.

CoT Watchdog monitors an agent’s reasoning process live, detecting subtle failures before unsafe or unintended actions occur. The system runs fully on-device and introduces a human-in-the-loop checkpoint whenever suspicious reasoning behavior is detected.

Built during the NemoClaw NVIDIA x ASUS Hackathon at UCSC, on the ASUS Ascent GX10 using NVIDIA Nemotron Nano 3 and OpenClaw.

Features

Goal Drift Detection

Tracks whether the agent’s reasoning stays aligned with the original task objective by embedding the goal and each reasoning step and measuring the cosine distance between them.

Example:
Goal: “Summarize this paper”
Drifted reasoning: “Here’s my opinion on the topic”

If reasoning diverges too far from the original goal, the system flags the step before execution continues.
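A minimal sketch of the drift check, under stated assumptions: `toy_embed` is a bag-of-words stand-in used only to keep the example dependency-free (the project lists Sentence Transformers, whose embeddings would take its place), and the function names and the `0.2` threshold are illustrative, not taken from the repository.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedder: bag-of-words counts, NOT a real sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_drifting(goal: str, step: str, threshold: float = 0.2, embed=toy_embed) -> bool:
    """Flag a reasoning step whose similarity to the goal falls below the threshold."""
    return cosine_similarity(embed(goal), embed(step)) < threshold


if __name__ == "__main__":
    goal = "Summarize this paper"
    print(is_drifting(goal, "Here's my opinion on the topic"))
    print(is_drifting(goal, "I will summarize the paper section by section"))
```

Passing a different `embed` callable is the intended swap point for a real embedding model; the threshold would need retuning for it.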

Reasoning-Action Mismatch Detection

Compares what the agent says it is about to do against the actual tool call it makes.

Example:
Reasoning: “I’ll search for supporting evidence”
Actual action: calls unrelated tool

This catches silent inconsistencies between reasoning and execution.
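One way to sketch this comparison is keyword-based intent extraction checked against the tool name. Everything here is an assumption for illustration: the `INTENT_PATTERNS` table, the tool names, and the `ToolCall` shape are hypothetical and do not reflect OpenClaw's actual tool-call format.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from tool names to phrases that signal intent to use them.
INTENT_PATTERNS = {
    "search": re.compile(r"\b(search|look up|find)\b", re.IGNORECASE),
    "read_file": re.compile(r"\b(read|open)\b", re.IGNORECASE),
    "send_email": re.compile(r"\b(email|send)\b", re.IGNORECASE),
}


@dataclass
class ToolCall:
    """Assumed minimal shape of a tool call: a name plus keyword arguments."""
    name: str
    arguments: dict


def stated_intents(reasoning: str) -> set[str]:
    """Return the tools whose intent phrases appear in the reasoning text."""
    return {tool for tool, pattern in INTENT_PATTERNS.items() if pattern.search(reasoning)}


def is_mismatch(reasoning: str, call: ToolCall) -> bool:
    """True when the reasoning states an intent but the tool call matches none of them."""
    intents = stated_intents(reasoning)
    return bool(intents) and call.name not in intents


if __name__ == "__main__":
    reasoning = "I'll search for supporting evidence"
    print(is_mismatch(reasoning, ToolCall("send_email", {"to": "someone"})))
    print(is_mismatch(reasoning, ToolCall("search", {"query": "evidence"})))
```

Reasoning with no recognizable intent is deliberately not flagged in this sketch, since agents rarely state actions explicitly and flagging every such step would be noisy.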

Confidence Miscalibration Detection

Tracks hedge density and uncertainty language throughout the reasoning trace.

If the chain-of-thought contains significant uncertainty while the final answer is overly confident, the monitor flags the discrepancy for human review.
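Hedge density can be approximated as hedge phrases per word. The sketch below is illustrative only: the `HEDGES` and `CONFIDENT` phrase lists and both thresholds are assumptions, and a lexicon-based score is a stand-in for whatever scoring the monitor actually uses.

```python
import re

# Illustrative phrase lists; a real lexicon would be larger and tuned.
HEDGES = ("maybe", "might", "perhaps", "possibly", "probably", "not sure", "unclear", "i think")
CONFIDENT = ("definitely", "certainly", "clearly", "always", "guaranteed", "without a doubt")


def phrase_density(text: str, phrases: tuple[str, ...]) -> float:
    """Number of phrase occurrences divided by the number of words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    normalized = " ".join(words)
    hits = sum(len(re.findall(rf"\b{re.escape(p)}\b", normalized)) for p in phrases)
    return hits / len(words)


def is_miscalibrated(trace: str, answer: str,
                     hedge_threshold: float = 0.05,
                     confident_threshold: float = 0.05) -> bool:
    """Flag an uncertain reasoning trace paired with a confidently worded answer."""
    return (phrase_density(trace, HEDGES) >= hedge_threshold
            and phrase_density(answer, CONFIDENT) >= confident_threshold)


if __name__ == "__main__":
    trace = "Maybe the result is 42, but I am not sure. It might be 41."
    print(is_miscalibrated(trace, "The answer is definitely 42."))
    print(is_miscalibrated(trace, "The answer is probably 42."))
```

This works from the text alone, which matches the constraint of having no direct access to token probabilities.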

Human-in-the-Loop Oversight

Whenever a monitor triggers:

execution pauses,
the full reasoning trace is surfaced,
the operator must explicitly approve continuation.

All reasoning traces, alerts, and overrides are persisted for auditing.
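The pause, surface, approve, and persist steps above can be sketched as a single checkpoint function. The function name, the `approve` keyword, and the JSON Lines audit format are assumptions made for this example; the repository's own `human_io.py` and `memory.py` may work differently.

```python
import json
import time
from pathlib import Path
from typing import Callable


def checkpoint(trace: list[str], alert: str, log_path: Path,
               ask: Callable[[str], str] = input) -> bool:
    """Pause, show the full trace, require explicit approval, and append an audit record."""
    print(f"ALERT: {alert}")
    for i, step in enumerate(trace, start=1):
        print(f"  step {i}: {step}")

    # Anything other than the exact word 'approve' counts as a rejection.
    reply = ask("Type 'approve' to continue: ")
    approved = reply.strip().lower() == "approve"

    # Assumed audit format: one JSON object per line, appended to a local file.
    record = {"timestamp": time.time(), "alert": alert, "trace": trace, "approved": approved}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return approved
```

Taking `ask` as a parameter keeps the blocking `input` call out of automated tests, and defaulting to rejection means a stray keypress cannot resume execution.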

Tech Stack

Python
OpenClaw
NVIDIA Nemotron Nano 3
ASUS Ascent GX10 (DGX Spark)
Sentence Transformers
Vector Embeddings
Persistent Trace Storage

Why On-Device Matters

Reasoning traces are among the most sensitive internal states of an AI system.

CoT Watchdog keeps all of the following fully local to the GX10:

reasoning
monitoring
embeddings
auditing

Nothing leaves the machine.

Inspiration

This project was inspired by:

“Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety” (Korbak et al., 2025)
“CoT Red-Handed” (2025)
“Enhancing Autonomous Vehicle Test Scenario Reasoning in Language Models” (Ataei et al., IEEE ITSC 2025)

The goal was to move chain-of-thought oversight from theory into a working real-time system.

Challenges

Threshold Tuning: Finding cosine similarity thresholds that catch genuine drift without overwhelming operators with false positives.

Parsing Intent: Extracting actionable intent from free-form reasoning text is difficult because agents rarely state actions explicitly.

Latency: Running multiple monitors plus embeddings on every reasoning step while maintaining interactive responsiveness on-device.

Confidence Scoring: Designing interpretable uncertainty scoring without direct token probability access.
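For the threshold-tuning challenge, one plausible approach is to sweep candidate thresholds over hand-labeled reasoning steps and compare precision against recall. This is a sketch under assumptions: the labeled data is made up for illustration, the function name is hypothetical, and it is not the logic of the repository's `tune_thresholds.py`.

```python
def sweep_thresholds(scored, thresholds):
    """Report (threshold, precision, recall) for each candidate threshold.

    scored: list of (similarity_to_goal, is_true_drift) pairs.
    A step is flagged as drift when its similarity is below the threshold.
    """
    results = []
    for t in thresholds:
        tp = sum(1 for s, drift in scored if s < t and drift)
        fp = sum(1 for s, drift in scored if s < t and not drift)
        fn = sum(1 for s, drift in scored if s >= t and drift)
        # With nothing flagged (or nothing to find), treat the ratio as 1.0 by convention.
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 1.0
        results.append((t, precision, recall))
    return results


if __name__ == "__main__":
    # Made-up labeled examples, for illustration only.
    labeled = [(0.1, True), (0.3, True), (0.4, False), (0.8, False)]
    for t, p, r in sweep_thresholds(labeled, [0.2, 0.5]):
        print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

A lower threshold trades recall for fewer false positives, which is the operator-fatigue tradeoff described above.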

Future Work

Learned per-task drift thresholds
Adversarial robustness testing
Retrieval grounding verification
Tool-sequence anomaly detection
Multi-agent oversight
Open-source monitoring framework

Vision

Modern AI systems increasingly operate autonomously across long reasoning chains and tool-use workflows.

CoT Watchdog explores a simple question:

If agents reason in human language, can we monitor that reasoning before we trust the action?

This project is an attempt to build that oversight layer in practice.
