Skip to content

Integration proposal: ast-guard as static code analysis policy for reward hacking detection #375

@Nick-is-building

Description

@Nick-is-building

Hey team, following up from the conversation on r/AI_Agents where @Big_Wonder7834 suggested raising a PR to integrate ast-guard into failproofai's coding harnesses.

What ast-guard does:
ast-guard is a deterministic reward hacking detector for LLM-generated Python code. It analyzes code structurally via Python's AST before execution. No LLM, zero dependencies, pure static analysis. It catches hardcoded lookup tables, if/else chains for test inputs, forbidden system calls (eval, exec, os, subprocess), obfuscation attempts, and unexplained complexity collapses.

Repo: https://github.com/Nick-is-building/ast-guard

How it could fit into failproofai:
I see ast-guard working as a custom policy that hooks into PreToolUse events for Write and Edit tool calls. When an agent writes or modifies a Python file, the policy would compare the original file content against the new content using ast-guard's scan() function and return deny() if structural cheating is detected, allow() with context if warnings are found, or allow() if the code is clean.
This would add a layer that failproofai currently doesn't cover: not just what the agent is doing (running sudo, deleting files, leaking secrets), but whether the code the agent produces is structurally honest.

What I'd need from you:
Some guidance on the best integration approach. I see two options: a standalone custom policy file that users drop into .failproofai/policies/, or a deeper integration as a built-in policy. Happy to start with option 1 and iterate from there.
Would also be great to know if there are any conventions for policies that shell out to Python (since ast-guard is Python-based and failproofai is Node/TypeScript).
Looking forward to collaborating on this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions