Skip to content

feat(scripts): semantic config-drift detector + weekly CI#1

Merged
ANcpLua merged 2 commits into
mainfrom
feat/config-drift-detector
May 18, 2026
Merged

feat(scripts): semantic config-drift detector + weekly CI#1
ANcpLua merged 2 commits into
mainfrom
feat/config-drift-detector

Conversation

@ANcpLua
Copy link
Copy Markdown
Owner

@ANcpLua ANcpLua commented May 18, 2026

Semantic drift detection for shared configs across the framework repos. Parses each file per type (JSON/YAML/XML/INI/line-based) and clusters by equivalence — whitespace/key-order/flow-style noise collapses, only real content drift surfaces.

Watchlist defaults to 16 repos × 33 paths. Weekly cron Monday 06:00 UTC; opens/updates a config-drift issue on drift.

Local test result: 8 paths fully clean, 21 drifted — including legitimate variation (renovate.json anti-self-bump per repo, .coderabbit.yaml per-repo policies). renovate.json was 16 byte-unique → only 5 semantic clusters (12 of 16 identical after normalisation). Exactly the noise reduction we wanted.

Adds scripts/drift_check.py which audits a watchlist of shared config
files across all framework repos and reports semantic drift (JSON
key-order, YAML flow style, .gitignore line order, XML attribute order
all collapse to one equivalence class).

Watchlist + repo list live in scripts/drift-policy.yaml (16 repos,
33 paths today; extend as repos and files are added).

.github/workflows/drift-check.yml runs the detector every Monday 06:00
UTC and opens (or updates) a config-drift labelled issue with the
report when drift is found.

Verified locally: 8 of 33 paths fully clean. The 21 drifted paths
include legitimate per-repo variation (anti-self-bump renovate.json
rules, .coderabbit.yaml policies) which still surface for human
review. Notable: renovate.json had 16 byte-unique versions but only 5
semantic clusters — 12 of 16 collapse to identical content after JSON
normalisation.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Semantic configuration drift detection across multiple repositories with clustering-based reports.
    • Scheduled weekly and manual checks via CI, automatic issue creation/updates when drift is found.
    • Generates human-readable drift reports and machine-readable manifests as retained artifacts.
  • Documentation

    • Added user-facing docs describing detector behavior, configurable watchlist/normalizers, local usage, exit codes, and interpreting drift reports.
  • Chores

    • Added a default watchlist/policy file to extend monitored paths.

Walkthrough

Adds a semantic configuration drift detector CLI with per-file-type normalizers and clustering, a policy YAML listing repos/paths to watch, README documentation, and a GitHub Actions workflow that runs the detector, uploads artifacts, and opens/updates a config-drift issue when drift is found.

Changes

Configuration Drift Detection System

Layer / File(s) Summary
Module header and dependency guard
scripts/drift_check.py
Module docstring, CLI contract, imports, and runtime PyYAML check that exits with code 2 on failure.
GitHub API file retrieval
scripts/drift_check.py
gh_api and fetch_file use gh api to read repository file contents, map 404→None, decode base64 content, and surface API/parsing errors.
Content normalizers
scripts/drift_check.py
Bytes→string normalizers for raw, lines_set, lines_ordered, JSON, YAML, XML, and INI; NORMALISERS dispatch table and filename/suffix-based normaliser resolution.
Clustering and per-path checks
scripts/drift_check.py
Stable short hashing, cluster_versions, and check_path which fetches per-repo files, normalizes content, groups clusters, records errors/presence, and flags drift when multiple clusters appear.
Report generation and CLI entry
scripts/drift_check.py
write_markdown and write_manifest produce drift-report.md and drift-manifest.json; main() parses policy, runs checks, and exits with 0/1/2 codes.
Policy configuration
scripts/drift-policy.yaml
YAML watchlist declaring repos and watched file paths with comments documenting available normalizers and overrides.
CI workflow
.github/workflows/drift-check.yml
Scheduled (cron) and manual workflow that sets up Python, installs PyYAML, runs scripts/drift_check.py with a policy input, uploads artifacts, and conditionally creates/updates a config-drift issue using gh.
User documentation
README.md
Docs describing semantic vs byte-level drift, normalizers, local usage (gh + pyyaml), exit codes, CI behavior, how to extend the watchlist, and interpreting semantic clusters.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: a semantic config-drift detector script plus its CI integration.
Description check ✅ Passed The description is directly related to the changeset, explaining semantic drift detection, normalizers, test results, and CI behavior.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/config-drift-detector
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/config-drift-detector

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented May 18, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 88 complexity · 0 duplication

Metric Results
Complexity 88
Duplication 0

View in Codacy

AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.

Run reviewer

TIP This summary will be updated as you push new changes.

Copy link
Copy Markdown

@codacy-production codacy-production Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

While this PR introduces a valuable 'semantic' drift detector, there are several critical issues that should prevent merging in its current state. Most notably, the XML normalization logic is vulnerable to XML External Entity (XXE) attacks and incorrectly handles character encoding by manually decoding bytes before parsing.

Additionally, the CI workflow incorrectly flags any script failure (such as authentication or syntax errors) as a 'drift detected' event, which will lead to false positives. The core logic also fails to account for 'partial presence' as a form of drift, potentially missing cases where configuration files are deleted from a subset of repositories. Finally, the script introduces high logic complexity without any unit tests for the various file normalizers, posing a significant maintenance risk.

About this PR

  • The script's heavy reliance on the 'gh' CLI and authenticated environments makes local development and debugging difficult. Furthermore, the error handling in 'gh_api' depends on string-matching '404' within stderr, which is fragile and may break across different environments or CLI versions. Consider using a more robust method for status code verification.

Test suggestions

  • Missing: JSON normalizer ignores key ordering and whitespace differences
  • Missing: YAML normalizer collapses flow style into block style and sorts keys
  • Missing: XML normalizer sorts attributes and ignores insignificant whitespace
  • Missing: INI normalizer handles section/key sorting and comment stripping
  • Missing: Line-based normalizer (.gitignore) dedupes and sorts entries
  • Missing: Clustering logic correctly identifies the majority (canonical) set vs drift variants
  • Missing: CI workflow correctly identifies and updates existing drift issues instead of creating duplicates
  • Missing: Verify XML parser is protected against XXE (XML External Entity) attacks
Prompt proposal for missing tests
Consider implementing these tests if applicable:
1. Missing: JSON normalizer ignores key ordering and whitespace differences
2. Missing: YAML normalizer collapses flow style into block style and sorts keys
3. Missing: XML normalizer sorts attributes and ignores insignificant whitespace
4. Missing: INI normalizer handles section/key sorting and comment stripping
5. Missing: Line-based normalizer (.gitignore) dedupes and sorts entries
6. Missing: Clustering logic correctly identifies the majority (canonical) set vs drift variants
7. Missing: CI workflow correctly identifies and updates existing drift issues instead of creating duplicates
8. Missing: Verify XML parser is protected against XXE (XML External Entity) attacks

TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback

python scripts/drift_check.py \
--policy "${{ inputs.policy || 'scripts/drift-policy.yaml' }}" \
--output drift-report.md \
--manifest drift-manifest.json || echo "DRIFT_DETECTED=1" >> $GITHUB_ENV
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 HIGH RISK

The workflow incorrectly sets DRIFT_DETECTED=1 for any non-zero exit code. Since the script uses exit code 1 for drift and 2+ for system/config errors, this conflates actual drift with infrastructure failures (e.g., GITHUB_TOKEN expiration). Update the 'Run drift detector' step to only set the flag when the exit code is exactly 1.

Comment thread scripts/drift_check.py

def norm_xml(data: bytes) -> str:
"""Parse XML, normalise whitespace and attribute order, dump."""
root = ET.fromstring(data.decode("utf-8"))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 HIGH RISK

The XML normalization logic has several critical issues:

  1. Security: Using xml.etree.ElementTree makes the script vulnerable to XML External Entity (XXE) attacks. Use defusedxml to safely parse untrusted configuration files.
  2. Encoding: Do not manually decode data to UTF-8. Passing raw bytes to the parser allows it to correctly detect and handle encoding declarations (e.g., UTF-16) specified in the XML prolog.

Recommended fix: Use defusedxml.ElementTree.fromstring(data).

Comment thread scripts/drift_check.py
"absent_in": [r for r in repos if r not in repo_versions and r not in repo_errors],
"clusters": clusters,
"errors": repo_errors,
"drift": len(clusters) > 1,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM RISK

A file is currently considered 'Clean' if all repositories that contain it share the same content. However, if a file exists in only a subset of repositories (e.g., 1 out of 16), this 'partial presence' should also be flagged as drift in a consistency tool. Consider updating the logic: "drift": len(clusters) > 1 or (0 < len(repo_versions) < len(repos)).

Comment thread scripts/drift_check.py
@@ -0,0 +1,386 @@
#!/usr/bin/env python3
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM RISK

Suggestion: This script introduces high logic complexity (83) for semantic normalization across JSON, YAML, XML, and INI, yet has 0% test coverage. Given the risk of regressions in drift detection logic, comprehensive unit tests for the NORMALISERS are required.

See Complexity in Codacy
See Coverage in Codacy

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/drift-check.yml:
- Around line 31-42: Replace the blanket "|| echo DRIFT_DETECTED=1" with
explicit exit-code handling: run python scripts/drift_check.py (the detector),
capture its exit code, and if the code is exactly 1 write "DRIFT_DETECTED=1" to
$GITHUB_ENV; if the code is 2 re-exit with code 2 so the job fails (propagate
failures), otherwise exit/continue normally for code 0. Target the Run drift
detector step and the invocation of drift_check.py, and ensure you reference
DRIFT_DETECTED and $GITHUB_ENV when setting the environment variable.

In `@README.md`:
- Line 171: Update the README entry for the `raw` normalizer to accurately
reflect `norm_raw`'s behavior: state that `norm_raw` uses Python's `.strip()` to
remove both leading and trailing whitespace (not just trailing), and adjust the
table cell text from "Trailing whitespace" to something like "Removes leading
and trailing whitespace" so docs match the actual `norm_raw` implementation.

In `@scripts/drift_check.py`:
- Around line 244-249: In check_path(), instead of returning an error dict when
norm_func is falsy (unknown normaliser_name), raise a specific exception (e.g.,
ValueError or custom ConfigError) that includes the path and normaliser_name;
then update main() to catch that exception and call sys.exit(2) for
configuration/authentication errors, ensuring other drift logic and exit codes
remain unchanged and references to check_path(), main(), norm_func, and
normaliser_name are used to locate the changes.
- Around line 262-268: The early return when repo_versions is empty should
include the same keys as the normal return: add "errors": repo_errors (or [] if
None) and "drift": [] alongside "path", "normalizer"/normaliser_name,
"present_in": 0, and "clusters" so parse-only failures aren't dropped; update
the block that builds the result for the repo in the function handling
repo_versions to return these fields (reference repo_versions, repo_errors,
normaliser_name, path). Also change the downstream reporting filter that builds
missing files (the list comprehension using findings and f.get("present_in", 0)
== 0) to distinguish truly missing files from parse-failed files by checking the
"errors" field (e.g., treat a file as missing only if present_in == 0 and not
f.get("errors")), so errored-but-present paths are reported separately.
- Around line 51-54: The subprocess.run call that invokes "gh api" (the res =
subprocess.run([...]) call in scripts/drift_check.py) needs a timeout to avoid
indefinite hangs; add a timeout argument (e.g., timeout=30 or a defined
GH_API_TIMEOUT constant) to the subprocess.run invocation and wrap the call in a
try/except catching subprocess.TimeoutExpired, and on timeout exit the process
with exit code 2 (configuration/tool error), optionally logging the timeout and
command context before exiting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: a4b500e5-f009-46bd-a251-fcde26d14ad5

📥 Commits

Reviewing files that changed from the base of the PR and between 86d0a5d and 75ee30b.

📒 Files selected for processing (4)
  • .github/workflows/drift-check.yml
  • README.md
  • scripts/drift-policy.yaml
  • scripts/drift_check.py
📜 Review details
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis

Comment on lines +31 to +42
- name: Run drift detector
id: drift
env:
GH_TOKEN: ${{ secrets.DRIFT_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
run: |
# The script reads gh CLI which is preinstalled on ubuntu-latest.
# GH_TOKEN above gives it auth.
python scripts/drift_check.py \
--policy "${{ inputs.policy || 'scripts/drift-policy.yaml' }}" \
--output drift-report.md \
--manifest drift-manifest.json || echo "DRIFT_DETECTED=1" >> $GITHUB_ENV
echo "----- report start -----"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# Locate drift_check.py
find . -name "drift_check.py" -o -name "*drift*" -type f | head -20

Repository: ANcpLua/github-settings-automation

Length of output: 167


🏁 Script executed:

# Look for the script
git ls-files | grep -i drift

Repository: ANcpLua/github-settings-automation

Length of output: 161


🏁 Script executed:

cat -n scripts/drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 15777


Differentiate detector exit code 1 (drift) from 2 (failure).

The || echo "DRIFT_DETECTED=1" handler maps every non-zero exit to drift, including auth/config/script failures. The detector explicitly returns exit code 2 for configuration or authentication errors (documented in the script's docstring), but the workflow conflates this with drift detection. This causes real breakage to be silently misclassified as drift and allows the workflow to continue instead of failing.

Capture the exit code and only set DRIFT_DETECTED=1 when the exit code is exactly 1; propagate exit code 2 to fail the workflow.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/drift-check.yml around lines 31 - 42, Replace the blanket
"|| echo DRIFT_DETECTED=1" with explicit exit-code handling: run python
scripts/drift_check.py (the detector), capture its exit code, and if the code is
exactly 1 write "DRIFT_DETECTED=1" to $GITHUB_ENV; if the code is 2 re-exit with
code 2 so the job fails (propagate failures), otherwise exit/continue normally
for code 0. Target the Run drift detector step and the invocation of
drift_check.py, and ensure you reference DRIFT_DETECTED and $GITHUB_ENV when
setting the environment variable.

Comment thread README.md
| `*.xml`, `*.props`, `*.targets`, `nuget.config`, `Directory.Build.props` | `xml` | Insignificant whitespace, attribute order |
| `.editorconfig`, `.gitmodules`, `.npmrc`, `.globalconfig` | `ini` | Comments, section ordering, key ordering |
| `.gitignore`, `.dockerignore`, `.gitattributes`, `.markdownlintignore` | `lines` | Comments, blank lines, duplicate lines, ordering |
| `LICENSE`, `build.sh`, `build.cmd` | `raw` | Trailing whitespace |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align raw normalizer docs with actual behavior.

norm_raw uses .strip(), so it drops leading and trailing whitespace, not just trailing whitespace. Document this precisely to avoid report misinterpretation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 171, Update the README entry for the `raw` normalizer to
accurately reflect `norm_raw`'s behavior: state that `norm_raw` uses Python's
`.strip()` to remove both leading and trailing whitespace (not just trailing),
and adjust the table cell text from "Trailing whitespace" to something like
"Removes leading and trailing whitespace" so docs match the actual `norm_raw`
implementation.

Comment thread scripts/drift_check.py
Comment on lines +51 to +54
res = subprocess.run(
["gh", "api", endpoint],
capture_output=True, text=True
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

fd -t f "drift_check.py" | head -5

Repository: ANcpLua/github-settings-automation

Length of output: 101


🏁 Script executed:

wc -l scripts/drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 105


🏁 Script executed:

sed -n '45,65p' scripts/drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 883


🏁 Script executed:

# Check the entire gh_api function and exit code patterns
sed -n '40,100p' scripts/drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 2241


🏁 Script executed:

# Check exit code usage in main function
rg "sys.exit\(" scripts/drift_check.py -B 2 -A 1

Repository: ANcpLua/github-settings-automation

Length of output: 896


🏁 Script executed:

# Check for other subprocess.run calls in the file
rg "subprocess.run" scripts/drift_check.py -B 2 -A 2

Repository: ANcpLua/github-settings-automation

Length of output: 299


🏁 Script executed:

# Double-check for all subprocess usage
rg "subprocess\." scripts/drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 104


🏁 Script executed:

# Verify the exit code pattern is consistent
grep -n "sys.exit" scripts/drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 237


🏁 Script executed:

# Check imports to confirm subprocess is imported
head -20 scripts/drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 774


Add timeout to gh api calls to prevent workflow stalls.

The subprocess.run() call lacks a timeout, allowing indefinite hangs that stall scheduled CI. Timeout should exit with code 2 (configuration/tool error).

Proposed fix
 def gh_api(endpoint: str) -> dict | None:
     """Call `gh api`, return parsed JSON. None on 404. Exits on other errors."""
-    res = subprocess.run(
-        ["gh", "api", endpoint],
-        capture_output=True, text=True
-    )
+    try:
+        res = subprocess.run(
+            ["gh", "api", endpoint],
+            capture_output=True,
+            text=True,
+            timeout=30,
+        )
+    except subprocess.TimeoutExpired:
+        print(f"gh api {endpoint} timed out", file=sys.stderr)
+        sys.exit(2)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/drift_check.py` around lines 51 - 54, The subprocess.run call that
invokes "gh api" (the res = subprocess.run([...]) call in
scripts/drift_check.py) needs a timeout to avoid indefinite hangs; add a timeout
argument (e.g., timeout=30 or a defined GH_API_TIMEOUT constant) to the
subprocess.run invocation and wrap the call in a try/except catching
subprocess.TimeoutExpired, and on timeout exit the process with exit code 2
(configuration/tool error), optionally logging the timeout and command context
before exiting.

Comment thread scripts/drift_check.py
Comment on lines +244 to +249
if not norm_func:
return {
"path": path,
"normalizer": normaliser_name,
"error": f"unknown normaliser: {normaliser_name}",
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cd scripts && wc -l drift_check.py && head -50 drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 1583


🏁 Script executed:

cd scripts && sed -n '240,255p' drift_check.py && echo "---" && sed -n '370,385p' drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 1240


🏁 Script executed:

cd scripts && grep -n "def main" drift_check.py && grep -n "sys.exit" drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 261


🏁 Script executed:

cd scripts && cat -n drift_check.py | grep -A 30 "def main"

Repository: ANcpLua/github-settings-automation

Length of output: 1540


🏁 Script executed:

cd scripts && sed -n '360,390p' drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 1039


🏁 Script executed:

cd scripts && grep -B 5 -A 5 "unknown normaliser" drift_check.py

Repository: ANcpLua/github-settings-automation

Length of output: 408


🏁 Script executed:

cd scripts && sed -n '1,50p' drift_check.py | grep -A 10 "Exit codes"

Repository: ANcpLua/github-settings-automation

Length of output: 351


Unknown normalizer must exit with code 2 for configuration error, not code 0/1 based on drift.

The documented exit-code contract specifies code 2 for "configuration or authentication error". Returning an error dict allows the program to exit based solely on drift_count, ignoring configuration failures. Raise the exception in check_path() and catch it in main() with sys.exit(2).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/drift_check.py` around lines 244 - 249, In check_path(), instead of
returning an error dict when norm_func is falsy (unknown normaliser_name), raise
a specific exception (e.g., ValueError or custom ConfigError) that includes the
path and normaliser_name; then update main() to catch that exception and call
sys.exit(2) for configuration/authentication errors, ensuring other drift logic
and exit codes remain unchanged and references to check_path(), main(),
norm_func, and normaliser_name are used to locate the changes.

Comment thread scripts/drift_check.py
Comment on lines +262 to +268
if not repo_versions:
return {
"path": path,
"normalizer": normaliser_name,
"present_in": 0,
"clusters": [],
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

❓ Verification inconclusive

Script executed:

wc -l scripts/drift_check.py

Repository: ANcpLua/github-settings-automation


Repository: ANcpLua/github-settings-automation
Exit code: 0

stdout:

386 scripts/drift_check.py


Script executed:

head -n 330 scripts/drift_check.py | tail -n 80

Repository: ANcpLua/github-settings-automation


Repository: ANcpLua/github-settings-automation
Exit code: 0

stdout:

    repo_versions: dict[str, str] = {}
    repo_errors: dict[str, str] = {}
    for repo in repos:
        data = fetch_file(repo, path)
        if data is None:
            continue
        try:
            repo_versions[repo] = norm_func(data)
        except Exception as e:
            repo_errors[repo] = f"{type(e).__name__}: {e}"

    if not repo_versions:
        return {
            "path": path,
            "normalizer": normaliser_name,
            "present_in": 0,
            "clusters": [],
        }

    clusters = cluster_versions(repo_versions)
    return {
        "path": path,
        "normalizer": normaliser_name,
        "present_in": len(repo_versions),
        "absent_in": [r for r in repos if r not in repo_versions and r not in repo_errors],
        "clusters": clusters,
        "errors": repo_errors,
        "drift": len(clusters) > 1,
    }


# ---------------------------------------------------------------------------
# Output renderers
# ---------------------------------------------------------------------------

def write_markdown(findings: list[dict], path: str) -> int:
    """Write markdown report. Return drift count."""
    lines: list[str] = ["# Config Drift Report\n"]
    total = len(findings)
    drift = [f for f in findings if f.get("drift")]
    clean = [f for f in findings if not f.get("drift") and f.get("present_in", 0) > 0]
    missing = [f for f in findings if f.get("present_in", 0) == 0]

    lines.append(f"- Paths checked: **{total}**")
    lines.append(f"- Drifted: **{len(drift)}**")
    lines.append(f"- Clean (semantic 1-cluster): **{len(clean)}**")
    lines.append(f"- Not present in any repo: **{len(missing)}**\n")

    if drift:
        lines.append("## Drifted paths\n")
        for f in drift:
            lines.append(f"### `{f['path']}` (normaliser: `{f['normalizer']}`)")
            lines.append(f"_{f['present_in']} repos, {len(f['clusters'])} semantic clusters_\n")
            for i, c in enumerate(f["clusters"]):
                marker = "**canonical** (majority)" if i == 0 else f"drift #{i}"
                short_repos = ", ".join(r.split("/")[-1] for r in c["repos"])
                lines.append(f"- {marker} `{c['hash']}` ({c['size']}× ): {short_repos}")
            if f.get("errors"):
                lines.append("\n_Normalisation errors:_")
                for repo, err in f["errors"].items():
                    lines.append(f"  - `{repo.split('/')[-1]}`: {err}")
            lines.append("")

    if clean:
        lines.append("## Clean paths\n")
        lines.append("| Path | Repos | Hash |")
        lines.append("|---|---|---|")
        for f in clean:
            h = f["clusters"][0]["hash"] if f["clusters"] else "-"
            lines.append(f"| `{f['path']}` | {f['present_in']} | `{h}` |")
        lines.append("")

    if missing:
        lines.append("## Watched but absent everywhere\n")
        for f in missing:
            lines.append(f"- `{f['path']}`")
        lines.append("")

    Path(path).write_text("\n".join(lines) + "\n")
    return len(drift)


Script executed:

cat -n scripts/drift_check.py | sed -n '250,330p'

Repository: ANcpLua/github-settings-automation


Repository: ANcpLua/github-settings-automation
Exit code: 0

stdout:

   250	
   251	    repo_versions: dict[str, str] = {}
   252	    repo_errors: dict[str, str] = {}
   253	    for repo in repos:
   254	        data = fetch_file(repo, path)
   255	        if data is None:
   256	            continue
   257	        try:
   258	            repo_versions[repo] = norm_func(data)
   259	        except Exception as e:
   260	            repo_errors[repo] = f"{type(e).__name__}: {e}"
   261	
   262	    if not repo_versions:
   263	        return {
   264	            "path": path,
   265	            "normalizer": normaliser_name,
   266	            "present_in": 0,
   267	            "clusters": [],
   268	        }
   269	
   270	    clusters = cluster_versions(repo_versions)
   271	    return {
   272	        "path": path,
   273	        "normalizer": normaliser_name,
   274	        "present_in": len(repo_versions),
   275	        "absent_in": [r for r in repos if r not in repo_versions and r not in repo_errors],
   276	        "clusters": clusters,
   277	        "errors": repo_errors,
   278	        "drift": len(clusters) > 1,
   279	    }
   280	
   281	
   282	# ---------------------------------------------------------------------------
   283	# Output renderers
   284	# ---------------------------------------------------------------------------
   285	
   286	def write_markdown(findings: list[dict], path: str) -> int:
   287	    """Write markdown report. Return drift count."""
   288	    lines: list[str] = ["# Config Drift Report\n"]
   289	    total = len(findings)
   290	    drift = [f for f in findings if f.get("drift")]
   291	    clean = [f for f in findings if not f.get("drift") and f.get("present_in", 0) > 0]
   292	    missing = [f for f in findings if f.get("present_in", 0) == 0]
   293	
   294	    lines.append(f"- Paths checked: **{total}**")
   295	    lines.append(f"- Drifted: **{len(drift)}**")
   296	    lines.append(f"- Clean (semantic 1-cluster): **{len(clean)}**")
   297	    lines.append(f"- Not present in any repo: **{len(missing)}**\n")
   298	
   299	    if drift:
   300	        lines.append("## Drifted paths\n")
   301	        for f in drift:
   302	            lines.append(f"### `{f['path']}` (normaliser: `{f['normalizer']}`)")
   303	            lines.append(f"_{f['present_in']} repos, {len(f['clusters'])} semantic clusters_\n")
   304	            for i, c in enumerate(f["clusters"]):
   305	                marker = "**canonical** (majority)" if i == 0 else f"drift #{i}"
   306	                short_repos = ", ".join(r.split("/")[-1] for r in c["repos"])
   307	                lines.append(f"- {marker} `{c['hash']}` ({c['size']}× ): {short_repos}")
   308	            if f.get("errors"):
   309	                lines.append("\n_Normalisation errors:_")
   310	                for repo, err in f["errors"].items():
   311	                    lines.append(f"  - `{repo.split('/')[-1]}`: {err}")
   312	            lines.append("")
   313	
   314	    if clean:
   315	        lines.append("## Clean paths\n")
   316	        lines.append("| Path | Repos | Hash |")
   317	        lines.append("|---|---|---|")
   318	        for f in clean:
   319	            h = f["clusters"][0]["hash"] if f["clusters"] else "-"
   320	            lines.append(f"| `{f['path']}` | {f['present_in']} | `{h}` |")
   321	        lines.append("")
   322	
   323	    if missing:
   324	        lines.append("## Watched but absent everywhere\n")
   325	        for f in missing:
   326	            lines.append(f"- `{f['path']}`")
   327	        lines.append("")
   328	
   329	    Path(path).write_text("\n".join(lines) + "\n")
   330	    return len(drift)


Include errors and drift in early return; separately report paths with parse-only failures.

When files exist but all normalizations fail, repo_errors is discarded at the early return (lines 262–268) and the path gets reported as "absent everywhere" in the markdown. This masks parse problems that drift detection should surface.

The normal return path (lines 271–279) correctly includes both fields; the early return must match. Additionally, the reporting logic (line 292) conflates truly missing paths with errored ones—missing = [f for f in findings if f.get("present_in", 0) == 0] catches both, so parse-only failures disappear from view.

Proposed fix
     if not repo_versions:
         return {
             "path": path,
             "normalizer": normaliser_name,
             "present_in": 0,
             "clusters": [],
+            "errors": repo_errors,
+            "drift": False,
         }
-    missing = [f for f in findings if f.get("present_in", 0) == 0]
+    missing = [f for f in findings if f.get("present_in", 0) == 0 and not f.get("errors")]
+    errored = [f for f in findings if f.get("errors")]
@@
+    if errored:
+        lines.append("## Paths with normalization errors\n")
+        for f in errored:
+            lines.append(f"### `{f['path']}` (normaliser: `{f['normalizer']}`)")
+            for repo, err in f["errors"].items():
+                lines.append(f"- `{repo.split('/')[-1]}`: {err}")
+            lines.append("")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not repo_versions:
return {
"path": path,
"normalizer": normaliser_name,
"present_in": 0,
"clusters": [],
}
if not repo_versions:
return {
"path": path,
"normalizer": normaliser_name,
"present_in": 0,
"clusters": [],
"errors": repo_errors,
"drift": False,
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/drift_check.py` around lines 262 - 268, The early return when
repo_versions is empty should include the same keys as the normal return: add
"errors": repo_errors (or [] if None) and "drift": [] alongside "path",
"normalizer"/normaliser_name, "present_in": 0, and "clusters" so parse-only
failures aren't dropped; update the block that builds the result for the repo in
the function handling repo_versions to return these fields (reference
repo_versions, repo_errors, normaliser_name, path). Also change the downstream
reporting filter that builds missing files (the list comprehension using
findings and f.get("present_in", 0) == 0) to distinguish truly missing files
from parse-failed files by checking the "errors" field (e.g., treat a file as
missing only if present_in == 0 and not f.get("errors")), so errored-but-present
paths are reported separately.

Git's gitignore and gitattributes semantics give the LAST matching
pattern precedence within one level. Sorting + set-dedupe (the previous
"lines" normaliser) destroyed that order, so two semantically
different files could hash to the same cluster and silently pass.

This commit:
- Adds norm_lines_set (was norm_lines_sorted) — sort + dedupe, valid
  only when the consumer treats the file as an unordered set.
- Adds norm_lines_ordered — preserves order, collapses only adjacent
  duplicates. Required for .gitignore and .gitattributes.
- Keeps "lines" as an alias that resolves to ordered (fail-safe
  default for any future entry).
- Switches .gitignore, .gitattributes, .dockerignore, .markdownlintignore
  to lines_ordered in DEFAULT_NORMALISERS.

Verified locally: .gitattributes now collapses to 1 cluster across all
5 repos that have it (semantically identical with comment / whitespace
noise removed); .gitignore stays at 16/16 (genuine content drift);
.dockerignore stays at 2/2 (genuine content drift).

Credit: caught by parallel agent reviewing PR #1.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/drift_check.py`:
- Around line 297-300: When norm_func(data) raises, record the error as now but
also treat it as a fatal detection failure: in the except block for the
repo_versions assignment add a failure marker (e.g., set
repo_versions[repo]=None and increment drift_count or set a has_error flag) and
ensure the final exit path checks repo_errors/has_error and calls sys.exit(1) so
normalization failures produce a non-zero exit; apply the same change to the
analogous try/except at the block around lines 413-422 (same handling for
norm_func exceptions).
- Around line 74-77: The current check treats empty-file content ("") as missing
because it uses "if not content:"; change the logic to only treat missing
content when result.get("content") is None. In the block using
result.get("content") (the local variable content) and returning
base64.b64decode(content), replace the falsy check with an explicit None check
so that an empty string decodes to b"" instead of returning None.
- Around line 397-411: Validate that the loaded policy is a mapping and contains
non-empty "repos" and "watch" lists before dereferencing: after policy =
yaml.safe_load(...), check isinstance(policy, dict) and set repos =
policy.get("repos") or []; watch = policy.get("watch") or []; if not
isinstance(policy, dict) or not repos or not watch then print a clear error to
stderr and sys.exit(2). Also, when iterating watch before calling
check_path(repos, entry), ensure each entry is either a string or a mapping with
a "path" key (reject/massage other types with a helpful error and exit 2); keep
the existing args.quiet logging behavior unchanged.
- Around line 172-188: The current ini normalizer function norm_ini in
scripts/drift_check.py flattens sections and sorts keys, which breaks
.editorconfig's order-dependent override semantics; update the normalization
mapping so files named ".editorconfig" use the raw normalizer (preserving
original line order) instead of norm_ini, leaving norm_ini unchanged for true
INI-like files (e.g., .gitmodules, .npmrc); locate the mapping that assigns
normalizers to file patterns and replace the association for ".editorconfig" to
use the raw/plain normalizer so later-section overrides remain intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 52da6590-0196-43fc-b4f9-9eb7edd20554

📥 Commits

Reviewing files that changed from the base of the PR and between 75ee30b and da3d395.

📒 Files selected for processing (1)
  • scripts/drift_check.py
📜 Review details
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis

Comment thread scripts/drift_check.py
Comment on lines +74 to +77
content = result.get("content")
if not content:
return None
return base64.b64decode(content)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't classify zero-byte files as missing.

An existing empty file comes back with content == "", so if not content: drops it as absent. That will hide drift for legitimately empty watched files.

Proposed fix
     content = result.get("content")
-    if not content:
+    if content is None:
         return None
     return base64.b64decode(content)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/drift_check.py` around lines 74 - 77, The current check treats
empty-file content ("") as missing because it uses "if not content:"; change the
logic to only treat missing content when result.get("content") is None. In the
block using result.get("content") (the local variable content) and returning
base64.b64decode(content), replace the falsy check with an explicit None check
so that an empty string decodes to b"" instead of returning None.

Comment thread scripts/drift_check.py
Comment on lines +172 to +188
def norm_ini(data: bytes) -> str:
"""INI-like files (.editorconfig, .gitmodules, .npmrc, .globalconfig):
parse sections + key=value pairs, sort, dump."""
text = data.decode("utf-8", errors="replace")
entries: dict[str, str] = {}
section = ""
for line in text.splitlines():
s = line.strip()
if not s or s.startswith("#") or s.startswith(";"):
continue
if s.startswith("[") and s.endswith("]"):
section = s[1:-1].strip()
continue
if "=" in s:
k, v = s.split("=", 1)
entries[f"{section}/{k.strip()}"] = v.strip()
return "\n".join(f"{k}={v}" for k, v in sorted(entries.items()))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In the EditorConfig specification, when multiple sections in one .editorconfig file match the same file, do properties from later sections override earlier sections?

💡 Result:

Yes, in the EditorConfig specification, properties from later sections override properties from earlier sections when multiple sections within the same.editorconfig file match the same file [1][2]. According to the official EditorConfig specification, files are read from top to bottom, and the most recent pairs found take precedence [1][3]. Consequently, if a file matches multiple sections within a single.editorconfig file, the properties defined in the section that appears later in the file will override those defined in earlier sections [1][2]. When multiple.editorconfig files are involved, the search process reads files from the directory of the target file up to the root directory [1][4]. Because the rules from closer EditorConfig files are read last, properties in closer files take precedence over those in files located further up the directory tree [1][5].

Citations:


🏁 Script executed:

# Find the DEFAULT_NORMALISERS mapping and verify the .editorconfig line
rg "DEFAULT_NORMALISERS|\.editorconfig.*ini" scripts/drift_check.py -A 5 -B 5

Repository: ANcpLua/github-settings-automation

Length of output: 948


.editorconfig mapping to ini normalizer creates false semantic equivalences.

.editorconfig files are order-dependent per the official spec: later matching sections override earlier ones. norm_ini() parses sections and keys into a sorted dict, discarding the original order. Two files with different override precedence will produce identical normalized output, creating false positives in drift detection.

Example: [*]→indent=space followed by [*.py]→indent=tab produces the same normalized form as the reversed order, despite opposite final behavior for Python files.

Switch .editorconfig to raw normalizer to preserve order-dependent semantics.

Proposed fix
-    ".editorconfig": "ini",
+    ".editorconfig": "raw",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/drift_check.py` around lines 172 - 188, The current ini normalizer
function norm_ini in scripts/drift_check.py flattens sections and sorts keys,
which breaks .editorconfig's order-dependent override semantics; update the
normalization mapping so files named ".editorconfig" use the raw normalizer
(preserving original line order) instead of norm_ini, leaving norm_ini unchanged
for true INI-like files (e.g., .gitmodules, .npmrc); locate the mapping that
assigns normalizers to file patterns and replace the association for
".editorconfig" to use the raw/plain normalizer so later-section overrides
remain intact.

Comment thread scripts/drift_check.py
Comment on lines +297 to +300
try:
repo_versions[repo] = norm_func(data)
except Exception as e:
repo_errors[repo] = f"{type(e).__name__}: {e}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalization failures never affect the final exit code.

repo_errors is collected, but the process still exits solely from drift_count. A malformed watched file should fail the detector, not quietly become a 0/1 result with incomplete clustering.

Proposed fix
     drift_count = write_markdown(findings, args.output)
     write_manifest(findings, args.manifest)
+    had_errors = any(f.get("error") or f.get("errors") for f in findings)
@@
-    sys.exit(1 if drift_count > 0 else 0)
+    if had_errors:
+        sys.exit(2)
+    sys.exit(1 if drift_count > 0 else 0)

Also applies to: 413-422

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/drift_check.py` around lines 297 - 300, When norm_func(data) raises,
record the error as now but also treat it as a fatal detection failure: in the
except block for the repo_versions assignment add a failure marker (e.g., set
repo_versions[repo]=None and increment drift_count or set a has_error flag) and
ensure the final exit path checks repo_errors/has_error and calls sys.exit(1) so
normalization failures produce a non-zero exit; apply the same change to the
analogous try/except at the block around lines 413-422 (same handling for
norm_func exceptions).

Comment thread scripts/drift_check.py
Comment on lines +397 to +411
policy = yaml.safe_load(policy_path.read_text())

repos = policy.get("repos") or []
watch = policy.get("watch") or []
if not repos or not watch:
print("policy must define non-empty 'repos' and 'watch' lists", file=sys.stderr)
sys.exit(2)

findings: list[dict] = []
for entry in watch:
if isinstance(entry, str):
entry = {"path": entry}
if not args.quiet:
print(f"checking {entry['path']} across {len(repos)} repos...", file=sys.stderr)
findings.append(check_path(repos, entry))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate the policy structure before dereferencing it.

A malformed policy, a non-mapping top level, or a watch entry without path will traceback here and exit 1. This is config input; fail fast with a clear message and exit 2 instead of indexing straight into it.

Proposed fix
-    policy = yaml.safe_load(policy_path.read_text())
+    try:
+        policy = yaml.safe_load(policy_path.read_text())
+    except Exception as e:
+        print(f"could not parse policy file {args.policy}: {e}", file=sys.stderr)
+        sys.exit(2)
+    if not isinstance(policy, dict):
+        print("policy must be a mapping with 'repos' and 'watch' keys", file=sys.stderr)
+        sys.exit(2)
@@
     for entry in watch:
         if isinstance(entry, str):
             entry = {"path": entry}
+        if not isinstance(entry, dict) or not entry.get("path"):
+            print("each watch entry must be a string or mapping with a non-empty 'path'", file=sys.stderr)
+            sys.exit(2)
         if not args.quiet:
             print(f"checking {entry['path']} across {len(repos)} repos...", file=sys.stderr)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/drift_check.py` around lines 397 - 411, Validate that the loaded
policy is a mapping and contains non-empty "repos" and "watch" lists before
dereferencing: after policy = yaml.safe_load(...), check isinstance(policy,
dict) and set repos = policy.get("repos") or []; watch = policy.get("watch") or
[]; if not isinstance(policy, dict) or not repos or not watch then print a clear
error to stderr and sys.exit(2). Also, when iterating watch before calling
check_path(repos, entry), ensure each entry is either a string or a mapping with
a "path" key (reject/massage other types with a helpful error and exit 2); keep
the existing args.quiet logging behavior unchanged.

@ANcpLua
Copy link
Copy Markdown
Owner Author

ANcpLua commented May 18, 2026

@claude[agent] Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @scripts/drift_check.py around lines 397 - 411, Validate that the loaded
policy is a mapping and contains non-empty "repos" and "watch" lists before
dereferencing: after policy = yaml.safe_load(...), check isinstance(policy,
dict) and set repos = policy.get("repos") or []; watch = policy.get("watch") or
[]; if not isinstance(policy, dict) or not repos or not watch then print a clear
error to stderr and sys.exit(2). Also, when iterating watch before calling
check_path(repos, entry), ensure each entry is either a string or a mapping with
a "path" key (reject/massage other types with a helpful error and exit 2); keep
the existing args.quiet logging behavior unchanged.

@ANcpLua
Copy link
Copy Markdown
Owner Author

ANcpLua commented May 18, 2026

@codex[agent] Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @scripts/drift_check.py around lines 397 - 411, Validate that the loaded
policy is a mapping and contains non-empty "repos" and "watch" lists before
dereferencing: after policy = yaml.safe_load(...), check isinstance(policy,
dict) and set repos = policy.get("repos") or []; watch = policy.get("watch") or
[]; if not isinstance(policy, dict) or not repos or not watch then print a clear
error to stderr and sys.exit(2). Also, when iterating watch before calling
check_path(repos, entry), ensure each entry is either a string or a mapping with
a "path" key (reject/massage other types with a helpful error and exit 2); keep
the existing args.quiet logging behavior unchanged.

Copilot stopped work on behalf of ANcpLua due to an error May 18, 2026 06:33
@ANcpLua ANcpLua merged commit 4610af1 into main May 18, 2026
3 of 4 checks passed
@ANcpLua ANcpLua deleted the feat/config-drift-detector branch May 18, 2026 06:49
Repository owner deleted a comment from Codex AI May 18, 2026
Repository owner deleted a comment from Codex AI May 18, 2026
Repository owner deleted a comment from Codex AI May 18, 2026
Repository owner deleted a comment from Codex AI May 18, 2026
Repository owner deleted a comment from Codex AI May 18, 2026
Repository owner deleted a comment from Codex AI May 18, 2026
ANcpLua added a commit that referenced this pull request May 18, 2026
Three small upgrades from review feedback on PR #1:

1. Added branch_protection() + branch_rulesets() + is_default_branch_writable()
   helpers. Lets a future --remediate path detect 'classic protection OR
   active ruleset on default branch' BEFORE attempting writes, so we choose
   PUT-direct vs branch-then-PR upfront. Read-only audit path is unchanged.

2. Documented why fetch_file uses Contents API + base64-decode (not
   raw.githubusercontent.com): CDN cache lags merges ~5min and lies about
   post-merge state. Already implemented; now the docstring says why.

3. Hardened policy YAML loading per Codex CR review:
   - Reject non-mapping top-level YAML (was implicit AttributeError later).
   - Reject 'repos' or 'watch' not being lists.
   - Normalise watch entries upfront — accept str or {path: str, ...},
     reject anything else with a clear pointer to the offending index.
   No behavioural change for well-formed policy files; bad ones now fail
   fast with a useful message instead of a downstream traceback.
ANcpLua added a commit that referenced this pull request May 18, 2026
Adds templates/triage-bot.yml — exact copy of the validated workflow
from ANcpLua/triage-bot-playground PR #1 + #2 (2026-05-18) with a
documentation header explaining decision rules, prerequisites, and
the gap it closes.

Decision rules (v1, deterministic):
  body contains [triage:auto-resolve]    → dismiss
  body matches /^(nit:|nitpick:|praise:)/i → dismiss
  author is Copilot, body < 200 chars     → dismiss
  anything else                           → needs-human (default)

The gap it closes: GitHub native auto-merge fires the moment CI goes
green, even if Copilot / CodeRabbit later post review comments. With
branch protection or a ruleset enforcing
required_conversation_resolution=true, any unresolved thread keeps
the PR mergeStateStatus=BLOCKED until explicitly resolved. The bot
auto-resolves the non-blocking threads (so they don't accumulate as
noise) while leaving real concerns to block the merge until a human
acts.

This is the structural fix that makes the pr-heal.yml cooldown
(commit 22d0b28) belt-and-suspenders rather than the primary line of
defence — the cooldown buys time for reviewers, the native gate
deterministically waits for explicit resolution.

enforce-repo-settings.yml extended (both enforce-user and enforce-org
jobs): seed-if-missing pattern. Skip-if-exists because adopting the
workflow requires a per-repo TRIAGE_PAT secret and the
required_conversation_resolution gate on the default branch; this
sweep can put the file in place but can't safely flip either.

Gotcha (in the workflow's docstring): GITHUB_TOKEN cannot
resolveReviewThread on threads it didn't author — needs a classic
PAT or GitHub App installation token with pull_requests: write.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant