refactor: migrate PyDeequ Bot to the shared Shadow engine by sudsali · Pull Request #277 · awslabs/python-deequ

sudsali · 2026-07-05T15:40:03Z

Summary

Migrates the PyDeequ Bot from a vendored copy of the engine (scripts/issue_bot/) to consuming the shared Shadow engine (sudsali/shadow) via its reusable workflow, SHA-pinned to 54ec94e. One engine maintained upstream instead of a per-repo fork, matching what awslabs/deequ now does. Net diff removes ~2.6k lines of vendored code + tests.

What Changes

Replaces issue-bot.yml with a thin caller of sudsali/shadow/.github/workflows/shadow-review.yml (all four triggers preserved: issues, issue_comment, pull_request_target, workflow_dispatch)
Adds .shadow.yml — engine config (Python codebase, bot identity, attribution footer)
Deletes the vendored engine (scripts/issue_bot/) and its tests (tests/test_bot.py); keeps scripts/generate_kb.py
Drops the now-dead tests/test_bot.py references left in update-kb.yml (paths-ignore) and base.yml (pytest --ignore)

Behavior Change: PR review moves to the 3-agent pipeline

The vendored bot ran the legacy single-pass file-review path. The shared engine runs the 3-agent Investigator → Critic → Reporter pipeline (the Critic disproves the Investigator's findings before anything is posted, cutting false positives). Issue triage, issue-respond, and follow-up behavior are unchanged.

Behavior Preserved

Prompts — PyDeequ's language-tuned prompts stay in Secrets Manager (pydeequ-bot/*), selected via prompt_sm_prefix: pydeequ-bot; the engine fetches all eight (PR investigator/critic/reporter + 2 commit nudges + issue-classify/respond/followup) in place of its bundled defaults. No new secrets.
Auto-approve — clean reviews still carry ; the bot's marker identity is deliberately kept as deequ-bot so auto-approve.yml needs no change.
Surfaces — PR review, issue triage, issue-respond, and follow-up all run from one engine.

Architecture

GitHub Event → issue-bot.yml (caller) → sudsali/shadow reusable workflow
                ├── Job 1: analyze (id-token, read-only)
                │     ├── Resolve prompts from Secrets Manager (pydeequ-bot/*)
                │     ├── 3-agent pipeline: Investigator → Critic → Reporter
                │     └── Write JSON artifact
                └── Job 2: act (issues/pull-requests write, no AWS)
                      ├── Post review pinned to analyzed commit (commit_id)
                      ├── Add labels / escalate to Slack
                      └── Footer: "Reviewed by Shadow"

Security

OIDC pinning — no IAM change to merge (caller's sub stays repo:awslabs/python-deequ:*); an optional job_workflow_ref trust narrowing may be applied after merge (applying it before breaks the base-branch vendored bot)
SHA-pinned engine — uses: and shadow_ref both pin the exact commit; a moved upstream tag can't change what runs
Two-job split — read-only analyze (AWS creds, no GitHub write), write-only act (GitHub write, no AWS creds), preserved from the reusable workflow
Attribution footer — self-scrubbed (injection markers dropped, @/# auto-link neutralized, marker-spoof-proof)

Files

File	Change
`.github/workflows/issue-bot.yml`	Replaced with reusable-workflow caller
`.shadow.yml`	Added — engine config (codebase, bot name, attribution)
`.github/workflows/update-kb.yml`	Dropped stale `paths-ignore` for the removed bot test
`.github/workflows/base.yml`	Dropped `--ignore` for the removed bot test
`scripts/issue_bot/`	Deleted (vendored engine, now upstream)
`tests/test_bot.py`	Deleted (vendored engine tests)

Testing

PR review — workflow_dispatch dry_run: true on GitHub Actions against a live PR: analyze + act succeed, prompts resolve from sm:pydeequ-bot/*, the 3-agent pipeline produces findings, and the review is pinned to the analyzed commit. (Results linked in a comment below once the run completes.)
Engine — 527 unit/integration tests pass upstream.

Issue-triage and follow-up run natively on real events after merge; workflow_dispatch cannot emit issue_comment events, and live issues trip the correct SKIP guards.

Rollback

Revert this PR — the vendored scripts/issue_bot/ returns and runs as before. No data or state migration; secrets and repo variables are untouched.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Replace the vendored scripts/issue_bot/ bot with a thin caller of the reusable sudsali/shadow workflow (SHA-pinned), matching what awslabs/deequ now does. This kills the code drift between per-repo vendored copies and upgrades PR review from the legacy single-pass path to the 3-agent Investigator/Critic/Reporter pipeline. - .github/workflows/issue-bot.yml: caller of sudsali/shadow@54ec94e - .shadow.yml: PyDeequ codebase config; bot.name kept as deequ-bot so the existing auto-approve.yml clean-marker grep keeps matching - remove vendored scripts/issue_bot/ and its tests/test_bot.py - update-kb.yml: drop stale paths-ignore for the deleted bot dir

github-actions

Generated by AI (model: us.anthropic.claude-opus-4-6-v1, prompt: 416310f3) — may not be fully accurate. Reply if this doesn't help.

github-actions · 2026-07-05T15:40:58Z

          poetry install
          poetry run pip install pyspark==$SPARK_VERSION
-          poetry run python -m pytest -s tests --ignore=tests/test_bot.py
+          poetry run python -m pytest -s tests


BUG: Removing --ignore=tests/test_bot.py without the corresponding deletion of tests/test_bot.py being reflected in the same commit would cause CI to fail if the file still existed. However, the diff shows tests/test_bot.py IS deleted in this PR, so the ignore removal is correct. The real concern is different: the CI matrix will now attempt to import scripts/issue_bot modules (since test_bot.py no longer exists, this is fine). DISPROVED — no issue here.

tests/test_bot.py is deleted in the same PR diff, so removing --ignore is consistent.

github-actions · 2026-07-05T15:40:58Z

+      prompt_sm_prefix: pydeequ-bot
+    secrets:
+      AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
+      GUARDRAIL_ID: ${{ secrets.GUARDRAIL_ID }}


EDGE_CASE: BEDROCK_REPORTER_MODEL_ID and BEDROCK_CRITIC_MODEL_ID are passed via ${{ vars.* }} (repository variables) but placed under the secrets: block. GitHub Actions allows passing vars as secret inputs (they're just strings), but if the reusable workflow declares these as secrets with required: true, the workflow will fail at startup if the repository variables are not set. Since these are new inputs not present in the old workflow, they must be configured before merge or the bot will break on every trigger.

Lines 57-58: BEDROCK_REPORTER_MODEL_ID: ${{ vars.BEDROCK_REPORTER_MODEL_ID }} and BEDROCK_CRITIC_MODEL_ID: ${{ vars.BEDROCK_CRITIC_MODEL_ID }} use vars.* context. If the reusable workflow marks these as required secrets and the vars are not configured, the workflow fails.

github-actions · 2026-07-05T15:40:58Z

-        run: python -m issue_bot.main act
-        working-directory: scripts
+    uses: sudsali/shadow/.github/workflows/shadow-review.yml@54ec94e0ca8c90d9b58ff95a2a06b175a115784e
+    with:


EDGE_CASE: pr_number: ${{ inputs.issue_number }} is only populated during workflow_dispatch. For issues, pull_request_target, and issue_comment events, inputs.issue_number is empty/null, so pr_number will be passed as an empty string. The reusable workflow must handle this gracefully (e.g., by falling back to the event's own issue/PR number). If the reusable workflow requires a non-empty pr_number, the bot will fail on all non-dispatch triggers.

Line 44: pr_number: ${{ inputs.issue_number }} — inputs.* context is only available for workflow_dispatch; for other event types it evaluates to empty string.

github-actions Bot reviewed Jul 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor: migrate PyDeequ Bot to the shared Shadow engine#277

refactor: migrate PyDeequ Bot to the shared Shadow engine#277
sudsali wants to merge 1 commit into
masterfrom
migrate-to-shadow-engine

sudsali commented Jul 5, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot Jul 5, 2026

Uh oh!

github-actions Bot Jul 5, 2026

Uh oh!

github-actions Bot Jul 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

sudsali commented Jul 5, 2026

Summary

What Changes

Behavior Change: PR review moves to the 3-agent pipeline

Behavior Preserved

Architecture

Security

Files

Testing

Rollback

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jul 5, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jul 5, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jul 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant