ci: incrementally update docs graph on PR merge #478

Merged
galshubeli merged 11 commits into main from feat/docs-update-workflow
May 13, 2026
Conversation

@galshubeli
Contributor

@galshubeli galshubeli commented May 12, 2026

Summary

Adds a GitHub Actions workflow that triggers the FalkorDB docs knowledge-graph update on every push to main touching any .md file. The workflow itself does only three things: clone the repo, build a JSON diff payload, and POST it to GraphRAG-UI's /api/admin/update-graph endpoint — all SDK ingestion happens server-side.

What the workflow does

  1. Checks out this repo with full history (needed for git diff against github.event.before)
  2. Runs .github/scripts/build_diff_payload.py to compute the .md diff between BASE_SHA (= github.event.before) and HEAD_SHA (= github.sha), read added + modified file contents via git show <head>:<path> (so historical commits work even if the working tree isn't checked out at HEAD), and write payload.json. Emits skip=true when no .md changes remain after filtering.
  3. curl POSTs the payload to ${GRAPHRAG_UI_URL}/api/admin/update-graph with a bearer token. --fail-with-body ensures the action exits non-zero on any non-2xx and surfaces the server's detail message in the log.
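
As a rough illustration of step 2, the sketch below assembles a payload and records the skip flag. Only the top-level shape `{graph_id, files: {added, modified, deleted}}` comes from this PR; the per-file `{"path", "content"}` layout, the function names, and the choice to always write the payload file are assumptions made for the example, not the actual contents of build_diff_payload.py.

```python
import json


def build_payload(graph_id, added, modified, deleted):
    """Assemble the request body.

    `added` and `modified` map path -> file content; `deleted` is a list
    of paths. The per-file {"path", "content"} layout is an assumption.
    """
    def as_entries(files):
        return [{"path": p, "content": c} for p, c in sorted(files.items())]

    return {
        "graph_id": graph_id,
        "files": {
            "added": as_entries(added),
            "modified": as_entries(modified),
            "deleted": sorted(deleted),
        },
    }


def is_empty(payload):
    """True when no .md change survived filtering."""
    files = payload["files"]
    return not (files["added"] or files["modified"] or files["deleted"])


def write_outputs(payload, payload_path, github_output_path):
    """Write the payload file and append skip=<bool> as a step output.

    GitHub Actions reads step outputs as key=value lines appended to the
    file named by the GITHUB_OUTPUT environment variable.
    """
    skip = is_empty(payload)
    with open(payload_path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    with open(github_output_path, "a", encoding="utf-8") as fh:
        fh.write(f"skip={'true' if skip else 'false'}\n")
    return skip
```

In the workflow, a later step can then be gated on `steps.payload.outputs.skip != 'true'`, which is how the POST step is described in this PR.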

The endpoint does the rest: apply_changes + finalize against a UUID-suffixed copy of the live graph, smoke tests against :Graph.questions, atomic alias flip in the org graph, and retention=1 cleanup of the previous-previous graph copy.

Files added

  • .github/workflows/update-graph.yml — the workflow (52 lines)
  • .github/scripts/build_diff_payload.py — Python helper that runs inside the workflow runner; pure data transformation (git diff → JSON), no network calls, no credentials

Setup required before merging

Repo secret (Settings → Secrets and variables → Actions → New repository secret)

  • UPDATE_GRAPH_TOKEN — shared bearer token. Must match the same env var on the GraphRAG-UI deployment (Railway).

Repo variable (Settings → Secrets and variables → Actions → Variables tab)

  • GRAPHRAG_UI_URL — base URL of the GraphRAG-UI deployment, e.g. https://staging.graphrag.falkordb.com. No trailing slash.

That's it. No FALKORDB_*, no AZURE_OPENAI_*, no GRAPHRAG_UI_CHECKOUT_PAT. All credentials live on the GraphRAG-UI side.
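
For readers who want to reproduce the call outside of CI, here is a minimal Python sketch of the same request the curl step sends. The endpoint path and bearer scheme come from this PR; the `Content-Type` header, the function name, and the tolerant trailing-slash handling are assumptions for illustration.

```python
import json
import urllib.request


def build_update_request(base_url, token, payload):
    """Build (but do not send) the POST to the update-graph endpoint."""
    # The PR asks for GRAPHRAG_UI_URL without a trailing slash; stripping
    # one here is a defensive assumption, not documented behavior.
    url = base_url.rstrip("/") + "/api/admin/update-graph"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumed: the endpoint expects a JSON body.
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` raises `urllib.error.HTTPError` on 4xx/5xx responses, and the error object can be read for the response body, which is roughly the behavior `--fail-with-body` gives the curl step.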

Test plan

  • Server-side wet run completed on staging (2026-05-13): manual curl of the same payload shape the workflow produces → endpoint accepted → apply_changes ran with full custom strategy parity (SentenceTokenCapChunking(256, 2), GLiNERExtractor @ 0.75, LLMVerifiedResolution @ 0.95/0.80) → 14-entity schema enforced → 5 smoke-test questions passed → atomic alias flip succeeded → :Graph.active_graph now docs_v6_bc0ca6b1
  • UPDATE_GRAPH_TOKEN set on this repo and on GraphRAG-UI's Railway env (same value)
  • GRAPHRAG_UI_URL set as a repo variable
  • Real PR merge to main exercises the workflow end-to-end (the remaining 10% the wet run didn't cover: trigger firing, secrets resolution, runner-environment Python execution)

Companion PRs

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Added an automated workflow to incrementally update the documentation knowledge graph on pushes to main that include markdown changes.
    • Serializes concurrent updates per branch, skips runs when no doc changes are detected, and posts a generated payload to the graph update endpoint, failing the run if the remote update does not succeed.

On PR merge to main touching any .md file, this workflow runs the
generic incremental update from FalkorDB/GraphRAG-UI:

  python -m server.scripts.update_graph --graph-id docs_benchmark ...

It checks out this repo (the docs content) AND GraphRAG-UI (where the
Python lives). The action's Python is source-agnostic — it reads
ingestion config from the :Graph node in the org graph, not from this
workflow. The only docs-specific input is `--graph-id docs_benchmark`;
the rest (LLM/embedder/chunker/extractor/resolver/globs/skip_list/
smoke-test questions) is data in the org graph. Future user-created
widgets share the same code path with no workflow change.

Concurrency group serializes runs on `main` so two PRs merging within
seconds queue rather than race. cancel-in-progress: false because each
run costs LLM credit.

Secrets required on this repo before the first run:
  FALKORDB_HOST, FALKORDB_PORT, FALKORDB_PASSWORD
  AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT
  GRAPHRAG_UI_CHECKOUT_PAT (only while FalkorDB/GraphRAG-UI is private;
    drop the `token:` line when it becomes public)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

Walkthrough

Triggers incremental graph updates on pushes to main that modify Markdown files; runs a new script to build payload.json (added/modified/deleted .md), and conditionally POSTs that payload to GraphRAG-UI’s /api/admin/update-graph endpoint with bearer auth.

Changes

Graph update automation

| Layer / File(s) | Summary |
| --- | --- |
| Workflow trigger and concurrency (.github/workflows/update-graph.yml) | Workflow now triggers on push to main filtered to **/*.md and uses concurrency keyed by github.ref_name. |
| Job env, permissions, and repo checkout (.github/workflows/update-graph.yml) | Defines update-graph job runtime, GRAPH_ID and GRAPHRAG_UI_URL env vars, permissions, timeout, and checks out the repo with fetch-depth: 0. |
| Build diff payload, script + workflow step (.github/scripts/build_diff_payload.py, .github/workflows/update-graph.yml) | Adds build_diff_payload.py that normalizes empty base SHA, runs git diff --name-status between BASE_SHA and HEAD_SHA, reads added/modified .md contents from HEAD, records deleted paths (treats renames as delete+add), writes payload.json, and sets skip output; workflow invokes this script with BASE_SHA/HEAD_SHA. |
| POST payload to GraphRAG-UI admin endpoint (.github/workflows/update-graph.yml) | Conditional curl POST of payload.json to $GRAPHRAG_UI_URL/api/admin/update-graph using Authorization: Bearer ${{ secrets.UPDATE_GRAPH_TOKEN }}, with --fail-with-body, --show-error, and --max-time 1800, gated by steps.payload.outputs.skip != 'true'. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit scurries through the docs at night,
Reads each changed page by lantern light,
Packs added lines and remembers what’s fled,
Sends a tidy JSON to wake the thread,
Then twitches whiskers — the graph hums bright. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The title describes a workflow that updates the docs graph on PR merge, but the actual implementation triggers on pushes to main (not PR merge), which contradicts the stated title. | Update the title to reflect the actual trigger mechanism, e.g., 'ci: incrementally update docs graph on push to main' or similar. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Comment thread .github/workflows/update-graph.yml Fixed
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
.github/workflows/update-graph.yml (1)

57-62: ⚡ Quick win

Consider caching pip dependencies.

Adding pip caching would speed up subsequent runs and reduce load on PyPI.

📦 Proposed addition of pip caching
   - uses: actions/setup-python@v5
     with:
       python-version: "3.12"
+      cache: 'pip'
+      cache-dependency-path: 'graphrag-ui/server/requirements.txt'
 
   - name: Install GraphRAG-UI server deps
     run: pip install -r graphrag-ui/server/requirements.txt
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/update-graph.yml around lines 57 - 62, Add a pip cache
step before the "Install GraphRAG-UI server deps" step to persist ~/.cache/pip
across runs: introduce an actions/cache@v4 (or latest) step keyed by the
requirements file hash (e.g. key: ${{ runner.os }}-pip-${{
hashFiles('graphrag-ui/server/requirements.txt') }}) with path: ~/.cache/pip and
an appropriate restore-keys entry, then leave the "Install GraphRAG-UI server
deps" run: pip install -r graphrag-ui/server/requirements.txt step as-is so
installs use the cached wheel/archive files.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/update-graph.yml:
- Around line 32-34: The concurrency group currently uses "group:
update-graph-${{ github.ref }}" which resolves to the PR-specific ref and
prevents PRs targeting the same branch from queuing; change the group to use the
PR target branch instead (e.g., "group: update-graph-${{
github.event.pull_request.base.ref }}" or use the shorthand "group:
update-graph-${{ github.base_ref }}"), or hardcode the target like "group:
update-graph-main" if the target is always main; update the "concurrency" block
replacing the existing group expression so all merges to the same target share
the same concurrency group.
- Around line 36-83: Add an explicit permissions block to the update-graph job
to restrict the GITHUB_TOKEN to least privilege; update the job named
"update-graph" (the job containing the actions/checkout steps and the python
update_graph run) to include a permissions mapping such as only allowing
contents: read (and any additional narrowly-scoped permissions you actually
need, e.g., checks: write or statuses: write if you must post statuses), rather
than relying on the default token permissions—adjust the permissions entries to
the minimal set required by the steps (checkout, reading repo contents, and any
status updates).
- Line 43: Replace the outdated action versions referenced as uses:
actions/checkout@v4 and uses: actions/setup-python@v5 with the specified stable
releases: set actions/checkout to v6.0.2 and actions/setup-python to v6.2.0;
update those two uses: lines in the workflow so CI uses the new versions and
run/validate the workflow to confirm no breaking changes.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6c11c11b-b8b0-4fc5-8afb-95fb0dbaab26

📥 Commits

Reviewing files that changed from the base of the PR and between 5f90004 and c85cbd8.

📒 Files selected for processing (1)
  • .github/workflows/update-graph.yml

Comment thread .github/workflows/update-graph.yml
Comment thread .github/workflows/update-graph.yml Outdated
Comment thread .github/workflows/update-graph.yml Outdated
The previous version of this workflow:
  - Checked out FalkorDB/GraphRAG-UI (needed a PAT secret)
  - pip-installed graphrag-sdk + dev deps
  - Ran ``python -m server.scripts.update_graph`` locally on the runner
  - Required FALKORDB_HOST/PORT/PASSWORD and AZURE_OPENAI_API_KEY/
    ENDPOINT/DEPLOYMENT secrets

GraphRAG-UI now exposes /api/admin/update-graph that does all that
work server-side using its existing credentials. This workflow drops to:

  1. Checkout docs (this repo) with full history
  2. Inline Python: parse ``git diff``, read .md content for added+modified,
     build a JSON payload with {graph_id, files:{added,modified,deleted}}
  3. curl POST the payload with a bearer token

Secrets required on this repo, total:
  - ``UPDATE_GRAPH_TOKEN`` — shared bearer token for the endpoint

Repo/environment variable required:
  - ``GRAPHRAG_UI_URL`` — base URL of the GraphRAG-UI deployment
    (e.g., https://api.staging.../  or https://api.prod.../)

What's gone vs. the previous version:
  - FALKORDB_HOST / FALKORDB_PORT / FALKORDB_PASSWORD
  - AZURE_OPENAI_API_KEY / AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_DEPLOYMENT
  - GRAPHRAG_UI_CHECKOUT_PAT
  - The whole graphrag-ui sibling checkout and pip-install steps

Behavior notes:
  - The diff payload includes only .md files (path filter on the trigger
    catches non-.md PRs; the inline Python also re-filters for safety).
  - Renames are split into (delete old) + (add new with current content).
  - If the post-filter diff is empty, the workflow exits clean before the
    POST. The endpoint also short-circuits on empty diff but skipping
    saves one round-trip + the bearer-token cost.
  - curl --fail-with-body bubbles HTTP non-2xx (400 bad path, 401 wrong
    token, 422 smoke fail, 500 server config issue) up as CI failures
    with the server's detail message in the output.
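
The filtering and rename-splitting rules in the behavior notes can be sketched as a pure function over `git diff --name-status` output, which is tab-separated with a status letter first and, for renames, an `R<score>` status followed by the old and new paths. The function name and the decision to ignore other statuses (copies, type changes) are assumptions of this sketch, not a description of the merged script.

```python
def collect_md_changes(name_status_output):
    """Split `git diff --name-status` output into .md adds/mods/deletes.

    Renames become (delete old) + (add new); non-.md paths are dropped.
    Statuses other than A, M, D and R* are ignored in this sketch.
    """
    added, modified, deleted = [], [], []
    for line in name_status_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # blank or malformed line
        status = parts[0]
        if status.startswith("R") and len(parts) == 3:
            old_path, new_path = parts[1], parts[2]
            if old_path.endswith(".md"):
                deleted.append(old_path)
            if new_path.endswith(".md"):
                added.append(new_path)
        elif not parts[1].endswith(".md"):
            continue  # re-filter even though the trigger already filters
        elif status == "A":
            added.append(parts[1])
        elif status == "M":
            modified.append(parts[1])
        elif status == "D":
            deleted.append(parts[1])
    return added, modified, deleted
```

Keeping this step free of subprocess calls makes it easy to exercise against captured diff output without a repository on disk.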

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread .github/workflows/update-graph.yml Fixed
Contributor

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
.github/workflows/update-graph.yml (1)

31-35: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add an explicit least-privilege permissions block for this job.

GITHUB_TOKEN permissions are currently implicit. Lock this down to the minimum required scope.

🔒 Minimal fix
 jobs:
   update-graph:
     if: github.event.pull_request.merged == true
     runs-on: ubuntu-latest
     timeout-minutes: 30
+    permissions:
+      contents: read
     env:
#!/bin/bash
# Verify whether an explicit permissions block exists in this workflow.
rg -n '^\s*permissions:\s*$|^\s*contents:\s*read\s*$' .github/workflows/update-graph.yml
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/update-graph.yml around lines 31 - 35, The job
"update-graph" currently relies on implicit GITHUB_TOKEN permissions; add an
explicit least-privilege permissions block under the update-graph job to lock
down the token (e.g., add a permissions section with contents: read or whatever
minimal scopes the job actually needs) so the workflow no longer uses implicit
full permissions; update the job named update-graph to include the permissions
mapping (e.g., permissions: then the minimal key(s) like contents: read) to
satisfy the reviewer.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1b2ae82e-8918-4fd7-89f6-40e1906ed077

📥 Commits

Reviewing files that changed from the base of the PR and between c85cbd8 and 160fd4a.

📒 Files selected for processing (1)
  • .github/workflows/update-graph.yml

galshubeli and others added 2 commits May 13, 2026 14:10
…ssions

Two issues flagged by CodeRabbit + CodeQL on PR #478:

1. concurrency.group used github.ref, which in a pull_request event
   resolves to refs/pull/<N>/merge — a per-PR value. Two PRs merging
   to main simultaneously would have ended up in *different*
   concurrency groups and run in parallel, defeating the queue.
   Server-side CAS in /api/admin/update-graph (FalkorDB/GraphRAG-UI#152)
   would have caught the race, but parallel runs would still cost 2×
   LLM credit for what should be one ingestion. Use
   github.event.pull_request.base.ref so all merges to main share
   update-graph-main and queue properly.

2. The job ran with default GITHUB_TOKEN permissions. The work only
   needs to read repo source (for the git diff); nothing writes back
   to the repo. Added `permissions: { contents: read }`. Closes CodeQL
   alerts #14 + #15 ("workflow does not contain permissions").

No functional change beyond serializing concurrent merges and
restricting the GITHUB_TOKEN scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@Naseem77 Naseem77 left a comment

Blocking: use pull_request.merge_commit_sha instead of github.sha for HEAD_SHA

For pull_request events, github.sha resolves to the temporary refs/pull/<N>/merge test-merge commit GitHub creates for CI, not the commit that actually lands on main. On a closed + merged == true event, the authoritative post-merge commit is github.event.pull_request.merge_commit_sha.

Why this matters here:

  • The whole payload is derived from git diff $BASE_SHA $HEAD_SHA. If HEAD_SHA points at the stale test-merge SHA, the diff can drift from what actually landed on main — especially with squash/rebase merges, or if the PR was
    updated between the last test-merge and the merge button click.
  • Failure mode is silent: the workflow succeeds, the endpoint accepts the payload, and the graph is updated from the wrong file set. No alarm fires.

Fix (one-line):

    - name: Build diff payload
      id: payload
      env:
        BASE_SHA: ${{ github.event.pull_request.base.sha }}
        HEAD_SHA: ${{ github.event.pull_request.merge_commit_sha }}

In a pull_request event, github.sha resolves to the temporary
test-merge commit (refs/pull/<N>/merge) GitHub creates for CI, NOT
the commit that actually lands on main when the merge button is
clicked. Using it for HEAD_SHA in `git diff $BASE_SHA $HEAD_SHA`
would silently corrupt the payload in three concrete scenarios:

  1. Squash merges — the squash commit on main is a different object
     than the test-merge; tree diffs *should* match but corner cases
     exist.
  2. Rebase merges — definitely different commits per rebased PR
     commit.
  3. PRs updated between last test-merge and actual merge (user clicked
     "Update branch" or rebased after CI's last run) — test-merge SHA
     is stale.

Failure mode is silent: workflow succeeds, endpoint accepts the
payload, graph ingests a different file set than what's actually on
main. First symptom would be widget answers referencing files that
don't match live docs.

Authoritative post-merge SHA is github.event.pull_request.merge_commit_sha,
populated on closed+merged events.

One subtle wrinkle worth knowing: pull_request.base.sha reflects
main's tip when the PR was *last updated*, not at *merge time*. If
main moved forward between those points, our diff includes the
interim changes too. Content-hash short-circuit in update_graph.py
makes re-ingesting identical content a no-op, so this is wasted-LLM
not wrong-graph. Tighter fix (use `merge_commit_sha^1` for base) is
a separate refinement that can land later if the edge case ever
matters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@galshubeli
Contributor Author

Confirmed and fixed in 03c58e6. You're right — this would have silently corrupted the graph on the first real merge, especially under squash (which I'd recommended in the merge checklist). Walked through the three scenarios you raised:

  • Squash merges: test-merge ≠ squash commit on main; tree diffs should line up but corner cases exist
  • Rebase merges: definitely different commits per rebased PR commit
  • PR updated between last CI run and merge: test-merge SHA is stale; real merge captures the newer state

All three end with the workflow happily POSTing a payload that doesn't match what's actually on main, no alarm fires. Failure mode is silent until someone notices widget answers citing wrong files.

One-line fix:

       env:
         BASE_SHA: ${{ github.event.pull_request.base.sha }}
-        HEAD_SHA: ${{ github.sha }}
+        HEAD_SHA: ${{ github.event.pull_request.merge_commit_sha }}

One related subtlety worth flagging (in the commit message too, not addressing in this PR):

pull_request.base.sha reflects main's tip when the PR was last updated, not at merge time. If main moves forward between those two points, our diff includes the interim landed changes too. The content-hash short-circuit in update_graph.py makes re-ingesting identical content a no-op, so this is wasted LLM cost rather than wrong-graph corruption.

The tighter fix would be BASE_SHA=$(git rev-parse merge_commit_sha^1) in a shell step, so the base is always the actual parent of the merge commit. Filing that as a v2 refinement — not blocking this PR since worst-case is cost waste on a rare timing.
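
To make the "wasted cost, not wrong graph" argument concrete, here is one way a content-hash short-circuit could work. The real logic in update_graph.py is not shown in this thread, so the SHA-256 choice, the function names, and the digest store are all hypothetical.

```python
import hashlib


def content_digest(text):
    """Stable digest of a file's content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def files_needing_ingest(incoming, known_digests):
    """Return paths whose content differs from what was already ingested.

    `incoming` maps path -> content from the payload; `known_digests`
    maps path -> digest recorded at the last ingestion (hypothetical).
    """
    return sorted(
        path
        for path, content in incoming.items()
        if known_digests.get(path) != content_digest(content)
    )
```

Under this model, interim changes that were already ingested hash to the same digest and drop out, so an over-wide diff only costs the comparison rather than a second ingestion of unchanged files.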

Per product direction: "changes on main" is the source of truth, not
"PR closed." Switching the trigger has a couple of nice second-order
effects:

1. Catches direct pushes to main, not just PR merges. If anyone
   bypasses the PR flow (rare given branch protection, but possible
   for emergency fixes or admin pushes), the graph still updates.

2. The Naseem-flagged ``github.sha`` vs ``merge_commit_sha`` quirk
   goes away. On push events github.sha IS the actual commit on main,
   not a synthetic test-merge commit. The diff payload can use
   github.event.before + github.sha directly with no special-casing.

Changes:

  on:
-   pull_request:
-     types: [closed]
-     branches: [main]
+   push:
+     branches:
+       - main
      paths:
        - "**/*.md"

  jobs:
    update-graph:
-     if: github.event.pull_request.merged == true   # not needed on push

  concurrency:
-   group: update-graph-${{ github.event.pull_request.base.ref }}
+   group: update-graph-${{ github.ref_name }}      # = "main"

  env BASE_SHA:
-   ${{ github.event.pull_request.base.sha }}
+   ${{ github.event.before }}                       # parent of new HEAD

  env HEAD_SHA:
-   ${{ github.event.pull_request.merge_commit_sha }}
+   ${{ github.sha }}                                # new HEAD on main

Added defensive handling for the all-zero ``before`` SHA (carried by
the first push to a brand-new branch) — falls back to diffing against
git's empty-tree object so the workflow doesn't crash if main is ever
recreated.
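
A minimal sketch of that fallback, assuming a SHA-1 repository: the all-zero value is what a push event carries in `before` when the ref did not previously exist, and `4b825dc642cb6eb9a060e54bf8d69288fbee4904` is git's well-known empty-tree object ID. The function name is illustrative and may differ from the merged script.

```python
# Git's empty-tree object ID for SHA-1 repositories.
EMPTY_TREE_SHA = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def normalize_base_sha(base_sha):
    """Map a missing or all-zero base SHA to the empty tree.

    Diffing the empty tree against HEAD lists every file as added, so a
    first push to a recreated branch yields a full payload instead of a
    failed `git diff`.
    """
    cleaned = (base_sha or "").strip()
    if not cleaned or set(cleaned) == {"0"}:
        return EMPTY_TREE_SHA
    return cleaned
```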

Module docstring updated to reflect the new trigger.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@galshubeli
Contributor Author

Heads-up: switched the trigger from pull_request: closed to push: branches: [main] (commit 836fb6d).

Per product direction, "changes on main" is the source of truth rather than "PR closed." Two upsides:

  1. Catches direct pushes to main, not just PR merges. If anyone bypasses the PR flow (rare under branch protection but possible for hotfixes or admin pushes), the graph still updates.
  2. The earlier github.sha vs merge_commit_sha quirk goes away. On push events github.sha IS the actual commit on main — not a synthetic test-merge commit. github.event.before gives the parent. The diff payload can use both directly with no special-casing.

Net change to the YAML:

 on:
-  pull_request:
-    types: [closed]
-    branches: [main]
+  push:
+    branches:
+      - main
     paths:
       - "**/*.md"

 jobs:
   update-graph:
-    if: github.event.pull_request.merged == true   # not needed on push

 concurrency:
-  group: update-graph-${{ github.event.pull_request.base.ref }}
+  group: update-graph-${{ github.ref_name }}       # = "main"

 BASE_SHA: ${{ github.event.before }}                # parent of new HEAD
 HEAD_SHA: ${{ github.sha }}                         # new HEAD on main

Also added defensive handling for the all-zero before SHA (carried by the first push to a newly-created branch) — falls back to diffing against git's empty-tree object so the workflow doesn't crash if main is ever recreated. Edge case only.

paths: ['**/*.md'] filter and concurrency-group serialization are preserved. CodeQL permissions block stays in place.

The previous pull_request-flavored fixes (concurrency target, merge_commit_sha) are obsoleted by this change but kept in history so the review trail is intact.

galshubeli and others added 2 commits May 13, 2026 14:40
The header block + verbose multi-line explanations were editorial,
not load-bearing — anyone reading the workflow can see what it does
from the step names. Kept one-line WHY notes only where the choice
isn't obvious from the code (empty-tree fallback for all-zero
before-SHA, fetch-depth=0 for the diff).

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brief header so a reader skimming the file knows what it does without
reading the steps. No editorial or marketing — just trigger, action,
and where the heavy lifting lives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gkorland gkorland requested a review from Naseem77 May 13, 2026 11:42
Inline heredoc moved to .github/scripts/build_diff_payload.py. The
workflow YAML drops from ~110 lines to ~50; the script gains a
``main()`` + helper functions and is syntax-highlighted / lintable /
unit-testable like normal Python.

``.github/`` is the CI-config corner of the repo, so this preserves
the spirit of "docs repo stays content-only" — Python lives in the
ops directory, not in source/content paths.

No behavior change. ``BASE_SHA`` / ``HEAD_SHA`` / ``GRAPH_ID`` /
``GITHUB_OUTPUT`` are still read from env exactly as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced by a local dry-run against a historical commit pair: the
script was calling ``pathlib.Path(path).read_text()`` to grab the
content of added/modified files. That works in CI because
``actions/checkout`` puts HEAD on disk, but it fails locally when
the working directory isn't at HEAD_SHA (e.g., running the script
against a past commit to debug).

Switched to ``git show <head>:<path>``, which pulls the blob from
the object store regardless of what's checked out. Returns None on
"not present at that commit" and the caller skips silently — same
permissive behavior as the previous ``except FileNotFoundError: pass``
in the rename branch, now applied uniformly.

Verified locally on commit pair 5f90004..3ce7b18 (PR #477 merge):
script produces +1 ~1 -0 with correct file contents from the object
store. CI behavior unchanged because the checked-out HEAD's blobs
match the object-store blobs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a GitHub Actions workflow that, on merges/pushes to main affecting Markdown files, computes the .md file diff and posts the changes to an external GraphRAG-UI admin endpoint to incrementally update the FalkorDB docs knowledge graph.

Changes:

  • Added a push-to-main workflow (filtered to **/*.md) with concurrency control to serialize graph updates.
  • Implemented a Python helper script to compute git diff --name-status between github.event.before and github.sha, read added/modified .md contents, and write payload.json.
  • Added a curl step to call ${GRAPHRAG_UI_URL}/api/admin/update-graph with a bearer token, skipping when there are no ingestable .md changes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| .github/workflows/update-graph.yml | New workflow to trigger incremental graph updates on .md changes to main, build a diff payload, and POST it to GraphRAG-UI. |
| .github/scripts/build_diff_payload.py | New script to compute changed .md files between two SHAs and generate the JSON payload used by the workflow. |

Comment thread .github/workflows/update-graph.yml Outdated
Comment thread .github/workflows/update-graph.yml
Matches the existing convention from ``.github/workflows/spellcheck.yml``
(``actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6``).
Protects against tag squatting and silent action retargeting; addresses
Copilot review comment on PR #478.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/scripts/build_diff_payload.py:
- Around line 31-48: The _read_at function currently returns None on a failing
`git show`, allowing _collect_md_changes to silently skip changed files; change
_read_at (the function named _read_at) to raise a clear exception (e.g.,
RuntimeError or a custom error) when subprocess.run returns non-zero so the
script fails fast when a file cannot be read, and then remove the
now-unnecessary `if content is not None:` guards in _collect_md_changes to rely
on the raised error for unreadable changed files.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5d613284-74f6-4dc9-880c-0413ade905ec

📥 Commits

Reviewing files that changed from the base of the PR and between c7ae19a and b745c17.

📒 Files selected for processing (2)
  • .github/scripts/build_diff_payload.py
  • .github/workflows/update-graph.yml

Comment on lines +31 to +48
def _read_at(head: str, path: str) -> str | None:
    """Read a file's content at a specific commit, regardless of what's
    currently checked out in the working tree.

    Uses ``git show <head>:<path>``, which pulls the blob from the
    object store. Reading from disk via ``pathlib`` would only work if
    the runner had already checked out ``head``; this is more robust
    and lets the script be exercised locally against historical
    commits without checking them out first. Returns None if the path
    doesn't exist at ``head`` (e.g. rare rename edge cases).
    """
    proc = subprocess.run(
        ["git", "show", f"{head}:{path}"],
        capture_output=True, text=True, check=False,
    )
    if proc.returncode != 0:
        return None
    return proc.stdout
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail closed when git show cannot read a changed file.

Returning None here lets _collect_md_changes() silently drop an added/modified Markdown file while the workflow still succeeds. On a rename, that can send deleted=["old.md"] without re-adding new.md, leaving the graph incomplete. This should raise and fail the job instead of emitting a partial payload.

🔧 Proposed fix
-def _read_at(head: str, path: str) -> str | None:
+def _read_at(head: str, path: str) -> str:
     """Read a file's content at a specific commit, regardless of what's
     currently checked out in the working tree.
@@
     proc = subprocess.run(
         ["git", "show", f"{head}:{path}"],
         capture_output=True, text=True, check=False,
     )
     if proc.returncode != 0:
-        return None
+        raise RuntimeError(
+            f"Failed to read {path!r} at {head}: {proc.stderr.strip() or 'git show failed'}"
+        )
     return proc.stdout

After this, the if content is not None: guards in _collect_md_changes() can be removed because unreadable changed files will already fail the step.
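
One way to check the fail-closed behavior without a real repository is to inject the process runner and stub it in a test. The sketch below mirrors the proposed fix but adds a `run` parameter and a stub factory; both are hypothetical additions for illustration, and the stub's stderr text is made up rather than real git output.

```python
import subprocess
from types import SimpleNamespace


def read_at(head, path, run=subprocess.run):
    """Read `path` at commit `head` from the object store, or raise."""
    proc = run(
        ["git", "show", f"{head}:{path}"],
        capture_output=True, text=True, check=False,
    )
    if proc.returncode != 0:
        raise RuntimeError(
            f"Failed to read {path!r} at {head}: "
            f"{proc.stderr.strip() or 'git show failed'}"
        )
    return proc.stdout


def make_stub_runner(blobs):
    """Return a stand-in for subprocess.run backed by a dict of blobs.

    Keys are "<head>:<path>" strings; anything else yields a non-zero
    return code with a placeholder error message.
    """
    def stub_run(argv, **_kwargs):
        spec = argv[-1]
        if spec in blobs:
            return SimpleNamespace(returncode=0, stdout=blobs[spec], stderr="")
        return SimpleNamespace(returncode=1, stdout="", stderr="stub: missing blob")
    return stub_run
```

With the stub, a rename whose new path cannot be read raises before any payload is written, which is the property the review comment asks for.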

🧰 Tools
🪛 Ruff (0.15.12)

[error] 42-42: subprocess call: check for execution of untrusted input

(S603)


[error] 43-43: Starting a process with a partial executable path

(S607)


@galshubeli galshubeli merged commit 62df1a3 into main May 13, 2026
6 checks passed
@galshubeli galshubeli deleted the feat/docs-update-workflow branch May 13, 2026 14:06
4 participants