ci: incrementally update docs graph on PR merge by galshubeli · Pull Request #478 · FalkorDB/docs

galshubeli · 2026-05-12T12:04:03Z

Summary

Adds a GitHub Actions workflow that triggers the FalkorDB docs knowledge-graph update on every push to main touching any .md file. The workflow itself does only three things: clone the repo, build a JSON diff payload, and POST it to GraphRAG-UI's /api/admin/update-graph endpoint — all SDK ingestion happens server-side.

What the workflow does

Checks out this repo with full history (needed for git diff against github.event.before)
Runs .github/scripts/build_diff_payload.py to compute the .md diff between BASE_SHA (= github.event.before) and HEAD_SHA (= github.sha), read added + modified file contents via git show <head>:<path> (so historical commits work even if the working tree isn't checked out at HEAD), and write payload.json. Emits skip=true when no .md changes remain after filtering.
curl POSTs the payload to ${GRAPHRAG_UI_URL}/api/admin/update-graph with a bearer token. --fail-with-body ensures the action exits non-zero on any non-2xx and surfaces the server's detail message in the log.
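The curl step can be mirrored in Python for local debugging. The sketch below only builds the request and does not send it. The function name `build_update_request`, the `Content-Type` header, the placeholder token, and the exact value shapes inside `files` are assumptions for illustration; the PR only states that the payload carries `graph_id` and `files` with `added`, `modified`, and `deleted`.

```python
import json
import urllib.request


def build_update_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST equivalent to the workflow's curl step."""
    # GRAPHRAG_UI_URL is documented as having no trailing slash; strip one
    # defensively so the joined URL never contains "//api".
    url = base_url.rstrip("/") + "/api/admin/update-graph"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed; not stated in the PR
        },
        method="POST",
    )


req = build_update_request(
    "https://staging.graphrag.falkordb.com",
    "example-token",  # placeholder, never a real secret
    # Hypothetical payload values; only the key names come from the PR.
    {"graph_id": "docs_benchmark", "files": {"added": {}, "modified": {}, "deleted": []}},
)
```

Passing `req` to `urllib.request.urlopen` would raise `urllib.error.HTTPError` on 4xx/5xx responses, which is roughly the role `--fail-with-body` plays for curl.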

The endpoint does the rest: apply_changes + finalize against a UUID-suffixed copy of the live graph, smoke tests against :Graph.questions, atomic alias flip in the org graph, retention=1 cleanup of the previous-previous.

Files added

.github/workflows/update-graph.yml — the workflow (52 lines)
.github/scripts/build_diff_payload.py — Python helper that runs inside the workflow runner; pure data transformation (git diff → JSON), no network calls, no credentials

Setup required before merging

Repo secret (Settings → Secrets and variables → Actions → New repository secret)

UPDATE_GRAPH_TOKEN — shared bearer token. Must match the same env var on the GraphRAG-UI deployment (Railway).

Repo variable (Settings → Secrets and variables → Actions → Variables tab)

GRAPHRAG_UI_URL — base URL of the GraphRAG-UI deployment, e.g. https://staging.graphrag.falkordb.com. No trailing slash.

That's it. No FALKORDB_*, no AZURE_OPENAI_*, no GRAPHRAG_UI_CHECKOUT_PAT. All credentials live on the GraphRAG-UI side.

Test plan

Server-side wet run completed on staging (2026-05-13): manual curl of the same payload shape the workflow produces → endpoint accepted → apply_changes ran with full custom strategy parity (SentenceTokenCapChunking(256, 2), GLiNERExtractor @ 0.75, LLMVerifiedResolution @ 0.95/0.80) → 14-entity schema enforced → 5 smoke-test questions passed → atomic alias flip succeeded → :Graph.active_graph now docs_v6_bc0ca6b1
UPDATE_GRAPH_TOKEN set on this repo and on GraphRAG-UI's Railway env (same value)
GRAPHRAG_UI_URL set as a repo variable
Real PR merge to main exercises the workflow end-to-end (the remaining 10% the wet run didn't cover: trigger firing, secrets resolution, runner-environment Python execution)

Companion PRs

FalkorDB/GraphRAG-UI#152 — alias resolver + /api/admin/update-graph endpoint (merged)
FalkorDB/GraphRAG-UI#156 — SDK 1.1.1 bump + strategy kwarg restore (merged, deployed)
GraphRAG-SDK#250 (feat(api): forward loader/chunker/extractor/resolver from apply_changes to ingest/update) + #251 (Change the FalkorDB logo to point to www.falkordb.com), strategy forwarding in apply_changes, released as v1.1.1 (both merged)

🤖 Generated with Claude Code

Summary by CodeRabbit

Chores
- Added an automated workflow to incrementally update the documentation knowledge graph on pushes to main that include markdown changes.
- Serializes concurrent updates per branch, skips runs when no doc changes are detected, and posts a generated payload to the graph update endpoint, failing the run if the remote update does not succeed.

On PR merge to main touching any .md file, this workflow runs the generic incremental update from FalkorDB/GraphRAG-UI:

    python -m server.scripts.update_graph --graph-id docs_benchmark ...

It checks out this repo (the docs content) AND GraphRAG-UI (where the Python lives).

The action's Python is source-agnostic: it reads ingestion config from the :Graph node in the org graph, not from this workflow. The only docs-specific input is `--graph-id docs_benchmark`; the rest (LLM/embedder/chunker/extractor/resolver/globs/skip_list/smoke-test questions) is data in the org graph. Future user-created widgets share the same code path with no workflow change.

Concurrency group serializes runs on `main` so two PRs merging within seconds queue rather than race. cancel-in-progress: false because each run costs LLM credit.

Secrets required on this repo before the first run:
- FALKORDB_HOST, FALKORDB_PORT, FALKORDB_PASSWORD
- AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT
- GRAPHRAG_UI_CHECKOUT_PAT (only while FalkorDB/GraphRAG-UI is private; drop the `token:` line when it becomes public)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-12T12:04:20Z

📝 Walkthrough

Triggers incremental graph updates on pushes to main that modify Markdown files; runs a new script to build payload.json (added/modified/deleted .md), and conditionally POSTs that payload to GraphRAG-UI’s /api/admin/update-graph endpoint with bearer auth.

Changes

Graph update automation

Layer / File(s)	Summary
Workflow trigger and concurrency `.github/workflows/update-graph.yml`	Workflow now triggers on `push` to `main` filtered to `**/*.md` and uses concurrency keyed by `github.ref_name`.
Job env, permissions, and repo checkout `.github/workflows/update-graph.yml`	Defines `update-graph` job runtime, `GRAPH_ID` and `GRAPHRAG_UI_URL` env vars, permissions, timeout, and checks out the repo with `fetch-depth: 0`.
Build diff payload (script + workflow step) `.github/scripts/build_diff_payload.py`, `.github/workflows/update-graph.yml`	Adds `build_diff_payload.py` that normalizes empty base SHA, runs `git diff --name-status` between BASE_SHA and HEAD_SHA, reads added/modified `.md` contents from HEAD, records deleted paths (treats renames as delete+add), writes `payload.json`, and sets `skip` output; workflow invokes this script with BASE_SHA/HEAD_SHA.
POST payload to GraphRAG-UI admin endpoint `.github/workflows/update-graph.yml`	Conditional `curl` POST of `payload.json` to `$GRAPHRAG_UI_URL/api/admin/update-graph` using `Authorization: Bearer ${{ secrets.UPDATE_GRAPH_TOKEN }}`, with `--fail-with-body`, `--show-error`, and `--max-time 1800`, gated by `steps.payload.outputs.skip != 'true'`.
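The classification step summarized in the table can be sketched as a pure function over `git diff --name-status` output. This is an illustrative reconstruction, not the script from the PR: the name `classify_md_changes` is hypothetical, and ignoring statuses other than A/M/D/R is an assumption.

```python
def classify_md_changes(name_status: str) -> dict[str, list[str]]:
    """Sort `git diff --name-status` lines into added/modified/deleted .md paths."""
    changes: dict[str, list[str]] = {"added": [], "modified": [], "deleted": []}
    for line in name_status.splitlines():
        # Fields are tab-separated: STATUS<TAB>path, or STATUS<TAB>old<TAB>new for renames.
        status, *paths = line.split("\t")
        if not status or not paths:
            continue
        kind = status[0]  # rename statuses carry a similarity score, e.g. "R100"
        if kind == "R" and len(paths) == 2:
            old, new = paths
            # A rename is treated as delete(old) + add(new).
            if old.endswith(".md"):
                changes["deleted"].append(old)
            if new.endswith(".md"):
                changes["added"].append(new)
        elif not paths[0].endswith(".md"):
            continue  # re-filter non-Markdown paths even though the trigger filters too
        elif kind == "A":
            changes["added"].append(paths[0])
        elif kind == "M":
            changes["modified"].append(paths[0])
        elif kind == "D":
            changes["deleted"].append(paths[0])
    return changes


sample = (
    "A\tdocs/new.md\n"
    "M\tREADME.md\n"
    "D\tdocs/old.md\n"
    "R100\tdocs/a.md\tdocs/b.md\n"
    "M\tsite.css\n"
)
result = classify_md_changes(sample)
```

Keeping the parsing separate from the `git` subprocess call makes the rename handling easy to unit-test with canned input.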

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit scurries through the docs at night,
Reads each changed page by lantern light,
Packs added lines and remembers what’s fled,
Sends a tidy JSON to wake the thread,
Then twitches whiskers — the graph hums bright. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The title describes a workflow that updates the docs graph on PR merge, but the actual implementation triggers on pushes to main (not PR merge), which contradicts the stated title.	Update the title to reflect the actual trigger mechanism, e.g., 'ci: incrementally update docs graph on push to main' or similar.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

.github/workflows/update-graph.yml (1)

57-62: ⚡ Quick win

Consider caching pip dependencies.

Adding pip caching would speed up subsequent runs and reduce load on PyPI.

📦 Proposed addition of pip caching

   - uses: actions/setup-python@v5
     with:
       python-version: "3.12"
+      cache: 'pip'
+      cache-dependency-path: 'graphrag-ui/server/requirements.txt'
 
   - name: Install GraphRAG-UI server deps
     run: pip install -r graphrag-ui/server/requirements.txt

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/update-graph.yml around lines 57 - 62, Add a pip cache
step before the "Install GraphRAG-UI server deps" step to persist ~/.cache/pip
across runs: introduce an actions/cache@v4 (or latest) step keyed by the
requirements file hash (e.g. key: ${{ runner.os }}-pip-${{
hashFiles('graphrag-ui/server/requirements.txt') }}) with path: ~/.cache/pip and
an appropriate restore-keys entry, then leave the "Install GraphRAG-UI server
deps" run: pip install -r graphrag-ui/server/requirements.txt step as-is so
installs use the cached wheel/archive files.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/update-graph.yml:
- Around line 32-34: The concurrency group currently uses "group:
update-graph-${{ github.ref }}" which resolves to the PR-specific ref and
prevents PRs targeting the same branch from queuing; change the group to use the
PR target branch instead (e.g., "group: update-graph-${{
github.event.pull_request.base.ref }}" or use the shorthand "group:
update-graph-${{ github.base_ref }}"), or hardcode the target like "group:
update-graph-main" if the target is always main; update the "concurrency" block
replacing the existing group expression so all merges to the same target share
the same concurrency group.
- Around line 36-83: Add an explicit permissions block to the update-graph job
to restrict the GITHUB_TOKEN to least privilege; update the job named
"update-graph" (the job containing the actions/checkout steps and the python
update_graph run) to include a permissions mapping such as only allowing
contents: read (and any additional narrowly-scoped permissions you actually
need, e.g., checks: write or statuses: write if you must post statuses), rather
than relying on the default token permissions—adjust the permissions entries to
the minimal set required by the steps (checkout, reading repo contents, and any
status updates).
- Line 43: Replace the outdated action versions referenced as uses:
actions/checkout@v4 and uses: actions/setup-python@v5 with the specified stable
releases: set actions/checkout to v6.0.2 and actions/setup-python to v6.2.0;
update those two uses: lines in the workflow so CI uses the new versions and
run/validate the workflow to confirm no breaking changes.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6c11c11b-b8b0-4fc5-8afb-95fb0dbaab26

📥 Commits

Reviewing files that changed from the base of the PR and between 5f90004 and c85cbd8.

📒 Files selected for processing (1)

.github/workflows/update-graph.yml

The previous version of this workflow:
- Checked out FalkorDB/GraphRAG-UI (needed a PAT secret)
- pip-installed graphrag-sdk + dev deps
- Ran ``python -m server.scripts.update_graph`` locally on the runner
- Required FALKORDB_HOST/PORT/PASSWORD and AZURE_OPENAI_API_KEY/ENDPOINT/DEPLOYMENT secrets

GraphRAG-UI now exposes /api/admin/update-graph that does all that work server-side using its existing credentials. This workflow drops to:
1. Checkout docs (this repo) with full history
2. Inline Python: parse ``git diff``, read .md content for added+modified, build a JSON payload with {graph_id, files:{added,modified,deleted}}
3. curl POST the payload with a bearer token

Secrets required on this repo, total:
- ``UPDATE_GRAPH_TOKEN``: shared bearer token for the endpoint

Repo/environment variable required:
- ``GRAPHRAG_UI_URL``: base URL of the GraphRAG-UI deployment (e.g., https://api.staging.../ or https://api.prod.../)

What's gone vs. the previous version:
- FALKORDB_HOST / FALKORDB_PORT / FALKORDB_PASSWORD
- AZURE_OPENAI_API_KEY / AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_DEPLOYMENT
- GRAPHRAG_UI_CHECKOUT_PAT
- The whole graphrag-ui sibling checkout and pip-install steps

Behavior notes:
- The diff payload includes only .md files (path filter on the trigger catches non-.md PRs; the inline Python also re-filters for safety).
- Renames are split into (delete old) + (add new with current content).
- If the post-filter diff is empty, the workflow exits clean before the POST. The endpoint also short-circuits on empty diff but skipping saves one round-trip + the bearer-token cost.
- curl --fail-with-body bubbles HTTP non-2xx (400 bad path, 401 wrong token, 422 smoke fail, 500 server config issue) up as CI failures with the server's detail message in the output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
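The "exits clean before the POST" behavior relies on a step output that later steps can test. A minimal sketch of that mechanism follows, assuming the script appends a `skip=<value>` line to the file named by the `GITHUB_OUTPUT` environment variable. The helper name `write_skip_output` is hypothetical, and writing an explicit `skip=false` is an assumption; the PR only confirms that `skip=true` is emitted and that the POST is gated on `skip != 'true'`.

```python
import os
import tempfile


def write_skip_output(output_path: str, changes: dict) -> bool:
    """Append skip=true/false to the step-output file and return the flag."""
    # Skip only when all three buckets are empty after filtering.
    skip = not any(changes.get(key) for key in ("added", "modified", "deleted"))
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"skip={'true' if skip else 'false'}\n")
    return skip


# Local demonstration: a temp file stands in for the path GitHub Actions
# provides through the GITHUB_OUTPUT environment variable.
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "github_output")
    skipped = write_skip_output(out, {"added": {}, "modified": {}, "deleted": []})
    with open(out, encoding="utf-8") as fh:
        written = fh.read()
```

In the workflow, a later step would then read the value as `steps.payload.outputs.skip`.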

coderabbitai

♻️ Duplicate comments (1)

.github/workflows/update-graph.yml (1)

31-35: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add an explicit least-privilege permissions block for this job.

GITHUB_TOKEN permissions are currently implicit. Lock this down to the minimum required scope.

🔒 Minimal fix

 jobs:
   update-graph:
     if: github.event.pull_request.merged == true
     runs-on: ubuntu-latest
     timeout-minutes: 30
+    permissions:
+      contents: read
     env:

#!/bin/bash
# Verify whether an explicit permissions block exists in this workflow.
rg -n '^\s*permissions:\s*$|^\s*contents:\s*read\s*$' .github/workflows/update-graph.yml

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/update-graph.yml around lines 31 - 35, The job
"update-graph" currently relies on implicit GITHUB_TOKEN permissions; add an
explicit least-privilege permissions block under the update-graph job to lock
down the token (e.g., add a permissions section with contents: read or whatever
minimal scopes the job actually needs) so the workflow no longer uses implicit
full permissions; update the job named update-graph to include the permissions
mapping (e.g., permissions: then the minimal key(s) like contents: read) to
satisfy the reviewer.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1b2ae82e-8918-4fd7-89f6-40e1906ed077

📥 Commits

Reviewing files that changed from the base of the PR and between c85cbd8 and 160fd4a.

📒 Files selected for processing (1)

.github/workflows/update-graph.yml

…ssions

Two issues flagged by CodeRabbit + CodeQL on PR #478:

1. concurrency.group used github.ref, which in a pull_request event resolves to refs/pull/<N>/merge, a per-PR value. Two PRs merging to main simultaneously would have ended up in *different* concurrency groups and run in parallel, defeating the queue. Server-side CAS in /api/admin/update-graph (FalkorDB/GraphRAG-UI#152) would have caught the race, but parallel runs would still cost 2× LLM credit for what should be one ingestion. Use github.event.pull_request.base.ref so all merges to main share update-graph-main and queue properly.
2. The job ran with default GITHUB_TOKEN permissions. The work only needs to read repo source (for the git diff); nothing writes back to the repo. Added `permissions: { contents: read }`. Closes CodeQL alerts #14 + #15 ("workflow does not contain permissions").

No functional change beyond serializing concurrent merges and restricting the GITHUB_TOKEN scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Naseem77

Blocking: use pull_request.merge_commit_sha instead of github.sha for HEAD_SHA

For pull_request events, github.sha resolves to the temporary refs/pull/<N>/merge test-merge commit GitHub creates for CI, not the commit that actually lands on main. On a closed + merged == true event, the authoritative post-merge commit is github.event.pull_request.merge_commit_sha.

Why this matters here:

The whole payload is derived from git diff $BASE_SHA $HEAD_SHA. If HEAD_SHA points at the stale test-merge SHA, the diff can drift from what actually landed on main, especially with squash/rebase merges, or if the PR was updated between the last test-merge and the merge button click.
Failure mode is silent: the workflow succeeds, the endpoint accepts the payload, and the graph is updated from the wrong file set. No alarm fires.

Fix (one-line):

- name: Build diff payload
  id: payload
  env:
    BASE_SHA: ${{ github.event.pull_request.base.sha }}
    HEAD_SHA: ${{ github.event.pull_request.merge_commit_sha }}

In a pull_request event, github.sha resolves to the temporary test-merge commit (refs/pull/<N>/merge) GitHub creates for CI, NOT the commit that actually lands on main when the merge button is clicked. Using it for HEAD_SHA in `git diff $BASE_SHA $HEAD_SHA` would silently corrupt the payload in three concrete scenarios:

1. Squash merges: the squash commit on main is a different object than the test-merge; tree diffs *should* match but corner cases exist.
2. Rebase merges: definitely different commits per rebased PR commit.
3. PRs updated between last test-merge and actual merge (user clicked "Update branch" or rebased after CI's last run): test-merge SHA is stale.

Failure mode is silent: workflow succeeds, endpoint accepts the payload, graph ingests a different file set than what's actually on main. First symptom would be widget answers referencing files that don't match live docs.

Authoritative post-merge SHA is github.event.pull_request.merge_commit_sha, populated on closed+merged events.

One subtle wrinkle worth knowing: pull_request.base.sha reflects main's tip when the PR was *last updated*, not at *merge time*. If main moved forward between those points, our diff includes the interim changes too. Content-hash short-circuit in update_graph.py makes re-ingesting identical content a no-op, so this is wasted-LLM not wrong-graph. Tighter fix (use `merge_commit_sha^1` for base) is a separate refinement that can land later if the edge case ever matters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

galshubeli · 2026-05-13T11:30:45Z

Confirmed and fixed in 03c58e6. You're right — this would have silently corrupted the graph on the first real merge, especially under squash (which I'd recommended in the merge checklist). Walked through the three scenarios you raised:

Squash merges: test-merge ≠ squash commit on main; tree diffs should line up but corner cases exist
Rebase merges: definitely different commits per rebased PR commit
PR updated between last CI run and merge: test-merge SHA is stale; real merge captures the newer state

All three end with the workflow happily POSTing a payload that doesn't match what's actually on main, no alarm fires. Failure mode is silent until someone notices widget answers citing wrong files.

One-line fix:

       env:
         BASE_SHA: ${{ github.event.pull_request.base.sha }}
-        HEAD_SHA: ${{ github.sha }}
+        HEAD_SHA: ${{ github.event.pull_request.merge_commit_sha }}

One related subtlety worth flagging (in the commit message too, not addressing in this PR):

pull_request.base.sha reflects main's tip when the PR was last updated, not at merge time. If main moves forward between those two points, our diff includes the interim landed changes too. The content-hash short-circuit in update_graph.py makes re-ingesting identical content a no-op, so this is wasted LLM cost rather than wrong-graph corruption.

The tighter fix would be BASE_SHA=$(git rev-parse merge_commit_sha^1) in a shell step, so the base is always the actual parent of the merge commit. Filing that as a v2 refinement — not blocking this PR since worst-case is cost waste on a rare timing.
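For reference, the `merge_commit_sha^1` idea could look roughly like the following in Python. This is a sketch under stated assumptions: the helper name `first_parent` and the injectable `run` parameter are hypothetical (the latter exists only so the function can be tested without a repository), and the thread itself proposes doing this in a shell step.

```python
import subprocess


def first_parent(merge_commit_sha: str, run=subprocess.run) -> str:
    """Resolve the first parent of a commit via `git rev-parse <sha>^1`."""
    proc = run(
        ["git", "rev-parse", "--verify", f"{merge_commit_sha}^1"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the revision cannot be resolved
    )
    return proc.stdout.strip()
```

One caveat with this approach: for a rebase merge that lands several commits, `^1` of the final commit is the preceding rebased commit rather than main's previous tip, so the resulting diff would cover only the last commit.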

Per product direction: "changes on main" is the source of truth, not "PR closed." Switching the trigger has a couple of nice second-order effects:

1. Catches direct pushes to main, not just PR merges. If anyone bypasses the PR flow (rare given branch protection, but possible for emergency fixes or admin pushes), the graph still updates.
2. The Naseem-flagged ``github.sha`` vs ``merge_commit_sha`` quirk goes away. On push events github.sha IS the actual commit on main, not a synthetic test-merge commit. The diff payload can use github.event.before + github.sha directly with no special-casing.

Changes:

 on:
-  pull_request:
-    types: [closed]
-    branches: [main]
+  push:
+    branches:
+      - main
     paths:
       - "**/*.md"

 jobs:
   update-graph:
-    if: github.event.pull_request.merged == true   # not needed on push

 concurrency:
-  group: update-graph-${{ github.event.pull_request.base.ref }}
+  group: update-graph-${{ github.ref_name }}       # = "main"

 env BASE_SHA:
-  ${{ github.event.pull_request.base.sha }}
+  ${{ github.event.before }}                        # parent of new HEAD
 env HEAD_SHA:
-  ${{ github.event.pull_request.merge_commit_sha }}
+  ${{ github.sha }}                                 # new HEAD on main

Added defensive handling for the all-zero ``before`` SHA (carried by the first push to a brand-new branch): falls back to diffing against git's empty-tree object so the workflow doesn't crash if main is ever recreated.

Module docstring updated to reflect the new trigger.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

galshubeli · 2026-05-13T11:39:15Z

Heads-up: switched the trigger from pull_request: closed to push: branches: [main] (commit 836fb6d).

Per product direction, "changes on main" is the source of truth rather than "PR closed." Two upsides:

Catches direct pushes to main, not just PR merges. If anyone bypasses the PR flow (rare under branch protection but possible for hotfixes or admin pushes), the graph still updates.
The earlier github.sha vs merge_commit_sha quirk goes away. On push events github.sha IS the actual commit on main, not a synthetic test-merge commit. github.event.before gives the tip of main before the push, which is the parent of the new HEAD when the push adds a single commit. The diff payload can use both directly with no special-casing.

Net change to the YAML:

 on:
-  pull_request:
-    types: [closed]
-    branches: [main]
+  push:
+    branches:
+      - main
     paths:
       - "**/*.md"

 jobs:
   update-graph:
-    if: github.event.pull_request.merged == true   # not needed on push

 concurrency:
-  group: update-graph-${{ github.event.pull_request.base.ref }}
+  group: update-graph-${{ github.ref_name }}       # = "main"

 BASE_SHA: ${{ github.event.before }}                # parent of new HEAD
 HEAD_SHA: ${{ github.sha }}                         # new HEAD on main

Also added defensive handling for the all-zero before SHA (carried by the first push to a newly-created branch) — falls back to diffing against git's empty-tree object so the workflow doesn't crash if main is ever recreated. Edge case only.
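The all-zero fallback can be illustrated with a small helper. This is a sketch, not the code from the PR: the names `normalize_base`, `EMPTY_TREE`, and `ZERO_SHA` are hypothetical, and treating an empty or missing value the same as the all-zero SHA is an assumption. The empty-tree hash shown is git's well-known value for SHA-1 repositories.

```python
from typing import Optional

# Git's well-known empty-tree object ID (SHA-1 repositories).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# The "before" value a push event carries when the ref did not exist yet.
ZERO_SHA = "0" * 40


def normalize_base(base_sha: Optional[str]) -> str:
    """Return a diffable base: the empty tree when there is no real 'before' commit."""
    if not base_sha or base_sha == ZERO_SHA:
        return EMPTY_TREE
    return base_sha


base_for_new_branch = normalize_base(ZERO_SHA)
base_for_normal_push = normalize_base("5f90004")
```

Diffing the empty tree against HEAD reports every tracked file as added, so a recreated main would produce a payload containing all Markdown files instead of failing.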

paths: ['**/*.md'] filter and concurrency-group serialization are preserved. CodeQL permissions block stays in place.

The previous pull_request-flavored fixes (concurrency target, merge_commit_sha) are obsoleted by this change but kept in history so the review trail is intact.

The header block + verbose multi-line explanations were editorial, not load-bearing — anyone reading the workflow can see what it does from the step names. Kept one-line WHY notes only where the choice isn't obvious from the code (empty-tree fallback for all-zero before-SHA, fetch-depth=0 for the diff). No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Brief header so a reader skimming the file knows what it does without reading the steps. No editorial or marketing — just trigger, action, and where the heavy lifting lives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Inline heredoc moved to .github/scripts/build_diff_payload.py. The workflow YAML drops from ~110 lines to ~50; the script gains a ``main()`` + helper functions and is syntax-highlighted / lintable / unit-testable like normal Python.

``.github/`` is the CI-config corner of the repo, so this preserves the spirit of "docs repo stays content-only": Python lives in the ops directory, not in source/content paths.

No behavior change. ``BASE_SHA`` / ``HEAD_SHA`` / ``GRAPH_ID`` / ``GITHUB_OUTPUT`` are still read from env exactly as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Surfaced by a local dry-run against a historical commit pair: the script was calling ``pathlib.Path(path).read_text()`` to grab the content of added/modified files. That works in CI because ``actions/checkout`` puts HEAD on disk, but it fails locally when the working directory isn't at HEAD_SHA (e.g., running the script against a past commit to debug).

Switched to ``git show <head>:<path>``, which pulls the blob from the object store regardless of what's checked out. Returns None on "not present at that commit" and the caller skips silently, the same permissive behavior as the previous ``except FileNotFoundError: pass`` in the rename branch, now applied uniformly.

Verified locally on commit pair 5f90004..3ce7b18 (PR #477 merge): script produces +1 ~1 -0 with correct file contents from the object store. CI behavior unchanged because the checked-out HEAD's blobs match the object-store blobs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds a GitHub Actions workflow that, on merges/pushes to main affecting Markdown files, computes the .md file diff and posts the changes to an external GraphRAG-UI admin endpoint to incrementally update the FalkorDB docs knowledge graph.

Changes:

Added a push-to-main workflow (filtered to **/*.md) with concurrency control to serialize graph updates.
Implemented a Python helper script to compute git diff --name-status between github.event.before and github.sha, read added/modified .md contents, and write payload.json.
Added a curl step to call ${GRAPHRAG_UI_URL}/api/admin/update-graph with a bearer token, skipping when there are no ingestable .md changes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`.github/workflows/update-graph.yml`	New workflow to trigger incremental graph updates on `.md` changes to `main`, build a diff payload, and POST it to GraphRAG-UI.
`.github/scripts/build_diff_payload.py`	New script to compute changed `.md` files between two SHAs and generate the JSON payload used by the workflow.


Matches the existing convention from ``.github/workflows/spellcheck.yml`` (``actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6``). Protects against tag squatting and silent action retargeting; addresses Copilot review comment on PR #478. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/scripts/build_diff_payload.py:
- Around line 31-48: The _read_at function currently returns None on a failing
`git show`, allowing _collect_md_changes to silently skip changed files; change
_read_at (the function named _read_at) to raise a clear exception (e.g.,
RuntimeError or a custom error) when subprocess.run returns non-zero so the
script fails fast when a file cannot be read, and then remove the
now-unnecessary `if content is not None:` guards in _collect_md_changes to rely
on the raised error for unreadable changed files.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5d613284-74f6-4dc9-880c-0413ade905ec

📥 Commits

Reviewing files that changed from the base of the PR and between c7ae19a and b745c17.

📒 Files selected for processing (2)

.github/scripts/build_diff_payload.py
.github/workflows/update-graph.yml

coderabbitai · 2026-05-13T13:57:16Z

+def _read_at(head: str, path: str) -> str | None:
+    """Read a file's content at a specific commit, regardless of what's
+    currently checked out in the working tree.
+
+    Uses ``git show <head>:<path>``, which pulls the blob from the
+    object store. Reading from disk via ``pathlib`` would only work if
+    the runner had already checked out ``head``; this is more robust
+    and lets the script be exercised locally against historical
+    commits without checking them out first. Returns None if the path
+    doesn't exist at ``head`` (e.g. rare rename edge cases).
+    """
+    proc = subprocess.run(
+        ["git", "show", f"{head}:{path}"],
+        capture_output=True, text=True, check=False,
+    )
+    if proc.returncode != 0:
+        return None
+    return proc.stdout


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail closed when git show cannot read a changed file.

Returning None here lets _collect_md_changes() silently drop an added/modified Markdown file while the workflow still succeeds. On a rename, that can send deleted=["old.md"] without re-adding new.md, leaving the graph incomplete. This should raise and fail the job instead of emitting a partial payload.

🔧 Proposed fix

-def _read_at(head: str, path: str) -> str | None:
+def _read_at(head: str, path: str) -> str:
     """Read a file's content at a specific commit, regardless of what's
     currently checked out in the working tree.
@@
     proc = subprocess.run(
         ["git", "show", f"{head}:{path}"],
         capture_output=True, text=True, check=False,
     )
     if proc.returncode != 0:
-        return None
+        raise RuntimeError(
+            f"Failed to read {path!r} at {head}: {proc.stderr.strip() or 'git show failed'}"
+        )
     return proc.stdout

After this, the if content is not None: guards in _collect_md_changes() can be removed because unreadable changed files will already fail the step.

🧰 Tools

🪛 Ruff (0.15.12)

[error] 42-42: subprocess call: check for execution of untrusted input

(S603)

[error] 43-43: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/scripts/build_diff_payload.py around lines 31 - 48, The _read_at function currently returns None on a failing `git show`, allowing _collect_md_changes to silently skip changed files; change _read_at (the function named _read_at) to raise a clear exception (e.g., RuntimeError or a custom error) when subprocess.run returns non-zero so the script fails fast when a file cannot be read, and then remove the now-unnecessary `if content is not None:` guards in _collect_md_changes to rely on the raised error for unreadable changed files.

github-advanced-security AI found potential problems May 12, 2026

Comment thread .github/workflows/update-graph.yml Fixed

coderabbitai Bot reviewed May 12, 2026

Comment thread .github/workflows/update-graph.yml

Comment thread .github/workflows/update-graph.yml Outdated

Comment thread .github/workflows/update-graph.yml Outdated

github-advanced-security AI found potential problems May 12, 2026

Comment thread .github/workflows/update-graph.yml Fixed

coderabbitai Bot reviewed May 12, 2026

galshubeli and others added 2 commits May 13, 2026 14:10

Merge branch 'main' into feat/docs-update-workflow

fa72491

Naseem77 requested changes May 13, 2026

galshubeli and others added 2 commits May 13, 2026 14:40

gkorland requested a review from Naseem77 May 13, 2026 11:42

gkorland requested a review from Copilot May 13, 2026 11:54

Copilot started reviewing on behalf of gkorland May 13, 2026 11:54

Copilot AI reviewed May 13, 2026

Comment thread .github/workflows/update-graph.yml Outdated

Comment thread .github/workflows/update-graph.yml

Naseem77 approved these changes May 13, 2026

coderabbitai Bot reviewed May 13, 2026

galshubeli merged commit 62df1a3 into main May 13, 2026
6 checks passed

galshubeli deleted the feat/docs-update-workflow branch May 13, 2026 14:06

galshubeli mentioned this pull request May 13, 2026

docs(genai-tools): rewrite GraphRAG-SDK page for v1.1.x API #479

Merged

3 tasks

4 participants
