Add user confirmation before sending code to external embedding APIs isuue #174 by Bakul2006 · Pull Request #179 · tirth8205/code-review-graph

Bakul2006 · 2026-04-09T08:25:51Z

-Added a confirmation prompt before invoking external embedding APIs

Ensures that code is not sent outside the local environment without explicit user consent

Behavior
If user confirms → API call proceeds
If user declines → operation is aborted
In non-interactive environments (e.g., AI agents, CI) → call is blocked by default

Impact
Prevents accidental code exfiltration
Improves transparency and user control
Keeps behavior aligned with expectations of local-first tooling

…ion to allow GOOGLE_API export the data to external layers

Bakul2006 · 2026-04-09T10:47:23Z

This change adds a minimal confirmation step before sending code to external embedding APIs. Happy to adjust the behavior based on feedback.

tirth8205 · 2026-04-11T08:52:40Z

Code Review — Concept valid, implementation has a critical bug

The goal of preventing accidental code exfiltration to cloud embedding APIs is correct. However the implementation has a critical bug and should not be merged as-is.

Critical Bug: `input()` corrupts the MCP stdio protocol

The confirmation prompt in code_review_graph/embeddings.py (or wherever you placed it — the diff shows it in tools/docs.py) calls input(), which writes to stdout. The MCP server uses stdio transport, where stdout is the JSON-RPC channel. Writing a user-facing prompt to stdout corrupts the protocol and breaks any MCP client connected to the server.

Example: Claude Code calling embed_graph via MCP would receive a corrupted response like:

WARNING: This will send codebase content to google APIs. Continue? (yes/no): {"result": ...}

The JSON parser would fail on the prefix.

Correct approach

For non-interactive environments (MCP server, CI, AI agents), the confirmation should refuse by default without any stdout output. In interactive TTY environments, the prompt must go to stderr (not stdout):

def _confirm_cloud_usage(provider_name: str) -> bool:
    """Confirm cloud provider usage. Safe for both MCP and CLI use."""
    if not sys.stdin.isatty():
        return False  # Non-interactive: always refuse
    # Write to stderr so MCP's stdout channel is not corrupted
    sys.stderr.write(
        f"\nWARNING: embed_graph will send codebase content to {provider_name} APIs.\n"
        "Continue? (yes/no): "
    )
    sys.stderr.flush()
    try:
        return sys.stdin.readline().strip().lower() in ("yes", "y")
    except (EOFError, KeyboardInterrupt):
        return False

Other issues

Placement: The confirmation logic belongs in tools/docs.py's embed_graph() function (the MCP tool layer), not inside EmbeddingStore itself. EmbeddingStore is a library class used in tests and scripts — injecting a confirmation prompt there breaks all automated use.
EmbeddingStore.provider.name.startswith("local:"): The check for whether a provider is local uses a string prefix. This is fragile. Better to check the provider type: isinstance(emb_store.provider, LocalProvider) or add an is_local property.
CI/testing: Any test that creates a non-local EmbeddingStore will now hang waiting for user input unless EOFError is raised. Your try/except EOFError handles this, but only if input() raises it — which it does in non-TTY test environments. The sys.stdin.isatty() check is the reliable way to detect this.

What to keep from this PR

The concept: before invoking a cloud embedding provider, check and block in non-interactive environments, warn in interactive ones. The EmbeddingStore wrapping approach in embed_graph() is the right place. The EOFError/KeyboardInterrupt handling is correct.

Please fix the stdout corruption issue and resubmit. The change is small (move to sys.stdin.isatty() + sys.stderr.write()) but critical.

…irmation

Bakul2006 · 2026-04-11T10:26:30Z

Addressed all review feedback and resolved conflicts.
Ready for review.

tirth8205 · 2026-04-11T17:16:01Z

Skipping auto-merge: after updating against current main, CI shows the following problems:

ruff (13 errors) — mostly W293 (blank line whitespace) and F841 (unused variables); the diff wasn't run through ruff check code_review_graph/ before push.

mypy (1 error): code_review_graph/main.py:739: "FastMCP" has no attribute "_local_only" — accessing a private attribute that doesn't exist.

All 4 test jobs (3.10–3.13) pass, so the feature itself looks fine; just needs a lint/type-check pass. Please run ruff check --fix code_review_graph/ and mypy code_review_graph/ --ignore-missing-imports --no-strict-optional and push a new commit, and I'll re-review.

…#174) Closes #174 (supersedes #179 which overlapped with #207 and used input()-on-stdio that would corrupt MCP transport). ``get_provider()`` now prints an explicit warning to **stderr** (never stdout/stdin) before returning a Google Gemini or MiniMax provider: ⚠️ code-review-graph: about to embed code via the 'google' cloud provider. Your source code (function names, docstrings, file paths) will be sent to an external API. To skip this warning in future runs, set CRG_ACCEPT_CLOUD_EMBEDDINGS=1 in your environment. To stay fully offline, use the default 'local' provider instead (no API key needed). Why stderr: the MCP stdio transport uses stdout for JSON-RPC. Writing to stdout or reading from stdin would corrupt the transport. stderr is safe and is captured by Claude Code's MCP logs. Why a warning not a block: the user already explicitly chose a cloud provider (either via ``provider="google"`` arg or via the MiniMax API key + explicit provider selection). Blocking would be paternalistic; warning is informative. The ``CRG_ACCEPT_CLOUD_EMBEDDINGS=1`` env var lets CI / scripted workflows acknowledge once and move on without repeated noise. Tests (4 new in TestCloudProviderWarning): - test_minimax_triggers_stderr_warning — captures stderr, asserts warning content, asserts stdout stays clean - test_google_triggers_stderr_warning — same for google - test_accept_env_var_suppresses_warning — CRG_ACCEPT_CLOUD_EMBEDDINGS=1 silences the warning - test_local_provider_never_warns — offline path stays quiet Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… (supersedes #204 #207 #179) (#228) * docs(troubleshooting): add quick-reference for the 4 most common support patterns These four issues account for the majority of support questions: 1. "Hooks use a matcher + hooks array" — users on pre-v2.2.3 releases. Answer: upgrade to v2.2.4 and re-run `code-review-graph install`. 2. "code-review-graph: command not found" after pip install — PATH issue. Answer: use pipx, uvx, or `python -m code_review_graph`. 3. "Is this project-scoped or user-scoped?" — 4-piece scope table (package is user-scoped, graph+mcp config are project-scoped, multi-repo registry is user-scoped). 4. "Built the graph but Claude Code doesn't see it in a new session" — didn't restart Claude Code / wrong cwd / ran build but not install. The new section is placed at the top of TROUBLESHOOTING.md so it's found first by anyone searching for these error messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Elixir parser support (#112) Closes #112. tree-sitter-elixir is already bundled with tree-sitter-language-pack so no new runtime dep. Elixir's AST is uniform: ``defmodule``, ``def``/``defp``/``defmacro``, ``alias``/``import``/``require``/``use``, and ordinary function invocations all share the ``call`` node type and are distinguished only by the leading identifier text. Added a dedicated ``_extract_elixir_constructs`` helper (following the R and Lua pattern) that dispatches by first-identifier: - ``defmodule Name do ... end`` -> Class node, recurse into do_block with ``enclosing_class=Name`` - ``def`` / ``defp`` / ``defmacro`` / ``defmacrop`` -> Function/Test node attached to the enclosing module, recurse into do_block with ``enclosing_func=<fn>`` so nested calls resolve to qualified names - ``alias`` / ``import`` / ``require`` / ``use`` -> IMPORTS_FROM edge targeting the module name (flat or dotted, e.g. ``Foo.Bar``) - Anything else -> CALLS edge with the leading identifier as the target name, followed by recursion into arguments + do_block for nested calls Helpers added: - ``_elixir_call_identifier`` — handles both plain identifiers and ``dot > alias + identifier`` for ``IO.puts`` style calls - ``_elixir_module_name`` — extracts a module name from an arguments subtree (``alias`` node or ``dot`` node) - ``_elixir_function_name_and_params`` — extracts the function name from the inner call node inside ``def``'s arguments Verified on ``tests/fixtures/sample.ex`` (two modules with 6 functions total across public/private def/defp, plus alias/import/require): - 9 nodes, 16 edges - 2 Class nodes (Calculator, MathHelpers) - 6 Function nodes with correct parent_name - 3 IMPORTS_FROM (alias, import, require) - 5 CALLS with internal resolution: Calculator.compute -> Calculator.add, MathHelpers.double -> Calculator.compute, MathHelpers.triple -> MathHelpers.double, etc. Tests (7 new in TestElixirParsing): - Language detection (.ex + .exs) - Module -> Class mapping - def/defp producing Function nodes with correct parents - alias/import/require producing IMPORTS_FROM - Internal call resolution to qualified names - CONTAINS edges linking module -> functions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: apply_refactor_tool dry-run mode returning unified diffs (#176) Closes #176. The existing rename_preview / apply_refactor flow was technically two-step, but apply_refactor wrote to disk immediately so there was no way to inspect the exact diff before committing. apply_refactor() now takes an optional ``dry_run: bool`` parameter. When true: - Compute the same edit plan as the real write path, grouping edits by file so multiple edits to the same file apply sequentially against updated content (fixes a subtle bug where separate edits on the same file could stomp each other) - Return a unified-diff string per would-be-modified file using ``difflib.unified_diff`` - Leave the refactor_id in the pending cache so the user can review the diff then call again with ``dry_run=False`` to commit The MCP tool wrapper (``apply_refactor_tool`` in main.py + ``apply_refactor_func`` in tools/refactor_tools.py) now exposes the ``dry_run`` parameter. The real-write path was also refactored to share the plan-computation step with dry-run — multi-edit files now work correctly in both modes. Tests (2 new in test_refactor.py): - test_apply_refactor_dry_run_returns_diff_without_writing: confirms the diff contains both old and new names, the file on disk is unchanged, the refactor_id is still valid after the dry-run, and a follow-up real apply with the same id succeeds and consumes the id. - test_apply_refactor_dry_run_no_edits: empty edit list returns an empty diffs dict. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: CRG_DATA_DIR / CRG_REPO_ROOT env vars for out-of-tree graph storage (#155) Closes #155 (supersedes #207 which had unresolved review comments about input()-on-stdio and _local_only fragility). New env vars let users keep the .code-review-graph directory outside their working tree — useful for ephemeral workspaces, Docker volumes, shared CI caches, and multi-repo orchestrators where scripting the CLI from outside the repo is necessary. ## CRG_DATA_DIR When set, replaces the default ``<repo>/.code-review-graph`` directory verbatim. New ``get_data_dir(repo_root)`` helper in incremental.py is the single source of truth — it creates the directory (respecting the override) and writes the auto-generated inner ``.gitignore`` with ``*`` so graph files are never committed. Call sites updated to use ``get_data_dir()``: - ``incremental.py::get_db_path`` (the graph.db location) - ``cli.py::visualize`` (graph.html) - ``cli.py::wiki`` (wiki output dir) - ``tools/docs.py::generate_wiki_func`` (MCP wiki tool) - ``tools/docs.py::get_wiki_page_func`` (MCP wiki lookup) The legacy ``<repo>/.code-review-graph.db`` migration is preserved but only runs when CRG_DATA_DIR is unset (there's no relationship between the legacy location and an override location). ## CRG_REPO_ROOT ``find_project_root()`` now checks ``CRG_REPO_ROOT`` before the usual git-root walk. Useful for anyone scripting ``code-review-graph`` from a cwd that isn't inside the target repo (daemons, CI jobs, wrappers). Falls through to normal resolution if the path doesn't exist. ## Deliberate design choices (vs PR #207) - **No input() / confirmation prompts** — this branch of code runs from the CLI entry and can also be pulled into MCP wrappers which pipe stdio. Reading from stdin there would corrupt JSON-RPC. - **No ``mcp._local_only`` attribute** — that approach was fragile because it touched a private FastMCP attribute. - **Tilde expansion** — both env vars go through ``Path.expanduser()`` so ``CRG_DATA_DIR=~/Library/crg`` works. Tests (5 new in TestDataDir): - test_default_uses_repo_subdir — unset env, default behavior - test_env_override_replaces_repo_subdir — CRG_DATA_DIR works - test_get_db_path_uses_data_dir — graph.db follows the override - test_find_project_root_env_override — CRG_REPO_ROOT works - test_find_project_root_env_override_missing_dir_falls_through — bogus override doesn't crash, falls back to git root Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: stderr warning before cloud embedding providers exfiltrate code (#174) Closes #174 (supersedes #179 which overlapped with #207 and used input()-on-stdio that would corrupt MCP transport). ``get_provider()`` now prints an explicit warning to **stderr** (never stdout/stdin) before returning a Google Gemini or MiniMax provider: ⚠️ code-review-graph: about to embed code via the 'google' cloud provider. Your source code (function names, docstrings, file paths) will be sent to an external API. To skip this warning in future runs, set CRG_ACCEPT_CLOUD_EMBEDDINGS=1 in your environment. To stay fully offline, use the default 'local' provider instead (no API key needed). Why stderr: the MCP stdio transport uses stdout for JSON-RPC. Writing to stdout or reading from stdin would corrupt the transport. stderr is safe and is captured by Claude Code's MCP logs. Why a warning not a block: the user already explicitly chose a cloud provider (either via ``provider="google"`` arg or via the MiniMax API key + explicit provider selection). Blocking would be paternalistic; warning is informative. The ``CRG_ACCEPT_CLOUD_EMBEDDINGS=1`` env var lets CI / scripted workflows acknowledge once and move on without repeated noise. Tests (4 new in TestCloudProviderWarning): - test_minimax_triggers_stderr_warning — captures stderr, asserts warning content, asserts stdout stays clean - test_google_triggers_stderr_warning — same for google - test_accept_env_var_suppresses_warning — CRG_ACCEPT_CLOUD_EMBEDDINGS=1 silences the warning - test_local_provider_never_warns — offline path stays quiet Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

tirth8205 · 2026-04-11T20:46:49Z

Thank you for this — closing as superseded by PR #228, which is a clean reimplementation of your changes on top of current `main` (v2.2.4 + the features from #227). The bundle ships in v2.3.0.

Your original design + issue discussion pushed this forward; really appreciate the work. If there's anything in the reimplementation you think is missing or wrong, please open a follow-up issue and I'll address it quickly.

@implementation

Additive feature release on top of v2.2.4 — new languages, new platform, new env vars, new MCP tool parameters. No breaking changes. New: - #112 Elixir parser (.ex/.exs) — modules, def/defp/defmacro, alias/import/require/use, internal call resolution - #88 Objective-C parser (.m) — @interface/@implementation, message expressions, C-style main, #import - #197 Bash/Shell parser (.sh/.bash/.zsh) — functions, commands, source imports with path resolution - #83 Qwen Code as a supported MCP install platform - #176 apply_refactor_tool dry-run mode (returns unified diffs without writing, keeps refactor_id valid for follow-up) - #155 CRG_DATA_DIR / CRG_REPO_ROOT env vars for out-of-tree graph storage and scripting from outside the repo - #173 install preview + --no-instructions + -y/--yes (supersedes closed PR #204) - #174 cloud embeddings stderr warning + CRG_ACCEPT_CLOUD_EMBEDDINGS opt-out (supersedes closed PR #179) Fixed: - Multi-edit refactor correctness bug (same-file edits could stomp each other) Docs: - TROUBLESHOOTING.md quick-reference section for the 4 most common support questions (hooks schema error, command not found, scoping, "built but not visible") Verified locally on Python 3.11: - ruff clean, mypy clean, bandit clean - 735 tests pass (+18 vs v2.2.4), coverage 74.63% - 6-repo smoke test still passes end-to-end on express/fastapi/flask/ gin/httpx/next.js Supersedes closed PRs: - #204 (install preview, credit @lngyeen) - #207 (CRG_DATA_DIR, credit @yashmewada9618) - #179 (cloud embeddings warning, credit @Bakul2006) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@implementation

Additive feature release on top of v2.2.4 — new languages, new platform, new env vars, new MCP tool parameters. No breaking changes. New: - #112 Elixir parser (.ex/.exs) — modules, def/defp/defmacro, alias/import/require/use, internal call resolution - #88 Objective-C parser (.m) — @interface/@implementation, message expressions, C-style main, #import - #197 Bash/Shell parser (.sh/.bash/.zsh) — functions, commands, source imports with path resolution - #83 Qwen Code as a supported MCP install platform - #176 apply_refactor_tool dry-run mode (returns unified diffs without writing, keeps refactor_id valid for follow-up) - #155 CRG_DATA_DIR / CRG_REPO_ROOT env vars for out-of-tree graph storage and scripting from outside the repo - #173 install preview + --no-instructions + -y/--yes (supersedes closed PR #204) - #174 cloud embeddings stderr warning + CRG_ACCEPT_CLOUD_EMBEDDINGS opt-out (supersedes closed PR #179) Fixed: - Multi-edit refactor correctness bug (same-file edits could stomp each other) Docs: - TROUBLESHOOTING.md quick-reference section for the 4 most common support questions (hooks schema error, command not found, scoping, "built but not visible") Verified locally on Python 3.11: - ruff clean, mypy clean, bandit clean - 735 tests pass (+18 vs v2.2.4), coverage 74.63% - 6-repo smoke test still passes end-to-end on express/fastapi/flask/ gin/httpx/next.js Supersedes closed PRs: - #204 (install preview, credit @lngyeen) - #207 (CRG_DATA_DIR, credit @yashmewada9618) - #179 (cloud embeddings warning, credit @Bakul2006) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Bakul2006 added 4 commits April 9, 2026 00:16

fixing issue tirth8205#148

8a0fa64

Merge pull request #1 from Bakul2006/bakul2006/tirth8205#148

c8c8d07

fixing issue tirth8205#148

making user compatible to use his own directory path

ce6bc8f

added a security layer for the user by adding a human in loop permiss…

9abda53

…ion to allow GOOGLE_API export the data to external layers

Bakul2006 added 2 commits April 11, 2026 09:47

Fix MCP stdout corruption by replacing input() with stderr-based conf…

59633b7

…irmation

Merge branch 'main' into bakul2006/warningAPICall

edc1ab4

Merge branch 'main' into bakul2006/warningAPICall

ee8696b

Merge branch 'main' into bakul2006/warningAPICall

dcd0d6b

tirth8205 mentioned this pull request Apr 11, 2026

feat: qwen-code platform + Objective-C + Bash parser support #227

Merged

5 tasks

tirth8205 mentioned this pull request Apr 11, 2026

feat: Elixir + refactor dry-run + CRG_DATA_DIR + cloud warning + docs (supersedes #204 #207 #179) #228

Merged

5 tasks

tirth8205 closed this Apr 11, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add user confirmation before sending code to external embedding APIs isuue #174#179

Add user confirmation before sending code to external embedding APIs isuue #174#179
Bakul2006 wants to merge 8 commits intotirth8205:mainfrom
Bakul2006:bakul2006/warningAPICall

Bakul2006 commented Apr 9, 2026

Uh oh!

Bakul2006 commented Apr 9, 2026

Uh oh!

tirth8205 commented Apr 11, 2026

Uh oh!

Bakul2006 commented Apr 11, 2026

Uh oh!

tirth8205 commented Apr 11, 2026

Uh oh!

tirth8205 commented Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Bakul2006 commented Apr 9, 2026

Uh oh!

Bakul2006 commented Apr 9, 2026

Uh oh!

tirth8205 commented Apr 11, 2026

Code Review — Concept valid, implementation has a critical bug

Critical Bug: input() corrupts the MCP stdio protocol

Correct approach

Other issues

What to keep from this PR

Uh oh!

Bakul2006 commented Apr 11, 2026

Uh oh!

tirth8205 commented Apr 11, 2026

Uh oh!

tirth8205 commented Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Critical Bug: `input()` corrupts the MCP stdio protocol