Commit 8b020ed

Feedback from jgravelle
Feedback #1: Rebase required
"Large PRs need to be rebased onto current main before merging."

How addressed: The fork's main was force-reset to match jgravelle/jcodemunch-mcp:main at 971ae14. The feat/dbt-sql-support branch (SQL language support) was cherry-picked cleanly onto upstream main as a separate branch. This branch (feat-provider-context-encrichment) is also based on 971ae14.

---

Feedback #2: No opt-out path for context providers
"discover_providers() runs on every index_folder call... no way to disable this."

How addressed:
- src/jcodemunch_mcp/tools/index_folder.py: Added a context_providers: bool = True parameter to index_folder(). When it is False, or when JCODEMUNCH_CONTEXT_PROVIDERS=0 is set in the environment, provider discovery is skipped entirely: no YAML parsing, no doc block scanning, no overhead. Renamed the internal variable from context_providers to active_providers to avoid shadowing the new parameter.
- CONTEXT_PROVIDERS.md: Added a new "Disabling Context Providers" subsection documenting both the env var and the per-call parameter, including an MCP server config JSON example.
- README.md: Added a JCODEMUNCH_CONTEXT_PROVIDERS row to the environment variables table.
- USER_GUIDE.md: Added a bullet point for the env var in the Claude Desktop setup section.

---

Feedback #3: detect() sets instance state as a side effect
"If load() is called without detect(), it raises AttributeError."

How addressed:
- src/jcodemunch_mcp/parser/context/dbt.py: Initialized self._dbt_yml_path: Optional[Path] = None in __init__(). Added a guard at the top of load() that logs a warning and returns early if _dbt_yml_path is None.

---

Feedback #4: File context lookup by stem could false-match
"A file named schema.sql would match a dbt model named schema."

How addressed:
- src/jcodemunch_mcp/parser/context/dbt.py: Added a _model_path_prefixes list, populated during load() with the relative paths of the project's configured model-paths directories. Added an _is_in_model_path() method. get_file_context() now returns None immediately for files outside model directories, preventing false matches on files like scripts/schema.sql or schema.sql at the project root.
- tests/test_dbt_provider.py: Added test_get_file_context_outside_model_path, which verifies that files outside models/ don't match even when the stem matches a model name (schema.sql, my_model.sql, scripts/my_model.sql), while files inside models/ still match correctly. Updated test_get_file_context_by_stem to reflect the new path-scoped behavior.
- CONTEXT_PROVIDERS.md: Added a "How It Matches Files" subsection to the dbt Provider docs explaining the stem + path scoping strategy, with examples. Updated the Terraform example's get_file_context comment to recommend path validation before stem matching.

---

Feedback #5: bash.exe.stackdump in .gitignore
"Windows build artifact, not a project file."

How addressed:
- .gitignore: Removed the bash.exe.stackdump line.

---

Feedback #6: SQL_SPEC empty dicts
"Worth adding a pragma or docstring so future contributors don't try to fix it."

How addressed: Already addressed in the separate feat/dbt-sql-support branch. The SQL_SPEC in languages.py has a multi-line comment explaining that the derekstride grammar has no named field accessors and pointing to _parse_sql_symbols() in extractor.py, where the actual extraction logic lives. No change needed on this branch.

---

Test results
- 481 passed, 4 skipped, 0 failures (one new test added)
1 parent 08d9028 commit 8b020ed

8 files changed (+189, -17 lines)


.gitignore

Lines changed: 0 additions & 1 deletion

```diff
@@ -60,4 +60,3 @@ CLAUDE.md
 .vbw-planning

 .vbw-planning/
-bash.exe.stackdump
```

CONTEXT_PROVIDERS.md

Lines changed: 49 additions & 1 deletion

````diff
@@ -60,6 +60,17 @@ models:

 Doc references (`{{ doc('name') }}`) are resolved automatically.

+### How It Matches Files
+
+The provider matches indexed files to dbt models by **file stem** (filename without extension), but only for files within the project's configured `model-paths` directories. This prevents false matches — for example, a `scripts/schema.sql` file will not be matched to a dbt model named `schema`, but `models/schema.sql` will.
+
+```
+models/fct_daily_revenue.sql ✓ matches model "fct_daily_revenue"
+models/staging/fct_daily_revenue.sql ✓ matches (subdirectories OK)
+scripts/fct_daily_revenue.sql ✗ outside model-paths
+schema.sql ✗ outside model-paths
+```
+
 ### How It Enriches

 **Symbol `ecosystem_context`** (injected into AI prompts):
@@ -194,7 +205,8 @@ class TerraformContextProvider(ContextProvider):
         # ... your parsing logic here ...

     def get_file_context(self, file_path: str) -> Optional[FileContext]:
-        # Return context if this file has metadata
+        # Validate the file is within your tool's project directories
+        # before matching by stem, to avoid false positives
         module = self._modules.get(Path(file_path).stem)
         if module:
             return FileContext(
@@ -260,6 +272,42 @@ Potential future providers for community contribution:

 Context providers require no configuration — they activate automatically when their ecosystem is detected. Provider-specific optional dependencies (like `pyyaml` for dbt) should be installed separately.

+### Disabling Context Providers
+
+Context providers can be disabled globally via environment variable or per-call via parameter:
+
+**Environment variable** — disables providers for all `index_folder` calls:
+
+```bash
+JCODEMUNCH_CONTEXT_PROVIDERS=0
+```
+
+In your MCP server config:
+
+```json
+{
+  "mcpServers": {
+    "jcodemunch": {
+      "command": "uvx",
+      "args": ["jcodemunch-mcp"],
+      "env": {
+        "JCODEMUNCH_CONTEXT_PROVIDERS": "0"
+      }
+    }
+  }
+}
+```
+
+**Per-call parameter** — pass `context_providers: false` to `index_folder`:
+
+```python
+index_folder(path="/my/project", context_providers=False)
+```
+
+Either method skips provider discovery entirely — no YAML parsing, no doc block scanning, no enrichment overhead.
+
+### Debugging
+
 To verify which providers activated during indexing, check the `context_enrichment` key in the `index_folder` response or enable debug logging:

 ```
````
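
The path-scoped matching described in "How It Matches Files" can be sketched as a standalone function. This is a minimal illustration, not the provider's code: `match_model` is a hypothetical helper, the model table is a plain `dict` standing in for the parsed dbt metadata, and it returns the matched model name rather than a `FileContext`. It follows the same two steps as `_is_in_model_path()` plus `get_file_context()` in the diff: normalize separators, reject paths outside every model-path prefix, then look up the stem.

```python
from pathlib import PurePosixPath
from typing import Optional


def match_model(file_path: str, model_prefixes: list[str], models: dict[str, str]) -> Optional[str]:
    """Hypothetical helper: return the dbt model name for file_path, or None.

    model_prefixes are forward-slash directory prefixes ending in "/",
    e.g. ["models/"]; models maps model name -> description (a stand-in
    for the provider's parsed metadata).
    """
    # Normalize Windows separators so one prefix list works on every OS.
    normalized = file_path.replace("\\", "/")

    # Path scoping: files outside every model directory never match.
    if not any(normalized.startswith(prefix) for prefix in model_prefixes):
        return None

    # Stem lookup: "models/staging/fct_daily_revenue.sql" -> "fct_daily_revenue".
    stem = PurePosixPath(normalized).stem
    return stem if stem in models else None


if __name__ == "__main__":
    models = {"fct_daily_revenue": "Daily revenue fact table", "schema": "A model named schema"}
    prefixes = ["models/"]
    for path in ["models/fct_daily_revenue.sql", "models/staging/fct_daily_revenue.sql",
                 "scripts/fct_daily_revenue.sql", "schema.sql", "models/schema.sql"]:
        print(f"{path:40} -> {match_model(path, prefixes, models)}")
```

The trailing `/` on each prefix (the diff appends it too) matters: without it, a prefix of `models` would also accept a sibling directory such as `models_old/`. The sketch computes the stem with `PurePosixPath` after normalization so the result is the same on any OS.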

README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -478,6 +478,7 @@ For **LM Studio**, ensure the Local Server is running (usually on port 1234):
 | `OPENAI_MAX_TOKENS` | Max output tokens per batch response (default: `500`) | No |
 | `CODE_INDEX_PATH` | Custom cache path | No |
 | `JCODEMUNCH_MAX_INDEX_FILES`| Maximum files to index per repo/folder (default: `10000`) | No |
+| `JCODEMUNCH_CONTEXT_PROVIDERS` | Set to `0` to disable context providers (dbt, etc.) during indexing | No |
 | `JCODEMUNCH_SHARE_SAVINGS` | Set to `0` to disable anonymous community token savings reporting | No |
 | `JCODEMUNCH_LOG_LEVEL` | Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR` (default: `WARNING`) | No |
 | `JCODEMUNCH_LOG_FILE` | Path to log file. If unset, logs go to stderr. Use a file to avoid polluting MCP stdio. | No |
```

USER_GUIDE.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -109,6 +109,7 @@ Environment variables are optional:
 * `GOOGLE_API_KEY` enables AI-generated summaries via Gemini Flash (used if `ANTHROPIC_API_KEY` is not set).
 * `GOOGLE_MODEL` overrides the Gemini model (default: `gemini-2.5-flash-lite`).
 * If neither key is set, summaries fall back to docstrings or signatures.
+* `JCODEMUNCH_CONTEXT_PROVIDERS=0` disables context providers (dbt metadata enrichment, etc.) during indexing.

 Restart Claude Desktop after saving the config.

```

src/jcodemunch_mcp/parser/context/dbt.py

Lines changed: 24 additions & 0 deletions

```diff
@@ -172,8 +172,10 @@ class DbtContextProvider(ContextProvider):
     """

     def __init__(self):
+        self._dbt_yml_path: Optional[Path] = None
         self._doc_blocks: dict[str, str] = {}
         self._models: dict[str, DbtModelMetadata] = {}
+        self._model_path_prefixes: list[str] = []

     @property
     def name(self) -> str:
@@ -184,6 +186,9 @@ def detect(self, folder_path: Path) -> bool:
         return self._dbt_yml_path is not None

     def load(self, folder_path: Path) -> None:
+        if self._dbt_yml_path is None:
+            logger.warning("load() called without detect() — skipping")
+            return
         project_root = self._dbt_yml_path.parent
         logger.info("dbt project detected at %s", project_root)

@@ -210,13 +215,32 @@ def load(self, folder_path: Path) -> None:
             if md not in docs_dirs:
                 docs_dirs.append(md)

+        # Store model path prefixes (relative to indexed folder) for path validation.
+        # Only files under these prefixes are considered dbt models.
+        self._model_path_prefixes = []
+        for md in models_dirs:
+            try:
+                rel = md.resolve().relative_to(folder_path.resolve())
+                # Normalize to forward slashes for cross-platform matching
+                self._model_path_prefixes.append(str(rel).replace("\\", "/") + "/")
+            except ValueError:
+                # Model dir is outside the indexed folder — use absolute as fallback
+                self._model_path_prefixes.append(str(md).replace("\\", "/") + "/")
+
         self._doc_blocks = _parse_doc_blocks(docs_dirs)
         logger.info("Loaded %d dbt doc blocks", len(self._doc_blocks))

         self._models = _parse_yml_files(models_dirs, self._doc_blocks)
         logger.info("Loaded metadata for %d dbt models", len(self._models))

+    def _is_in_model_path(self, file_path: str) -> bool:
+        """Check if a file is within a dbt model directory."""
+        normalized = file_path.replace("\\", "/")
+        return any(normalized.startswith(prefix) for prefix in self._model_path_prefixes)
+
     def get_file_context(self, file_path: str) -> Optional[FileContext]:
+        if not self._is_in_model_path(file_path):
+            return None
         stem = Path(file_path).stem
         model = self._models.get(stem)
         if model is not None:
```
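
The two `dbt.py` fixes (initializing state in `__init__()` so `load()` can guard against a skipped `detect()`, and precomputing model-path prefixes relative to the indexed folder) can be exercised in isolation with a trimmed-down class. `ProviderSketch` is hypothetical: it assumes `dbt_project.yml` sits directly in the indexed folder and takes the model directories as an argument instead of reading `model-paths` from the project file. It also uses `Path.as_posix()` where the diff uses `str(...).replace("\\", "/")`; both yield forward-slash paths on Windows.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


class ProviderSketch:
    """Hypothetical, trimmed-down stand-in for DbtContextProvider."""

    def __init__(self) -> None:
        # Set up front so load() never hits AttributeError when detect() was skipped.
        self._dbt_yml_path: Optional[Path] = None
        self._model_path_prefixes: list[str] = []

    def detect(self, folder_path: Path) -> bool:
        # Assumption: dbt_project.yml lives directly in the indexed folder.
        candidate = folder_path / "dbt_project.yml"
        self._dbt_yml_path = candidate if candidate.is_file() else None
        return self._dbt_yml_path is not None

    def load(self, folder_path: Path, model_dirs: list[Path]) -> None:
        # Guard: refuse to load if detect() has not found a project.
        if self._dbt_yml_path is None:
            logger.warning("load() called without detect(); skipping")
            return

        root = folder_path.resolve()
        self._model_path_prefixes = []
        for md in model_dirs:
            try:
                # Prefix relative to the indexed folder, e.g. "models/".
                rel = md.resolve().relative_to(root)
                self._model_path_prefixes.append(rel.as_posix() + "/")
            except ValueError:
                # Model dir outside the indexed folder: fall back to its own path.
                self._model_path_prefixes.append(md.as_posix() + "/")
```

One edge case follows directly from `relative_to()`: when a model directory is the indexed folder itself, the relative path is `.`, so the stored prefix is `./`. Relative paths such as `my_model.sql` do not start with `./`, so, as far as this sketch mirrors the diff, a project whose `model-paths` points at the indexed root would match no files.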

src/jcodemunch_mcp/tools/index_folder.py

Lines changed: 15 additions & 11 deletions

```diff
@@ -310,6 +310,7 @@ def index_folder(
     extra_ignore_patterns: Optional[list[str]] = None,
     follow_symlinks: bool = False,
     incremental: bool = True,
+    context_providers: bool = True,
 ) -> dict:
     """Index a local folder containing source code.

@@ -319,6 +320,8 @@ def index_folder(
         storage_path: Custom storage path (default: ~/.code-index/).
         extra_ignore_patterns: Additional gitignore-style patterns to exclude.
         follow_symlinks: Whether to follow symlinks (default False for safety).
+        context_providers: Whether to run context providers (default True).
+            Set to False or set JCODEMUNCH_CONTEXT_PROVIDERS=0 to disable.
         incremental: When True and an existing index exists, only re-index changed files.

     Returns:
@@ -352,9 +355,10 @@ def index_folder(
         return {"success": False, "error": "No source files found"}

     # Discover context providers (dbt, terraform, etc.)
-    context_providers = discover_providers(folder_path)
-    if context_providers:
-        names = ", ".join(p.name for p in context_providers)
+    _providers_enabled = context_providers and os.environ.get("JCODEMUNCH_CONTEXT_PROVIDERS", "1") != "0"
+    active_providers = discover_providers(folder_path) if _providers_enabled else []
+    if active_providers:
+        names = ", ".join(p.name for p in active_providers)
         logger.info("Active context providers: %s", names)

     # Create repo identifier from folder path
@@ -441,16 +445,16 @@ def index_folder(
         )

         # Enrich with context providers before summarization
-        if context_providers:
-            enrich_symbols(new_symbols, context_providers)
+        if active_providers:
+            enrich_symbols(new_symbols, active_providers)

         new_symbols = summarize_symbols(new_symbols, use_ai=use_ai_summaries)

         # Generate file summaries for changed/new files
         incr_symbols_map = defaultdict(list)
         for s in new_symbols:
             incr_symbols_map[s.file].append(s)
-        incr_file_summaries = _complete_file_summaries(sorted(files_to_parse), incr_symbols_map, context_providers=context_providers)
+        incr_file_summaries = _complete_file_summaries(sorted(files_to_parse), incr_symbols_map, context_providers=active_providers)
         incr_file_languages = _file_languages_for_paths(sorted(files_to_parse), incr_symbols_map)

         git_head = _get_git_head(folder_path) or ""
@@ -513,8 +517,8 @@ def index_folder(
     )

     # Enrich with context providers before summarization
-    if context_providers and all_symbols:
-        enrich_symbols(all_symbols, context_providers)
+    if active_providers and all_symbols:
+        enrich_symbols(all_symbols, active_providers)

     # Generate summaries
     if all_symbols:
@@ -526,7 +530,7 @@ def index_folder(
         file_symbols_map[s.file].append(s)
     file_languages = _file_languages_for_paths(source_file_list, file_symbols_map)
     languages = _language_counts(file_languages)
-    file_summaries = _complete_file_summaries(source_file_list, file_symbols_map, context_providers=context_providers)
+    file_summaries = _complete_file_summaries(source_file_list, file_symbols_map, context_providers=active_providers)

     # Save index
     # Track hashes for all discovered source files so incremental change detection
@@ -567,9 +571,9 @@ def index_folder(
     }

     # Report context enrichment stats from all active providers
-    if context_providers:
+    if active_providers:
         enrichment = {}
-        for provider in context_providers:
+        for provider in active_providers:
             enrichment[provider.name] = provider.stats()
         result["context_enrichment"] = enrichment

```
575579

tests/test_dbt_provider.py

Lines changed: 28 additions & 3 deletions

```diff
@@ -360,7 +360,7 @@ def test_full_lifecycle(self, tmp_path):
         assert "order_id" in ctx.properties

     def test_get_file_context_by_stem(self, tmp_path):
-        """Matches by file stem regardless of directory path."""
+        """Matches by file stem within model directories."""
         root = _create_dbt_project(tmp_path)
         models_dir = root / "models"
         models_dir.mkdir()
@@ -374,10 +374,35 @@ def test_get_file_context_by_stem(self, tmp_path):
         provider.detect(root)
         provider.load(root)

-        # Different directory prefixes, same stem
+        # Within models directory — matches by stem
         assert provider.get_file_context("models/my_model.sql") is not None
         assert provider.get_file_context("models/staging/my_model.sql") is not None
-        assert provider.get_file_context("my_model.sql") is not None
+
+    def test_get_file_context_outside_model_path(self, tmp_path):
+        """Files outside model directories are not matched, even if stem matches."""
+        root = _create_dbt_project(tmp_path)
+        models_dir = root / "models"
+        models_dir.mkdir()
+        _write_schema_yml(models_dir / "schema.yml", """
+models:
+  - name: my_model
+    description: "Should not match outside models/"
+  - name: schema
+    description: "A model named schema"
+""")
+
+        provider = DbtContextProvider()
+        provider.detect(root)
+        provider.load(root)
+
+        # Outside models/ — should not match
+        assert provider.get_file_context("my_model.sql") is None
+        assert provider.get_file_context("scripts/my_model.sql") is None
+        assert provider.get_file_context("schema.sql") is None
+
+        # Inside models/ — should match
+        assert provider.get_file_context("models/my_model.sql") is not None
+        assert provider.get_file_context("models/schema.sql") is not None

     def test_get_file_context_no_match(self, tmp_path):
         root = _create_dbt_project(tmp_path)
```
