CAMEL-23273: Camel-Jbang-mcp: Sanitize sensitive data in POM content passed to migration tools#22344
Conversation
|
🌟 Thank you for your contribution to the Apache Camel project! 🌟 🐫 Apache Camel Committers, please review the following items:
|
|
🧪 CI tested the following changed modules:
|
gnodet
left a comment
There was a problem hiding this comment.
Review Summary
Claude Code on behalf of Guillaume Nodet
Overview: This PR adds a PomSanitizer utility to detect and mask sensitive data (passwords, tokens, API keys) in POM content before processing by MCP migration tools. It adds a sanitizePom boolean parameter (default: true) to camel_migration_analyze, camel_dependency_check, and camel_migration_wildfly_karaf tools. Includes 21 unit tests for the sanitizer and 3 integration tests.
Verdict: Request changes
Blocking
- Rebase needed against current
main— This PR was branched beforedba5a0f7194e(CAMEL-23270), which added@Tool.Annotations(readOnlyHint, destructiveHint, openWorldHint)to all MCP tools. The PR's versions ofMigrationTools.java,DependencyCheckTools.java, andMigrationWildflyKarafTools.javado not include theannotationsparameter on@Tool. Merging as-is will either cause conflicts or silently drop the annotations. Please rebase onto currentmain.
Major
-
Code duplication — The 13-line sanitization block is copy-pasted identically across all three tool methods:
String processedPom = pomContent; List<String> sanitizationWarnings = new ArrayList<>(); if (sanitizePom == null || sanitizePom) { PomSanitizer.SanitizationResult sr = PomSanitizer.sanitize(pomContent); processedPom = sr.pomContent(); for (String pattern : sr.detectedPatterns()) { sanitizationWarnings.add("Sensitive data detected and masked: " + pattern); } }
Consider extracting a helper into
PomSanitizer, e.g.:record ProcessedPom(String content, List<String> warnings) {} static ProcessedPom process(String pomContent, Boolean sanitize) { ... }
This keeps each tool method clean and ensures consistent behavior if the sanitization logic evolves.
-
Missing integration tests for
MigrationToolsandMigrationWildflyKarafTools— Sanitization was added to all three tools, but integration tests were only added toDependencyCheckToolsTest. The other two tool test classes should also verify that:- sanitization masks sensitive data and produces warnings
sanitizePom=falsebypasses sanitization- analysis still works correctly after sanitization
Minor
<servers>is asettings.xmlelement, not apom.xmlelement — The<servers>section belongs to Maven'ssettings.xml, notpom.xml. A valid POM should never contain<servers>. While stripping it as a safety net for accidental pastes is harmless, the Javadoc should clarify this (e.g., "Strips<servers>sections which belong to settings.xml and may be accidentally included").
Nit
-
wasSanitizedfield is computed but never consumed —SanitizationResult.wasSanitized()is only used in tests, never by the tool methods themselves (they checkdetectedPatternsinstead). Consider removing it or documenting it's for testing/logging only. -
Per-pattern warning messages are verbose — The loop prefixes each detected pattern with
"Sensitive data detected and masked: ". A single summary warning (e.g.,"Sensitive data detected and masked: db.password, api.token") would be more concise in the tool response.
Overall the approach is sound — POM sanitization is a sensible security measure for MCP tools that accept user-provided POM content. The PomSanitizer regex patterns are well-crafted and the property placeholder preservation (${...}) is a nice touch. The test coverage for the sanitizer itself is thorough. The main action items are rebasing onto current main and reducing the code duplication.
|
Claude Code on behalf of Guillaume Nodet Follow-up on point 4 from my review — on further thought, stripping
Stripping it removes legitimate POM information with no real security benefit. I'd suggest removing |
|
Claude Code on behalf of Guillaume Nodet Another follow-up on the section stripping — the
In summary, both section-stripping patterns are questionable:
I'd suggest removing both section-stripping patterns and keeping only the sensitive element value masking (passwords, tokens, API keys in properties, etc.), which is the part that genuinely adds security value. |
|
Claude Code on behalf of Guillaume Nodet One more observation on the sensitive data detection approach — the regex is purely tag-name-based, matching any XML element whose name contains keywords like "password", "token", "secret", etc. This has some limitations worth considering: False positives — non-secret values in elements that happen to contain a keyword:
False negatives — actual secrets in elements with non-obvious names:
The heuristic is reasonable as a best-effort safety net, but worth documenting these limitations — especially since false positives could mask legitimate configuration values that the migration analysis might need. |
|
Claude Code on behalf of Andrea Cosentino Thank you for the thorough review! All feedback has been addressed in the latest commit: Blocking #1 (Rebase): Checked — no conflicts exist against current Major #2 (Code duplication): Extracted Major #3 (Missing integration tests): Added Follow-up: Remove section stripping: Removed both Follow-up: Document regex limitations: Added comprehensive Javadoc to Nit #5 ( Nit #6 (Verbose warnings): Consolidated per-pattern warnings into a single summary: All 189 tests pass, code is formatted. |
|
Why should we care about this? We're adding more to the LLM context just to work around a bad practice. Do we really want to be opinionated about that? |
Considering the amount of slop we are going to see and face, maybe it will be good to be opinionated. |
|
Claude Code on behalf of Guillaume Nodet Suggestion: plugin-aware sanitization instead of tag-name heuristicsThe current regex approach matches any XML element whose tag name contains keywords like "password", "token", "secret", etc. This has inherent limitations: False positives — non-secret values get masked:
False negatives — actual secrets are missed:
A more accurate approach would be plugin/mojo-aware:
This is more work than the current regex, but it's precise — no false positives on config values, no false negatives on credentials in non-obviously-named elements, and no need to strip entire POM sections ( |
…nt passed to migration tools Add PomSanitizer utility to detect and mask sensitive data (passwords, tokens, API keys, secrets) in POM content before processing. Strips <servers> and <distributionManagement> sections. Add sanitizePom boolean parameter (default: true) to camel_migration_analyze, camel_dependency_check, and camel_migration_wildfly_karaf tools. Update tool descriptions with sanitization guidance. Add 21 tests covering detection, masking, placeholder preservation, and tool integration. Signed-off-by: Andrea Cosentino <ancosen@gmail.com>
- Remove <servers> and <distributionManagement> section stripping (servers belongs to settings.xml, distributionManagement contains URLs not credentials) - Extract PomSanitizer.process() helper to eliminate code duplication across MigrationTools, DependencyCheckTools, and MigrationWildflyKarafTools - Consolidate per-pattern warnings into a single summary warning - Remove unused wasSanitized field from SanitizationResult - Document regex-based detection limitations (false positives/negatives) - Add sanitization integration tests for MigrationTools and MigrationWildflyKarafTools Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Claude Code on behalf of Andrea Cosentino Rebased onto current Regarding the plugin-aware sanitization suggestion: I agree this would be more precise long-term, but it's a significantly larger effort (maintaining a plugin catalog, XML parsing, property tracing). The current tag-name heuristic is documented with its limitations (false positives/negatives in the Javadoc) and serves as a reasonable best-effort safety net for the initial implementation. We could evolve toward plugin-aware sanitization in a follow-up if the heuristic proves insufficient in practice. |
Add PomSanitizer utility to detect and mask sensitive data (passwords, tokens, API keys, secrets) in POM content before processing. Add sanitizePom boolean parameter (default: true) to camel_migration_analyze, camel_dependency_check, and camel_migration_wildfly_karaf tools. Update tool descriptions with sanitization guidance.
Changes
${...}). Javadoc documents known limitations (false positives/negatives of tag-name-based heuristic).camel_migration_analyze,camel_dependency_check, andcamel_migration_wildfly_karaf. Defaults totrue.Target
mainbranch)Tracking
Apache Camel coding standards and style
mvn clean install -DskipTestslocally from root folder and I have committed all auto-generated changes.