
CAMEL-23273: Camel-Jbang-mcp: Sanitize sensitive data in POM content passed to migration tools #22344

Merged
oscerd merged 2 commits into main from CAMEL-23273 on Apr 1, 2026

Conversation

@oscerd
Contributor

@oscerd oscerd commented Mar 30, 2026

Add PomSanitizer utility to detect and mask sensitive data (passwords, tokens, API keys, secrets) in POM content before processing. Add sanitizePom boolean parameter (default: true) to camel_migration_analyze, camel_dependency_check, and camel_migration_wildfly_karaf tools. Update tool descriptions with sanitization guidance.

Changes

  • PomSanitizer: Regex-based detection of sensitive XML element values with masking. Preserves property placeholders (${...}). Javadoc documents known limitations (false positives/negatives of tag-name-based heuristic).
  • PomSanitizer.process(): Shared helper used by all three tool methods to avoid code duplication. Returns processed content and a single summary warning when sensitive data is found.
  • sanitizePom parameter: Added to camel_migration_analyze, camel_dependency_check, and camel_migration_wildfly_karaf. Defaults to true.
  • Tests: 16 unit tests for PomSanitizer (detection, masking, placeholders, process helper). Integration tests in MigrationToolsTest, MigrationWildflyKarafToolsTest, and DependencyCheckToolsTest verifying sanitization, bypass, and post-sanitization correctness.
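
To make the masking behaviour described above concrete, here is a minimal sketch of tag-name-based masking that preserves `${...}` placeholders. It is not the PomSanitizer code from this PR: the class name `PomMaskingSketch`, the keyword list, the mask string, and the `Result` record are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the actual PomSanitizer from the PR. */
final class PomMaskingSketch {

    // Hypothetical keyword list; the real list is not shown in this conversation.
    private static final Pattern SENSITIVE_ELEMENT = Pattern.compile(
            "<([\\w.-]*(?:password|passphrase|token|secret|apikey|api-key|api\\.key)[\\w.-]*)>([^<]*)</\\1>",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical mask string.
    static final String MASK = "***MASKED***";

    record Result(String pomContent, List<String> detectedPatterns) {}

    static Result sanitize(String pom) {
        List<String> detected = new ArrayList<>();
        Matcher m = SENSITIVE_ELEMENT.matcher(pom);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String tag = m.group(1);
            String value = m.group(2).trim();
            if (value.isEmpty() || (value.startsWith("${") && value.endsWith("}"))) {
                // Empty values and ${...} property placeholders are copied through unchanged.
                m.appendReplacement(out, Matcher.quoteReplacement(m.group()));
            } else {
                // Literal value under a sensitive-looking tag name: record the tag and mask the value.
                detected.add(tag);
                m.appendReplacement(out,
                        Matcher.quoteReplacement("<" + tag + ">" + MASK + "</" + tag + ">"));
            }
        }
        m.appendTail(out);
        return new Result(out.toString(), detected);
    }

    public static void main(String[] args) {
        Result r = sanitize("<properties><db.password>s3cret</db.password>"
                + "<api.token>${env.TOKEN}</api.token></properties>");
        System.out.println(r.pomContent());
        System.out.println(r.detectedPatterns());
    }
}
```

In this sketch only literal values are masked; a value that is entirely a `${...}` reference is left alone because it names a property rather than carrying the secret itself.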

Target

  • I checked that the commit is targeting the correct branch (Camel 4 uses the main branch)

Tracking

  • If this is a large change, bug fix, or code improvement, I checked there is a JIRA issue filed for the change (usually before you start working on it).

Apache Camel coding standards and style

  • I checked that each commit in the pull request has a meaningful subject line and body.
  • I have run mvn clean install -DskipTests locally from root folder and I have committed all auto-generated changes.

@github-actions
Contributor

🌟 Thank you for your contribution to the Apache Camel project! 🌟
🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

  • First-time contributors require MANUAL approval for the GitHub Actions to run
  • You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot although they are normally detected and executed by CI.
  • You can label PRs using build-all, build-dependents, skip-tests and test-dependents to fine-tune the checks executed by this PR.
  • Build and test logs are available in the summary page. Only Apache Camel committers have access to the summary.

⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

@github-actions github-actions Bot added the dsl label Mar 30, 2026
@github-actions
Contributor

🧪 CI tested the following changed modules:

  • dsl/camel-jbang/camel-jbang-mcp

@oscerd oscerd requested review from Croway and luigidemasi March 30, 2026 16:44
Contributor

@gnodet gnodet left a comment

Review Summary

Claude Code on behalf of Guillaume Nodet

Overview: This PR adds a PomSanitizer utility to detect and mask sensitive data (passwords, tokens, API keys) in POM content before processing by MCP migration tools. It adds a sanitizePom boolean parameter (default: true) to camel_migration_analyze, camel_dependency_check, and camel_migration_wildfly_karaf tools. Includes 21 unit tests for the sanitizer and 3 integration tests.

Verdict: Request changes


Blocking

  1. Rebase needed against current main — This PR was branched before dba5a0f7194e (CAMEL-23270), which added @Tool.Annotations(readOnlyHint, destructiveHint, openWorldHint) to all MCP tools. The PR's versions of MigrationTools.java, DependencyCheckTools.java, and MigrationWildflyKarafTools.java do not include the annotations parameter on @Tool. Merging as-is will either cause conflicts or silently drop the annotations. Please rebase onto current main.

Major

  2. Code duplication — The 13-line sanitization block is copy-pasted identically across all three tool methods:

    String processedPom = pomContent;
    List<String> sanitizationWarnings = new ArrayList<>();
    if (sanitizePom == null || sanitizePom) {
        PomSanitizer.SanitizationResult sr = PomSanitizer.sanitize(pomContent);
        processedPom = sr.pomContent();
        for (String pattern : sr.detectedPatterns()) {
            sanitizationWarnings.add("Sensitive data detected and masked: " + pattern);
        }
    }

    Consider extracting a helper into PomSanitizer, e.g.:

    record ProcessedPom(String content, List<String> warnings) {}
    static ProcessedPom process(String pomContent, Boolean sanitize) { ... }

    This keeps each tool method clean and ensures consistent behavior if the sanitization logic evolves.

  3. Missing integration tests for MigrationTools and MigrationWildflyKarafTools — Sanitization was added to all three tools, but integration tests were only added to DependencyCheckToolsTest. The other two tool test classes should also verify that:

    • sanitization masks sensitive data and produces warnings
    • sanitizePom=false bypasses sanitization
    • analysis still works correctly after sanitization

Minor

  4. <servers> is a settings.xml element, not a pom.xml element — The <servers> section belongs to Maven's settings.xml, not pom.xml. A valid POM should never contain <servers>. While stripping it as a safety net for accidental pastes is harmless, the Javadoc should clarify this (e.g., "Strips <servers> sections which belong to settings.xml and may be accidentally included").

Nit

  5. wasSanitized field is computed but never consumed: SanitizationResult.wasSanitized() is only used in tests, never by the tool methods themselves (they check detectedPatterns instead). Consider removing it or documenting that it is for testing/logging only.

  6. Per-pattern warning messages are verbose — The loop prefixes each detected pattern with "Sensitive data detected and masked: ". A single summary warning (e.g., "Sensitive data detected and masked: db.password, api.token") would be more concise in the tool response.


Overall the approach is sound — POM sanitization is a sensible security measure for MCP tools that accept user-provided POM content. The PomSanitizer regex patterns are well-crafted and the property placeholder preservation (${...}) is a nice touch. The test coverage for the sanitizer itself is thorough. The main action items are rebasing onto current main and reducing the code duplication.

@gnodet
Contributor

gnodet commented Mar 31, 2026

Claude Code on behalf of Guillaume Nodet

Follow-up on point 4 from my review — on further thought, stripping <distributionManagement> is questionable and should probably be reconsidered:

<distributionManagement> contains repository URLs and IDs, not credentials. The actual credentials for those repositories live in settings.xml under <servers> (matched by <id>). At most, the URLs reveal internal hostnames, but that's not in the same category as passwords/tokens/API keys.

Stripping it removes legitimate POM information with no real security benefit. I'd suggest removing <distributionManagement> from the sanitizer's scope — or at minimum, making it opt-in rather than default behavior.

@gnodet
Contributor

gnodet commented Mar 31, 2026

Claude Code on behalf of Guillaume Nodet

Another follow-up on the section stripping — the <servers> stripping has the same issue:

<servers> is a settings.xml element, not a pom.xml element. It should never appear in a valid POM file. Since all three tools accept "pom.xml file content" as input, stripping <servers> is guarding against something that can't legitimately exist in the input.

In summary, both section-stripping patterns are questionable:

  • <servers> — belongs to settings.xml, not pom.xml
  • <distributionManagement> — valid POM element but contains URLs/IDs, not credentials

I'd suggest removing both section-stripping patterns and keeping only the sensitive element value masking (passwords, tokens, API keys in properties, etc.), which is the part that genuinely adds security value.

@gnodet
Contributor

gnodet commented Mar 31, 2026

Claude Code on behalf of Guillaume Nodet

One more observation on the sensitive data detection approach — the regex is purely tag-name-based, matching any XML element whose name contains keywords like "password", "token", "secret", etc. This has some limitations worth considering:

False positives — non-secret values in elements that happen to contain a keyword:

  • <password-policy>strict</password-policy> — config value, not a secret
  • <token-refresh-interval>300</token-refresh-interval> — numeric setting
  • <secret-sharing-enabled>true</secret-sharing-enabled> — boolean flag

False negatives — actual secrets in elements with non-obvious names:

  • <db.connection>jdbc:mysql://user:s3cret@host/db</db.connection> — credential embedded in a URL
  • <my.credential>actual-secret</my.credential> — note: "credential" (singular) is not in the keyword list, only "credentials" (plural)

The heuristic is reasonable as a best-effort safety net, but worth documenting these limitations — especially since false positives could mask legitimate configuration values that the migration analysis might need.
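
The false positives and false negatives listed above can be reproduced with a few lines of Java. This is only a sketch: the class `TagNameHeuristicDemo` and its keyword list are assumptions that mirror the keywords named in this comment, not the list used by PomSanitizer.

```java
import java.util.List;
import java.util.Locale;

/** Illustrative sketch of a tag-name keyword check; not code from the PR. */
final class TagNameHeuristicDemo {

    // Hypothetical keyword list; note that only the plural "credentials" is present.
    static final List<String> KEYWORDS = List.of("password", "token", "secret", "credentials");

    /** Returns true when the tag name contains any keyword, regardless of what the value holds. */
    static boolean isFlagged(String tagName) {
        String lower = tagName.toLowerCase(Locale.ROOT);
        return KEYWORDS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        List<String> tags = List.of(
                "password-policy",        // flagged, although the value is a policy name
                "token-refresh-interval", // flagged, although the value is a number
                "db.connection",          // not flagged, although the URL may embed a credential
                "my.credential");         // not flagged, because only the plural keyword is listed
        for (String tag : tags) {
            System.out.println(tag + " -> flagged=" + isFlagged(tag));
        }
    }
}
```

Because the check never looks at the element value, it cannot tell a policy name from a password, which is the root of both failure modes.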

@oscerd
Contributor Author

oscerd commented Mar 31, 2026

Claude Code on behalf of Andrea Cosentino

Thank you for the thorough review! All feedback has been addressed in the latest commit:

Blocking #1 (Rebase): Checked — no conflicts exist against current main (no changes to the MCP module since the branch point). CAMEL-23270 has not landed on main yet, so no rebase is needed at this time.

Major #2 (Code duplication): Extracted PomSanitizer.process(pomContent, sanitize) helper that returns a ProcessedPom(content, warnings) record. All three tool methods now use this single entry point instead of the duplicated 13-line block.

Major #3 (Missing integration tests): Added MigrationToolsTest and MigrationWildflyKarafToolsTest with sanitization tests covering: sensitive data masking with warnings, sanitizePom=false bypass, analysis correctness after sanitization, and clean POM baseline.

Follow-up: Remove section stripping: Removed both <servers> and <distributionManagement> stripping as suggested. <servers> belongs to settings.xml, and <distributionManagement> contains URLs/IDs, not credentials. Only sensitive element value masking remains.

Follow-up: Document regex limitations: Added comprehensive Javadoc to PomSanitizer documenting false positives (e.g., <password-policy>strict</password-policy>) and false negatives (e.g., credentials in JDBC URLs, singular <my.credential>).

Nit #5 (wasSanitized unused): Removed the field from SanitizationResult.

Nit #6 (Verbose warnings): Consolidated per-pattern warnings into a single summary: "Sensitive data detected and masked: db.password, api.token".

All 189 tests pass, code is formatted.
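
For readers who have not seen the diff, the shape of the extracted helper and the consolidated warning might look roughly like the sketch below. Only the names `ProcessedPom`, `process`, `SanitizationResult`, `pomContent()` and `detectedPatterns()` come from this conversation; the enclosing class `PomProcessSketch`, the `stubSanitize` stand-in, and its regex are assumptions so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch of a shared process() helper; not the merged implementation. */
final class PomProcessSketch {

    record ProcessedPom(String content, List<String> warnings) {}

    record SanitizationResult(String pomContent, List<String> detectedPatterns) {}

    // Hypothetical stand-in for the real sanitizer: masks literal values of *password elements only.
    private static final Pattern PASSWORD = Pattern.compile("<([\\w.]*password)>[^<$]+</\\1>");

    static SanitizationResult stubSanitize(String pom) {
        List<String> detected = new ArrayList<>();
        Matcher m = PASSWORD.matcher(pom);
        while (m.find()) {
            detected.add(m.group(1));
        }
        // replaceAll resets the matcher before replacing every match.
        String masked = m.replaceAll("<$1>***MASKED***</$1>");
        return new SanitizationResult(masked, detected);
    }

    /** A null flag is treated like true, so sanitization stays on by default. */
    static ProcessedPom process(String pomContent, Boolean sanitize) {
        if (sanitize != null && !sanitize) {
            return new ProcessedPom(pomContent, List.of());
        }
        SanitizationResult sr = stubSanitize(pomContent);
        if (sr.detectedPatterns().isEmpty()) {
            return new ProcessedPom(sr.pomContent(), List.of());
        }
        // One summary warning instead of one warning per detected pattern.
        String summary = "Sensitive data detected and masked: "
                + String.join(", ", sr.detectedPatterns());
        return new ProcessedPom(sr.pomContent(), List.of(summary));
    }

    public static void main(String[] args) {
        String pom = "<properties><db.password>x1</db.password></properties>";
        System.out.println(process(pom, null));
        System.out.println(process(pom, false));
    }
}
```

Each tool method would then call `process(pomContent, sanitizePom)` once and append the returned warnings to its response, which is the single entry point the review asked for.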

@oscerd oscerd changed the title CAMEL-23273 - Camel-Jbang-mcp: Warn about sensitive data in POM conte… CAMEL-23273: Camel-Jbang-mcp: Sanitize sensitive data in POM content passed to migration tools Mar 31, 2026
@oscerd oscerd requested a review from gnodet March 31, 2026 07:47
@Croway
Contributor

Croway commented Apr 1, 2026

Why should we care about this? We're adding more to the LLM context just to work around a bad practice. Do we really want to be opinionated about that?

@oscerd
Contributor Author

oscerd commented Apr 1, 2026

> Why should we care about this? We're adding more to the LLM context just to work around a bad practice. Do we really want to be opinionated about that?

Considering the amount of slop we are going to see and face, maybe it will be good to be opinionated.

@gnodet
Contributor

gnodet commented Apr 1, 2026

Claude Code on behalf of Guillaume Nodet

Suggestion: plugin-aware sanitization instead of tag-name heuristics

The current regex approach matches any XML element whose tag name contains keywords like "password", "token", "secret", etc. This has inherent limitations:

False positives — non-secret values get masked:

  • <password-policy>strict</password-policy> — a config value, not a credential
  • <token-refresh-interval>300</token-refresh-interval> — a numeric setting
  • These masked values might be useful for the migration analysis

False negatives — actual secrets are missed:

  • Credentials stored in elements with non-obvious names
  • <my.credential>secret</my.credential> — "credential" (singular) isn't in the keyword list

A more accurate approach would be plugin/mojo-aware:

  1. Maintain a catalog of known plugins and their sensitive configuration parameters:

    maven-deploy-plugin       → [password]
    maven-jarsigner-plugin    → [storepass, keypass]
    docker-maven-plugin       → [password, authConfig/password]
    sql-maven-plugin          → [password]
    maven-scm-plugin          → [password, passphrase]
    ...
    
  2. Parse the POM as XML, identify each <plugin> by its <artifactId>, look up its sensitive config params in the catalog.

  3. Trace property references: if a sensitive param uses ${prop.name}, resolve it back to <properties> and mask the property value there.

This is more work than the current regex, but it's precise — no false positives on config values, no false negatives on credentials in non-obviously-named elements, and no need to strip entire POM sections (<servers> belongs to settings.xml not pom.xml, and <distributionManagement> contains URLs not credentials).
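
The three steps above could be sketched with the JDK's built-in DOM parser as follows. This is a rough illustration under stated assumptions, not a proposed implementation: the class `PluginAwareSanitizerSketch`, the mask string, and the small catalog (seeded from the examples in this comment) are hypothetical, and nested parameters such as authConfig/password are left out for brevity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative sketch of plugin-aware masking; not part of the PR. */
final class PluginAwareSanitizerSketch {

    static final String MASK = "***MASKED***"; // hypothetical mask string

    // Step 1: hypothetical catalog of plugins and their sensitive top-level parameters.
    static final Map<String, List<String>> CATALOG = Map.of(
            "maven-deploy-plugin", List.of("password"),
            "maven-jarsigner-plugin", List.of("storepass", "keypass"),
            "sql-maven-plugin", List.of("password"));

    static String sanitize(String pom) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, since the POM text is untrusted input.
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(pom)));

            // Step 2: look up each <plugin> by <artifactId> and inspect its <configuration>.
            Set<String> sensitiveProps = new HashSet<>();
            NodeList plugins = doc.getElementsByTagName("plugin");
            for (int i = 0; i < plugins.getLength(); i++) {
                Element plugin = (Element) plugins.item(i);
                List<String> params = CATALOG.get(childText(plugin, "artifactId"));
                Element config = child(plugin, "configuration");
                if (params == null || config == null) {
                    continue;
                }
                for (String param : params) {
                    Element e = child(config, param);
                    if (e == null) {
                        continue;
                    }
                    String value = e.getTextContent().trim();
                    if (value.startsWith("${") && value.endsWith("}")) {
                        // Remember the referenced property so it can be masked in step 3.
                        sensitiveProps.add(value.substring(2, value.length() - 1));
                    } else if (!value.isEmpty()) {
                        e.setTextContent(MASK);
                    }
                }
            }

            // Step 3: mask the referenced properties wherever a <properties> block defines them.
            NodeList propertyBlocks = doc.getElementsByTagName("properties");
            for (int i = 0; i < propertyBlocks.getLength(); i++) {
                Element props = (Element) propertyBlocks.item(i);
                for (String name : sensitiveProps) {
                    Element p = child(props, name);
                    if (p != null) {
                        p.setTextContent(MASK);
                    }
                }
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(w));
            return w.toString();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse or serialize POM content", ex);
        }
    }

    /** Returns the first direct child element with the given tag name, or null. */
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && e.getTagName().equals(name)) {
                return e;
            }
        }
        return null;
    }

    static String childText(Element parent, String name) {
        Element e = child(parent, name);
        return e == null ? "" : e.getTextContent().trim();
    }
}
```

With this sketch, `<password-policy>strict</password-policy>` in `<properties>` is left alone because no cataloged plugin parameter refers to it. The trade-off is that the input must be well-formed XML and the catalog has to be maintained.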

oscerd and others added 2 commits April 1, 2026 11:23
…nt passed to migration tools

Add PomSanitizer utility to detect and mask sensitive data (passwords,
tokens, API keys, secrets) in POM content before processing. Strips
<servers> and <distributionManagement> sections. Add sanitizePom boolean
parameter (default: true) to camel_migration_analyze,
camel_dependency_check, and camel_migration_wildfly_karaf tools. Update
tool descriptions with sanitization guidance. Add 21 tests covering
detection, masking, placeholder preservation, and tool integration.

Signed-off-by: Andrea Cosentino <ancosen@gmail.com>
- Remove <servers> and <distributionManagement> section stripping (servers
  belongs to settings.xml, distributionManagement contains URLs not credentials)
- Extract PomSanitizer.process() helper to eliminate code duplication across
  MigrationTools, DependencyCheckTools, and MigrationWildflyKarafTools
- Consolidate per-pattern warnings into a single summary warning
- Remove unused wasSanitized field from SanitizationResult
- Document regex-based detection limitations (false positives/negatives)
- Add sanitization integration tests for MigrationTools and
  MigrationWildflyKarafTools

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@oscerd
Contributor Author

oscerd commented Apr 1, 2026

Claude Code on behalf of Andrea Cosentino

Rebased onto current main — the branch now includes CAMEL-23270 (@Tool.Annotations). All 189 tests pass.

Regarding the plugin-aware sanitization suggestion: I agree this would be more precise long-term, but it's a significantly larger effort (maintaining a plugin catalog, XML parsing, property tracing). The current tag-name heuristic is documented with its limitations (false positives/negatives in the Javadoc) and serves as a reasonable best-effort safety net for the initial implementation. We could evolve toward plugin-aware sanitization in a follow-up if the heuristic proves insufficient in practice.

@oscerd oscerd merged commit 95ca2a8 into main Apr 1, 2026
5 checks passed
@oscerd oscerd deleted the CAMEL-23273 branch April 1, 2026 11:32
3 participants