
[backend] Fix strict_dynamic_mapping_exception exceptions thrown in fileIndexManager (#14665) #14655

Merged
fellowseb merged 9 commits into master from issue/89-strict-dynamic-mapping-exception-in-fileindexmanager
Mar 5, 2026

Conversation


@fellowseb fellowseb commented Feb 26, 2026

Context

The issue arose because of missing index mappings in the attachment sub-document: we use an Elasticsearch pipeline processor for attachments that extracts fields for us. By default this processor extracts all the fields it can: https://www.elastic.co/guide/en/elasticsearch/reference/8.19/attachment.html#attachment-fields.

The problem is that we've created index mappings for only a subset of those fields (see document.ts). On top of that, we enforce dynamic: strict behavior on indices, meaning unknown fields cannot be pushed to an index. Together, these two facts resulted in a few exceptions.
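
To illustrate the failure mode described above, here is a minimal sketch in plain TypeScript that models how a strict mapping treats a sub-document. It does not call Elasticsearch. The `StrictMapping` type, the `unmappedFields` helper, and the list of mapped fields are assumptions made for illustration, not the actual contents of document.ts.

```typescript
// Hypothetical, simplified model of a `dynamic: strict` mapping.
type StrictMapping = { dynamic: 'strict'; properties: Record<string, unknown> };

// Return the field names that a strict mapping would reject for a sub-document.
function unmappedFields(mapping: StrictMapping, doc: Record<string, unknown>): string[] {
  return Object.keys(doc).filter((field) => !(field in mapping.properties));
}

// Assumed subset of mapped attachment fields (illustrative only).
const attachmentMapping: StrictMapping = {
  dynamic: 'strict',
  properties: {
    content: { type: 'text' },
    title: { type: 'text' },
    content_type: { type: 'keyword' },
  },
};

// What the processor might emit by default for a PDF carrying extra metadata.
const extracted = { content: 'hello', title: 'Report', publisher: 'ACME', rating: 5 };

const rejected = unmappedFields(attachmentMapping, extracted);
console.log(rejected); // the fields that would trigger strict_dynamic_mapping_exception
```

In this model, `publisher` and `rating` are the rejected fields, which matches the two fields named in the linked issue.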

Proposed changes

  • This PR specifies which fields to extract when configuring the attachment pipeline in ES/OS: those for which we already have a mapping.
    We could consider ingesting the other pieces of data but I'm not sure it's useful and the volume is ultra low for now.

  • I added an integration test making sure that indexing doesn't fail for a PDF carrying metadata (dc:publisher or dc:rating) that the processor could extract but no longer does, because we now tell it not to.

  • I ran the test with ES and OS locally. Running with OS required tweaking the dev setup to install the ingest-attachment plugin before starting the OS process, which requires a custom Dockerfile (https://docs.opensearch.org/latest/install-and-configure/install-opensearch/docker/#working-with-plugins).
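
As a rough sketch of the first change, the attachment processor accepts a `properties` option listing the fields to store. The snippet below only builds the pipeline body as a plain object. The constant `MAPPED_ATTACHMENT_PROPS`, the helper `buildAttachmentPipeline`, the source field name `file_data`, and the chosen field list are all hypothetical and are not taken from engine.ts.

```typescript
// Hypothetical allowlist: only fields assumed to have an index mapping.
const MAPPED_ATTACHMENT_PROPS = ['content', 'title', 'author', 'content_type', 'content_length'] as const;

interface AttachmentPipeline {
  description: string;
  processors: Array<{
    attachment: { field: string; properties: string[] };
  }>;
}

// Build an ingest pipeline body whose attachment processor
// extracts only the allowlisted fields.
function buildAttachmentPipeline(sourceField: string, props: readonly string[]): AttachmentPipeline {
  return {
    description: 'Extract only mapped attachment fields',
    processors: [{ attachment: { field: sourceField, properties: [...props] } }],
  };
}

// `file_data` is an assumed name for the field holding the base64 payload.
const pipeline = buildAttachmentPipeline('file_data', MAPPED_ATTACHMENT_PROPS);
console.log(JSON.stringify(pipeline, null, 2));
```

With an explicit list like this, metadata outside the list never reaches the index, so a strict mapping has nothing unknown to reject.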

Related issues

Checklist

  • I consider the submitted work as finished
  • I tested the code for its functionality
  • I wrote test cases for the relevant use cases (coverage and e2e)
  • I added/updated the relevant documentation (either on github or on notion)
  • Where necessary I refactored code to improve the overall quality

Further comments

I had to track down how to name the metadata by looking at the ES code (https://github.com/elastic/elasticsearch/blob/main/modules/ingest-attachment/src/main/java/org/elasticsearch/ingest/attachment/AttachmentProcessor.java#L200) and at the library it uses (Apache Tika). Without the dc: prefix, the metadata wouldn't be seen by the processor.

I used https://www.embedpdf.com/tools/pdf-metadata-editor to add metadata to the test file, and Tika to read it back the same way the ES dependency does.

@github-actions github-actions bot added the filigran team use to identify PR from the Filigran team label Feb 26, 2026
@codecov

codecov bot commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 64.28571% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.37%. Comparing base (901f1ce) to head (6498c22).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...ti-platform/opencti-graphql/src/database/engine.ts 64.28% 10 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #14655   +/-   ##
=======================================
  Coverage   32.37%   32.37%           
=======================================
  Files        3108     3108           
  Lines      211664   211692   +28     
  Branches    38383    38383           
=======================================
+ Hits        68516    68537   +21     
- Misses     143148   143155    +7     
Flag Coverage Δ
opencti-client-python 45.53% <ø> (ø)
opencti-front 2.82% <ø> (ø)
opencti-graphql 67.66% <64.28%> (+<0.01%) ⬆️


@fellowseb fellowseb force-pushed the issue/89-strict-dynamic-mapping-exception-in-fileindexmanager branch from 698cd42 to 2ea4f68 Compare February 26, 2026 20:33
@fellowseb fellowseb marked this pull request as ready for review February 26, 2026 21:05
@fellowseb fellowseb self-assigned this Feb 26, 2026
@fellowseb fellowseb force-pushed the issue/89-strict-dynamic-mapping-exception-in-fileindexmanager branch from 3635ce9 to c4b70ac Compare February 26, 2026 21:16
@aHenryJard
Member

Can you please create a public issue in OpenCTI repo and link it https://github.com/OpenCTI-Platform/opencti/issues

@fellowseb fellowseb force-pushed the issue/89-strict-dynamic-mapping-exception-in-fileindexmanager branch from 1f89b8f to c97d2f2 Compare February 27, 2026 08:51
@fellowseb fellowseb changed the title [backend] Fix strict_dynamic_mapping_exception exceptions thrown in fileIndexManager (#89) [backend] Fix strict_dynamic_mapping_exception exceptions thrown in fileIndexManager (#14665) Feb 27, 2026
@fellowseb
Member Author

> Can you please create a public issue in OpenCTI repo and link it https://github.com/OpenCTI-Platform/opencti/issues

Done. I'll make sure the correct issue number is in the commit message when squashing too 👍.

@fellowseb fellowseb force-pushed the issue/89-strict-dynamic-mapping-exception-in-fileindexmanager branch from c97d2f2 to f8aaca1 Compare February 27, 2026 10:39
@fellowseb fellowseb force-pushed the issue/89-strict-dynamic-mapping-exception-in-fileindexmanager branch 3 times, most recently from ea8f73f to bf5d902 Compare February 28, 2026 11:01
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses strict_dynamic_mapping_exception errors during file indexing by constraining which fields the Elasticsearch/OpenSearch attachment ingest processor is allowed to extract, aligning ingestion with the existing strict index mappings.

Changes:

  • Configure the attachment ingest pipeline (properties) to only extract fields that are mapped (separately for Elasticsearch vs OpenSearch).
  • Add an integration test indexing a PDF containing metadata that would previously trigger strict mapping failures.
  • Add an OpenSearch dev Docker image build (with ingest-attachment plugin) and update dependent test expectations.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| opencti-platform/opencti-graphql/tests/03-integration/04-manager/retentionManager-test.ts | Updates expected file counts due to the new indexed test file. |
| opencti-platform/opencti-graphql/tests/03-integration/01-database/index-file-test.js | Adds an integration test to validate indexing succeeds with “unhandled” PDF metadata. |
| opencti-platform/opencti-graphql/src/utils/type-utils.ts | Adds TS utility types/helpers used for compile-time type assertions. |
| opencti-platform/opencti-graphql/src/modules/internal/document/document.ts | Adds a compile-time check to keep attachment mappings aligned with extracted props. |
| opencti-platform/opencti-graphql/src/database/engine.ts | Restricts ingest-attachment extracted properties for ES/OS pipelines. |
| opencti-platform/opencti-graphql/src/database/attachment-processor-props.ts | Defines the explicit extracted-property allowlists (ES vs OpenSearch) + shared union type. |
| opencti-platform/opencti-dev/opensearch/Dockerfile | Builds an OpenSearch image with the ingest-attachment plugin installed. |
| opencti-platform/opencti-dev/docker-compose.yml | Switches OpenSearch service to build: the new Dockerfile and updates usage hint. |
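
The compile-time check mentioned for document.ts, together with the per-engine allowlists and shared union type, could look roughly like the sketch below. This is one possible shape and not the code from the PR: `ES_PROPS`, `OS_PROPS`, `ExtractedProp`, `IsEqual`, and `attachmentMappingProps` are hypothetical names, and the field lists are placeholders.

```typescript
// Hypothetical allowlists, one per engine (placeholder values).
const ES_PROPS = ['content', 'title', 'content_type'] as const;
const OS_PROPS = ['content', 'title', 'content_type'] as const;

// Union of every property either engine may extract.
type ExtractedProp = (typeof ES_PROPS)[number] | (typeof OS_PROPS)[number];

// Resolves to `true` only when A and B are mutually assignable.
type IsEqual<A, B> = [A] extends [B] ? ([B] extends [A] ? true : false) : false;

// Illustrative attachment mapping (stand-in for the real one).
const attachmentMappingProps = {
  content: { type: 'text' },
  title: { type: 'text' },
  content_type: { type: 'keyword' },
};

// This assignment stops compiling if a property is extracted without
// a mapping, or mapped without being extracted.
const mappingMatchesProps: IsEqual<keyof typeof attachmentMappingProps, ExtractedProp> = true;

// Deduplicated runtime list of every extracted property.
const allProps: ExtractedProp[] = [...ES_PROPS, ...OS_PROPS].filter(
  (prop, index, all) => all.indexOf(prop) === index,
);
console.log(mappingMatchesProps, allProps);
```

Adding a fourth entry to either allowlist without a matching key in `attachmentMappingProps` makes the `IsEqual` type resolve to `false`, so the mismatch surfaces at build time instead of as an indexing exception.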

@fellowseb fellowseb force-pushed the issue/89-strict-dynamic-mapping-exception-in-fileindexmanager branch from bf5d902 to 23afbf3 Compare March 5, 2026 00:03
@fellowseb fellowseb force-pushed the issue/89-strict-dynamic-mapping-exception-in-fileindexmanager branch from 23afbf3 to 6498c22 Compare March 5, 2026 08:56
@fellowseb fellowseb requested a review from SouadHadjiat March 5, 2026 08:56
@fellowseb fellowseb merged commit e02f1a8 into master Mar 5, 2026
36 checks passed
@fellowseb fellowseb deleted the issue/89-strict-dynamic-mapping-exception-in-fileindexmanager branch March 5, 2026 09:51

Labels

filigran team use to identify PR from the Filigran team

Development

Successfully merging this pull request may close these issues.

Backend error: strict_dynamic_mapping_exception, ([publisher] or [rating] within [attachment]), fileIndexManager usecase

4 participants