New component: Enrichment Processor #41816

@jsvd

The purpose and use-cases of the new component

This issue follows up on a presentation about enhancing the Collector's enrichment capabilities, given at the Collector SIG meeting (July 23rd).
The feedback was to open an issue to get the discussion going, so here it is:

The OpenTelemetry Collector currently supports limited enrichment types, mostly focusing on self-contained parsing and contextual metadata. To improve the versatility of the Collector in comparison to other data collectors and transformation tools, we should expand its capabilities to include other enrichment types.

The original document “Enrichment in OTel Collector” introduces a taxonomy of enrichment types and notes how well the Collector supports each:
• Type 1: Self-Contained Parsing & Derivation (supported)
• Type 2: Reference Data Lookup (Static or Semi-Static) (very limited support)
• Type 3: Dynamic External Enrichment (Live Lookups) (not supported)
• Type 4: Contextual Metadata Enrichment (supported)
• Type 5: Cross-Event Correlation & Aggregation (not supported)
• Type 6: Analytical & ML-Based Enrichment (not supported)

Judging by similar tools (the original document includes a comparison), Type 2 and Type 3 are the strongest candidates for inclusion, since supporting them would ease migrating workloads from other tools to the Collector.

From this problem statement, we could consider introducing a Lookup Processor aimed at handling both static reference data lookups (Type 2) and dynamic external enrichments (Type 3).

The processor would support:

  • Local lookups: Using static or semi-static data sources such as CSV, JSON, or inline key/value pairs.
  • Remote lookups: Dynamic enrichment from APIs, DNS, databases, or cache systems like Redis or Memcached.

Lookups should run in the scope they are configured for, which depends on where the attribute used as the lookup key lives. The configuration below shows examples of resource-level and log-level lookups.

Example configuration for the component

processors:
  # YAML source - map service.name to display name
  lookup/yaml:
    source:
      type: yaml
      path: ./mappings.yaml
    attributes:
      - key: service.display_name
        from_attribute: service.name
        default: "Unknown Service"
        action: upsert
        context: resource

  # HTTP source - lookup user name from API
  lookup/http:
    source:
      type: http
      url: "http://localhost:8080/users/{key}"
      method: GET
      timeout: 5s
      response_field: "name"
      cache:
        enabled: true
        size: 1000
        ttl: 5m
    attributes:
      - key: user.name
        from_attribute: user.id
        context: log
        action: upsert

  # DNS source - reverse DNS lookup for client IP
  lookup/dns:
    source:
      type: dns
      record_type: PTR
      timeout: 2s
      server: "8.8.8.8:53"
      cache:
        enabled: true
        size: 1000
        ttl: 5m
    attributes:
      - key: client.hostname
        from_attribute: client.ip
        default: "unknown"
        action: upsert
        context: log
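
The `attributes` entries above could be evaluated roughly as in the following Go sketch. It is only an illustration under several assumptions: attributes are modelled as a flat `map[string]string` rather than the Collector's own data types, only the `upsert` action shown in the example is handled, a record without the source attribute is left untouched, and an empty `default` means "write nothing on a miss". `AttributeRule` and `Apply` are hypothetical names.

```go
package main

import "fmt"

// AttributeRule mirrors one entry under `attributes:` in the example config.
// The struct itself is hypothetical.
type AttributeRule struct {
	Key           string // attribute to write
	FromAttribute string // attribute whose value is used as the lookup key
	Default       string // written when the lookup misses; empty means leave unset
	Action        string // only "upsert" is handled in this sketch
}

// Apply runs one rule against a flat attribute map using an in-memory table.
func Apply(rule AttributeRule, attrs, table map[string]string) error {
	if rule.Action != "upsert" {
		return fmt.Errorf("unsupported action %q", rule.Action)
	}
	lookupKey, ok := attrs[rule.FromAttribute]
	if !ok {
		return nil // assumption: nothing to look up, leave the record as-is
	}
	value, found := table[lookupKey]
	if !found {
		if rule.Default == "" {
			return nil // assumption: no default configured, write nothing
		}
		value = rule.Default
	}
	attrs[rule.Key] = value // upsert: insert or overwrite
	return nil
}
```

With the `lookup/yaml` rule from the example, a resource carrying `service.name: checkout` would gain `service.display_name` from the mapping file, and an unmapped service name would get `"Unknown Service"`.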

Implementing such a processor requires careful consideration to stay aligned with the long-term vision, namely:
• Progressive implementation, starting with basic local and then remote lookup capabilities.
• Modular structure facilitating easy addition of new lookup sources.
• Built-in caching and timeout mechanisms for performance optimization.
• Comprehensive and useful observability metrics (success/failure rates, latency percentiles).

This list of concerns is not exhaustive and does not offer solutions; it only acknowledges that these will have to be addressed.

Telemetry data types supported

Logs first, given that enrichment is most commonly a logging requirement.
Support could then be extended to Metrics, Traces, and Profiles.

Code Owner(s)

@jsvd, @VihasMakwana, @dehaansa

Sponsor (optional)

@dehaansa

Additional context

Similar ideas and alternative solutions have been suggested in the past:

Trade Offs Considered

New component vs resourcedetectionprocessor

Conclusion: new component

While there are similarities to the resourcedetectionprocessor, remote lookups are out of scope for resource detection. According to its README, that processor detects resource information from the host, whereas this processor would look up data that is not necessarily present on the host or about the host.
Remote lookups also introduce enough performance concerns, such as the need for configurable caching, that they are better addressed in a separate implementation.

Extensibility method - extensions vs NewFactoryWithOptions

Conclusion: NewFactoryWithOptions

This processor should support third-party lookup sources. This could be implemented either with Collector extensions or with an extensible factory using NewFactoryWithOptions (like filterprocessor and transformprocessor).

For now the decision is to use NewFactoryWithOptions rather than extensions. Extensions in the Collector should be reusable by any component, and there are currently no use cases that require lookups during a component's execution (the way the storageextension is used) that can't be handled by placing a lookupprocessor before or after that component.
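
The extensible-factory approach can be sketched with Go's functional options pattern. The code below is self-contained and does not use the Collector's real processor APIs: `Factory`, `FactoryOption`, `WithSource`, `SourceFactory`, and `LookupFunc` are hypothetical names, and this local `NewFactoryWithOptions` only borrows the name to show how a custom distribution might register a third-party source type.

```go
package main

import "fmt"

// LookupFunc resolves a key to a value. Hypothetical type for this sketch.
type LookupFunc func(key string) (string, bool)

// SourceFactory builds a LookupFunc from source-specific settings.
type SourceFactory func(settings map[string]string) (LookupFunc, error)

// Factory holds the lookup source types known to the processor.
type Factory struct {
	sources map[string]SourceFactory
}

// FactoryOption customises a Factory at construction time.
type FactoryOption func(*Factory)

// WithSource registers a source type under the name used in `source.type`.
func WithSource(typeName string, sf SourceFactory) FactoryOption {
	return func(f *Factory) { f.sources[typeName] = sf }
}

// NewFactoryWithOptions builds a Factory and applies the given options in order.
func NewFactoryWithOptions(opts ...FactoryOption) *Factory {
	f := &Factory{sources: map[string]SourceFactory{}}
	for _, opt := range opts {
		opt(f)
	}
	return f
}

// CreateSource instantiates a registered source or reports an unknown type.
func (f *Factory) CreateSource(typeName string, settings map[string]string) (LookupFunc, error) {
	sf, ok := f.sources[typeName]
	if !ok {
		return nil, fmt.Errorf("unknown lookup source type %q", typeName)
	}
	return sf(settings)
}
```

Under this design a third-party source is compiled into a custom Collector build and registered at factory construction, which keeps lookup sources private to the processor instead of exposing them to every component as an extension would.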

Signal Support

Conclusion: Logs first, others after

The decision is to support the Logs signal first, since it most often needs such enrichment, and then extend to other signals as the processor matures.
