[receiver/k8scluster] Service metrics #45620

@jinja2

Description

Component(s)

receiver/k8scluster

Is your feature request related to a problem? Please describe.

The k8scluster receiver currently provides pod-level health metrics, but a Kubernetes Service can aggregate endpoints across multiple Deployments via label selectors. Determining service health by aggregating the matching pods can therefore be complex.
I'd like to introduce a k8s.service entity and associated metrics to provide a direct view of service-level availability and provisioning status. I have a PR for this entity and metrics in semantic-conventions.

Describe the solution you'd like

New Metrics:

  • k8s.service.endpoint.count: Gauge counting endpoints by condition, address type (IPv4/IPv6/FQDN), and zone.

  • k8s.service.load_balancer.ingress.count: Gauge counting provisioned ingress points for a Service of type LoadBalancer.

New Resource Attributes:

  • k8s.service.name (Enabled)
  • k8s.service.uid (Enabled)
  • k8s.service.type (Enabled)
  • k8s.service.traffic_distribution (Opt-in)
  • k8s.service.publish_not_ready_addresses (Opt-in)

Entity Metadata: Emits entity events for k8s.service including selectors, labels, and annotations.

I think the above metrics/metadata implementation enables the following use cases:

Service Availability during Deployments: By using the EndpointSlice API, we provide k8s.service.endpoint.count broken down by ready, serving, and terminating conditions. This allows k8s users to track how many endpoints are ready for new traffic vs. those still serving in-flight requests while draining, providing a high-fidelity view of rolling update progress.
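
To make the breakdown concrete, here is a minimal Python sketch of how endpoints could be tallied per condition. It is an illustration only, not the receiver's implementation: the function names and the (condition, address type, zone) key are hypothetical, the input is plain dicts shaped like EndpointSlice objects, and it assumes an unset ready is treated as ready, an unset serving falls back to ready, and an unset terminating means not terminating.

```python
from collections import Counter


def resolve_conditions(conditions):
    """Resolve ready/serving/terminating, applying defaults for unset values.

    Assumption: unset ready -> True, unset serving -> same as ready,
    unset terminating -> False.
    """
    ready = conditions.get("ready")
    ready = True if ready is None else ready
    serving = conditions.get("serving")
    serving = ready if serving is None else serving
    terminating = bool(conditions.get("terminating"))
    return {"ready": ready, "serving": serving, "terminating": terminating}


def count_endpoints(slices):
    """Count endpoints keyed by (condition, address_type, zone).

    An endpoint is counted once under every condition that is true for it,
    so a draining endpoint can appear under both serving and terminating.
    """
    counts = Counter()
    for endpoint_slice in slices:
        address_type = endpoint_slice.get("addressType", "")
        for endpoint in endpoint_slice.get("endpoints") or []:
            zone = endpoint.get("zone") or ""
            resolved = resolve_conditions(endpoint.get("conditions") or {})
            for condition, is_set in resolved.items():
                if is_set:
                    counts[(condition, address_type, zone)] += 1
    return counts


# Hypothetical sample: one ready endpoint and one draining endpoint.
sample = [{
    "addressType": "IPv4",
    "endpoints": [
        {"conditions": {"ready": True, "serving": True, "terminating": False},
         "zone": "zone-a"},
        {"conditions": {"ready": False, "serving": True, "terminating": True},
         "zone": "zone-a"},
    ],
}]
print(count_endpoints(sample))
```

During a rolling update, the gap between the serving and ready counts in this sketch corresponds to endpoints that are draining in-flight requests.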

Zone-Aware Distribution: The endpoint count includes the zone attribute, which helps verify that topology-aware routing has enough endpoints in each zone to avoid cross-zone latency and costs.
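
As a rough illustration of the kind of check this enables, the Python sketch below flags zones whose ready endpoint count falls under a threshold. The function name, the input shape (a zone to ready-count mapping), and the minimum value are all hypothetical; in practice this would more likely be an alert rule in the metrics backend.

```python
def zones_below_minimum(ready_by_zone, expected_zones, minimum=1):
    """Return the expected zones with fewer than `minimum` ready endpoints.

    Zones missing from `ready_by_zone` are treated as having zero endpoints.
    """
    return sorted(
        zone for zone in expected_zones
        if ready_by_zone.get(zone, 0) < minimum
    )


# Hypothetical per-zone ready counts derived from k8s.service.endpoint.count.
ready_by_zone = {"zone-a": 3, "zone-b": 0}
print(zones_below_minimum(ready_by_zone, ["zone-a", "zone-b", "zone-c"]))
```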

LoadBalancer Provisioning Tracking: The new k8s.service.load_balancer.ingress.count metric confirms whether external IPs or hostnames have been assigned by the cloud provider's controller. This helps track provisioning status and latency.
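
A minimal Python sketch of how this count could be derived from a Service object follows. It assumes the Service is available as a plain dict mirroring the Kubernetes API shape (spec.type and status.loadBalancer.ingress); the function name is hypothetical, and returning None for non-LoadBalancer services is an illustrative choice rather than the proposed behavior.

```python
def load_balancer_ingress_count(service):
    """Count provisioned ingress points for a LoadBalancer-type Service.

    Returns None for other Service types (illustrative choice), and 0 while
    the cloud provider has not yet assigned an IP or hostname.
    """
    spec = service.get("spec") or {}
    if spec.get("type") != "LoadBalancer":
        return None
    status = service.get("status") or {}
    ingress = (status.get("loadBalancer") or {}).get("ingress") or []
    # Count only entries that carry an assigned IP or hostname.
    return sum(1 for entry in ingress if entry.get("ip") or entry.get("hostname"))


# Hypothetical Service still waiting for the cloud provider's controller.
pending = {"spec": {"type": "LoadBalancer"}, "status": {"loadBalancer": {}}}
print(load_balancer_ingress_count(pending))
```

A value that stays at zero after the Service is created would indicate that provisioning has not completed.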

Describe alternatives you've considered

No response

Additional context

No response
