Skip to content

USHIFT-6957: Use nodename as Prometheus instance#6659

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
sjug:prom-instance-nodename
May 14, 2026
Merged

USHIFT-6957: Use nodename as Prometheus instance#6659
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
sjug:prom-instance-nodename

Conversation

@sjug

@sjug sjug commented May 12, 2026

Copy link
Copy Markdown
Contributor

Switches the Prometheus instance label from host:port to the Kubernetes NodeName so the same queries work uniformly against MicroShift and OpenShift cluster-monitoring. The label value is now resolved by replicating MicroShift's own NodeName logic (node.hostnameOverride from /etc/microshift/config.yaml and drop-ins under config.d/ in lexical order, lowercased; fallback to OS hostname) instead of using the inventory hostname, so the scrape label tracks what kubelet actually advertises even when an override is configured.

Label Prometheus metrics with the Kubernetes NodeName instead of
host:port to match the OpenShift cluster-monitoring convention so the
same tooling and queries can target MicroShift and OpenShift uniformly.
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 12, 2026
@openshift-ci-robot

openshift-ci-robot commented May 12, 2026

Copy link
Copy Markdown

@sjug: This pull request references USHIFT-6957 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Label Prometheus metrics with the Kubernetes NodeName instead of host:port to match OpenShift cluster-monitoring's convention so the same tooling and queries can target MicroShift and OpenShift uniformly.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

The PR updates Ansible infrastructure to resolve Kubernetes node names from MicroShift configuration files and propagate that name through Prometheus scrape job configurations, kubelet logging blocks, and network query defaults, while adding CGNAT network support.

Changes

Kubernetes Node Name Resolution & Prometheus Configuration

Layer / File(s) Summary
Node name resolution from MicroShift config
ansible/roles/common/tasks/nodename.yml
New task file reads /etc/microshift/config.yaml and drop-in files under /etc/microshift/config.d, parses YAML to extract node.hostnameOverride, computes merged override (later drop-ins take precedence), and sets kubernetes_nodename fact with fallback to ansible_hostname.
Playbook orchestration and fact usage
ansible/setup-node.yml, ansible/roles/microshift-start/tasks/main.yml
Playbook adds new "Resolve microshift nodename" play to run nodename resolution before logging setup. microshift-start includes nodename task and updates Prometheus instance reference to use kubernetes_nodename instead of inventory_hostname:9100.
Prometheus scrape job instance labels
ansible/roles/add-kubelet-logging/tasks/main.yml, ansible/roles/install-logging/templates/prometheus.yml.j2
Kubelet and cadvisor scrape jobs now add labels.instance derived from kubernetes_nodename (with fallback to host). Template-based node, process, and crio jobs similarly emit per-host targets with instance labels.
Network query defaults and CGNAT support
ansible/roles/common/tasks/files/prom-network-query.py
Default Prometheus URL changes from http://zen3:9091 to http://prometheus:9091. IP validation now treats CGNAT range (100.64.0.0/10) as local/private in addition to loopback, link-local, and RFC1918. Help text updated to reflect new defaults.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: using Kubernetes nodename as the Prometheus instance label instead of host:port, which is the core objective across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR contains only Ansible config files (YAML, Python, Jinja2), no Go test files. The custom check for Ginkgo test names is not applicable.
Test Structure And Quality ✅ Passed PR contains no Ginkgo test code. Changes are only to Ansible roles, tasks, templates, and Python utilities. Custom check for test quality requirements is not applicable.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests added in this PR. Changes are limited to Ansible playbooks/roles for Prometheus configuration.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests added. PR contains only Ansible automation and configuration changes (YAML, Python, Jinja templates). SNO test compatibility check not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR is Ansible automation/configuration code only. No Kubernetes manifests, operator code, or scheduling constraints introduced. Check applies only to deployment manifests and operator code.
Ote Binary Stdout Contract ✅ Passed PR modifies only Ansible playbooks, Python utility scripts, and Jinja2 templates. Contains no OTE binaries, Go test code, or Ginkgo test framework. Check not applicable.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests added in this PR. Changes consist only of Ansible playbooks, roles, Python scripts, and Jinja2 templates. Check is not applicable.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from ggiguash and pmtk May 12, 2026 17:20

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
ansible/roles/common/tasks/nodename.yml (2)

33-54: ⚡ Quick win

Add error handling for malformed YAML in config files.

The from_yaml filter will raise an error if config files contain invalid YAML, but there's no explicit handling for this case. A malformed drop-in will fail the entire playbook rather than gracefully falling back to ansible_hostname.

Consider wrapping the YAML parsing in a rescue block or validating files before parsing.

🛡️ Option: Add validation task

Before line 40, add:

    - name: Validate YAML syntax in config files
      ansible.builtin.command:
        cmd: python3 -c "import yaml; yaml.safe_load(open('{{ item }}'))"
      loop: "{{ _microshift_configs.results | selectattr('content', 'defined') | map(attribute='item') | list }}"
      register: _yaml_validation
      failed_when: false
      changed_when: false

    - name: Warn about invalid YAML files
      ansible.builtin.debug:
        msg: "Warning: {{ item.item }} contains invalid YAML and will be skipped"
      loop: "{{ _yaml_validation.results | selectattr('rc', 'defined') | selectattr('rc', 'ne', 0) | list }}"
      when: _yaml_validation is defined

Then filter out invalid files in the merge logic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ansible/roles/common/tasks/nodename.yml` around lines 33 - 54, The YAML
parsing in the set_fact block for _microshift_hostname_override uses from_yaml
on r.content and can raise on malformed YAML; update the logic to handle parse
errors by wrapping the from_yaml call in a Jinja2 try/rescue (or pre-validate
files) so that if parsing fails for a given r in _microshift_configs.results it
is skipped and does not abort the playbook, and ensure the namespace value
fallback remains ansible_hostname; specifically modify the template inside the
Compute hostnameOverride task (referencing _microshift_configs, parsed,
ns.value, and from_yaml) to catch exceptions (or check validity before parsing)
and only set ns.value when parsing succeeds.

43-54: ⚖️ Poor tradeoff

Consider simplifying the Jinja template logic.

The namespace-based iteration and nested conditionals are dense. This could be refactored into a custom Ansible filter or a separate Python script for better testability and maintainability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ansible/roles/common/tasks/nodename.yml` around lines 43 - 54, The Jinja loop
is too dense—replace the inline namespace/loop with a custom Ansible filter
(e.g., filter name extract_hostname_override) that accepts
_microshift_configs.results, decodes each r.content, parses YAML, and returns
parsed.node.hostnameOverride or '' as a default; then call
extract_hostname_override from the template instead of the current block.
Implement the filter in filter_plugins (function name extract_hostname_override)
to handle b64decode/from_yaml, mapping checks for parsed and parsed.node, and
safe defaults so the template becomes a single function call returning ns.value.
ansible/setup-node.yml (1)

51-57: 💤 Low value

Minor: Inconsistent capitalization in task name.

The play name uses "Resolve microshift nodename" (line 51) but the task name uses "resolve kubernetes nodename" (line 56). Consider consistent capitalization for readability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ansible/setup-node.yml` around lines 51 - 57, The play uses "Resolve
microshift nodename" but the task is titled "resolve kubernetes nodename" which
is inconsistent; update the task name string "resolve kubernetes nodename" to
match the play's capitalization and wording (e.g., "Resolve microshift nodename"
or at least capitalize and standardize to "Resolve Kubernetes nodename") so both
the play name and the task name are consistent; edit the task title in the
include_tasks block that references roles/common/tasks/nodename.yml.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@ansible/roles/common/tasks/nodename.yml`:
- Around line 33-54: The YAML parsing in the set_fact block for
_microshift_hostname_override uses from_yaml on r.content and can raise on
malformed YAML; update the logic to handle parse errors by wrapping the
from_yaml call in a Jinja2 try/rescue (or pre-validate files) so that if parsing
fails for a given r in _microshift_configs.results it is skipped and does not
abort the playbook, and ensure the namespace value fallback remains
ansible_hostname; specifically modify the template inside the Compute
hostnameOverride task (referencing _microshift_configs, parsed, ns.value, and
from_yaml) to catch exceptions (or check validity before parsing) and only set
ns.value when parsing succeeds.
- Around line 43-54: The Jinja loop is too dense—replace the inline
namespace/loop with a custom Ansible filter (e.g., filter name
extract_hostname_override) that accepts _microshift_configs.results, decodes
each r.content, parses YAML, and returns parsed.node.hostnameOverride or '' as a
default; then call extract_hostname_override from the template instead of the
current block. Implement the filter in filter_plugins (function name
extract_hostname_override) to handle b64decode/from_yaml, mapping checks for
parsed and parsed.node, and safe defaults so the template becomes a single
function call returning ns.value.

In `@ansible/setup-node.yml`:
- Around line 51-57: The play uses "Resolve microshift nodename" but the task is
titled "resolve kubernetes nodename" which is inconsistent; update the task name
string "resolve kubernetes nodename" to match the play's capitalization and
wording (e.g., "Resolve microshift nodename" or at least capitalize and
standardize to "Resolve Kubernetes nodename") so both the play name and the task
name are consistent; edit the task title in the include_tasks block that
references roles/common/tasks/nodename.yml.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 17ac164f-4570-41c7-8dd0-e7f8bb042889

📥 Commits

Reviewing files that changed from the base of the PR and between db194f3 and 70d39b3.

📒 Files selected for processing (6)
  • ansible/roles/add-kubelet-logging/tasks/main.yml
  • ansible/roles/common/tasks/files/prom-network-query.py
  • ansible/roles/common/tasks/nodename.yml
  • ansible/roles/install-logging/templates/prometheus.yml.j2
  • ansible/roles/microshift-start/tasks/main.yml
  • ansible/setup-node.yml

@ggiguash

Copy link
Copy Markdown
Contributor

/lgtm
/verified by @sjug

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 14, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@ggiguash: This PR has been marked as verified by @sjug.

Details

In response to this:

/lgtm
/verified by @sjug

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2026
@openshift-ci

openshift-ci Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggiguash, sjug

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 14, 2026
@openshift-ci

openshift-ci Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

@sjug: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 23e4691 into openshift:main May 14, 2026
7 checks passed
@sjug sjug deleted the prom-instance-nodename branch May 14, 2026 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants