Chart: mariadb-headless Service selector also matches nginx & php-fpm pods → DNS poisoning #155

@CorentinRegnier

Context

  • Chart: glpi-11.0.7
  • All four sub-components enabled (mariadb.enabled: true, default)

What happens

GLPI init Jobs (glpi-db-install, glpi-db-configure) fail with:

Database connection failed with message "(2002) Operation timed out".

Although MariaDB is running and reachable directly via its pod IP, ~75% of connections from GLPI to mariadb-headless.glpi.svc.cluster.local time out.

Root cause

templates/mariadb-statefulset.yaml declares a headless Service with this selector:

spec:
  selector:
{{ include "glpi.selectorLabels" . | indent 4 }}

glpi.selectorLabels in templates/_helpers.tpl:

app.kubernetes.io/name: {{ include "glpi.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}

These labels are applied chart-wide — every Deployment, StatefulSet and Job in the chart inherits them via glpi.labels. So the headless Service mariadb-headless matches:

  • mariadb-0 (intended)
  • nginx-* pods (collateral)
  • php-fpm-* pods (collateral)

Resulting endpoints:

$ kubectl get endpoints mariadb-headless
NAME                ENDPOINTS                                                          AGE
mariadb-headless    10.244.1.200:3306,10.244.0.30:3306,10.244.0.190:3306 + 1 more...   4m
#                   ^ mariadb-0       ^ nginx          ^ php-fpm

GLPI resolves mariadb-headless.glpi.svc.cluster.local and round-robins across the 4 returned IPs, 3 of which don't listen on :3306 → most connection attempts time out (3 out of 4, which matches the ~75% failure rate).

The ClusterIP Service mariadb does the right thing because its selector also pins role: primary, so only the MariaDB pod shows up in its endpoints.
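
To make the difference between the two selectors concrete, here is an illustrative rendering. It is a sketch, not output from the chart: the values glpi and my-release are placeholders, and the pods may carry additional labels that are not shown. A Service selects every pod whose labels include all of the selector's key/value pairs.

```yaml
# Illustrative only: "glpi" and "my-release" are placeholder values.

# mariadb-headless: selector contains only the two chart-wide labels,
# so it matches mariadb-0, nginx-* and php-fpm-* alike.
selector:
  app.kubernetes.io/name: glpi
  app.kubernetes.io/instance: my-release
---
# mariadb (ClusterIP): same two labels plus role: primary,
# so it matches only pods that also carry role: primary.
selector:
  app.kubernetes.io/name: glpi
  app.kubernetes.io/instance: my-release
  role: primary
```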

Reproduction

  1. Install the chart with defaults.
  2. kubectl -n glpi get endpoints mariadb-headless → 4 endpoints.
  3. kubectl -n glpi run --rm -it debug --image=alpine -- sh, then inside the pod: nc -vz mariadb-headless 3306 → fails ~75% of the time.

Suggested fix

The selector on the headless Service should match MariaDB pods only. Two options:

Option A (minimal): add role: primary to the headless Service selector, the same way the ClusterIP Service does it:

spec:
  clusterIP: None
  selector:
{{ include "glpi.selectorLabels" . | indent 4 }}
    role: primary

Option B (cleaner): introduce a dedicated app.kubernetes.io/component: database label on MariaDB pods and select on it. This aligns with the standard k8s recommended label and keeps room for future sub-components (replicas, backup pods, …).
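
A rough sketch of what Option B could look like. The label value database comes from the proposal above; the exact layout of the existing templates is an assumption, so treat the surrounding keys as placeholders. The label is added only to the pod template and the Service selector, not to the StatefulSet's spec.selector, because that field is immutable and changing it would break upgrades of existing releases.

```yaml
# Sketch only: the surrounding template structure is assumed, not copied from the chart.

# MariaDB StatefulSet: add the component label to the pod template labels.
spec:
  template:
    metadata:
      labels:
        # ...existing labels stay as they are...
        app.kubernetes.io/component: database
---
# Headless Service: select on the component label in addition to the shared labels.
spec:
  clusterIP: None
  selector:
{{ include "glpi.selectorLabels" . | indent 4 }}
    app.kubernetes.io/component: database
```

Changing the pod template labels triggers a rolling update of the MariaDB pod, after which kubectl -n glpi get endpoints mariadb-headless should list a single address.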

Workaround

Override MARIADB_HOST in both ConfigMaps via a post-renderer to point at the ClusterIP Service mariadb (whose selector is correct):

postRenderers:
  - kustomize:
      patches:
        - target: { kind: ConfigMap, name: glpi-config }
          patch: |
            - op: replace
              path: /data/MARIADB_HOST
              value: mariadb.glpi.svc.cluster.local
        - target: { kind: ConfigMap, name: mariadb-glpi-config }
          patch: |
            - op: replace
              path: /data/MARIADB_HOST
              value: mariadb.glpi.svc.cluster.local

Env

  • Chart: glpi-11.0.7
  • Kubernetes: AKS v1.35
  • Helm: v3 (via Flux helm-controller)
