Monitoring for AMQ on OpenShift using capabilities provided in OCP 4.10.x
The user-defined alerting material in this repository is based on content from the following repository https://github.com/bszeti/openshift-app-monitoring-grafana
from https://github.com/bszeti.
There is hopes to migrate material from various sources into this repository https://github.com/amq-broker-hub/amq-on-openshift
Documentation reference: https://docs.openshift.com/container-platform/4.10/monitoring/monitoring-overview.html
The Alertmanager service handles alerts received from Prometheus, and is responsible for sending the alerts to
external notification systems. This architecture for this service is best described in the documentation
located here: https://docs.openshift.com/container-platform/4.10/monitoring/monitoring-overview.html#understanding-the-monitoring-stack_monitoring-overview
It is important to understand the relationship between Alertmanager, Prometheus(s), and Thanos Querier.
Within the OpenShift Projects, Prometheus is observing various OpenShift components such as kube-state-metrics, openshift-state-metrics, and node-exporter agents. For User-Defined Porjects, the Prometheus operator manages Prometheus and Thanos Rules instances. These Prometheus instances then send the Alerts to Alertmanager so that they can be managed.
Documentation reference: https://docs.openshift.com/container-platform/4.10/monitoring/managing-alerts.html
The Alert Monitoring UI povides the mechanism to manage a set of Alerts, including the set of Alerting rules, Alerts, and Silences. As part of the Alert Monitoring capability, notifications can be sent to a configured receiver type. As of OCP 4.10, these recevier types are: PagerDuty, Webhook, Email, Slack.
The cluster providers an Alertmanager to manage the set of Alerts received from the cluster Prometheus. This Alertmanager can be further customized manage the set of Alerts received from the user defined projects, or alertanatively, a seperate Alertmanager can be user for user defined projects.
The documentation for creating an additional Alertmanager for user defined projects can be found in the
OCP 4.11 Documenation located here: https://docs.openshift.com/container-platform/4.11/monitoring/managing-alerts.html#applying-a-custom-configuration-to-alertmanager-for-user-defined-alert-routing_managing-alerts.
Documentation reference: https://docs.openshift.com/container-platform/4.10/monitoring/managing-alerts.html#configuring-alert-receivers_managing-alerts
Of note, these directions are to extend the cluster Alertmanager as specified in the OCP 4.10 documentation. These instructions are necessary to listen to Alerts received from the cluster Prometheus, and are then extended for user defined projects. Alerts from User defined projects are configured through Thanos Ruler / Prometheus.
From the OpenShift Console, traverse to the following location:
Administration → Cluster Settings → Configuration → Alertmanager
This displays the Alert routing and Receivers configuration. By default, there is a Critical, Default, and Watchdog entry where no Integration type has been specified. There is a "Create Receiver" button which can be used to create a new receiver.
In the Create Receiver, a Name and Type needs to be specified.
|
Tip
|
If you would like to test web hook alerting functionality, the http://webhook.site provides a generic URL that will accept any POST from webhook functionality. |
For a Webhook URL, use the URL provided by webhook.site. If you wish to fire alerts that match exactly to a criteria, a Routing Label can be specified. The following are regular expression examples:
-
namespace: ^(openshift|kube)$
-
alertname: ^(Cluster|Cloud|Machine|Pod|Kube|MCD|Alertmanager|etcd|TargetDown|CPU|Node|Clock|Prometheus|Failing|Network|IPTable)$
To view the currently active Alertmanager configuration into a file alertmanager.yaml. If unable, an example alertmanager.yaml has
provided in this repo ( openshift-monitoring/EXAMPLE_CUSTOM_CONFIG_alertmanager.yaml )
$ oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' | base64 --decode > alertmanager.yamlTo update, edit the alertmanager.yaml with an addition stanza in the routes section.
global:
resolve_timeout: 5m
route:
...
routes:
...
- receiver: Webhook Alert Testing
match_re:
severity : .*
receivers:
- name: default
- name: watchdog
- name: Webhook Alert Testing
webhook_configs:
- url: 'https://webhook.site/c685d8d7-TESTING-REPLACE-0329b3025666'Then apply the new configuration in the file:
$ oc -n openshift-monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run=client -o=yaml | oc -n openshift-monitoring replace secret --filename=-Documentation reference: https://docs.openshift.com/container-platform/4.10/monitoring/enabling-monitoring-for-user-defined-projects.html
First we need to enable User Workload Monitoring by creating the related ConfigMaps.
To determine the current cluster setting, run the following command:
oc describe configmap cluster-monitoring-config -n openshift-monitoringIf enableUserWorkload parameter is false or not present, then the parameter is set true.
In addition, the enableUserAlertmanagerConfig is also set to true.
cat openshift-monitoring/cluster-monitoring-config.yaml | oc apply -f -Next step is to apply a config map to the namespace hosting AMQ
cat openshift-monitoring/user-workload-monitoring-config.yaml | oc apply -f -To verify that the user defined prometheus-operator, prometheus-user-workload, and thanos-ruler-user-workload pods are
running in the openshift-user-workload-monitoring project.
$ oc -n openshift-user-workload-monitoring get pod
NAME READY STATUS RESTARTS AGE
prometheus-operator-6568d65cbc-wtwzj 2/2 Running 0 52m
prometheus-user-workload-0 6/6 Running 0 105s
prometheus-user-workload-1 6/6 Running 0 2m3s
thanos-ruler-user-workload-0 3/3 Running 0 119s
thanos-ruler-user-workload-1 3/3 Running 0 2m3sTraverse to the existing AMQ Broker namespace, or create a new namespace. The namespace name is reference in the prometheus rules, so note the name of the namespace.
For alerts baed on AMQ Broker behevior, ensure that "enableMetricsPlugin" is enabled (true).
To validate that metrics are available, traverse to the OCP Route(s) that access the
AMQ Console. By default, this Route contains "wconsj-" in the URL. Using this location,
append /metrics to the URL to view the data that can be scraped by Prometheus.
Apply monitoring view/edit roles to the namespace.
$ oc policy add-role-to-user monitoring-rules-view developer -n amq-710-playground
clusterrole.rbac.authorization.k8s.io/monitoring-rules-view added: "developer"
$ oc policy add-role-to-user monitoring-rules-edit developer -n amq-710-playground
clusterrole.rbac.authorization.k8s.io/monitoring-rules-edit added: "developer"
$ oc policy add-role-to-user monitoring-edit developer -n amq-710-playground
clusterrole.rbac.authorization.k8s.io/monitoring-edit added: "developer"Now that Workload Monitoring is configured, it is now time to create Prometheus Rules scoped to the user defined namespace.
For ActiveMQ, the are a number of Alerts that are recommended to understand the overall health of the broker.
The following set of rules provide the following:
-
AmqPodCrashLooping : If the pod is in waiting state ("CrashLoopBackOff") over a period of time.
-
AmqPodNotReady : If the pod (broker or broker operator) has been in a non-ready state for period of time.
-
AmqDeploymentGenerationMismatch : Amq deployment has failed.
-
AmqStatefulSetReplicasMismatch : The Amq stateful set has not matched the expected number of replicas.
-
AmqStatefulSetGenerationMismatch : The Amq stateful set generation does not match. It has failed and not been rolled back.
-
AmqStatefulSetUpdateNotRolledOut : The Amq stateful set update has not been rolled out.
$ cat amq-monitors/amq-monitoring-rules.yaml | oc apply -f -
prometheusrule.monitoring.coreos.com/amq-monitoring-rules createdTo enable the monitoring of ActiveMQ Artemis Prometheus Metrics
$ cat amq-monitors/amq-servicemonitor.yaml | oc apply -f -
servicemonitor.monitoring.coreos.com/ex-aao-app created