ocp-amq-monitoring

Monitoring for AMQ on OpenShift using capabilities provided in OCP 4.10.x

Attributions

The user-defined alerting material in this repository is based on content from the following repository https://github.com/bszeti/openshift-app-monitoring-grafana from https://github.com/bszeti.

There is hopes to migrate material from various sources into this repository https://github.com/amq-broker-hub/amq-on-openshift

Openshift Alert Monitoring (v4.10)

Documentation reference: https://docs.openshift.com/container-platform/4.10/monitoring/monitoring-overview.html

The Alertmanager service handles alerts received from Prometheus, and is responsible for sending the alerts to external notification systems. This architecture for this service is best described in the documentation located here: https://docs.openshift.com/container-platform/4.10/monitoring/monitoring-overview.html#understanding-the-monitoring-stack_monitoring-overview

It is important to understand the relationship between Alertmanager, Prometheus(s), and Thanos Querier.

Within the OpenShift Projects, Prometheus is observing various OpenShift components such as kube-state-metrics, openshift-state-metrics, and node-exporter agents. For User-Defined Porjects, the Prometheus operator manages Prometheus and Thanos Rules instances. These Prometheus instances then send the Alerts to Alertmanager so that they can be managed.

OpenShift Managing Alerts (Cluster)

Documentation reference: https://docs.openshift.com/container-platform/4.10/monitoring/managing-alerts.html

The Alert Monitoring UI povides the mechanism to manage a set of Alerts, including the set of Alerting rules, Alerts, and Silences. As part of the Alert Monitoring capability, notifications can be sent to a configured receiver type. As of OCP 4.10, these recevier types are: PagerDuty, Webhook, Email, Slack.

The cluster providers an Alertmanager to manage the set of Alerts received from the cluster Prometheus. This Alertmanager can be further customized manage the set of Alerts received from the user defined projects, or alertanatively, a seperate Alertmanager can be user for user defined projects.

The documentation for creating an additional Alertmanager for user defined projects can be found in the OCP 4.11 Documenation located here: https://docs.openshift.com/container-platform/4.11/monitoring/managing-alerts.html#applying-a-custom-configuration-to-alertmanager-for-user-defined-alert-routing_managing-alerts.

Configuring Alert Receiver

Documentation reference: https://docs.openshift.com/container-platform/4.10/monitoring/managing-alerts.html#configuring-alert-receivers_managing-alerts

Of note, these directions are to extend the cluster Alertmanager as specified in the OCP 4.10 documentation. These instructions are necessary to listen to Alerts received from the cluster Prometheus, and are then extended for user defined projects. Alerts from User defined projects are configured through Thanos Ruler / Prometheus.

Console

From the OpenShift Console, traverse to the following location:

Administration → Cluster Settings → Configuration → Alertmanager

This displays the Alert routing and Receivers configuration. By default, there is a Critical, Default, and Watchdog entry where no Integration type has been specified. There is a "Create Receiver" button which can be used to create a new receiver.

In the Create Receiver, a Name and Type needs to be specified.

Tip	If you would like to test web hook alerting functionality, the http://webhook.site provides a generic URL that will accept any POST from webhook functionality.

For a Webhook URL, use the URL provided by webhook.site. If you wish to fire alerts that match exactly to a criteria, a Routing Label can be specified. The following are regular expression examples:

namespace: ^(openshift|kube)$
alertname: ^(Cluster|Cloud|Machine|Pod|Kube|MCD|Alertmanager|etcd|TargetDown|CPU|Node|Clock|Prometheus|Failing|Network|IPTable)$

Command Line

To view the currently active Alertmanager configuration into a file alertmanager.yaml. If unable, an example alertmanager.yaml has provided in this repo ( openshift-monitoring/EXAMPLE_CUSTOM_CONFIG_alertmanager.yaml )

$ oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' | base64 --decode > alertmanager.yaml

To update, edit the alertmanager.yaml with an addition stanza in the routes section.

global:
  resolve_timeout: 5m
route:
  ...
  routes:
    ...
    - receiver: Webhook Alert Testing
      match_re:
        severity : .*

receivers:
- name: default
- name: watchdog
- name: Webhook Alert Testing
  webhook_configs:
    - url: 'https://webhook.site/c685d8d7-TESTING-REPLACE-0329b3025666'

Then apply the new configuration in the file:

$ oc -n openshift-monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run=client -o=yaml |  oc -n openshift-monitoring replace secret --filename=-

Enable User Workload Monitoring

Documentation reference: https://docs.openshift.com/container-platform/4.10/monitoring/enabling-monitoring-for-user-defined-projects.html

First we need to enable User Workload Monitoring by creating the related ConfigMaps.

To determine the current cluster setting, run the following command:

oc describe configmap cluster-monitoring-config -n openshift-monitoring

If enableUserWorkload parameter is false or not present, then the parameter is set true. In addition, the enableUserAlertmanagerConfig is also set to true.

cat openshift-monitoring/cluster-monitoring-config.yaml | oc apply -f -

Next step is to apply a config map to the namespace hosting AMQ

cat openshift-monitoring/user-workload-monitoring-config.yaml | oc apply -f -

To verify that the user defined prometheus-operator, prometheus-user-workload, and thanos-ruler-user-workload pods are running in the openshift-user-workload-monitoring project.

$ oc -n openshift-user-workload-monitoring get pod
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-6568d65cbc-wtwzj   2/2     Running   0          52m
prometheus-user-workload-0             6/6     Running   0          105s
prometheus-user-workload-1             6/6     Running   0          2m3s
thanos-ruler-user-workload-0           3/3     Running   0          119s
thanos-ruler-user-workload-1           3/3     Running   0          2m3s

AMQ Namespace / Broker Configuration

Traverse to the existing AMQ Broker namespace, or create a new namespace. The namespace name is reference in the prometheus rules, so note the name of the namespace.

For alerts baed on AMQ Broker behevior, ensure that "enableMetricsPlugin" is enabled (true).

To validate that metrics are available, traverse to the OCP Route(s) that access the AMQ Console. By default, this Route contains "wconsj-" in the URL. Using this location, append /metrics to the URL to view the data that can be scraped by Prometheus.

Granting User Permission to Monitor

Apply monitoring view/edit roles to the namespace.

$ oc policy add-role-to-user monitoring-rules-view developer -n amq-710-playground
clusterrole.rbac.authorization.k8s.io/monitoring-rules-view added: "developer"
$ oc policy add-role-to-user monitoring-rules-edit developer -n amq-710-playground
clusterrole.rbac.authorization.k8s.io/monitoring-rules-edit added: "developer"
$ oc policy add-role-to-user monitoring-edit developer -n amq-710-playground
clusterrole.rbac.authorization.k8s.io/monitoring-edit added: "developer"

Now that Workload Monitoring is configured, it is now time to create Prometheus Rules scoped to the user defined namespace.

Creating User Defined Prometheus Rules

For ActiveMQ, the are a number of Alerts that are recommended to understand the overall health of the broker.

The following set of rules provide the following:

AmqPodCrashLooping : If the pod is in waiting state ("CrashLoopBackOff") over a period of time.
AmqPodNotReady : If the pod (broker or broker operator) has been in a non-ready state for period of time.
AmqDeploymentGenerationMismatch : Amq deployment has failed.
AmqStatefulSetReplicasMismatch : The Amq stateful set has not matched the expected number of replicas.
AmqStatefulSetGenerationMismatch : The Amq stateful set generation does not match. It has failed and not been rolled back.
AmqStatefulSetUpdateNotRolledOut : The Amq stateful set update has not been rolled out.

$ cat amq-monitors/amq-monitoring-rules.yaml | oc apply -f -
prometheusrule.monitoring.coreos.com/amq-monitoring-rules created

To enable the monitoring of ActiveMQ Artemis Prometheus Metrics

$ cat amq-monitors/amq-servicemonitor.yaml | oc apply -f -
servicemonitor.monitoring.coreos.com/ex-aao-app created

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ocp-amq-monitoring

Attributions

Openshift Alert Monitoring (v4.10)

OpenShift Managing Alerts (Cluster)

Configuring Alert Receiver

Console

Command Line

Enable User Workload Monitoring

AMQ Namespace / Broker Configuration

Granting User Permission to Monitor

Creating User Defined Prometheus Rules

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
amq-monitors		amq-monitors
openshift-monitoring		openshift-monitoring
README.adoc		README.adoc

Folders and files

Latest commit

History

Repository files navigation

ocp-amq-monitoring

Attributions

Openshift Alert Monitoring (v4.10)

OpenShift Managing Alerts (Cluster)

Configuring Alert Receiver

Console

Command Line

Enable User Workload Monitoring

AMQ Namespace / Broker Configuration

Granting User Permission to Monitor

Creating User Defined Prometheus Rules

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages