## Metrics in Prometheus:
- Metrics are the core data objects in Prometheus: each one represents a measurement collected from a monitored system and is stored as a time series of timestamped values.
- These metrics provide insight into **system performance, health, and behavior**.

## Labels:
- Metrics are paired with labels.
- Labels are key-value pairs that distinguish the dimensions of a metric, such as different services, instances, or endpoints.
- Each unique combination of metric name and label values identifies a separate time series.

## Example:
```bash
container_cpu_usage_seconds_total{namespace="kube-system", endpoint="https-metrics"}
```
- `container_cpu_usage_seconds_total` is the metric name.
- `{namespace="kube-system", endpoint="https-metrics"}` is the label set: two labels, `namespace` and `endpoint`, each with a value.

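To make the relationship between a metric name and its labels concrete, here is a minimal Python sketch. It is an illustration only, not how Prometheus stores data internally; `series_key` and `format_series` are hypothetical helper names introduced for this example.

```python
def series_key(name, labels):
    """Return a hashable identity: the metric name plus its full label set."""
    return (name, frozenset(labels.items()))


def format_series(name, labels):
    """Render a series in selector notation, with labels sorted by name."""
    body = ", ".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{name}{{{body}}}"


metric = "container_cpu_usage_seconds_total"
kube_system = {"namespace": "kube-system", "endpoint": "https-metrics"}
default = {"namespace": "default", "endpoint": "https-metrics"}

# Same metric name, different label values: these are two distinct series.
print(series_key(metric, kube_system) == series_key(metric, default))
print(format_series(metric, kube_system))
```

The order in which labels are written does not matter in this sketch; only the set of name and value pairs does.
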
## Types of Metrics in Prometheus
- **Counter**:
  - A Counter is a cumulative metric: a single numerical value that only ever increases, or resets to zero when the process restarts. It is used for counting events such as HTTP requests, errors, or completed tasks.
  - **Example**: Counting the number of times a container restarts in your Kubernetes cluster.
  - **Metric Example**: `kube_pod_container_status_restarts_total`

- **Gauge**:
  - A Gauge is a single numerical value that can go up and down. It is typically used for measured values such as memory usage, queue length, or the current number of active users.
  - **Example**: Monitoring the memory usage of a container in your Kubernetes cluster.
  - **Metric Example**: `container_memory_usage_bytes`

- **Histogram**:
  - A Histogram samples observations (usually things like request durations or response sizes) and counts them in configurable, cumulative buckets.
  - It also provides a sum of all observed values and a count of observations, exposed as the `_bucket`, `_sum`, and `_count` series.
  - **Example**: Measuring the response time of Kubernetes API requests in various time buckets.
  - **Metric Example**: `apiserver_request_duration_seconds_bucket`

- **Summary**:
  - Similar to a Histogram, a Summary samples observations and provides a total count of observations and their sum. Instead of buckets, it exposes configurable quantiles (percentiles) that are calculated on the client side.
  - **Example**: Monitoring the 95th percentile of request durations to understand high latency in a service.
  - **Metric Example**: `go_gc_duration_seconds`, a Summary exposed by Go-based components with a `quantile` label. Note that `apiserver_request_duration_seconds_sum` is the `_sum` series of the Histogram above, not a Summary.

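The behavior of the first three types can be sketched in a few lines of Python. The classes below are hypothetical and written only for this example; they are not the API of any Prometheus client library, and the bucket bounds are made-up values.

```python
import math


class Counter:
    """Cumulative value that may only increase."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Single value that can be set to anything, up or down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


class Histogram:
    """Cumulative buckets plus a running sum and count of observations."""

    def __init__(self, upper_bounds):
        # An implicit +Inf bucket catches every observation.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        # Buckets are cumulative: increment every bucket whose bound covers the value.
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:
                self.bucket_counts[i] += 1


restarts = Counter()
restarts.inc()
restarts.inc()

memory_bytes = Gauge()
memory_bytes.set(512)
memory_bytes.set(256)  # a gauge may go down

durations = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 2.0):
    durations.observe(seconds)
print(restarts.value, memory_bytes.value, durations.bucket_counts)
```

Because the buckets are cumulative, each count includes all observations at or below that bound, which is the shape that `histogram_quantile()` expects.
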
## What is PromQL?
- PromQL (Prometheus Query Language) is the query language used to select and aggregate time series data stored in Prometheus.
- It lets you retrieve and manipulate time series data, perform mathematical operations, aggregate data, and much more.

- Key Features of PromQL:
  - Selecting Time Series: Select specific metrics and filter them by label.
  - Mathematical Operations: Apply arithmetic and comparison operators to metrics.
  - Aggregation: Combine data across multiple time series.
  - Functions: Use built-in functions such as `rate()` and `histogram_quantile()` to analyze and manipulate data.

## Basic Examples of PromQL
- `container_cpu_usage_seconds_total`
  - Returns all time series with the metric name `container_cpu_usage_seconds_total`.
- `container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-proxy.*"}`
  - Returns only the time series of `container_cpu_usage_seconds_total` whose `namespace` label equals `kube-system` and whose `pod` label matches the regular expression `kube-proxy.*`. The `=~` operator is a regex match that is anchored to the whole label value.
- `container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-proxy.*"}[5m]`
  - Returns, for the same series, all samples from the 5 minutes leading up to the query time instead of only the latest sample. This turns the instant vector into a range vector.

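The label matching in the second selector can be approximated in Python as follows. This is a rough sketch under stated assumptions: `select` is a hypothetical helper, the pod names are invented, and Python's `re` module stands in for the RE2 syntax that Prometheus uses, so only simple patterns behave identically.

```python
import re


def select(series, name, equal=None, regex=None):
    """Return the label sets of series that match the name and all matchers."""
    equal = equal or {}
    regex = regex or {}
    selected = []
    for series_name, labels in series:
        if series_name != name:
            continue
        # `=` matcher: exact string equality (a missing label counts as "").
        if any(labels.get(key, "") != want for key, want in equal.items()):
            continue
        # `=~` matcher: the pattern must match the whole label value.
        if any(re.fullmatch(pattern, labels.get(key, "")) is None
               for key, pattern in regex.items()):
            continue
        selected.append(labels)
    return selected


# Hypothetical series, for illustration only.
all_series = [
    ("container_cpu_usage_seconds_total", {"namespace": "kube-system", "pod": "kube-proxy-abc12"}),
    ("container_cpu_usage_seconds_total", {"namespace": "kube-system", "pod": "coredns-xyz89"}),
    ("container_cpu_usage_seconds_total", {"namespace": "default", "pod": "kube-proxy-def34"}),
    ("container_memory_usage_bytes", {"namespace": "kube-system", "pod": "kube-proxy-abc12"}),
]

matched = select(
    all_series,
    "container_cpu_usage_seconds_total",
    equal={"namespace": "kube-system"},
    regex={"pod": "kube-proxy.*"},
)
print(matched)
```

Using `re.fullmatch` rather than `re.search` mirrors the anchoring: a pattern of `proxy` alone would not match `kube-proxy-abc12`.
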
## Aggregation & Functions in PromQL
- Aggregation in PromQL combines multiple time series into fewer series: a single one by default, or one per group when you group by labels.
- **Sum Up All CPU Usage**:
  ```bash
  sum(rate(node_cpu_seconds_total{mode!="idle"}[5m]))
  ```
  - This query aggregates the CPU usage across all nodes. The `mode!="idle"` matcher excludes idle time; without it, the sum covers every CPU mode and simply approximates the total number of cores.

- **Average Memory Usage per Namespace:**
  ```bash
  avg(container_memory_usage_bytes) by (namespace)
  ```
  - This query returns one series per namespace, holding the average of `container_memory_usage_bytes` across all series in that namespace.

- **rate() Function:**
  - The `rate()` function calculates the per-second average rate of increase of a counter over the specified range. It accounts for counter resets and should only be used with counters.
  ```bash
  rate(container_cpu_usage_seconds_total[5m])
  ```
  - This returns, for each series, the average number of CPU seconds consumed per second over the last 5 minutes, which is the average number of CPU cores in use.
- **increase() Function:**
  - The `increase()` function returns how much a counter grew over the specified time range. The result is extrapolated to cover the full range, so it can be a non-integer even for a counter that only increases in whole steps.
  ```bash
  increase(kube_pod_container_status_restarts_total[1h])
  ```
  - This gives the number of container restarts over the last hour, per container.

- **histogram_quantile() Function:**
  - The `histogram_quantile()` function estimates quantiles (e.g., the 95th percentile) from the cumulative buckets of a Histogram.
  ```bash
  histogram_quantile(0.95, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le))
  ```
  - This estimates the 95th percentile of Kubernetes API request durations over the last 5 minutes. The inner `sum(...) by (le)` combines the buckets of all series while keeping the `le` (bucket upper bound) label that `histogram_quantile()` needs.
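
The three steps in the last query (`rate()` per bucket series, `sum ... by (le)`, then `histogram_quantile()`) can be traced with a simplified Python sketch. It rests on several assumptions: the function names and sample data are hypothetical, the rate is taken between the first and last sample without the extrapolation that Prometheus applies, and bucket bounds are assumed to be positive. It illustrates the idea and is not the Prometheus implementation.

```python
import math


def counter_increase(samples):
    """Total increase over (timestamp, value) samples; a drop is treated as a reset to zero."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples):
    """Per-second average increase between the first and last sample (no extrapolation)."""
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed


def sum_rate_by_le(series):
    """Equivalent in spirit to sum(rate(...)) by (le): one summed rate per bucket bound."""
    totals = {}
    for labels, samples in series:
        bound = float(labels["le"])  # float("+Inf") parses to infinity
        totals[bound] = totals.get(bound, 0.0) + simple_rate(samples)
    return sorted(totals.items())


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets ending in +Inf."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical bucket counters from two API server instances over 5 minutes.
series = [
    ({"instance": "a", "le": "0.1"}, [(0, 0.0), (300, 1500.0)]),
    ({"instance": "a", "le": "0.5"}, [(0, 0.0), (300, 2700.0)]),
    ({"instance": "a", "le": "+Inf"}, [(0, 0.0), (300, 3000.0)]),
    ({"instance": "b", "le": "0.1"}, [(0, 0.0), (300, 600.0)]),
    ({"instance": "b", "le": "0.5"}, [(0, 0.0), (300, 1500.0)]),
    ({"instance": "b", "le": "+Inf"}, [(0, 0.0), (300, 3000.0)]),
]

buckets = sum_rate_by_le(series)
print(buckets)
print(histogram_quantile(0.5, buckets))
```

Because the estimate interpolates linearly inside one bucket, its accuracy depends on how closely the bucket bounds bracket the quantile of interest.
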