This repository was archived by the owner on Jun 6, 2024. It is now read-only.

Commit 72dc204

add cluster-utilization report doc
1 parent ea3d044 commit 72dc204

4 files changed: 26 additions, 3 deletions

contrib/kubespray/quick-start/services-configuration.yaml.template

Lines changed: 1 addition & 1 deletion
@@ -225,7 +225,7 @@ authentication:
 # smtp-auth-username: alert-sender@example.com
 # smtp-auth-password: password-for-alert-sender
 # cluster-utilization: # cluster-utilization is a k8s CronJob which reports the GPU utilization of the cluster
-# # for schedule syntex, refer to https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-schedule-syntax
+# # for schedule syntax, refer to https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-schedule-syntax
 # schedule: "0 0 * * *" # daily report at UTC 00:00
 # customized-routes:
 # routes:

deployment/quick-start/services-configuration.yaml.template

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ rest-server:
 # smtp-auth-username: alert-sender@example.com
 # smtp-auth-password: password-for-alert-sender
 # cluster-utilization: # cluster-utilization is a k8s CronJob which reports the GPU utilization of the cluster
-# # for schedule syntex, refer to https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-schedule-syntax
+# # for schedule syntax, refer to https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-schedule-syntax
 # schedule: "0 0 * * *" # daily report at UTC 00:00
 # customized-routes:
 # routes:

docs/manual/cluster-admin/how-to-use-alert-system.md

Lines changed: 23 additions & 0 deletions
@@ -248,3 +248,26 @@ Remember to re-build and push the docker image, and restart the `alert-manager`
./paictl.py config push -p /cluster-configuration -m service
./paictl.py service start -n alert-manager
```

## Cluster GPU Utilization Report

We provide the functionality to send cluster GPU utilization report regularly to admin users.

The report includes the statistics for:
- Cluster GPU utilization
- User GPU utilization
- Job GPU utilization

To enable this feature, you should configure the `alert-manager` field in `services-configuration.yml`.
`pai-bearer-token` & `cluster-utilization`->`schedule` are necessary fields for this feature.
For the syntax of `schedule`, please refer to [Cron Schedule Syntax](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-schedule-syntax).
For example, `"0 0 * * *"` means daily report at UTC 00:00.
Please also make sure that the [`email-admin`](#Existing-Actions-and-Matching-Rules) action is enabled.

```yaml
alert-manager:
  pai-bearer-token: 'your-application-token-for-pai-rest-server'
  cluster-utilization: # cluster-utilization is a k8s CronJob which reports the GPU utilization of the cluster
    # for schedule syntax, refer to https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-schedule-syntax
    schedule: "0 0 * * *" # daily report at UTC 00:00
```
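
Reviewer note: the two required fields named in the doc above (`pai-bearer-token` and `cluster-utilization` -> `schedule`) could be sanity-checked before pushing the configuration. The following is only a minimal sketch and not part of this commit or of `paictl.py`. The function name `check_cluster_utilization_config` is hypothetical, the configuration is assumed to be already loaded into a Python dict, and the five-field check is a rough heuristic rather than a full cron parser.

```python
# Hypothetical helper, not part of OpenPAI: checks that the fields the doc
# calls necessary are present in an already-parsed services configuration.

REQUIRED_CRON_FIELDS = 5  # minute, hour, day of month, month, day of week


def check_cluster_utilization_config(services_config):
    """Return a list of problems; an empty list means the fields look present."""
    problems = []
    alert_manager = services_config.get("alert-manager") or {}

    if not alert_manager.get("pai-bearer-token"):
        problems.append("alert-manager.pai-bearer-token is missing")

    schedule = (alert_manager.get("cluster-utilization") or {}).get("schedule")
    if not schedule:
        problems.append("alert-manager.cluster-utilization.schedule is missing")
    else:
        text = str(schedule).strip()
        # Assumption: strings starting with "@" (e.g. "@daily") are macros
        # and are passed through without counting fields.
        if not text.startswith("@") and len(text.split()) != REQUIRED_CRON_FIELDS:
            problems.append("schedule should have 5 space-separated cron fields")

    return problems


# Mirrors the YAML example from the doc, expressed as a dict.
example = {
    "alert-manager": {
        "pai-bearer-token": "your-application-token-for-pai-rest-server",
        "cluster-utilization": {"schedule": "0 0 * * *"},
    }
}
print(check_cluster_utilization_config(example))
```

The check only confirms that the fields exist and that the schedule has a plausible shape. Whether the token is valid and whether the `email-admin` action is enabled still has to be verified against the running cluster.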

examples/cluster-configuration/services-configuration.yaml

Lines changed: 1 addition & 1 deletion
@@ -118,7 +118,7 @@ rest-server:
 # smtp-auth-username: alert-sender@example.com
 # smtp-auth-password: password-for-alert-sender
 # cluster-utilization: # cluster-utilization is a k8s CronJob which reports the GPU utilization of the cluster
-# # for schedule syntex, refer to https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-schedule-syntax
+# # for schedule syntax, refer to https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-schedule-syntax
 # schedule: "0 0 * * *" # daily report at UTC 00:00
 # customized-routes:
 # routes:
