This repository was archived by the owner on Jun 6, 2024. It is now read-only.

Prometheus #4940

Merged: suiguoxin merged 3 commits into microsoft:master from shaiic-pai:prometheus on Sep 29, 2020


shaiic-pai (Contributor) commented Sep 29, 2020:

  • Alert-manager: kill jobs with low GPU utilization and tag abnormal jobs
    • add virtual cluster info to job-exporter
    • configure monitoring rules in Prometheus
    • send action requests through a webhook
    • job-handler: handle webhook requests and redirect them to the RestServer
    • implement a customized SMTP service in alert-handler that sends alert emails to users where possible; switch the email template to EJS
    • document how to customize alerts and actions
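The webhook step above can be illustrated with a minimal Python sketch of how a receiver might pick out which jobs need action. It assumes the standard Alertmanager webhook body, a JSON object whose `alerts` list carries a `status` and `labels` for each alert. It also assumes a hypothetical `job_name` label attached by the monitoring rules. The PR does not state which labels, alert names, or receiver names it actually uses, and the hand-off to the RestServer is deliberately left out.

```python
from typing import Any

# Hypothetical label carrying the job name; the PR's actual rule labels are not stated.
JOB_LABEL = "job_name"


def jobs_to_act_on(payload: dict[str, Any]) -> list[str]:
    """Return unique job names from the firing alerts in an Alertmanager webhook payload."""
    names: list[str] = []
    for alert in payload.get("alerts", []):
        # In this sketch, resolved alerts are ignored; only firing ones are treated as needing action.
        if alert.get("status") != "firing":
            continue
        name = alert.get("labels", {}).get(JOB_LABEL)
        # Skip alerts without the label and collapse duplicates while preserving arrival order.
        if name and name not in names:
            names.append(name)
    return names


# Hypothetical sample payload shaped like an Alertmanager webhook body (alert and receiver names are made up).
SAMPLE_PAYLOAD: dict[str, Any] = {
    "version": "4",
    "status": "firing",
    "receiver": "job-handler-webhook",
    "alerts": [
        {"status": "firing", "labels": {"alertname": "GpuUtilizationLow", JOB_LABEL: "job-a"}},
        {"status": "resolved", "labels": {"alertname": "GpuUtilizationLow", JOB_LABEL: "job-b"}},
        {"status": "firing", "labels": {"alertname": "GpuUtilizationLow", JOB_LABEL: "job-a"}},
        {"status": "firing", "labels": {"alertname": "NodeDown"}},
    ],
}

print(jobs_to_act_on(SAMPLE_PAYLOAD))  # prints ['job-a']
```

Collecting names in a list rather than a set keeps the order in which alerts arrived, so any follow-up action requests would be issued deterministically.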

coveralls commented:
Coverage remained the same at 34.383% when pulling c573146 on shaiic-pai:prometheus into 9755553 on microsoft:master.

@suiguoxin suiguoxin merged commit cf4e6a8 into microsoft:master Sep 29, 2020
@shaiic-pai shaiic-pai deleted the prometheus branch September 29, 2020 08:53


4 participants