DevX: Track the benchmark infra health and usage #8247
Open
Labels
enhancement, module: benchmark, module: user experience, triaged
Today I monitor the benchmark infra health only through the HUD, by filtering for jobs whose names contain "-perf": https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=-perf&mergeLF=true
I'm wondering whether there is a better way to monitor the health, with more detailed metrics. It could be something like https://hud.pytorch.org/metrics, where I could see the historical run count and success rate of the benchmark jobs, split by nightly vs. on-demand runs, along with the most frequent failures, hotspot devices, etc.
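To make the ask a bit more concrete, here is a rough sketch of the kind of aggregation I have in mind. Everything in it is an assumption for illustration: the `JobRun` record shape, the `trigger` values, and the job and device names are hypothetical and do not reflect the actual HUD schema or data source.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    # Hypothetical record shape; the real HUD data may look different.
    name: str        # job name, e.g. something containing "-perf"
    trigger: str     # assumed values: "nightly" or "on-demand"
    device: str      # assumed device identifier
    succeeded: bool


def is_benchmark(run):
    # Same filter used on the HUD today: job names containing "-perf".
    return "-perf" in run.name


def success_rate_by_trigger(runs):
    """Return {trigger: (total_runs, success_rate)} for benchmark jobs."""
    totals = Counter()
    passes = Counter()
    for run in runs:
        if not is_benchmark(run):
            continue
        totals[run.trigger] += 1
        passes[run.trigger] += int(run.succeeded)
    return {t: (totals[t], passes[t] / totals[t]) for t in totals}


def hotspot_devices(runs, top_n=3):
    """Return up to top_n (device, failure_count) pairs, most failures first."""
    failures = Counter(
        run.device for run in runs if is_benchmark(run) and not run.succeeded
    )
    return failures.most_common(top_n)


# Made-up sample data, only to show the intended output shape.
sample = [
    JobRun("android-perf", "nightly", "device-a", True),
    JobRun("android-perf", "nightly", "device-b", False),
    JobRun("apple-perf", "on-demand", "device-b", False),
    JobRun("apple-perf", "on-demand", "device-c", True),
    JobRun("unittest", "nightly", "device-a", False),  # not a benchmark job
]

rates = success_rate_by_trigger(sample)
hotspots = hotspot_devices(sample)
```

With the made-up sample above, `rates` gives a run count and success rate per trigger type, and `hotspots` lists the devices with the most failed benchmark runs. A real dashboard would presumably also bucket these by day to show the historical trend.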
cc: @kimishpatel @digantdesai
cc @huydhn @kirklandsign @shoumikhin @mergennachin @byjlw