From 778a2180c8fd506dc80d7ceaffd98da8cd282cd6 Mon Sep 17 00:00:00 2001
From: Praveen Kumar Shanmugam <58961022+spraveenio@users.noreply.github.com>
Date: Sat, 11 Apr 2026 17:06:51 -0700
Subject: [PATCH] metricsclient cli change (#1293)

* metricsclient cli change

* add test/e2e dependency to e2e sim

* increase timeout

(cherry picked from commit 5b1e46ae25062860ef1dc2e1ec9facdc814bde4d)
---
 docs/metrics/ecc-error-injection.md | 6 +++---
 tests/e2e/Makefile                  | 2 +-
 tests/e2e/utils/utils.go            | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/metrics/ecc-error-injection.md b/docs/metrics/ecc-error-injection.md
index f3f179260..f35709892 100644
--- a/docs/metrics/ecc-error-injection.md
+++ b/docs/metrics/ecc-error-injection.md
@@ -56,7 +56,7 @@ ID Health Associated Workload

 ### 4. Inject ECC Errors on GPU 0

-In order to simulate errors on a GPU we will be using a json file that specifies a GPU ID along with counters for several ECC Uncorrectable error fields that are being monitored by the Device Metrics Exporter. In the below example you can see that we are specifying `GPU 0` and injecting 1 `GPU_ECC_UNCORRECT_SEM` error and 2 `GPU_ECC_UNCORRECT_FUSE` errors. We use the `metricslient -ecc-file-path ` command to specify the json file we want to inject into the metrics table. To create the json file and execute the metricsclient command all in in one go run the following:
+In order to simulate errors on a GPU we will be using a json file that specifies a GPU ID along with counters for several ECC Uncorrectable error fields that are being monitored by the Device Metrics Exporter. In the below example you can see that we are specifying `GPU 0` and injecting 1 `GPU_ECC_UNCORRECT_SEM` error and 2 `GPU_ECC_UNCORRECT_FUSE` errors. We use the `metricsclient --ecc-file-path ` command to specify the json file we want to inject into the metrics table. To create the json file and execute the metricsclient command all in in one go run the following:

 ```bash
 kubectl exec -n kube-amd-gpu $METRICS_POD -c metrics-exporter-container -- sh -c 'cat > /tmp/ecc.json <