Add optional energy efficiency reporting schema for inference benchmarks by hongping-zh · Pull Request #2587 · mlcommons/inference

hongping-zh · 2026-05-14T08:36:28Z

Summary

This PR proposes an optional energy-efficiency reporting schema for MLPerf Inference results.

It adds a standalone schema package under energy-reporting/ and does not modify existing benchmark logic, submission flow, or current compliance requirements.

Motivation

During multi-round technical discussion in Issue #2558, several design directions converged:

- Task-appropriate normalization:
  - energy_per_token_joules for LLM workloads
  - energy_per_query_joules for CV workloads
- Prefill vs. generation energy separation for LLM workloads
- Static vs. active energy separation
- Architecture-agnostic metric design
- Multiple measurement backends (nvml, dcgm, rocm_smi, rapl, external_analyzer)

This PR translates those discussion outcomes into a concrete, reviewable schema artifact.
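
As a rough illustration of the normalization directions listed above, the sketch below derives per-token and per-query figures from a cumulative energy total. Only `energy_per_token_joules` and `energy_per_query_joules` are names from this PR. The function signatures, the idle-power baseline used for the static/active split, and the sample numbers are hypothetical assumptions, not the schema's definitions.

```python
def energy_per_token_joules(total_energy_joules: float, tokens: int) -> float:
    """LLM normalization: cumulative energy divided by the token count."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return total_energy_joules / tokens


def energy_per_query_joules(total_energy_joules: float, queries: int) -> float:
    """CV normalization: cumulative energy divided by the query count."""
    if queries <= 0:
        raise ValueError("queries must be positive")
    return total_energy_joules / queries


def split_static_active(total_energy_joules: float,
                        idle_power_watts: float,
                        duration_seconds: float) -> tuple[float, float]:
    """Assumed split: static = idle power x window duration, active = remainder."""
    static = idle_power_watts * duration_seconds
    if static > total_energy_joules:
        raise ValueError("idle baseline exceeds the measured total energy")
    return static, total_energy_joules - static


if __name__ == "__main__":
    # Made-up numbers: 6000 J measured over a 60 s window with a 40 W idle baseline.
    print(split_static_active(6000.0, 40.0, 60.0))   # (2400.0, 3600.0)
    print(energy_per_token_joules(6000.0, 12000))    # 0.5
    print(energy_per_query_joules(6000.0, 300))      # 20.0
```

Whether the per-unit metrics should be computed from total or from active energy is a design question for the schema; the sketch uses the total only for simplicity.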

Scope of this PR (intentionally minimal)

This PR includes only:

- energy-reporting/mlperf_energy_schema_v6.json (JSON Schema, draft 2020-12)
- energy-reporting/README.md (field definitions, examples, validation-rule summary)

This PR does not include:

- toolkit/runtime measurement code
- reference result uploads
- modifications to existing submission checker behavior
- changes to current required fields

Compatibility / Impact

Backward compatible: Yes
Breaking change: No
Existing submitters affected: No (all fields are optional)
Compliance behavior changed: No (RFC proposal stage)

Validation

Schema and examples were validated locally:

JSON syntax/schema validity: ✅
LLM single-accelerator valid sample: ✅
CV multi-accelerator valid sample: ✅
LLM sample missing conditional required fields: ✅ correctly rejected

(Validation logs can be provided if reviewers request them.)
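
For reviewers who want a feel for the "missing conditional required fields" case, here is a minimal sketch in plain Python. It does not reproduce the rules in mlperf_energy_schema_v6.json: the `task_type` key, its values, and the rule table are hypothetical, and only the two metric names come from this PR. In JSON Schema draft 2020-12 itself, this kind of rule is typically expressed with the `if`/`then` keywords.

```python
# Hypothetical rule table: which metric must be present for each task type.
CONDITIONAL_REQUIRED = {
    "llm": ["energy_per_token_joules"],
    "cv": ["energy_per_query_joules"],
}


def missing_conditional_fields(record: dict) -> list[str]:
    """Return the conditionally required fields that are absent from `record`."""
    task_type = record.get("task_type")
    required = CONDITIONAL_REQUIRED.get(task_type, [])
    return [name for name in required if name not in record]


if __name__ == "__main__":
    valid_llm = {"task_type": "llm", "energy_per_token_joules": 0.5}
    invalid_llm = {"task_type": "llm"}
    print(missing_conditional_fields(valid_llm))    # []
    print(missing_conditional_fields(invalid_llm))  # ['energy_per_token_joules']
```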

Request for Comments (RFC)

This PR is submitted as an RFC to collect Working Group feedback on field design and integration direction before any broader implementation steps.

Feedback is especially welcome on:

field naming and granularity
conditional requirements by task type
whether variability fields (e.g., std) should become mandatory in future revisions

cc @JiwaniZakir @arav-agarwal2

References

MLPerf Inference issue (primary thread): [Discussion] Adding energy consumption metrics to MLPerf Inference Benchmark #2558
Discussion context (specific comment): [Discussion] Adding energy consumption metrics to MLPerf Inference Benchmark #2558 (comment)
Zenodo dataset (360+ configurations): https://doi.org/10.5281/zenodo.19647290 (DOI: 10.5281/zenodo.19647290)
Paper / release artifact: https://github.com/hongping-zh

github-actions · 2026-05-14T08:36:43Z

MLCommons CLA bot:
Thank you very much for your submission; we really appreciate it. Before we can accept your contribution,
we ask that you sign the MLCommons CLA (Apache 2). Please submit your GitHub ID to our onboarding form to initiate
authorization. If you are from a MLCommons member organization, we will request that you be added to the CLA.
If you are not from a member organization, we will email you a CLA to sign. For any questions, please contact
support@mlcommons.org.
0 out of 1 committers have signed the MLCommons CLA.
❌ @hongping
hongping seems not to be a GitHub user. You need a GitHub account after you become MLCommons member. If you have already a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request

hongping-zh · 2026-05-14T08:43:09Z

recheck

hongping-zh · 2026-05-14T09:05:57Z

recheck

hongping-zh · 2026-05-15T03:53:48Z

Quick update: I have contacted support@mlcommons.org to resolve the CLA mapping for the GitHub account "hongping-zh". I am waiting for a refresh on the support side and will run recheck as soon as it completes.

hongping-zh · 2026-05-26T08:30:12Z

recheck

dslik · 2026-05-26T14:11:59Z

Has there been a discussion in the inference WG on whether measuring only the accelerator power consumption is a useful (and non-misleading) reporting metric? High-performance inference requires coordination, processing, and data movement tasks to be performed on the CPUs, and system DRAM and network usage also consume significant power. I can see how this data would be valuable to augment entire-system power measurements, but I have concerns about it being presented on its own.

Also, it is important to ensure that measurements capture cumulative power draw (energy integrated over the run) rather than instantaneous power draw, since the latter can easily produce misleading results. Careful rules (and verified implementations) are needed to prevent power measurements from being gamed.

hongping-zh · 2026-05-27T01:04:40Z

Thank you, David. This is an important concern, and I agree.

My intent is not for accelerator-only measurements to replace whole-system power or energy measurements. For high-performance inference, CPU coordination, host DRAM, networking, storage, and data movement can all be significant, and accelerator-only numbers would be misleading if presented as total system energy efficiency.

A better framing for this PR is therefore as an optional accelerator-level energy breakdown / supplementary reporting schema. The intended use is to augment whole-system measurements where available, and to provide attribution/debugging information about the accelerator-side behavior of a run, rather than to define a standalone system-level efficiency metric.

I also agree on cumulative energy. The schema should define fields such as total_energy_joules, active_energy_joules, and energy_per_token_joules as integrated energy over the benchmark measurement window, not as instantaneous power snapshots. Instantaneous power samples should only be intermediate samples used for integration, with the measurement window, sampling rate, and integration method documented.
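
To make the "integrated energy" wording concrete, here is a minimal sketch of one possible integration method, the trapezoidal rule over timestamped power samples. It is an illustration under stated assumptions, not the method the schema mandates: the function name, the `(timestamp_s, power_w)` sample layout, and the choice to drop samples outside the window without edge interpolation are all hypothetical.

```python
def integrate_energy_joules(samples, window_start, window_end):
    """Trapezoidal integration of (timestamp_s, power_w) samples within a window.

    Assumes samples are sorted by timestamp. Samples outside the window are
    dropped; no interpolation is done at the window edges.
    """
    inside = [(t, p) for t, p in samples if window_start <= t <= window_end]
    if len(inside) < 2:
        raise ValueError("need at least two samples inside the window")
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(inside, inside[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)  # area of one trapezoid, in joules
    return energy


if __name__ == "__main__":
    # Made-up samples: power ramps up, plateaus, then ramps down over 3 s.
    samples = [(0.0, 100.0), (1.0, 200.0), (2.0, 200.0), (3.0, 100.0)]
    print(integrate_energy_joules(samples, 0.0, 3.0))  # 500.0
    # A single peak snapshot times the duration would overstate this run:
    print(200.0 * 3.0)                                 # 600.0
```

The gap between the two printed values is the kind of discrepancy that snapshot-based reporting can introduce, in either direction depending on when the snapshot is taken.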

I can update the README/schema wording to make this explicit, for example:

accelerator-only measurements must not be interpreted as whole-system energy efficiency;
whole-system power/energy should be reported where available;
reported energy fields are cumulative/integrated over the benchmark window;
validation rules should check measurement-window consistency and discourage easily gameable reporting.
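
A measurement-window consistency check along these lines could look roughly like the sketch below. `total_energy_joules` and `active_energy_joules` are the field names mentioned above; `window_start_s`, `window_end_s`, and the `max_power_watts` ceiling are hypothetical names introduced only for this example, and the specific checks are assumptions rather than proposed rules.

```python
def check_window_consistency(record: dict, max_power_watts: float) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    duration = record["window_end_s"] - record["window_start_s"]
    if duration <= 0:
        problems.append("measurement window has non-positive duration")
        return problems  # the remaining checks need a valid duration

    total = record["total_energy_joules"]
    active = record["active_energy_joules"]
    if active > total:
        problems.append("active_energy_joules exceeds total_energy_joules")
    # Energy cannot exceed the assumed power ceiling sustained for the whole window.
    if total > max_power_watts * duration:
        problems.append("total_energy_joules implies average power above the ceiling")
    return problems


if __name__ == "__main__":
    record = {
        "window_start_s": 0.0,
        "window_end_s": 60.0,
        "total_energy_joules": 6000.0,
        "active_energy_joules": 3600.0,
    }
    print(check_window_consistency(record, max_power_watts=300.0))  # []
```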

Would this framing address your concern, or would you prefer that the fields be renamed more explicitly as accelerator-level fields to avoid ambiguity?

Add optional energy efficiency reporting schema for inference benchmarks

e87dc60

hongping-zh requested a review from a team as a code owner May 14, 2026 08:36

hongping-zh wants to merge 1 commit into mlcommons:master from hongping-zh:energy-reporting-schema-rfc
