Skip to content

profiles: improve JFR export example#8349

Open
jhalliday wants to merge 2 commits into
open-telemetry:mainfrom
jhalliday:jh-profiling-t
Open

profiles: improve JFR export example#8349
jhalliday wants to merge 2 commits into
open-telemetry:mainfrom
jhalliday:jh-profiling-t

Conversation

@jhalliday

Copy link
Copy Markdown
Contributor

Add metadata to the OTLP message so as to make it more interpretable by receiving backends.

@jhalliday jhalliday requested a review from a team as a code owner April 30, 2026 13:18
@codecov

codecov Bot commented Apr 30, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.76%. Comparing base (824334c) to head (5e09b77).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #8349      +/-   ##
============================================
- Coverage     78.77%   78.76%   -0.02%     
  Complexity     8579     8579              
============================================
  Files          1009     1009              
  Lines         28993    28993              
  Branches       3599     3599              
============================================
- Hits          22839    22836       -3     
- Misses         5311     5312       +1     
- Partials        843      845       +2     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jhalliday

Copy link
Copy Markdown
Contributor Author

@open-telemetry/profiling-maintainers This PR is somewhat interesting, as one of the early examples of an end-to-end interop where the sender and receiver are written by different people working more or less in isolation from one another. It exposes some rough edges that may represent opportunities/requirements for spec changes to make it a smoother process.

  • Some metadata is required by the backend, but not by the spec. On the one had the backend (devfiler) is behaving reasonably - it can't interpret data and render visualizations without it. On the other hand, unless I missed something it's not specified or required by the spec at present. "thread.name", for example. Opportunity to add 'SHOULD'-level recommendations to the semconv?
  • The 'null' dict_table[0] element is defined as a wire-level empty value to minimize wire bytes, but in the case of Link the trace spec already defined that a) the span/trace are of fixed, non-zero length and b) specifically the invalid value of each is (in string form) "0000...". These requirements are contradictory. I think the profiling wire spec should go with the trace spec definition, despite the extra bytes. In practice it will compress anyhow and the hassle of retrofitting trace codecs to deal with zero length strings does not appeal. Here for example the existing Java SDK will throw if "" is used and it's hard to call that wrong, so code duplication or config flags are the only available routes to evolve it.

@zeitlinger zeitlinger left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice metadata additions. Two suggestions:

Hoist hot-loop dictionary lookups. In JfrExecutionSampleEventConverter.accept() the "thread.name" key index and KeyValueAndUnitData are rebuilt for every sample event. Same in JfrLocationDataCompositor.frameToLocation() for "profile.frame.type"/"jvm" per frame. The dict dedupes so output is correct, but each call still allocates a string + KeyValueAndUnitData. Since the key/value pair is constant per converter, compute it once (e.g. in the constructor or lazily cached) and reuse the int index.

For the thread sample, only threadName/threadNameData vary — pre-compute the "thread.name" key index once.

Null sampledThread. recordedEvent.getValue("sampledThread") can be null for some ExecutionSample variants. A null guard (skip or fall back to "unknown") would harden the converter against truncated/synthetic events.

LGTM otherwise — the ValueTypeData fix and frame-type attribute look right.

@jhalliday

Copy link
Copy Markdown
Contributor Author

Hi Gregor

Thanks for taking a look.

Hoist hot-loop dictionary lookups.

Right. There are two subtly different cases here, where one KV is entirely constant and the other is dependent on the event's thread value. That's partly an artifact of an over-simplification I made to assume all frames are "jvm" type. If native code is involved then the value there also becomes event-dependent.

Nevertheless there is still a performance argument for caching, since the number of thread names / thread types is considerably smaller than the number of events, but it's a classic space/time tradeoff to add a HashMap for these and at this early stage I'm lacking data to support it. The thread name case in particularly is concerning, as it's an unbounded key space and thus unbounded cache size. I'm going with 'the cost of churning a short lived key object is tolerable', especially since the frame's nameFrom computation is a worse example of the same issue and will likely dominate either of the others.

Null sampledThread. recordedEvent.getValue("sampledThread") can be null for some ExecutionSample variants.

Can it? The asserts in OpenJDK's jfrThreadSampling.cpp seemed to indicate it's always set, but ok, no real downside to hedging anyhow. I think the main takeaway here is the testing thus far is just a handful of old JFR files I had lying around and they don't contain some obvious alternative cases - the frame name code will break on non-java frames I think, but there aren't any in the test set... The JFR event APIs make it next to impossible to mock JFR data cleanly, which is a colossal pain for testing. OpenJDK itself seems to do it by having a curated collection of (hand crafted?) JFR files in version control instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants