Core: Use zero-copy wrapper for equalityFieldIds by bvolpato · Pull Request #13668 · apache/iceberg

bvolpato · 2025-07-25T01:03:57Z

In one of our Trino -> Iceberg use cases, which relies on equality deletes, we observed that a lot of time and allocations were being spent in BaseFile.equalityFieldIds() (~34% of our overall allocations).

Looking at the implementation, the current stream-based version has to copy and box every element into a new list, which is very inefficient.

There is already prior art (#8336) in the same class that uses Guava zero-copy wrappers for longs (split offsets); I'm applying the same change here.
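To make the copy-versus-view distinction concrete, here is a minimal, stdlib-only Java sketch. It is not the code from this PR: `IntListViews`, `streamCopy`, `view` and `IntArrayView` are hypothetical names, the stream pipeline is an assumed stand-in for the previous implementation, and `IntArrayView` is a hand-rolled approximation of what a Guava wrapper such as `Ints.asList` provides (a fixed-size list backed by the original array).

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;
import java.util.RandomAccess;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration; not part of Iceberg.
class IntListViews {

  // Copying approach (assumed to resemble the stream-based version): every int is
  // boxed and stored in a freshly allocated list, so work and allocation grow
  // with the array length.
  static List<Integer> streamCopy(int[] ids) {
    return Arrays.stream(ids).boxed().collect(Collectors.toList());
  }

  // Zero-copy approach: wraps the existing array without copying it. Creating
  // the view allocates one small object regardless of the array length.
  static List<Integer> view(int[] ids) {
    return new IntArrayView(ids);
  }

  // Hand-rolled stand-in for a Guava-style primitive list wrapper. get() boxes
  // only the element being read; set/add/remove are inherited from AbstractList
  // and throw UnsupportedOperationException, so this view is read-only.
  private static final class IntArrayView extends AbstractList<Integer>
      implements RandomAccess {
    private final int[] array;

    IntArrayView(int[] array) {
      this.array = array;
    }

    @Override
    public Integer get(int index) {
      return array[index]; // the array access performs the bounds check
    }

    @Override
    public int size() {
      return array.length;
    }
  }

  public static void main(String[] args) {
    int[] ids = {1, 2, 3};
    List<Integer> copy = streamCopy(ids);
    List<Integer> live = view(ids);

    ids[0] = 42; // mutate the backing array after both lists were created
    System.out.println(copy); // [1, 2, 3]  (independent copy)
    System.out.println(live); // [42, 2, 3] (view reads through to the array)
  }
}
```

One design point: Guava's `Ints.asList` supports `set`, which writes through to the backing array, whereas this sketch rejects all mutation, roughly what wrapping the Guava list in `Collections.unmodifiableList` would give. Either way the view shares the array, so callers observe later changes to it, unlike with the stream copy.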

I drafted a quick JMH benchmark to compare the two, and the difference is huge:

Benchmark                              (arraySize)  Mode  Cnt        Score         Error  Units
IntListBenchmark.guavaImplementation            10    ss   10     1266.800 ±     199.548  ns/op
IntListBenchmark.guavaImplementation          1000    ss   10     1387.200 ±     531.909  ns/op
IntListBenchmark.guavaImplementation        100000    ss   10     1262.500 ±     188.043  ns/op
IntListBenchmark.guavaImplementation       1000000    ss   10     1600.000 ±    1509.810  ns/op
IntListBenchmark.streamImplementation           10    ss   10     8883.500 ±    1858.339  ns/op
IntListBenchmark.streamImplementation         1000    ss   10    46229.000 ±   25265.952  ns/op
IntListBenchmark.streamImplementation       100000    ss   10   736904.100 ±  299912.894  ns/op
IntListBenchmark.streamImplementation      1000000    ss   10  7321966.800 ± 7146920.166  ns/op
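A back-of-the-envelope reading of the scores above (simple arithmetic on the table, not additional measurements): the stream implementation scales roughly linearly with `arraySize`, at about 7 ns per element for the two largest sizes, while the wrapper stays at roughly 1.3 to 1.6 µs regardless of size.

```math
\frac{7\,321\,966.8\ \text{ns}}{10^{6}\ \text{elements}} \approx 7.3\ \tfrac{\text{ns}}{\text{element}},
\qquad
\frac{736\,904.1\ \text{ns}}{10^{5}\ \text{elements}} \approx 7.4\ \tfrac{\text{ns}}{\text{element}},
\qquad
\frac{7\,321\,966.8\ \text{ns}}{1\,600.0\ \text{ns}} \approx 4.6 \times 10^{3}
```

The error columns are wide (for example ±7,146,920.166 ns at 1,000,000 elements), which likely reflects JMH's single-shot (`ss`) mode timing individual cold invocations, so these ratios are indicative rather than precise.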

amogh-jahagirdar

This is a great find @bvolpato, the improvement makes a lot of sense to me!

amogh-jahagirdar · 2025-07-25T16:19:38Z

Thanks for the PR @bvolpato, and thank you @mrcnc and @pvary for reviewing. I'll go ahead and merge.

findinpath · 2025-07-30T05:33:51Z

Could you pls share the code of IntListBenchmark?

findinpath · 2025-07-30T05:38:45Z


  @Override
  public List<Integer> equalityFieldIds() {
-    return ArrayUtil.toIntList(equalityIds);


It is worth considering rewriting org.apache.iceberg.util.ArrayUtil#toIntList as well, even though it is currently only used in tests.

rdblue · Apr 23, 2026

Just found this and I agree.
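For the `ArrayUtil#toIntList` suggestion, one option, if callers such as the tests rely on receiving an independent, mutable copy, is to keep the copy but drop the stream pipeline. This is a sketch under assumptions, not Iceberg's code: `ToIntListSketch` and `toIntListCopy` are hypothetical names, and the null pass-through is an assumed behavior. It still boxes every element, so it only trims the stream overhead and any regrowth of a default-sized list; callers that only read the IDs would benefit more from a zero-copy view like the one `equalityFieldIds()` now uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Iceberg's ArrayUtil.
class ToIntListSketch {

  // Returns a new, mutable list holding the same values as `ints`.
  // Assumption: null input yields null, so existing null checks keep working.
  static List<Integer> toIntListCopy(int[] ints) {
    if (ints == null) {
      return null;
    }
    // Pre-size to the exact length so the backing array never has to grow.
    List<Integer> result = new ArrayList<>(ints.length);
    for (int value : ints) {
      result.add(value); // autoboxing still happens here, once per element
    }
    return result;
  }

  public static void main(String[] args) {
    int[] ids = {3, 1, 2};
    List<Integer> copy = toIntListCopy(ids);
    ids[0] = 99; // does not affect the copy
    copy.add(4); // the copy is independently mutable
    System.out.println(copy);                // [3, 1, 2, 4]
    System.out.println(toIntListCopy(null)); // null
  }
}
```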

wendigo · 2025-07-30T05:46:10Z

@bvolpato off topic: what’s the app from the screenshot?

bvolpato · 2025-07-30T18:11:40Z

> Could you pls share the code of IntListBenchmark?

@findinpath Sorry, it was throwaway code, but I still had it saved, so I pushed it to https://github.com/bvolpato/guava-temp-benchmark/blob/main/src/jmh/java/com/benchmark/IntListBenchmark.java

> @bvolpato off topic: what’s the app from the screenshot?

@wendigo (Disclaimer: I work at Datadog.) It's Datadog Continuous Profiler (https://docs.datadoghq.com/profiler/). It has great instrumentation and tooling to proactively sample applications with very little overhead, so we can go back in time to a specific window with CPU or memory pressure and look at the flame graphs. That is exactly what happened here.

github-actions Bot added the core label Jul 25, 2025

bvolpato force-pushed the equalityfieldids-zerocopy branch from 171652f to dafcb3d July 25, 2025 01:09

Core: Use zero-copy wrapper for equalityFieldIds (a70413b)

bvolpato force-pushed the equalityfieldids-zerocopy branch from dafcb3d to a70413b July 25, 2025 01:10

pvary approved these changes Jul 25, 2025

mrcnc approved these changes Jul 25, 2025

amogh-jahagirdar approved these changes Jul 25, 2025

amogh-jahagirdar merged commit fb81fcf into apache:main Jul 25, 2025
42 checks passed

findinpath reviewed Jul 30, 2025

amogh-jahagirdar merged 1 commit into apache:main from bvolpato:equalityfieldids-zerocopy

rdblue Apr 23, 2026

7 participants
