Core: Use zero-copy wrapper for equalityFieldIds#13668
Conversation
171652f to
dafcb3d
Compare
dafcb3d to
a70413b
Compare
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
This is a great find @bvolpato, the improvement makes a lot of sense to me!
|
Could you pls share the code of |
|
|
||
| @Override | ||
| public List<Integer> equalityFieldIds() { | ||
| return ArrayUtil.toIntList(equalityIds); |
There was a problem hiding this comment.
It is worth considering rewriting org.apache.iceberg.util.ArrayUtil#toIntList as well even though it is used at the moment only in tests.
There was a problem hiding this comment.
Just found this and I agree.
|
@bvolpato off topic: what’s the app from the screenshot? |
@findinpath Sorry, it was a bit of throwaway code, but I still had it saved here, pushed to https://github.com/bvolpato/guava-temp-benchmark/blob/main/src/jmh/java/com/benchmark/IntListBenchmark.java
@wendigo (Disclaimer: I work at Datadog) It's from Datadog continuous profiling https://docs.datadoghq.com/profiler/, it has awesome instrumentation/tooling to proactively sample applications with very little overhead, which allows us to just go back in time for a particular time that had CPU/memory pressure and look at the flamegraphs - which was the case here. |
In one of our Trino -> Iceberg use case, which relies on equality deletes, we observed that a lot of time and allocations are being spent in
BaseFile.equalityFieldIds()(~34% of our overall allocs).Checking the implementation, it seems that the current implementation based on streams has to copy and box the data, which is very inefficient.
There is already prior art (#8336) to use Guava 0-copy wrappers for longs / split offsets in the same class, I'm doing that same change.
Drafted a quick JMH to show the difference, and it clearly makes a huge difference: