Avoid creating new RecordBatches to simplify expressions#20534
Conversation
| let col = new_null_array(&DataType::Null, 1); | ||
| Ok(RecordBatch::try_new(dummy_schema, vec![col])?) | ||
| pub(crate) fn create_dummy_batch() -> Result<&'static RecordBatch> { | ||
| static DUMMY_BATCH: std::sync::OnceLock<Result<RecordBatch>> = |
There was a problem hiding this comment.
The idea is to compute the result once and then reuse it each time rather than create a new batch and array each time
|
I don't really expect to be able to measure the improvement here but I'll run benchmarks to check |
|
run benchmark sql_planner |
AdamGS
left a comment
There was a problem hiding this comment.
LGTM :) I think I had this at some point, but I couldn't measure a difference, even though it's obviously strictly less work
|
🤖 |
|
🤖: Benchmark completed Details
|
Yeah, I agree I will be surprised if this shows up in the traces. The first benchmark run seems to show improvements but I suspect it is just noise. I will rerun to see if I can reproduce |
|
run benchmark sql_planner |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
TLDR is this doesn't show up in any specific benchmarksing |
|
@alamb shall we close this then? |
I personally think it is a worthwhile opimtization as it doesn't add much complexity and saves calls to the allocator, etc However, I do agree it doesn't make a measurable difference |
adriangb
left a comment
There was a problem hiding this comment.
Yeah I agree, I'm good with it either way. I've approved so if you think it's worth doing even if it doesn't show up in benches we can merge it. But I suggest we either merge or close 😄
Thank you -- let's merge it! |
## Which issue does this PR close? - part of apache#19795 - follow on to apache#20234 ## Rationale for this change While reviewing apache#20234 from @AdamGS I wondered why we were creating new `RecordBatch`es to simplify expressions. I looked into it a bit and I think we can avoid a bunch of small allocations / deallocations ## What changes are included in this PR? 1. Create the dummy batch once and reuse it ## Are these changes tested? Yes by CI. I will also run benchmarks on it ## Are there any user-facing changes? No this is entirely internal
Which issue does this PR close?
simplify_const_expr#20234Rationale for this change
While reviewing #20234 from @AdamGS I wondered why we were creating new
RecordBatches to simplify expressions. I looked into it a bit and I think we can avoid a bunch of small allocations / deallocationsWhat changes are included in this PR?
Are these changes tested?
Yes by CI. I will also run benchmarks on it
Are there any user-facing changes?
No this is entirely internal