Fix ice-disk table scans #491
@adsharma could you PTAL? Re: duplicate boundNodes in unordered_map

dataset PR: LadybugDB/dataset#3

@adsharma should we add a get_icebug_disk_supported_version CALL?

We already have |
}

// Load shared indptr data - thread-safe to read
if (!indptrFilePath.empty()) {

Was this guard significant?

The indptr and indices paths are validated during the table creation phase.

// calc current global row index based on assigned row group and local row index within that
// group
auto metadata = iceDiskScanState.parquetReader->getMetadata();
offset_t startOffset = 0;

Is startOffset constant for a given nodeGroupIdx?

startOffset for a nodeGroupIdx (row group) is calculated just below. We can avoid this repeated calculation by populating startOffsets in initGlobalStateInternal. I will add it in a refactor post-release; keeping changes minimal for now.
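
A rough sketch of the precomputation mentioned above, assuming the per-row-group row counts can be read once from the parquet metadata. `computeStartOffsets` and `rowGroupNumRows` are hypothetical names used only for illustration, and `offset_t` here is a local stand-in for the engine's type, not the PR's actual code:

```cpp
#include <cstdint>
#include <vector>

using offset_t = uint64_t; // stand-in for the engine's offset_t (assumption)

// Hypothetical helper: rowGroupNumRows[i] is the number of rows in row group i.
// Returns startOffsets where startOffsets[i] is the global row index of the
// first row in row group i (an exclusive prefix sum of the row counts).
std::vector<offset_t> computeStartOffsets(const std::vector<uint64_t>& rowGroupNumRows) {
    std::vector<offset_t> startOffsets;
    startOffsets.reserve(rowGroupNumRows.size());
    offset_t running = 0;
    for (auto numRows : rowGroupNumRows) {
        startOffsets.push_back(running); // rows in earlier groups come first
        running += numRows;
    }
    return startOffsets;
}
```

With the vector built once during global state initialization, a scan could look up `startOffsets[nodeGroupIdx]` instead of re-summing the row counts of the preceding groups on every call.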

// Create DataChunk matching the indices parquet file schema
auto numIndicesColumns = indicesReader->getNumColumns();
cachedBatchData = std::make_unique<DataChunk>(numIndicesColumns);

Can these allocations be done once on reset() and reused?

DataChunk doesn't offer a reset out of the box; all it offers is resetAuxiliaryBuffer. We would need to manually reset the state in DataChunk and the other state objects in the ValueVectors, which requires tinkering with ParquetReader and/or ValueVector. Maybe refactor it later?
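
For reference, the allocate-once-and-reuse pattern being asked about could look roughly like this. This is only a sketch with stand-in types: `ColumnBuffer` and `BatchCache` are hypothetical and are not the engine's DataChunk or ValueVector, which would need their own reset support as noted above:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for a column vector; NOT the engine's ValueVector.
struct ColumnBuffer {
    std::vector<int64_t> values;
    void reset() { values.clear(); } // drops contents, keeps allocated capacity
};

// Stand-in for the cached batch; NOT the engine's DataChunk.
struct BatchCache {
    std::vector<std::shared_ptr<ColumnBuffer>> columns;

    // Allocate column buffers only when the column count changes;
    // otherwise clear the existing buffers and reuse them.
    void prepare(std::size_t numColumns, std::size_t capacity) {
        if (columns.size() != numColumns) {
            columns.clear();
            for (std::size_t i = 0; i < numColumns; ++i) {
                auto col = std::make_shared<ColumnBuffer>();
                col->values.reserve(capacity);
                columns.push_back(std::move(col));
            }
            return;
        }
        for (auto& col : columns) {
            col->reset();
        }
    }
};
```

The second and later calls to `prepare` with the same column count touch no allocator, which is the saving the review comment is after.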
for (uint32_t colIdx = 0; colIdx < numIndicesColumns; ++colIdx) {
const auto& columnTypeRef = indicesReader->getColumnType(colIdx);
auto columnType = columnTypeRef.copy();
auto vector = std::make_shared<ValueVector>(std::move(columnType), memoryManager);

new dataset PR: LadybugDB/dataset#4

Nice improvements! OK to handle the minor unresolved comments separately.
context: #476 (review)