[WIP] Add icebug mem impl by aheev · Pull Request #496 · LadybugDB/ladybug

aheev · 2026-05-18T13:40:29Z

Closes: #329

adsharma · 2026-05-18T23:42:51Z

I realize this is not ready for review. But thanks for sharing it early for feedback!

Two high-level concerns, with details:

It adds a lot of new code. As a maintainer, one of my concerns is minimizing the amount of code we have to maintain.
Ice-mem branding. These things change, and I would keep them in the docs/communication and use more technical names such as arrow-csr in the code.

Recommendations:

Keep ArrowNodeTable; do not add IceMemNodeTable.
Extend ArrowRelTable with a layout enum, e.g. TRIPLES vs CSR.
Put the layout-specific edge cursor behind a small helper, not a new table class.
For TRIPLES, keep current behavior: read from/to, lookup node PKs, emit matches.
For CSR, store indices and indptr Arrow arrays in the Arrow rel table data. For FWD scans, use the bound node offset to directly slice CSR. For BWD scans, either require/provide reverse CSR or explicitly fall back to a full scan.
Extend ArrowTableSupport registration with a tagged rel payload, rather than inventing icebug-memory catalog format.
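
To make the recommendations above concrete, here is a minimal sketch of a layout enum plus a small edge-cursor helper. It is not taken from the ladybug codebase: `RelLayout`, `ScanDirection`, `RelArrays`, and `scanNeighbours` are hypothetical names, plain `std::vector<int64_t>` stands in for the Arrow arrays, and the TRIPLES branch assumes node PKs have already been resolved to offsets. The BWD branch for CSR shows the "explicit full scan" fallback, not a reverse CSR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All names below are hypothetical; they only illustrate the suggested shape.
enum class RelLayout { TRIPLES, CSR };
enum class ScanDirection { FWD, BWD };

// Stand-in for the Arrow arrays held by the rel table data.
struct RelArrays {
    RelLayout layout;
    std::vector<int64_t> from, to;        // TRIPLES: one (from, to) pair per edge
    std::vector<int64_t> indptr, indices; // CSR: dsts of node n are indices[indptr[n], indptr[n+1])
};

// Layout-specific edge cursor: returns the neighbours of boundNode in the given direction.
inline std::vector<int64_t> scanNeighbours(const RelArrays& rel, int64_t boundNode,
    ScanDirection dir) {
    std::vector<int64_t> out;
    if (rel.layout == RelLayout::TRIPLES) {
        // Scan every triple and emit the opposite endpoint of each match.
        const auto& bound = dir == ScanDirection::FWD ? rel.from : rel.to;
        const auto& nbr = dir == ScanDirection::FWD ? rel.to : rel.from;
        for (std::size_t i = 0; i < bound.size(); ++i) {
            if (bound[i] == boundNode) {
                out.push_back(nbr[i]);
            }
        }
        return out;
    }
    const auto numNodes = static_cast<int64_t>(rel.indptr.size()) - 1;
    if (dir == ScanDirection::FWD) {
        // Direct slice: no scan over unrelated edges.
        if (boundNode < 0 || boundNode >= numNodes) {
            return out;
        }
        const auto begin = rel.indptr[static_cast<std::size_t>(boundNode)];
        const auto end = rel.indptr[static_cast<std::size_t>(boundNode) + 1];
        for (int64_t j = begin; j < end; ++j) {
            out.push_back(rel.indices[static_cast<std::size_t>(j)]);
        }
        return out;
    }
    // BWD without a reverse CSR: explicit fallback to a full scan over all edges.
    for (int64_t src = 0; src < numNodes; ++src) {
        const auto begin = rel.indptr[static_cast<std::size_t>(src)];
        const auto end = rel.indptr[static_cast<std::size_t>(src) + 1];
        for (int64_t j = begin; j < end; ++j) {
            if (rel.indices[static_cast<std::size_t>(j)] == boundNode) {
                out.push_back(src);
            }
        }
    }
    return out;
}
```

With this shape the table class only stores a `RelLayout` and delegates to the helper, so supporting another layout would mean another branch (or another helper) instead of another table class.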

aheev · 2026-05-19T04:41:53Z

> Ice-mem branding. These things change, and I would keep them in the docs/communication and use more technical names such as arrow-csr in the code.

Brand names tend to change far less frequently than technical details, wouldn’t you agree? Perhaps only on rare occasions. Also, if we named it arrow-csr, it might give the impression that we are confining the ice-mem format to both Arrow and CSR.

> Recommendations:
>
> Keep ArrowNodeTable; do not add IceMemNodeTable.
> Extend ArrowRelTable with a layout enum, e.g. TRIPLES vs CSR.
> Put the layout-specific edge cursor behind a small helper, not a new table class.
> For TRIPLES, keep current behavior: read from/to, lookup node PKs, emit matches.
> For CSR, store indices and indptr Arrow arrays in the Arrow rel table data. For FWD scans, use the bound node offset to directly slice CSR. For BWD scans, either require/provide reverse CSR or explicitly fall back to a full scan.

There is already a significant amount of redundant state and data in both the table and scan state classes because they inherit from the native classes. For instance, in setToTable, much of the code is not strictly necessary but is retained to stay compliant with the native class structure. Consequently, any change to the native classes can propagate to the external storage classes as well. I ran into these issues while working on the icebug-disk implementation:

void ArrowRelTableScanState::setToTable(const transaction::Transaction* transaction, Table* table_,
    std::vector<column_id_t> columnIDs_, std::vector<ColumnPredicateSet> columnPredicateSets_,
    RelDataDirection direction_) {
    // Same behavior as IceDiskRelTable: no local table for external data sources.
    TableScanState::setToTable(transaction, table_, std::move(columnIDs_),
        std::move(columnPredicateSets_));
    columns.resize(columnIDs.size());
    direction = direction_;
    for (size_t i = 0; i < columnIDs.size(); ++i) {
        auto columnID = columnIDs[i];
        if (columnID == INVALID_COLUMN_ID || columnID == ROW_IDX_COLUMN_ID) {
            columns[i] = nullptr;
        } else {
            columns[i] = table->cast<RelTable>().getColumn(columnID, direction);
        }
    }
    csrOffsetColumn = table->cast<RelTable>().getCSROffsetColumn(direction);
    csrLengthColumn = table->cast<RelTable>().getCSRLengthColumn(direction);
    nodeGroupIdx = INVALID_NODE_GROUP_IDX;
}

Even with the ColumnarTableBase abstraction and most of the common utilities living in ArrowUtils, the Arrow classes remain large and complex. Introducing new data and state from icebug-memory only adds to this complexity: more redundant compliance code, more conditional checks, and more risk.

> Extend ArrowTableSupport registration with a tagged rel payload, rather than inventing icebug-memory catalog format.

didn't get this part

adsharma · 2026-05-19T05:36:44Z

> Extend ArrowTableSupport registration with a tagged rel payload, rather than inventing icebug-memory catalog format.

The tag in this context is the layout, TRIPLES or CSR (feel free to rename). The status right now:

Layout    Parquet   Arrow
Triples   ❌        ✅
CSR       ✅        ✅

In the future there may be a legitimate use case for running Cypher over a triple table stored in Parquet (we already support this for DuckDB/SQLite/Postgres).

I'm thinking about how to implement this matrix with the most code reuse and maintainability. I agree that sometimes it's necessary to duplicate code for clarity, and that inheritance is not always the best way to share code.

But I'm not convinced that we should have the 4 cases above × {node, rel} = 8 classes.
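
One way to picture the "tagged rel payload" idea is a single registration path that accepts either layout as a tagged union, so the layout is data instead of a class. The sketch below is only an illustration under that assumption: `TriplesPayload`, `CsrPayload`, `RelPayload`, `RelRegistry`, `layoutOf`, and `numEdges` are hypothetical names, not the actual ArrowTableSupport API, and `std::vector<int64_t>` again stands in for Arrow arrays. It requires C++17 for `std::variant`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical payload types; the variant alternative acts as the layout tag.
struct TriplesPayload {
    std::vector<int64_t> from, to;
};
struct CsrPayload {
    std::vector<int64_t> indptr, indices;
};
using RelPayload = std::variant<TriplesPayload, CsrPayload>;

enum class RelLayoutTag { TRIPLES, CSR };

struct RegisteredRel {
    std::string name;
    RelPayload payload;
};

// Hypothetical registry: one entry point for both layouts.
class RelRegistry {
public:
    void registerRel(std::string name, RelPayload payload) {
        rels.push_back({std::move(name), std::move(payload)});
    }
    const RegisteredRel& get(std::size_t i) const { return rels.at(i); }
    std::size_t size() const { return rels.size(); }

private:
    std::vector<RegisteredRel> rels;
};

// Recover the tag from the payload.
inline RelLayoutTag layoutOf(const RelPayload& payload) {
    return std::holds_alternative<CsrPayload>(payload) ? RelLayoutTag::CSR : RelLayoutTag::TRIPLES;
}

// Layout-specific logic lives in small free functions, not in per-layout table classes.
inline int64_t numEdges(const RelPayload& payload) {
    if (const auto* csr = std::get_if<CsrPayload>(&payload)) {
        return static_cast<int64_t>(csr->indices.size());
    }
    return static_cast<int64_t>(std::get<TriplesPayload>(payload).from.size());
}
```

Under this shape, adding a storage format or layout grows the set of payload alternatives and helpers rather than multiplying node and rel table classes.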

aheev added 2 commits May 18, 2026 11:43

add ice-mem node tables (3bed0ed)
add ice-mem rel tables (69ff7fc)

aheev wants to merge 2 commits into LadybugDB:main from aheev:add-icebug-mem-impl
