Implement Adaptive Radix Tree (ART) Indexes by adsharma · Pull Request #492 · LadybugDB/ladybug

adsharma · 2026-05-16T02:02:24Z

Details in the included docs.

du -sh /tmp/test1*.db
634M    /tmp/test1-hash.db
512M    /tmp/test1-noindex.db
604M    /tmp/test1-art.db

➜  ladybug git:(art_index) ✗ ./build/release/tools/shell/lbug -r /tmp/test1-hash.db
Opening the database at path: /tmp/test1-hash.db in read-only mode.
Enter ":help" for usage hints.
lbug> CALL show_indexes() return *;
┌────────────┬──────────────┬────────────┬────────────────┬──────────────────┬──────────────────────────────────────────────────────────────┐
│ table_name │ index_name   │ index_type │ property_names │ extension_loaded │ index_definition                                             │
│ STRING     │ STRING       │ STRING     │ STRING[]       │ BOOL             │ STRING                                                       │
├────────────┼──────────────┼────────────┼────────────────┼──────────────────┼──────────────────────────────────────────────────────────────┤
│ User       │ user_hash_pk │ HASH       │ [id]           │ True             │ CREATE HASH INDEX `user_hash_pk` FOR (n:`User`) ON (n.`id`); │
└────────────┴──────────────┴────────────┴────────────────┴──────────────────┴──────────────────────────────────────────────────────────────┘
(1 tuple)
(6 columns)
Time: 5.56ms (compiling), 1.69ms (executing)
lbug>
➜  ladybug git:(art_index) ✗ ./build/release/tools/shell/lbug -r /tmp/test1-art.db
Opening the database at path: /tmp/test1.db in read-only mode.
Enter ":help" for usage hints.
lbug> CALL show_indexes() return *;
┌────────────┬───────────────┬────────────┬────────────────┬──────────────────┬──────────────────────────────────────────────────────────────┐
│ table_name │ index_name    │ index_type │ property_names │ extension_loaded │ index_definition                                             │
│ STRING     │ STRING        │ STRING     │ STRING[]       │ BOOL             │ STRING                                                       │
├────────────┼───────────────┼────────────┼────────────────┼──────────────────┼──────────────────────────────────────────────────────────────┤
│ User       │ idx_person_pk │ ART        │ [id]           │ True             │ CREATE ART INDEX `idx_person_pk` FOR (n:`User`) ON (n.`id`); │
└────────────┴───────────────┴────────────┴────────────────┴──────────────────┴──────────────────────────────────────────────────────────────┘
(1 tuple)
(6 columns)
Time: 2.08ms (compiling), 0.21ms (executing)
lbug>
➜  ladybug git:(art_index) ✗ ./build/release/tools/shell/lbug -r /tmp/test1-noindex.db
Opening the database at path: /tmp/test1-save.db in read-only mode.
Enter ":help" for usage hints.
lbug> CALL show_indexes() return *;
┌────────────┬────────────┬────────────┬────────────────┬──────────────────┬──────────────────┐
│ table_name │ index_name │ index_type │ property_names │ extension_loaded │ index_definition │
│ STRING     │ STRING     │ STRING     │ STRING[]       │ BOOL             │ STRING           │
├────────────┼────────────┼────────────┼────────────────┼──────────────────┼──────────────────┤
└────────────┴────────────┴────────────┴────────────────┴──────────────────┴──────────────────┘
(0 tuples)
(6 columns)
Time: 1.50ms (compiling), 0.16ms (executing)

adsharma · 2026-05-16T02:19:26Z

random_lookup_bench.py --backend pybind --literal --lookups 500 --warmup 50


  Literal benchmark cross-check, 500 lookups:

  base  1996.8/s  avg=0.501ms p95=0.615ms
  hash  1227.3/s  avg=0.815ms p95=1.802ms
  art  12029.0/s  avg=0.083ms p95=0.095ms

Without --literal (prepared statements):

  base  1405.7/s avg=0.711ms p95=0.842ms
  hash  1350.4/s avg=0.740ms p95=0.959ms
  art   1424.0/s avg=0.702ms p95=0.817ms

So with literal constants, ART is clearly being used and is much faster. The prepared benchmark is still mostly tied, which suggests the prepared/parameter path or Python execution overhead is masking the index difference. Hash being slower than base in the literal run is also notable.
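As a rough cross-check of the literal run, the reported figures can be turned into relative speedups and compared against the throughput implied by the average latency. The numbers are copied from the output above; the helper names `speedup` and `implied_qps` are only for illustration and are not part of the benchmark script.

```python
# Throughput (lookups/s) and average latency (ms) copied from the literal run.
literal = {
    "base": {"qps": 1996.8, "avg_ms": 0.501},
    "hash": {"qps": 1227.3, "avg_ms": 0.815},
    "art": {"qps": 12029.0, "avg_ms": 0.083},
}


def speedup(results, fast, slow):
    """Ratio of throughputs: how many times faster `fast` is than `slow`."""
    return results[fast]["qps"] / results[slow]["qps"]


def implied_qps(avg_ms):
    """Single-threaded throughput implied by an average latency in ms."""
    return 1000.0 / avg_ms


if __name__ == "__main__":
    print(f"art vs base: {speedup(literal, 'art', 'base'):.1f}x")
    print(f"art vs hash: {speedup(literal, 'art', 'hash'):.1f}x")
    for name, row in literal.items():
        print(f"{name}: reported {row['qps']:.0f}/s, "
              f"implied {implied_qps(row['avg_ms']):.0f}/s")
```

On these numbers ART comes out at roughly 6x base and about 9.8x hash in the literal run, and the throughput implied by each average latency is within about 1% of the reported rate, so the figures are internally consistent.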

https://gist.github.com/adsharma/83a1b7c9c320d829e349135d396ab4d3

adsharma · 2026-05-16T02:20:35Z

The ART index also enables range scans. Previously, ladybug/kuzu did not support range scans via indexes.
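A radix tree can serve range scans because it keeps keys in byte-wise sorted order, which a hash index does not preserve. For signed integer keys this usually needs an order-preserving (binary-comparable) encoding. The sketch below shows one common scheme for INT64: flip the sign bit and store the bytes big-endian. Whether this PR encodes keys exactly this way is an assumption; `encode_int64`, `decode_int64`, and `range_scan` are illustrative names, and a sorted list stands in for the tree.

```python
import bisect
import struct

INT64_OFFSET = 1 << 63


def encode_int64(value: int) -> bytes:
    """Encode a signed 64-bit int so that bytewise order matches numeric order.

    Assumption: sign-bit flip plus big-endian bytes, a common ART key transform.
    """
    if not -INT64_OFFSET <= value < INT64_OFFSET:
        raise ValueError("value does not fit in INT64")
    # Adding 2**63 is equivalent to flipping the sign bit in two's complement.
    return struct.pack(">Q", value + INT64_OFFSET)


def decode_int64(key: bytes) -> int:
    """Inverse of encode_int64."""
    return struct.unpack(">Q", key)[0] - INT64_OFFSET


def range_scan(sorted_keys, lo, hi):
    """Return ids with lo <= id <= hi from encoded keys sorted bytewise."""
    start = bisect.bisect_left(sorted_keys, encode_int64(lo))
    end = bisect.bisect_right(sorted_keys, encode_int64(hi))
    return [decode_int64(k) for k in sorted_keys[start:end]]


# Sample ids, including a negative one to exercise the sign handling.
ids = [10, -1, 2, 1]
keys = sorted(encode_int64(i) for i in ids)

if __name__ == "__main__":
    print(range_scan(keys, 1, 10))
```

Without the sign-bit flip, negative ids would sort after positive ones in byte order, and a predicate such as `p.ID < 2` would miss them.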

aheev

Queries after COPY FROM are failing:

-CASE ArtIndexCopyFrom
-STATEMENT CALL enable_default_hash_index=false;
---- ok
-STATEMENT CREATE NODE TABLE art_copy_person (ID INT64, name STRING, PRIMARY KEY(ID));
---- 1
Table art_copy_person has been created.
-STATEMENT CREATE ART INDEX art_copy_person_pk FOR (p:art_copy_person) ON (p.ID);
---- 1
Index art_copy_person_pk has been created.
-STATEMENT COPY art_copy_person FROM "${LBUG_ROOT_DIRECTORY}/dataset/art-index-test/person.csv";
---- 1
4 tuples have been copied to the art_copy_person table.
-STATEMENT MATCH (p:art_copy_person) WHERE p.ID = 2 RETURN p.name;
---- 1
Grace
-STATEMENT MATCH (p:art_copy_person) WHERE p.ID >= 1 AND p.ID <= 10 RETURN p.ID, p.name ORDER BY p.ID;
---- 3
1|Ada
2|Grace
10|Barbara
-STATEMENT MATCH (p:art_copy_person) WHERE p.ID < 2 RETURN p.ID, p.name ORDER BY p.ID;
---- 2
-1|Edsger
1|Ada
-STATEMENT CALL enable_default_hash_index=true;
---- ok

person.csv

We should also look at thread safety.

adsharma · 2026-05-17T01:41:34Z

Key changes:

- ART index creation now validates the physical contract: built-in primary-key index, exactly one indexed column, exactly one key type, supported scalar key type.
- Range pushdown now only uses ART range scan for validated simple shapes: at most one constant lower bound and one constant upper bound. Duplicate same-side bounds or complex/non-constant predicates stay as normal filters.
- Consumed valid range predicates are now removed from the residual predicate set.
- ART public lookup/scan/insert/discard/checkpoint paths now take a coarse mutex for thread safety.
- Moved initInsertState out of the header and marked ArtPrimaryKeyIndex::load with LBUG_API.
- Reworked range traversal to iterate actual children instead of probing all 256 byte values through getChild.
- Rollback/discard cleanup now prunes empty child nodes; it does not shrink node kind layouts, which I’d keep as a separate memory-layout refactor.
- Updated docs/art_index.md with unsupported shapes and optimizer limitations.
- Added e2e coverage for unsupported ART creation, duplicate/complex range predicate behavior, and kept the COPY regression.
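The range-pushdown rule listed above can be sketched as a small classifier. This illustrates the stated behaviour and is not the optimizer code from the PR: the `Predicate` tuple, the operator strings, and `split_range_predicates` are hypothetical, and leaving every bound on a side as a filter when that side has duplicates is an assumption about how "stay as normal filters" is applied.

```python
from typing import NamedTuple


class Predicate(NamedTuple):
    column: str
    op: str                   # e.g. "<", "<=", ">", ">=", "="
    value: object = None      # constant operand, if any
    is_constant: bool = True  # False for parameters/expressions (assumed flag)


LOWER_OPS = {">", ">="}
UPPER_OPS = {"<", "<="}


def split_range_predicates(predicates, key_column):
    """Return (lower, upper, residual) for an index range scan.

    A bound is pushed down only if it is the single constant bound on its
    side; consumed bounds are removed from the residual filter list.
    """
    def bounds(ops):
        return [p for p in predicates
                if p.column == key_column and p.is_constant and p.op in ops]

    lowers, uppers = bounds(LOWER_OPS), bounds(UPPER_OPS)
    lower = lowers[0] if len(lowers) == 1 else None
    upper = uppers[0] if len(uppers) == 1 else None
    # Everything not consumed by the scan is still evaluated as a filter.
    residual = [p for p in predicates if p is not lower and p is not upper]
    return lower, upper, residual


if __name__ == "__main__":
    # Shape of: WHERE p.ID >= 1 AND p.ID <= 10 AND p.name = 'Ada'
    preds = [Predicate("ID", ">=", 1), Predicate("ID", "<=", 10),
             Predicate("name", "=", "Ada")]
    print(split_range_predicates(preds, "ID"))
```

Keeping unconsumed predicates in the residual set means an unsupported shape falls back to ordinary filtering rather than producing a wrong scan range.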

adsharma · 2026-05-17T03:32:41Z

Need to refactor the pathological unique_ptr<> deallocation pattern in this PR. Eventually we should use a more efficient, purpose-designed memory allocator for the nodes. For now, I'll try to avoid the long pause on shell exit.

- Allocation: one new Node() per inserted trie child.
- Storage: each Node has a tagged active child layout: small, node48, or node256. It only keeps the child array for its current kind.
- Ownership: parent owns its child pointers. There is no shared ownership.
- Destruction: ArtPrimaryKeyIndex::clear() walks the tree iteratively with an explicit stack and deletes every reachable node. This avoids recursive destructor chains.
- Child deletion: removeChild() calls deleteTree(child) before removing the pointer from the parent, so removed subtrees are freed immediately.
- Growth/rebalancing: when NODE16 -> NODE48 or NODE48 -> NODE256, the code moves raw child pointers into the new active layout and then switches kind. It does not delete child nodes during growth, because ownership is transferred in-place to the new layout.

What prevents leaks is the invariant that every allocated child pointer is either reachable from exactly one parent layout or is immediately passed to deleteTree() during deletion. Growth only relocates pointers; deletion nulls/compacts after freeing.
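The ownership and teardown rules above can be modelled in a few lines. Python is garbage-collected, so this is only a structural sketch and not the C++ from the PR: `Node`, `add_child`, `delete_tree`, and `remove_child` are hypothetical names, the growth thresholds (16 and 48 children) follow the usual ART layouts and are assumed here, and "freeing" is simulated by counting visited nodes.

```python
class Node:
    """Trie node with a tagged child layout; only the active layout is kept."""

    def __init__(self):
        self.kind = "small"   # grows "small" -> "node48" -> "node256"
        self.children = {}    # key byte -> Node; the parent is the sole owner

    def add_child(self, byte, child):
        self.children[byte] = child
        # Growth only switches the layout tag; existing children are kept
        # and re-homed, never freed.
        if self.kind == "small" and len(self.children) > 16:
            self.kind = "node48"
        elif self.kind == "node48" and len(self.children) > 48:
            self.kind = "node256"


def delete_tree(root):
    """Visit and release every reachable node using an explicit stack."""
    freed = 0
    stack = [root]
    while stack:
        node = stack.pop()
        # Iterate only the children that exist, then drop ownership of them.
        stack.extend(node.children.values())
        node.children = {}
        freed += 1
    return freed


def remove_child(parent, byte):
    """Free the child's subtree first, then remove the slot from the parent."""
    freed = delete_tree(parent.children[byte])
    del parent.children[byte]
    return freed


if __name__ == "__main__":
    # A long single-child chain is the worst case for recursive teardown.
    root = Node()
    cur = root
    for _ in range(100_000):
        nxt = Node()
        cur.add_child(0, nxt)
        cur = nxt
    print(delete_tree(root), "nodes released without recursion")
```

The explicit stack keeps teardown depth independent of key length, which is the same reason given above for avoiding recursive destructor chains.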

adsharma added 4 commits May 15, 2026 18:16

Add ART primary key index

48d0b9e

Add ART primary key range scans

4ca87db

Use generic primary key index for rel lookup

1469e77

Use primary key scan only with indexes

90b3e81

Fix ART range scan schema cardinality

2864f77

aheev suggested changes May 16, 2026


adsharma added 5 commits May 16, 2026 18:26

Populate ART index during COPY

8f41e8c

Harden ART index internals

b04392f

Guard ART range predicate pushdown

0ae3894

Test unsupported ART index creation

47ad449

Document ART index limitations

29c4a33

aheev approved these changes May 17, 2026


adsharma added 2 commits May 16, 2026 22:17

Allocate ART nodes from arena blocks

0e32641

adsharma merged commit 7d6335a into main May 17, 2026
4 checks passed

adsharma deleted the art_index branch May 17, 2026 05:59

adsharma merged 12 commits into main from art_index
