Project: Caliby Vector Search Library
Location: /home/zxjcarrot/Workspace/caliby
Language: C++
How to build:
This is a C++/Python project. Always use rebuild.sh to build everything from scratch and install the caliby Python package on the system.
How to benchmark:
Run python3 ./benchmark/compare_hnsw.py --skip usearch,faiss, which outputs something like the following:
=========================================================================================================
zxjcarrot@zxjcarrot-MS-7E07:~/Workspace/caliby$ python3 ./benchmark/compare_hnsw.py --skip usearch,faiss
Caliby version: 0.1.0.dev20251222.180901
====================================================================================================
HNSW BENCHMARK: Caliby vs Usearch vs Faiss
====================================================================================================
Dataset: SIFT1M (1M vectors, 128 dimensions)
Parameters: M=16, ef_construction=100, ef_search=50
Libraries enabled:
caliby ✓ Enabled
usearch ✗ Disabled
faiss ✗ Disabled
✓ SIFT1M dataset already exists in ./sift1m
Loading SIFT1M dataset...
Base vectors: (1000000, 128)
Query vectors: (10000, 128)
Ground truth: (10000, 100)
======================================================================
Benchmarking Caliby HNSW
======================================================================
Building index with 1000000 vectors (dim=128)...
M=16, ef_construction=100
[IndexTranslationArray] Created for index 0 with capacity=6291456 pages (12288 ref count groups)
[TwoLevelPageStateArray] Created multi-index translation array with 65536 index slots, default index capacity=6291456
[IndexCatalog] Loaded 1 entries from ./catalog.dat
calico BufferManager initialized: blk:./heapfile virtgb:24 physgb:3 traditional:1 mmap_os_pagecache:0 trad_hash:5 exmap:0 hugepage: 1 hugepage for translation array: 0 num_threads:1 translation_specialization:calico
HNSW Initialization: Dim=128, M=16, M0=32, efConstruction=100, MaxLevel=6, FixedNodeSize=972 bytes, NodesPerPage=4, enable_prefetch=1
[IndexTranslationArray] Created for index 1 with capacity=277049 pages (542 ref count groups)
[TwoLevelPageStateArray] Registered index 1 with capacity 277049
[BufferManager] Registered index 1 in TwoLevelPageStateArray with max_pages=277049 initial_alloc=1
[BufferManager] Created allocator for index 1 starting at 1
[HNSW Recovery] skip_recovery=false has_existing_meta=false
[HNSW Recovery] params_match=false
[HNSW Recovery] can_reuse_storage=false
[HNSW Recovery] Allocated new metadata page 4294967297 base_pid=4294967298 total_pages=250000
[HNSW Recovery] Metadata page updated and marked valid
Adding 1000000 vectors...
✓ Build time: 21.55s
BufferManager flushed 0 pages.
✓ Estimated index size: 488.28 MB
Warming up...
Running search benchmark (ef_search=50)...
Processed 10000/10000 queries
====================================================================================================
BENCHMARK RESULTS SUMMARY
====================================================================================================
Library    Build(s)   Size(MB)      QPS   P50(ms)   P95(ms)   P99(ms)   Recall@10
----------------------------------------------------------------------------------------------------
Caliby        21.55     488.28   9933.4     0.102     0.135     0.155      0.8977
====================================================================================================
WINNERS:
🏆 Highest QPS: Caliby (9933.4 queries/sec)
🏆 Best Recall@10: Caliby (0.8977)
🏆 Lowest P50 Latency: Caliby (0.102 ms)
🏆 Smallest Index: Caliby (488.28 MB)
🏆 Fastest Build: Caliby (21.55 s)
====================================================================================================
Calico module unloading: Shutting down system...
BufferManager flushed 0 pages.
[BufferManager] Total hole punches (madvise count): 0
[IndexCatalog] Persisted 1 entries to ./catalog.dat
=========================================================================================================
What to optimize:
Optimize the QPS (queries per second) metric reported in the above output.
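To compare runs, the QPS value can be read from the summary row with a few lines of Python. This is a minimal sketch that assumes the row layout of the sample output (library name followed by seven columns, QPS in the third of them); `parse_qps` is a hypothetical helper, not part of Caliby or the benchmark script.

```python
def parse_qps(benchmark_output, library="Caliby"):
    """Return the QPS value from the results summary row for `library`.

    Assumed column order (taken from the sample output):
    Library Build(s) Size(MB) QPS P50(ms) P95(ms) P99(ms) Recall@10
    """
    for line in benchmark_output.splitlines():
        fields = line.split()
        # A data row is the library name followed by seven columns.
        if len(fields) == 8 and fields[0] == library:
            try:
                return float(fields[3])
            except ValueError:
                continue  # not a numeric row, keep looking
    raise ValueError(f"no summary row found for {library!r}")


sample = """\
Library Build(s) Size(MB) QPS P50(ms) P95(ms) P99(ms) Recall@10
Caliby 21.55 488.28 9933.4 0.102 0.135 0.155 0.8977
"""
print(parse_qps(sample))
```

Splitting on whitespace keeps the helper independent of how the columns are padded.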
How to profile:
You may use perf to collect stats. Start profiling once the benchmark has been running for 10 seconds, and profile for at most 3 seconds.
For example, with 993187 as the benchmark's PID: sudo perf record -g -p 993187 sleep 3
You can assume perf is installed and the FlameGraph tool is available at /home/zxjcarrot/Workspace/FlameGraph
Use `sudo perf report --stdio -n folded` to produce a text-based hotspot report that is easier for an LLM to analyze. Feed the top 100 lines of the perf report output to the LLM.
In addition to perf record, you can use `perf stat -p <pid> sleep 3` to collect microarchitectural metrics that help guide the optimization. Run perf stat only after the perf record phase has finished. Upload both the perf record/perf report results and the perf stat results to the LLM.
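The profiling steps above can be laid out as an ordered command list. This is a sketch only: it reuses the command lines given above verbatim, and the helper names `perf_commands` and `top_lines` are hypothetical. The commands are printed rather than executed, and the PID is the example value from above.

```python
import shlex


def perf_commands(pid, delay_s=10, window_s=3):
    """Return the shell commands for one profiling pass, in run order."""
    window = ["sleep", str(window_s)]
    return [
        # Let the benchmark run for a while before sampling.
        ["sleep", str(delay_s)],
        # Call-graph samples for the profiling window.
        ["sudo", "perf", "record", "-g", "-p", str(pid)] + window,
        # Counter statistics, collected after perf record has finished.
        ["perf", "stat", "-p", str(pid)] + window,
        # Text report; reads the recorded data, so the process may have exited.
        ["sudo", "perf", "report", "--stdio", "-n", "folded"],
    ]


def top_lines(report, limit=100):
    """Keep only the first `limit` lines of a text report."""
    return "\n".join(report.splitlines()[:limit])


for cmd in perf_commands(993187):
    print(shlex.join(cmd))
```

Running perf report last keeps both live-process captures close together while the benchmark is still running.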
How to test:
Run bash ./run_tests.sh
The above command produces output that contains something like the following:
....
======================================================================================== short test summary info =========================================================================================
FAILED ../../home/zxjcarrot/Workspace/caliby/tests/test_multi_index_direct.py::TestMultiIndexIsolation::test_no_cross_contamination_varying_dims
FAILED ../../home/zxjcarrot/Workspace/caliby/tests/test_multi_index_direct.py::TestMultiIndexIsolation::test_interleaved_operations
===================================================================================== 2 failed, 143 passed in 23.67s =====================================================================================
...
You are allowed at most 5 failing tests.
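The failure limit can be checked automatically by reading the final summary line of the test output. This is a minimal sketch that assumes the pytest-style summary shown above ("N failed, M passed in T s"); `count_failed` and `within_budget` are hypothetical helpers, and only the "failed" count is considered.

```python
import re

MAX_FAILING = 5  # limit stated above


def count_failed(test_output):
    """Return the failure count from the last summary line (0 if none failed)."""
    for line in reversed(test_output.splitlines()):
        has_counts = re.search(r"\d+ (passed|failed)", line)
        has_duration = re.search(r" in [\d.]+s", line)
        if has_counts and has_duration:
            match = re.search(r"(\d+) failed", line)
            return int(match.group(1)) if match else 0
    raise ValueError("no test summary line found")


def within_budget(test_output, limit=MAX_FAILING):
    """True when the number of failing tests does not exceed `limit`."""
    return count_failed(test_output) <= limit


sample = "===== 2 failed, 143 passed in 23.67s ====="
print(count_failed(sample), within_budget(sample))
```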
Optimization hints:
The searchLayer function at src/hnsw.cpp:433 is the hottest function.
Consider improving it as a starting point.
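For orientation, the layer search in a generic HNSW index is a best-first graph traversal bounded by ef. The sketch below follows the textbook algorithm and is not Caliby's implementation; the code at src/hnsw.cpp:433 may differ in data layout, prefetching, and visited-set handling. The names `search_layer`, `neighbors`, and `dist` are illustrative assumptions.

```python
import heapq


def search_layer(query, entry, ef, neighbors, dist):
    """Best-first search of one graph layer.

    Returns up to `ef` (distance, node) pairs, nearest first.
    """
    d0 = dist(query, entry)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node on top
    results = [(-d0, entry)]    # max-heap via negated distance: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # closest candidate is farther than the worst kept result
        for nb in neighbors(node):
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(query, nb)  # distance evaluation, typically the dominant cost
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-neg_d, n) for neg_d, n in results)


# Toy layer: ten points on a line, each linked to its immediate neighbors.
points = [float(i) for i in range(10)]
found = search_layer(
    query=6.2,
    entry=0,
    ef=3,
    neighbors=lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(points)],
    dist=lambda q, i: abs(q - points[i]),
)
print([node for _, node in found])
```

In this form the loop spends its time on distance evaluations, visited-set lookups, and heap updates. Those are the usual places to look in a profile, though the perf output should decide where to start.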