- High-performance, parallel file I/O with RAII semantics
- High-performance on-disk data structures
- A bunch of utility functions and classes
Create a CMakeLists.txt:
cmake_minimum_required(VERSION 3.14)
project(BTreeDemo)
# Adds excessive to the project
include(FetchContent)
FetchContent_Declare(
excessive
GIT_REPOSITORY https://gitea.com/komozoi/excessive.git
GIT_TAG v0.2.1
GIT_SHALLOW TRUE
GIT_PROGRESS ON
SYSTEM
)
FetchContent_MakeAvailable(excessive)
# Create the demo executable and link excessive into it
add_executable(demo demo.cpp)
target_link_libraries(demo PRIVATE excessive)And create a file demo.cpp:
#include <fs/FdHandle.h>
#include <fs/BTree.h>
#include <fcntl.h>
#include <random>
#include <cstdio>
struct btree_entry_s {
int key;
int value;
static int compare(const btree_entry_s &a, const btree_entry_s &b) {
return a.key - b.key;
}
};
int main() {
FdHandle file = FdHandle::open("btree.bin", O_RDWR | O_CREAT, 0644);
if (!file) {
printf("Failed to open file!\n");
return 1;
}
// It is easy to check if the file already existed or was just created
printf("File is %s.\n", file.isNew() ? "newly created" : "existing");
BTree<btree_entry_s> tree(file, 0, btree_entry_s::compare);
// Add 100 random elements to the tree
std::random_device rd;
for (int i = 0; i < 100; ++i) {
btree_entry_s new_entry{(int) rd() % 5000, (int) rd()};
tree.insert(new_entry);
}
// See what the next highest values are for given inputs
// Each run this would change as the BTree grows
for (int i = 0; i < 5000; i += 500) {
btree_entry_s result{i, 0};
if (tree.findNext(result)) {
printf("Next highest entry from %i is (%i, %i)\n", i, result.key, result.value);
} else {
printf("No next highest entry found for %i\n", i);
}
}
// All data is already written to the file (although not necessarily flushed)
// For this reason, no cleanup is needed here for the BTree.
// File automatically closes when all references go out of scope,
// but can be closed manually with:
// write_handle.close();
return 0;
}If you are using an IDE, this may be enough to import the project and run it - very convenient!
If not, run these commands to compile and run:
# Setup
mkdir build && cd build && cmake ..
# Compile
make
# Run the demo to create the data file
./demo
# Run the demo again to add more to the data file and see the effects
./demoAnd that's it - efficient and persistent data storage in less than 60 lines of code. No extra installation steps or complex APIs. It just works.
LibExcessive is intended for large, data-heavy backend applications such as servers and data processing tools where speed and reliability matter. The design goals and features are:
- Provide familiar, Java-like APIs and richer helper types, especially containers
- With less verbosity than Java APIs, thankfully.
- Provide rich, threadsafe, and extremely efficient utilities for interacting with files
- Threadsafe file handles and transactions
- Mmap handles
- Open file reference counting
- Utilities for keeping data on-disk
- BTree
- Files with dynamically allocated regions
- (planned) Utilities for building on-disk indexes and databases
- Favor explicit memory and performance control. Many components are designed to be friendly to
custom allocators and memory pools.
- Tracking memory separately for different components of an application, which helps to find memory hogs
- Safer allocation and memory management with less heap fragmentation
- Small, focused algorithms and helpers for string handling, byte buffers, serialization, and more
- Support modern C++ compilers from C++17 and up.
Much of the code was originally written to run on the Teensy 4.1, which is extremely memory constrained compared to our desktop computers, having a mere 1MiB of RAM. For this reason, there is a lot of code for tightly controlling memory usage and performance.
Everything is tested on Debian Linux currently, although any Unix flavor should work. I do not plan on supporting Windows.
Main repository is on Gitea at https://gitea.com/komozoi/excessive, but is mirrored to GitHub at https://github.com/komozoi/libexcessive.
I develop this with CLion, which imports the CMake project for me, but if you prefer to build on the terminal, it's not hard:
git clone https://gitea.com/komozoi/excessive.git
cd libexcessive
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
makeTests are included for almost everything, which you can use as a larger reference if needed.
#include "fs/FdHandle.h"
int main() {
const char* temp_filename = "mmap_test.tmp";
// All file handles are smart pointers!
FdHandle write_handle = FdHandle::open(temp_filename, O_RDWR | O_CREAT, 0660);
MmapHandle write_mmap = write_handle.getMmapHandle(0, sizeof(my_struct_t));
// Writing structs with mmap is easy and safe
my_struct_t value{1, true, {}};
write_mmap.write(value);
// File automatically closes when all references go out of scope,
// but can be closed manually with:
// write_handle.close();
}Unlike in other libraries, the bigint implementation uses a fixed width. The bigint code was originally designed for use in an EVM implementation, where most values are 256-bit. This is designed to work like a typical register or even normal fixed-width datatype, just bigger, and acts like you would expect with truncation and such.
There are plenty of great libraries that implement variable-width bigint; there is no reason to add that to this library. Originally I wasn't going to include my own bigint implementation as I figured existing libraries were sufficient, but I changed my mind when I saw that they were all variable-length, which does not work for my typical applications.
Examples:
#include "bigint.h"
int main() {
uint256_t a = uint256_t("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF");
uint256_t b("0x1234567890ABCDEF1234567890ABCDEF");
uint256_t sum = a + b;
uint256_t prod = a * b;
uint256_t pow = b.pow(4);
printf("a = %s\n", a.toHexString().c_str());
printf("b = %s\n", b.toHexString().c_str());
printf("a + b = %s\n", sum.toHexString().c_str());
printf("a * b = %s\n", prod.toHexString().c_str());
printf("b**4 = %s\n", pow.toHexString().c_str());
// Output:
// a = ffffffffffffffffffffffffffffffff
// b = 1234567890abcdef1234567890abcdef
// a + b = 0000000000000000000000001234567890abcdef1234567890abcdefffffffff
// a * b = 1234567890abcdef1234567890abcdeeffffffffffffffedcba98765432111
// b**4 = 33ee0e405772f4bd1fa6d7a4e8c14117ea371272c23e2b10
// Default sizes include uint128_t, uint192_t, and uint256_t. Custom sizes are also possible.
uint192_t threeWordValue = "0x6935282358963433459348abcdef1ee7";
UnsignedFixedWidthBigInt<7> sevenWordValue = "0x8b20159b1c579b1088048f054bedebfd02de6b23919371be36d872ec46fe9cebe684edd2675ab1101262b78877b3c09966366c07df0fcccf";
// It is also possible to multiply directly with doubles
// The result of the multiplication is floored and almost exact
uint192_t productWithDouble = threeWordValue * 0.7311;
// productWithDouble would be something like 0x4cead4b7be06ccb79c0e92711d09028c
uint256_t x3("0x783924abc37678847777fcba");
x3 = x3.root(3); // 0xc6fc718e
}#include <alloc/pointer.h>
struct Data {
int value;
};
int main() {
// UNIQUE (default-style ownership)
sp<Data> x(SpPointerType::UNIQUE, Data{10});
// Copying a UNIQUE pointer does NOT immediately copy the data
sp<Data> y = x;
// x stays UNIQUE
// y becomes COPY_ON_WRITE
// both point to the same underlying object (for now)
// First write triggers a deep copy
y.mut().value = 20;
// now:
// x->value == 10
// y->value == 20
// they no longer share memory
// You can keep copying before mutation is needed
sp<Data> z = x;
// still sharing with x
z.mut().value = 30;
// z detaches and becomes independent
// x is still unchanged
// SHARED mode = always shared, no copy-on-write
sp<Data> sharedA(SpPointerType::SHARED, Data{100});
sp<Data> sharedB = sharedA;
sharedB.mut().value = 200;
// both see the change:
// sharedA->value == 200
// sharedB->value == 200
// Move = transfer ownership, no copies
sp<Data> moved = std::move(sharedA);
// sharedA is now null
// moved owns the data
// Scoped lifetime (RAII)
{
sp<Data> temp(SpPointerType::UNIQUE, Data{5});
sp<Data> alias = temp;
// alias is COPY_ON_WRITE
// object is destroyed exactly once when both go out of scope
}
return 0;
}Simple examples showing the style of usage that matches the library design:
#include "ds/ArrayList.h"
int main() {
ArrayList<int> list;
list.add(1);
list.add(2);
list.addCopies(5, 3); // add three copies of 5
for (int element: list)
printf("value %d\n", element);
return 0;
}#include "ds/ArrayList.h"
int main() {
// Initialize list as {2, 3, 4}
ArrayList<int> list{2,3,4};
// Add 1 to the beginning
list.addFirst(1);
// 1, 2, 3, 4
for (int element: list)
printf("value %d\n", element);
}