Bug Fixes and Performance Improvements #34

Open
ocalasans wants to merge 2 commits into mtrebi:master from ocalasans:master

Bug Fixes

FreeListAllocator::FindBest — Incorrect previousNode Tracking

The original implementation maintained a single previousNode variable that was advanced unconditionally on every iteration. Upon loop termination, this variable held the predecessor of the last visited node, not the predecessor of the node selected as bestBlock. Any subsequent call to Coalescence would therefore operate on a structurally incorrect position in the free list, producing silent heap corruption on deallocation.

Warning

This defect does not produce an immediate crash. Heap corruption is deferred to the next deallocation cycle, making the root cause difficult to trace.

The fix introduces a dedicated bestPrev variable that is only updated when a new best candidate is found, ensuring that previousNode always refers to the correct predecessor of foundNode at the point of return.
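
The bestPrev idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the Node type and the FindBestSketch name are hypothetical, and alignment padding is ignored so that only the predecessor tracking is visible.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical free-list node; the real project uses its own list type.
struct Node {
    std::size_t blockSize;
    Node* next;
};

// Best-fit search over a singly linked free list.
// On return, foundNode is the smallest block with blockSize >= size (or
// nullptr if none fits) and previousNode is its predecessor (nullptr when
// foundNode is the head).
inline void FindBestSketch(Node* head, std::size_t size,
                           Node*& previousNode, Node*& foundNode) {
    std::size_t smallestDiff = std::numeric_limits<std::size_t>::max();
    Node* best = nullptr;
    Node* bestPrev = nullptr;  // predecessor of `best`, not of the last node visited
    Node* prev = nullptr;      // predecessor of the node currently inspected
    for (Node* it = head; it != nullptr; prev = it, it = it->next) {
        if (it->blockSize >= size && it->blockSize - size < smallestDiff) {
            smallestDiff = it->blockSize - size;
            best = it;
            bestPrev = prev;   // updated only when a better candidate is found
        }
    }
    previousNode = bestPrev;
    foundNode = best;
}
```

With a list of blocks sized 128, 32, and 64 and a request for 30 bytes, the sketch selects the 32-byte block and reports the 128-byte block as its predecessor, even though the loop finishes on the 64-byte block.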

FreeListAllocator::Free — Silent Memory Leak on Tail Insertion

When the pointer being freed resided at a higher address than all existing nodes in the free list, the traversal loop would exhaust without ever invoking m_freeList.insert. The freed block was permanently lost, producing an undetectable memory leak that accumulated silently across deallocations.

Warning

Because the leak occurs at the structural level of the free list — not at the OS allocation level — no external memory profiler will surface it. The arena appears consumed even after all user-level frees have been issued.

The fix restructures the insertion logic so that the positional search and the insertion are decoupled, guaranteeing that insert is always called regardless of the freed block's address relative to the list.
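
A minimal sketch of decoupling the search from the insertion, assuming an address-ordered singly linked free list. FreeNode, InsertAfterSketch, and InsertSortedByAddress are hypothetical names chosen for illustration; the project's own list API may differ.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical free-list node stored inside the freed block itself.
struct FreeNode {
    std::size_t blockSize;
    FreeNode* next;
};

// Insert newNode after previousNode; previousNode == nullptr means "at the head".
inline void InsertAfterSketch(FreeNode*& head, FreeNode* previousNode, FreeNode* newNode) {
    if (previousNode == nullptr) {
        newNode->next = head;
        head = newNode;
    } else {
        newNode->next = previousNode->next;
        previousNode->next = newNode;
    }
}

// Step 1: walk the list to find the last node located below the freed block.
// Step 2: insert unconditionally, so a block above every existing node is
//         appended at the tail instead of being dropped.
inline void InsertSortedByAddress(FreeNode*& head, FreeNode* freed) {
    std::less<FreeNode*> below;  // well-defined ordering for arbitrary pointers
    FreeNode* prev = nullptr;
    for (FreeNode* it = head; it != nullptr && below(it, freed); it = it->next) {
        prev = it;
    }
    InsertAfterSketch(head, prev, freed);
}
```

Because the insertion sits outside the loop, the head, middle, and tail cases all go through the same call.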

PoolAllocator::Init — Memory Leak on Reinitialization

A second invocation of Init would unconditionally call malloc without first releasing the previously allocated arena, permanently losing the reference to the previous allocation. The fix adds a NULL check at the entry of Init that releases any existing arena prior to allocating a new one, ensuring that reinitialization does not silently leak memory.
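
The guard can be sketched as below. PoolSketch and its member names are hypothetical stand-ins, not the project's class. std::free accepts a null pointer, so the explicit check mainly documents intent.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical reduced pool that only models arena ownership.
class PoolSketch {
public:
    explicit PoolSketch(std::size_t totalSize)
        : m_totalSize(totalSize), m_start_ptr(nullptr) {}
    ~PoolSketch() { std::free(m_start_ptr); }
    PoolSketch(const PoolSketch&) = delete;
    PoolSketch& operator=(const PoolSketch&) = delete;

    void Init() {
        // Release any arena left over from a previous Init before allocating.
        if (m_start_ptr != nullptr) {
            std::free(m_start_ptr);
            m_start_ptr = nullptr;
        }
        m_start_ptr = std::malloc(m_totalSize);
    }

    bool HasArena() const { return m_start_ptr != nullptr; }

private:
    std::size_t m_totalSize;
    void* m_start_ptr;
};
```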

PoolAllocator.h — Missing Include Guard

PoolAllocator.h was the only header in the project lacking an #ifndef / #define / #endif include guard.

Caution

Any translation unit that included this header more than once — directly or transitively — would produce a redefinition error at compile time. This is a deterministic failure, not a latent defect.

The fix adds the standard include guard, consistent with every other header in the codebase.

Utils::CalculatePadding — Incorrect Result on Already-Aligned Addresses

The original formula always produced a non-zero result, even when baseAddress was already aligned to the requested boundary. This caused every allocator that called CalculatePadding to insert unnecessary padding bytes on aligned addresses, inflating memory consumption and producing incorrect offset arithmetic. The formula was replaced with the standard power-of-two alignment expression:

const std::size_t aligned = (baseAddress + alignment - 1) & ~(alignment - 1);
return aligned - baseAddress;

This expression returns zero when baseAddress is already aligned and the minimal padding otherwise, provided alignment is a power of two. It also eliminates the division and modulo operations present in the original, reducing the computation to a handful of additions, subtractions, and bitwise operations.
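
A few worked values make the behaviour concrete. The wrapper below is a sketch: the name CalculatePaddingSketch and the power-of-two assertion are additions for illustration and are not claimed to be part of the PR.

```cpp
#include <cassert>
#include <cstddef>

// Padding needed to round baseAddress up to the next multiple of alignment.
// Assumption: alignment is a non-zero power of two, which the bit mask requires.
inline std::size_t CalculatePaddingSketch(std::size_t baseAddress, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::size_t aligned = (baseAddress + alignment - 1) & ~(alignment - 1);
    return aligned - baseAddress;
}

// Worked values:
//   baseAddress = 16, alignment = 8   -> aligned = 16, padding = 0
//   baseAddress = 13, alignment = 8   -> aligned = 16, padding = 3
//   baseAddress = 17, alignment = 16  -> aligned = 32, padding = 15
```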

DoublyLinkedList::insert — Missing Back-Link on Last-Node Insertion

When inserting a node at the tail of the list (previousNode->next == nullptr), the original implementation linked previousNode->next to newNode but never set newNode->previous to point back to previousNode. The newly appended node's previous pointer was left uninitialised.

Warning

Any backward traversal that reaches the affected node, or any other operation that reads its previous pointer, reads an indeterminate value, which is undefined behavior.

The fix unconditionally assigns newNode->previous in all insertion paths.
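
A sketch of an insertion routine that sets previous on every path, assuming a bare node type with previous and next members. DNode and InsertAfterDoubly are hypothetical names, not the project's API.

```cpp
// Hypothetical doubly linked node.
struct DNode {
    int data;
    DNode* previous;
    DNode* next;
};

// Insert newNode after previousNode; previousNode == nullptr means "at the head".
inline void InsertAfterDoubly(DNode*& head, DNode* previousNode, DNode* newNode) {
    if (previousNode == nullptr) {
        newNode->previous = nullptr;
        newNode->next = head;
        if (head != nullptr) {
            head->previous = newNode;
        }
        head = newNode;
        return;
    }
    // Assigned for both the middle and the tail case.
    newNode->previous = previousNode;
    newNode->next = previousNode->next;
    if (previousNode->next != nullptr) {
        previousNode->next->previous = newNode;  // middle case: fix the successor's back-link
    }
    previousNode->next = newNode;
}
```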

Benchmark::SingleFree — Fixed-Size Array Ignoring m_nOperations

The address array used to store allocation results was declared as a C-style array with a hardcoded size of 10 (void* addresses[OPERATIONS], where the macro OPERATIONS was set to 10 as an acknowledged workaround in the original source).

Caution

When m_nOperations exceeded 10, the loop wrote beyond the array bounds. The original source acknowledged this in a comment but left it unresolved.

The fix replaces the fixed-size array with a std::vector<void*> sized at construction time from m_nOperations.
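
The pattern can be sketched as follows. AllocateThenFreeAll is a hypothetical helper that uses malloc and free in place of the benchmarked allocator; only the sizing of the address container mirrors the fix described above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Allocate nOperations blocks, then free them in reverse order.
// Returns the number of successful allocations.
inline std::size_t AllocateThenFreeAll(std::size_t nOperations, std::size_t size) {
    // Sized from the operation count, so the loops below cannot run past the end.
    std::vector<void*> addresses(nOperations, nullptr);
    std::size_t succeeded = 0;
    for (std::size_t i = 0; i < nOperations; ++i) {
        addresses[i] = std::malloc(size);
        if (addresses[i] != nullptr) {
            ++succeeded;
        }
    }
    for (std::size_t i = nOperations; i-- > 0;) {
        std::free(addresses[i]);  // free(nullptr) is a no-op, so failed slots are safe
    }
    return succeeded;
}
```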

main.cpp — Non-Zero Exit Code on Successful Execution

The main function unconditionally returned 1, signalling failure to the operating system and any invoking shell or build system, regardless of whether execution had completed successfully. The fix changes the return value to 0.

Performance Improvements

Benchmark — Timer Precision Upgraded to Nanoseconds

The original benchmark reported elapsed time in milliseconds using std::chrono::milliseconds. For allocators with O(1) complexity — particularly LinearAllocator and StackAllocator, which typically execute in the range of 3–10 nanoseconds per operation — millisecond resolution produced zero for all measurements, rendering the benchmark meaningless for comparative analysis. The fix replaces the duration type with std::chrono::nanoseconds and adjusts all output formatting accordingly.
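
The truncation is easy to reproduce: casting a 3322 ns interval to std::chrono::milliseconds yields a count of 0. The sketch below shows one way to time a callable in nanoseconds. TimeIt and NanosPerOp are hypothetical helpers, and the choice of std::chrono::steady_clock is an assumption, since the description does not say which clock the benchmark uses.

```cpp
#include <chrono>
#include <cstddef>

// Run fn once and return the elapsed wall time in nanoseconds.
template <typename Fn>
std::chrono::nanoseconds TimeIt(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
}

// Average cost per operation, kept as floating point so sub-unit values survive.
inline double NanosPerOp(std::chrono::nanoseconds elapsed, std::size_t nOperations) {
    if (nOperations == 0) {
        return 0.0;
    }
    return static_cast<double>(elapsed.count()) / static_cast<double>(nOperations);
}
```

The actual resolution still depends on the platform clock, so very short runs are best measured over many operations and averaged.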

Benchmark — Replacement of rand with std::mt19937

The original implementation seeded the C standard library PRNG with srand(1) and sampled via rand() % n. This approach has well-documented statistical deficiencies: modulo reduction introduces bias whenever n does not evenly divide RAND_MAX + 1, and many rand implementations are linear congruential generators whose low-order bits are poorly distributed. The fix substitutes std::mt19937, seeded at construction with a fixed value and paired with std::uniform_int_distribution, which yields unbiased, uniformly distributed samples and a reproducible sequence.
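
A minimal sketch of the replacement, assuming the benchmark needs a random index in [0, n). IndexSampler is a hypothetical wrapper and the seed value 1 is chosen only to echo the original srand(1).

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Fixed-seed index source: the same seed gives the same sequence on every run.
class IndexSampler {
public:
    explicit IndexSampler(std::uint32_t seed = 1u) : m_engine(seed) {}

    // Uniform index in [0, n). Precondition: n > 0.
    std::size_t Next(std::size_t n) {
        std::uniform_int_distribution<std::size_t> dist(0, n - 1);
        return dist(m_engine);
    }

private:
    std::mt19937 m_engine;
};
```

The raw output of std::mt19937 is fixed by the standard, but the mapping performed by std::uniform_int_distribution is not, so sequences are reproducible across runs of the same build rather than guaranteed identical across standard libraries.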

Dead Code Removal

StackAllocator::Push — Unreachable Public Method

Push was declared in the public interface and implemented in the corresponding translation unit, but had no call sites anywhere in the project. Furthermore, its semantics were already subsumed by Allocate, which unconditionally appends the current offset to m_markers on every allocation.

Warning

An external caller invoking Push would insert a duplicate marker into m_markers, corrupting the LIFO invariant on all subsequent calls to Free or Pop.

The method has been removed from both the declaration and the implementation.

Allocator.cpp — Empty Translation Unit

Allocator.cpp contained a single preprocessor directive (#include "Allocator.h") and contributed no compiled definitions to the build. All members of Allocator are either pure virtual, inline, or defined in the header. The file was removed and the corresponding entry was deleted from CMakeLists.txt.

Improvement: CAllocator — Alignment Parameter Now Honoured

Motivation

The original implementation of CAllocator::Allocate discarded the alignment parameter entirely and unconditionally delegated to malloc:

void* CAllocator::Allocate(const std::size_t size, const std::size_t alignment) {
    return malloc(size);
}

While malloc guarantees alignment suitable for any fundamental type, it provides no guarantee for over-aligned types (e.g. SIMD vector types requiring 16- or 32-byte boundaries). Silently ignoring the alignment argument violated the contract established by the Allocator base class interface, making CAllocator semantically incorrect whenever a non-trivial alignment was requested.

Design Decision

The updated implementation dispatches to a platform-appropriate aligned allocation function when alignment > 1, and falls back to malloc otherwise. Platform selection is encapsulated behind two macros:

#if defined(_WIN32) || defined(_WIN64)
    #define PLATFORM_ALIGNED_ALLOC(ptr, align, size) (((ptr) = _aligned_malloc((size), (align))) == NULL ? -1 : 0)
    #define PLATFORM_ALIGNED_FREE(ptr) _aligned_free((ptr))
#else
    #define PLATFORM_ALIGNED_ALLOC(ptr, align, size) posix_memalign(&(ptr), (align), (size))
    #define PLATFORM_ALIGNED_FREE(ptr) std::free((ptr))
#endif

Note

On Windows, memory allocated with _aligned_malloc must be released with _aligned_free — passing it to free is undefined behavior. For this reason, a boolean member m_lastWasAligned was introduced to record which allocation path was taken, allowing Free to dispatch to the correct deallocation function.

This approach preserves the single-allocation-path abstraction expected of CAllocator while correctly fulfilling the alignment contract on both Windows and POSIX targets.
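
For comparison, a simplified variant that always takes the aligned path and therefore needs no flag can be sketched as below. This is an alternative illustration, not the PR's implementation: AlignedAllocSketch and AlignedFreeSketch are hypothetical names, and raising small alignments to sizeof(void*) is an added assumption, because posix_memalign rejects alignments that are not a power-of-two multiple of sizeof(void*).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#if defined(_WIN32) || defined(_WIN64)
    #include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Returns storage aligned to `alignment` (assumed to be a power of two),
// or nullptr on failure.
inline void* AlignedAllocSketch(std::size_t size, std::size_t alignment) {
    if (alignment < sizeof(void*)) {
        alignment = sizeof(void*);  // smallest alignment posix_memalign accepts
    }
    void* ptr = nullptr;
#if defined(_WIN32) || defined(_WIN64)
    ptr = _aligned_malloc(size, alignment);
#else
    if (posix_memalign(&ptr, alignment, size) != 0) {
        ptr = nullptr;
    }
#endif
    return ptr;
}

// Must be paired with AlignedAllocSketch: Windows requires _aligned_free,
// while POSIX memory from posix_memalign is released with free.
inline void AlignedFreeSketch(void* ptr) {
#if defined(_WIN32) || defined(_WIN64)
    _aligned_free(ptr);
#else
    std::free(ptr);
#endif
}
```

Routing every allocation through one allocate and free pair trades a small amount of per-allocation overhead for not having to remember which path produced a given pointer.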

Build System

CMakeLists.txt — Standard and Warning Configuration

The CMake configuration was updated to explicitly enforce C++11 (CMAKE_CXX_STANDARD 11, CMAKE_CXX_STANDARD_REQUIRED ON, CMAKE_CXX_EXTENSIONS OFF), so that configuration fails rather than silently falling back to an older standard, and compiler-specific language extensions are disabled. Compiler warning flags (-Wall -Wextra -Wpedantic -Wshadow -Wnon-virtual-dtor -Wcast-align -Woverloaded-virtual -Wnull-dereference) were added for GCC and Clang targets. Release builds enable -O3 -march=native; Debug builds enable AddressSanitizer and UndefinedBehaviorSanitizer.

Observed Impact

Original Codebase — Runtime Crash and Degenerate Benchmark Output

Executing the original binary produced an access violation before the benchmark suite could complete. The crash manifested in StackLinkedList::pop() at the following site in StackLinkedListImpl.h:

head = head->next;  // head was nullptr — access violation

This fault is a direct consequence of a defect in PoolAllocator::Reset: the original implementation zeroed the usage counters but never iterated the arena to build the free list, leaving m_freeList.head as NULL. Since Init delegated population to Reset, calling Allocate immediately after Init invoked pop on an empty list, dereferencing a null pointer. The process terminated with an unhandled exception before reaching the FreeListAllocator benchmarks at all.
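
What a Reset that populates the free list looks like can be sketched as follows. PoolResetSketch and ChunkNode are hypothetical, the arena is a std::vector rather than a malloc'd block, and the empty-list check in Allocate is an addition for safety. The point is only that every chunk is pushed onto the list before the first pop.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical intrusive node written into the first bytes of each free chunk.
struct ChunkNode {
    ChunkNode* next;
};

class PoolResetSketch {
public:
    PoolResetSketch(std::size_t chunkSize, std::size_t chunkCount)
        : m_chunkSize(chunkSize), m_chunkCount(chunkCount),
          m_arena(chunkSize * chunkCount), m_head(nullptr) {
        // Each chunk must be able to hold a suitably aligned ChunkNode.
        assert(chunkSize >= sizeof(ChunkNode) && chunkSize % alignof(ChunkNode) == 0);
        Reset();
    }

    // Rebuild the free list so that it contains every chunk of the arena.
    void Reset() {
        m_head = nullptr;
        for (std::size_t i = m_chunkCount; i-- > 0;) {
            void* address = m_arena.data() + i * m_chunkSize;
            m_head = new (address) ChunkNode{m_head};  // push chunk i onto the list
        }
    }

    // Pop one chunk, or return nullptr when the pool is exhausted.
    void* Allocate() {
        if (m_head == nullptr) {
            return nullptr;
        }
        ChunkNode* node = m_head;
        m_head = node->next;
        return node;
    }

private:
    std::size_t m_chunkSize;
    std::size_t m_chunkCount;
    std::vector<unsigned char> m_arena;
    ChunkNode* m_head;
};
```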

Beyond the crash, the measurements that did complete were rendered meaningless by the millisecond timer resolution. Every allocator reported 0 ms elapsed and inf ops/ms throughput, providing no actionable data whatsoever:

BENCHMARK: ALLOCATION
    Size:         32
    Alignment     8
    RESULTS:
        Operations:     10
        Time elapsed:   0 ms
        Op per sec:     inf ops/ms
        Timer per op:   0 ms/ops
        Memory peak:    0 bytes

The operation count of 10 and the hardcoded array of the same size further ensured that no meaningful stress was applied to any allocator.

Corrected Codebase — Stable Execution and Meaningful Measurements

After applying all corrections, the benchmark suite runs to completion without errors across all allocators and all operation modes. Timing is reported in nanoseconds, the operation count is 1000, and peak memory usage is tracked correctly throughout.

Representative results from the corrected build:

LinearAllocator — O(1) bump allocation, consistent ~3 ns per operation:

BENCHMARK: ALLOCATION
    Size:         32
    Alignment     8
    RESULTS:
        Operations:     1000
        Time elapsed:   0.003322 ms
        Time per op:    3.322 ns/op
        Memory peak:    32000 bytes

StackAllocator — O(1) LIFO allocation and deallocation, ~5–7 ns per operation:

BENCHMARK: ALLOCATION/FREE
    Size:         4096
    Alignment     8
    RESULTS:
        Operations:     1000
        Time elapsed:   0.007669 ms
        Time per op:    7.669 ns/op
        Memory peak:    4096000 bytes

FreeListAllocator — O(N) search, cost scales with allocation size as fragmentation grows:

BENCHMARK: ALLOCATION/FREE
    Size:         4096
    Alignment     8
    RESULTS:
        Operations:     1000
        Time elapsed:   1.208970 ms
        Time per op:    1208.97 ns/op
        Memory peak:    4112000 bytes

The corrected results expose a clear and expected performance hierarchy: the linear allocator is fastest (no bookkeeping), the stack allocator adds only marker tracking overhead, and the free list allocator pays the cost of list traversal. This distinction was entirely invisible in the original output due to the combination of degenerate timer resolution, insufficient operation count, and the runtime crash that prevented full execution.
