Bug Fixes and Performance Improvements#34
Open
ocalasans wants to merge 2 commits into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bug Fixes
FreeListAllocator::FindBest— IncorrectpreviousNodeTrackingThe original implementation maintained a single
previousNodevariable that was advanced unconditionally on every iteration. Upon loop termination, this variable held the predecessor of the last visited node, not the predecessor of the node selected asbestBlock. Any subsequent call toCoalescencewould therefore operate on a structurally incorrect position in the free list, producing silent heap corruption on deallocation.Warning
This defect does not produce an immediate crash. Heap corruption is deferred to the next deallocation cycle, making the root cause difficult to trace.
The fix introduces a dedicated
bestPrevvariable that is only updated when a new best candidate is found, ensuring thatpreviousNodealways refers to the correct predecessor offoundNodeat the point of return.FreeListAllocator::Free— Silent Memory Leak on Tail InsertionWhen the pointer being freed resided at a higher address than all existing nodes in the free list, the traversal loop would exhaust without ever invoking
m_freeList.insert. The freed block was permanently lost, producing an undetectable memory leak that accumulated silently across deallocations.Warning
Because the leak occurs at the structural level of the free list — not at the OS allocation level — no external memory profiler will surface it. The arena appears consumed even after all user-level frees have been issued.
The fix restructures the insertion logic so that the positional search and the insertion are decoupled, guaranteeing that
insertis always called regardless of the freed block's address relative to the list.PoolAllocator::Init— Memory Leak on ReinitializationA second invocation of
Initwould unconditionally callmallocwithout first releasing the previously allocated arena, permanently losing the reference to the previous allocation. The fix adds aNULLcheck at the entry ofInitthat releases any existing arena prior to allocating a new one, ensuring that reinitialization does not silently leak memory.PoolAllocator.h— Missing Include GuardPoolAllocator.hwas the only header in the project lacking an#ifndef/#define/#endifinclude guard.Caution
Any translation unit that included this header more than once — directly or transitively — would produce a redefinition error at compile time. This is a deterministic failure, not a latent defect.
The fix adds the standard include guard, consistent with every other header in the codebase.
Utils::CalculatePadding— Incorrect Result on Already-Aligned AddressesThe original formula always produced a non-zero result, even when
baseAddresswas already aligned to the requested boundary. This caused every allocator that calledCalculatePaddingto insert unnecessary padding bytes on aligned addresses, inflating memory consumption and producing incorrect offset arithmetic. The formula was replaced with the standard power-of-two alignment expression:This expression returns zero when
baseAddressis already aligned, and computes the correct minimal padding otherwise. It also eliminates the division and modulo operations present in the original, reducing the operation to two additions and a bitwise AND.DoublyLinkedList::insert— Missing Back-Link on Last-Node InsertionWhen inserting a node at the tail of the list (
previousNode->next == nullptr), the original implementation linkedpreviousNode->nexttonewNodebut never setnewNode->previousto point back topreviousNode. The newly appended node'spreviouspointer was left uninitialised.Warning
Any backward traversal originating from the tail — or any operation relying on
previouslinkage — constitutes undefined behavior on the affected node.The fix unconditionally assigns
newNode->previousin all insertion paths.Benchmark::SingleFree— Fixed-Size VLA Ignoringm_nOperationsThe address array used to store allocation results was declared as a C-style array with a hardcoded size of 10 (
void* addresses[OPERATIONS], where the macroOPERATIONSwas set to 10 as an acknowledged workaround in the original source).Caution
When
m_nOperationsexceeded 10, the loop wrote beyond the array bounds. This was a known issue explicitly commented in the original source, left unresolved.The fix replaces the fixed-size array with a
std::vector<void*>sized at construction time fromm_nOperations.main.cpp— Non-Zero Exit Code on Successful ExecutionThe
mainfunction unconditionally returned1, signalling failure to the operating system and any invoking shell or build system, regardless of whether execution had completed successfully. The fix changes the return value to0.Performance Improvements
Benchmark— Timer Precision Upgraded to NanosecondsThe original benchmark reported elapsed time in milliseconds using
std::chrono::milliseconds. For allocators with O(1) complexity — particularlyLinearAllocatorandStackAllocator, which typically execute in the range of 3–10 nanoseconds per operation — millisecond resolution produced zero for all measurements, rendering the benchmark meaningless for comparative analysis. The fix replaces the duration type withstd::chrono::nanosecondsand adjusts all output formatting accordingly.Benchmark— Replacement ofrandwithstd::mt19937The original implementation seeded the C standard library PRNG with
srand(1)and sampled viarand() % n. This approach exhibits well-documented statistical deficiencies: modulo reduction introduces bias when the range does not evenly divideRAND_MAX, and the underlying LCG generator produces sequences with poor uniformity in the low-order bits. The fix substitutesstd::mt19937seeded at construction with a fixed value, paired withstd::uniform_int_distribution, providing uniform distribution and reproducible sequences without bias.Dead Code Removal
StackAllocator::Push— Unreachable Public MethodPushwas declared in the public interface and implemented in the corresponding translation unit, but had no call sites anywhere in the project. Furthermore, its semantics were already subsumed byAllocate, which unconditionally appends the current offset tom_markerson every allocation.Warning
An external caller invoking
Pushwould insert a duplicate marker intom_markers, corrupting the LIFO invariant on all subsequent calls toFreeorPop.The method has been removed from both the declaration and the implementation.
Allocator.cpp— Empty Translation UnitAllocator.cppcontained a single preprocessor directive (#include "Allocator.h") and contributed no compiled definitions to the build. All members ofAllocatorare either pure virtual, inline, or defined in the header. The file was removed and the corresponding entry was deleted fromCMakeLists.txt.Improvement:
CAllocator— Alignment Parameter Now HonouredMotivation
The original implementation of
CAllocator::Allocatediscarded thealignmentparameter entirely and unconditionally delegated tomalloc:While
mallocguarantees alignment suitable for any fundamental type, it provides no guarantee for over-aligned types (e.g. SIMD vector types requiring 16- or 32-byte boundaries). Silently ignoring thealignmentargument violated the contract established by theAllocatorbase class interface, makingCAllocatorsemantically incorrect whenever a non-trivial alignment was requested.Design Decision
The updated implementation dispatches to a platform-appropriate aligned allocation function when
alignment > 1, and falls back tomallocotherwise. Platform selection is encapsulated behind two macros:Note
On Windows, memory allocated with
_aligned_mallocmust be released with_aligned_free— passing it tofreeis undefined behavior. For this reason, a boolean memberm_lastWasAlignedwas introduced to record which allocation path was taken, allowingFreeto dispatch to the correct deallocation function.This approach preserves the single-allocation-path abstraction expected of
CAllocatorwhile correctly fulfilling the alignment contract on both Windows and POSIX targets.Build System
CMakeLists.txt— Standard and Warning ConfigurationThe CMake configuration was updated to explicitly enforce C++11 (
CMAKE_CXX_STANDARD 11,CMAKE_CXX_STANDARD_REQUIRED ON,CMAKE_CXX_EXTENSIONS OFF), preventing silent fallback to implementation-defined language extensions. Compiler warning flags (-Wall -Wextra -Wpedantic -Wshadow -Wnon-virtual-dtor -Wcast-align -Woverloaded-virtual -Wnull-dereference) were added for GCC and Clang targets. Release builds enable-O3 -march=native; Debug builds enable AddressSanitizer and UndefinedBehaviorSanitizer.Observed Impact
Original Codebase — Runtime Crash and Degenerate Benchmark Output
Executing the original binary produced an access violation before the benchmark suite could complete. The crash manifested in
StackLinkedList::pop()at the following site inStackLinkedListImpl.h:head = head->next; // head was nullptr — access violationThis fault is a direct consequence of a defect in
PoolAllocator::Reset: the original implementation zeroed the usage counters but never iterated the arena to build the free list, leavingm_freeList.headasNULL. SinceInitdelegated population toReset, callingAllocateimmediately afterInitinvokedpopon an empty list, dereferencing a null pointer. The process terminated with an unhandled exception before reaching theFreeListAllocatorbenchmarks entirely.Beyond the crash, the measurements that did complete were rendered meaningless by the millisecond timer resolution. Every allocator reported
0 mselapsed andinf ops/msthroughput, providing no actionable data whatsoever:The operation count of 10 and the hardcoded array of the same size further ensured that no meaningful stress was applied to any allocator.
Corrected Codebase — Stable Execution and Meaningful Measurements
After applying all corrections, the benchmark suite runs to completion without errors across all allocators and all operation modes. Timing is reported in nanoseconds, the operation count is 1000, and peak memory usage is tracked correctly throughout.
Representative results from the corrected build:
LinearAllocator— O(1) bump allocation, consistent ~3 ns per operation:StackAllocator— O(1) LIFO allocation and deallocation, ~5–7 ns per operation:FreeListAllocator— O(N) search, cost scales with allocation size as fragmentation grows:The corrected results expose a clear and expected performance hierarchy: the linear allocator is fastest (no bookkeeping), the stack allocator adds only marker tracking overhead, and the free list allocator pays the cost of list traversal. This distinction was entirely invisible in the original output due to the combination of degenerate timer resolution, insufficient operation count, and the runtime crash that prevented full execution.