C++ game engine built to explore high-performance architecture.
Currently under active development, it serves as both a learning platform and a research project.
Or it might just be a playground to test my sanity.
Important: My original Bachelor's Thesis version is archived in the thesis branch.
Honestly? I just really love this stuff.
It started with my Bachelor's Thesis, where I designed a dual-renderer engine to benchmark Vulkan path tracing against traditional OpenGL PBR. The focus was purely on real-time graphics, so the underlying architecture was single-threaded. It worked, and I had a blast building it!
Then I watched Christian Gyrling’s GDC talk on Parallelizing the Naughty Dog Engine Using Fibers. Seeing how they saturated every single CPU core made me realize how much was left to explore.
So, I started Luth from scratch to explore high-performance architecture: fiber-based job systems, lock-free memory models, and bindless Vulkan rendering. It is absolutely over-engineered for a solo project, but that’s the point.
Prerequisites:
- OS: Windows 10 / 11
- Compiler: MSVC (v143+) or Clang (C++20-compliant)
- SDK: Vulkan SDK 1.3+. Needs dynamicRendering, timelineSemaphore, and descriptor indexing with UBO update-after-bind (any GPU 2018+)
Steps:
- Clone with submodules
git clone --recursive https://github.com/Hekbas/Luth.git
- Generate the VS solution
scripts/setup/setup_windows.bat
- Build: either open Luth.sln in Visual Studio 2022, or run the headless script: scripts/build/build_windows.bat
The editor binary lands at bin/windows-x86_64/Debug/Runtime/Luthien.exe.
Instead of dedicated OS threads per task ("Render Thread", "Audio Thread"), Luth treats the CPU as a generic worker pool.
- N:M Threading: One Worker Thread per CPU core. Logical tasks are wrapped in Fibers (lightweight user-mode stacks) that migrate freely between workers.
- Zero Blocking: When a job waits on a dependency (or the GPU), it yields to the scheduler, which swaps in another fiber. CPU saturation stays near 100%.
- Synchronization: SpinLocks (test-and-set + _mm_pause()) and Atomic Counters keep critical sections short and never block in the OS.
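A minimal sketch of the two synchronization primitives named above, assuming a plain `std::atomic<bool>` for the test-and-set flag. `SpinLock`, `JobCounter`, `CpuRelax`, and `RunCounterDemo` are illustrative names, not Luth's actual classes, and the demo uses `std::thread` instead of fibers to stay self-contained.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

#if defined(_M_X64) || defined(_M_IX86) || defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
inline void CpuRelax() { _mm_pause(); }  // tells the core we are in a spin loop
#else
inline void CpuRelax() { std::this_thread::yield(); }  // fallback for non-x86 targets
#endif

// Test-and-set spin lock: exchange() returns the previous value, so the loop
// exits only for the thread that flipped the flag from false to true.
class SpinLock {
public:
    void lock() {
        while (locked_.exchange(true, std::memory_order_acquire)) {
            CpuRelax();
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Atomic counter a waiting job could poll: it starts at the number of
// outstanding child jobs and reaches zero once all of them have finished.
struct JobCounter {
    std::atomic<int> remaining{0};
    bool Done() const { return remaining.load(std::memory_order_acquire) == 0; }
};

// Hypothetical demo: `workers` threads each add `iterations` to a shared total
// under the spin lock, then decrement the counter once.
inline long RunCounterDemo(int workers, int iterations) {
    SpinLock lock;
    JobCounter counter;
    counter.remaining.store(workers);
    long total = 0;
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++total;  // critical section kept as short as possible
                lock.unlock();
            }
            counter.remaining.fetch_sub(1, std::memory_order_release);
        });
    }
    for (auto& t : pool) t.join();
    assert(counter.Done());
    return total;
}
```

In a fiber scheduler, a job waiting on `JobCounter` would yield back to the scheduler rather than spin; the spin lock is only meant for critical sections a few instructions long.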
Three stages overlap. At any frame T, the engine is processing three frames at once:
time ──►
┌──────────┬──────────┬──────────┬──────────┐
CPU game │ N │ N+1 │ N+2 │ N+3 │
├──────────┼──────────┼──────────┼──────────┤
CPU render │ N-1 │ N │ N+1 │ N+2 │
├──────────┼──────────┼──────────┼──────────┤
GPU exec │ N-2 │ N-1 │ N │ N+1 │
└──────────┴──────────┴──────────┴──────────┘
- Game (N): Transform / animation updates, then captures a RenderSnapshot POD into the frame's LogicMemory arena, the immutable handoff to the next stage.
- Render (N-1): Reads frame N-1's snapshot, builds the render graph, dispatches per-pass secondary command buffer recording in parallel, submits.
- GPU (N-2): Executes the commands submitted previously.
Game and render run concurrently on worker fibers from frame 2 onward (frames 0/1 are a sync warm-up against the current frame). The frame boundary is the snapshot, not shared mutable state — Game writes to one FrameContext slot, Render reads from another. Stage-isolated subsystems that retain mutexes (MaterialSystem, BoneMatrixBuffer) assert they're only mutated from the game stage.
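The slot handoff can be sketched as a ring of three frame contexts indexed by frame number. This is a single-threaded illustration under assumptions: the struct layouts, `kFramesInFlight`, `SlotFor`, and `SimulatePipeline` are hypothetical, and only the names `RenderSnapshot` and `FrameContext` come from the text above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the engine's types.
struct RenderSnapshot { std::uint64_t frame = 0; };
struct FrameContext   { RenderSnapshot snapshot; };

constexpr std::size_t kFramesInFlight = 3;  // game, render, GPU

// Maps a frame number onto one of the three FrameContext slots.
constexpr std::size_t SlotFor(std::uint64_t frame) {
    return static_cast<std::size_t>(frame % kFramesInFlight);
}

// Steps the pipeline single-threaded and returns true if, on every tick, the
// three stages touch three different slots and the render stage sees exactly
// the snapshot the game stage wrote one tick earlier.
inline bool SimulatePipeline(std::uint64_t frameCount) {
    std::array<FrameContext, kFramesInFlight> slots{};
    for (std::uint64_t n = 0; n < frameCount; ++n) {
        // Game stage: write frame n's snapshot into its own slot.
        slots[SlotFor(n)].snapshot.frame = n;

        if (n >= 1) {
            // Render stage: read frame n-1's snapshot from a different slot.
            if (SlotFor(n - 1) == SlotFor(n)) return false;
            if (slots[SlotFor(n - 1)].snapshot.frame != n - 1) return false;
        }
        if (n >= 2) {
            // GPU stage: frame n-2 lives in the remaining slot.
            if (SlotFor(n - 2) == SlotFor(n)) return false;
            if (SlotFor(n - 2) == SlotFor(n - 1)) return false;
        }
    }
    return true;
}
```

With three slots, the slot that game writes for frame N is the one the GPU finished with at frame N-3, which is why no stage needs a lock on the snapshot itself.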
new / delete are forbidden in the hot path. Two allocators handle everything that churns:
Page Pool (2 MB virtual pages)
├── TaggedPageAllocator — CPU side, tagged lifetime, bulk free
│ └── per-thread cache — lock-free hot-path allocations
├── GPUTaggedPageAllocator — host-mapped device pages, freed when GPU N-2 retires
│ └── per-frame UBO/SSBO regions, descriptors rebind via UPDATE_AFTER_BIND
└── LinearAllocator — per-frame, reset on Begin()
- Tagged Page Allocator: Naughty Dog-style. Allocations carry a tag (LevelGeometry, Frame_N, …) and are freed in bulk by tag.
- GPU Tagged Page Allocator: sibling of the CPU side. Vends 2 MB pages from host-mapped device backings; bulk-freed when the GPU N-2 timeline value retires.
- Linear Allocator: bump-allocates transient frame data (command lists, UI state); resets each frame, no per-object destructors.
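As a rough illustration of the bump and bulk-free-by-tag ideas above, here is a sketch under simplifying assumptions: both classes are backed by ordinary heap allocations made up front instead of a shared page pool, there is no per-thread cache, and the method names (`Allocate`, `Reset`, `Used`, `AllocatePage`, `FreeTag`) are illustrative, not Luth's real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Bump allocator over one fixed buffer: Allocate() advances an offset and
// Reset() rewinds it. No destructors run, so it only suits trivially
// destructible per-frame data. Alignment is computed relative to the buffer
// start, which is itself aligned for any fundamental type.
class LinearAllocator {
public:
    explicit LinearAllocator(std::size_t capacity)
        : buffer_(new std::byte[capacity]), capacity_(capacity) {}

    void* Allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
        const std::size_t aligned = (offset_ + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) return nullptr;  // out of space
        offset_ = aligned + size;
        return buffer_.get() + aligned;
    }

    void Reset() { offset_ = 0; }
    std::size_t Used() const { return offset_; }

private:
    std::unique_ptr<std::byte[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;
};

// Tag-keyed page owner: every page is recorded under a tag, and all pages of
// a tag are released together.
class TaggedPageAllocator {
public:
    static constexpr std::size_t kPageSize = 2u * 1024u * 1024u;  // 2 MB pages

    void* AllocatePage(std::uint32_t tag) {
        auto& pages = pages_[tag];
        pages.push_back(std::make_unique<std::byte[]>(kPageSize));
        return pages.back().get();
    }

    // Releases every page carrying `tag`; returns how many were released.
    std::size_t FreeTag(std::uint32_t tag) {
        auto it = pages_.find(tag);
        if (it == pages_.end()) return 0;
        const std::size_t count = it->second.size();
        pages_.erase(it);
        return count;
    }

private:
    std::unordered_map<std::uint32_t, std::vector<std::unique_ptr<std::byte[]>>> pages_;
};
```

A frame-lifetime tag would be freed once the GPU has retired that frame, while a tag like `LevelGeometry` would live until the level unloads.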
Persistent SSBOs (Material Set 2, Light Set 3, Object Set 5) are triple-buffered so frame N writes never overlap frame N-1 GPU reads.
Modern hardware, minimal driver overhead.
- Bindless Descriptors: VK_EXT_descriptor_indexing binds all engine textures to one global array (Set 0). Materials store an integer index, so any draw call can sample any texture without rebinding.
- Dynamic Rendering: No VkRenderPass / VkFramebuffer; passes use vkCmdBeginRendering directly.
- Timeline Semaphores: Replace vkWaitForFences. A dedicated Poller Job queries semaphore values and wakes dependent fibers only when the GPU finishes their workload.
- Update-After-Bind: Per-frame UBO/SSBO descriptor sets are rewritten each frame as their backing GPU pages cycle, eliminating CPU-GPU sync on those bindings.
- VMA: Vulkan Memory Allocator handles all device-memory placement (buffers, images, staging).
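The poller idea from the Timeline Semaphores bullet can be sketched without any Vulkan dependency. Everything here is an assumption for illustration: `TimelinePoller`, `WaitFor`, and `Poll` are hypothetical names, the query callback stands in for whatever reads the semaphore (in a Vulkan backend it would presumably wrap `vkGetSemaphoreCounterValue`), and the wake callback stands in for rescheduling a fiber. The sketch is single-threaded; a real poller would need to guard the waiter list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Waiters register the timeline value they need; Poll() wakes every waiter
// whose value the GPU has already reached.
class TimelinePoller {
public:
    using QueryFn = std::function<std::uint64_t()>;  // returns the completed timeline value
    using WakeFn  = std::function<void()>;           // resumes the dependent work

    explicit TimelinePoller(QueryFn query) : query_(std::move(query)) {}

    void WaitFor(std::uint64_t value, WakeFn wake) {
        waiters_.push_back({value, std::move(wake)});
    }

    // Queries the timeline once and returns the number of waiters woken.
    std::size_t Poll() {
        const std::uint64_t completed = query_();
        std::size_t woken = 0;
        for (std::size_t i = 0; i < waiters_.size();) {
            if (waiters_[i].value <= completed) {
                WakeFn wake = std::move(waiters_[i].wake);
                // Swap-remove: order of waiters does not matter.
                if (i + 1 != waiters_.size()) {
                    waiters_[i] = std::move(waiters_.back());
                }
                waiters_.pop_back();
                wake();
                ++woken;
            } else {
                ++i;
            }
        }
        return woken;
    }

private:
    struct Waiter {
        std::uint64_t value;
        WakeFn wake;
    };

    QueryFn query_;
    std::vector<Waiter> waiters_;
};
```

Because the timeline value only increases, a waiter can never be missed: any later poll still sees a value at or above its target.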
Each frame, Luth builds a DAG of render passes. Passes declare reads and writes through a RenderPassBuilder; the graph solves pipeline barriers, culls unused passes, and computes resource lifetimes automatically.
graph.AddPass<GeometryPassData>("GeometryPass",
[&](GeometryPassData& data, RG::RenderPassBuilder& builder) {
data.depthTex = builder.WriteDepth(sceneDepth, ...);
data.outputTex = builder.Write(sceneColor);
data.indirect = builder.ReadIndirectBuffer(indirectBuffer);
},
[=](GeometryPassData& data, RG::RenderPassContext& ctx) {
// record draw commands on ctx.commandBuffer
});

Passes execute in topological order; command-buffer recording inside each pass parallelizes across worker threads.
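To make the culling and ordering steps concrete, here is a simplified sketch, not the engine's render graph. It assumes resources are plain integer ids, that the graph is acyclic, and it leaves out barriers and lifetimes entirely; `PassDesc` and `CompileGraph` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified pass description: resources are integer ids.
struct PassDesc {
    std::string name;
    std::vector<int> reads;
    std::vector<int> writes;
};

// Returns the names of the passes that contribute to `finalResource`, ordered
// so that every producer precedes its consumers. Assumes the graph has no cycles.
inline std::vector<std::string> CompileGraph(const std::vector<PassDesc>& passes,
                                             int finalResource) {
    const std::size_t n = passes.size();
    std::vector<bool> alive(n, false);
    std::vector<int> needed{finalResource};

    // Cull: walk backwards from the final resource, keeping every pass that
    // writes something a kept pass (or the final output) reads.
    while (!needed.empty()) {
        const int res = needed.back();
        needed.pop_back();
        for (std::size_t i = 0; i < n; ++i) {
            if (alive[i]) continue;
            bool writesRes = false;
            for (int w : passes[i].writes) writesRes = writesRes || (w == res);
            if (!writesRes) continue;
            alive[i] = true;
            for (int r : passes[i].reads) needed.push_back(r);
        }
    }

    // True when `reader` consumes a resource that `writer` produces.
    auto dependsOn = [&](std::size_t reader, std::size_t writer) {
        for (int r : passes[reader].reads)
            for (int w : passes[writer].writes)
                if (r == w) return true;
        return false;
    };

    // Order: Kahn's algorithm over the surviving passes.
    std::vector<int> pending(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (i != j && alive[i] && alive[j] && dependsOn(i, j)) ++pending[i];

    std::vector<std::string> order;
    std::vector<bool> emitted(n, false);
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < n; ++i) {
            if (!alive[i] || emitted[i] || pending[i] != 0) continue;
            emitted[i] = true;
            order.push_back(passes[i].name);
            progress = true;
            for (std::size_t j = 0; j < n; ++j)
                if (j != i && alive[j] && !emitted[j] && dependsOn(j, i)) --pending[j];
        }
    }
    return order;
}
```

A pass whose output nothing downstream reads never becomes alive, so it is dropped before any command buffer would be recorded for it.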
| PBR | Cook-Torrance BRDF, metallic/roughness, render-mode variants (Opaque/Cutout/Transparent) |
| Lighting | 1 directional + up to 64 point lights, ECS-driven |
| Shadows | 4-cascade PSSM, per-cascade GPU cull, PCF, cascade blending |
| Ambient Occlusion | GTAO half-res compute (prefilter → integrate → bilateral denoise) |
| GPU Culling | Compute frustum cull per cascade + main scene, indirect draws everywhere |
| IBL | HDR skybox, diffuse irradiance + pre-filtered specular + BRDF LUT, split-sum ambient |
| Post-Processing | HDR pipeline, bloom, 4 tonemap operators, vignette, grain, chromatic aberration |
| Shaders | Single-stage SPIR-V asset pipeline with UUIDs, hot-reload, SPIRV-Cross reflection |
| Pipeline Cache | Disk-persisted, lazy variant creation, targeted hot-reload invalidation |
| Mipmaps | Per-texture pipeline with sampler maxLod control |
| Sampling | Fiber-parallel keyframe evaluation |
| GPU Skinning | Bone matrix SSBO, vertex shader skinning |
| Blending | SQT interpolation, crossfade transitions, layered override with bone masks |
| Root Motion | Automatic extraction and application to entity transform |
| Debug | Bone overlay visualization in editor viewport |
| Backend | Jolt Physics 5.5.0, jobified onto the fiber scheduler |
| Rigid Bodies | Static / Kinematic / Dynamic with CCD, primitive + ConvexHull + Mesh shapes |
| Materials | UUID-keyed friction / restitution / density with hot-reload |
| Character Controller | Kinematic capsule via JPH::CharacterVirtual, default stair + stick-to-floor |
| Queries | Raycast + Overlap (box / sphere / capsule), layer-mask filtered |
| Events | Contact + trigger Add / Remove, drained per frame |
| Debug Draw | Wire colliders colored by motion state or character ground state |
| Asset Database | UUID-based registry with .meta sidecars, importers for shaders/textures/models/materials/animations |
| Smart Import | Multi-strategy texture discovery, drag-and-drop with eager import, texture remap dialog |
| Hot Reload | FileWatcher-based live reload for shaders, textures, and project files |
| Scene Format | Custom JSON .luth format with dirty tracking and native file dialogs |
| Scene Interaction | Mouse picking (ID buffer), selection outlines with occluded fade, shade modes (Lit/Wireframe/Unlit) |
| Inspector | Material editor, animation controls, light/shadow settings, Add Component workflow |
| Inspector Preview | Live orbit-camera 3D preview for Material/Model assets |
| Play Mode | Editing/Playing/Paused state machine, JSON scene snapshot, animation gating, transport bar |
| Game Panel | Dedicated camera-driven runtime view with letterbox, no overlays |
| Project Panel | Folder navigation, search, hot reload, context menus for entity/primitive creation |
| Thumbnails | Rendered previews for textures/meshes/materials in Project panel |
| Undo / Redo | Command pattern with UUID-based entity resolution, gizmo drag coalescing, compound commands, material snapshot undo |
| Frame Debugger | Freeze a frame, scrub through every draw, replay any single one to see what it did |
| Profiler | Per-system timing breakdown with fiber-aware instrumentation |
| Persistence | Window layouts, editor settings, and panel state saved across sessions |
See the full development roadmap for completed phases and version history.
Rendering — Forward+ clustered lighting, FXAA/TAA, deferred GBuffer, global illumination, volumetric fog, SSR
Gameplay — Scripting (C#/Lua), prefab system, ragdoll, GPU particle system, animation blend trees & IK
Editor — Asset streaming, visual shader editor
LUTH Engine is built on the shoulders of giants:
| Vulkan SDK | Rendering backend |
| VMA | Vulkan memory allocator |
| shaderc | Runtime GLSL → SPIR-V compilation (ships with Vulkan SDK) |
| SPIRV-Cross | Shader reflection |
| EnTT | Entity-Component-System |
| ImGui | Editor GUI |
| ImGuizmo | Translate / rotate / scale gizmos |
| Tracy | Frame profiler |
| GLFW | Windowing + input |
| GLM | Math |
| spdlog | Logging |
| assimp | Model importing |
| stb_image | Image loading |
| nlohmann/json | JSON serialization |
| Jolt Physics | Rigid body physics, jobified onto the fiber scheduler |
Released under the MIT License.

