
Add Vulkan rendering backend #7233

Closed
laanwj wants to merge 41 commits into scp-fs2open:master from laanwj:vulkan-pr


@laanwj

@laanwj laanwj commented Feb 16, 2026

Contributor

Implement a Vulkan 1.1 renderer that replaces the previous stub with a fully functional backend, mostly matching the OpenGL backend's rendering capabilities. The game should be playable with minimal divergence from OpenGL rendering.

This is, most likely, too big to go in all at once, but just filing it here for reference because it's reached a testable state.

Core rendering infrastructure. The code lives under code/graphics/vulkan:

  • VulkanMemory: Custom allocator with sub-allocation from device-local and host-visible memory pools
  • VulkanBuffer: Per-frame bump allocator for streaming uniform/vertex/index data (persistently mapped, double-buffered, auto-growing)
  • VulkanTexture: Full texture management including 2D, 2D-array, 3D, and cubemap types with automatic mipmap generation and sampler caching
  • VulkanPipeline: Lazy pipeline creation from hashed render state, with persistent VkPipelineCache
  • VulkanShader: SPIR-V shader loading (main, deferred, effects, post-processing, shadows, decals, fog, MSAA resolve, etc.)
  • VulkanDescriptorManager: 3-set descriptor layout (Global/Material/PerDraw) with per-frame pool allocation, auto-grow, and batched updates
  • VulkanDeletionQueue: Deferred resource destruction synchronized to frame-in-flight fences
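
For readers unfamiliar with the deferred-destruction idea behind VulkanDeletionQueue, here is a minimal sketch. It is not the PR's code: the class name `DeletionQueue`, its methods, and the use of plain callbacks instead of Vulkan handles are all assumptions, chosen so the example builds without the Vulkan headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a deferred deletion queue (not the PR's actual class).
class DeletionQueue {
public:
    explicit DeletionQueue(std::uint64_t framesInFlight) : m_framesInFlight(framesInFlight)
    {
        assert(framesInFlight > 0);
    }

    // Record a destroy callback, tagged with the last frame that may still use the resource.
    void enqueue(std::uint64_t frame, std::function<void()> destroy)
    {
        m_pending.push_back(Entry{frame, std::move(destroy)});
    }

    // Call at the start of frame `currentFrame`, after waiting on that frame slot's fence.
    // With N frames in flight, that wait implies frame (currentFrame - N) has finished on the GPU.
    // Returns the number of resources destroyed.
    std::size_t flush(std::uint64_t currentFrame)
    {
        std::size_t destroyed = 0;
        std::vector<Entry> stillPending;
        for (auto& e : m_pending) {
            if (e.frame + m_framesInFlight <= currentFrame) {
                e.destroy();
                ++destroyed;
            } else {
                stillPending.push_back(std::move(e));
            }
        }
        m_pending.swap(stillPending);
        return destroyed;
    }

private:
    struct Entry {
        std::uint64_t frame;
        std::function<void()> destroy;
    };
    std::uint64_t m_framesInFlight;
    std::vector<Entry> m_pending;
};
```

With two frames in flight, a resource released during frame 0 survives the flush at frame 1 and is destroyed by the flush at frame 2, so nothing blocks on the GPU in the hot path.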

Design choices:

  • Two frames in flight with fence-based synchronization
  • Asynchronous texture upload, no waitIdle or other CPU-on-GPU blocking in hot path
  • Single command buffer per frame; render passes begun/ended as needed for the multi-pass deferred pipeline
  • Per-frame descriptor pools
  • All descriptor bindings pre-initialized with fallback resources (zero UBO + 1x1 white texture) so partial updates never leave undefined state
  • Streaming data (such as immediates) uses a bump allocator (one large VkBuffer per frame)
  • Pipeline cache persisted to disk for fast startup on subsequent runs

Some notable Vulkan vs OpenGL differences are:

  • Because shaders are pre-compiled to SPIR-V, shader variants are less feasible in Vulkan. Preprocessing directives have been converted to run-time uniform based branching.
  • Depth range is [0,1] not [-1,1]: shadow projection matrices adjusted, shaders that linearize depth need isinf/zero guards at depth boundaries where OpenGL gives finite values
  • Vulkan render target is "upside down", y-flip for render target is handled through negative viewport height, as is common
  • gl_ClipDistance is always evaluated: must write 1.0 when clipping is disabled (OpenGL allows leaving it uninitialized)
  • Texture addressing for AABITMAP/INTERFACE/CUBEMAP forced to clamp (OpenGL's sampler state happens to do this implicitly)
  • Render pass architecture requires explicit transitions between G-buffer, shadow, decal, light accumulation, fog, and post-processing passes (OpenGL just switches FBO bindings)
  • No geometry shaders. They're possible with Vulkan, but less common. Currently they're not used.
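
The negative-viewport-height flip from the list above can be sketched as follows. Negative heights are allowed by VK_KHR_maintenance1, which is core in Vulkan 1.1. The `Viewport` struct mirrors the fields of `VkViewport` but is redeclared here so the sketch builds without the Vulkan headers, and `makeFlippedViewport` is a hypothetical helper name, not something taken from the PR.

```cpp
#include <cassert>

// Field-for-field mirror of VkViewport, redeclared so the sketch builds without Vulkan headers.
struct Viewport {
    float x;
    float y;
    float width;
    float height;
    float minDepth;
    float maxDepth;
};

// Build a viewport that flips Y: the origin moves to the bottom edge and the height is negated.
// Depth stays in Vulkan's [0,1] range.
inline Viewport makeFlippedViewport(float fbWidth, float fbHeight)
{
    assert(fbWidth > 0.0f && fbHeight > 0.0f);
    return Viewport{0.0f, fbHeight, fbWidth, -fbHeight, 0.0f, 1.0f};
}
```

For an 800x600 target this gives `y = 600` and `height = -600`, which makes clip-space +Y point up as it does in OpenGL.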

Preparation patches to common game code (these commits need to go in first):

  • Extract sphere and cylinder mesh generation into shared graphics utility: Needed in both GL and Vulkan
  • Route ImGui calls through gr_screen function pointers: Makes it possible for the Vulkan backend to provide its own ImGui implementation
  • Free bitmaps before destroying graphics backend: Fix shutdown order issue
  • Use float shader input instead of SCREEN_POS in gr_flash_internal: Compatibility with Vulkan shaders
  • Remove now-unused SCREEN_POS vertex format: Cleanup after previous commit
  • Add dds_block_size and dds_compressed_mip_size utilities: Factor out utility code to be used in Vulkan backend
  • Add CAPABILITY_QUERIES_REUSABLE for GPU queries: Vulkan needs different lifecycle for GPU queries
  • Fix gr_flip debug output ordering: Prevent immediate buffer from being overwritten
  • Fix gr_end_2d_matrix viewport for render-to-texture: Fix RTT for Vulkan
  • Fix undefined gl_ClipDistance and use uint for std140 bool: Shader compatibility with Vulkan
  • Fix shader build MAIN_DEPENDENCY and add conditional GLSL/struct generation: Build system change for OpenGL/Vulkan shader split
  • Add missing memcpy_if_trivial_else_error for void *, const void*

What's possibly left to be done:

  • Unify OpenGL and Vulkan shaders where possible: the only shader shared with OpenGL (defined in the build system's SHADERS_GL_SHARED) is still the default material. Although the Vulkan backend does some things differently, it would definitely be possible to share more code. But i didn't want to accidentally break OpenGL in some way.

  • Integrate VMA (Vulkan Memory Allocator). Some of the memory handling could be simplified by importing this dependency.

  • OpenXR anything. This is currently not implemented at all.

Build steps:

cmake -B build -DCMAKE_BUILD_TYPE=Debug -DFSO_BUILD_WITH_VULKAN=ON -DFSO_BUILD_WITH_OPENXR=OFF
cmake --build build

To run (with maximum debugging and Vulkan layer validation):

export VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation
export VK_LOADER_DEBUG=all

build/bin/fs2_open_25_1_0_x64_AVX2-DEBUG -vulkan -gr_debug -stdout_log -profile_frame_time

Full disclosure: i used Claude Opus 4.6 while developing this. However, the overall direction and design is my own, and i've paid careful attention to the code.

@BMagnu

BMagnu commented Feb 17, 2026

Member

Thanks for the PR!
I'll be looking at it and playing around with it soon.
Please be aware, this being as big as it is, that it might be a while until we get through it.

@BMagnu

BMagnu commented Feb 17, 2026

Member

Okay, played around with it a little.
Got it to run, though with a slew of visual artifacts and some crashes on some mods.
Still, a great first step to see it running in vulkan, at quite impressive performance numbers.
I'd love to discuss some of the design decisions in more detail. Are you on the discord, or somewhere else sensible for extended discussion?

@The-E

The-E commented Feb 19, 2026

Member

I played around with it a little bit as well, and using nSight I could at least get as far as seeing that rendering a background with a skybox causes some amount of corruption to get into the main framebuffer - I haven't yet been able to see where it's coming from (as rendering a skybox should be one of the simpler things, just some basic geo and a couple textures, no lighting), but, well, there it is.

@laanwj

laanwj commented Feb 21, 2026

Contributor Author

Are you on the discord, or somewhere else sensible for extended discussion?

i'm on the hard-light discord server i'm "mara" there. i'm not very active on discord, but happy to discuss.

Got it to run, though with a slew of visual artifacts and some crashes on some mods.

To be honest i've only been replaying the retail campaign with it. So code paths not exercised there will be less (or even not) tested. Please let me know which mods this happens with!

I played around with it a little bit as well, and using nSight I could at least get as far as seeing that rendering a background with a skybox causes some amount of corruption to get into the main framebuffer - I haven't yet been able to see where it's coming from (as rendering a skybox should be one of the simpler things, just some basic geo and a couple textures, no lighting), but, well, there it is.

Thanks for trying. i've installed NVidia Coresight but couldn't get it to report any issues in the level i tried (and i'd been using Vulkan's validation layer as well as RenderDoc during development and it should be clean). Can you send me the level file this happens with? And the messages that i should watch for?

@The-E

The-E commented Feb 21, 2026

Member

Thanks for trying. i've installed NVidia Coresight but couldn't get it to report any issues in the level i tried (and i'd been using Vulkan's validation layer as well as RenderDoc during development and it should be clean). Can you send me the level file this happens with? And the messages that i should watch for?

My testing was done using the latest mediaVPs mod, running both the first mission and using the lab environment.
image

RenderDoc capture available here: https://drive.google.com/file/d/1ficdGUP-e8xfmUjZWAzWZmwpa9aAtFmt/view?usp=drive_link

@BMagnu

BMagnu commented Feb 21, 2026

Member

I was similarly running the MediaVPs' first mission (where I got the artifacts), and I was testing the "Icarus" Cutscene from Blue Planet (which crashes on trying to render the opening movie, skippable with -nomovies)

@Shivansps

Contributor

I'm not in any position to ask, but instead of using the Vulkan lib and headers currently installed on the host system, maybe it's better to use a glad2 loader for Vulkan, in the same way as is done for OpenGL?

@SamuelCho

Contributor

Wow, nice work. Pretty straightforward design, nothing surprising. I have a local WIP DX12 implementation I've been working on and off on in my spare time for general practice and I see a lot of similar decisions you've made here in your VK implementation.

I kind of wonder if we need to double buffer the immediate buffer so that we leave alone the one that's in-flight. But maybe it doesn't matter if the fence in the command buffer submission and flip takes care of everything. Or is it the buffer manager that keeps track of the frame num?

Surprised that my batching code made it out intact. Also surprised that my render primitives immediate code also made it out intact. Sorry if it caused any headaches.

@Shivansps

Shivansps commented Feb 28, 2026

Contributor

I'll put this in here for reference in case anyone is interested.

I did try to see if I can change it to use the glad2 loader instead. As I expected, since it is using vulkan.hpp, it is using the Vulkan C++ bindings, while the glad2 loader has the C bindings, in the exact same way as with OpenGL. So it's not a huge amount of work to change it, but it is still considerable work to change all bindings (like 2-3 days). It is some work just to get it to compile again, not knowing whether it is still going to work after that.

I also got the current PR version to compile for Android by just adding the missing .hpp Vulkan headers to the Android NDK. Not elegant, as I'm adding stuff to the toolchain, but it will do for now. Buuuuut it does not compile for 32 bits (x86/arm32), though it does for x86_64/arm64. Not sure if this is also the case for regular builds.

On my phone with a Mali-G57
fs2_open.log
Crashes during shader compilation
0000000001911d04 /vendor/lib64/egl/libGLES_mali.so (cmpbe_v2_compile_multiple_shaders+2372) (BuildId: 747cc1a89e3838ab)
02-28 11:41:37.638 7140 7140 F DEBUG : Cause: null pointer dereference

On my Retroid G2 Handheld with a Qualcomm G2 and an Adreno 22 GPU, it fails to init Vulkan because it lacks a transfer queue. I guess it is VK_QUEUE_TRANSFER_BIT? So it's not completely 1.1; it uses an optional extension/feature.
fs2_open.log.txt

@laanwj

laanwj commented Mar 1, 2026

Contributor Author

@The-E

My testing was done using the latest mediaVPs mod, running both the first mission and using the lab environment.

Thanks. The renderdoc capture should be helpful for reproduction.
(had to send a request to access it)

@SamuelCho

I kind of wonder if we need to double buffer the immediate buffer so that we leave alone the one that's in-flight. But maybe it doesn't matter if the fence in the command buffer submission and flip takes care of everything. Or is it the buffer manager that keeps track of the frame num?

It does. This is handled purely in the Vulkan layer. The buffer manager does a double buffering of all dynamic and streaming buffers in FrameBumpAllocator m_frameAllocs[MAX_FRAMES_IN_FLIGHT]. So it should have the same behavior as the GL backend with regard to orphaned buffers.
(It's also enforced that dynamic and streaming buffer content isn't reused between frames, by throwing a failure in that case)
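
As an illustration of the double-buffered bump allocation described above, here is a rough sketch. Only the names `FrameBumpAllocator`, `m_frameAllocs` and `MAX_FRAMES_IN_FLIGHT` come from the comment; everything else (`StreamingBuffers`, `ALLOC_FAILED`, the method names, and handing out plain offsets instead of mapped `VkBuffer` memory) is an assumption, and auto-growing is left out.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t MAX_FRAMES_IN_FLIGHT = 2;
constexpr std::size_t ALLOC_FAILED = static_cast<std::size_t>(-1);

// Hypothetical per-frame bump allocator; it hands out offsets into one large buffer.
class FrameBumpAllocator {
public:
    void init(std::size_t capacity)
    {
        m_capacity = capacity;
        m_head = 0;
    }

    // Returns the aligned offset of the new block, or ALLOC_FAILED if it does not fit.
    std::size_t allocate(std::size_t size, std::size_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // power of two
        const std::size_t aligned = (m_head + alignment - 1) & ~(alignment - 1);
        if (aligned > m_capacity || size > m_capacity - aligned) {
            return ALLOC_FAILED;
        }
        m_head = aligned + size;
        return aligned;
    }

    // Forget all allocations; only safe once the GPU has finished with this frame slot.
    void reset() { m_head = 0; }

private:
    std::size_t m_capacity = 0;
    std::size_t m_head = 0;
};

// Hypothetical owner of one allocator per frame in flight.
class StreamingBuffers {
public:
    explicit StreamingBuffers(std::size_t capacityPerFrame)
    {
        for (auto& alloc : m_frameAllocs) {
            alloc.init(capacityPerFrame);
        }
    }

    // Call after the fence for the next frame slot has signaled.
    void beginFrame()
    {
        m_current = (m_current + 1) % MAX_FRAMES_IN_FLIGHT;
        m_frameAllocs[m_current].reset();
    }

    std::size_t allocate(std::size_t size, std::size_t alignment)
    {
        return m_frameAllocs[m_current].allocate(size, alignment);
    }

    std::size_t currentSlot() const { return m_current; }

private:
    FrameBumpAllocator m_frameAllocs[MAX_FRAMES_IN_FLIGHT];
    std::size_t m_current = 0;
};
```

Because each frame slot has its own allocator, resetting the slot for the new frame never touches memory the in-flight frame is still reading, which is the same guarantee buffer orphaning gives on the GL side.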

Surprised that my batching code made it out intact. Also surprised that my render primitives immediate code also made it out intact. Sorry if it caused any headaches.

Hahah it wasn't too bad!

I'm not in any position to ask, but instead of using the Vulkan lib and headers currently installed on the host system, maybe it's better to use a glad2 loader for Vulkan, in the same way as is done for OpenGL?

Will look into it. It seems it would be way easier to vendor vulkan.hpp instead of switching to using C bindings, so i'll go for that first.

@laanwj

laanwj commented Mar 1, 2026

Contributor Author

Okay. i've bundled the Vulkan and Vulkan-CPP headers in lib/vulkan-headers and updated the build system for this. Function loading was already happening dynamically through SDL, except for ImGui, which now does so too. i did not need to use glad2.

With this, it should be possible to build it on (or for) platforms without the Vulkan library and headers installed.

@laanwj

laanwj commented Mar 1, 2026

Contributor Author

Trying to get it to pass the CI now. Will squash all these changes into the main (or otherwise original) commit when done.

@laanwj

laanwj commented Mar 1, 2026

Contributor Author

i'm not happy with where clang-tidy is taking some of these. It first wants to make these functions static (because it could), and now it wants to refer to them by fully qualified class name instead of instance:

-	auto* texSlot = texManager->getTextureSlot(handle);
+	auto* texSlot = graphics::vulkan::VulkanTextureManager::getTextureSlot(handle);
-	drawManager->stencilClear();
+	graphics::vulkan::VulkanDrawManager::stencilClear();

Which is strictly correct but it's also less readable, and asymmetric with the rest of the API. Will see if (void)this works.

Edit: it did. Will look into rendering issues next.
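
For context, a minimal sketch of the `(void)this` workaround. The class and method body are hypothetical. The clang-tidy checks involved are most likely `readability-convert-member-functions-to-static` and `readability-static-accessed-through-instance`, though that is an assumption, since the thread does not name them.

```cpp
#include <cassert>

// Hypothetical manager class, loosely modeled on the diff above.
class TextureManager {
public:
    // This method does not read any member state yet, which is what prompts
    // clang-tidy to suggest making it static.
    int getTextureSlot(int handle)
    {
        (void)this; // reference `this` so the method stays an instance method
        assert(handle >= 0);
        return handle;
    }
};
```

Call sites can then keep the `texManager->getTextureSlot(handle)` form, consistent with the rest of the API.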

@laanwj laanwj force-pushed the vulkan-pr branch 2 times, most recently from e5c9a34 to 61f890e on March 1, 2026 22:36
@GamingCity

GamingCity commented Mar 2, 2026


Hi, Shivansps here, I'm on a different account. I think I know why it says there is no transfer queue on the Adreno driver.

I think this if here is wrong
https://github.com/laanwj/fs2open.github.com/blob/61f890e2966bbce9650d94eee9249ce13cae864b/code/graphics/vulkan/VulkanRenderer.cpp#L104

if (!values.transferQueueIndex.initialized && queue.queueFlags & vk::QueueFlagBits::eTransfer) {
//False if no eTransfer (optional)
} else if (queue.queueFlags & vk::QueueFlagBits::eTransfer && !(queue.queueFlags & vk::QueueFlagBits::eGraphics)) {
//False if no eTransfer (optional)
}

According to the documentation
https://registry.khronos.org/VulkanSC/specs/1.0-extensions/man/html/VkQueueFlagBits.html

"All commands that are allowed on a queue that supports transfer operations are also allowed on a queue that supports either graphics or compute operations. Thus, if the capabilities of a queue family include VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT, then reporting the VK_QUEUE_TRANSFER_BIT capability separately for that queue family is optional."

Queue families with eGraphics (or eCompute) implicitly support transfer operations, but may not report the eTransfer bit.
So i think " & vk::QueueFlagBits::eTransfer" should be removed from the first if and assume it is. (and maybe make sure it is not eCompute? im not sure about that)
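
One way the corrected selection could look, as a sketch only. The three constants use the same numeric values as `VK_QUEUE_GRAPHICS_BIT`, `VK_QUEUE_COMPUTE_BIT` and `VK_QUEUE_TRANSFER_BIT`, but are redeclared so the example builds without the Vulkan headers. `supportsTransfer` and `pickTransferFamily` are hypothetical names, and the preference for a dedicated transfer family is carried over from the second branch of the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same numeric values as the VK_QUEUE_*_BIT flags, redeclared for a header-free sketch.
constexpr std::uint32_t QUEUE_GRAPHICS = 0x1;
constexpr std::uint32_t QUEUE_COMPUTE = 0x2;
constexpr std::uint32_t QUEUE_TRANSFER = 0x4;

// Graphics and compute families support transfer commands even if the bit is not reported.
inline bool supportsTransfer(std::uint32_t flags)
{
    return (flags & (QUEUE_GRAPHICS | QUEUE_COMPUTE | QUEUE_TRANSFER)) != 0;
}

// Returns the index of the family to use for transfers, or -1 if none qualifies.
// Prefers a dedicated transfer family (transfer bit set, no graphics bit),
// otherwise falls back to the first family that can do transfers at all.
inline int pickTransferFamily(const std::vector<std::uint32_t>& familyFlags)
{
    assert(!familyFlags.empty()); // callers are expected to pass at least one family
    int fallback = -1;
    for (std::size_t i = 0; i < familyFlags.size(); ++i) {
        const std::uint32_t flags = familyFlags[i];
        if ((flags & QUEUE_TRANSFER) != 0 && (flags & QUEUE_GRAPHICS) == 0) {
            return static_cast<int>(i);
        }
        if (fallback < 0 && supportsTransfer(flags)) {
            fallback = static_cast<int>(i);
        }
    }
    return fallback;
}
```

A device that exposes a single graphics+compute family without the transfer bit (the situation reported on the Adreno driver) resolves to family 0 instead of failing initialization.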

@laanwj

laanwj commented Mar 2, 2026

Contributor Author

So i think " & vk::QueueFlagBits::eTransfer" should be removed from the first if and assume it is. (and maybe make sure it is not eCompute? im not sure about that)

Good catch. Yes, the logic there is wrong. "It worked on NVidia" 😊 Will fix.

Edit: Mind that the transfer queue is currently unused, as this makes the upload code simpler, due to there being no cross-queue synchronization requirement. In the current design there wouldn't be a benefit to using it, just overhead, as there's (AFAIK) no way to exploit parallelism here. So we could even decide to completely remove checking for it.

@laanwj

laanwj commented Mar 2, 2026

Contributor Author

i've pushed a few rendering corruption fixes. Some wrong assumptions about renderpass state, and Vulkan vs GL differences. The cubemap corruption and random framebuffer noise should be solved now.

@Shivansps

Shivansps commented Mar 3, 2026

Contributor

Just reporting back here, the change to the transfer queue selection did work. Now the Adreno GPU works and can get into the game.
The Mali GPU still crashes while compiling the default material shader, but no matter, I guess that will be something to look at after the PR is merged.

@The-E

The-E commented Mar 3, 2026

Member

Alright, your latest changes definitely fixed the framebuffer corruption, but I have more:
image
Not exactly sure what's going on here, but it seems there's something going weird when post processing is enabled: without post processing, the frame renders normally
One thing to examine would be wireframe rendering: I think there might be some options not being set correctly here.
image
Note that shutting off post processing in the lab also turns off imgui rendering.

Transparency is not rendered correctly
image

Particle and glowpoint blending modes are not set correctly:
image
Note the black halo around the glowpoint attached to the chin fin (or whatever that thing is called....)

textures appear to be downsampled in the lab:
image

I would also recommend running through the Blue Planet: War in Heaven intro - it shows a couple instances of textures rendered as pure white for some reason

@GamingCity


Today i saw two things:
Again, I'll remind you that Android is not a working platform yet, and I'll work on the Android PR after this and SDL3 are merged. So I just mention things to keep track of them, to see if they can be fixed or whether they create problems on other platforms. That said, I don't know why the 32-bit CI does not complain about this and it is only a problem on the NDK toolchain.

  1. I discovered why I was unable to compile 32-bit builds with the Android NDK: there is a mix of C and C++ types here:
    VulkanMemory.h
    struct VulkanAllocation {
    VkDeviceMemory memory = VK_NULL_HANDLE;
    VkDeviceSize offset = 0;
    VkDeviceSize size = 0;
    void* mappedPtr = nullptr; // Non-null if memory is mapped
    uint32_t memoryTypeIndex = 0;
    bool dedicated = false; // True if this is a dedicated allocation
    };

Changing to C++ types fixes 32 bit compilation

struct VulkanAllocation {
vk::DeviceMemory memory = VK_NULL_HANDLE;
vk::DeviceSize offset = 0;
vk::DeviceSize size = 0;
void* mappedPtr = nullptr; // Non-null if memory is mapped
uint32_t memoryTypeIndex = 0;
bool dedicated = false; // True if this is a dedicated allocation
};

While this compiles, it is not going to work, or it is going to have additional issues, as VulkanPipeline.cpp has shifts that go out of range for 32-bit types.
shift warnings.txt
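
The attached warnings are not reproduced here, so the following is only a guess at the general shape of the problem: a state key built with shifts on a type that is 32 bits wide on 32-bit targets (for example `size_t`). The function name, fields, and bit layout below are all hypothetical. The point of the sketch is to use a fixed-width 64-bit type and widen before shifting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pipeline-state key; the fields and bit layout are illustrative only.
inline std::uint64_t packPipelineKey(std::uint32_t shaderId, std::uint32_t blendMode, std::uint32_t depthMode)
{
    assert(blendMode < 256 && depthMode < 256);
    // Widen to 64 bits before shifting. Shifting a 32-bit value by 32 or more
    // is undefined behavior in C++, which is what the compiler warns about.
    return (static_cast<std::uint64_t>(shaderId) << 32) |
           (static_cast<std::uint64_t>(blendMode) << 8) |
           static_cast<std::uint64_t>(depthMode);
}
```

Using `std::uint64_t` explicitly keeps the key layout identical on 32-bit and 64-bit builds.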

@BMagnu

BMagnu commented Mar 3, 2026

Member

While not an immediate priority, I'd love to question the following design goal:
"Because shaders are pre-compiled to SPIR-V, shader variants are less feasible in Vulkan. Preprocessing directives have been converted to run-time uniform based branching."

Long / Medium term, I would like for FSO to ship with shadertool or something to allow it to compile to SPIR-V itself. This gets rid of a lot of issues here. First, we'd be able to keep text-based shaders that can be dual-use for OpenGL and Vulkan. Any incompatibilities can just be put in preprocessor blocks like main-f's prereplace, allowing full dual-use of all shaders. Furthermore, it'd allow table-able postprocessing and shader changes. While currently a full shader replace is necessary for custom shaders, I eventually want this to be properly modular, so being able to modify parts of shaders is a goal, and that for sure requires compilation on-the-fly.

Shipping with shadertool and then compiling all available shader files to SPIR-V on load (ideally after game-settings.tbl, especially since the recent Z-Compress changes) shouldn't be that hard either.

@laanwj

laanwj commented Mar 4, 2026

Contributor Author

@Shivansps
Good!
i could in principle test Android + Adreno on my Ayn Thor. i don't have any device with a Mali GPU. But one thing at a time. i've never really done android development so it'll be some things to figure out.

@GamingCity
Ah yes, you're right. It's better to be consistent about using the C++ vulkan types instead of the C ones. Will switch it over. Though i'm very surprised that it makes a difference in practice.

@The-E
Thanks for the reports. At least the rendering issues are getting more subtle.

@BMagnu
Yes. i think it would be fine to make FSO depend on a GLSL-to-SPIR-V compiler library, and then do the compilation at runtime instead of compile-time. i can look into it.
i was just trying to be careful here to not introduce any big dependencies.
In principle, modular shaders can also be done with simpler SPIR-V level linking. But that'd be incompatible with the goal of unifying with the OpenGL backend.

@Shivansps

Contributor

Please, I don't want to make you waste time on Android testing; it's not even a working platform yet. I'll post if I can find out something.

If you want to see, I have [Fso_Android_Wrapper](https://github.com/Shivansps/Fso_Android_Wrapper) as the Android test app, Fso-Android-Prebuilts where I have the script and instructions to build the FSO dependencies and FSO itself, and an "android-build-vulkan" branch on my fork where I added this PR to my previous Android work.

I did find one problem with Android in VulkanRenderer:
createInfo.preTransform = deviceValues.surfaceCapabilities.currentTransform;

It seems that if you leave it at that and use
SDL_SetHint(SDL_HINT_ORIENTATIONS, "LandscapeLeft LandscapeRight");
which I'm using to force landscape mode on Android, it will (if I understood right) get a surface that is already rotated and then rotate it again, making it render in portrait mode.

I changed it to this, which did work:

auto supported = deviceValues.surfaceCapabilities.supportedTransforms;
if (supported & vk::SurfaceTransformFlagBitsKHR::eIdentity) {
createInfo.preTransform = vk::SurfaceTransformFlagBitsKHR::eIdentity;
} else {
createInfo.preTransform = deviceValues.surfaceCapabilities.currentTransform;
}

I don't know if that's the right fix; it does not seem to do anything on Windows. Keep in mind I used an AI to point me to this and the potential fix, as I did not know if anything in Vulkan could cause this. It told me to check where preTransform and the surface capabilities are set for the transform, and that I should use the eIdentity flag.

https://docs.vulkan.org/refpages/latest/refpages/source/VkSurfaceTransformFlagBitsKHR.html

@BMagnu

BMagnu commented Mar 5, 2026

Member

Fair enough re: compiling shaders and large dependencies, but I think it is worth it here.
Even just having a unified backend to maintain (where the shader compiler likely needs little to no continued maintenance) is worth it alone IMO, but with tableable variants, it is for sure.

@JohnAFernandez

JohnAFernandez commented Jun 10, 2026

Contributor

I may have seen that issue with the Triton on a Radeon iGPU in the past, but that was around 2 years ago when BTA 2 first came out. It would happen with some models but not with others.

@SamuelCho

Contributor

Bashed my head against RenderDoc this entire week but this should fix corrupted geometry for real: 77610a8

It looks like it was a matter of checking the transparency buffer indexed vertices count to make sure we weren't exceeding the unsigned short maximum.
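
The linked commit is not shown in this thread, so the following is just a sketch of the kind of check described, not the actual fix. `IndexType` and `chooseIndexType` are hypothetical names. The idea is to fall back to 32-bit indices once a batch has more vertices than a 16-bit index can address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

enum class IndexType { UInt16, UInt32 };

// Pick the narrowest index type that can address every vertex in the batch.
inline IndexType chooseIndexType(std::size_t vertexCount)
{
    assert(vertexCount > 0);
    // Conservative threshold: the largest index stays strictly below 0xFFFF, since that
    // value doubles as the primitive-restart marker for 16-bit indices when restart is enabled.
    const std::size_t limit = std::numeric_limits<std::uint16_t>::max();
    return vertexCount <= limit ? IndexType::UInt16 : IndexType::UInt32;
}
```

Without a check like this, indices past the 16-bit range wrap around and reference the wrong vertices, which would show up as corrupted geometry.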

@notimaginative

Contributor

@SamuelCho Yep, that fixed it! 👍

@SamuelCho

Contributor

I think this fixes shadows: SamuelCho@ecc1507

The Vulkan backend tried to directly set the shadow viewport and scissor regions to the command buffer but it would get overridden by the material pipeline config when setting the model material. I guess Claude didn't think it needed the state tracker for viewport and scissor states for shadow map passes.

Along with that, the triangle winding order also needed to be reversed for shadows so I put in a little exemption for shadow map rendering when rendering models. I don't think we had to do this in OpenGL so this may be a temp fix until we figure out some more comprehensive way to account for the differences as before.

Shadows still don't work in the tech room BTW. There's a hard coded assumption in the Vulkan shadow start function that assumes we require g-buffers to render shadows which is also an incorrect assumption Claude made that needs to be fixed as we still use forward rendering in places. That's going to be the next thing I'll be working on.

@notimaginative

Contributor

Shadows are working for me now in the lab and in mission. 👍

I do get broken shadows in the tech room though. It appears to happen when you view a ship in the lab (Fenris in this case, didn't try others) and then go to the tech room. Any ship that you view will have a fixed position shadow over it that generally matches the shadow cast by the ship you viewed in the lab. (Using shadow quality of medium, in case that makes a difference)

I only mention it because I assumed it wouldn't work at all, but then saw a ship half-covered by a shadow. I'm guessing that will be a non-issue when you get tech room shadows fixed but thought I'd point it out in case it indicates a state/rendering bug that would otherwise be hidden.

@SamuelCho

Contributor

This should now fix techroom and mission briefing shadows: SamuelCho@c21dbb1

Not the prettiest fix but Vulkan's render pass system makes binding render targets a bit more complicated. It isn't as simple as the push/pop framebuffer state tracking we had for the OpenGL side. So, combined with Claude's understandable generalizations, there's some unnecessary render pass binds happening that need to be looked at again. But at least this gets us closer to stable.

@notimaginative

Contributor

That works great in the techroom and briefings, but it's crashing for me otherwise (lab and in-mission). I am on a Mac, so it's doing Vulkan->Metal translation, and that might be triggering the error. The backtrace certainly seems to indicate that the assertion is in the Metal side. I'll try to confirm that same behavior on a Linux box when I get time.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert
    frame #0: 0x000000018aaae5e8 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000105c26668 libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x000000018a9f0644 libsystem_c.dylib`abort + 148
    frame #3: 0x000000018a9ef8a0 libsystem_c.dylib`__assert_rtn + 284
    frame #4: 0x000000019733ab20 Metal`MTLReportFailure.cold.1 + 48
    frame #5: 0x0000000197301db8 Metal`MTLReportFailure + 576
    frame #6: 0x00000001972f7280 Metal`_MTLMessageContextEndNewNSErrorOrAbort(_MTLMessageContext*, bool, NSString*, unsigned long) + 992
    frame #7: 0x000000018b49081c MetalTools`-[MTLDebugRenderCommandEncoder setRenderPipelineState:] + 580
    frame #8: 0x000000011ca388fc MoltenVK`MVKMetalGraphicsCommandEncoderState::prepareDraw(id<MTLRenderCommandEncoder>, MVKCommandEncoder&, MVKVulkanGraphicsCommandEncoderState const&, MVKVulkanSharedCommandEncoderState const&) + 360
    frame #9: 0x000000011c9c879c MoltenVK`MVKCommandEncoder::finalizeDrawState(MVKGraphicsStage) + 88
    frame #10: 0x000000011c9c2f58 MoltenVK`MVKCmdDrawIndexed::encode(MVKCommandEncoder*) + 468
    frame #11: 0x000000011c9c59f0 MoltenVK`MVKCommandEncoder::encode(id<MTLCommandBuffer>, MVKCommandEncodingContext*) + 252
    frame #12: 0x000000011c9c62f8 MoltenVK`MVKCommandBuffer::submit(MVKQueueCommandBufferSubmission*, MVKCommandEncodingContext*) + 1456
    frame #13: 0x000000011ca1e688 MoltenVK`MVKQueueFullCommandBufferSubmission<1ul>::submitCommandBuffers() + 96
    frame #14: 0x000000011ca1cc94 MoltenVK`MVKQueueCommandBufferSubmission::execute() + 304
    frame #15: 0x000000011ca1aee8 MoltenVK`MVKQueue::submit(MVKQueueSubmission*) + 212
    frame #16: 0x000000011ca1b3b0 MoltenVK`VkResult MVKQueue::submit<VkSubmitInfo>(unsigned int, VkSubmitInfo const*, VkFence_T*, MVKCommandUse) + 352
    frame #17: 0x000000011c97e408 MoltenVK`vkQueueSubmit + 96
  * frame #18: 0x00000001004cfcec fs2_open_25_1_0_arm64-DEBUG`void vk::Queue::submit<vk::detail::DispatchLoaderDynamic, true>(this=0x0000000a633bca10, submits=0x000000016fdfd040, fence=(m_fence = 0x0000000a62c12b20), d=0x0000000101acc7c0) const at vulkan_funcs.hpp:936:7 [inlined]
    frame #19: 0x00000001004cfbfc fs2_open_25_1_0_arm64-DEBUG`graphics::vulkan::VulkanRenderFrame::submitAndPresent(this=0x0000000a633bca00, cmdBuffers=0x0000000106809170) at VulkanRenderFrame.cpp:86:18
    frame #20: 0x0000000100511174 fs2_open_25_1_0_arm64-DEBUG`graphics::vulkan::VulkanRenderer::flip(this=0x0000000106808f40) at VulkanRenderer.cpp:1202:49
    frame #21: 0x0000000100593030 fs2_open_25_1_0_arm64-DEBUG`graphics::vulkan::(anonymous namespace)::vulkan_flip() at gr_vulkan.cpp:56:21
    frame #27: 0x0000000100009cb8 fs2_open_25_1_0_arm64-DEBUG`std::__1::function<void ()>::operator()(this= Function = graphics::vulkan::(anonymous namespace)::vulkan_flip() ) const at function.h:772:10
    frame #28: 0x00000001002928f0 fs2_open_25_1_0_arm64-DEBUG`gr_flip(execute_scripting=true) at 2d.cpp:2963:3
    frame #29: 0x0000000100015364 fs2_open_25_1_0_arm64-DEBUG`game_flip_page_and_time_it() at freespace.cpp:3616:2
    frame #30: 0x0000000100017bc0 fs2_open_25_1_0_arm64-DEBUG`game_do_full_frame(clear_time2=0x000000016fdfdc14, render3_time1=0x000000016fdfdc28, render3_time2=0x000000016fdfdc24, render2_time1=0x000000016fdfdc30, render2_time2=0x000000016fdfdc2c, flip_time1=0x000000016fdfdc20, flip_time2=0x000000016fdfdc1c, offset=0x0000000000000000, rot_offset=0x0000000000000000, fov_override= Active Type = float ) at freespace.cpp:4169:3
    frame #31: 0x000000010001855c fs2_open_25_1_0_arm64-DEBUG`game_frame(paused=false) at freespace.cpp:4270:5
    frame #32: 0x0000000100019134 fs2_open_25_1_0_arm64-DEBUG`game_do_frame(set_frametime=true) at freespace.cpp:4567:2
    frame #33: 0x000000010001de78 fs2_open_25_1_0_arm64-DEBUG`game_do_state(state=2) at freespace.cpp:6420:4
    frame #34: 0x000000010025734c fs2_open_25_1_0_arm64-DEBUG`gameseq_process_events() at gamesequence.cpp:393:2
    frame #35: 0x000000010001f3a8 fs2_open_25_1_0_arm64-DEBUG`game_main(argc=13, argv=0x000000016fdff420) at freespace.cpp:6914:11
    frame #36: 0x0000000100021324 fs2_open_25_1_0_arm64-DEBUG`main(argc=13, argv=0x000000016fdff420) at freespace.cpp:8086:12
    frame #37: 0x000000018a72be00 dyld`start + 6992

@SamuelCho

SamuelCho commented Jun 19, 2026

Contributor

Hmm, that's too bad. I did notice a discrepancy with the scene render pass load that gets bound after shadow pass rendering. Maybe this'll do the trick? SamuelCho@6189228

@notimaginative

Contributor

Sorry, the actual assertion message would have been super useful I think. I'd like to blame lack of sleep for not including that, but honestly I just missed it.

-[MTLDebugRenderCommandEncoder setRenderPipelineState:]:1639: failed assertion `Set Render Pipeline State Validation
For color attachment 1, the render pipeline's pixelFormat (MTLPixelFormatInvalid) does not match the framebuffer's pixelFormat (MTLPixelFormatRGBA16Float).
For color attachment 2, the render pipeline's pixelFormat (MTLPixelFormatInvalid) does not match the framebuffer's pixelFormat (MTLPixelFormatRGBA16Float).
For color attachment 3, the render pipeline's pixelFormat (MTLPixelFormatInvalid) does not match the framebuffer's pixelFormat (MTLPixelFormatRGBA8Unorm).
For color attachment 4, the render pipeline's pixelFormat (MTLPixelFormatInvalid) does not match the framebuffer's pixelFormat (MTLPixelFormatRGBA16Float).
For color attachment 5, the render pipeline's pixelFormat (MTLPixelFormatInvalid) does not match the framebuffer's pixelFormat (MTLPixelFormatRGBA16Float).

@notimaginative

Contributor

Finally got the chance to test on my Linux box as well. Tech room and loadout look fine. It doesn't crash, but rendering is completely broken in the lab and in mission: models are generally black, with some odd sparkles or something visible as you move the models around.

If shadows are disabled then rendering is fine on Linux, and it stops crashing on Mac.

@SamuelCho

Contributor

Okay, that assertion is what I needed to see. We weren't updating the number of color attachments in the state trackers so let's see if this fixes it: SamuelCho@2b714c6
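
For anyone following along, here is a minimal sketch of the general idea, not the code in the linked commit. It assumes a tracker that records the color attachment count when a render pass begins and folds it into the key a pipeline is created from, so a pipeline built for one attachment is never reused against a framebuffer with more. `PipelineKey`, `StateTracker`, and `kMaxColorAttachments` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed upper bound for this sketch only; not taken from the PR.
constexpr std::uint32_t kMaxColorAttachments = 8;

// Hypothetical subset of the render state a pipeline is keyed on.
struct PipelineKey {
    std::uint32_t shaderId;
    std::uint32_t colorAttachmentCount;
};

inline bool operator==(const PipelineKey& a, const PipelineKey& b) {
    return a.shaderId == b.shaderId &&
           a.colorAttachmentCount == b.colorAttachmentCount;
}

// Hypothetical state tracker: remembers how many color attachments the
// currently bound render pass has.
class StateTracker {
public:
    // Call whenever a render pass begins or is resumed.
    void beginRenderPass(std::uint32_t colorAttachmentCount) {
        assert(colorAttachmentCount <= kMaxColorAttachments);
        m_colorAttachmentCount = colorAttachmentCount;
    }

    // The key always carries the count of the pass that is bound right now.
    PipelineKey makePipelineKey(std::uint32_t shaderId) const {
        return PipelineKey{shaderId, m_colorAttachmentCount};
    }

private:
    std::uint32_t m_colorAttachmentCount = 1;
};
```

If the count is left stale after switching to a framebuffer with more attachments, the pipeline is created with too few color attachment formats, which would be consistent with the `MTLPixelFormatInvalid` entries for attachments 1 to 5 in the assertion above.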

@notimaginative

Contributor

No change on either platform I'm afraid.

I'm including the debug log from the Mac this time, and it stops at the assertion (which isn't logged). The one on Linux is functionally the same, but continues on past that point without any error or warning messages, and without properly rendering any models.

fs2_open.log

My graphics knowledge pretty much stopped at OpenGL 2, so I realize I'm not much help here. But if there is something you'd like me to try locally or that I should look for let me know. Or just send me a patch with a bunch of printf's to get more info, if that helps.

@SamuelCho

Contributor

I finally realized that I wasn't testing with Vulkan validation layers turned on. So hopefully this should let us catch some errors that I normally wouldn't see on my own machine. For posterity, the -gr_debug command-line argument enables the validation layers. I didn't see anything get reported to the log, but I was able to see the errors in RenderDoc once I made a capture.

Long story short, I made a whoopsy thinking that I could change the expected layout for the Load scene render pass to use after the shadow map pass. I changed it back to normal and made a new render pass for the purposes of resuming after shadow map rendering. I hope this solves the problem. Hopefully the validation layer caught all the problems across platforms. SamuelCho@1305c14

@notimaginative

Contributor

Still no change.

I thought that maybe I just had something in my tree that was b0rked, but gave a Windows test build to someone else and they said it worked fine. So for whatever reason it works on Windows but not Mac or Linux.

Keeping in mind that I have no real understanding of this code, nor the effect of any changes made, I made some incremental changes through trial and error. These adjustments got it all working for me: vk_mac_test.patch. Perhaps you can spot some difference there to help narrow down the problem.

For reference, if I disable validation and just let it get funky with the current broken behavior, it renders like this (but animated/flashing):
SCR-20260621-ednn

@wookieejedi

Member

I tried this following build here on Windows,
https://pxo.nottheeye.com/files/test/fs2open/vulkan-test.7z

based on the branch
https://github.com/notimaginative/fs2open.github.com/tree/vulkan-pr-FIXES

Running with shadows enabled in retail, plus -vulkan. I restarted my computer, cleared my previous caches, and tested that build again. Now ships do not show up in the lab 😄 Ships in the techroom do show up properly, though, and properly display their shadows.

@wookieejedi

Member

Update, got a build based on current master and lab now properly renders.

@notimaginative

notimaginative commented Jun 27, 2026

Contributor

The vulkan-test.7z build linked previously is the one in question. It also includes a hack to use the vk_mac_test patch if the -window cmdline option is specified, and otherwise to not use it. The vulkan-pr-FIXES branch is rebased against current master, contains all of SamuelCho's changes, as well as a few of my own.

@SamuelCho

Contributor

That patch also renders fine on my machine, but unfortunately it threw a validation error:

831 API High Miscellaneous 3048005104 Validation Error: [ VUID-vkCmdBeginRenderPass-initialLayout-00900 ] Object 0: handle = Command Buffer 10998, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = Render Pass 308, type = VK_OBJECT_TYPE_RENDER_PASS; Object 2: handle = Framebuffer 309, type = VK_OBJECT_TYPE_FRAMEBUFFER; Object 3: handle = 2D Color Attachment 237, type = VK_OBJECT_TYPE_IMAGE; Object 4: handle = Image View 240, type = VK_OBJECT_TYPE_IMAGE_VIEW; | MessageID = 0xb5acddf0 | vkCmdBeginRenderPass(): pCreateInfo->pAttachments[0] You cannot start a render pass using attachment 0 where the render pass initial layout is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL and the previous known layout of the attachment is VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL. The layouts must match, or the render pass initial layout for the attachment must be VK_IMAGE_LAYOUT_UNDEFINED. The Vulkan spec states: If the initialLayout member of any of the VkAttachmentDescription structures specified when creating the render pass specified in the renderPass member of pRenderPassBegin is not VK_IMAGE_LAYOUT_UNDEFINED, then each such initialLayout must be equal to the current layout of the corresponding attachment image subresource of the framebuffer specified in the framebuffer member of pRenderPassBegin (https://vulkan.lunarg.com/doc/view/1.3.296.0/windows/1.3-extensions/vkspec.html#VUID-vkCmdBeginRenderPass-initialLayout-00900)

It was still useful to use as a reference so thanks for that. I tried to compare the render pass info in that gbuffer pass with the scene render pass resume. I think this was the only discrepancy FWIW: SamuelCho@63c7b1e
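
For reference, the rule quoted in that validation message can be written down as a small check. This is a sketch using stand-in types rather than the real Vulkan headers: `ImageLayout` mirrors only the three `VK_IMAGE_LAYOUT_*` values mentioned in the error, and `initialLayoutIsValid` and `checkBeginRenderPass` are hypothetical helper names.

```cpp
#include <cassert>

// Stand-in for the three VkImageLayout values named in the validation error.
enum class ImageLayout {
    Undefined,               // VK_IMAGE_LAYOUT_UNDEFINED
    ColorAttachmentOptimal,  // VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
    ShaderReadOnlyOptimal    // VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
};

// VUID-vkCmdBeginRenderPass-initialLayout-00900, as quoted above: the render
// pass's initialLayout must be UNDEFINED or equal the attachment's current
// layout at the time the render pass begins.
constexpr bool initialLayoutIsValid(ImageLayout initialLayout,
                                    ImageLayout currentLayout) {
    return initialLayout == ImageLayout::Undefined ||
           initialLayout == currentLayout;
}

// Hypothetical debug helper: trips in debug builds when a render pass is
// begun with a layout the attachment is not actually in.
inline void checkBeginRenderPass(ImageLayout initialLayout,
                                 ImageLayout currentLayout) {
    assert(initialLayoutIsValid(initialLayout, currentLayout));
}
```

The failing case in the message is `initialLayoutIsValid(ColorAttachmentOptimal, ShaderReadOnlyOptimal)`, which is false: the attachment was last left in the shader-read layout while the render pass expected it as a color attachment.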

@notimaginative

Contributor

No difference with that change. However!! ...

I enabled MSAA and it started working. Disabled MSAA again and it breaks.

So with MSAA (4x is all I tried) it works. Well, technically it's still broken, just in a different way. The model renders properly in the lab, but everything else is fubar...

SCR-20260627-efom Large

That's with my patch disabled obviously. And no Metal validation errors triggered from that surprisingly. The model will mess up to some extent depending on the background (even if no background is selected), and in-mission it's an absolute mess.

The log doesn't have more than the typical vulkan log spam. Though I did get a few more lines of this than before:

Vulkan message: [MoltenVK]: VK_ERROR_FEATURE_NOT_PRESENT: Metal does not support disabling primitive restart.

Not sure if that matters at all.

Hopefully that info provides you with some useful clue.

@wookieejedi

Member

Edit: given I was using Taylor's custom branch and the -window cmd line, I removed the -window cmd line and now the lab does not render for me on Windows. If I add -window back to enable Taylor's fixes, then the lab renders.

@SamuelCho

Contributor

Alright so I made an idiotic mistake in not double checking to see what vulkan_scene_texture_begin() did before doing this scene render pass nonsense. I assumed scene_texture_begin() always just bound the scene framebuffer but no, it actually binds a g-buffer framebuffer if deferred lighting is on. It only binds the scene framebuffer if deferred lighting is off. The latest changes shared here definitely helped me figure out my mistake so thank you for that.

So in actuality, after the shadow map pass, there are three potential framebuffers we might need to rebind. I assumed it was just the swapchain and scene framebuffer, but we also need to check whether the g-buffer needs to be bound.
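
Roughly, the selection looks like the sketch below. It is a simplified illustration and not the code in the commit: `RenderTarget`, `FrameState`, and `targetAfterShadowPass` are hypothetical names, and it assumes the swapchain is the target whenever the scene texture is not active.

```cpp
#include <cassert>

// The three framebuffers that may have been bound before the shadow pass.
enum class RenderTarget { Swapchain, Scene, GBuffer };

// Hypothetical snapshot of the state that decides what was bound.
struct FrameState {
    bool sceneTextureActive;  // assumed: true between scene texture begin/end
    bool deferredLighting;    // deferred lighting enabled
};

// Pick the framebuffer to rebind once shadow map rendering is done.
inline RenderTarget targetAfterShadowPass(const FrameState& state) {
    if (!state.sceneTextureActive) {
        return RenderTarget::Swapchain;
    }
    // With the scene texture active, deferred lighting decides between the
    // g-buffer framebuffer and the plain scene framebuffer.
    return state.deferredLighting ? RenderTarget::GBuffer
                                  : RenderTarget::Scene;
}
```

The case I was missing is the one where both flags are true, which has to resume on the g-buffer rather than the scene framebuffer.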

So sorry for the wild goose chase: SamuelCho@effb90a

@notimaginative

Contributor

YES!!!! 👍

That got it working for me! Tested on Mac and Linux with the same results. Great work!

The MSAA glitch is still present, but it's possible that it's unrelated and may well have been there for a while. I'll try to bisect that at some point this next week and see what I can find.

@notimaginative

Contributor

Posting this here for the record and for anyone that might be following along:

I created some new test builds based on current master (26.0.0-RC3) with all of the Vulkan changes. These are just for Win64 and macOS arm64, where the most testing can be done and the performance improvements can be seen. There is still the odd texture corruption bug in some missions but so far it's been fully playable for me.

Windows x64: https://pxo.nottheeye.com/files/test/fs2open/vulkan-test-Win64.zip
macOS (Apple Silicon only): https://pxo.nottheeye.com/files/test/fs2open/vulkan-test-Mac_arm64.tar.xz

My test branch is at https://github.com/notimaginative/fs2open.github.com/tree/vulkan-pr-FIXES in case you want to build for your own testing or another platform. You'll need to have the vulkan sdk installed (or relevant packages from your package manager), or use the scp-prebuilt Vulkan PR with FSO_PREBUILT_OVERRIDE, in order to build.

@The-E

The-E commented Jun 29, 2026

Member

Guys, can we create a new PR that is based on taylor's branch? That would make getting it ready for merge easier.

@SamuelCho

Contributor

Maybe taylor should make the new PR since he already has a branch rebased with his and my changes?

@notimaginative

Contributor

Yeah I was planning to do that since laanwj appears to be MIA. I was hoping to get the vulkan prebuilt PR merged first though so that I won't have to rebase again to get the PR checks working.

@The-E

The-E commented Jun 29, 2026

Member

That's entirely reasonable

@notimaginative

Contributor

This has been superseded by #7553.


Labels

discussion: This issue has (or wants) a discussion
feature: A totally new sort of functionality
graphics: A feature or issue related to graphics (2d and 3d)
Waiting for Stable: Marks a pull request that is to be merged after the next stable release, due to a release cycle

9 participants