kpet left a comment:
Thanks for this contribution! I have had a first look at this change. I have also made a few comments on the proposed extension specification, which I don't think we can consider settled/stable yet. Happy to keep iterating on both.
src/api.cpp
#endif // cl_ext_buffer_device_address
I have discussed releasing this extension with the Khronos OpenCL working group. The authors seem motivated to make it happen. In the meantime, I suggest you move all definitions that will be provided by the headers to src/cl_headers.hpp.
src/api.cpp
EXTENSION_ENTRYPOINT(clCreateCommandQueueWithPropertiesKHR),
EXTENSION_ENTRYPOINT(clGetKernelSuggestedLocalWorkSizeKHR),
{"clGetKernelSubGroupInfoKHR", FUNC_PTR(clGetKernelSubGroupInfo)},
{"clSetKernelArgDevicePointerEXT", FUNC_PTR(clSetKernelArgDevicePointerEXT_fn)},
Suggested change:
- {"clSetKernelArgDevicePointerEXT", FUNC_PTR(clSetKernelArgDevicePointerEXT_fn)},
+ {"clSetKernelArgDevicePointerEXT", FUNC_PTR(clSetKernelArgDevicePointerEXT)},
For consistency with other extensions. You could then just use EXTENSION_ENTRYPOINT, I think.
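A minimal sketch of why a stringifying macro helps here. The definitions of FUNC_PTR and EXTENSION_ENTRYPOINT below are hypothetical (clvk's real macros may differ), and `clFooEXT`/`clBarKHR` are dummy entry points: the table key is generated from the function symbol itself, so the name string and the function pointer cannot drift apart.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins; not clvk's actual definitions.
typedef int cl_int;
using func_ptr = void (*)();

// Dummy "extension functions" standing in for real entry points.
cl_int clFooEXT(int x) { return x + 1; }
cl_int clBarKHR(int x) { return x * 2; }

#define FUNC_PTR(f) reinterpret_cast<func_ptr>(f)
// #f stringifies the symbol, so the key always matches the function.
#define EXTENSION_ENTRYPOINT(f) {#f, FUNC_PTR(f)}

struct entrypoint {
    const char* name;
    func_ptr ptr;
};

static const entrypoint g_extension_entrypoints[] = {
    EXTENSION_ENTRYPOINT(clFooEXT),
    EXTENSION_ENTRYPOINT(clBarKHR),
};

// Linear lookup in the spirit of clGetExtensionFunctionAddress.
void* lookup_extension_function(const char* name) {
    for (const auto& e : g_extension_entrypoints) {
        if (std::strcmp(e.name, name) == 0) {
            return reinterpret_cast<void*>(e.ptr);
        }
    }
    return nullptr;
}
```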
src/api.cpp
ret = CL_INVALID_OPERATION;
break;
}
val_sizet = buffer->device_address();
The spec defines a new type, an alias of cl_ulong, which, unlike size_t, is guaranteed to be 64 bits wide.
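A small sketch of returning the address through a clGetMemObjectInfo-style query using the 64-bit type instead of size_t. The typedefs are local stand-ins for the OpenCL headers, and `write_device_address` is a hypothetical helper, not clvk code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Local stand-ins for the OpenCL typedefs (cl_ulong is 64-bit by spec).
typedef uint64_t cl_ulong;
typedef int32_t cl_int;
typedef cl_ulong cl_mem_device_address_ext;

static_assert(sizeof(cl_mem_device_address_ext) == 8,
              "device addresses are always 64 bits, even on 32-bit hosts");

constexpr cl_int CL_SUCCESS = 0;
constexpr cl_int CL_INVALID_VALUE = -30;

// Hypothetical helper: copy a device address into query output parameters.
cl_int write_device_address(cl_mem_device_address_ext addr,
                            size_t param_value_size, void* param_value,
                            size_t* param_value_size_ret) {
    if (param_value != nullptr && param_value_size < sizeof(addr)) {
        return CL_INVALID_VALUE; // caller's buffer too small for 64 bits
    }
    if (param_value != nullptr) {
        std::memcpy(param_value, &addr, sizeof(addr));
    }
    if (param_value_size_ret != nullptr) {
        *param_value_size_ret = sizeof(addr);
    }
    return CL_SUCCESS;
}
```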
src/kernel.hpp
#endif

typedef cl_ulong cl_mem_device_address_EXT;
Please move these definitions to cl_headers.hpp for now.
src/device.hpp
object_magic_header<object_magic::device> {
/// Map for storing device pointers to buffer pointers
/// Support cl_ext_buffer_device_address
std::unordered_map<void*, void*> device_to_buffer_map;
Those pointers would only be valid for a given context. I think we probably want to keep this state in cvk_context. Please use the specific types instead of void*.
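One way to keep this state per context, sketched with hypothetical types (`fake_buffer` stands in for `cvk_buffer`, `device_address_map` is not clvk code). It uses an ordered std::map so that, assuming an address may point inside a buffer rather than only at its base, the owning buffer can still be found; the snippet above uses an unordered_map keyed on exact pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

typedef uint64_t cl_mem_device_address_ext; // stand-in for the OpenCL typedef

// Hypothetical minimal buffer: a base device address and a size in bytes.
struct fake_buffer {
    cl_mem_device_address_ext address;
    size_t size;
};

// One instance per context: device addresses are only meaningful within
// the context that allocated the buffers.
class device_address_map {
public:
    void add(fake_buffer* buf) {
        std::lock_guard<std::mutex> lock(m_lock);
        m_map[buf->address] = buf;
    }
    void remove(fake_buffer* buf) {
        std::lock_guard<std::mutex> lock(m_lock);
        m_map.erase(buf->address);
    }
    // Returns the buffer whose range contains addr, or nullptr.
    fake_buffer* find(cl_mem_device_address_ext addr) const {
        std::lock_guard<std::mutex> lock(m_lock);
        auto it = m_map.upper_bound(addr); // first base strictly above addr
        if (it == m_map.begin()) return nullptr;
        --it;                              // greatest base <= addr
        fake_buffer* buf = it->second;
        return (addr - buf->address < buf->size) ? buf : nullptr;
    }

private:
    mutable std::mutex m_lock;
    std::map<cl_mem_device_address_ext, fake_buffer*> m_map;
};
```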
test-dev-buffer/main.cpp
std::cout << "All tests completed successfully\n";
return 0;
}
I'm guessing the intention is to move all the test code to tests/api/buffer_device_address.cpp, right?
src/kernel.hpp
// device pointer found in map, swapping the buffer pointer
auto buffer_ptr_raw = it->second;
auto buffer_ptr = reinterpret_cast<cvk_buffer*>(buffer_ptr_raw);
m_kernel_resources[arg.binding] = buffer_ptr;
This will break the existing argument setting for buffers. It might be easier/cleaner to introduce a new cvk_kernel::set_arg_device_address function (which would also have to mark the argument as set, as is done on line 361).
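A rough sketch of the suggested split, with hypothetical names (`kernel_args`, `fake_buffer`) rather than clvk's real cvk_kernel layout: the existing buffer path and a new device-address path each record the value and mark the argument as set, so neither disturbs the other. C++17 is assumed for std::variant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

typedef int32_t cl_int;
typedef uint32_t cl_uint;
typedef uint64_t cl_mem_device_address_ext;
constexpr cl_int CL_SUCCESS = 0;
constexpr cl_int CL_INVALID_ARG_INDEX = -49;

struct fake_buffer {}; // hypothetical stand-in for cvk_buffer

class kernel_args {
public:
    explicit kernel_args(size_t num_args)
        : m_values(num_args), m_is_set(num_args, false) {}

    // Existing path: a cl_mem argument bound as a buffer resource.
    cl_int set_arg(cl_uint index, fake_buffer* buf) {
        if (index >= m_values.size()) return CL_INVALID_ARG_INDEX;
        m_values[index] = buf;
        m_is_set[index] = true;
        return CL_SUCCESS;
    }

    // New path: a raw 64-bit device address; set_arg is left untouched.
    cl_int set_arg_device_address(cl_uint index, cl_mem_device_address_ext addr) {
        if (index >= m_values.size()) return CL_INVALID_ARG_INDEX;
        m_values[index] = addr;
        m_is_set[index] = true; // must be marked set, like the buffer path
        return CL_SUCCESS;
    }

    bool all_args_set() const {
        for (bool s : m_is_set) {
            if (!s) return false;
        }
        return true;
    }

    bool is_device_address(cl_uint index) const {
        return std::holds_alternative<cl_mem_device_address_ext>(m_values[index]);
    }

private:
    std::vector<std::variant<std::monostate, fake_buffer*, cl_mem_device_address_ext>> m_values;
    std::vector<bool> m_is_set;
};
```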
@kpet thank you for the review, I'll work on addressing your comments. Meanwhile, this is what I get when I try to execute an example: I'm not an LLVM expert, but I wanted to check with you first before investigating further.
Can you share the kernel source and the command line? Everything will be in clvk's log, which you can get by running with: then upload
The kernel source is a HIP matrix multiplication @rjodinchr
The error message you have posted is about
Oh, I understand what you meant by:
This application is using OpenCL SPIR-V source. This was not clear to me, as SPIR-V has two variants: the OpenCL one and the Vulkan one. While the goal of clvk is to compile whatever CL source it gets to Vulkan SPIR-V, it can also take OpenCL SPIR-V as input. But the log does not contain the OpenCL SPIR-V inputs. Could you share the sources then (I see two calls to
Hmm, only one got dumped for me:
I'll try to reproduce with that one then. Thank you.
I'm not able to reproduce. That should produce a
So the issue is that We can force This is because we have the following line in the OpenCL SPIR-V kernel: Which gets translated into the following LLVM IR (by Note that it is on Removing every trace of
Disabled it, and I was able to run a HIP example on Vulkan!
@pvelesko The extension specification has now been finalised and integrated into the main OpenCL spec and we also have CTS tests. Are you planning to pick up this PR at some point in the future? It's fine if you're not but it would be helpful to know so someone else could pick it up (though I'm not aware of anyone dying to do it at this time :)).
I will
- Switch from provisional to official cl_ext_buffer_device_address extension
- Add CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT (0x4200) property for clCreateBufferWithProperties
- Add CL_MEM_DEVICE_ADDRESS_EXT (0x4201) query for clGetMemObjectInfo
- Add clSetKernelArgDevicePointerEXT function to set kernel args with device pointers
- Add cl_mem_device_address_ext typedef (cl_ulong)
- Move extension definitions to cl_headers.hpp
- Add cvk_kernel::set_arg_device_address() method
- Store device_to_buffer_map in cvk_context
- Add BufferDeviceAddress test (basic increment)
- Add BufferDeviceAddressMatrixMultiply test (64x64 matmul with verification)
- Remove obsolete test-dev-buffer sample

Addresses PR kpet#742 comments:
- Use official Khronos extension enum values
- Proper extension versioning (0.3.0)
- Conditional extension advertisement based on device capability
- Forward declare clSetKernelArgDevicePointerEXT for extension entrypoint table
- Tests integrated into api_tests with REQUIRE_EXTENSION macro
@kpet Please test
@kpet ping
src/api.cpp
// Forward declaration for extension entrypoint
cl_int CLVK_API_CALL clSetKernelArgDevicePointerEXT(cl_kernel kernel, cl_uint arg_index, cl_mem_device_address_ext dev_addr);
Why is this necessary? We don't need to do this for any other extension function, do we?
if (!is_valid_kernel(kern)) {
    return CL_INVALID_KERNEL;
}
It seems you don't handle the CL_INVALID_OPERATION error condition. From the spec:
CL_INVALID_OPERATION if no devices in the context associated with kernel support the cl_ext_buffer_device_address extension.
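A minimal sketch of that check, assuming hypothetical stand-ins for the kernel, context and device objects (not clvk's cvk_* types): the call fails with CL_INVALID_OPERATION only when none of the context's devices advertises the extension.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

typedef int cl_int;
constexpr cl_int CL_SUCCESS = 0;
constexpr cl_int CL_INVALID_OPERATION = -59;

// Hypothetical stand-ins for device/context/kernel objects.
struct fake_device { std::vector<std::string> extensions; };
struct fake_context { std::vector<const fake_device*> devices; };
struct fake_kernel { const fake_context* context; };

bool device_supports(const fake_device* dev, const std::string& ext) {
    return std::find(dev->extensions.begin(), dev->extensions.end(), ext) !=
           dev->extensions.end();
}

// Fail only when *no* device in the kernel's context supports the extension.
cl_int check_device_address_support(const fake_kernel& kernel) {
    const auto& devs = kernel.context->devices;
    bool any = std::any_of(devs.begin(), devs.end(), [](const fake_device* d) {
        return device_supports(d, "cl_ext_buffer_device_address");
    });
    return any ? CL_SUCCESS : CL_INVALID_OPERATION;
}
```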
src/api.cpp
// Validate argument is a pointer type
// device pointer kernel argument of type __global is buffer type here so failure
// if (!arg.is_pod_pointer()) {
//     return CL_INVALID_ARG_VALUE;
// }
Suggested change: delete the lines above.
Remove?
src/cl_headers.hpp
// cl_ext_buffer_device_address extension definitions
// These will be provided by OpenCL-Headers once the extension is released
#ifndef cl_ext_buffer_device_address
#define cl_ext_buffer_device_address 1

typedef cl_ulong cl_mem_device_address_ext;

#define CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT 0x4200
#define CL_MEM_DEVICE_ADDRESS_EXT 0x4201
#define CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT 0x11B8

typedef cl_int (CL_API_CALL *clSetKernelArgDevicePointerEXT_fn)(
    cl_kernel kernel,
    cl_uint arg_index,
    cl_mem_device_address_ext arg_value);

#endif // cl_ext_buffer_device_address
Suggested change: delete the entire block above.
These definitions should now be present in the OpenCL headers.
src/context.hpp
#include "device.hpp"
#include "objects.hpp"
#include "unit.hpp"
#include "cl_headers.hpp"
src/context.hpp
// Map for storing device pointers to buffer pointers
// Support cl_ext_buffer_device_address
std::unordered_map<cl_mem_device_address_ext, cvk_buffer*> device_to_buffer_map;
Suggested change:
- std::unordered_map<cl_mem_device_address_ext, cvk_buffer*> device_to_buffer_map;
+ std::unordered_map<cl_mem_device_address_ext, cvk_buffer*> m_device_to_buffer_map;
src/device.cpp
if (supports_buffer_device_address()) {
    m_extensions.push_back(
        MAKE_NAME_VERSION(0, 3, 0, "cl_ext_buffer_device_address"));
Suggested change:
- MAKE_NAME_VERSION(0, 3, 0, "cl_ext_buffer_device_address"));
+ MAKE_NAME_VERSION(1, 0, 2, "cl_ext_buffer_device_address"));
Looks like the latest spec version is 1.0.2.
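For reference, a sketch of how the two versions encode. It re-declares the OpenCL 3.0 CL_MAKE_VERSION bit layout (10-bit major, 10-bit minor, 12-bit patch) locally so it compiles without the headers; `make_name_version` is a hypothetical stand-in for clvk's MAKE_NAME_VERSION, whose real definition may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef uint32_t cl_version;

// Same packing as CL_MAKE_VERSION: major << 22 | minor << 12 | patch.
constexpr cl_version make_version(uint32_t major, uint32_t minor, uint32_t patch) {
    return ((major & 0x3FFu) << 22) | ((minor & 0x3FFu) << 12) | (patch & 0xFFFu);
}
constexpr uint32_t version_major(cl_version v) { return v >> 22; }
constexpr uint32_t version_minor(cl_version v) { return (v >> 12) & 0x3FFu; }
constexpr uint32_t version_patch(cl_version v) { return v & 0xFFFu; }

// Mirrors the layout of cl_name_version (64-byte name buffer).
struct name_version {
    cl_version version;
    char name[64];
};

// Hypothetical helper in the spirit of MAKE_NAME_VERSION.
name_version make_name_version(uint32_t major, uint32_t minor, uint32_t patch,
                               const char* name) {
    name_version nv{}; // zero-filled, so the name stays NUL-terminated
    nv.version = make_version(major, minor, patch);
    std::strncpy(nv.name, name, sizeof(nv.name) - 1);
    return nv;
}
```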
src/kernel.hpp
#include "spirv/unified1/NonSemanticClspvReflection.h"

#include "cl_headers.hpp"
src/program.cpp
// auto size = parse_data->constants[inst->words[6]];
// parse_data->binary->add_workgroup_variable_size(size);
// break;
// }
Add buffer device address support including:
- clSetKernelArgDevicePointerEXT API entry point
- clSetKernelExecInfo for CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT
- CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT property/flag handling
- CL_MEM_DEVICE_ADDRESS_EXT query in clGetMemObjectInfo
- Device-to-buffer map for translating device addresses to cl_mem
- VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT for device address buffers
- Buffer device address fallback via VkPhysicalDeviceVulkan12Features
- pod_pointer arg handling for physical addressing mode
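A sketch of how the buffer property could drive the Vulkan usage flags, assuming the zero-terminated key/value list that clCreateBufferWithProperties takes. The constants are local copies (0x4200 from the commit above, and the Vulkan value of VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT); `parse_buffer_properties` and `usage_for` are hypothetical and reject any other key for brevity.

```cpp
#include <cassert>
#include <cstdint>

typedef uint64_t cl_mem_properties; // stand-in for the OpenCL typedef
typedef int32_t cl_int;
typedef uint32_t VkBufferUsageFlags;

constexpr cl_int CL_SUCCESS = 0;
constexpr cl_int CL_INVALID_PROPERTY = -64;
constexpr cl_mem_properties CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT = 0x4200;
constexpr VkBufferUsageFlags VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT = 0x00020000;

// Walks a {key, value, key, value, ..., 0} list and records whether the
// buffer was requested with a device address.
cl_int parse_buffer_properties(const cl_mem_properties* props, bool* wants_device_address) {
    *wants_device_address = false;
    if (props == nullptr) return CL_SUCCESS;
    for (; props[0] != 0; props += 2) {
        switch (props[0]) {
        case CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT:
            *wants_device_address = props[1] != 0; // cl_bool value
            break;
        default:
            return CL_INVALID_PROPERTY;
        }
    }
    return CL_SUCCESS;
}

// Adds the shader-device-address usage bit only when it was requested.
VkBufferUsageFlags usage_for(bool wants_device_address, VkBufferUsageFlags base) {
    return wants_device_address ? (base | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT) : base;
}
```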
Move ENABLE_SPIRV_IL and COMPILER_AVAILABLE defines so they can be enabled independently. Add CLVK_CLSPV_BIN and CLVK_LLVMSPIRV_BIN cache variables for external tool paths when building without the built-in compiler. Link SPIRV-Tools-opt and SPIRV-Tools-link when SPIRV IL is enabled.
Query VkPhysicalDeviceShaderAtomicInt64FeaturesKHR and report spv::CapabilityInt64Atomics as supported when shaderBufferInt64Atomics is available.
Add the IL build path: llvm-spirv converts SPIR-V IL to LLVM IR, then clspv compiles it to Vulkan SPIR-V. Support the link path with a link_input_files vector. Pass -lower-generic-address-space to clspv for IR input. Strip -physical-storage-buffers during individual compile steps to avoid crashes when linking transformed BC files. Respect keep_temporaries config throughout.
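A sketch of the argument lists for the two tools, using hypothetical helper names. `llvm-spirv -r` performs the reverse (SPIR-V to LLVM bitcode) translation; the clspv flags are the ones named in the commit message, and keeping `-physical-storage-buffers` on the final link step is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// llvm-spirv: SPIR-V IL -> LLVM bitcode ("-r" = reverse translation).
std::vector<std::string> llvm_spirv_args(const std::string& in_spv, const std::string& out_bc) {
    return {"-r", in_spv, "-o", out_bc};
}

// clspv invocation for IR input (hypothetical helper).
std::vector<std::string> clspv_args(std::vector<std::string> user_opts,
                                    const std::string& in_bc,
                                    const std::string& out,
                                    bool is_link_step) {
    if (!is_link_step) {
        // Individual compile steps drop -physical-storage-buffers so the
        // transformed bitcode can still be linked afterwards.
        user_opts.erase(std::remove(user_opts.begin(), user_opts.end(),
                                    std::string("-physical-storage-buffers")),
                        user_opts.end());
    }
    user_opts.push_back("-lower-generic-address-space"); // IR input
    user_opts.push_back(in_bc);
    user_opts.push_back("-o");
    user_opts.push_back(out);
    return user_opts;
}
```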
Parse NonSemanticClspvReflectionProgramScopeVariablesStorageBuffer in SPIR-V reflection to create backing buffers for externally_initialized globals promoted to StorageBuffer SSBOs. Build descriptor set layouts and bind the buffers at kernel dispatch time.
When clspv fails with -physical-storage-buffers, retry without it since programs that don't use device pointers work fine without the flag. This handles clspv crashes in PhysicalPointerArgsPass on certain kernel patterns like multi-module link with atomics.
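The retry logic can be sketched as follows, with a hypothetical `compile_fn` callback standing in for the actual clspv invocation.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using options = std::vector<std::string>;
// Hypothetical compiler callback: returns true on success.
using compile_fn = std::function<bool(const options&)>;

// Try with -physical-storage-buffers first; if that fails, retry once
// without it. Programs that never use device pointers still compile.
bool compile_with_fallback(const compile_fn& compile, options opts, bool* used_fallback) {
    *used_fallback = false;
    if (compile(opts)) return true;
    auto it = std::find(opts.begin(), opts.end(), "-physical-storage-buffers");
    if (it == opts.end()) return false; // nothing to strip: genuine failure
    opts.erase(it);
    *used_fallback = true;
    return compile(opts);
}
```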
Includes: SimplifyPointerBitcastPass crash fix, Int64Atomics capability, StripFloat64/LowerGenericAddressSpace passes, externally-initialized globals as StorageBuffer SSBOs, intel_reqd_sub_group_size propagation, PU3AS4→PU3AS1 builtin renaming, and --lower-generic-address-space flag.
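A naive string-level illustration of the PU3AS4 to PU3AS1 builtin renaming (generic to global address space in Itanium-mangled names), with a hypothetical helper name. It ignores Itanium substitution rules, so it only sketches the idea and is not the clspv change itself.

```cpp
#include <cassert>
#include <string>

// Rewrites every "PU3AS4" (pointer to address space 4, generic) into
// "PU3AS1" (pointer to address space 1, global) in a mangled name.
std::string rename_generic_to_global(std::string mangled) {
    const std::string from = "PU3AS4";
    const std::string to = "PU3AS1";
    for (std::string::size_type pos = mangled.find(from); pos != std::string::npos;
         pos = mangled.find(from, pos + to.size())) {
        mangled.replace(pos, from.size(), to);
    }
    return mangled;
}
```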
CMake 4.x may not propagate INTERFACE_COMPILE_DEFINITIONS to OBJECT library targets. Duplicate the definitions from clvk-config-definitions directly onto OpenCL-objects as a workaround.
Detect and enable VK_KHR_portability_enumeration at instance creation so that portability drivers (MoltenVK on macOS) are discoverable. Sets VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR flag.
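A sketch of the detection logic without a Vulkan dependency. The extension list would normally come from vkEnumerateInstanceExtensionProperties, the flag value mirrors VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR, and `enable_portability` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

typedef uint32_t VkInstanceCreateFlags;
// Mirrors VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR.
constexpr VkInstanceCreateFlags ENUMERATE_PORTABILITY_BIT = 0x00000001;
constexpr const char* PORTABILITY_EXT = "VK_KHR_portability_enumeration";

// If the loader reports the extension, enable it and return the flag to
// OR into VkInstanceCreateInfo::flags; otherwise return 0.
VkInstanceCreateFlags enable_portability(const std::vector<const char*>& available,
                                         std::vector<const char*>& enabled) {
    for (const char* name : available) {
        if (std::strcmp(name, PORTABILITY_EXT) == 0) {
            enabled.push_back(PORTABILITY_EXT);
            return ENUMERATE_PORTABILITY_BIT;
        }
    }
    return 0;
}
```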
Fix BitCastInst and stale GEP type handling in LowerGenericAddressSpacePass to prevent SIGSEGV on chipStar IR with inttoptr-to-AS4 patterns.
When chipStar passes pre-compiled Vulkan SPIR-V (GLCompute execution model) via clCreateProgramWithIL, detect it with is_vulkan_spirv() and load it directly into m_binary, skipping the llvm-spirv + clspv compilation pipeline. Also bypass parse_user_spec_constants() for Vulkan SPIR-V input, since spec constants are already embedded by the upstream compiler. is_vulkan_spirv() tells the two SPIR-V variants apart by scanning OpCapability instructions: capability 1 (Shader) means Vulkan, capability 6 (Kernel) means OpenCL.
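A self-contained sketch of that capability scan. The function name is hypothetical, and it assumes host-endian words (byte-swapped modules are not handled). OpCapability is opcode 17, each instruction's first word packs (word count << 16) | opcode, and OpCapability instructions come first after the 5-word header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t SPIRV_MAGIC = 0x07230203;
constexpr uint32_t OP_CAPABILITY = 17;
constexpr uint32_t CAP_SHADER = 1; // Vulkan flavour
constexpr uint32_t CAP_KERNEL = 6; // OpenCL flavour

bool is_vulkan_spirv_sketch(const std::vector<uint32_t>& words) {
    if (words.size() < 5 || words[0] != SPIRV_MAGIC) return false;
    size_t i = 5; // skip the header
    while (i < words.size()) {
        uint32_t word_count = words[i] >> 16;
        uint32_t opcode = words[i] & 0xFFFFu;
        if (word_count == 0 || i + word_count > words.size()) return false; // malformed
        if (opcode != OP_CAPABILITY) break; // past the capability section
        if (word_count >= 2 && words[i + 1] == CAP_SHADER) return true;
        if (word_count >= 2 && words[i + 1] == CAP_KERNEL) return false;
        i += word_count;
    }
    return false;
}
```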
Still WIP: the sample works but the test fails, though more tests fail even on main.