9 stars by bilzard written in Cuda

LLM training in simple, raw C/CUDA

Cuda · 29,839 stars · 3,579 forks · Updated Jun 26, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 7,214 stars · 971 forks · Updated May 8, 2026

cuGraph - RAPIDS Graph Analytics Library

Cuda · 2,169 stars · 351 forks · Updated May 8, 2026

GPU Accelerated t-SNE for CUDA with Python bindings

Cuda · 1,926 stars · 137 forks · Updated Oct 2, 2024

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda · 1,002 stars · 231 forks · Updated May 8, 2026

A shift-window based transformer for 3D sparse tasks

Cuda · 293 stars · 23 forks · Updated Jun 25, 2023

Implementation of fused cosine similarity attention in the same style as Flash Attention

Cuda · 220 stars · 12 forks · Updated Feb 13, 2023

This is a high-performance CUDA port of the CBOW model of word2vec.

Cuda · 43 stars · 41 forks · Updated Sep 14, 2014

cuSTSG is a GPU-enabled spatial-temporal Savitzky-Golay (STSG) program based on the Compute Unified Device Architecture (CUDA). Firstly, the cosine similarities between the annual NDVI time series …

Cuda · 16 stars · 5 forks · Updated Jul 20, 2022