
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 626 83 Updated Sep 11, 2024

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,235 128 Updated Mar 16, 2026

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 2,567 313 Updated Mar 13, 2026
Fortran 10 1 Updated Sep 14, 2023

GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing

C++ 15 1 Updated Sep 15, 2022

The Artifact of NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering

63 6 Updated Aug 11, 2024

LLM Inference analyzer for different hardware platforms

Jupyter Notebook 108 22 Updated Feb 17, 2026

Compare different hardware platforms via the Roofline Model for LLM inference tasks.

Jupyter Notebook 118 5 Updated Mar 13, 2024

Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).

C++ 156 123 Updated Mar 13, 2026

LLM inference in C/C++

C++ 98,105 15,528 Updated Mar 16, 2026

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

C++ 533 187 Updated Mar 12, 2026

Latency and Memory Analysis of Transformer Models for Training and Inference

Python 480 56 Updated Apr 19, 2025

This is the FreePDK45 V1.4 Process Development Kit for the 45 nm technology

HTML 32 2 Updated Feb 22, 2021

Serving multiple LoRA-finetuned LLMs as one

Python 1,145 61 Updated May 8, 2024

A benchmark suite for xillybus

VHDL 6 1 Updated Feb 21, 2016

An integrated power, area, and timing modeling framework for multicore and manycore architectures

C++ 215 81 Updated Aug 8, 2020

Python-based research interface for blackbox and hyperparameter optimization, based on the internal Google Vizier Service.

Python 1,633 110 Updated Feb 17, 2026

Hardware utilities with Spinal HDL

Scala 1 Updated Feb 22, 2022

Provide Python access to the NVML library for GPU diagnostics

Python 260 34 Updated Sep 5, 2025

Cavs: An Efficient Runtime System for Dynamic Neural Networks

C++ 15 3 Updated Sep 18, 2020

Yinghan's Code Sample

Cuda 363 62 Updated Jul 25, 2022

RISC-V Instruction Set Manual

TeX 4,532 801 Updated Mar 14, 2026

Deep learning toolkit-enabled VLSI placement

C++ 958 257 Updated Feb 19, 2026

Bridging polyhedral analysis tools to the MLIR framework

C++ 119 24 Updated Sep 9, 2023

Polyhedral High-Level Synthesis in MLIR

C++ 35 9 Updated Mar 17, 2023

Neural network graphs and training metrics for PyTorch, TensorFlow, and Keras.

Python 1,860 269 Updated Feb 11, 2024

Research and development for optimizing transformers

Python 131 16 Updated Feb 16, 2021

Reproduce Fast ConvNets @CVPR 2020

Python 1 Updated Sep 10, 2021