A library of Machine Learning kernels for the Cerebras Wafer Scale Engine (WSE)
Today, most computers follow the von Neumann architecture, chosen for the efficiency of this general-purpose model. Its greatest bottleneck, however, is data transfer: at high bandwidths, this architectural paradigm struggles to sustain throughput. Domain-specific architectures have been emerging and re-emerging, especially with the growing interest in AI and HPC applications; one such model is the dataflow architecture. Cerebras, a US-based company, designed an (affectionately) "dinner-plate"-sized chip based on this paradigm, called the Wafer Scale Engine (WSE), which achieves parallelism by emphasizing data transfer within the chip. This project introduces a library of mathematical kernels specifically adapted to the WSE.
This project is discussed in a BSc thesis, available here on the author's website.
The project consists of the Kernels folder, which has the following structure:
Kernels
┣ CereBlas_MM
┣ CereBlas_MV
┣ Collectives2D
┣ Convolution
┣ MultiConv
┣ Pooling
┗ Utilities
The CereBlas_MM, CereBlas_MV, Convolution, and Pooling folders contain the four main kernels; Collectives2D is an edited version of the library originally provided by Cerebras, with some custom adaptations; MultiConv contains an experiment in connecting the kernels together; Utilities contains miscellaneous code that facilitates use of the library.
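For reference, the mathematical operations the four main kernels compute can be sketched in plain Python. This is a minimal, hypothetical sketch of the *semantics* only — the actual kernels are written for the WSE and their implementations differ entirely; function names and conventions (row-major layout, "valid" convolution, non-overlapping pooling windows) are assumptions for illustration, not the library's API.

```python
# Hypothetical plain-Python reference semantics for the four kernels.
# These are NOT the WSE implementations -- only the operations
# each kernel is expected to compute, on nested-list matrices.

def matmul(A, B):
    """CereBlas_MM reference: C = A @ B for dense row-major matrices."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def matvec(A, x):
    """CereBlas_MV reference: y = A @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def conv2d(img, ker):
    """Convolution reference: 'valid' 2D cross-correlation."""
    H, W = len(img), len(img[0])
    h, w = len(ker), len(ker[0])
    return [[sum(img[i + di][j + dj] * ker[di][dj]
                 for di in range(h) for dj in range(w))
             for j in range(W - w + 1)]
            for i in range(H - h + 1)]

def max_pool2d(img, size):
    """Pooling reference: non-overlapping max pooling with a square window."""
    return [[max(img[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(img[0]) - size + 1, size)]
            for i in range(0, len(img) - size + 1, size)]
```

On the WSE, each of these operations is instead distributed over a grid of processing elements, with the data movement between neighbors being the central design concern; see each kernel's README for details.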
Each kernel folder contains a README file that explains in more detail how that kernel works, alongside some images.