Shuai YUAN edited this page Feb 4, 2026 · 8 revisions

Tool

Backpropagation


Yes you should understand backprop

Vanishing gradients on sigmoids

If you’re using sigmoid or tanh non-linearities in your network and you understand backpropagation, you should always be nervous about making sure that the initialization doesn’t cause them to be fully saturated.

import numpy as np  # W (weight matrix) and x (input vector) assumed defined

z = 1 / (1 + np.exp(-np.dot(W, x)))  # forward pass: sigmoid
dx = np.dot(W.T, z * (1 - z))        # backward pass: local gradient for x
dW = np.outer(z * (1 - z), x)        # backward pass: local gradient for W
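The saturation problem is easy to demonstrate: the sigmoid's local gradient z*(1-z) is at most 0.25, and collapses toward 0 when z is near 0 or 1. A minimal sketch (the large-scale initialization and the random W, x here are made-up values for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=10.0, size=(4, 4))  # overly large init -> saturated units
x = rng.normal(size=4)

z = 1 / (1 + np.exp(-np.dot(W, x)))  # forward pass: sigmoid
local_grad = z * (1 - z)             # peaks at 0.25 (z = 0.5), ~0 when saturated
print(local_grad)
```

With a sane initialization (e.g. scale ~ 1/sqrt(fan_in)) the pre-activations stay in the sigmoid's linear region and the local gradient stays usefully large.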

Dying ReLUs

z = np.maximum(0, np.dot(W, x)) # forward pass
dW = np.outer(z > 0, x) # backward pass: local gradient for W
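A ReLU whose pre-activations are all negative outputs zero, and because the backward mask z > 0 is also all zero, it receives no gradient and can never recover. A minimal sketch (the weights and inputs are made-up values that force every pre-activation negative):

```python
import numpy as np

W = -np.ones((3, 4))                 # hypothetical "dead" weights
x = np.array([1.0, 2.0, 3.0, 4.0])   # positive inputs -> all pre-activations < 0

z = np.maximum(0, np.dot(W, x))  # forward pass: all zeros
dW = np.outer(z > 0, x)          # backward pass: all zeros, so no update ever
print(dW)
```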

Exploding gradients in RNNs / LSTMs

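A common mitigation for exploding gradients is clipping by global norm: rescale all gradients together so their combined L2 norm never exceeds a threshold. A minimal NumPy sketch (the function name and the threshold 5.0 are illustrative choices, not from the source):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global L2 norm <= max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

g = clip_by_global_norm([np.full(4, 10.0)])  # norm 20 -> rescaled down to 5
```

Clipping the global norm (rather than each tensor independently) preserves the direction of the overall gradient vector.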

Performance Metrics (e.g. FLOPS)

Accuracy Metrics

Smooth L1 loss

NMS

Operators (OP)

Normalization

LayerNorm

Welford algorithm
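LayerNorm kernels often compute the mean and variance in a single pass with Welford's algorithm, which avoids the catastrophic cancellation of the naive E[x²] − E[x]² formula. A minimal sketch:

```python
def welford(xs):
    """One-pass, numerically stable mean and population variance."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the *updated* mean
    return mean, m2 / count

mean, var = welford([1.0, 2.0, 3.0, 4.0])  # -> (2.5, 1.25)
```

The per-element update also parallelizes well (partial (count, mean, m2) triples can be merged), which is why it appears in GPU reduction kernels.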

DeepNorm

RMSNorm

RNN

LSTM

ROI Pooling / ROI Align

RROI Pooling

RROI Align

Resize

Softmax

Online Softmax
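Online softmax fuses the running-max and normalizer reductions of the numerically stable softmax into one pass: when a new maximum appears, the accumulated sum is rescaled on the fly. A minimal sketch of the single-pass trick:

```python
import math

def online_softmax(xs):
    """Single-pass softmax: track running max m and rescaled sum d."""
    m, d = float("-inf"), 0.0
    for x in xs:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)  # rescale old sum
        m = m_new
    return [math.exp(x - m) / d for x in xs]

probs = online_softmax([1.0, 2.0, 3.0])
```

This is the same rescaling idea used by fused attention kernels, where the inputs stream in one tile at a time and a second full pass would be expensive.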

Softmax Optimization

approximation by grouping

ScatterND

Einsum

Activation

OP Fusion


Networks

Feedforward neural network

MTCNN

FDDB

Object Detection

Faster-RCNN

Yolo

Yolo v3

YOLOv12

YOLOv13

Image Segmentation

Mask-RCNN

PointRend

DCN

Darknet

Language

DFCNN-CTC


Point cloud segmentation

DGCNN (Dynamic Graph CNN for Learning on Point Clouds)

Environment Setup

Frameworks

Mini-Caffe

ncnn

Mobile

SeetaFace


