A streamlined implementation of YOLOv2 (You Only Look Once v2) object detection using MLX, optimized for Apple Silicon.
- YOLOv2 architecture with anchor boxes
- Training on PASCAL VOC dataset
- Memory-efficient training
- Real-time object detection capabilities
- Optimized for Apple Silicon using MLX
- Input resolution: 448x448 pixels
- Backbone: Darknet-19
- Output: 7x7 grid with 5 anchor boxes per cell
- Anchor box predictions: (x, y, w, h, confidence)
- Classes: 20 PASCAL VOC classes
- Loss: Multi-part loss function (classification, localization, confidence)
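The anchor-box parameterization above follows YOLOv2: the raw (x, y) outputs are squashed with a sigmoid and interpreted as offsets within their grid cell, while (w, h) scale the anchor priors exponentially. A minimal NumPy sketch of that decoding (function names and array layout are illustrative; the repository's actual code uses MLX):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_boxes(preds, anchors, grid=7):
    """Decode raw network outputs into normalized box coordinates.

    preds: (grid, grid, num_anchors, 5) array of (tx, ty, tw, th, t_conf).
    anchors: (num_anchors, 2) array of (w, h) priors in grid-cell units.
    Returns (grid, grid, num_anchors, 5) of (cx, cy, w, h, conf) in image coords.
    """
    gy, gx = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    cx = (sigmoid(preds[..., 0]) + gx[..., None]) / grid  # sigmoid offset within cell
    cy = (sigmoid(preds[..., 1]) + gy[..., None]) / grid
    w = anchors[:, 0] * np.exp(preds[..., 2]) / grid      # anchor prior scaled by exp(tw)
    h = anchors[:, 1] * np.exp(preds[..., 3]) / grid
    conf = sigmoid(preds[..., 4])                          # objectness confidence
    return np.stack([cx, cy, w, h, conf], axis=-1)
```

Constraining (x, y) to the owning cell is what makes training with anchors stable: each cell can only predict boxes centered inside itself.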
- model.py: YOLO model architecture
- loss.py: Loss function implementation
- simple_train.py: Training script
- simple_inference.py: Inference script
- compute_anchors.py: Utility for computing anchor boxes
- data/: Dataset handling utilities
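On the anchor utility: YOLOv2 derives its anchor priors by k-means clustering over the training boxes' (w, h) pairs, using 1 - IoU rather than Euclidean distance so that large and small boxes are weighted fairly. A NumPy sketch of that procedure (an illustration of the technique, not the repository's exact compute_anchors.py):

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between boxes and anchors compared by (w, h) only, corners aligned."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0] * boxes[:, 1]
    union = union[:, None] + anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=5, iters=50, seed=0):
    """Cluster (w, h) pairs (float array) with 1 - IoU as the distance."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]  # init from data
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # nearest = highest IoU
        for j in range(k):
            members = boxes[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)  # recenter each cluster
    return anchors
```

With k=5 this yields the 5 priors per cell used by the model above.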
- Development Mode:
  `python simple_train.py --mode dev`
- Uses a small subset of the data (10 images)
- Smaller batch size
- Faster iteration for testing changes
- Full Training:
  `python simple_train.py --mode full`
- Uses the complete dataset
- Larger batch size
- Regular validation and checkpointing
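The two modes can be wired up with a small argparse switch that maps --mode to a training configuration. A hedged sketch, where the concrete numbers (batch sizes, validation cadence) are illustrative placeholders rather than the script's actual values:

```python
import argparse

def build_config(argv=None):
    """Parse --mode and map it to training settings (values are illustrative)."""
    parser = argparse.ArgumentParser(description="YOLOv2 training (MLX)")
    parser.add_argument("--mode", choices=["dev", "full"], default="dev")
    args = parser.parse_args(argv)
    if args.mode == "dev":
        # Small subset and small batch: fast iteration while testing changes.
        return {"mode": "dev", "num_images": 10, "batch_size": 2,
                "validate_every": None}
    # Full dataset with regular validation and checkpointing.
    return {"mode": "full", "num_images": None, "batch_size": 16,
            "validate_every": 1}
```

Keeping both modes behind one flag means the dev path exercises the exact same training loop that full runs use.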
The model is trained on PASCAL VOC 2012 with 20 object classes:
- Vehicles: aeroplane, bicycle, boat, bus, car, motorbike, train
- Animals: bird, cat, cow, dog, horse, sheep
- Indoor Objects: bottle, chair, diningtable, pottedplant, sofa, tvmonitor
- People: person
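In code, these classes are usually indexed through a fixed, alphabetically ordered list, which is the common PASCAL VOC convention (the repository's actual ordering may differ):

```python
# The 20 PASCAL VOC class names in the conventional alphabetical order,
# used to index the model's class-probability output.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
CLASS_TO_INDEX = {name: i for i, name in enumerate(VOC_CLASSES)}
```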