We classify one label per playground scene (e.g., Transit, Social_People, Play_Object_Normal) using 2D skeletons and a person–object panoramic graph (MP-GCN).
MP-GCN models interactions: intra-person (body topology), person↔object (hands↔swing/hill), and inter-person (pelvis↔pelvis). It’s lightweight, privacy-friendly, and captures risk/furniture use better than a per-person attention model.
- Reference repo: MP-GCN
- Paper: “Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph”
- Presentation Slides: MP-GCN Playground
- Understanding MP-GCN: Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph
- Understanding MP-GCN original Codebase: Understanding MP-GCN Codebase
- MPGCN Data construction: MPGCN Data construction
- .npy Pose-Only Data: NPY Files
- Folds: FOLDS
- Late-Fusion Generated Data: Late-fusion folds & eval
- Full late-fusion workdir w/folds: Full fold data
This project proposes a lightweight framework for Group Activity Recognition (GAR) in playground environments using the Multi-Person Graph Convolutional Network (MP-GCN) architecture. Unlike traditional RGB-based methods that struggle with occlusions, visual noise, and privacy concerns, MP-GCN relies solely on 2D skeletal keypoints and object centroids to model interactions through a panoramic spatial–temporal graph. This representation captures intra-person body structure, inter-person coordination, and person–object relationships, such as children interacting with swings or slides, allowing the system to classify six scene categories (Transit, Social_People, Play_Object_Normal, Play_Object_Risk, Adult_Assisting, and Negative_Contact) efficiently and ethically. The implementation includes pose estimation and tracking (YOLOv11-pose), graph construction, and training of an adapted MP-GCN pipeline on short multi-person video segments. By emphasizing geometry and motion over appearance, the project aims to deliver a reproducible, real-time activity recognition model that enhances safety and behavioral understanding in public playgrounds while maintaining computational efficiency and privacy protection.
- `hello_mpgcn/`: vendor copy of MP-GCN kept intact with its `config/`, `src/`, and `main.py` entry for training/evaluation; use this when you need the canonical defaults or to compare against upstream.
- `src/`: playground-specific pipeline that wraps the vendor code. Highlights: `reader/` builds splits and tensors; `dataset/` defines graphs, augmentations, and the feeder; `model/MPGCN/` adds area embeddings; `processor.py` handles training/eval loops; `initializer.py` wires configs, samplers, and loss; `generator.py` is the CLI hook for data generation.
- `config/`: playground configs and metadata. `objects.yaml` holds camera/object centroids; `playground_gendata.yaml` drives tensor generation; `config/playground/*.yaml` are stream-specific train/eval configs (J, JM, B, BM, joint mixes).
- `data/`: raw CSVs, ROI geopackage, processed labels, and generated tensors in `data/npy/`; intermediate YOLO outputs stay under `data/temp/` and are ignored by Git.
- `script/`: utilities (`tensor_builder.py`, `cal_mean_std.py`, `ensemble_playground.py`) plus `script/notebooks/` organized by pipeline stage (scene-/skeleton-/tensor- prefixes per the notebooks README).
- `workdir/`: outputs from data generation and training (per-fold tensors, checkpoints, confusion matrices, notes). CI-like artifacts are kept here rather than under `data/`.
- Root files: `main.py` is the slim CLI entry that calls into `src/`; `automated_fusion_pipeline_v2.ipynb` and `late_fusion_model.ipynb` explore fusion/ensembling outside the main pipeline; `WORKPLAN.md` tracks experiment to-dos.
- Source clips and detections live under `data/` (`videos.csv`/`videos-trimmed.csv`, YOLO JSONs in `temp/`, tensors in `npy/`). The notebook `script/notebooks/tensor-builder.ipynb` converts merged detections into `pose_data.npy` and `_object_data.npy`. `Playground_Reader` (`src/reader/playground_reader.py`) normalizes tensors to `[T, O|M, …]`, learns `n_obj_max`, filters unused labels, and writes fixed-shape pose/object/area arrays plus label pickles to `workdir/fold_{ID}`.
- Split strategy: 5×5 repeated stratified K-fold, seeded for reproducibility, so every fold keeps class balance and per-camera variety. Fold IDs are stable across reruns; `{ID}` placeholders in configs resolve to the right `fold_{ID}` folder.
- People are ranked by motion energy via `select_top_m_people.py` to keep the top 4 movers (zero-padding the rest), aligning with the MP-GCN `M` limit and avoiding shape drift across splits.
- Regenerate tensors whenever labels, centroids, or sampling params change: `python hello_mpgcn/main.py -c config/playground_gendata.yaml -gd`.
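The motion-energy ranking can be sketched as follows. This is a minimal illustration, not the actual `select_top_m_people.py`; the `[T, N, V, C]` input layout and the function signature are assumptions for the example.

```python
import numpy as np

def select_top_m_people(pose, m_max=4):
    """Keep the M most active tracks, zero-padding the rest.

    pose: [T, N, V, C] array of N tracked people (assumed layout).
    Motion energy = summed frame-to-frame joint displacement per person.
    """
    T, N, V, C = pose.shape
    # Use only x/y coordinates; a third channel (confidence) is ignored.
    coords = pose[..., :2]
    # L2 displacement between consecutive frames, summed over time and joints.
    energy = np.linalg.norm(np.diff(coords, axis=0), axis=-1).sum(axis=(0, 2))
    keep = np.argsort(energy)[::-1][:m_max]     # indices of the top movers
    out = np.zeros((T, m_max, V, C), dtype=pose.dtype)
    out[:, :len(keep)] = pose[:, keep]          # zero-pad if fewer than m_max
    return out
```

Ranking by motion rather than detection confidence keeps the children actually playing (the most informative tracks) when a clip contains more than four detected people.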
- Pose tensors come in as `pose_data.npy` shaped `[T, M, V, C]` with `T=48` frames, up to `M=4` people, `V=17` joints, `C=3` coords.
- Object tensors land in `*_object_data.npy` shaped `[T, O, C]` where `O` is the per-fold `n_obj_max`; padding/truncation keeps time and objects static across clips.
- At train time, the feeder concatenates object nodes onto the joint dimension so the model sees `[C, T, V+O, M]` (and builds B/JM/BM streams from that).
- Graph adjacency uses COCO joints plus hands↔object links and optional pelvis↔pelvis inter-person links; area IDs thread through as embeddings to undo camera/site bias.
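The joint+object merge above can be sketched in a few lines. This is a simplified stand-in for the feeder, assuming the shapes listed above; the function name is illustrative.

```python
import numpy as np

def merge_pose_objects(pose, objs):
    """Sketch of the feeder's joint+object merge (assumed shapes).

    pose: [T, M, V, C]  person joints
    objs: [T, O, C]     object centroids, shared by all M persons
    returns: [C, T, V+O, M] as consumed by the ST-GCN backbone
    """
    T, M, V, C = pose.shape
    # Tile object nodes so every person "sees" the same object centroids.
    tiled = np.repeat(objs[:, None], M, axis=1)       # [T, M, O, C]
    merged = np.concatenate([pose, tiled], axis=2)    # [T, M, V+O, C]
    return merged.transpose(3, 0, 2, 1)               # [C, T, V+O, M]
```

Because objects are tiled per person, each skeleton gets its own hand↔object edges while the underlying centroid data is stored only once.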
- `src/reader/playground_reader.py`: end-to-end tensor builder for playground data. It tolerates both the legacy `_object.npy` and new `_object_data.npy` names, normalizes arbitrary object array shapes to `[T, O, C]`, scans the dataset to learn `n_obj_max`, writes fixed-shape pose/object/area tensors per fold, and derives `class2idx`. The repeated stratified splits keep all classes present in every fold.
- `src/dataset/graphs.py`: the new `playground` graph type extends COCO with `n_obj_max` object nodes. Each object connects to both hands, `(9, obj_i)` and `(10, obj_i)`, so the graph can reason about hand–object contact; multi-person variants add pelvis-pair links (configurable base joints) for inter-person context.
- `src/dataset/playground_feeder.py`: dataloader that merges pose and object tensors. It tiles object centroids across persons to fit the ST-GCN layout, supports per-stream selection (`J`, `JM`, `B`, `BM`, or `JVBM`), and enforces windowed shapes after augmentation. Area IDs are either read from `*_area.npy` or inferred from clip names (`columpios` vs. `unknown`) so downstream can compensate for camera bias.
- `src/dataset/augment.py`: augmentation keeps pose and objects aligned (temporal jitter/crop/drop/speed-scale plus joint/object jitter, translation, and scaling) to fight overfitting while preserving hand–object geometry.
- `src/model/MPGCN/nets.py`: injects area embeddings. After the branch stack, an embedding vector per area is concatenated as extra channels before the main stream, letting the model undo static camera/site bias without hard-coding it into labels.
- `src/dataset/utils.py`: `multi_input` understands aliases (`JOINT`, `JM`, `B`, `BM`, `JVBM`) and builds stacked multi-stream tensors from the augmented joint+object graph. `graph_processing` handles person flattening while respecting the larger vertex count from the playground graph.
- `src/initializer.py`: resolves `{ID}` placeholders so `fold_{ID}` paths point at the correct split, infers `num_object` from saved tensors when not provided, builds class-weighted losses plus a `WeightedRandomSampler` to counter class imbalance, and optionally enables hard-example mining in `src/processor.py`.
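The extended adjacency built by `src/dataset/graphs.py` can be sketched like this. It is a minimal illustration: the COCO edge list and wrist indices (9 = left, 10 = right) follow the standard COCO-17 skeleton, and the function name is hypothetical.

```python
import numpy as np

# Standard COCO-17 skeleton edges (joint index pairs).
COCO_EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (5, 6), (5, 7), (7, 9),
              (6, 8), (8, 10), (5, 11), (6, 12), (11, 12), (11, 13),
              (13, 15), (12, 14), (14, 16)]

def playground_adjacency(n_obj_max):
    """Sketch: 17 COCO joints plus n_obj_max object nodes, each object
    linked to both wrists so hand-object contact can propagate."""
    V = 17 + n_obj_max
    A = np.eye(V)                       # self-loops, as in ST-GCN graphs
    for i, j in COCO_EDGES:
        A[i, j] = A[j, i] = 1
    for k in range(n_obj_max):
        obj = 17 + k
        A[9, obj] = A[obj, 9] = 1       # left wrist <-> object k
        A[10, obj] = A[obj, 10] = 1     # right wrist <-> object k
    return A
```

Keeping the adjacency symmetric and adding self-loops matches the usual ST-GCN normalization; the pelvis-pair inter-person links mentioned above would be added the same way on the flattened multi-person graph.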
- Tensor generation: `python hello_mpgcn/main.py -c config/playground_gendata.yaml -gd` writes per-fold pose/object/area/label arrays under `workdir/`.
- Train or fine-tune: `python hello_mpgcn/main.py --config config/playground/mpgcn.yaml --gpus 0`; set `FOLD_ID` or `dataset_args.fold_id` to pick a split. Variants `mpgcn_J.yaml`, `mpgcn_B.yaml`, etc. select individual streams.
- Evaluate a checkpoint: `python hello_mpgcn/main.py --config config/playground/mpgcn.yaml --evaluate`.
- Notebook quick checks: `script/notebooks/tensor-builder.ipynb` for tensor sanity; `hello_mpgcn/hello_mpgcn.ipynb` to instantiate `Playground_Reader` and print `[C, T, V', M]` stats.
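Combining the per-stream checkpoints (J/JM/B/BM) by late fusion amounts to a weighted average of their class scores. A minimal sketch, assuming each stream's evaluation produced an `[N, num_class]` score array; the function name and weighting scheme are illustrative, not the actual `ensemble_playground.py`.

```python
import numpy as np

def late_fuse(stream_scores, weights=None):
    """Weighted average of per-stream class scores.

    stream_scores: list of [N, num_class] arrays (logits or softmax scores),
                   one per stream, aligned on the same N clips.
    weights: optional per-stream weights; defaults to uniform.
    Returns (predicted class per clip, fused scores).
    """
    scores = np.stack(stream_scores)                    # [S, N, num_class]
    w = np.ones(len(scores)) if weights is None else np.asarray(weights, float)
    fused = np.tensordot(w / w.sum(), scores, axes=1)   # [N, num_class]
    return fused.argmax(axis=1), fused
```

Per-stream weights can be tuned on validation folds; uniform weights are the usual baseline before any tuning.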