This repository presents a computer vision project focused on exercise repetition counting from video. The implemented work centers on pose-based and RGB-based experimentation over the LLSP exercise dataset, with particular emphasis on understanding when pose is effective, when RGB is more informative, and how those findings can be converted into a practical counting workflow.
The codebase includes data preparation, pose extraction, feature construction, model training, evaluation, reviewed hard-case analysis, and a scoped squat-only runtime prototype. Countix onboarding utilities are also included, but that branch is intentionally deferred and is not required for the main project conclusions.
In this README, LLSP refers to the local exercise-video dataset folder used by the project.
All commands in this README assume the current working directory is the folder containing this file.
Clone the submission repository, enter the project subfolder, create a virtual environment, and install dependencies:
git clone https://github.khoury.northeastern.edu/khouryquanxing/CS5330_SP26_Group1.git
cd CS5330_SP26_Group1/CV_Image_pose_detection-main
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements-pose.txt
Run the lightweight regression tests:
python tests/run_tests.py --list
python tests/run_tests.py all
Open the static project pages locally:
python3 -m http.server 8000
Then visit:
http://localhost:8000/index.html
Runtime examples that read LLSP videos or saved model outputs require the local dataset/artifact folders documented below:
- Data/LLSP/video/
- Data/LLSP/annotation_cleaned/
- artifacts/3_Modeling/training_outputs/
If those folders are not present after cloning, download the LLSP assets from the dataset link in this README or regenerate the artifacts using the staged notebooks and scripts in the run order below.
This project delivers:
- A complete research pipeline for exercise repetition counting from video
- Comparative evaluation across pose, RGB, multimodal, and routed counting branches
- Documented negative results, not only successful experiments
- A final exercise-dependent conclusion rather than a forced single-model answer
- A squat-only offline runtime and a live squat webcam prototype for practical demonstration
The strongest measured conclusion is that the best representation is exercise-dependent:
- squat: pose is the strongest branch
- push_up: RGB is the stronger branch
- pull_up: mixed and more sensitive to viewpoint, target-selection, and semantic ambiguity
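A minimal sketch of how that exercise-dependent conclusion turns into a routed counting surface; the branch labels and the route_count helper below are illustrative only and are not the repository's build_routed_count_predictions.py implementation:

# Illustrative routing table derived from the measured per-exercise results.
# The branch identifiers are descriptive labels, not module names in this repo.
BEST_BRANCH = {
    "squat": "pose_tcn",    # pose is the strongest squat branch
    "push_up": "rgb_tcn",   # RGB is the stronger push-up branch
    "pull_up": "pose_tcn",  # mixed results; the current routed choice uses pose
}

def route_count(exercise: str, counters: dict) -> float:
    """Dispatch a clip to the best measured branch for its exercise label."""
    branch = BEST_BRANCH.get(exercise)
    if branch is None:
        raise ValueError(f"No routed branch configured for exercise: {exercise}")
    return counters[branch]()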
For direct access to the main project surfaces:
- Project Home
- Architecture Results Dashboard
- EDA Dashboard
- Squat Runtime Prototype Page
- Live Squat Prototype Page
- Live Squat Runtime Script
- Hard-Case Review App
GitHub README files do not reliably play Google Drive videos inline, so the project uses direct links to the hosted video folders instead.
- LLSP Demo Videos on Google Drive
- Main LLSP Dataset Folder on Google Drive
- Live Squat Recordings on Google Drive
- Pose Overlay Demo on Google Drive
- Local Pose Overlay Demo Folder
- Local Pose Overlay Demo 1
- Local Pose Overlay Demo 2
These links are intended for video access and qualitative review. Reportable benchmark results remain the metrics and artifacts documented elsewhere in this repository.
The current repository includes:
- Cleaned LLSP-derived artifacts under Data/LLSP/annotation_cleaned
- Countix onboarding scaffolding under Data/Countix and artifacts/2_Data_preparation
- YOLO-based pose extraction to per-video .npy arrays
- Pose indexing and missing-only worklist generation
- Squat video quality auditing utilities
- Notebook-based modeling and evaluation workflows across multiple representation families
- A squat-only runtime counter that runs from video, pose arrays, or squat-feature arrays
- A live squat webcam prototype with overlay, counting display, and optional recording
The repository does not yet include:
- A production-ready webcam application
- A production-ready multi-exercise exercise-recognition and tracking layer
- A finalized multi-exercise packaged inference system
The smallest practical runtime surface in the repository is the squat-only counter.
Primary entry point: artifacts/3_Modeling/run_squat_counter.py
Supported inputs:
- --video-path: Run YOLO pose extraction, build squat features, then count repetitions
- --pose-path: Start from an existing [T, 51] pose .npy
- --feature-path: Start from an existing squat-feature .npy
Backend configuration:
- Default: Dedicated squat TCN from artifacts/3_Modeling/training_outputs/squat_tcn_l1_channels96
- Fallback/reference: Tuned FSM thresholds from the earlier squat tuning stage
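For intuition, a minimal sketch of the kind of threshold-based FSM rep counter the fallback path refers to; the depth signal, threshold values, and state labels are assumptions for this sketch, not the tuned values stored in the repository:

# Illustrative hysteresis FSM over a per-frame squat-depth signal.
def count_reps_fsm(depth_signal, down_thresh=0.6, up_thresh=0.4):
    """Count one rep per down->up cycle using two-threshold hysteresis."""
    state = "up"
    reps = 0
    for depth in depth_signal:
        if state == "up" and depth > down_thresh:
            state = "down"   # entered the bottom of the squat
        elif state == "down" and depth < up_thresh:
            state = "up"     # returned to standing: one full rep
            reps += 1
    return reps

The two separate thresholds prevent jitter around a single cut-off from being counted as extra repetitions.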
Example usage from the repository root:
source .venv/bin/activate
python3 artifacts/3_Modeling/run_squat_counter.py \
--video-path Data/LLSP/video/valid/train3946.mp4 \
--output-json artifacts/3_Modeling/training_outputs/train3946_squat_runtime.json \
--pretty
Force the TCN explicitly:
python3 artifacts/3_Modeling/run_squat_counter.py \
--feature-path Data/LLSP/annotation_cleaned/squat_features/train3946_squat_features.npy \
--counter-backend tcn \
--pretty
Use the FSM reference path explicitly:
python3 artifacts/3_Modeling/run_squat_counter.py \
--feature-path Data/LLSP/annotation_cleaned/squat_features/train3946_squat_features.npy \
--counter-backend fsm \
--pretty
This runtime is intentionally scoped to squat only. It is best described as an offline prototype rather than a production application.
The repository also includes a live squat webcam prototype intended for demonstration of the saved squat TCN in an interactive setting.
Primary entry point: artifacts/3_Modeling/run_live_squat_counter.py
Implemented runtime flow:
- Frame-wise YOLO pose extraction
- Tracked target selection derived from the offline pose extractor
- Movement-gated live squat counting from the rolling squat feature stream
- Live TCN support estimate from artifacts/3_Modeling/training_outputs/squat_tcn_l1_channels96
- Bounded rolling buffers to avoid unbounded session cost
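A minimal sketch of the bounded-rolling-buffer idea, assuming a fixed-size per-frame feature vector; the window size and names below are illustrative, not the prototype's actual configuration:

from collections import deque

import numpy as np

# Keep only the most recent N frames of squat features so a long live
# session never grows memory or inference cost without bound.
BUFFER_FRAMES = 256  # illustrative window size

feature_buffer = deque(maxlen=BUFFER_FRAMES)

def push_frame(frame_features: np.ndarray) -> np.ndarray:
    """Append one frame of features and return the current rolling window."""
    feature_buffer.append(frame_features)
    return np.stack(feature_buffer)  # shape [min(T, BUFFER_FRAMES), feature_dim]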
Example run:
source .venv/bin/activate
python3 artifacts/3_Modeling/run_live_squat_counter.py \
--mirror \
--tcn-device cpu
Start recording immediately on launch:
source .venv/bin/activate
python3 artifacts/3_Modeling/run_live_squat_counter.py \
--mirror \
--tcn-device cpu \
--auto-record
Controls:
- Ctrl+S: start recording the live overlay window to an .mp4
- e: stop recording and save the current video
- r: reset the current live session buffer and count
- q: quit the live window
Recorded videos are written under artifacts/3_Modeling/training_outputs/live_squat_recordings.
These files are qualitative demonstration captures of the live overlay and should not be described as benchmark results, reportable evaluation artifacts, or final experimental outputs.
This live path remains a squat-only research prototype. It demonstrates interactive inference, but it should not be presented as a production-ready webcam system.
The script-level tests are organized into grouped unittest suites (documented in tests/README.md):
- data_prep
- evaluation
- review
- runtime
- all
Quick usage from the repo root:
source .venv/bin/activate
python tests/run_tests.py --list
python tests/run_tests.py runtime
python tests/run_tests.py all
The full discovery command still works:
source .venv/bin/activate
python -m unittest discover -s tests -p 'test_*.py'
Current status:
- Implementation: Complete as an offline modular pipeline for data preparation, feature extraction, training, evaluation, and routed counting
- Repository access: Confirmed public by the project owner
- Dataset access: LLSP links are documented below; Countix remains deferred and optional
- Validation: Completed through staged Colab experiments across pose, RGB, multimodal, audit, and routed branches
- Script-level tests: unittest coverage exists for key helper modules, review tooling, and runtime paths
Important scope notes:
- This is a research-grade project repository, not a production deployment package
- Colab experiments provide empirical validation, but they do not replace formal unit tests
- The strongest conclusions are exercise-dependent rather than universal across all exercise types
Key strengths of the project:
- It extends beyond a single squat-only baseline into a comparative study across squat, pull_up, and push_up
- It records negative and inconclusive findings, not only favorable results
- It shows that representation choice should depend on the exercise rather than assuming pose or RGB will win universally
- It converts those findings into a practical routed counting surface
- It preserves a reproducible artifact structure so experiments remain inspectable and comparable
Project contributors:
- Linda Perez Penaranda: data preparation, modeling experiments, evaluation, documentation, and submission packaging
- Kunyi Shi
- Peihan Wang
- Quanxing Lu
If a more detailed contribution breakdown is needed for submission, this section can be extended with member-specific responsibilities.
.
├── Data/
│ ├── Countix/
│ │ ├── annotation_cleaned/ # optional Countix benchmark artifacts
│ │ └── video/ # optional local Countix videos
│ └── LLSP/
│ ├── annotation/ # original labels
│ ├── annotation_cleaned/ # cleaned labels and generated pose artifacts
│ ├── original_data/ # source references / download links
│ └── video/ # train, valid, test videos
├── artifacts/
│ ├── 1_EDA/ # dataset analysis notebooks and plots
│ ├── 2_Data_preparation/ # preparation notebooks
│ └── 3_Modeling/ # pose extraction, feature extraction, modeling
├── resources/ # project notes and study materials
└── requirements-pose.txt # Python dependencies for runnable scripts
This folder contains the exploratory data analysis work used to understand the RepCount / LLSP data before building the pipeline.
Main contents:
- 1_EDA_34.ipynb: primary EDA notebook
- Class distribution plots such as class_imbalance_train_valid.png
- Repetition and duration plots such as count_distribution.png and cycle_duration.png
- Per-exercise inspection PDFs such as squat_inspection.pdf, push_up_inspection.pdf, and pull_up_inspection.pdf
Purpose:
- Inspect the dataset visually
- Understand class imbalance
- Examine repetition count distributions
- Identify data quality issues before modeling
This folder contains the notebook used to clean labels and prepare the dataset contract used by later steps.
Main contents:
- 2_Data_Preparation_01.ipynb: label cleaning, split checks, and preparation workflow
- prepare_countix_manifest.py: normalize external Countix metadata into the repo contract
- COUNTIX_INTEGRATION.md: Countix onboarding guide for reusing the pose pipeline
Purpose:
- Clean and standardize the annotations
- Verify train / validation splits
- Keep Countix as a separate benchmark branch rather than silently merging it into LLSP
- Keep Countix deferred unless a later external-validation question requires it
- Produce the cleaned label tables used downstream by the pose and counting stages
- Feed the later generated artifacts under Data/LLSP/annotation_cleaned, such as pose_feature_index.csv, pose_sequence_index.csv, and squat_feature_index.csv
This folder contains the executable modeling pipeline and the Colab notebooks used for squat baselines and the newer all-exercises widening path.
Main contents:
- build_pose_feature_index.py: build pose_feature_index.csv or pose_feature_index_squat.csv
- build_remaining_pose_worklist.py: build pose_feature_index_remaining.csv for videos that still need pose extraction
- pose_feature_extraction.py: run YOLO pose extraction and write raw pose .npy arrays
- analyze_squat_video_quality.py: audit squat feature outputs and tag likely failure modes
- apply_validation_review_policy.py: apply the manual validation-review policy to a TCN predictions.csv
- bootstrap_count_confidence_intervals.py: estimate bootstrap confidence intervals for MAE, RMSE, and Within-1 from a counting predictions.csv
- 3_Model_Training_01.ipynb: baseline temporal training notebook from extracted pose features
- 4_All_Exercises_Pose_Extraction_Colab.ipynb: Colab stage for widening pose extraction to the remaining exercises
- 5_All_Exercises_Pose_Sequence_Preparation_Colab.ipynb: Colab stage for generic normalized pose-sequence preparation
- build_pose_sequence_dataset.py: build pose_sequence_index.csv and pose_sequence_summary.csv
- extract_rgb_frame_features.py: extract frozen RGB frame-feature sequences from raw videos for a controlled exercise subset
- train_pose_count_tcn.py: generic counting-only TCN trainer over normalized pose sequences, with optional exercise-specific keypoint weighting
- train_pose_count_density_tcn.py: density-based temporal counting TCN that predicts a non-negative repetition-density curve whose sum gives the final count
- train_pose_count_transformer.py: transformer-encoder counting trainer over normalized pose sequences that reuses the Stage 6 augmentation path and artifact contract
- train_rgb_count_tcn.py: counting-only TCN trainer over frozen RGB frame-feature sequences
- train_multimodal_count_tcn.py: simple late-fusion TCN trainer over paired pose sequences and RGB feature sequences
- build_routed_count_predictions.py: assemble an exercise-dependent counting surface from the best current run per exercise
- audit_counting_hard_cases.py: audit pose-vs-RGB validation rows with pose quality, video metadata, and issue tags
- build_hard_case_review_manifest.py: turn one or more 7D hard-case audit CSVs into a manual-review manifest with preserved annotations
- summarize_reviewed_hard_cases.py: aggregate the reviewed hard-case manifest into confirmed issue counts by exercise and issue type
- HARD_CASE_REVIEW_GUIDE.md: review taxonomy and tagging guide for filling hard_case_review_manifest.csv
- hard_case_review_app.html + hard_case_review_app.js: browser-based reviewer for the 7D hard-case manifest, with video playback and multi-select issue tags
- hard_case_review_server.py: tiny local review server with CSV save/load endpoints for the browser review app
- compare_count_run_to_baseline.py: compare a finished counting run against a trivial train-split count baseline
- register_experiment.py: append or update a row in experiment_registry.csv, optionally deriving the result string from metrics_summary.json
- EXPERIMENT_SHOWCASE.md: compact narrative summary of the main experiments, decisions, and current routed direction
- experiment_registry.csv: flat registry table of the main experiments and their decisions
- ARCHITECTURE_RESULTS_MATRIX.md: presentation-ready comparison of the architecture families and their measured outcomes
- architecture_results_long.csv: long-form architecture-by-exercise result table for sorting or charting
- 6_All_Exercises_Counting_Baseline_Colab.ipynb: Colab stage for per-exercise counting baselines on generic pose sequences
- 6B_Per_Exercise_SeqLen_Sweep_Colab.ipynb: Colab stage for exercise-by-exercise sequence-length sweeps starting from the frozen shared baseline
- 6C_Per_Exercise_Keypoint_Weighting_Colab.ipynb: Colab stage for exercise-specific keypoint weighting after the 6B temporal sweep
- 6D_Per_Exercise_Density_Counting_Colab.ipynb: Colab stage for explicit temporal density counting after the scalar TCN, 6B, and 6C experiments
- 7_RGB_Counting_Baseline_Colab.ipynb: Colab stage for the first controlled RGB-vs-pose comparison on squat, pull_up, and push_up
- 7C_Representation_Fit_Analysis_Colab.ipynb: Colab stage for checking whether RGB wins specifically where pose quality is weaker
- 7B_Stronger_RGB_Backbone_Colab.ipynb: Colab stage for a stronger RGB backbone comparison after the initial Stage 7 RGB baseline
- 7D_Hard_Case_Data_Audit_Colab.ipynb: Colab stage for tagging likely visibility, ambiguity, and representation-mismatch failures in the pose-vs-RGB subset
- 7E_Multimodal_Pose_RGB_Fusion_Colab.ipynb: Colab stage for a simple late-fusion pose+RGB comparison against the best single-modality branches
- 8_Exercise_Dependent_Counting_Colab.ipynb: Colab stage for building a practical routed counting surface from the best current branch per supported exercise
- 9_Pose_Transformer_Colab.ipynb: Colab stage for trying a pose-sequence transformer with the same augmentation and comparison contract as the pose TCN runs
- 9B_Pose_Transformer_Augmentation_Ablation_Colab.ipynb: Colab stage for checking whether the current pose-sequence augmentation settings help or hurt transformer validation results
- 10_PullUp_Dedicated_Pose_Colab.ipynb: Colab stage for the first dedicated per-exercise pose-tuning follow-up beyond squat, focused on pull_up
- 11_Reportable_Confidence_Intervals_Colab.ipynb: Colab stage for running bootstrap confidence intervals on the final reportable counting runs
- Data/countix_full_colab.ipynb: optional Countix subset download notebook, currently deferred from the main experiment flow
- 6_Squat_Rep_Counting_Colab.ipynb: Colab stage for FSM-based rep counting and evaluation
- YOLO_PIPELINE.md, YOLO_POSE_STAGE.md, COLAB_SQUAT_POSE.md: runbooks and stage documentation
- COLAB_ALL_EXERCISES_POSE.md: runbook for widening pose extraction to the remaining exercises
- COLAB_RGB_COUNTING.md: runbook for the controlled Stage 7 RGB-vs-pose comparison
Purpose:
- Move from cleaned labels to pose features
- Convert raw pose into either generic normalized pose sequences or squat-specific engineered features
- Run counting baselines and evaluate the squat branch and widened sequence branch
- Support alternative training experiments from extracted features
Dataset access for the LLSP project data:
- Main LLSP dataset folder on Google Drive: https://drive.google.com/drive/folders/1NUiY4bCTy_zGmJ8AECBcAIpqee5g8F_g?usp=share_link
- LLSP video folder on Google Drive: https://drive.google.com/drive/folders/1ThJeuWPunxmXeUiUak_v11itO7f06UV-?usp=share_link
- Live squat recording demos on Google Drive: https://drive.google.com/drive/folders/1qfd36TFg2N9x2IvZzQyB-5_HhueOjXR6?usp=share_link
- Contents: Exercise videos and the related annotation files used by the pipeline
- Expected contents:
- Raw exercise videos
- Original annotation CSVs
- Generated pose and feature artifacts used by the current pipeline
- Notes: Countix is not required for the main project flow and remains deferred as a separate benchmark branch
Original LLSP split annotations checked into this workspace:
- Data/LLSP/annotation/train.csv: 758 rows
- Data/LLSP/annotation/valid.csv: 131 rows
- Data/LLSP/annotation/test.csv: 152 rows
Generated local artifacts under Data/LLSP/annotation_cleaned include:
- pose_feature_index.csv
- pose_sequence_index.csv
- rgb_feature_index_selected.csv
- rgb_feature_index_resnet50_selected.csv
- squat_feature_index.csv
- squat_feature_summary.csv
- squat_rep_count_results.csv
- squat_rep_count_results_tuned.csv
Current synced pose coverage in this workspace:
- Total indexed pose rows: 1041
- Total local pose feature files: 1003
- All supported exercise classes except others currently have local pose artifacts
- Squat pose features currently exist for 135 / 135 indexed squat videos
- The only indexed rows without local pose files are the 38 others rows
| Stage | Main entry point | Primary output artifact |
|---|---|---|
| EDA | artifacts/1_EDA/1_EDA_34.ipynb | dataset plots and inspection PDFs |
| Data preparation | artifacts/2_Data_preparation/2_Data_Preparation_01.ipynb | cleaned label tables used to derive downstream annotation_cleaned artifacts |
| Pose indexing | artifacts/3_Modeling/build_pose_feature_index.py | pose_feature_index.csv |
| Pose extraction | artifacts/3_Modeling/pose_feature_extraction.py | raw pose .npy files, pose_extraction_report.csv, pose_extraction_summary.json |
| Pose sequences | artifacts/3_Modeling/build_pose_sequence_dataset.py | pose_sequence_index.csv, pose_sequence_summary.csv |
| Shared pose baselines | artifacts/3_Modeling/6_All_Exercises_Counting_Baseline_Colab.ipynb | per-run training_outputs/<run_name>/metrics_summary.json and predictions.csv |
| RGB branch | artifacts/3_Modeling/7_RGB_Counting_Baseline_Colab.ipynb and 7B_Stronger_RGB_Backbone_Colab.ipynb | RGB feature directories and RGB training_outputs artifacts |
| Audits | artifacts/3_Modeling/7C_Representation_Fit_Analysis_Colab.ipynb and 7D_Hard_Case_Data_Audit_Colab.ipynb | representation-fit summaries and hard-case audit CSV/JSON artifacts |
| Routed counting | artifacts/3_Modeling/8_Exercise_Dependent_Counting_Colab.ipynb | routed_predictions.csv, routed_metrics_summary.json, routing_summary.csv |
| Experiment registry | artifacts/3_Modeling/register_experiment.py | experiment_registry.csv |
Create and activate a virtual environment, then install the project dependencies:
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements-pose.txt
Current Python dependencies for the runnable scripts:
- numpy
- opencv-python
- pandas
- pillow
- torch
- torchvision
- ultralytics
Optional tools used by the audit workflow:
- ffmpeg
- ffprobe
To run the lightweight script-level tests:
python3 -m unittest discover -s tests -p 'test_*.py'
These tests complement, but do not replace, the staged Colab experiment validation used throughout the project.
Validation currently happens at two levels:
- Experiment-level validation through the staged Colab runs across pose, RGB, multimodal, audit, and routed branches
- Lightweight script-level testing through unittest suites for data preparation helpers, manifest normalization, experiment-registry and routing utilities, hard-case review tooling, and squat runtime/live-counter paths
This gives the project:
- Empirical validation on real dataset subsets
- Basic regression protection for core utility and artifact-building code paths
The pose extraction scripts use the YOLO pose checkpoint currently stored at:
artifacts/3_Modeling/yolo11n-pose.pt
This file is already present in the workspace.
The project has one main squat-focused path and one optional experimental branch.
Run these in order:
1. artifacts/1_EDA/1_EDA_34.ipynb: Use this first if you want to understand the dataset and class distributions before building features.
2. artifacts/2_Data_preparation/2_Data_Preparation_01.ipynb: Produces the cleaned annotations used by the later stages.
3. artifacts/3_Modeling/build_pose_feature_index.py: Build the squat-only index from the cleaned annotations.
4. artifacts/3_Modeling/4_All_Exercises_Pose_Extraction_Colab.ipynb: Reads videos and writes raw pose arrays in pose_features/ for the remaining exercises.
5. artifacts/3_Modeling/5_All_Exercises_Pose_Sequence_Preparation_Colab.ipynb: Reads pose_features/ and writes generic normalized pose sequences in pose_sequences/, plus pose_sequence_index.csv and pose_sequence_summary.csv.
6. artifacts/3_Modeling/6_All_Exercises_Counting_Baseline_Colab.ipynb: Reads pose_sequence_index.csv and trains counting-only TCN baselines on normalized pose sequences, reporting per-exercise MAE, RMSE, and Within-1.
7. artifacts/3_Modeling/6_Squat_Rep_Counting_Colab.ipynb: Historical squat-specific FSM notebook retained for the original single-exercise branch.
8. artifacts/3_Modeling/analyze_squat_video_quality.py: Optional audit step after feature extraction when you want to inspect difficult squat videos or diagnose pose/feature quality issues.
9. artifacts/3_Modeling/apply_validation_review_policy.py: Optional post-evaluation step after TCN training when you want to apply the reviewed keep/flag/exclude policy to the latest predictions.csv artifact and export filtered validation metrics.
When you are ready to move beyond the frozen squat-only branch:
1. artifacts/3_Modeling/build_pose_feature_index.py: Build the full multi-exercise pose index.
2. artifacts/3_Modeling/build_remaining_pose_worklist.py: Compare the full index against existing .npy artifacts and write the missing-only worklist.
3. artifacts/3_Modeling/pose_feature_extraction.py: Run YOLO pose extraction on pose_feature_index_remaining.csv to cover the remaining exercises.
4. artifacts/3_Modeling/5_All_Exercises_Pose_Sequence_Preparation_Colab.ipynb: Convert the raw YOLO pose arrays into normalized generic sequences and write pose_sequence_index.csv.
5. artifacts/3_Modeling/6_All_Exercises_Counting_Baseline_Colab.ipynb: Train counting-only per-exercise TCN baselines on the normalized pose sequences.
6. artifacts/3_Modeling/6B_Per_Exercise_SeqLen_Sweep_Colab.ipynb: Sweep seq_len values for the most promising exercises before changing representations or adding exercise-specific weighting.
7. artifacts/3_Modeling/6C_Per_Exercise_Keypoint_Weighting_Colab.ipynb: Reuse the best seq_len per exercise from 6B and test exercise-specific keypoint emphasis without rebuilding Stage 5.
8. artifacts/3_Modeling/6D_Per_Exercise_Density_Counting_Colab.ipynb: Reuse the best seq_len per exercise from 6B, but switch the counting formulation from direct scalar regression to temporal density prediction (see the density sketch after this list).
9. artifacts/3_Modeling/7_RGB_Counting_Baseline_Colab.ipynb: Extract frozen RGB features for squat, pull_up, and push_up, then train RGB TCN baselines and compare them directly against the best pose 6B runs.
10. artifacts/3_Modeling/COLAB_ALL_EXERCISES_POSE.md: Use this runbook when executing the widening step in Colab.
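For intuition, a minimal sketch of the density-counting formulation used in 6D, where the model emits a non-negative per-frame repetition density and the final count is its sum; the density values below are made up for illustration, not model outputs:

import numpy as np

# Two repetitions, each spread across several frames of the density curve.
density = np.array([0.0, 0.1, 0.3, 0.4, 0.2, 0.0,
                    0.0, 0.2, 0.5, 0.3, 0.0])
predicted_count = float(density.sum())   # 2.0 repetitions
print(predicted_count)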
If you do not want to use Colab for pose extraction, the local script path is:
- build_pose_feature_index.py
- pose_feature_extraction.py
- downstream Colab or notebook stages for squat features and rep counting
artifacts/3_Modeling/3_Model_Training_01.ipynb is a separate experimental branch for training a temporal regressor from extracted pose features. It is not the main squat FSM pipeline and should be treated as an alternative modeling path.
After producing a new predictions.csv from the TCN training stage, apply the reviewed validation policy:
python3 artifacts/3_Modeling/apply_validation_review_policy.py \
--predictions-csv artifacts/3_Modeling/training_outputs/<run_name>/predictions.csv \
--review-csv artifacts/3_Modeling/validation_failure_review.csv
This writes:
- policy_filtered_metrics_summary.json
- policy_filtered_valid_predictions.csv
next to the supplied predictions.csv, so the project keeps both:
- The raw validation metrics
- The filtered view that excludes confirmed upstream failures and tags reviewed hard cases
After producing a final predictions.csv, estimate uncertainty on the reported metrics:
python3 artifacts/3_Modeling/bootstrap_count_confidence_intervals.py \
--predictions-csv artifacts/3_Modeling/training_outputs/<run_name>/predictions.csv \
--exercise squat \
--split valid \
--bootstrap-samples 5000 \
--seed 7
This writes bootstrap_confidence_intervals.json beside the selected predictions.csv, including:
- Point estimates for MAE, RMSE, and Within-1
- Percentile bootstrap confidence intervals for the same metrics
- The row count and bootstrap configuration used
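A minimal sketch of the percentile-bootstrap idea behind that script, assuming a predictions table with true_count and predicted_count columns; the column names and resampling setup are illustrative, not the script's actual implementation:

import numpy as np
import pandas as pd

def bootstrap_mae_ci(df: pd.DataFrame, n_samples: int = 5000, seed: int = 7):
    """Percentile bootstrap CI for MAE over per-video counting errors."""
    rng = np.random.default_rng(seed)
    errors = np.abs(df["predicted_count"] - df["true_count"]).to_numpy()
    point_estimate = errors.mean()
    # Resample videos with replacement and recompute MAE each time.
    resampled = rng.choice(errors, size=(n_samples, len(errors)), replace=True)
    maes = resampled.mean(axis=1)
    low, high = np.percentile(maes, [2.5, 97.5])
    return point_estimate, (low, high)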
To turn the heuristic 7D audit into a confirmed review layer, first build a manual-review manifest:
python3 artifacts/3_Modeling/build_hard_case_review_manifest.py \
--audit-csv artifacts/3_Modeling/training_outputs/rgb_count_tcn_squat_seq256/hard_case_audit.csv \
--audit-csv artifacts/3_Modeling/training_outputs/rgb_count_tcn_pull_up_seq192/hard_case_audit.csv \
--audit-csv artifacts/3_Modeling/training_outputs/rgb_count_tcn_push_up_seq128/hard_case_audit.csv \
--output-csv artifacts/3_Modeling/training_outputs/hard_case_review_manifest.csv
Then fill the manual columns in hard_case_review_manifest.csv, especially:
- manual_review_status
- manual_primary_issue
- manual_issue_tags
- manual_target_person_ok
- manual_count_label_ok
- manual_rep_definition_ambiguous
- manual_visibility_issue_confirmed
- manual_pose_failure_confirmed
- manual_rgb_context_advantage_confirmed
- manual_keep_for_report
- manual_notes
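If you prefer scripted edits over the browser app, a minimal pandas sketch for recording one review decision; the video_name selector, the row value, and the tag values are assumptions for illustration, not entries from the actual manifest:

import pandas as pd

manifest_path = "artifacts/3_Modeling/training_outputs/hard_case_review_manifest.csv"
df = pd.read_csv(manifest_path)

# Hypothetical review decision for a single clip; adjust the selector and
# values to match your own manifest columns and review pass.
mask = df["video_name"] == "train3946.mp4"   # assumed column name and row
df.loc[mask, "manual_review_status"] = "reviewed"
df.loc[mask, "manual_primary_issue"] = "visibility"
df.loc[mask, "manual_keep_for_report"] = True

df.to_csv(manifest_path, index=False)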
For a consistent issue taxonomy, use artifacts/3_Modeling/HARD_CASE_REVIEW_GUIDE.md.
If you want a browser UI for the selected hard cases instead of editing the CSV directly:
- Start the review server from the repo root:
cd CV_Image_pose_detection
python3 artifacts/3_Modeling/hard_case_review_server.py --port 8000
- Open:
http://localhost:8000/artifacts/3_Modeling/hard_case_review_app.html
- Load the manifest from:
artifacts/3_Modeling/training_outputs/hard_case_review_manifest.csv
If you prefer a static read-only setup, python3 -m http.server 8000 still works,
but backend save/load requires hard_case_review_server.py.
The app lets you:
- Click through the selected hard-case rows
- Watch the corresponding video
- Overlay the saved pose keypoints and skeleton lines on top of the video
- Show a playback HUD with the clip-level annotations and audit fields
- Inspect the original L* repetition intervals from Data/LLSP/annotation/{train,valid,test}.csv
- Choose one manual_primary_issue
- Assign multiple secondary manual_issue_tags
- Edit the remaining manual_* review fields
- Save the current review through the local review server
- Export an updated CSV for summarize_reviewed_hard_cases.py
The pose overlay uses the raw pose_features/*.npy files from Stage 4. Because
those arrays contain only frames with successful pose extraction, the overlay is
time-aligned approximately by playback progress rather than exact frame index.
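A minimal sketch of that approximate alignment, assuming a pose array that covers only successfully extracted frames; the function and variable names are illustrative, not the app's implementation:

import numpy as np

def pose_frame_for_playback(pose_array: np.ndarray,
                            current_time_s: float,
                            video_duration_s: float) -> np.ndarray:
    """Pick the pose row whose relative position matches playback progress."""
    progress = min(max(current_time_s / video_duration_s, 0.0), 1.0)
    index = min(int(progress * len(pose_array)), len(pose_array) - 1)
    return pose_array[index]   # one 51-value frame: 17 keypoints x (x, y, confidence)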
The annotation HUD is clip-level, not frame-level. It shows fields such as:
- Exercise label
- Split
- True count
- Pose prediction
- RGB prediction
- Model outcome
- Current playback time
The original LLSP annotation intervals are loaded from the raw split CSVs and shown as frame ranges plus approximate seconds using the audited FPS value. The currently active interval is highlighted while the video plays.
When served from the repo root, the default video base path in the app is:
/Data/LLSP/video/
That maps to the local folder:
Data/LLSP/video
After review, summarize the confirmed issues:
python3 artifacts/3_Modeling/summarize_reviewed_hard_cases.py \
--review-csv artifacts/3_Modeling/training_outputs/hard_case_review_manifest.csv
This writes:
- reviewed_hard_case_summary.json
- reviewed_hard_case_primary_issues.csv
The goal is to distinguish confirmed data-side problems, label ambiguity, and true model failures rather than relying only on heuristic 7D buckets.
Generate an index for every cleaned sample:
python3 artifacts/3_Modeling/build_pose_feature_index.py
Generate a squat-only index:
python3 artifacts/3_Modeling/build_pose_feature_index.py \
--exercise squat \
--output-csv Data/LLSP/annotation_cleaned/pose_feature_index_squat.csv
The generated CSV maps each video name to a target .npy output path and preserves the exercise label, split, and rep count.
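A minimal sketch of inspecting that index with pandas before wiring it into downstream steps; the expectations in the comments restate the contract above rather than verified column names:

import pandas as pd

index_path = "Data/LLSP/annotation_cleaned/pose_feature_index_squat.csv"
index = pd.read_csv(index_path)

# Expect columns covering the video name, target .npy path, exercise, split, and count.
print(index.columns.tolist())
print(index.head())
print(f"{len(index)} indexed squat videos")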
Generate the full multi-exercise index:
python3 artifacts/3_Modeling/build_pose_feature_index.py \
--output-csv Data/LLSP/annotation_cleaned/pose_feature_index.csv
Build the missing-only worklist for the remaining exercises:
python3 artifacts/3_Modeling/build_remaining_pose_worklist.py \
--exclude-exercise others
Run extraction from an existing index:
python3 artifacts/3_Modeling/pose_feature_extraction.py \
--index-csv Data/LLSP/annotation_cleaned/pose_feature_index.csv \
--video-dir Data/LLSP/video \
--model artifacts/3_Modeling/yolo11n-pose.pt
Useful debugging example:
python3 artifacts/3_Modeling/pose_feature_extraction.py \
--index-csv Data/LLSP/annotation_cleaned/pose_feature_index_squat.csv \
--video-dir Data/LLSP/video \
--model artifacts/3_Modeling/yolo11n-pose.pt \
--max-videos 5 \
--overwrite
What the extractor does for each frame:
- opens the video with OpenCV
- runs YOLO pose inference
- selects the primary person
- stores 17 keypoints with x, y, and confidence
- flattens each frame into a 51-value feature vector
Output format:
- One .npy file per video
- Array shape: [T, 51]
- Fallback shape when no pose is found: [1, 51] filled with zeros
Generated outputs are written under Data/LLSP/annotation_cleaned/pose_features together with:
- pose_extraction_report.csv
- pose_extraction_summary.json
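A minimal sketch of reading one of those pose arrays back into a per-frame keypoint view, assuming the [T, 51] layout described above; the file name is illustrative, not the extractor's actual naming scheme:

import numpy as np

# Illustrative path; substitute any generated pose array.
pose = np.load("Data/LLSP/annotation_cleaned/pose_features/train3946.npy")
print(pose.shape)                              # [T, 51] per the extractor contract

keypoints = pose.reshape(len(pose), 17, 3)     # [T, 17, (x, y, confidence)]
mean_confidence = keypoints[:, :, 2].mean()
print(f"mean keypoint confidence: {mean_confidence:.3f}")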
The audit script joins summary statistics with local videos and tags common failure modes such as low confidence, poor lower-body visibility, or portrait framing.
It expects the squat feature summary generated in the squat feature extraction workflow, typically:
Data/LLSP/annotation_cleaned/squat_feature_summary.csv
Example:
python3 artifacts/3_Modeling/analyze_squat_video_quality.py \
--summary-csv Data/LLSP/annotation_cleaned/squat_feature_summary.csv
Audit outputs are written to artifacts/3_Modeling/squat_video_audit/.
Most downstream experimentation currently lives in notebooks:
- artifacts/1_EDA/1_EDA_34.ipynb
- artifacts/2_Data_preparation/2_Data_Preparation_01.ipynb
- artifacts/2_Data_preparation/COUNTIX_INTEGRATION.md
- artifacts/3_Modeling/3_Model_Training_01.ipynb
- artifacts/3_Modeling/4_All_Exercises_Pose_Extraction_Colab.ipynb
- artifacts/3_Modeling/5_All_Exercises_Pose_Sequence_Preparation_Colab.ipynb
- artifacts/3_Modeling/6_All_Exercises_Counting_Baseline_Colab.ipynb
- artifacts/3_Modeling/6_Squat_Rep_Counting_Colab.ipynb
The project does not yet have a checked-in final rep-count evaluation report, but these metrics are currently available in the workspace.
Saved live runtime JSON files and live_squat_recordings videos are not part of the reportable benchmark
results below; they are prototype demo artifacts for qualitative inspection only.
From Data/LLSP/annotation_cleaned/pose_extraction_summary.json:
- Processed rows: 118
- Successful extractions: 118
- Failed extractions: 0
- Zero-pose outputs: 0
- Run cap used for that check: none (max_videos = 0)
From artifacts/3_Modeling/squat_video_audit/squat_video_audit_summary.json:
- Audited squat videos: 118
- Severity breakdown:
  - ok: 90
  - review: 15
  - medium: 11
  - high: 1
  - critical: 1
- Low-confidence counts:
  - mean confidence < 0.25: 1
  - mean confidence < 0.40: 2
  - mean confidence < 0.50: 4
  - mean confidence < 0.70: 21
- Lower-body validity counts:
  - valid ratio < 0.25: 2
  - valid ratio < 0.50: 2
  - valid ratio < 0.75: 5
  - valid ratio < 0.90: 12
From artifacts/3_Modeling/training_outputs/baseline_v2_rebuilt/feature_alignment_report.json:
- Train rows in cleaned labels: 732
- Valid rows in cleaned labels: 131
- Train rows aligned to current feature files: 20
- Valid rows aligned to current feature files: 0
The project now has reportable rep-count point estimates plus Stage 11 bootstrap confidence intervals for the dedicated squat control and the routed pull_up / push_up branches.
Reported metrics:
- MAE
- RMSE
- Within-1 accuracy
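A minimal sketch of how these three metrics can be computed from a predictions table, assuming true_count and predicted_count columns; the column names are an assumption for illustration, not the training scripts' exact schema:

import numpy as np
import pandas as pd

def counting_metrics(df: pd.DataFrame) -> dict:
    """Compute MAE, RMSE, and Within-1 accuracy over per-video counts."""
    errors = df["predicted_count"] - df["true_count"]
    return {
        "MAE": float(np.abs(errors).mean()),
        "RMSE": float(np.sqrt((errors ** 2).mean())),
        # Fraction of videos whose predicted count is within one rep of the label.
        "Within-1": float((np.abs(errors) <= 1).mean()),
    }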
Current reportable metrics with 95% bootstrap confidence intervals:
- squat dedicated pose control (squat_tcn_l1_channels96, n=16)
  - MAE = 2.1405, 95% CI [1.1266, 3.3313]
  - RMSE = 3.1016, 95% CI [1.6982, 4.2837]
  - Within-1 = 0.5625, 95% CI [0.3125, 0.8125]
- pull_up routed pose branch (pose_count_tcn_pull_up_seq192, n=14)
  - MAE = 4.6088, 95% CI [2.0863, 7.5386]
  - RMSE = 7.0169, 95% CI [3.5909, 9.7687]
  - Within-1 = 0.4286, 95% CI [0.2143, 0.7143]
- push_up routed RGB branch (rgb_count_tcn_push_up_seq128, n=18)
  - MAE = 6.6018, 95% CI [3.3063, 10.4238]
  - RMSE = 10.2865, 95% CI [5.1748, 14.8974]
  - Within-1 = 0.2778, 95% CI [0.0556, 0.5000]
Current limitations:
- The live and packaged runtime path is intentionally squat-only; the broader exercise-dependent routing study is validated through offline artifacts and notebooks.
- The project assumes the exercise label is known at inference time. It does not yet include a production-ready exercise-recognition layer.
- The validation subsets for the primary reportable exercises are small (n=16 squat, n=14 pull-up, n=18 push-up), so the confidence intervals are wide and the conclusions should be treated as scoped research evidence.
- The strongest result is exercise-dependent, not a universal architecture. Squat is best supported by dedicated pose features, push-up by RGB features, and pull-up remains sensitive to viewpoint and target-selection ambiguity.
- Runtime inference from raw video depends on local LLSP video files and saved model artifacts. The README documents the required folders and the dataset link, but large local assets are not all committed directly to Git.
- Most model-training workflows remain notebook-first because GPU-heavy experiments were run in Colab.
Future work:
- Package the routed multi-exercise counter behind a single inference entry point once the required model artifacts are finalized.
- Add an exercise classifier so the system no longer requires the exercise type to be supplied at inference time.
- Increase validation coverage or use cross-validation over the train/validation pool while keeping the test set held out.
- Evaluate stronger pose backbones and target-person tracking for difficult viewpoints, occlusion, and multi-person scenes.
- Replace simple late fusion with a learned modality-selection or confidence-aware fusion strategy.
- Move notebook-only training logic into reusable Python modules and add a documented end-to-end training/evaluation command.
- Extend the live prototype beyond squat only after the offline routed system is stable.
- The repository contains large local assets including videos, a YOLO checkpoint, and intermediate artifacts.
- The workflow is currently notebook-first for modeling and analysis.
- The project documentation in artifacts/specification.md describes a broader future direction called RepCoach, but the implemented code in this repo is narrower and focused on offline experimentation.
- The strongest current result is an exercise-dependent routed system, not one universal counter.
- Countix is scaffolded but deferred; it is not part of the active LLSP conclusion surface.
- Validation slices for some exercises remain small, so some results should be interpreted as scoped research evidence rather than final deployment claims.
- artifacts/specification.md: target product and system design
- artifacts/repcount_analysis.md: dataset notes
- artifacts/3_Modeling/YOLO_PIPELINE.md: pose extraction runbook
- artifacts/3_Modeling/COLAB_SQUAT_POSE.md: Colab workflow for squat extraction
Reasonable next improvements for this project are:
- Move notebook logic into reusable Python modules
- Add a documented training and evaluation script for rep counting
- Formalize metrics for per-exercise mean absolute error
- Stabilize the live squat prototype further across camera setups and movement speeds
- Extend the live path beyond squat only if a later project phase requires full recognition and tracking