| title | FocusGuard |
|---|---|
| emoji | 👁️ |
| colorFrom | blue |
| colorTo | indigo |
| sdk | docker |
| app_port | 7860 |
| pinned | false |
| short_description | Real-time webcam focus detection via MediaPipe + MLP/XGBoost |
Real-time webcam-based visual attention estimation. MediaPipe Face Mesh landmarks feed a per-frame extractor that computes 17 features (EAR, gaze ratios, head pose, PERCLOS); 10 of these are selected and passed to an MLP or XGBoost model for binary focused/unfocused classification. Includes a local OpenCV demo and a full React + FastAPI web app with WebSocket/WebRTC video streaming.
Team name: FocusGuards (5CCSAGAP Large Group Project)
Members: Yingao Zheng, Mohamed Alketbi, Abdelrahman Almatrooshi, Junhao Zhou, Kexin Wang, Langyuan Huang, Saba Al-Gafri, Ayten Arab, Jaroslav Rakoto-Miklas
- Git repository: GAP_Large_project
- Deployed app (Hugging Face): FocusGuard/final_v2
- ClearML experiments: FocusGuards Large Group Project
- Checkpoints (Google Drive): Download folder
- Dataset (Google Drive): Dataset folder
- Data consent form (PDF): Consent document
The deployed app contains the full feature set (session history, L2CS calibration, model selector, achievements).
Model checkpoints are not included in the submission archive. Download them before running inference.
Pre-trained checkpoints are available in the Hugging Face Space files:
https://huggingface.co/spaces/FocusGuard/final_v2/tree/main/checkpoints
Download and place into checkpoints/:
| File | Description |
|---|---|
| mlp_best.pt | PyTorch MLP (10-64-32-2, ~2,850 params) |
| xgboost_face_orientation_best.json | XGBoost (600 trees, depth 8, lr 0.1489) |
| scaler_mlp.joblib | StandardScaler fit on training data |
| hybrid_focus_config.json | Hybrid pipeline fusion weights |
| hybrid_combiner.joblib | Hybrid combiner |
| L2CSNet_gaze360.pkl | L2CS-Net ResNet50 gaze weights (96 MB) |
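As an alternative to downloading the files manually, a short sketch using the huggingface_hub client should work (this is an assumption, not a documented project workflow; it relies only on the Space path listed above):

```python
# Hedged sketch: fetch the checkpoint files directly from the Hugging Face Space.
# Assumes huggingface_hub is installed (pip install huggingface_hub); run from the repo root.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="FocusGuard/final_v2",
    repo_type="space",
    allow_patterns="checkpoints/*",  # only the checkpoint folder
    local_dir=".",                   # places files under ./checkpoints/
)
```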
Models are registered as ClearML OutputModels under project "FocusGuards Large Group Project".
| Model | Task ID | Model ID |
|---|---|---|
| MLP | 3899b5aa0c3348b28213a3194322cdf7 | 56f94b799f624bdc845fa50c4d0606fe |
| XGBoost | c0ceb8e7e8194a51a7a31078cc47775c | 6727b8de334f4ca0961c46b436f6fb7c |
UI: Open a task on the experiments page, go to Artifacts > Output Models, and download.
Python:

```python
from clearml import Model

mlp = Model(model_id="56f94b799f624bdc845fa50c4d0606fe")
mlp_path = mlp.get_local_copy()  # downloads .pt

xgb = Model(model_id="6727b8de334f4ca0961c46b436f6fb7c")
xgb_path = xgb.get_local_copy()  # downloads .json
```

Copy the downloaded files into checkpoints/.
If ClearML access is restricted, download checkpoints from: https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link
Place all files under checkpoints/.
```bash
python -m models.mlp.train
python -m models.xgboost.train
```

This regenerates checkpoints/mlp_best.pt, checkpoints/xgboost_face_orientation_best.json, and the scalers. Requires training data under data/collected_*/.
```
api/
  db.py                       async SQLite DB (focus_sessions, focus_events, user_settings)
  drawing.py                  server-side face mesh + HUD drawing for WebRTC/WS frames
assets/
  focusguard-demo.gif         demo gif used in this README
config/
  default.yaml                hyperparameters, thresholds, app settings
  __init__.py                 config loader + ClearML flattener
  clearml_enrich.py           ClearML task enrichment + artifact upload
data_preparation/
  prepare_dataset.py          load/split/scale .npz files (pooled + LOPO)
  data_exploration.ipynb      EDA: distributions, class balance, correlations
models/
  face_mesh.py                MediaPipe 478-point face landmarks
  head_pose.py                yaw/pitch/roll via solvePnP, face-orientation score
  eye_scorer.py               EAR, MAR, gaze ratios, PERCLOS
  collect_features.py         real-time feature extraction + webcam labelling CLI
  gaze_calibration.py         9-point polynomial gaze calibration
  gaze_eye_fusion.py          fuses calibrated gaze with eye openness
  mlp/
    train.py                  MLP training script
    eval_accuracy.py          accuracy evaluation
    sweep.py                  Optuna hyperparameter sweep
  xgboost/
    config.py                 shared XGBoost params (reads config/default.yaml)
    train.py                  XGBoost training script
    eval_accuracy.py          accuracy evaluation
    add_accuracy.py           post-hoc accuracy annotation
    sweep.py                  ClearML + Optuna hyperparameter sweep
    sweep_local.py            local Optuna sweep (no ClearML)
    fetch_sweep_results.py    fetch and export sweep results from ClearML
  L2CS-Net/                   vendored L2CS-Net (ResNet50, Gaze360)
checkpoints/                  (excluded from archive; see download instructions above)
notebooks/
  mlp.ipynb                   MLP training + LOPO in Jupyter
  xgboost.ipynb               XGBoost training + LOPO in Jupyter
evaluation/
  justify_thresholds.py       LOPO threshold + weight grid search
  feature_importance.py       XGBoost gain + leave-one-feature-out ablation
  grouped_split_benchmark.py  pooled vs LOPO comparison
  plots/                      ROC curves, confusion matrices, weight searches
  logs/                       JSON training logs
tests/
  test_*.py                   unit + integration tests (pytest)
ui/
  pipeline.py                 all 5 pipeline classes + output smoothing
  live_demo.py                OpenCV webcam demo
src/
  components/
    Home.jsx                  landing / session start
    FocusPageLocal.jsx        main focus session view
    Records.jsx               session history
    Achievement.jsx           gamification / badges
    Customise.jsx             user preferences
    CalibrationOverlay.jsx    9-point gaze calibration UI
    Help.jsx                  help screen
  utils/
    VideoManagerLocal.js      webcam + WebSocket video management
scripts/
  push_hf_test_final.sh       push to Hugging Face Space
static/                       built frontend assets (after npm build)
app.py                        re-exports app from main.py (ASGI convenience)
main.py                       FastAPI application entry point (~1000 lines)
download_l2cs_weights.py      download L2CS-Net weights from Google Drive
resolve_lfs.py                resolve Git LFS pointer files at runtime
Dockerfile                    container image (runs start.sh)
docker-compose.yml            local Docker Compose config
start.sh                      entrypoint: resolves LFS, optional L2CS download, uvicorn
package.json                  frontend package manifest
requirements.txt
pytest.ini
.coveragerc                   pytest-cov config (branch coverage, root source)
```
Recommended versions:
- Python 3.10-3.11
- Node.js 18+ (needed only for frontend rebuild/dev)
```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Then download the checkpoints (Download folder) and place them into checkpoints/.
Optional: if you need to rebuild the frontend assets locally:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```

Run the local OpenCV demo:

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb    # XGBoost
```

Controls: m cycles the mesh overlay, 1-5 switches pipeline mode, q quits.
Run the web app locally:

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Or via Docker:

```bash
docker-compose up   # serves on port 7860
```

Collect features from your own webcam sessions:

```bash
python -m models.collect_features --name <participant>
```

This records webcam sessions with real-time binary labelling (spacebar toggles focused/unfocused) and saves per-frame feature vectors to data/collected_<participant>/ as .npz files. Raw video is never stored.
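To sanity-check what a collection run produced, a minimal sketch that lists the arrays stored in each session file (it assumes only the data/collected_<participant>/ layout described above; the array keys are whatever collect_features.py wrote):

```python
# Hedged sketch: list the arrays and shapes inside each collected .npz session file.
import glob
import numpy as np

for path in sorted(glob.glob("data/collected_*/*.npz")):
    with np.load(path) as session:
        print(path, {key: arr.shape for key, arr in session.items()})
```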
9 participants recorded 5-10 min sessions across varied environments (144,793 frames total, 61.5% focused / 38.5% unfocused). All participants provided informed consent. Dataset files are not included in this repository.
Consent document: https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link

The raw participant dataset is excluded from this submission (coursework policy and privacy constraints). It can be shared with module staff on request: https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing
```
Webcam frame
  --> MediaPipe Face Mesh (478 landmarks)
  --> Head pose (solvePnP): yaw, pitch, roll, s_face, head_deviation
  --> Eye scorer: EAR_left, EAR_right, EAR_avg, s_eye, MAR
  --> Gaze ratios: h_gaze, v_gaze, gaze_offset
  --> Temporal tracker: PERCLOS, blink_rate, closure_dur, yawn_dur
  --> 17 features --> select 10 --> clip to physiological bounds
  --> ML model (MLP / XGBoost) or geometric scorer
  --> Asymmetric EMA smoothing (alpha_up=0.55, alpha_down=0.45)
  --> FOCUSED / UNFOCUSED
```
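For context, the EAR features follow the standard six-landmark eye-aspect-ratio formulation; a minimal sketch is shown below (which MediaPipe landmark indices map to the six eye points is an implementation detail of eye_scorer.py and is not shown here):

```python
# Hedged sketch of the classic eye-aspect-ratio (EAR) formula the eye features
# are based on. The six landmarks are ordered p1..p6: outer corner, two upper-lid
# points, inner corner is p4, and two lower-lid points.
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """eye: (6, 2) array of 2D eye landmarks ordered p1..p6."""
    vertical = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return vertical / (2.0 * horizontal)
```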
Five runtime modes share the same feature extraction backbone:
| Mode | Description |
|---|---|
| Geometric | Deterministic scoring: 0.7 * s_face + 0.3 * s_eye, cosine-decay with max_angle=22 deg |
| XGBoost | 600-tree gradient-boosted ensemble, threshold 0.28 (LOPO-optimal) |
| MLP | PyTorch 10-64-32-2 perceptron, threshold 0.23 (LOPO-optimal) |
| Hybrid | 30% MLP + 70% geometric ensemble (LOPO F1 = 0.841) |
| L2CS | Deep gaze estimation via L2CS-Net (ResNet50, Gaze360 pretrained) |
Any mode can be combined with L2CS Boost mode (35% base + 65% L2CS, fused threshold 0.52). Off-screen gaze produces near-zero L2CS score via cosine decay, acting as a soft veto.
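A compact sketch of how those constants combine, assuming illustrative function names and an illustrative cosine-decay shape (the authoritative logic lives in ui/pipeline.py, models/head_pose.py, and models/eye_scorer.py):

```python
# Hedged sketch of the scoring/smoothing logic described above, using the
# constants quoted in this README. Names and the exact decay curve are illustrative.
import math

MAX_ANGLE_DEG = 22.0                # cosine-decay cutoff for head deviation
ALPHA_UP, ALPHA_DOWN = 0.55, 0.45   # asymmetric EMA coefficients

def face_orientation_score(head_deviation_deg: float) -> float:
    """Cosine decay: 1.0 when facing the screen, 0 at/after MAX_ANGLE_DEG (illustrative shape)."""
    if head_deviation_deg >= MAX_ANGLE_DEG:
        return 0.0
    return math.cos(math.pi / 2 * head_deviation_deg / MAX_ANGLE_DEG)

def geometric_score(s_face: float, s_eye: float) -> float:
    return 0.7 * s_face + 0.3 * s_eye

def l2cs_boost_is_focused(base_score: float, l2cs_score: float) -> bool:
    fused = 0.35 * base_score + 0.65 * l2cs_score   # score-level fusion
    return fused >= 0.52                            # fused threshold

def smooth(prev: float, new: float) -> float:
    """Asymmetric EMA (one interpretation: rises slightly faster than it falls)."""
    alpha = ALPHA_UP if new > prev else ALPHA_DOWN
    return alpha * new + (1 - alpha) * prev
```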
Both scripts read all hyperparameters from config/default.yaml.
```bash
python -m models.mlp.train
python -m models.xgboost.train
```

Outputs: checkpoints/ (model + scaler) and evaluation/logs/ (CSVs, JSON summaries).
```bash
USE_CLEARML=1 python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 python -m evaluation.justify_thresholds --clearml
```

This logs hyperparameters, per-epoch scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts (config YAML, requirements.txt, git SHA).
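Roughly, USE_CLEARML=1 turns on logging of this kind (an illustrative sketch, not the project's actual calls; see config/clearml_enrich.py and the training scripts for those):

```python
# Hedged sketch of typical ClearML logging; values and task names are placeholders.
from clearml import Task

task = Task.init(project_name="FocusGuards Large Group Project",
                 task_name="mlp-train-example")
task.connect({"hidden": [64, 32], "lr": 1e-3})                   # hyperparameters
logger = task.get_logger()
logger.report_scalar("loss", "train", value=0.42, iteration=1)   # per-epoch scalars
task.upload_artifact("config", artifact_object="config/default.yaml")  # reproducibility artifact
```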
Reference experiment IDs:
| Model | ClearML experiment ID |
|---|---|
| MLP (models.mlp.train) | 3899b5aa0c3348b28213a3194322cdf7 |
| XGBoost (models.xgboost.train) | c0ceb8e7e8194a51a7a31078cc47775c |
```bash
python -m evaluation.justify_thresholds       # LOPO threshold + weight search
python -m evaluation.grouped_split_benchmark  # pooled vs LOPO comparison
python -m evaluation.feature_importance      # XGBoost gain + LOFO ablation
```

Pooled test-set results:

| Model | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64-32) | 92.92% | 0.929 | 0.971 |
| Model | LOPO AUC | Best threshold (Youden's J) | F1 at best threshold |
|---|---|---|---|
| MLP | 0.862 | 0.228 | 0.858 |
| XGBoost | 0.870 | 0.280 | 0.855 |
Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.820). Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.841).
The ~12 pp drop from pooled to LOPO reflects temporal data leakage and confirms LOPO as the primary generalisation metric.
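For reference, a minimal sketch of a LOPO threshold search with Youden's J, assuming placeholder names for the data arrays (evaluation/justify_thresholds.py is the authoritative implementation):

```python
# Hedged sketch: leave-one-participant-out evaluation with a Youden's J threshold search.
# X, y, groups are placeholders for features, binary labels, and participant IDs.
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.model_selection import LeaveOneGroupOut

def lopo_best_threshold(model_factory, X, y, groups):
    thresholds = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        model = model_factory()
        model.fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        fpr, tpr, thr = roc_curve(y[test_idx], scores)
        thresholds.append(thr[np.argmax(tpr - fpr)])  # Youden's J = TPR - FPR
    return float(np.mean(thresholds))
```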
| Channel subset | Mean LOPO F1 |
|---|---|
| All 10 features | 0.829 |
| Eye state only | 0.807 |
| Head pose only | 0.748 |
| Gaze only | 0.726 |
Top-5 XGBoost gain: s_face (10.27), ear_right (9.54), head_deviation (8.83), ear_avg (6.96), perclos (5.68).
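Gain importances like these can be re-read from the saved booster; a minimal sketch follows (whether real feature names or generic f0/f1/... keys appear depends on how the model was trained):

```python
# Hedged sketch: read gain-based feature importances from the saved XGBoost checkpoint.
import xgboost as xgb

clf = xgb.XGBClassifier()
clf.load_model("checkpoints/xgboost_face_orientation_best.json")
gain = clf.get_booster().get_score(importance_type="gain")
for feature, score in sorted(gain.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{feature}: {score:.2f}")
```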
L2CS-Net predicts where your eyes are looking, not just where your head is pointed, catching the scenario where the head faces the screen but eyes wander.
Standalone mode: Select L2CS as the model.
Boost mode: Select any other model, then enable the GAZE toggle. L2CS runs alongside the base model with score-level fusion (35% base / 65% L2CS). Off-screen gaze triggers a soft veto.
Calibration: Click Calibrate during a session. A fullscreen overlay shows 9 target dots (3x3 grid). After all 9 points, a degree-2 polynomial maps gaze angles to screen coordinates with IQR outlier filtering and centre-point bias correction.
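A minimal sketch of a degree-2 polynomial calibration fit of that form, with the IQR outlier filtering and centre-point bias correction omitted (function names are illustrative; gaze_calibration.py is the real implementation):

```python
# Hedged sketch: fit a degree-2 polynomial mapping (yaw, pitch) gaze angles to screen (x, y).
import numpy as np

def fit_poly2(gaze: np.ndarray, screen: np.ndarray) -> np.ndarray:
    """gaze: (9, 2) yaw/pitch samples; screen: (9, 2) target coordinates. Returns (6, 2) coeffs."""
    yaw, pitch = gaze[:, 0], gaze[:, 1]
    A = np.column_stack([np.ones_like(yaw), yaw, pitch, yaw**2, yaw * pitch, pitch**2])
    coeffs, *_ = np.linalg.lstsq(A, screen, rcond=None)
    return coeffs

def predict(coeffs: np.ndarray, yaw: float, pitch: float) -> np.ndarray:
    basis = np.array([1.0, yaw, pitch, yaw**2, yaw * pitch, pitch**2])
    return basis @ coeffs  # predicted (x, y) screen coordinates
```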
L2CS weight lookup order at runtime:

1. checkpoints/L2CSNet_gaze360.pkl
2. models/L2CS-Net/models/L2CSNet_gaze360.pkl
3. models/L2CSNet_gaze360.pkl
All hyperparameters and app settings are in config/default.yaml. Override with FOCUSGUARD_CONFIG=/path/to/custom.yaml.
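A minimal sketch of that override mechanism, assuming a plain PyYAML loader (the real loader in config/__init__.py may differ in detail):

```python
# Hedged sketch: load config/default.yaml unless FOCUSGUARD_CONFIG points elsewhere.
import os
import yaml

def load_config() -> dict:
    path = os.environ.get("FOCUSGUARD_CONFIG", "config/default.yaml")
    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh)
```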
Included checks:
- data prep helpers and real split consistency (test_data_preparation.py; the split test skips if data/collected_*/*.npz is absent)
- feature clipping (test_models_clip_features.py)
- pipeline integration (test_pipeline_integration.py)
- gaze calibration / fusion diagnostics (test_gaze_pipeline.py)
- FastAPI health, settings, sessions (test_health_endpoint.py, test_api_settings.py, test_api_sessions.py)
Run the tests:

```bash
pytest
```

Coverage is enabled by default via pytest.ini (--cov / term report). For an HTML coverage report: pytest --cov-report=html.
Stack: Python, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net, FastAPI, React/Vite, SQLite, Docker, ClearML, pytest.
