| title | FocusGuard |
|---|---|
| emoji | 👁️ |
| colorFrom | blue |
| colorTo | indigo |
| sdk | docker |
| app_port | 7860 |
| pinned | false |
| short_description | Real-time webcam focus detection via MediaPipe + MLP/XGBoost |
Real-time webcam-based visual attention estimation. MediaPipe Face Mesh landmarks feed a per-frame extractor that computes 17 features (EAR, gaze ratios, head pose, PERCLOS); 10 of these are selected and passed to an MLP or XGBoost model for binary focused/unfocused classification. Includes a local OpenCV demo and a full React + FastAPI web app with WebSocket/WebRTC video streaming.
Team name: FocusGuards (5CCSAGAP Large Group Project)
Members: Yingao Zheng, Mohamed Alketbi, Abdelrahman Almatrooshi, Junhao Zhou, Kexin Wang, Langyuan Huang, Saba Al-Gafri, Ayten Arab, Jaroslav Rakoto-Miklas
- Git repository: GAP_Large_project
- Deployed app (Hugging Face): FocusGuard/final_v2
- ClearML experiments: FocusGuards Large Group Project
- Checkpoints (Google Drive): Download folder
- Dataset (Google Drive): Dataset folder
- Data consent form (PDF): Consent document
The deployed app contains the full feature set (session history, L2CS calibration, model selector, achievements).
Model checkpoints are not included in the submission archive. Download them before running inference.
Pre-trained checkpoints are available in the Hugging Face Space files:
https://huggingface.co/spaces/FocusGuard/final_v2/tree/main/checkpoints
Download and place into checkpoints/:
| File | Description |
|---|---|
| mlp_best.pt | PyTorch MLP (10-64-32-2, ~2,850 params) |
| xgboost_face_orientation_best.json | XGBoost (600 trees, depth 8, lr 0.1489) |
| scaler_mlp.joblib | StandardScaler fit on training data |
| hybrid_focus_config.json | Hybrid pipeline fusion weights |
| hybrid_combiner.joblib | Hybrid combiner |
| L2CSNet_gaze360.pkl | L2CS-Net ResNet50 gaze weights (96 MB) |
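As an alternative to downloading the files manually, a short sketch using the huggingface_hub client should work (this is an assumption, not a documented project workflow; it relies only on the Space path listed above):

```python
# Hedged sketch: fetch the checkpoint files directly from the Hugging Face Space.
# Assumes huggingface_hub is installed (pip install huggingface_hub); run from the repo root.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="FocusGuard/final_v2",
    repo_type="space",
    allow_patterns="checkpoints/*",  # only the checkpoint folder
    local_dir=".",                   # places files under ./checkpoints/
)
```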
Models are registered as ClearML OutputModels under project "FocusGuards Large Group Project".
| Model | Task ID | Model ID |
|---|---|---|
| MLP | 3899b5aa0c3348b28213a3194322cdf7 | 56f94b799f624bdc845fa50c4d0606fe |
| XGBoost | c0ceb8e7e8194a51a7a31078cc47775c | 6727b8de334f4ca0961c46b436f6fb7c |
UI: Open a task on the experiments page, go to Artifacts > Output Models, and download.
Python:

```python
from clearml import Model

mlp = Model(model_id="56f94b799f624bdc845fa50c4d0606fe")
mlp_path = mlp.get_local_copy()  # downloads .pt

xgb = Model(model_id="6727b8de334f4ca0961c46b436f6fb7c")
xgb_path = xgb.get_local_copy()  # downloads .json
```

Copy the downloaded files into checkpoints/.
If ClearML access is restricted, download checkpoints from: https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link
Place all files under checkpoints/.
```bash
python -m models.mlp.train
python -m models.xgboost.train
```

This regenerates checkpoints/mlp_best.pt, checkpoints/xgboost_face_orientation_best.json, and the scalers. Requires training data under data/collected_*/.
```
api/
  db.py                       async SQLite DB (focus_sessions, focus_events, user_settings)
  drawing.py                  server-side face mesh + HUD drawing for WebRTC/WS frames
assets/
  focusguard-demo.gif         demo gif used in this README
config/
  default.yaml                hyperparameters, thresholds, app settings
  __init__.py                 config loader + ClearML flattener
  clearml_enrich.py           ClearML task enrichment + artifact upload
data_preparation/
  prepare_dataset.py          load/split/scale .npz files (pooled + LOPO)
  data_exploration.ipynb      EDA: distributions, class balance, correlations
models/
  face_mesh.py                MediaPipe 478-point face landmarks
  head_pose.py                yaw/pitch/roll via solvePnP, face-orientation score
  eye_scorer.py               EAR, MAR, gaze ratios, PERCLOS
  collect_features.py         real-time feature extraction + webcam labelling CLI
  gaze_calibration.py         9-point polynomial gaze calibration
  gaze_eye_fusion.py          fuses calibrated gaze with eye openness
  mlp/
    train.py                  MLP training script
    eval_accuracy.py          accuracy evaluation
    sweep.py                  Optuna hyperparameter sweep
  xgboost/
    config.py                 shared XGBoost params (reads config/default.yaml)
    train.py                  XGBoost training script
    eval_accuracy.py          accuracy evaluation
    add_accuracy.py           post-hoc accuracy annotation
    sweep.py                  ClearML + Optuna hyperparameter sweep
    sweep_local.py            local Optuna sweep (no ClearML)
    fetch_sweep_results.py    fetch and export sweep results from ClearML
  L2CS-Net/                   vendored L2CS-Net (ResNet50, Gaze360)
checkpoints/                  (excluded from archive; see download instructions above)
notebooks/
  mlp.ipynb                   MLP training + LOPO in Jupyter
  xgboost.ipynb               XGBoost training + LOPO in Jupyter
evaluation/
  justify_thresholds.py       LOPO threshold + weight grid search
  feature_importance.py       XGBoost gain + leave-one-feature-out ablation
  grouped_split_benchmark.py  pooled vs LOPO comparison
  plots/                      ROC curves, confusion matrices, weight searches
  logs/                       JSON training logs
tests/
  test_*.py                   unit + integration tests (pytest)
ui/
  pipeline.py                 all 5 pipeline classes + output smoothing
  live_demo.py                OpenCV webcam demo
src/
  components/
    Home.jsx                  landing / session start
    FocusPageLocal.jsx        main focus session view
    Records.jsx               session history
    Achievement.jsx           gamification / badges
    Customise.jsx             user preferences
    CalibrationOverlay.jsx    9-point gaze calibration UI
    Help.jsx                  help screen
  utils/
    VideoManagerLocal.js      webcam + WebSocket video management
scripts/
  push_hf_test_final.sh       push to Hugging Face Space
static/                       built frontend assets (after npm build)
app.py                        re-exports app from main.py (ASGI convenience)
main.py                       FastAPI application entry point (~1000 lines)
download_l2cs_weights.py      download L2CS-Net weights from Google Drive
resolve_lfs.py                resolve Git LFS pointer files at runtime
Dockerfile                    container image (runs start.sh)
docker-compose.yml            local Docker Compose config
start.sh                      entrypoint: resolves LFS, optional L2CS download, uvicorn
package.json                  frontend package manifest
requirements.txt
pytest.ini
.coveragerc                   pytest-cov config (branch coverage, root source)
```
Recommended versions:
- Python 3.10-3.11
- Node.js 18+ (needed only for frontend rebuild/dev)
```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Then download the checkpoints (Download folder) and place them into checkpoints/.
Optional: if you need to rebuild the frontend assets locally:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```

Run the local OpenCV demo:

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb    # XGBoost
```

Controls: m cycles the mesh overlay, 1-5 switches pipeline mode, q quits.
Run the web app locally:

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Or via Docker:

```bash
docker-compose up   # serves on port 7860
```

Collect features from your own webcam sessions:

```bash
python -m models.collect_features --name <participant>
```

This records webcam sessions with real-time binary labelling (spacebar toggles focused/unfocused) and saves per-frame feature vectors to data/collected_<participant>/ as .npz files. Raw video is never stored.
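To sanity-check what a collection run produced, a minimal sketch that lists the arrays stored in each session file (it assumes only the data/collected_<participant>/ layout described above; the array keys are whatever collect_features.py wrote):

```python
# Hedged sketch: list the arrays and shapes inside each collected .npz session file.
import glob
import numpy as np

for path in sorted(glob.glob("data/collected_*/*.npz")):
    with np.load(path) as session:
        print(path, {key: arr.shape for key, arr in session.items()})
```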
9 participants recorded 5-10 min sessions across varied environments (144,793 frames total, 61.5% focused / 38.5% unfocused). All participants provided informed consent. Dataset files are not included in this repository.
Consent document: https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link

The raw participant dataset is excluded from this submission (coursework policy and privacy constraints). It can be shared with module staff on request: https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing
```
Webcam frame
  --> MediaPipe Face Mesh (478 landmarks)
  --> Head pose (solvePnP): yaw, pitch, roll, s_face, head_deviation
  --> Eye scorer: EAR_left, EAR_right, EAR_avg, s_eye, MAR
  --> Gaze ratios: h_gaze, v_gaze, gaze_offset
  --> Temporal tracker: PERCLOS, blink_rate, closure_dur, yawn_dur
  --> 17 features --> select 10 --> clip to physiological bounds
  --> ML model (MLP / XGBoost) or geometric scorer
  --> Asymmetric EMA smoothing (alpha_up=0.55, alpha_down=0.45)
  --> FOCUSED / UNFOCUSED
```
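For context, the EAR features follow the standard six-landmark eye-aspect-ratio formulation; a minimal sketch is shown below (which MediaPipe landmark indices map to the six eye points is an implementation detail of eye_scorer.py and is not shown here):

```python
# Hedged sketch of the classic eye-aspect-ratio (EAR) formula the eye features
# are based on. The six landmarks are ordered p1..p6: outer corner, two upper-lid
# points, inner corner is p4, and two lower-lid points.
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """eye: (6, 2) array of 2D eye landmarks ordered p1..p6."""
    vertical = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return vertical / (2.0 * horizontal)
```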
Five runtime modes share the same feature extraction backbone:
| Mode | Description |
|---|---|
| Geometric | Deterministic scoring: 0.7 * s_face + 0.3 * s_eye, cosine-decay with max_angle=22 deg |
| XGBoost | 600-tree gradient-boosted ensemble, threshold 0.28 (LOPO-optimal) |
| MLP | PyTorch 10-64-32-2 perceptron, threshold 0.23 (LOPO-optimal) |
| Hybrid | 30% MLP + 70% geometric ensemble (LOPO F1 = 0.841) |
| L2CS | Deep gaze estimation via L2CS-Net (ResNet50, Gaze360 pretrained) |
Any mode can be combined with L2CS Boost mode (35% base + 65% L2CS, fused threshold 0.52). Off-screen gaze produces near-zero L2CS score via cosine decay, acting as a soft veto.
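A compact sketch of how those constants combine, assuming illustrative function names and an illustrative cosine-decay shape (the authoritative logic lives in ui/pipeline.py, models/head_pose.py, and models/eye_scorer.py):

```python
# Hedged sketch of the scoring/smoothing logic described above, using the
# constants quoted in this README. Names and the exact decay curve are illustrative.
import math

MAX_ANGLE_DEG = 22.0                # cosine-decay cutoff for head deviation
ALPHA_UP, ALPHA_DOWN = 0.55, 0.45   # asymmetric EMA coefficients

def face_orientation_score(head_deviation_deg: float) -> float:
    """Cosine decay: 1.0 when facing the screen, 0 at/after MAX_ANGLE_DEG (illustrative shape)."""
    if head_deviation_deg >= MAX_ANGLE_DEG:
        return 0.0
    return math.cos(math.pi / 2 * head_deviation_deg / MAX_ANGLE_DEG)

def geometric_score(s_face: float, s_eye: float) -> float:
    return 0.7 * s_face + 0.3 * s_eye

def l2cs_boost_is_focused(base_score: float, l2cs_score: float) -> bool:
    fused = 0.35 * base_score + 0.65 * l2cs_score   # score-level fusion
    return fused >= 0.52                            # fused threshold

def smooth(prev: float, new: float) -> float:
    """Asymmetric EMA (one interpretation: rises slightly faster than it falls)."""
    alpha = ALPHA_UP if new > prev else ALPHA_DOWN
    return alpha * new + (1 - alpha) * prev
```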
Both scripts read all hyperparameters from config/default.yaml.
```bash
python -m models.mlp.train
python -m models.xgboost.train
```

Outputs: checkpoints/ (model + scaler) and evaluation/logs/ (CSVs, JSON summaries).
```bash
USE_CLEARML=1 python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 python -m evaluation.justify_thresholds --clearml
```

This logs hyperparameters, per-epoch scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts (config YAML, requirements.txt, git SHA).
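Roughly, USE_CLEARML=1 turns on logging of this kind (an illustrative sketch, not the project's actual calls; see config/clearml_enrich.py and the training scripts for those):

```python
# Hedged sketch of typical ClearML logging; values and task names are placeholders.
from clearml import Task

task = Task.init(project_name="FocusGuards Large Group Project",
                 task_name="mlp-train-example")
task.connect({"hidden": [64, 32], "lr": 1e-3})                   # hyperparameters
logger = task.get_logger()
logger.report_scalar("loss", "train", value=0.42, iteration=1)   # per-epoch scalars
task.upload_artifact("config", artifact_object="config/default.yaml")  # reproducibility artifact
```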
Reference experiment IDs:
| Model | ClearML experiment ID |
|---|---|
| MLP (models.mlp.train) | 3899b5aa0c3348b28213a3194322cdf7 |
| XGBoost (models.xgboost.train) | c0ceb8e7e8194a51a7a31078cc47775c |
```bash
python -m evaluation.justify_thresholds       # LOPO threshold + weight search
python -m evaluation.grouped_split_benchmark  # pooled vs LOPO comparison
python -m evaluation.feature_importance      # XGBoost gain + LOFO ablation
```

Pooled test-set results:

| Model | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64-32) | 92.92% | 0.929 | 0.971 |
| Model | LOPO AUC | Best threshold (Youden's J) | F1 at best threshold |
|---|---|---|---|
| MLP | 0.862 | 0.228 | 0.858 |
| XGBoost | 0.870 | 0.280 | 0.855 |
Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.820). Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.841).
The ~12 pp drop from pooled to LOPO reflects temporal data leakage and confirms LOPO as the primary generalisation metric.
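For reference, a minimal sketch of a LOPO threshold search with Youden's J, assuming placeholder names for the data arrays (evaluation/justify_thresholds.py is the authoritative implementation):

```python
# Hedged sketch: leave-one-participant-out evaluation with a Youden's J threshold search.
# X, y, groups are placeholders for features, binary labels, and participant IDs.
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.model_selection import LeaveOneGroupOut

def lopo_best_threshold(model_factory, X, y, groups):
    thresholds = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        model = model_factory()
        model.fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        fpr, tpr, thr = roc_curve(y[test_idx], scores)
        thresholds.append(thr[np.argmax(tpr - fpr)])  # Youden's J = TPR - FPR
    return float(np.mean(thresholds))
```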
| Channel subset | Mean LOPO F1 |
|---|---|
| All 10 features | 0.829 |
| Eye state only | 0.807 |
| Head pose only | 0.748 |
| Gaze only | 0.726 |
Top-5 XGBoost gain: s_face (10.27), ear_right (9.54), head_deviation (8.83), ear_avg (6.96), perclos (5.68).
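Gain importances like these can be re-read from the saved booster; a minimal sketch follows (whether real feature names or generic f0/f1/... keys appear depends on how the model was trained):

```python
# Hedged sketch: read gain-based feature importances from the saved XGBoost checkpoint.
import xgboost as xgb

clf = xgb.XGBClassifier()
clf.load_model("checkpoints/xgboost_face_orientation_best.json")
gain = clf.get_booster().get_score(importance_type="gain")
for feature, score in sorted(gain.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{feature}: {score:.2f}")
```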
L2CS-Net predicts where your eyes are looking, not just where your head is pointed, catching the scenario where the head faces the screen but eyes wander.
Standalone mode: Select L2CS as the model.
Boost mode: Select any other model, then enable the GAZE toggle. L2CS runs alongside the base model with score-level fusion (35% base / 65% L2CS). Off-screen gaze triggers a soft veto.
Calibration: Click Calibrate during a session. A fullscreen overlay shows 9 target dots (3x3 grid). After all 9 points, a degree-2 polynomial maps gaze angles to screen coordinates with IQR outlier filtering and centre-point bias correction.
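A minimal sketch of a degree-2 polynomial calibration fit of that form, with the IQR outlier filtering and centre-point bias correction omitted (function names are illustrative; gaze_calibration.py is the real implementation):

```python
# Hedged sketch: fit a degree-2 polynomial mapping (yaw, pitch) gaze angles to screen (x, y).
import numpy as np

def fit_poly2(gaze: np.ndarray, screen: np.ndarray) -> np.ndarray:
    """gaze: (9, 2) yaw/pitch samples; screen: (9, 2) target coordinates. Returns (6, 2) coeffs."""
    yaw, pitch = gaze[:, 0], gaze[:, 1]
    A = np.column_stack([np.ones_like(yaw), yaw, pitch, yaw**2, yaw * pitch, pitch**2])
    coeffs, *_ = np.linalg.lstsq(A, screen, rcond=None)
    return coeffs

def predict(coeffs: np.ndarray, yaw: float, pitch: float) -> np.ndarray:
    basis = np.array([1.0, yaw, pitch, yaw**2, yaw * pitch, pitch**2])
    return basis @ coeffs  # predicted (x, y) screen coordinates
```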
L2CS weight lookup order at runtime:

1. checkpoints/L2CSNet_gaze360.pkl
2. models/L2CS-Net/models/L2CSNet_gaze360.pkl
3. models/L2CSNet_gaze360.pkl
All hyperparameters and app settings are in config/default.yaml. Override with FOCUSGUARD_CONFIG=/path/to/custom.yaml.
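A minimal sketch of that override mechanism, assuming a plain PyYAML loader (the real loader in config/__init__.py may differ in detail):

```python
# Hedged sketch: load config/default.yaml unless FOCUSGUARD_CONFIG points elsewhere.
import os
import yaml

def load_config() -> dict:
    path = os.environ.get("FOCUSGUARD_CONFIG", "config/default.yaml")
    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh)
```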
Included checks:
- data prep helpers and real split consistency (test_data_preparation.py; the split test skips if data/collected_*/*.npz is absent)
- feature clipping (test_models_clip_features.py)
- pipeline integration (test_pipeline_integration.py)
- gaze calibration / fusion diagnostics (test_gaze_pipeline.py)
- FastAPI health, settings, sessions (test_health_endpoint.py, test_api_settings.py, test_api_sessions.py)
Run the tests:

```bash
pytest
```

Coverage is enabled by default via pytest.ini (--cov / term report). For an HTML coverage report: pytest --cov-report=html.
Stack: Python, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net, FastAPI, React/Vite, SQLite, Docker, ClearML, pytest.
