Based on ajsteele/faceHR — the original idea and core algorithm (Eulerian color magnification of skin pixels) come from that project. This repo is a reimplementation as an interactive Gradio web app with automatic skin segmentation powered by Meta's SAM 3 (Segment Anything Model 3).
Built with Claude Code.
Extracts the photoplethysmographic (PPG) signal from video of human skin, then amplifies the subtle color changes caused by blood flow so they become visible to the naked eye. The end goal is to make arterial pulsation visible enough to help locate arteries (e.g. for arterial blood draws).
A sample output video is available in output.mp4.
```sh
uv run app.py
```

This opens a Gradio web UI on `0.0.0.0:7860`. Upload or record a video, click on skin to pick the chroma reference, adjust parameters, and process.
Dependencies (gradio, numpy, scipy, opencv-python-headless, matplotlib, torch, transformers, cupy-cuda12x) are resolved automatically by uv via PEP 723 inline metadata.
- Python >= 3.12
- uv (recommended) or pip
- CUDA-capable GPU (for CuPy and SAM 3 skin segmentation)
- Skin segmentation — Meta's SAM 3 automatically segments skin regions using text-prompted segmentation; alternatively the user clicks a skin pixel and chroma-keying in YUV space classifies skin vs. non-skin.
- Spatial averaging — Mean color of skin pixels per frame produces a raw PPG time-series.
- Temporal bandpass — Two cascaded moving averages (high-pass to remove baseline drift, low-pass to smooth noise) isolate the cardiac frequency band.
- Eulerian color magnification — The filtered PPG signal is amplified and added back onto skin pixels in YUV chrominance space, making blood flow color changes visible.
The user clicks a skin pixel. Its YUV chrominance (U, V components only — luminance Y is discarded) is stored as `skin_chroma`.
For every frame, each pixel is classified as skin or not-skin:
```
distance² = (U_pixel − U_skin)² + (V_pixel − V_skin)²
is_skin   = distance² < chroma_similarity   # default threshold = 100
```

This is a simple Euclidean distance in UV space (squared, to avoid a sqrt). The result is a boolean mask `skin_key`.
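As a minimal NumPy sketch of this chroma-key step (function and array names here are illustrative, not necessarily the repo's identifiers):

```python
import numpy as np

def skin_mask(frame_yuv: np.ndarray, skin_chroma: np.ndarray,
              chroma_similarity: float = 100.0) -> np.ndarray:
    """Boolean mask of pixels whose UV chrominance is close to the reference.

    frame_yuv: (H, W, 3) YUV frame; skin_chroma: (2,) reference [U, V].
    """
    uv = frame_yuv[..., 1:3].astype(np.float64)
    d2 = ((uv - skin_chroma) ** 2).sum(axis=-1)   # squared UV distance, no sqrt
    return d2 < chroma_similarity

# Tiny example: a 1x2 frame where only the first pixel matches the reference
frame = np.array([[[100.0, 120.0, 130.0], [100.0, 200.0, 40.0]]])
mask = skin_mask(frame, np.array([122.0, 128.0]))
# first pixel: d² = 2² + 2² = 8 < 100 → skin; second pixel: d² is huge → not skin
```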
For each frame, the mean color of all skin-keyed pixels is computed:

```
ppg_yuv[i] = mean(frame_yuv[skin_key])   # shape (3,) = [Y, U, V]
ppg_rgb[i] = mean(frame_rgb[skin_key])   # shape (3,) = [B, G, R]
```
Over N frames this produces a 1D time-series per channel: the raw PPG signal. The cardiac pulse modulates skin color (mainly via hemoglobin absorption), so this average tracks blood volume changes.
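The spatial-averaging step can be sketched as follows (a hypothetical helper, not the repo's actual function):

```python
import numpy as np

def raw_ppg(frames_yuv, masks):
    """Per-frame mean YUV colour over skin pixels -> (N, 3) raw PPG series."""
    # frame[mask] selects the skin pixels as an (n_skin, 3) array
    return np.array([f[m].mean(axis=0) for f, m in zip(frames_yuv, masks)])

# Two uniform 2x2 frames with all pixels keyed as skin
frames = [np.full((2, 2, 3), 50.0), np.full((2, 2, 3), 60.0)]
masks = [np.ones((2, 2), bool), np.ones((2, 2), bool)]
ppg = raw_ppg(frames, masks)   # shape (2, 3): one [Y, U, V] mean per frame
```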
The raw PPG is noisy and has a slow-moving baseline (lighting drift, movement). Two moving-average steps clean it:
```
ppg = ppg - moving_average(ppg, n_bg_ma=90)   # high-pass: remove baseline drift
ppg = moving_average(ppg, n_smooth_ma=6)      # low-pass: smooth out noise
```

This is a bandpass filter implemented as two cascaded moving averages:
- Subtracting a wide (90-frame) moving average removes frequencies below ~fps/90 Hz (breathing, lighting changes).
- Applying a narrow (6-frame) moving average suppresses high-frequency noise above ~fps/6 Hz.
The result is then normalized to a target amplitude `delta`:

```
ppg_filtered = delta * ppg / max(|ppg|)
```
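A runnable sketch of the cascaded-moving-average bandpass, assuming a convolution-based moving average (the repo's own `moving_average` may handle edges differently):

```python
import numpy as np

def moving_average(x, n):
    """Moving average via convolution, same length as input.

    Note: np.convolve zero-pads, so expect artifacts near the edges.
    """
    return np.convolve(x, np.ones(n) / n, mode="same")

def bandpass_ppg(ppg, n_bg_ma=90, n_smooth_ma=6, delta=1.0):
    ppg = ppg - moving_average(ppg, n_bg_ma)   # high-pass: remove slow drift
    ppg = moving_average(ppg, n_smooth_ma)     # low-pass: smooth noise
    return delta * ppg / np.abs(ppg).max()     # normalize to ±delta

# Synthetic PPG: baseline + slow drift + a 1.2 Hz cardiac component at 30 fps
fs = 30.0
t = np.arange(300) / fs
raw = 10 + 0.5 * t + np.sin(2 * np.pi * 1.2 * t)
filtered = bandpass_ppg(raw)   # drift removed, peak amplitude = delta
```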
```
ppg_w_ma = mean(ppg_rgb_ma, axis=1)   # average of filtered B, G, R
```

This collapses the three RGB channels into a single scalar per frame — a luminance-like "white" PPG signal. This is what gets amplified back onto the video.
In the second pass over the video, each frame is reconstructed with the PPG signal amplified back onto skin pixels:
```
# For each frame i:
colours_w = [0, skin_key * ppg_w_ma[i], skin_key * ppg_w_ma[i]]  # [Y=0, U=signal, V=signal]
output = frame_yuv + colours_w * 10000  # amplify and add in YUV space
```

Key details:

- The Y (luminance) component is left at 0 — only chrominance (U, V) is boosted. This avoids brightness flicker and emphasizes the color shift caused by blood.
- The multiplier `10000` is the amplification factor.
- Only pixels inside `skin_key` receive the boost (others get 0).
- The result is converted back to BGR for display/saving.
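The reconstruction step can be sketched as follows (names are illustrative; the scalar signal value stands in for `ppg_w_ma[i]`):

```python
import numpy as np

def magnify_frame(frame_yuv, skin_key, signal, amplification=10000.0):
    """Add an amplified scalar PPG value onto skin pixels' U and V channels."""
    out = frame_yuv.astype(np.float64).copy()
    boost = skin_key.astype(np.float64) * signal * amplification
    out[..., 1] += boost   # U
    out[..., 2] += boost   # V
    return out             # Y untouched: no brightness flicker

# One skin pixel in a 2x2 frame; signal 0.001 × 10000 = boost of 10
frame = np.full((2, 2, 3), 100.0)
mask = np.array([[True, False], [False, False]])
out = magnify_frame(frame, mask, 0.001)
```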
This is a simplified form of Eulerian Video Magnification (Wu et al., 2012): instead of spatially decomposing into pyramids, it uses a single spatial region (skin mask) and temporal bandpass (moving averages).
A sliding 256-frame window with Welch's method estimates the power spectral density of the PPG signal. The dominant frequency peak in the 0.8–3 Hz range gives the heart rate in Hz (×60 = BPM). This is implemented but commented out.
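Since this feature is commented out in the repo, here is a minimal sketch of the same idea using SciPy (`estimate_bpm` is a hypothetical helper):

```python
import numpy as np
from scipy.signal import welch

def estimate_bpm(ppg, fs, lo=0.8, hi=3.0, nperseg=256):
    """Heart rate from the dominant Welch PSD peak in the cardiac band."""
    f, pxx = welch(ppg, fs=fs, nperseg=min(nperseg, len(ppg)))
    band = (f >= lo) & (f <= hi)
    return 60.0 * f[band][np.argmax(pxx[band])]   # Hz -> BPM

# Synthetic 1.2 Hz pulse at 30 fps, i.e. 72 BPM (recovered up to PSD bin width)
fs = 30.0
t = np.arange(512) / fs
ppg = np.sin(2 * np.pi * 1.2 * t)
bpm = estimate_bpm(ppg, fs)
```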
```
Video frames
 → YUV conversion
 → Chroma-key skin mask (UV distance)
 → Spatial mean of skin pixels per frame → raw PPG time-series
 → Temporal bandpass (subtract slow MA, apply fast MA) → filtered PPG
 → Average RGB channels → scalar "white" PPG
 → Multiply back onto skin pixels in UV space × 10000
 → Output: video with amplified blood flow color
```
| Parameter | Default | Purpose |
|---|---|---|
| `chroma_similarity` | 100 (= 10²) | Skin detection threshold in UV² space |
| `n_bg_ma` | 90 | High-pass: frames for baseline removal |
| `n_smooth_ma` | 6 | Low-pass: frames for noise smoothing |
| `delta` | 1 | Normalized PPG amplitude |
| Amplification | 10000 | Multiplier when adding signal back to frames |
- Per-pixel PPG instead of spatial average: The current code averages all skin pixels into one scalar. To locate arteries, you need the PPG signal per pixel (or per small patch). Compute `magnify_colour_ma` on a per-pixel or patch-grid basis.
- Narrow bandpass around cardiac frequency: Use ~0.8–2.5 Hz (48–150 BPM). The moving-average bandpass is simple but imprecise — consider a Butterworth or FIR filter for tighter control.
- Amplitude map: The amplitude of the per-pixel PPG correlates with local blood volume pulsation. Arteries will show higher amplitude than veins or capillary beds. Render this as a heatmap overlay.
- Phase map: Arterial sites pulse earlier than venous. A phase-delay map (via cross-correlation or Hilbert transform against a reference PPG) can further disambiguate arteries.
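The phase-map idea could be sketched with SciPy's Hilbert transform as below; this is not implemented in the repo, and all names here are illustrative:

```python
import numpy as np
from scipy.signal import hilbert

def phase_delay_map(pixel_ppg, ref_ppg):
    """Mean wrapped phase of each pixel's PPG relative to a reference, in radians.

    pixel_ppg: (H, W, N) filtered per-pixel PPG; ref_ppg: (N,) reference.
    Negative values mean the pixel pulses later than the reference.
    """
    ref_phase = np.angle(hilbert(ref_ppg))
    px_phase = np.angle(hilbert(pixel_ppg, axis=-1))
    # Wrap the per-sample difference to (-pi, pi], then average over time
    return np.angle(np.exp(1j * (px_phase - ref_phase))).mean(axis=-1)

# Toy "image": pixel (0,0) pulses with the reference, pixel (0,1) 0.1 s later
fs, f0 = 30.0, 1.2
t = np.arange(300) / fs
ref = np.sin(2 * np.pi * f0 * t)
late = np.sin(2 * np.pi * f0 * (t - 0.1))
dphi = phase_delay_map(np.stack([ref, late]).reshape(1, 2, -1), ref)
```

At a known cardiac frequency f0, a lag of `dphi` radians corresponds to `dphi / (2π·f0)` seconds, so earlier-pulsing arterial sites stand out against later venous ones.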
- GPU acceleration: The current code uses CuPy/cuSignal. For a Gradio app, consider whether GPU is available or if NumPy/SciPy suffices for shorter clips.
- Original idea and algorithm: ajsteele/faceHR
- Based on Eulerian Video Magnification (Wu et al., 2012)
- Meta's SAM 3 (Segment Anything Model 3) for automatic skin segmentation (HuggingFace)
