# Video Watching English Learn — Segmentation & Clips (CLI + Electron)
## Overview
- Two pipelines are provided to segment movie audio/video and cut clips:
  - Option A (fast): `inaSpeechSegmenter` + `ffmpeg` via a Python script.
  - Option B (robust): `pyannote.audio` + `whisperx` + `ffmpeg` in a Jupyter notebook.
- A simple Electron GUI is included to run Option A on Windows and produce clips without using the terminal.
## User Guide
- For a more detailed, step-by-step walkthrough, see `USER_GUIDE.md`.
## Docker
- Build the image: `docker compose build`
- Run Option A (CLI) inside the container: `docker compose run --rm cli bash -lc "python scripts/option_a_segment_and_cut.py --input samples/tts/tts_speech.mp4 --label female --ffmpeg $(which ffmpeg) --execute"`
  - Replace `--input` with your video path under the mounted workspace.
- Launch JupyterLab for Option B: `docker compose up jupyter`
  - Open `http://localhost:8888` in your browser and load `notebooks/pyannote_whisperx_pipeline.ipynb`.
- Note: model downloads may be large; consider mirrors or local caches.
## Quick Start (Electron GUI)
- Prereqs:
  - Windows 10/11
  - Python 3.10+ on PATH (for running the segmentation script)
  - Node.js 18+ (for running Electron)
- Install dependencies (optionally set the npm registry for speed):
  - `npm config set registry https://registry.npmmirror.com`
  - `npm install`
- Run the app in dev mode: `npm run start`
  - In the GUI, click “Detect Python & FFmpeg” (it auto-fills paths). Provide a video path (e.g., `samples\tts\tts_speech.mp4`), choose a label (`female`/`male`/`speech`), then click “Run Option A”.
- Package a Windows EXE: `npm run dist`
  - The installer/exe appears under `dist/`. The build bundles the GUI and uses your local `scripts/` and `ffmpeg/` folders.
## CLI Usage (Option A)
- Command: `python scripts/option_a_segment_and_cut.py --input <video.mp4> --label <female|male|speech> --ffmpeg <path-to-ffmpeg.exe> --execute`
- Outputs:
  - `outputs/option_a/segments_<label>.csv` — segment timestamps
  - `outputs/option_a/cut_<label>_clips.bat` — generated Windows cutting commands
  - `outputs/option_a/clips/clip_XXX.mp4` — cut clips (with `--execute`)
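The generated `.bat` file simply chains `ffmpeg` commands, so the same cuts can be scripted directly. A minimal Python sketch, assuming the CSV holds `start`/`end` columns in seconds (the column names here are an assumption, not confirmed from the script):

```python
import csv
from pathlib import Path

def build_cut_commands(csv_path, video, outdir, ffmpeg="ffmpeg"):
    """Build one ffmpeg stream-copy command per CSV segment row.

    Assumes 'start'/'end' columns holding seconds (column names are a guess).
    """
    outdir = Path(outdir)
    commands = []
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f), start=1):
            start, end = float(row["start"]), float(row["end"])
            clip = outdir / f"clip_{i:03d}.mp4"
            # -ss before -i: fast input seek; -t: clip duration; -c copy: no re-encode
            commands.append([ffmpeg, "-y", "-ss", f"{start:.3f}",
                             "-i", str(video), "-t", f"{end - start:.3f}",
                             "-c", "copy", str(clip)])
    return commands
```

Running each command with `subprocess.run(cmd, check=True)` reproduces what `--execute` does; writing them to a file instead reproduces the `.bat` output.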
## Notebook (Option B)
- Path: `notebooks/pyannote_whisperx_pipeline.ipynb`
- Purpose: diarization + transcription + word alignment + clip-cutting command generation.
- Note: requires reliable model downloads from Hugging Face; use mirrors or pre-cached models in limited-network environments.
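In restricted networks, a Hugging Face mirror has to be configured before the hub libraries are imported, since the endpoint is read at import time. A minimal first-cell sketch (the mirror URL and cache path are examples, not project defaults):

```python
import os

# Must be set before importing huggingface_hub / pyannote / whisperx,
# because the endpoint is read when those modules load.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
# Optional: keep downloaded model files in a local, reusable cache directory.
os.environ.setdefault("HF_HOME", "./hf_cache")
```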
## Troubleshooting
- No speech detected:
  - Some content is classified as `music`/`noEnergy`. Try `--label female` / `--label male`, or relax the VAD thresholds.
  - Use a sample with clear spoken voice; `samples\tts\tts_speech.mp4` is provided.
- FFmpeg missing:
  - A portable FFmpeg is included under `ffmpeg/ffmpeg-8.0-essentials_build/bin/ffmpeg.exe`.
  - In Docker, ffmpeg is installed system-wide and available via `$(which ffmpeg)`.
- WhisperX/pyannote model download timeouts:
  - Set a mirror: `set HF_ENDPOINT=https://hf-mirror.com` (PowerShell: `$env:HF_ENDPOINT = 'https://hf-mirror.com'`).
  - Provide local model caches if internet access is constrained.
## Credits
- `inaSpeechSegmenter`, `ffmpeg`, `pyannote.audio`, `whisperx` — see their respective repositories for licensing and usage details.

---

This project provides two practical pipelines to segment a movie by speaker and cut clips for male speech:
- Option A — quick + simple using inaSpeechSegmenter + ffmpeg
- Option B — higher-quality using pyannote.audio + WhisperX + (optional) inaSpeechSegmenter + ffmpeg
Folder layout:
- `scripts/option_a_segment_and_cut.py` — ready-to-run Python script for Option A
- `notebooks/pyannote_whisperx_pipeline.ipynb` — step-by-step notebook for Option B
Option A (inaSpeechSegmenter + ffmpeg)

What it does:
- Extracts audio from your video
- Runs inaSpeechSegmenter to tag segments as `male`/`female`/`music`/`noise`
- Generates a CSV and a Windows `.bat` file with `ffmpeg` commands to cut male-only clips
- Can optionally execute the cuts automatically
Prerequisites:
- Windows: install ffmpeg and ensure `ffmpeg.exe` is in PATH, e.g. `winget install --id=Gyan.FFmpeg`.
- Python 3.8–3.13
- `pip install inaSpeechSegmenter`
Run:
`python scripts/option_a_segment_and_cut.py --input C:\path\to\movie.mp4 --copy --execute`
Useful flags:
- `--ffmpeg C:\path\to\ffmpeg.exe` — provide an explicit ffmpeg path if not in PATH
- `--label male|female|speech` — choose which label to cut (default: `male`)
- `--pad 0.05` — add padding (seconds) around each segment
- `--merge-gap 0.2` — merge adjacent segments closer than this gap
- `--min-dur 1.0` — drop segments shorter than this duration
- `--copy` — stream copy (fast, may have less-precise boundaries)
- `--reencode` — re-encode to H.264/AAC (precise boundaries, slower)
- `--execute` — actually run ffmpeg and produce `clips/*.mp4`
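The `--pad`, `--merge-gap`, and `--min-dur` flags describe a common post-processing chain for raw detector output. A sketch of that logic (illustrative, not the script's exact implementation):

```python
def refine_segments(segments, pad=0.05, merge_gap=0.2, min_dur=1.0):
    """Refine (start, end) segments, given in seconds and sorted by start.

    1) pad each segment, clamping start at 0
    2) merge neighbours whose gap is below merge_gap
    3) drop anything shorter than min_dur
    """
    padded = [(max(0.0, s - pad), e + pad) for s, e in segments]
    merged = []
    for s, e in padded:
        if merged and s - merged[-1][1] < merge_gap:
            # Close enough to the previous segment: extend it instead
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return [(s, e) for s, e in merged if e - s >= min_dur]
```

Padding before merging matters: two segments separated by slightly more than `merge_gap` can still fuse once each gains `pad` seconds on both sides.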
Outputs:
- `outputs/option_a/extracted_audio.wav` — audio extracted from the input
- `outputs/option_a/segments_male.csv` — segment timestamps and labels
- `outputs/option_a/cut_male_clips.bat` — Windows batch file to cut clips
- `outputs/option_a/clips/clip_001.mp4` (etc.) — if `--execute` is used
Option B (pyannote.audio + WhisperX)

What it does:
- Extracts audio (`ffmpeg`)
- Runs speaker diarization with `pyannote.audio` (“who spoke when”)
- Transcribes and aligns words with `whisperx` (word-level timestamps)
- Optionally maps speakers to perceived gender via `inaSpeechSegmenter`
- Outputs CSVs and a `.bat` file to cut male clips (speaker-level or frame-level)
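Assigning each aligned word to a diarized speaker reduces to an interval lookup. A minimal midpoint-containment sketch (simpler than whisperx's own assignment logic, shown only to illustrate the idea):

```python
def assign_speakers(words, turns):
    """Attach a speaker label to each aligned word.

    words: list of (word, start, end) with times in seconds.
    turns: list of (speaker, start, end) diarized turns, sorted by start.
    Each word goes to the turn containing its midpoint, else 'UNKNOWN'.
    """
    out = []
    for word, ws, we in words:
        mid = (ws + we) / 2
        speaker = next((spk for spk, ts, te in turns if ts <= mid < te),
                       "UNKNOWN")
        out.append((word, ws, we, speaker))
    return out
```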
Prerequisites:
- ffmpeg installed and available
- Python 3.8–3.13
- `pip install pyannote.audio whisperx inaSpeechSegmenter`
- Hugging Face token (user access) for `pyannote/speaker-diarization-community-1`: https://huggingface.co/settings/tokens
- GPU optional (CUDA 12.8 recommended for speed); CPU works but is slower
Usage:
- Open the notebook: `notebooks/pyannote_whisperx_pipeline.ipynb`
- Set `INPUT_VIDEO`, `OUTDIR`, `FFMPEG_PATH` (if needed), and `HF_TOKEN` in the first cell
- Run the cells in order; outputs are saved under `outputs/option_b/`
Outputs:
- `outputs/option_b/speaker_turns.csv` — diarized speaker turns
- `outputs/option_b/words.csv` — word-level timestamps
- `outputs/option_b/words_with_speakers.csv` — words assigned to speakers
- `outputs/option_b/gender_segments.csv` — optional gender segments (via inaSpeechSegmenter)
- `outputs/option_b/speaker_gender.csv` — speaker→gender mapping (heuristic)
- `outputs/option_b/male_cuts.csv` — final cut list
- `outputs/option_b/cut_male_clips.bat` — batch commands to cut clips
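The speaker→gender heuristic behind `speaker_gender.csv` can be as simple as duration-weighted voting between diarized turns and inaSpeechSegmenter's gender segments. A sketch under that assumption (not necessarily the notebook's exact code):

```python
from collections import defaultdict

def map_speaker_gender(turns, gender_segments):
    """Vote each speaker's gender by total overlapping duration.

    turns: list of (speaker, start, end) diarized turns.
    gender_segments: list of (label, start, end) with label 'male'/'female'.
    Returns {speaker: label} for speakers with any overlap.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for spk, ts, te in turns:
        for label, gs, ge in gender_segments:
            overlap = min(te, ge) - max(ts, gs)
            if overlap > 0:
                votes[spk][label] += overlap
    # Pick the label with the largest accumulated overlap per speaker
    return {spk: max(v, key=v.get) for spk, v in votes.items()}
```

Duration weighting makes the mapping robust to short misclassified stretches, though it still inherits the caveats about perceived-gender detection noted below.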
Notes:
- Gender detection from voice infers perceived gender from audio traits; it can be inaccurate and biased across languages, cultures, and recording conditions. Expect mistakes and review the outputs.
- Overlapping speech: diarization handles overlap better than basic segmentation; decide whether to include or exclude mixed speech in your cuts.
- Copyright: cutting and redistributing movie clips may violate copyright, and rules for personal study vary by jurisdiction. Ensure your use complies with the applicable license and local law.
- Performance: long movies take time to process. Consider chunking the input or using a machine with a decent CPU/GPU.
References:
- inaSpeechSegmenter — CNN-based segmentation with speech/music/noise and speaker gender labels: https://github.com/ina-foss/inaSpeechSegmenter
- pyannote.audio — state-of-the-art speaker diarization toolkit: https://github.com/pyannote/pyannote-audio
- WhisperX — Whisper + word-level alignment + diarization integration: https://github.com/m-bain/whisperX
- OpenVINO diarization tutorial — pipeline concepts and tips: https://docs.openvino.ai/2023.3/notebooks/212-pyannote-speaker-diarization-with-output.html
- SpeechBrain — speech toolkit (embeddings/classifiers optional): https://github.com/speechbrain/speechbrain
Start with Option A to validate your workflow quickly. If you need better handling of overlapping speakers or a transcript with word-level timestamps, move to Option B and map speaker IDs to male/female using inaSpeechSegmenter or a simple classifier.