This repository was archived by the owner on Mar 25, 2026. It is now read-only.

Commit 51dbd8b

add pyannote segmentation-3.0 CoreML version
1 parent e6117a4 commit 51dbd8b

22 files changed: +1199 -61 lines changed

.gitignore

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@ result/
 
 **/CLAUDE.md
 
-*.json
 *.rttm
 
 audio/

CMakeLists.txt

Lines changed: 6 additions & 2 deletions
@@ -8,11 +8,15 @@ set(CMAKE_CXX_STANDARD_REQUIRED ON)
 # Find Python
 find_package(Python COMPONENTS Interpreter Development.Module REQUIRED)
 
-# Add the fbank_extractor subdirectory. This is where the C++ library is defined, built, and its installation is configured
+# libfbank_extractor
 add_subdirectory(senko/fbank_extractor)
 
+# libvad_coreml
+if(APPLE)
+add_subdirectory(senko/vad_coreml)
+endif()
+
 # Only install in wheel builds (skip editable and sdist)
 if(SKBUILD_STATE STREQUAL "wheel")
-install(TARGETS fbank_extractor DESTINATION senko)
 install(DIRECTORY models DESTINATION senko)
 endif()

DOCS.md

Lines changed: 2 additions & 2 deletions
@@ -8,9 +8,9 @@ diarizer = senko.Diarizer(device='auto', vad='auto', clustering='auto', warmup=T
 - `device`: Device to use for PyTorch operations (`auto`, `cuda`, `coreml`, `cpu`)
 - `auto` automatically selects `coreml` if on macOS, if not, then `cuda`, if not, then `cpu`
 - `vad`: Voice Activity Detection model to use (`auto`, `pyannote`, `silero`)
-- `auto` automatically selects `pyannote` for `cuda`, `silero` for everything else
+- `auto` automatically selects `pyannote` for `cuda` & `coreml`, `silero` for `cpu`
 - `pyannote` uses Pyannote VAD (requires `cuda` for optimal performance)
-- `silero` uses Silero VAD (works on all devices, runs on CPU)
+- `silero` uses Silero VAD (runs on CPU; not available on macOS)
 - `clustering`: Clustering location when `device` == `cuda` (`auto`, `gpu`, `cpu`)
 - Only applies to CUDA devices; non-CUDA devices always use CPU clustering
 - `auto` uses GPU clustering for CUDA devices with compute capability >= 7.0, CPU clustering otherwise
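
The `auto` rules documented in this hunk can be read as a small decision procedure. The sketch below is only an illustration of those rules as written in DOCS.md after this commit; the function names `resolve_device` and `resolve_vad` and the `platform` / `cuda_available` parameters are hypothetical and are not taken from Senko's source.

```python
import sys


def resolve_device(device="auto", platform=None, cuda_available=False):
    """Hypothetical helper: resolve device='auto' in the documented order
    (coreml on macOS, otherwise cuda if available, otherwise cpu)."""
    if device != "auto":
        return device
    # Assumption: macOS is detected via sys.platform == "darwin".
    platform = sys.platform if platform is None else platform
    if platform == "darwin":
        return "coreml"
    return "cuda" if cuda_available else "cpu"


def resolve_vad(vad, device):
    """Hypothetical helper: resolve vad='auto' per the updated DOCS.md line
    (pyannote for cuda and coreml, silero for cpu)."""
    if vad != "auto":
        return vad
    return "pyannote" if device in ("cuda", "coreml") else "silero"


if __name__ == "__main__":
    device = resolve_device("auto", platform="darwin")
    print(device, resolve_vad("auto", device))
```

Whether CUDA is available is passed in explicitly here so the sketch has no PyTorch dependency; a real implementation would presumably query the runtime instead.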

README.md

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ A very fast and accurate speaker diarization pipeline.
 
 1 hour of audio processed in 5 seconds (RTX 4090, Ryzen 9 7950X). ~17x faster than [Pyannote 3.1](https://huggingface.co/pyannote/speaker-diarization-3.1).
 
-On Apple M3, 1 hour in 15 seconds (~22x faster).
+On Apple M3, 1 hour in 7.7 seconds (~42x faster).
 
 The pipeline achieves a best score of 10.5% DER on VoxConverse, 9.3% on AISHELL-4, and 24.9% on AMI (IHM/SDM).
 
@@ -111,7 +111,7 @@ During the embeddings generation phase, for example, while the actual model infe
 <br><br>
 Therefore, for optimal performance, pair a fast GPU with a fast CPU. The CPU bottleneck becomes more noticeable with very fast GPUs (ex. RTX 4090) where the GPU can execute the batch preparation and inference faster than the CPU can orchestrate/dispatch these operations.
 <br><br>
-As for Mac, by default, the only part of the pipeline that doesn't run on the CPU is the embeddings gen phase, which runs on the ANE (Apple Neural Engine) through CoreML. All other parts run on the CPU. You <i>can</i> get VAD running on the GPU by setting <code>vad="pyannote"</code> in the <code>Diarizer</code> object instantiation. However, Pyannote VAD only runs fast on <code>cuda</code>, not on Mac GPUs. Therefore it is best to leave <code>vad="silero"</code> when on Mac, which is the default.
+As for Mac, both the VAD and embeddings gen phases run on the ANE (Apple Neural Engine) & CPU through CoreML. The fbank stage and clustering run purely on the CPU.
 </details>
 <details>
 <summary>Known limitations?</summary>

THIRD_PARTY_LICENSES

Lines changed: 4 additions & 0 deletions
@@ -26,6 +26,10 @@ The following components are licensed under the Apache 2.0 License:
 - See individual source files for specific copyright holders
 Source: https://github.com/kaldi-asr/kaldi
 
+- FluidAudio
+Copyright 2025 Fluid Inference
+Source: https://github.com/FluidInference/FluidAudio
+
 License text:
 
 Apache License

evaluation/README.md

Lines changed: 4 additions & 4 deletions
@@ -11,7 +11,7 @@ A dataset of conversational speech from YouTube videos. Primarily English, with
 | Device | VAD | Clustering Location | Global DER | Global RTF | System |
 |:--------:|:-----:|:-------------------:|:------------:|:------------:|:------------:|
 | `cuda` | pyannote | CPU | 10.5% | 0.0021401 | RTX 5090 + Ryzen 9 9950X |
-| `coreml` | silero | CPU | 11.0% | 0.0041023 | Apple M3 |
+| `coreml` | pyannote | CPU | 10.8% | 0.0020203 | Apple M3 |
 | `cuda` | pyannote | GPU | 14.5% | 0.0015595 | RTX 5090 + Ryzen 9 9950X |
 
 </center>
@@ -25,7 +25,7 @@ A dataset of meeting recordings in Mandarin Chinese.
 |:--------:|:-----:|:-------------------:|:------------:|:------------:|:------------:|
 | `cuda` | pyannote | GPU | 9.3% | 0.0015444 | RTX 5090 + Ryzen 9 9950X |
 | `cuda` | pyannote | CPU | 9.4% | 0.0034435 | RTX 5090 + Ryzen 9 9950X |
-| `coreml` | silero | CPU | 10.7% | 0.0043948 | Apple M3 |
+| `coreml` | pyannote | CPU | 9.5% | 0.0036052 | Apple M3 |
 
 </center>
 
@@ -40,7 +40,7 @@ A dataset of meeting recordings in English, with participants recorded using hea
 |:--------:|:-----:|:-------------------:|:------------:|:------------:|:------------:|
 | `cuda` | pyannote | GPU | 24.9% | 0.0014214 | RTX 5090 + Ryzen 9 9950X |
 | `cuda` | pyannote | CPU | 24.9% | 0.0028280 | RTX 5090 + Ryzen 9 9950X |
-| `coreml` | silero | CPU | 26.2% | 0.0042680 | Apple M3 |
+| `coreml` | pyannote | CPU | 25.2% | 0.0030760 | Apple M3 |
 
 </center>
 
@@ -52,6 +52,6 @@ A dataset of meeting recordings in English, with participants recorded using hea
 |:--------:|:-----:|:-------------------:|:------------:|:------------:|:------------:|
 | `cuda` | pyannote | GPU | 24.9% | 0.0014103 | RTX 5090 + Ryzen 9 9950X |
 | `cuda` | pyannote | CPU | 24.9% | 0.0028629 | RTX 5090 + Ryzen 9 9950X |
-| `coreml` | silero | CPU | 33.3% | 0.0040706 | Apple M3 |
+| `coreml` | pyannote | CPU | 30.7% | 0.0029834 | Apple M3 |
 
 </center>
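
To put the "Global RTF" columns above in wall-clock terms, a small conversion helps. The sketch below assumes the usual definition of real-time factor (processing time divided by audio duration); the tables do not state the definition, so treat that as an assumption. The function names are hypothetical, and the two RTF values are the before/after `coreml` rows of the first table in this file.

```python
def seconds_for_audio(rtf, audio_seconds=3600.0):
    """Processing time implied by an RTF, assuming RTF = processing time / audio duration."""
    return rtf * audio_seconds


def speedup(old_rtf, new_rtf):
    """How many times faster the new configuration is than the old one."""
    return old_rtf / new_rtf


if __name__ == "__main__":
    old_rtf = 0.0041023  # coreml + silero row removed by this commit (first table)
    new_rtf = 0.0020203  # coreml + pyannote row added by this commit (first table)
    print(f"1 hour before: {seconds_for_audio(old_rtf):.2f} s")
    print(f"1 hour after:  {seconds_for_audio(new_rtf):.2f} s")
    print(f"speedup:       {speedup(old_rtf, new_rtf):.2f}x")
```

These figures are per-dataset averages, so they need not match the single-file timing quoted in the top-level README.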

evaluation/eval.py

Lines changed: 1 addition & 1 deletion
@@ -320,7 +320,7 @@ def main():
 parser.add_argument('--device', choices=['auto', 'cuda', 'coreml', 'cpu'], default='auto',
 help='Device for Senko processing')
 parser.add_argument('--vad', choices=['auto', 'pyannote', 'silero'], default='auto',
-help='VAD system to use (auto=pyannote for CUDA, silero otherwise)')
+help='VAD to use')
 parser.add_argument('--clustering', choices=['auto', 'gpu', 'cpu'], default='auto',
 help='Clustering location (auto=gpu for CUDA compute >=7.0, cpu otherwise)')
 parser.add_argument('--results_dir', type=Path, default='./senko_evaluation_results',

models/README.md

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 # Model Links
 - [Pyannote segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
+- [Pyannote segmentation-3.0 CoreML version](https://huggingface.co/FluidInference/speaker-diarization-coreml)
 - [CAM++](https://modelscope.cn/models/iic/speech_campplus_sv_zh_en_16k-common_advanced)
 - CAM++ CoreML version: see `../tracing/coreml`
 - CAM++ CUDA TorchScript JIT-traced version: see `../tracing/cuda`
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+"fileFormatVersion": "1.0.0",
+"itemInfoEntries": {
+"10905DEA-14C4-4986-A29B-BF63B7ABE1C1": {
+"author": "com.apple.CoreML",
+"description": "CoreML Model Specification",
+"name": "model.mlmodel",
+"path": "com.apple.CoreML/model.mlmodel"
+},
+"167F767A-9FC7-4CE9-B70F-6E5D91BE2245": {
+"author": "com.apple.CoreML",
+"description": "CoreML Model Weights",
+"name": "weights",
+"path": "com.apple.CoreML/weights"
+}
+},
+"rootModelIdentifier": "10905DEA-14C4-4986-A29B-BF63B7ABE1C1"
+}
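
In the manifest added above, `rootModelIdentifier` holds the same key as one of the `itemInfoEntries`, which suggests a simple lookup from root identifier to file path. The sketch below is a reading of this one JSON file only, not of the Core ML package format in general; the function name `root_model_path` is hypothetical.

```python
import json


def root_model_path(manifest_text):
    """Return the 'path' of the entry named by rootModelIdentifier, or None.

    Assumes the layout seen in the manifest in this diff: a top-level
    'itemInfoEntries' mapping keyed by identifier, plus 'rootModelIdentifier'.
    """
    manifest = json.loads(manifest_text)
    root_id = manifest["rootModelIdentifier"]
    entry = manifest["itemInfoEntries"].get(root_id, {})
    # Some manifests may omit 'path' on an entry; return None in that case.
    return entry.get("path")


if __name__ == "__main__":
    example = json.dumps({
        "fileFormatVersion": "1.0.0",
        "itemInfoEntries": {
            "10905DEA-14C4-4986-A29B-BF63B7ABE1C1": {
                "author": "com.apple.CoreML",
                "description": "CoreML Model Specification",
                "name": "model.mlmodel",
                "path": "com.apple.CoreML/model.mlmodel",
            },
        },
        "rootModelIdentifier": "10905DEA-14C4-4986-A29B-BF63B7ABE1C1",
    })
    print(root_model_path(example))
```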
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+{"fileFormatVersion": "1.0.0", "itemInfoEntries": {"model.mil": {"author": "pyannote", "description": "Segmentation model"}}, "rootModelIdentifier": "model.mil"}
