Commit 0f7493b
feat: Support Parakeet-TDT-CTC-110M hybrid model (#433)
## Summary
Adds support for NVIDIA's Parakeet-TDT-CTC-110M hybrid model with fused
preprocessor+encoder architecture.
Based on the work by @JarbasAl in #383.
## Key Changes
### Model Architecture
- **Fused preprocessor+encoder**: No separate Encoder.mlmodelc file
- **Smaller dimensions**: encoderHidden=512, vocabSize=1024, single LSTM
layer
- **Array-format vocabulary**: vocab.json instead of dict format
- **BlankId**: 1024 (same as v2)
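
The array-versus-dict vocabulary difference can be illustrated with a small loader that accepts either layout. This is only a sketch, not the code from this PR: `loadVocabulary` and `VocabularyError` are hypothetical names, and the dict layout (string-encoded token IDs as keys, tokens as values) is an assumption about the older format.

```swift
import Foundation

/// Hypothetical error type for this sketch.
enum VocabularyError: Error {
    case unsupportedFormat
}

/// Hypothetical helper: decodes vocab.json in either layout into an ID-to-token table.
func loadVocabulary(from data: Data) throws -> [Int: String] {
    let json = try JSONSerialization.jsonObject(with: data, options: [])

    // Array format: a token's position in the array is its ID.
    if let tokens = json as? [String] {
        var table: [Int: String] = [:]
        for (id, token) in tokens.enumerated() {
            table[id] = token
        }
        return table
    }

    // Dict format (assumed layout): string-encoded IDs as keys, tokens as values.
    if let entries = json as? [String: String] {
        var table: [Int: String] = [:]
        for (key, token) in entries {
            guard let id = Int(key) else { throw VocabularyError.unsupportedFormat }
            table[id] = token
        }
        return table
    }

    throw VocabularyError.unsupportedFormat
}
```

Normalizing both layouts to one table would let the rest of the decoding path stay unaware of which file format a given model ships.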
### Code Modifications
- **AsrModels**: Optional encoder support, fused frontend loading, array
vocab handling
- **AsrManager**: Version-aware decoder state shapes, fused frontend
availability checking
- **AsrTranscription**: Skip encoder step when preprocessor output is
fused
- **TdtDecoderState**: Parameterized LSTM layer count
- **TdtDecoderV3**: Use config.encoderHiddenSize instead of
auto-detection
- **EncoderFrameView**: Accept explicit hidden size parameter
- **TranscribeCommand**: New `--model-version tdt-ctc-110m` and
`--model-dir` flags
- **ModelNames**: parakeetTdtCtc110m repo reference
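
As a rough illustration of the version-aware state shapes and the skipped encoder step, the sketch below derives both from a per-model configuration. All type and field names here are hypothetical, the `[layers, batch, hidden]` layout is an assumption, and `decoderHiddenSize: 640` is a placeholder that this PR description does not state; only `encoderHiddenSize: 512`, `vocabSize: 1024`, and the single LSTM layer come from the summary above.

```swift
/// Hypothetical per-model configuration; field names are illustrative only.
struct ModelConfig {
    let encoderHiddenSize: Int
    let vocabSize: Int
    let decoderLayers: Int       // number of LSTM layers in the decoder
    let decoderHiddenSize: Int   // placeholder value, not stated in the PR
    let hasFusedFrontend: Bool   // true when the preprocessor already emits encoder output

    /// Assumed shape of each LSTM state tensor (hidden or cell): [layers, batch, hidden].
    func decoderStateShape(batch: Int = 1) -> [Int] {
        [decoderLayers, batch, decoderHiddenSize]
    }
}

/// Stages that would run before the decoder in this sketch.
enum FrontendStage: String {
    case preprocessor
    case encoder
}

/// A fused frontend has no separate encoder step to run.
func frontendStages(for config: ModelConfig) -> [FrontendStage] {
    config.hasFusedFrontend ? [.preprocessor] : [.preprocessor, .encoder]
}

// Values for the 110M model taken from the summary, except decoderHiddenSize.
let tdtCtc110m = ModelConfig(
    encoderHiddenSize: 512,
    vocabSize: 1024,
    decoderLayers: 1,
    decoderHiddenSize: 640,
    hasFusedFrontend: true
)
```

Passing the hidden size and layer count explicitly, rather than inferring them from tensor shapes at runtime, keeps the state allocation predictable across model variants.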
### CLI Usage
```bash
swift run fluidaudiocli transcribe audio.wav --model-version tdt-ctc-110m
swift run fluidaudiocli transcribe audio.wav --model-version tdt-ctc-110m --model-dir /path/to/custom/models
```
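
For readers unfamiliar with how such flags are typically consumed, here is a minimal hand-rolled parser for the two options shown above. It is a sketch under stated assumptions: `TranscribeOptions` and `parseTranscribeArguments` are hypothetical names, and the real `TranscribeCommand` may parse, validate, and default these values differently.

```swift
/// Hypothetical container for the options used in the commands above.
struct TranscribeOptions: Equatable {
    var audioPath: String?
    var modelVersion: String?
    var modelDir: String?
}

/// Hypothetical parser: reads `--model-version` and `--model-dir` values and
/// treats the first non-flag argument as the audio path.
func parseTranscribeArguments(_ args: [String]) -> TranscribeOptions {
    var options = TranscribeOptions()
    var index = 0
    while index < args.count {
        let arg = args[index]
        switch arg {
        case "--model-version" where index + 1 < args.count:
            options.modelVersion = args[index + 1]
            index += 1  // skip the value we just consumed
        case "--model-dir" where index + 1 < args.count:
            options.modelDir = args[index + 1]
            index += 1  // skip the value we just consumed
        default:
            if options.audioPath == nil, !arg.hasPrefix("--") {
                options.audioPath = arg
            }
        }
        index += 1
    }
    return options
}
```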
## Testing
- [ ] iOS compatibility testing (per concerns in #383)
- [ ] Benchmark performance documentation
- [ ] Verify fused model behavior on both macOS and iOS
## Related
- Closes #383
- Model repo:
[FluidInference/parakeet-tdt-ctc-110m-coreml](https://huggingface.co/FluidInference/parakeet-tdt-ctc-110m-coreml)
<img width="642" height="1389" alt="IMG_5033"
src="https://github.com/user-attachments/assets/a9105cf7-552b-4573-acfb-2a089bf52820"
/>
---------
Co-authored-by: miro <jarbasai@mailfence.com>

1 parent 0346057 commit 0f7493b
File tree: 17 files changed, +1005 -83 lines changed
- Documentation
  - ASR
- Sources
  - FluidAudioCLI/Commands/ASR
  - FluidAudio
    - ASR
      - TDT
- Tests/FluidAudioTests/ASR