Commit bd1f48d

Increase FLEURS to run on all 25 languages and HF download retries (#158)
### Why is this change needed?

We need broader coverage to properly test all the languages. Previously, files failed to download because of Hugging Face (HF) rate limits; updating the download utilities to retry and to fall back to tokens read from environment variables helps here.

The full benchmark is still running; I will update Benchmarks.md once it is done. I have a PR to improve things for #128 but want to run FLEURS end to end first.
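The environment-token fallback described here can be sketched in isolation. This is a minimal illustration, not the commit's code: `resolveToken(from:)` and `makeRequest(for:env:)` are hypothetical helpers that take the environment as a parameter so they can be exercised without touching the real process environment. The variable names and their lookup order follow the `huggingFaceToken()` function in the `DownloadUtils.swift` diff.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

/// Hypothetical helper: returns the first token found among the environment
/// variables checked by the commit, in the same order.
func resolveToken(from env: [String: String]) -> String? {
    let candidates = ["HF_TOKEN", "HUGGINGFACEHUB_API_TOKEN", "HUGGING_FACE_HUB_TOKEN"]
    for name in candidates {
        if let value = env[name] {
            return value
        }
    }
    return nil
}

/// Hypothetical helper: builds a GET request and attaches a Bearer token
/// only when one is available; otherwise the request is sent anonymously.
func makeRequest(for url: URL, env: [String: String]) -> URLRequest {
    var request = URLRequest(url: url)
    request.httpMethod = "GET"
    if let token = resolveToken(from: env) {
        request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    }
    return request
}
```

In real use the dictionary would be `ProcessInfo.processInfo.environment`, so exporting `HF_TOKEN` in the shell before running the benchmark is enough for requests to carry the `Authorization` header.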
1 parent e524cc4 commit bd1f48d

4 files changed: +357 -75 lines changed

Documentation/Benchmarks.md

Lines changed: 37 additions & 16 deletions
@@ -7,25 +7,46 @@
 https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v3-coreml
 
 ```bash
-swift run fluidaudio fleurs-benchmark --languages en_us,it_it,es_419,fr_fr,de_de,ru_ru,uk_ua --samples all
+swift run fluidaudio fleurs-benchmark --languages all --samples all
 ```
 
 ```text
-[01:58:26.666] [INFO] [FLEURSBenchmark] ================================================================================
-[01:58:26.666] [INFO] [FLEURSBenchmark] FLEURS BENCHMARK SUMMARY
-[01:58:26.666] [INFO] [FLEURSBenchmark] ================================================================================
-[01:58:26.666] [INFO] [FLEURSBenchmark]
-[01:58:26.666] [INFO] [FLEURSBenchmark] Language | WER% | CER% | RTFx | Duration | Processed | Skipped
-[01:58:26.666] [INFO] [FLEURSBenchmark] -----------------------------------------------------------------------------------------
-[01:58:26.666] [INFO] [FLEURSBenchmark] English (US) | 5.7 | 2.8 | 197.8 | 3442.9s | 350 | -
-[01:58:26.666] [INFO] [FLEURSBenchmark] French (France) | 6.3 | 3.0 | 191.3 | 560.8s | 52 | 298
-[01:58:26.667] [INFO] [FLEURSBenchmark] German (Germany) | 3.1 | 1.2 | 216.7 | 62.1s | 5 | -
-[01:58:26.667] [INFO] [FLEURSBenchmark] Italian (Italy) | 4.3 | 2.0 | 213.5 | 743.3s | 50 | -
-[01:58:26.667] [INFO] [FLEURSBenchmark] Russian (Russia) | 7.8 | 2.8 | 186.3 | 621.2s | 50 | -
-[01:58:26.667] [INFO] [FLEURSBenchmark] Spanish (Spain) | 5.6 | 2.7 | 214.6 | 586.9s | 50 | -
-[01:58:26.667] [INFO] [FLEURSBenchmark] Ukrainian (Ukraine) | 7.2 | 2.1 | 192.8 | 528.2s | 50 | -
-[01:58:26.667] [INFO] [FLEURSBenchmark] -----------------------------------------------------------------------------------------
-[01:58:26.667] [INFO] [FLEURSBenchmark] AVERAGE | 5.7 | 2.4 | 201.9 | 6545.5s | 607 | 298
+[17:19:31.944] [INFO] [FluidAudio.FLEURSBenchmark] ----------------------------------------
+[17:19:31.944] [INFO] [FluidAudio.FLEURSBenchmark] ================================================================================
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Results saved to fleurs_benchmark_results.json
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] ================================================================================
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] FLEURS BENCHMARK SUMMARY
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] ================================================================================
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark]
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Language | WER% | CER% | RTFx | Duration | Processed | Skipped
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] -----------------------------------------------------------------------------------------
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Bulgarian (Bulgaria) | 12.9 | 4.1 | 187.5 | 3468.0s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Croatian (Croatia) | 14.2 | 4.3 | 197.1 | 3647.0s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Czech (Czechia) | 12.5 | 4.1 | 205.7 | 4247.4s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Danish (Denmark) | 20.6 | 7.7 | 206.1 | 10579.1s | 930 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Dutch (Netherlands) | 7.9 | 2.7 | 184.9 | 3337.7s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] English (US) | 5.7 | 2.8 | 200.8 | 3442.9s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Estonian (Estonia) | 20.5 | 4.4 | 215.4 | 10825.4s | 893 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Finnish (Finland) | 15.5 | 3.5 | 211.9 | 11894.4s | 918 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] French (France) | 6.3 | 2.6 | 192.0 | 3667.3s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] German (Germany) | 7.1 | 2.8 | 206.6 | 4684.6s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Greek (Greece) | 37.1 | 13.8 | 175.2 | 6862.0s | 650 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Hungarian (Hungary) | 18.1 | 5.4 | 203.9 | 11050.9s | 905 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Italian (Italy) | 4.8 | 1.9 | 222.6 | 5098.7s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Latvian (Latvia) | 27.9 | 7.8 | 208.6 | 10218.6s | 851 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Lithuanian (Lithuania) | 25.3 | 7.0 | 193.7 | 10686.5s | 986 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Maltese (Malta) | 25.6 | 9.7 | 206.8 | 12770.6s | 926 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Polish (Poland) | 8.7 | 2.9 | 183.8 | 3409.6s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Romanian (Romania) | 14.7 | 4.8 | 192.3 | 9099.4s | 883 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Russian (Russia) | 7.4 | 2.4 | 199.6 | 3974.6s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Slovak (Slovakia) | 13.0 | 4.5 | 217.8 | 4169.6s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Slovenian (Slovenia) | 27.6 | 9.4 | 189.1 | 8173.1s | 834 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Spanish (Spain) | 6.5 | 3.9 | 214.2 | 4258.9s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Swedish (Sweden) | 17.0 | 5.2 | 211.7 | 8399.2s | 759 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] Ukrainian (Ukraine) | 7.4 | 2.5 | 195.4 | 3853.7s | 350 | -
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] -----------------------------------------------------------------------------------------
+[17:19:31.945] [INFO] [FluidAudio.FLEURSBenchmark] AVERAGE | 15.2 | 5.0 | 200.9 | 161819.2 | 14085 | -
+[17:19:31.954] [INFO] [FluidAudio.Main] Peak memory usage (process-wide): 0.487 GB
 ```
 
 ```text
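
The AVERAGE row of the new summary table can be checked by hand. The sketch below assumes, based only on the numbers in the table, that the WER% figure is an unweighted mean over the 24 listed languages and that Processed is a plain sum; the benchmark's actual aggregation code is not shown in this diff.

```swift
import Foundation

// (WER%, processed samples) per language, copied in table order from the
// added lines of the Benchmarks.md diff (Bulgarian ... Ukrainian).
let rows: [(wer: Double, processed: Int)] = [
    (12.9, 350), (14.2, 350), (12.5, 350), (20.6, 930), (7.9, 350), (5.7, 350),
    (20.5, 893), (15.5, 918), (6.3, 350), (7.1, 350), (37.1, 650), (18.1, 905),
    (4.8, 350), (27.9, 851), (25.3, 986), (25.6, 926), (8.7, 350), (14.7, 883),
    (7.4, 350), (13.0, 350), (27.6, 834), (6.5, 350), (17.0, 759), (7.4, 350),
]

// Unweighted mean: every language counts equally, regardless of sample count.
let meanWER = rows.map { $0.wer }.reduce(0, +) / Double(rows.count)

// Total number of processed samples across all languages.
let totalProcessed = rows.map { $0.processed }.reduce(0, +)

print(String(format: "mean WER: %.1f%%", meanWER))  // mean WER: 15.2%
print("processed: \(totalProcessed)")               // processed: 14085
```

Both values agree with the AVERAGE row, which suggests that languages with a high WER and many samples (for example Greek) are not given extra weight in the headline figure.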

Sources/FluidAudio/Diarizer/Offline/Segmentation/OfflineSegmentationProcessor.swift

Lines changed: 0 additions & 5 deletions
@@ -61,11 +61,6 @@ struct OfflineSegmentationProcessor {
         var frameDuration: Double = 0
         var numFrames = 0
         let speakerCount = 3
-        let speakerClassIndices: [[Int]] = (0..<speakerCount).map { speaker in
-            powerset.enumerated().compactMap { index, combination in
-                combination.contains(speaker) ? index : nil
-            }
-        }
 
         // Pre-compute flat mapping matrix for vectorized speaker activation
         // Matrix[speaker][class] = 1.0 if speaker in powerset[class], else 0.0
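
The removed `speakerClassIndices` listed, for each speaker, the powerset classes that contain that speaker; the retained comments describe a flat matrix that encodes the same membership. A minimal sketch of such a matrix follows. The powerset contents, the row-major layout, the function names, and the way activations are summed are all assumptions for illustration, since the surrounding code is not part of this diff.

```swift
/// Builds a flat, row-major [speaker * classCount + cls] matrix whose entry is
/// 1.0 when `speaker` is a member of `powerset[cls]`, else 0.0.
func makeMappingMatrix(powerset: [[Int]], speakerCount: Int) -> [Float] {
    let classCount = powerset.count
    var matrix = [Float](repeating: 0, count: speakerCount * classCount)
    for (cls, combination) in powerset.enumerated() {
        for speaker in combination where speaker < speakerCount {
            matrix[speaker * classCount + cls] = 1.0
        }
    }
    return matrix
}

/// Assumed usage: a speaker's activation is the sum of the probabilities of
/// every class that contains that speaker (a matrix-vector product).
func speakerActivations(
    classProbabilities: [Float], mapping: [Float], speakerCount: Int
) -> [Float] {
    let classCount = classProbabilities.count
    precondition(mapping.count == speakerCount * classCount)
    return (0..<speakerCount).map { speaker in
        var sum: Float = 0
        for cls in 0..<classCount {
            sum += mapping[speaker * classCount + cls] * classProbabilities[cls]
        }
        return sum
    }
}

// Hypothetical powerset: silence, three single speakers, three speaker pairs.
let examplePowerset: [[Int]] = [[], [0], [1], [2], [0, 1], [0, 2], [1, 2]]
let exampleMapping = makeMappingMatrix(powerset: examplePowerset, speakerCount: 3)
```

Because the matrix is a single contiguous buffer, the inner loop could be handed to a vectorized matrix-vector routine, which is presumably what "vectorized speaker activation" in the comment refers to.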

Sources/FluidAudio/DownloadUtils.swift

Lines changed: 162 additions & 0 deletions
@@ -18,6 +18,168 @@ public class DownloadUtils {
         return URLSession(configuration: configuration)
     }()
 
+    private static let huggingFaceUserAgent = "FluidAudio/1.0 (HuggingFaceDownloader)"
+
+    public enum HuggingFaceDownloadError: LocalizedError {
+        case invalidResponse
+        case rateLimited(statusCode: Int, message: String)
+        case unexpectedContent(statusCode: Int, mimeType: String?, snippet: String)
+
+        public var errorDescription: String? {
+            switch self {
+            case .invalidResponse:
+                return "Received an invalid response from Hugging Face."
+            case .rateLimited(_, let message):
+                return "Hugging Face rate limit encountered: \(message)"
+            case .unexpectedContent(_, let mimeType, let snippet):
+                let mimeInfo = mimeType ?? "unknown MIME type"
+                return "Unexpected Hugging Face content (\(mimeInfo)): \(snippet)"
+            }
+        }
+    }
+
+    private static func huggingFaceToken() -> String? {
+        let env = ProcessInfo.processInfo.environment
+        return env["HF_TOKEN"]
+            ?? env["HUGGINGFACEHUB_API_TOKEN"]
+            ?? env["HUGGING_FACE_HUB_TOKEN"]
+    }
+
+    private static func isLikelyHtml(_ data: Data) -> Bool {
+        guard !data.isEmpty,
+            let prefix = String(data: data.prefix(128), encoding: .utf8)?
+                .trimmingCharacters(in: .whitespacesAndNewlines)
+                .lowercased()
+        else {
+            return false
+        }
+
+        return prefix.hasPrefix("<!doctype html") || prefix.hasPrefix("<html")
+    }
+
+    private static func makeHuggingFaceRequest(for url: URL) -> URLRequest {
+        var request = URLRequest(url: url)
+        request.httpMethod = "GET"
+        request.setValue(huggingFaceUserAgent, forHTTPHeaderField: "User-Agent")
+        request.setValue("application/octet-stream", forHTTPHeaderField: "Accept")
+        request.timeoutInterval = DownloadConfig.default.timeout
+
+        if let token = huggingFaceToken() {
+            request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
+        }
+
+        return request
+    }
+
+    public static func fetchHuggingFaceFile(
+        from url: URL,
+        description: String,
+        maxAttempts: Int = 4,
+        minBackoff: TimeInterval = 1.0
+    ) async throws -> Data {
+        var lastError: Error?
+
+        for attempt in 1...maxAttempts {
+            do {
+                let request = makeHuggingFaceRequest(for: url)
+                let (data, response) = try await sharedSession.data(for: request)
+
+                guard let httpResponse = response as? HTTPURLResponse else {
+                    throw HuggingFaceDownloadError.invalidResponse
+                }
+
+                if httpResponse.statusCode == 429 || httpResponse.statusCode == 503 {
+                    let message = "HTTP \(httpResponse.statusCode)"
+                    throw HuggingFaceDownloadError.rateLimited(
+                        statusCode: httpResponse.statusCode, message: message)
+                }
+
+                if let mimeType = httpResponse.mimeType?.lowercased(),
+                    mimeType == "text/html"
+                {
+                    let snippet = String(data: data.prefix(256), encoding: .utf8) ?? ""
+                    throw HuggingFaceDownloadError.unexpectedContent(
+                        statusCode: httpResponse.statusCode,
+                        mimeType: mimeType,
+                        snippet: snippet
+                    )
+                }
+
+                if isLikelyHtml(data) {
+                    let snippet = String(data: data.prefix(256), encoding: .utf8) ?? ""
+                    throw HuggingFaceDownloadError.unexpectedContent(
+                        statusCode: httpResponse.statusCode,
+                        mimeType: httpResponse.mimeType,
+                        snippet: snippet
+                    )
+                }
+
+                return data
+
+            } catch let error as HuggingFaceDownloadError {
+                lastError = error
+
+                if attempt == maxAttempts {
+                    break
+                }
+
+                let backoffSeconds = pow(2.0, Double(attempt - 1)) * minBackoff
+                let backoffNanoseconds = UInt64(backoffSeconds * 1_000_000_000)
+                let formattedBackoff = String(format: "%.1f", backoffSeconds)
+
+                switch error {
+                case .rateLimited(let statusCode, _):
+                    if huggingFaceToken() == nil {
+                        logger.warning(
+                            "Rate limit (HTTP \(statusCode)) while downloading \(description). "
+                                + "Set HF_TOKEN or HUGGINGFACEHUB_API_TOKEN for higher limits. "
+                                + "Retrying in \(formattedBackoff)s."
+                        )
+                    } else {
+                        logger.warning(
+                            "Rate limit (HTTP \(statusCode)) while downloading \(description). "
+                                + "Retrying in \(formattedBackoff)s."
+                        )
+                    }
+                case .unexpectedContent(_, _, let snippet):
+                    logger.warning(
+                        "Unexpected content while downloading \(description). "
+                            + "Snippet: \(snippet.prefix(100)). "
+                            + "Retrying in \(formattedBackoff)s."
+                    )
+                case .invalidResponse:
+                    logger.warning(
+                        "Invalid response while downloading \(description). "
+                            + "Retrying in \(formattedBackoff)s."
+                    )
+                }
+
+                try await Task.sleep(nanoseconds: backoffNanoseconds)
+
+            } catch {
+                lastError = error
+
+                if attempt == maxAttempts {
+                    break
+                }
+
+                let backoffSeconds = pow(2.0, Double(attempt - 1)) * minBackoff
+                let backoffNanoseconds = UInt64(backoffSeconds * 1_000_000_000)
+                let formattedBackoff = String(format: "%.1f", backoffSeconds)
+
+                logger.warning(
+                    "Download attempt \(attempt) for \(description) failed: "
+                        + "\(error.localizedDescription). "
+                        + "Retrying in \(formattedBackoff)s."
+                )
+
+                try await Task.sleep(nanoseconds: backoffNanoseconds)
+            }
+        }
+
+        throw lastError ?? HuggingFaceDownloadError.invalidResponse
+    }
+
     private static func configureProxySettings() -> [String: Any]? {
         #if os(macOS)
         var proxyConfig: [String: Any] = [:]
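
The retry loop in `fetchHuggingFaceFile` waits `2^(attempt - 1) * minBackoff` seconds after each failed attempt except the last. The sketch below reproduces only that schedule so the total wait is easy to see; `backoffSeconds(afterAttempt:minBackoff:)` and `backoffSchedule(maxAttempts:minBackoff:)` are hypothetical helpers, not functions from the commit.

```swift
import Foundation

/// Delay before retrying after the given 1-based failed attempt, using the
/// same formula as the diff: 2^(attempt - 1) * minBackoff.
func backoffSeconds(afterAttempt attempt: Int, minBackoff: TimeInterval = 1.0) -> TimeInterval {
    return pow(2.0, Double(attempt - 1)) * minBackoff
}

/// Delays slept between attempts. No sleep follows the final attempt, so the
/// schedule has maxAttempts - 1 entries.
func backoffSchedule(maxAttempts: Int, minBackoff: TimeInterval = 1.0) -> [TimeInterval] {
    guard maxAttempts > 1 else { return [] }
    return (1..<maxAttempts).map { backoffSeconds(afterAttempt: $0, minBackoff: minBackoff) }
}

let schedule = backoffSchedule(maxAttempts: 4)
print(schedule)               // [1.0, 2.0, 4.0]
print(schedule.reduce(0, +))  // 7.0
```

With the defaults in the diff (`maxAttempts: 4`, `minBackoff: 1.0`), a download that keeps failing therefore sleeps for about 7 seconds in total, excluding request time, before the last error is rethrown. The sketch guards against `maxAttempts < 1`; in the diff, the range `1...maxAttempts` would trap at runtime for such a value.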
