Fix CI workflows, compiler flags, and flaky tests#1415
Merged
Conversation
f4b3bd9 to
1872937
Compare
lalitb
reviewed
Mar 21, 2026
lalitb
reviewed
Mar 21, 2026
lalitb
reviewed
Mar 21, 2026
This was referenced Mar 21, 2026
117493b to
710a531
Compare
ThomsonTan
reviewed
Mar 31, 2026
7f4e119 to
79342fe
Compare
ThomsonTan
reviewed
Apr 16, 2026
ThomsonTan
reviewed
Apr 16, 2026
ThomsonTan
reviewed
Apr 16, 2026
ebdf617 to
c917965
Compare
ThomsonTan
reviewed
Apr 20, 2026
lalitb
reviewed
Apr 22, 2026
lalitb
reviewed
Apr 22, 2026
Contributor
|
@bmehta001 - The CI fixes are important and also more safe, but this PR also changes some runtime behavior in the HTTP client, WorkerThread, TPM, Logger, and HttpResponseDecoder. Can we keep this PR scoped to the CI/build/test fixes, and move the runtime race/shutdown fixes to a separate PR? If any of these runtime changes are needed to unblock CI, can you call that out with the specific failure they address and how it was validated? |
ThomsonTan
reviewed
Apr 27, 2026
- Update Actions to latest versions (checkout@v4, setup-java@v5, etc.) - Add concurrency groups scoped to PR events only (push builds on main always run to completion) - Add 30-minute timeout to iOS CI to prevent deadlock hangs - Fix iOS simulator UUID resolution for unambiguous destination - Fix Android SDK path detection in CodeQL workflow - Document python3 requirement for iOS tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Split GCC/Clang warning flags (Clang-only flags were breaking GCC) - Add -fno-finite-math-only to restore IEEE 754 compliance needed by sqlite3 when -ffast-math is enabled - Remove global -Wno-reorder, scope to Sanitizer.cpp only (submodule declares members in wrong order) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Resolve simulator UUID for unambiguous xcodebuild destination - Add python3 dependency documentation - Update iOS getting-started docs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use unique phase2 DB paths instead of sleep to avoid SQLite vnode collisions from deferred sqlite3_close_v2 cleanup - Clean SQLite journal files (-wal, -shm, -journal) in CleanStorage - Fix cross-test config contamination (reset storage settings) - Use sleep(10) instead of yield() in waitForEvents to reduce CPU contention on single-core simulator runners - Increase timeouts for iOS simulator environment - Fix multi-batch upload flakiness with SetTransmitProfile(RealTime) - Use 127.0.0.1 instead of localhost for local test server Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1830e42 to
defb45b
Compare
Contributor
Author
|
@lalitb - Yes, will remove all functional changes to put in another PR, even if a few tests fail in the CI on this branch, just to keep things simple. |
lalitb
approved these changes
Apr 29, 2026
Contributor
lalitb
left a comment
There was a problem hiding this comment.
Thanks @bmehta001 - Nice work, Appreciate making the CI green!
bmehta001
added a commit
to bmehta001/cpp_client_telemetry
that referenced
this pull request
May 14, 2026
PR microsoft#1416 inadvertently regressed PR microsoft#1415 by removing -fno-finite-math-only from the three Unix REL_FLAGS branches and silencing the resulting Clang warning via -Wno-nan-infinity-disabled. That left -ffast-math implying -ffinite-math-only on Clang/AppleClang, which folds std::isnan / std::isinf to false in nlohmann::json (lib/include/mat/json.hpp) and breaks the IEEE 754 invariants that SQLite relies on for sqlite3_bind_double / sqlite3IsNaN. Rather than restore -fno-finite-math-only, address the root cause by removing -ffast-math entirely from all three Unix REL_FLAGS branches (GCC, AppleClang, generic Clang). For a telemetry library: - The hot paths are string concatenation, Bond/JSON serialization, HTTP I/O, and SQLite reads/writes - there is essentially no floating-point arithmetic for -ffast-math to optimize. - Defining __FAST_MATH__ causes some libc++ / glibc headers to shortcut isnan / isinf even with -fno-finite-math-only set, silently corrupting NaN handling in nlohmann::json output. - On x86 Linux, -ffast-math links crtfastmath.o which sets MXCSR FTZ/DAZ bits process-wide at startup, leaking flush-to-zero behavior into client applications that link this SDK - a contract violation for any consumer doing scientific compute or audio DSP. - SQLite source explicitly recommends against -ffast-math. This subsumes the partial mitigation from microsoft#1415 (-fno-finite-math-only is no longer needed once -ffast-math itself is gone) and aligns with the intent of the long-standing draft PR microsoft#1392 by ThomsonTan, extended to the GCC branch as well (GCC's -ffast-math has the same finite-math-only and crtfastmath.o behavior on x86 Linux). Also drop -Wno-nan-infinity-disabled from the Clang WARN_FLAGS - it only existed to silence the warning that flagged this exact bug, and is unnecessary once -ffast-math is no longer pulling in -ffinite-math-only. Validation: - cmake configure + Release build of UnitTests on macOS arm64 (AppleClang 21): no warnings, no -Wnan-infinity-disabled hits. - ./out/tests/unittests/UnitTests: 486/486 pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bmehta001
added a commit
to bmehta001/cpp_client_telemetry
that referenced
this pull request
May 15, 2026
PR microsoft#1416 inadvertently regressed PR microsoft#1415 by removing -fno-finite-math-only from the three Unix REL_FLAGS branches and adding -Wno-nan-infinity-disabled to silence the Clang diagnostic. That left release builds using -ffast-math without preserving the NaN/Inf semantics needed by nlohmann::json and SQLite paths. Remove -ffast-math entirely from the GCC, AppleClang, and generic Clang release flags rather than relying on -fno-finite-math-only to partially undo it. This SDK is not floating-point compute-bound; its hot paths are string, Bond/JSON serialization, HTTP I/O, and SQLite reads/writes. Avoiding -ffast-math: - preserves std::isnan/std::isinf behavior for JSON and storage code, - avoids compiler/runtime fast-math side effects such as x86 GCC's crtfastmath.o changing MXCSR FTZ/DAZ behavior process-wide, and - aligns with SQLite's guidance to avoid fast-math. This subsumes microsoft#1415's partial mitigation and aligns with microsoft#1392's intent, extended to GCC because GCC fast-math has the same broad assumptions and runtime side effects. Also remove -Wno-nan-infinity-disabled because the warning should not be suppressed once the cause is gone. Validation: - CMake Release build of UnitTests on macOS arm64 (AppleClang 21). - UnitTests passed on macOS arm64. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
CI/build/test fixes only — no runtime behavior changes. Runtime fixes (data races, memory leak, shutdown safety) are in #1429.
Modernize CI workflows (11 files)
cancel-in-progressscoped topull_requestevents only — push builds onmainalways run to completion so a rapid second push can't hide a broken committimeout-minutesto iOS CI to prevent deadlock hangs (theCancelAllRequestsspin loop can hang indefinitely if an NSURLSession completion handler never fires on the simulator)xcrun simctl listto avoid ambiguous-destinationmatchingFix compiler flags (2 files)
-Wno-unknown-warning-optionis Clang-only and was breaking GCC builds-fno-finite-math-onlyto all non-MSVC release flags — restores IEEE 754 compliance (INFINITYmacro) needed by sqlite3 and tests when-ffast-mathis enabled-Wno-reorder, scope toSanitizer.cpponly viaset_source_files_properties— the only file that triggers the warning is in thelib/modulessubmodule whereSanitizer.hppdeclares members in a different order than the constructor init listImprove iOS test script and documentation (3 files)
build-tests-ios.shto dynamically resolve simulator UUID for unambiguous xcodebuild destinationdocs/cpp-start-ios.mdwith current build instructionsFix flaky functional tests (6 files)
APITest.cpp): replacePAL::sleep(500)with unique-per-invocation DB paths using an atomic counter. The SDK closes SQLite viasqlite3_close_v2()which defers file-descriptor cleanup — reusing a fixed path andstd::remove()-ing files with deferred fds still open caused iOSvnode unlinked while in useerrorsAPITest.cpp,AISendTests.cpp): remove-wal,-shm,-journalfiles inCleanStorage()to prevent cross-test contaminationBasicFuncTests.cpp): resetCFG_INT_CACHE_FILE_SIZEandCFG_INT_STORAGE_FULL_PCTin testInitialize()—LogManager::GetLogConfiguration()returns a global singleton, so tests that modify config fields contaminate subsequent testsBasicFuncTests.cpp): reduces CPU contention on single-core iOS simulator runners and gives the network stack time to deliverBasicFuncTests.cpp): addSetTransmitProfile(TransmitProfile_RealTime)beforeUploadNow()to set all timer delays to 0 — prevents events from being held back in lower-priority batchesBasicFuncTests.cpp): replacelocalhostwith127.0.0.1for the local test HTTP server — avoids DNS resolution failures on iOS simulatorwaitForEventstimeouts across tests to accommodate slow CI runnersMultipleLogManagersTests.cpp): fix data race on shared event counter