WIP: Test ccache fix hypothesis on ITK.Linux Azure DevOps #6047

Closed
hjmjohnson wants to merge 2 commits into InsightSoftwareConsortium:main from hjmjohnson:wip-ccache-azure-fix
Conversation

@hjmjohnson
Member

Test whether fixing ccache configuration on Azure DevOps ITK.Linux improves hit rate from ~0% to 95%+.

Hypothesis and changes

Root cause analysis: ITK.Linux Azure DevOps ccache had near-0% hit rate due to:

  1. CCACHE_NODIRECT=1 — disabled fast direct mode (ARM CI without this gets 98.5%)
  2. Missing CCACHE_SLOPPINESS=pch_defines,time_macros: __DATE__/__TIME__ macros cause misses
  3. Three jobs (Linux, LinuxLegacyRemoved, LinuxCxx20) shared one cache key but built different configs
  4. ITK_USE_CCACHE=ON + CMAKE_*_COMPILER_LAUNCHER=ccache — potential double-wrapping

Changes to AzurePipelinesLinux.yml:

  • Remove CCACHE_NODIRECT (enable direct mode)
  • Add CCACHE_SLOPPINESS=pch_defines,time_macros
  • Add job name to cache key (ccache-v5)
  • Remove ITK_USE_CCACHE (launcher is sufficient)
  • Minimize build to ITKCommon+IO modules (fast iteration)
  • Single job only to avoid cache collision
  • Add ccache --show-stats --verbose step
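
The environment-variable part of the changes above can be sketched in Python. This is an illustration only, not the pipeline code: the actual change is made in AzurePipelinesLinux.yml, and the helper name `ccache_env` is hypothetical. It assumes the build is launched from an environment dictionary that may still carry the old CCACHE_NODIRECT setting.

```python
import os


def ccache_env(base_env):
    """Return a copy of base_env with the ccache settings from this PR applied.

    Hypothetical helper for illustration; the real settings live in the
    pipeline YAML.
    """
    env = dict(base_env)
    # Dropping CCACHE_NODIRECT re-enables ccache's direct mode.
    env.pop("CCACHE_NODIRECT", None)
    # Sloppiness values taken verbatim from the PR description.
    env["CCACHE_SLOPPINESS"] = "pch_defines,time_macros"
    return env


if __name__ == "__main__":
    env = ccache_env(os.environ)
    print("CCACHE_NODIRECT" in env, env["CCACHE_SLOPPINESS"])
```

A dictionary built this way could be passed as the `env` argument of `subprocess.run` when invoking the build, so the settings apply only to that process.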

All other CI disabled (trigger: none / workflow_dispatch) to isolate the test.

Expected result
  • First run: 0% hit rate (cold cache with new key ccache-v5)
  • Second run (push a trivial change): 95%+ hit rate with fast direct hits
  • Build time on second run should be ~2-3 min vs ~30 min full rebuild

DO NOT MERGE — this PR disables all CI except ITK.Linux. Will be closed after hypothesis is confirmed.

Hypothesis: ITK.Linux Azure ccache has near-0% hit rate because:
1. CCACHE_NODIRECT=1 disables the fast direct mode
2. CCACHE_SLOPPINESS not set (__DATE__/__TIME__ cause misses)
3. Three jobs share one cache key but build different configs
4. ITK_USE_CCACHE=ON conflicts with CMAKE_*_COMPILER_LAUNCHER

Fix applied to AzurePipelinesLinux.yml:
- Remove CCACHE_NODIRECT (enable direct mode)
- Add CCACHE_SLOPPINESS=pch_defines,time_macros
- Add job name to cache key (ccache-v5)
- Remove ITK_USE_CCACHE (launcher is sufficient)
- Minimize build to ITKCommon+IO modules only (fast iteration)
- Keep only 1 job (Linux) to avoid cache key collision
- Add ccache --show-stats --verbose step

All other CI temporarily disabled (trigger: none / workflow_dispatch)
to isolate the hypothesis test. This PR must NOT be merged.
github-actions bot added the type:Infrastructure (Infrastructure/ecosystem related changes, such as CMake or buildbots) and type:Testing (Ensure that the purpose of a class is met/the results on a wide set of test cases are correct) labels Apr 13, 2026
Trivial change to trigger a rebuild against the cache seeded by Run 1.
Expected: 95%+ direct hit rate from ccache --show-stats --verbose.
@hjmjohnson
Member Author

Hypothesis confirmed: 99.95% ccache hit rate on Run 2 after removing CCACHE_NODIRECT.

Results
| Metric      | Run 1 (cold) | Run 2 (warm) | Before fix |
|-------------|--------------|--------------|------------|
| Hit rate    | 0% (seeding) | 99.95%       | ~0%        |
| Direct hits | 0            | 2166/2167    | 0-1        |
| Misses      | ~2167        | 1            | ~4200+     |
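
The Run 2 hit rate follows directly from the hit and miss counts reported above. A minimal sketch of that arithmetic, with `hit_rate` as a hypothetical helper name (this is not how ccache itself reports statistics):

```python
def hit_rate(hits, misses):
    """Return the cache hit rate as a percentage of all cacheable calls."""
    total = hits + misses
    if total == 0:
        # No compilations recorded; avoid dividing by zero.
        return 0.0
    return 100.0 * hits / total


if __name__ == "__main__":
    # Run 2 (warm): 2166 direct hits and 1 miss, per the results above.
    print(f"{hit_rate(2166, 1):.2f}%")  # 99.95%
```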

Root cause: CCACHE_NODIRECT=1 disabled ccache's fast direct mode.

A clean PR applying the fix to all Azure pipelines will follow.

hjmjohnson closed this Apr 13, 2026
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request Apr 13, 2026
Remove CCACHE_NODIRECT=1 which disabled ccache's fast direct mode,
forcing preprocessor-mode hashing on every compilation unit. The ARM
CI (which never had NODIRECT) achieves 98.5% hit rate; Azure DevOps
was at 0.02%.

Additional fixes:
- Add CCACHE_SLOPPINESS=pch_defines,time_macros to prevent
  __DATE__/__TIME__ macros from causing cache misses
- Add job name to cache keys so Linux, LinuxLegacyRemoved, and
  LinuxCxx20 each get their own cache (different build configs
  produce different preprocessor output)
- Remove ITK_USE_CCACHE:BOOL=ON from the Linux job — redundant
  with CMAKE_C/CXX_COMPILER_LAUNCHER=ccache
- Add ccache --show-stats step to all three jobs for monitoring
- Add ccache --zero-stats before build for accurate per-run stats

Tested in PR InsightSoftwareConsortium#6047: Run 2 achieved 99.95% hit rate (2166/2167
direct hits, 1 miss) after these fixes.
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request Apr 13, 2026
Remove CCACHE_NODIRECT=1 which disabled ccache's fast direct mode,
forcing preprocessor-mode hashing on every compilation unit. The ARM
CI (which never had NODIRECT) achieves 98.5% hit rate; Azure DevOps
was at 0.02%.

Applied to all 7 Azure DevOps pipelines:
- AzurePipelinesLinux.yml (3 jobs: Linux, LinuxLegacyRemoved, LinuxCxx20)
- AzurePipelinesLinuxPython.yml
- AzurePipelinesMacOS.yml
- AzurePipelinesMacOSPython.yml
- AzurePipelinesWindows.yml
- AzurePipelinesWindowsPython.yml
- AzurePipelinesBatch.yml

Changes per pipeline:
- Remove CCACHE_NODIRECT=1 (enable direct mode)
- Add CCACHE_SLOPPINESS=pch_defines,time_macros
- Add per-job name to cache key (ccache-v4 | OS | JobName | SHA)
- Remove ITK_USE_CCACHE:BOOL=ON (redundant with CMAKE_*_LAUNCHER)
- Add ccache --zero-stats + --show-config maintenance step
- Add ccache --show-stats step after build for monitoring
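
The effect of the per-job key component can be illustrated with a short Python sketch. The key layout follows the `ccache-v4 | OS | JobName | SHA` pattern listed above; the function name `cache_key`, the placeholder SHA `abc123`, and the shape of the old shared key are assumptions made for illustration, not values taken from the pipelines.

```python
def cache_key(version, os_name, job_name, sha):
    """Join the key parts in the 'version | OS | JobName | SHA' layout."""
    return " | ".join([version, os_name, job_name, sha])


# The three jobs named in AzurePipelinesLinux.yml.
JOBS = ["Linux", "LinuxLegacyRemoved", "LinuxCxx20"]

# Assumed old layout: no job name, so every job maps to the same key.
shared_keys = {" | ".join(["ccache-v4", "Linux", "abc123"]) for _ in JOBS}

# New layout: the job name makes each key distinct.
per_job_keys = {cache_key("ccache-v4", "Linux", job, "abc123") for job in JOBS}

if __name__ == "__main__":
    print(len(shared_keys), len(per_job_keys))  # 1 3
```

With one shared key, whichever job saves its cache last determines what the other two restore, even though their build configurations differ. Distinct keys keep each job's cache separate.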

Tested in PR InsightSoftwareConsortium#6047: Run 2 achieved 99.95% hit rate (2166/2167
direct hits) after these fixes on the Linux pipeline.