fix(install): unstick install gate + clean download log noise #547

Merged
rainxchzed merged 2 commits into main from fix/install-gate-diagnostics on May 8, 2026

@rainxchzed rainxchzed commented May 8, 2026

Surfaced from a field report of "downloads complete at 100% but never start installing" when mirror download is on and 2-3 apps are queued.

Root cause (Candidate A — confirmed by code read)

DefaultDownloadOrchestrator.runInstall (the AlwaysInstall path used by Shizuku/Dhizuku) handles the two InstallOutcome values asymmetrically:

  • COMPLETED (silent install succeeded) → markCompleted releases the gate immediately. ✅
  • DELEGATED_TO_SYSTEM (silent install fell back to the AndroidInstaller default-dialog path — SilentInstallerDispatcher.kt:110-112 does this when the Shizuku binder is null, returns non-zero, or throws) → gate is intentionally not released, by design, because the system installer dialog is still up.

The intentional non-release is correct for the dialog-stacking case it was designed for. But when the user dismisses the dialog (or the dialog never opens because the fallback failed too), no PACKAGE_REPLACED broadcast arrives, no markCompleted fires, and the gate stays held until the timeout. The other 2-3 apps in the queue download fully (UI shows 100%), then sit at awaitFreeAndMarkPending waiting on a gate that nothing will release.

Default timeout was 60s. With 3 apps queued behind a stuck gate, that's up to 3×60s of "stuck at 100%" before the queue recovers. Matches the field-report symptom exactly.
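
The asymmetry described above can be sketched in a few lines of Kotlin. This is a minimal illustration, not the project's code: `InstallGate`, its `heldBy` property, and this simplified `runInstall` signature are hypothetical stand-ins for the real orchestrator and serializer.

```kotlin
// Hypothetical, simplified stand-ins for the types named in the PR description.
enum class InstallOutcome { COMPLETED, DELEGATED_TO_SYSTEM }

class InstallGate {
    // Package currently holding the gate, or null when the gate is free.
    var heldBy: String? = null
        private set

    fun markPending(packageName: String) {
        heldBy = packageName
    }

    // Releases the gate only if the caller is the current holder.
    fun markCompleted(packageName: String) {
        if (heldBy == packageName) heldBy = null
    }
}

fun runInstall(gate: InstallGate, packageName: String, outcome: InstallOutcome) {
    gate.markPending(packageName)
    when (outcome) {
        // Silent install succeeded: release immediately.
        InstallOutcome.COMPLETED -> gate.markCompleted(packageName)
        // System dialog is (supposedly) up: the gate stays held until a
        // broadcast-driven markCompleted or the timeout.
        InstallOutcome.DELEGATED_TO_SYSTEM -> Unit
    }
}
```

In this sketch, a `COMPLETED` install leaves `heldBy` at null, while a `DELEGATED_TO_SYSTEM` install leaves the package name in place. If nothing later calls `markCompleted`, every queued app waits on that holder.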

Fix

Three small changes; no behaviour change for the happy path:

  1. SystemInstallSerializer.DEFAULT_TIMEOUT_MS 60_000L → 15_000L. Long enough for a normal Shizuku/Dhizuku silent install round-trip, short enough that a stuck gate recovers before the user gives up. Force-claim semantics on timeout are unchanged.

  2. DefaultSystemInstallSerializer.awaitFreeAndMarkPending now logs when it has to wait. Logger.i { "$packageName waiting for $heldBy to clear (timeout 15000ms)" } on entry-while-blocked, and a matching "acquired gate after waiting for $heldBy" on success. Future stuck reproductions show the queue clearly in Logcat.

  3. AndroidDownloader no longer logs CancellationException as an error. The MultiSourceDownloader race cancels the losing racer on every download when a mirror is configured; that's the normal happy path. The previous catch-all catch (e: Exception) logged every losing racer as Logger.e { "Download failed" }, flooding Logcat with red lines that looked like real failures and obscured the actual stuck-install signal in the field report. Cancellation now gets a dedicated catch that cleans up the temp file and rethrows without logging; non-cancellation Exception still logs as before.
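
The dedicated cancellation catch from item 3 might look roughly like the sketch below. It is not the actual AndroidDownloader code: `downloadTo`, `writeBody`, and `log` are hypothetical names, and deleting the temp file on a genuine failure is an assumption made for the example.

```kotlin
import java.io.File
import kotlin.coroutines.cancellation.CancellationException

// Hypothetical helper: `writeBody` stands in for the real network transfer,
// and `log` for the project's Logger.e call.
fun downloadTo(tempFile: File, log: (String) -> Unit, writeBody: (File) -> Unit): Boolean =
    try {
        writeBody(tempFile)
        true
    } catch (e: CancellationException) {
        // Losing a mirror race is expected: clean up quietly and rethrow so
        // the cancellation still propagates to the caller.
        tempFile.delete()
        throw e
    } catch (e: Exception) {
        // Genuine failures are still logged as errors.
        // (Temp-file cleanup here is an assumption of this sketch.)
        tempFile.delete()
        log("Download failed: ${e.message}")
        false
    }
```

The order of the catch clauses matters. On the JVM, CancellationException is a subclass of Exception, so the dedicated clause has to come first or the catch-all would swallow it.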

Test plan

  • :core:data:compileDebugKotlinAndroid
  • :core:data:compileKotlinJvm
  • Local CodeRabbit review: 0 findings.
  • Field smoke: queue 3 apps with mirror download on. Confirm:
    • Logcat no longer flooded with Download failed for losing racers.
    • If gate ever sticks, Logcat shows the awaiting / acquired pair so the next bug report is actionable.
    • Worst-case stuck recovery is now ~15s per queued app instead of ~60s.

Notes

  • This does NOT change the gate-locked-during-DELEGATED_TO_SYSTEM behaviour — that's still correct for the dialog-stacking case the gate was originally designed for. The timeout shrink is the safety net.
  • The Shizuku-fallback-to-default path that originally exposed this is left as-is. A follow-up could add a markCompleted after the fallback intent fires (since the fallback intent IS the system dialog and the gate is held by definition there too; same logic as the inherent DELEGATED_TO_SYSTEM path).

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Download cancellations are now properly handled without generating error logs.
    • System installation timeout reduced from 60 to 15 seconds, addressing instances where installations appeared stuck at 100%.
  • Improvements

    • Enhanced logging for better installation diagnostics and troubleshooting.

coderabbitai Bot commented May 8, 2026

Walkthrough

This PR addresses stuck system installations where packages remain at 100% when the broadcast releasing the installation gate never arrives. It reduces the timeout from 60 to 15 seconds, adds detailed logging to track gate acquisition blocking, and improves cancellation handling in AndroidDownloader to distinguish transient cancellations from actual errors.

Changes

System Install Timeout and Cancellation Handling

| Layer | File(s) | Summary |
| --- | --- | --- |
| Timeout Configuration | core/domain/src/commonMain/kotlin/zed/rainxch/core/domain/system/SystemInstallSerializer.kt | DEFAULT_TIMEOUT_MS reduced from 60,000ms to 15,000ms with comments explaining timeout tuning for stuck-install scenarios. |
| Gate Acquisition Observability | core/data/src/commonMain/kotlin/zed/rainxch/core/data/services/DefaultSystemInstallSerializer.kt | awaitFreeAndMarkPending now snapshots the currently-held package and logs informational messages when the gate is initially held and when acquisition succeeds after waiting. |
| Cancellation Exception Handling | core/data/src/androidMain/kotlin/zed/rainxch/core/data/services/AndroidDownloader.kt | download() explicitly catches CancellationException, deletes the per-attempt temp file, and rethrows without logging as an error to preserve structured concurrency semantics. |
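
To make the gate-acquisition behaviour summarised above concrete, here is a minimal sketch of a timed gate with force-claim and wait logging. It assumes the kotlinx.coroutines library is on the classpath. `GateSketch`, its `log` callback, and the exact log wording are hypothetical, and the real DefaultSystemInstallSerializer may be structured differently.

```kotlin
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical sketch of a single-slot install gate; `log` stands in for Logger.
class GateSketch(private val log: (String) -> Unit) {
    private val pending = MutableStateFlow<String?>(null)

    val heldBy: String? get() = pending.value

    suspend fun awaitFreeAndMarkPending(packageName: String, timeoutMs: Long = 15_000L) {
        // Snapshot the holder before waiting so later log lines name the
        // package that was actually blocking this caller.
        val initiallyHeldBy = pending.value
        if (initiallyHeldBy != null) {
            log("$packageName waiting for $initiallyHeldBy to clear (timeout ${timeoutMs}ms)")
        }
        val acquired = withTimeoutOrNull(timeoutMs) {
            // Claim atomically; if the gate is held, suspend until it is free and retry.
            while (!pending.compareAndSet(null, packageName)) {
                pending.first { it == null }
            }
            true
        }
        if (acquired == null) {
            // Timeout: force-claim so the queue cannot stall indefinitely.
            log("timed out waiting for $initiallyHeldBy to clear; force-claiming for $packageName")
            pending.value = packageName
        } else if (initiallyHeldBy != null) {
            log("$packageName acquired gate after waiting for $initiallyHeldBy")
        }
    }

    // Releases the gate only if `packageName` is still the holder.
    fun markCompleted(packageName: String) {
        pending.compareAndSet(packageName, null)
    }
}
```

Logging the snapshot rather than re-reading the flow after the timeout keeps the warning accurate even if the holder changes in the meantime.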

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A timeout that's snappier, logs that shine bright,
Cancellations now handled with structured delight,
No more stuck installations at one-hundred-oh-oh!
The gate logs reveal what we needed to know.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: fixing the stuck install gate by reducing timeout and adding diagnostic logging, plus cleaning up download log noise from CancellationException handling. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/data/src/commonMain/kotlin/zed/rainxch/core/data/services/DefaultSystemInstallSerializer.kt (1)

30-34: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use initiallyHeldBy instead of pending.value in the timeout warning.

By the time line 32 executes, pending.value may already be null (if the prior holder called markCompleted in the narrow window after the timeout fired but before this log line) or may show a different package (if another force-claim raced in). Either way, the message "timed out waiting for X to clear" would show a misleading or empty holder name. initiallyHeldBy, captured before the timeout block, is the package that was actually blocking this caller when it started waiting.

🛠️ Proposed fix
        if (acquired == null) {
            Logger.w {
-               "SystemInstallSerializer: timed out waiting for ${pending.value} to clear; force-claiming for $packageName"
+               "SystemInstallSerializer: timed out waiting for $initiallyHeldBy to clear; force-claiming for $packageName"
            }
            pending.value = packageName
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@core/data/src/commonMain/kotlin/zed/rainxch/core/data/services/DefaultSystemInstallSerializer.kt`
around lines 30 - 34, The timeout log uses pending.value which can be changed by
races; replace that with the earlier-captured initiallyHeldBy variable in the
Logger.w message inside the branch where acquired == null so the log reads that
it timed out waiting for initiallyHeldBy to clear before force-claiming for
packageName; update the message construction in the
DefaultSystemInstallSerializer's timeout path (where acquired, pending.value,
initiallyHeldBy and packageName are in scope) to reference initiallyHeldBy
instead of pending.value.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7bd0156a-1723-4197-827c-fc2d1ab2c6c3

📥 Commits

Reviewing files that changed from the base of the PR and between cc8c639 and 0c93b23.

📒 Files selected for processing (3)
  • core/data/src/androidMain/kotlin/zed/rainxch/core/data/services/AndroidDownloader.kt
  • core/data/src/commonMain/kotlin/zed/rainxch/core/data/services/DefaultSystemInstallSerializer.kt
  • core/domain/src/commonMain/kotlin/zed/rainxch/core/domain/system/SystemInstallSerializer.kt

@rainxchzed rainxchzed merged commit a6d2be3 into main May 8, 2026
1 check passed
@rainxchzed rainxchzed deleted the fix/install-gate-diagnostics branch May 8, 2026 12:25