|
| 1 | +--- |
| 2 | +name: ci-status |
| 3 | +description: > |
| 4 | + Check CI build status and investigate failures for dotnet/android PRs. ALWAYS use this skill when |
| 5 | + the user asks "check CI", "CI status", "why is CI failing", "is CI green", "why is my PR blocked", |
| 6 | + or anything about build status on a PR. Auto-detects the current PR from the git branch when no |
| 7 | + PR number is given. Covers both GitHub checks and internal Azure DevOps builds. |
| 8 | + DO NOT USE FOR: GitHub Actions workflow authoring, non-dotnet/android repos. |
| 9 | +--- |
| 10 | + |
| 11 | +# CI Status |
| 12 | + |
| 13 | +Check CI status and investigate build failures for dotnet/android PRs. |
| 14 | + |
| 15 | +**Key fact:** dotnet/android's primary CI runs on Azure DevOps (internal). GitHub checks alone are insufficient — they may all show ✅ while the internal build is failing. |
| 16 | + |
| 17 | +## Prerequisites |
| 18 | + |
| 19 | +| Tool | Check | Setup | |
| 20 | +|------|-------|-------| |
| 21 | +| `gh` | `gh --version` | https://cli.github.com/ | |
| 22 | +| `az` + devops ext | `az version` | `az extension add --name azure-devops` then `az login` | |
| 23 | + |
| 24 | +If `az` is not authenticated, stop and tell the user to run `az login`. |
| 25 | + |
| 26 | +## Workflow |
| 27 | + |
| 28 | +### Phase 1: Quick Status (always do this first) |
| 29 | + |
| 30 | +#### Step 1 — Resolve the PR and detect fork status |
| 31 | + |
| 32 | +**No PR specified** — detect from current branch: |
| 33 | + |
| 34 | +```bash |
| 35 | +gh pr view --json number,title,url,headRefName,isCrossRepository --jq '{number,title,url,headRefName,isCrossRepository}' |
| 36 | +``` |
| 37 | + |
| 38 | +**PR number given** — use it directly: |
| 39 | + |
| 40 | +```bash |
| 41 | +gh pr view $PR --repo dotnet/android --json number,title,url,headRefName,isCrossRepository --jq '{number,title,url,headRefName,isCrossRepository}' |
| 42 | +``` |
| 43 | + |
| 44 | +If no PR exists for the current branch, tell the user and stop. |
| 45 | + |
| 46 | +**`isCrossRepository`** tells you whether the PR is from a fork: |
| 47 | +- `true` → **fork PR** (external contributor) |
| 48 | +- `false` → **direct PR** (team member, branch in dotnet/android) |
| 49 | + |
| 50 | +This matters for CI behavior: |
| 51 | +- **Fork PRs:** `Xamarin.Android-PR` does NOT run. `dotnet-android` runs the full pipeline including tests. |
| 52 | +- **Direct PRs:** `Xamarin.Android-PR` runs the full test suite. `dotnet-android` skips test stages (build-only) since tests run on DevDiv instead. |
| 53 | + |
| 54 | +Highlight the fork status in the output so the user understands which checks to expect. |
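As a rough illustration of the mapping above, the sketch below turns the `isCrossRepository` value into a one-line summary for the user. The function name `describe_pr_type` is hypothetical; it is not part of `gh` or of this skill.

```bash
# Hypothetical helper: map the isCrossRepository value ("true"/"false")
# to the CI behavior described above.
describe_pr_type() {
  case $1 in
    true)
      echo "Fork PR: dotnet-android runs the full pipeline with tests; Xamarin.Android-PR will not run"
      ;;
    false)
      echo "Direct PR: dotnet-android runs build-only; Xamarin.Android-PR runs the full test suite"
      ;;
    *)
      # Anything else means the gh output was not what we expected.
      echo "Unknown isCrossRepository value: $1" >&2
      return 1
      ;;
  esac
}

# Example with a hard-coded value instead of live gh output.
describe_pr_type false
```

In practice the argument would come from the `gh pr view` output shown earlier, for example by adding `--jq '.isCrossRepository'`.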
| 55 | + |
| 56 | +#### Step 2 — Get GitHub check status |
| 57 | + |
| 58 | +```bash |
| 59 | +gh pr checks $PR --repo dotnet/android --json "name,state,link,bucket" \ |
| 60 | + | jq '[.[] | {name, state, bucket, link}]' |
| 61 | +``` |
| 62 | + |
| 63 | +```powershell |
| 64 | +gh pr checks $PR --repo dotnet/android --json "name,state,link,bucket" | ConvertFrom-Json |
| 65 | +``` |
| 66 | + |
| 67 | +Note which checks passed/failed/pending. The `link` field contains the AZDO build URL for internal checks. |
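One way to condense the `bucket` values into a single verdict is sketched below. The helper name `overall_check_verdict` is hypothetical, and counting `cancel` as a failure is an assumption, not documented policy. It reads one bucket value per line on stdin.

```bash
# Hypothetical helper: reduce gh "bucket" values (one per line) to one verdict.
# Assumption: "cancel" is treated like "fail"; "skipping" is ignored.
overall_check_verdict() {
  awk '
    $1 == "fail" || $1 == "cancel" { failed++ }
    $1 == "pending"                { pending++ }
    END {
      if (failed)       print "failing"
      else if (pending) print "pending"
      else              print "passing"
    }'
}

# Example with hard-coded buckets instead of live gh output.
printf '%s\n' pass pass pending skipping | overall_check_verdict
```

With live data the input could come from `gh pr checks $PR --repo dotnet/android --json bucket --jq '.[].bucket'`. A "passing" verdict here covers the GitHub checks only; the AZDO builds still need to be queried separately.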
| 68 | + |
| 69 | +#### Step 3 — Get Azure DevOps build status (repeat for EACH build) |
| 70 | + |
| 71 | +There are typically **two separate AZDO builds** for a dotnet/android PR. They run **independently** — neither waits for the other: |
| 72 | +- **`dotnet-android`** on `dev.azure.com/dnceng-public` — Defined in `azure-pipelines-public.yaml` with an explicit `pr:` trigger. |
| 73 | + - **Fork PRs:** runs the full pipeline including build + tests (since `Xamarin.Android-PR` won't run for forks). |
| 74 | + - **Direct PRs:** runs **build-only** — test stages are auto-skipped because those run on DevDiv instead. This means the `dotnet-android` build will be significantly shorter for direct PRs. |
| 75 | +- **`Xamarin.Android-PR`** on `devdiv.visualstudio.com` — full test suite, MAUI integration, compliance. Defined in `azure-pipelines.yaml` but its PR trigger is configured in the AZDO UI, not in YAML. |
| 76 | + - **Fork PRs:** does NOT run at all (no access to internal resources). |
| 77 | + - **Direct PRs:** runs the full test matrix. May take a few minutes to start after a push. |
| 78 | + |
| 79 | +Use the **pipeline definition name** (from the `definitionName` field) as the label in output — do NOT label them "Public" or "Internal". |
| 80 | + |
| 81 | +When a check shows **"Expected — Waiting for status to be reported"** on GitHub (typically `Xamarin.Android-PR`): |
| 82 | +- **For direct PRs:** the pipeline hasn't been triggered yet — this is normal, it's not waiting for the other build, just for AZDO to pick it up. Report it as: "⏳ Not triggered yet — typically starts within a few minutes of a push." |
| 83 | +- **For fork PRs:** `Xamarin.Android-PR` will NOT run. Report: "⏳ Will not run — fork PRs don't trigger the internal pipeline." |
| 84 | + |
| 85 | +Extract AZDO build URLs from the check `link` fields. Parse `{orgUrl}`, `{project}`, and `{buildId}` from patterns: |
| 86 | +- `https://dev.azure.com/{org}/{project}/_build/results?buildId={id}` |
| 87 | +- `https://{org}.visualstudio.com/{project}/_build/results?buildId={id}` |
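The two URL patterns above could be parsed along these lines. This is a minimal sketch: `parse_azdo_url` is a hypothetical name, and the example URLs use placeholder organization and project names, not real ones.

```bash
# Hypothetical helper: print "<orgUrl> <project> <buildId>" for an AZDO build URL.
# Returns non-zero if the URL matches neither pattern or has no buildId.
parse_azdo_url() {
  url=$1
  # Build ID: the digits that follow "buildId=" in the query string.
  build_id=$(printf '%s\n' "$url" | sed -n 's/.*[?&]buildId=\([0-9][0-9]*\).*/\1/p')
  case $url in
    https://dev.azure.com/*)
      # https://dev.azure.com/{org}/{project}/_build/...
      rest=${url#https://dev.azure.com/}
      org=${rest%%/*}
      rest=${rest#*/}
      project=${rest%%/*}
      org_url="https://dev.azure.com/$org"
      ;;
    https://*.visualstudio.com/*)
      # https://{org}.visualstudio.com/{project}/_build/...
      rest=${url#https://}
      host=${rest%%/*}
      rest=${rest#*/}
      project=${rest%%/*}
      org_url="https://$host"
      ;;
    *)
      return 1
      ;;
  esac
  [ -n "$build_id" ] || return 1
  printf '%s %s %s\n' "$org_url" "$project" "$build_id"
}

# Placeholder URLs, for illustration only.
parse_azdo_url "https://dev.azure.com/exampleorg/exampleproj/_build/results?buildId=42"
parse_azdo_url "https://exampleorg.visualstudio.com/exampleproj/_build/results?buildId=7"
```

The three fields map onto `$ORG_URL`, `$PROJECT`, and `$BUILD_ID` in the commands that follow. Project names containing spaces arrive URL-encoded in the link, so splitting the output on spaces stays safe.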
| 88 | + |
| 89 | +**Run Steps 3, 3a, and 3b for each AZDO build independently.** The builds have different pipelines, different job counts, and different typical durations — each gets its own progress and ETA. |
| 90 | + |
| 91 | +For each build, first get the overall status including start time and definition ID: |
| 92 | + |
| 93 | +```bash |
| 94 | +az devops invoke --area build --resource builds \ |
| 95 | + --route-parameters project=$PROJECT buildId=$BUILD_ID \ |
| 96 | + --org $ORG_URL \ |
| 97 | + --query "{status:status, result:result, startTime:startTime, finishTime:finishTime, definitionId:definition.id, definitionName:definition.name}" \ |
| 98 | + --output json 2>&1 |
| 99 | +``` |
| 100 | + |
| 101 | +**Compute elapsed time:** Subtract `startTime` from the current time (or from `finishTime` if the build is complete). Present as e.g. "Ran for 42 min" or "Running for 42 min". |
| 102 | + |
| 103 | +Then fetch the build timeline for **all jobs** (to get progress counts) and **any failures so far** — even when the build is still in progress: |
| 104 | + |
| 105 | +```bash |
| 106 | +az devops invoke --area build --resource timeline \ |
| 107 | + --route-parameters project=$PROJECT buildId=$BUILD_ID \ |
| 108 | + --org $ORG_URL \ |
| 109 | + --query "records[?type=='Job'] | [].{name:name, state:state, result:result}" \ |
| 110 | + --output json 2>&1 |
| 111 | +``` |
| 112 | + |
| 113 | +**Compute job progress counters** from the timeline response: |
| 114 | +- Count jobs where `state == 'completed'` → **finished** |
| 115 | +- Count jobs where `state == 'inProgress'` → **running** |
| 116 | +- Count jobs where `state == 'pending'` → **waiting** |
| 117 | +- Total = finished + running + waiting |
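The counting rules above can be sketched as a small filter. `summarize_job_states` is a hypothetical helper that reads one job `state` per line on stdin; the `az` invocation in the comment is an assumption about how to produce that input and is not executed here.

```bash
# Hypothetical helper: count job states (one per line) and print a progress line.
# Possible input source (assumption, not run here):
#   az devops invoke --area build --resource timeline ... \
#     --query "records[?type=='Job'].state" --output tsv
summarize_job_states() {
  awk '
    $1 == "completed"  { finished++ }
    $1 == "inProgress" { running++ }
    $1 == "pending"    { waiting++ }
    END {
      total = finished + running + waiting
      printf "%d/%d completed · %d running · %d waiting\n", finished, total, running, waiting
    }'
}

# Example with hard-coded states instead of a live timeline.
printf '%s\n' completed completed inProgress pending pending pending | summarize_job_states
```

The output format matches the "N/Total completed · M running · P waiting" line used in the summary.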
| 118 | + |
| 119 | +Then fetch failures: |
| 120 | + |
| 121 | +```bash |
| 122 | +az devops invoke --area build --resource timeline \ |
| 123 | + --route-parameters project=$PROJECT buildId=$BUILD_ID \ |
| 124 | + --org $ORG_URL \ |
| 125 | + --query "records[?result=='failed'] | [].{name:name, type:type, result:result, issues:issues, errorCount:errorCount, log:log}" \ |
| 126 | + --output json 2>&1 |
| 127 | +``` |
| 128 | + |
| 129 | +Check `issues` arrays first — they often contain the root cause directly. |
| 130 | + |
| 131 | +#### Step 3a — Estimate completion time per build (when build is in progress) |
| 132 | + |
| 133 | +Use the `definitionId` from the build to query recent successful builds of the **same pipeline definition** and compute the median duration. **Do this separately for each build** — the pipelines have very different durations. |
| 134 | + |
| 135 | +**Important:** The `dotnet-android` pipeline duration varies significantly based on whether the PR is from a fork: |
| 136 | +- **Direct PRs:** `dotnet-android` runs build-only (tests skipped) — typically much shorter (~1h 45min) |
| 137 | +- **Fork PRs:** `dotnet-android` runs the full pipeline with tests — typically much longer |
| 138 | + |
| 139 | +To get accurate ETAs, filter historical builds to match the current PR type. You can approximate this by looking at the **job count** of the current build vs historical builds — build-only runs have ~3 jobs while full runs have many more. Alternatively, compare the historical durations and pick the ones that are similar in magnitude to what you'd expect for the current build type. |
| 140 | + |
| 141 | +```bash |
| 142 | +az devops invoke --area build --resource builds \ |
| 143 | + --route-parameters project=$PROJECT \ |
| 144 | + --org $ORG_URL \ |
| 145 | + --query-parameters definitions=$DEF_ID statusFilter=completed resultFilter=succeeded "\$top=10" \ |
| 146 | + --query "value[].{startTime:startTime, finishTime:finishTime}" \ |
| 147 | + --output json 2>&1 |
| 148 | +``` |
| 149 | + |
| 150 | +**Compute ETA:** |
| 151 | +1. For each recent build, calculate `duration = finishTime - startTime` |
| 152 | +2. Filter to builds with similar duration profile (short ~1-2h for build-only, long ~3h+ for full runs) matching the current PR type |
| 153 | +3. Compute the **median** duration of the filtered set (more robust than average against outliers) |
| 154 | +4. `ETA = startTime + medianDuration` |
| 155 | +5. Present as: "ETA: ~14:30 UTC (typical for direct PRs: ~1h 45min)" |
| 156 | + |
| 157 | +If `startTime` is null (build hasn't started yet), skip the ETA and say "Build queued, not started yet". |
| 158 | +If the build already completed, skip the ETA and show the actual duration instead. |
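The median and "overdue" arithmetic from the steps above might look like the sketch below. Both function names are hypothetical. It assumes durations have already been converted to whole minutes and filtered to the matching PR type, and it leaves the conversion to a clock time (such as "~14:30 UTC") out of scope.

```bash
# Hypothetical helper: median of integer durations in minutes, passed as arguments.
median_minutes() {
  [ "$#" -gt 0 ] || return 1
  count=$#
  sorted=$(printf '%s\n' "$@" | sort -n)
  mid=$(( (count + 1) / 2 ))
  if [ $(( count % 2 )) -eq 1 ]; then
    # Odd count: the middle element is the median.
    printf '%s\n' "$sorted" | sed -n "${mid}p"
  else
    # Even count: integer average of the two middle elements.
    lo=$(printf '%s\n' "$sorted" | sed -n "${mid}p")
    hi=$(printf '%s\n' "$sorted" | sed -n "$(( mid + 1 ))p")
    echo $(( (lo + hi) / 2 ))
  fi
}

# Hypothetical helper: compare elapsed minutes against the median duration.
eta_status() {
  elapsed=$1
  median=$2
  if [ "$elapsed" -gt "$median" ]; then
    echo "overdue by ~$(( elapsed - median )) min"
  else
    echo "~$(( median - elapsed )) min remaining"
  fi
}

# Made-up durations; the 240-minute outlier barely moves the median.
typical=$(median_minutes 100 105 110 240 98)
eta_status 42 "$typical"
```

Using the median rather than the mean keeps a single unusually long run from inflating the estimate, which is the reason the steps above prefer it.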
| 159 | + |
| 160 | +#### Step 3b — Check for failed tests (always do this, especially when the build is still running) |
| 161 | + |
| 162 | +**This step is critical when the build is in progress.** Test results are published as jobs complete, so failures may already be visible before the build finishes. Surfacing these early lets the user start fixing them immediately. |
| 163 | + |
| 164 | +Query test runs for this build: |
| 165 | + |
| 166 | +```bash |
| 167 | +az devops invoke --area test --resource runs \ |
| 168 | + --route-parameters project=$PROJECT \ |
| 169 | + --org $ORG_URL \ |
| 170 | + --query-parameters "buildUri=vstfs:///Build/Build/$BUILD_ID" \ |
| 171 | + --query "value[?runStatistics[?outcome=='Failed']] | [].{id:id, name:name, totalTests:totalTests, state:state, stats:runStatistics}" \ |
| 172 | + --output json 2>&1 |
| 173 | +``` |
| 174 | + |
| 175 | +For each test run that has failures, fetch the failed test results: |
| 176 | + |
| 177 | +```bash |
| 178 | +az devops invoke --area test --resource results \ |
| 179 | + --route-parameters project=$PROJECT runId=$RUN_ID \ |
| 180 | + --org $ORG_URL \ |
| 181 | + --query-parameters outcomes=Failed "\$top=20" \ |
| 182 | + --query "value[].{testName:testCaseTitle, outcome:outcome, errorMessage:errorMessage, durationMs:durationInMs}" \ |
| 183 | + --output json 2>&1 |
| 184 | +``` |
| 185 | + |
| 186 | +If the `errorMessage` is truncated or absent, you can fetch a single test result's full details: |
| 187 | + |
| 188 | +```bash |
| 189 | +az devops invoke --area test --resource results \ |
| 190 | + --route-parameters project=$PROJECT runId=$RUN_ID testCaseResultId=$TEST_ID \ |
| 191 | + --org $ORG_URL \ |
| 192 | + --query "{testName:testCaseTitle, errorMessage:errorMessage, stackTrace:stackTrace}" \ |
| 193 | + --output json 2>&1 |
| 194 | +``` |
| 195 | + |
| 196 | +#### Step 4 — Present summary |
| 197 | + |
| 198 | +Use this format — **one section per AZDO build**, each with its own progress and ETA: |
| 199 | + |
| 200 | +``` |
| 201 | +# CI Status for PR #NNNN — "PR Title" |
| 202 | +🔀 **Direct PR** (branch in dotnet/android) — or 🍴 **Fork PR** (external contributor) |
| 203 | + |
| 204 | +## GitHub Checks |
| 205 | +| Check | Status | |
| 206 | +|-------|--------| |
| 207 | +| check-name | ✅ / ❌ / 🟡 | |
| 208 | + |
| 209 | +## dotnet-android [#BuildId](link) |
| 210 | +**Result:** ✅ Succeeded / ❌ Failed / 🟡 In Progress |
| 211 | +ℹ️ Build-only (tests run on Xamarin.Android-PR for direct PRs) — or ℹ️ Full pipeline with tests (fork PR) |
| 212 | +⏱️ Running for **12 min** · ETA: ~15:15 UTC (typical for direct PRs: ~1h 45min) |
| 213 | +📊 Jobs: **0/3 completed** · 1 running · 2 waiting |
| 214 | + |
| 215 | +| Job | Status | |
| 216 | +|-----|--------| |
| 217 | +| macOS > Build | 🟡 In Progress | |
| 218 | +| Linux > Build | ⏳ Waiting | |
| 219 | +| Windows > Build & Smoke Test | ⏳ Waiting | |
| 220 | + |
| 221 | +## Xamarin.Android-PR [#BuildId](link) |
| 222 | +**Result:** ✅ Succeeded / ❌ Failed / 🟡 In Progress |
| 223 | +— or for fork PRs: ⏳ **Will not run** — fork PRs don't trigger this pipeline |
| 224 | +⏱️ Running for **42 min** · ETA: ~15:45 UTC (typical: ~2h 30min) |
| 225 | +📊 Jobs: **18/56 completed** · 6 running · 32 waiting |
| 226 | + |
| 227 | +### Failures (if any) |
| 228 | +❌ Stage > Job > Task |
| 229 | + Error: <first error message> |
| 230 | + |
| 231 | +### Failed Tests (if any — even while build is still running) |
| 232 | +| Test Run | Failed | Total | |
| 233 | +|----------|--------|-------| |
| 234 | +| run-name | N | M | |
| 235 | + |
| 236 | +**Failed test names:** |
| 237 | +- `Namespace.TestClass.TestMethod` — brief error message |
| 238 | +- ... |
| 239 | + |
| 240 | +## What next? |
| 241 | +1. View full logs / stack traces for a test failure |
| 242 | +2. Download and analyze .binlog artifacts |
| 243 | +3. Retry failed stages |
| 244 | +``` |
| 245 | + |
| 246 | +**Progress section guidelines:** |
| 247 | +- Always show fork status (🔀 Direct PR / 🍴 Fork PR) at the top — it determines which builds run and their expected durations |
| 248 | +- For `dotnet-android`, note whether it's build-only (direct PR) or full pipeline (fork PR) |
| 249 | +- For `Xamarin.Android-PR` on fork PRs, don't try to query it — just report "Will not run" |
| 250 | +- Always show elapsed time when `startTime` is available |
| 251 | +- Show ETA when the build is in progress and historical data is available. If the build has been running longer than the median, say "overdue by ~X min" |
| 252 | +- Show job counters as "N/Total completed · M running · P waiting" |
| 253 | +- If the build hasn't started yet, show "⏳ Not triggered yet — typically starts within a few minutes of a push" |
| 254 | +- If a check is in "Expected" state with no build URL on a direct PR, the AZDO pipeline hasn't picked it up yet — this is normal and not gated on other builds |
| 255 | + |
| 256 | +**If the build is still running but tests have already failed**, highlight these prominently so the user can start fixing them immediately. Use a note like: |
| 257 | + |
| 258 | +> ⚠️ Build still in progress, but **N tests have already failed** — you can start investigating these now. |
| 259 | + |
| 260 | +**If no failures found anywhere**, report CI as green and stop. |
| 261 | + |
| 262 | +### Phase 2: Deep Investigation (only if user requests) |
| 263 | + |
| 264 | +Only proceed here if the user asks to investigate a specific failure, view logs, or analyze binlogs. |
| 265 | + |
| 266 | +#### Fetch logs |
| 267 | + |
| 268 | +Get the `log.id` from failed timeline records, then: |
| 269 | + |
| 270 | +```bash |
| 271 | +az devops invoke --area build --resource logs \ |
| 272 | + --route-parameters project=$PROJECT buildId=$BUILD_ID logId=$LOG_ID \ |
| 273 | + --org $ORG_URL \ |
| 274 | + --out-file "/tmp/azdo-log-$LOG_ID.log" 2>&1 |
| 275 | +tail -40 "/tmp/azdo-log-$LOG_ID.log" |
| 276 | +``` |
| 277 | + |
| 278 | +```powershell |
| 279 | +$logFile = Join-Path $env:TEMP "azdo-log-$LOG_ID.log" |
| 280 | +az devops invoke --area build --resource logs ` |
| 281 | + --route-parameters project=$PROJECT buildId=$BUILD_ID logId=$LOG_ID ` |
| 282 | + --org $ORG_URL ` |
| 283 | + --out-file $logFile |
| 284 | +Get-Content $logFile -Tail 40 |
| 285 | +``` |
| 286 | + |
| 287 | +#### Analyze .binlog artifacts |
| 288 | + |
| 289 | +See [references/binlog-analysis.md](references/binlog-analysis.md) for binlog download and analysis commands. |
| 290 | + |
| 291 | +#### Categorize failures |
| 292 | + |
| 293 | +See [references/error-patterns.md](references/error-patterns.md) for dotnet/android-specific error patterns and categorization. |
| 294 | + |
| 295 | +## Error Handling |
| 296 | + |
| 297 | +- **Build in progress:** Still query for failed timeline records AND test runs. Report any early failures alongside the in-progress status. Only offer `gh pr checks --watch` if there are no failures yet. |
| 298 | +- **Check in "Expected" state (no build URL):** The AZDO pipeline hasn't been triggered yet. This is normal — the two pipelines (`dotnet-android` and `Xamarin.Android-PR`) run independently, not sequentially. Report: "⏳ Not triggered yet — typically starts within a few minutes of a push." Do NOT say it's waiting for the other build. |
| 299 | +- **Auth expired:** Tell user to run `az login` and retry. |
| 300 | +- **Build not found:** Verify the PR number/build ID is correct. |
| 301 | +- **No test runs yet:** The build may not have reached the test phase. Report what's available and note that tests haven't started. |
| 302 | + |
| 303 | +## Tips |
| 304 | + |
| 305 | +- Focus on the **first** error chronologically — later errors often cascade |
| 306 | +- `.binlog` has richer detail than text logs when logs show only "Build FAILED" |
| 307 | +- `issues` in timeline records often contain the root cause without needing to download logs |