feat: forecast — context-aware downloads, no workflow logs, partial results on timeout, enriched report with repo aggregate#37457
…loading
- Add forecastDownloadUsageArtifact: downloads only the usage artifact without workflow run log downloads (not needed for AIC computation)
- Thread context.Context through loadCachedRunAIC so downloads respect the forecast timeout instead of using context.Background()
- Produce partial forecast results on timeout or user cancellation instead of discarding all progress; results collected so far are emitted before the timeout error is returned
- Add structured logging throughout loadCachedRunAIC: cache hits, cache misses, download start/end, artifact counts, and interruption
- Update workflow YAML: on timeout with partial results, emit a warning and continue (instead of always failing with exit 1)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…rtialForecastResults Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@copilot Check the compiler code and ensure it generates a step in the conclusion job to create the usage artifact.
The compiler already generates usage artifact steps in the conclusion job. In
Both steps run with |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
… to forecast table Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…run samples section, remove horizon tip Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in commits
@copilot, the forecast should also provide an aggregate of all forecasts so we know the total forecast for the repo.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR expands the forecast output to include per-run percentile metrics, weekly/monthly projections, and detailed run samples, while optimizing artifact downloads for forecasting and emitting partial results on interruption.
Changes:
- Add per-run AIC samples, P50/P95 per-run metrics, and weekly/monthly projections (incl. Monte Carlo summaries) to forecast results and table output.
- Replace forecast artifact downloading with a usage-artifact-only implementation and improve cancellation/timeout behavior by emitting partial results.
- Update GitHub Action issue template + JS generator/tests to render the new summary columns and a run-samples details section.
Show a summary per file
| File | Description |
|---|---|
| pkg/cli/forecast.go | Adds new forecast result fields (percentiles, weekly/monthly projections, run samples), partial-result emission, and usage-only artifact downloader. |
| pkg/cli/forecast_test.go | Updates mocks and callsites for new loadCachedRunAIC(ctx, ...) signature. |
| actions/setup/md/forecast_issue.md | Switches issue template placeholder from zero-tip to run-samples section. |
| actions/setup/js/create_forecast_issue.cjs | Updates issue body generator to render new columns, totals row, and run-samples details block; keeps legacy fallback. |
| actions/setup/js/create_forecast_issue.test.cjs | Updates existing expectations for new table format and adds tests for run samples + TOTAL row. |
Copilot's findings
- Files reviewed: 5/5 changed files
- Comments generated: 5

    // Legacy fallback: derive weekly/monthly from the configured-period P50 when new fields are absent.
    const hasNewFields = workflows.some(w => w?.p50_aic_per_run != null || w?.weekly_projected_aic != null);
    const legacyRows = hasNewFields
      ? null
      : workflows.map(workflow => {
          const p50 = workflow?.monte_carlo?.p50_projected_aic ?? workflow?.projected_aic ?? workflow?.monte_carlo?.p50_projected_effective_tokens ?? workflow?.projected_effective_tokens ?? 0;
          return [escapeCell(workflow.workflow_id), workflow.sampled_runs ?? 0, Number(p50)];
        });


    func loadCachedRunAIC(ctx context.Context, runID int64, verbose bool) float64 {
        dir := filepath.Join(defaultLogsOutputDir, fmt.Sprintf("run-%d", runID))
        summary, ok := loadRunSummary(dir, verbose)
        if ok && summary != nil && summary.TokenUsage != nil && summary.TokenUsage.TotalAIC > 0 {
            forecastRunLog.Printf("AIC cache hit for run %d: aic=%.3f (from run_summary.json)", runID, summary.TokenUsage.TotalAIC)
            return summary.TokenUsage.TotalAIC
        }


    // Sort partial results by Monte Carlo P50 descending (mirrors the full-results sort).
    sort.Slice(results, func(i, j int) bool {
        pi := results[i].ProjectedAIC
        if mc := results[i].MonteCarlo; mc != nil {
            pi = mc.P50ProjectedAIC
        }
        pj := results[j].ProjectedAIC
        if mc := results[j].MonteCarlo; mc != nil {
            pj = mc.P50ProjectedAIC
        }
        return pi > pj
    })

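The comparator above applies the same "Monte Carlo P50, else point estimate" fallback twice. One possible way to express it once, so the partial and full sorts cannot drift apart, is a small key function. The sketch below is a suggestion only: `sortKey`, `sortByProjectedAIC`, and the trimmed-down `result` and `monteCarlo` types are hypothetical and not part of the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// monteCarlo and result are trimmed-down, hypothetical versions of the
// forecast types, keeping only the fields the sort needs.
type monteCarlo struct{ P50ProjectedAIC float64 }

type result struct {
	WorkflowID   string
	ProjectedAIC float64
	MonteCarlo   *monteCarlo
}

// sortKey prefers the Monte Carlo P50 and falls back to the point estimate.
func sortKey(r result) float64 {
	if r.MonteCarlo != nil {
		return r.MonteCarlo.P50ProjectedAIC
	}
	return r.ProjectedAIC
}

// sortByProjectedAIC orders results by sortKey, largest first.
func sortByProjectedAIC(results []result) {
	sort.Slice(results, func(i, j int) bool {
		return sortKey(results[i]) > sortKey(results[j])
	})
}

func main() {
	rs := []result{
		{WorkflowID: "a", ProjectedAIC: 5},
		{WorkflowID: "b", ProjectedAIC: 1, MonteCarlo: &monteCarlo{P50ProjectedAIC: 9}},
	}
	sortByProjectedAIC(rs)
	fmt.Println(rs[0].WorkflowID, rs[1].WorkflowID) // b a
}
```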

    // Compute P50 and P95 of individual run AIC (per-run percentiles, not period totals).
    sortedAIC := make([]int, len(aicObservations))
    copy(sortedAIC, aicObservations)
    sort.Ints(sortedAIC)
    result.P50AIC = roundForecastAIC(float64(percentileInt(sortedAIC, 50)) / 1000)
    result.P95AIC = roundForecastAIC(float64(percentileInt(sortedAIC, 95)) / 1000)

    // Compute observed run frequency: runs per calendar day over the history window,
    // scaled to the projection period.
    result.ObservedRunsPerPeriod = float64(n) / float64(config.Days) * float64(periodDays)
    observedRunsPerDay := float64(n) / float64(config.Days)
    result.ObservedRunsPerPeriod = observedRunsPerDay * float64(periodDays)

    // Point estimates for weekly (7-day) and monthly (30-day) projections.
    weeklyRuns := observedRunsPerDay * 7
    monthlyRuns := observedRunsPerDay * 30
    result.WeeklyProjectedAIC = roundForecastAIC(weeklyRuns * result.AvgAIC)
    result.MonthlyProjectedAIC = roundForecastAIC(monthlyRuns * result.AvgAIC)

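To make the arithmetic in this snippet concrete, here is a small worked example. It is a sketch under stated assumptions: `percentileNearestRank` uses the nearest-rank rule, which may differ from the PR's `percentileInt`, rounding via `roundForecastAIC` is omitted, and `project` is a hypothetical helper that folds the runs-per-day scaling into one call.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentileNearestRank returns the p-th percentile of an ascending slice
// using the nearest-rank method. Assumption: the PR's percentileInt may use
// a different interpolation rule.
func percentileNearestRank(sorted []int, p float64) int {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// project scales the observed run rate to a horizon in days and multiplies
// by the average AIC per run.
func project(runs, historyDays, horizonDays int, avgAIC float64) float64 {
	runsPerDay := float64(runs) / float64(historyDays)
	return runsPerDay * float64(horizonDays) * avgAIC
}

func main() {
	// Per-run AIC observations in thousandths, matching the /1000 above.
	obs := []int{1200, 800, 2000, 1000}
	sorted := append([]int(nil), obs...)
	sort.Ints(sorted)

	p50 := float64(percentileNearestRank(sorted, 50)) / 1000
	p95 := float64(percentileNearestRank(sorted, 95)) / 1000
	fmt.Println("P50/run:", p50, "P95/run:", p95)

	// 4 runs over an 8-day history window is 0.5 runs/day; average AIC is 1.25.
	fmt.Println("weekly:", project(4, 8, 7, 1.25))   // 3.5 runs * 1.25
	fmt.Println("monthly:", project(4, 8, 30, 1.25)) // 15 runs * 1.25
}
```

With four samples the P95 lands on the largest observation, a reminder that per-run percentiles from small samples are coarse.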

    for (const wf of workflows) {
      const samples = Array.isArray(wf?.run_samples) ? wf.run_samples : [];
      for (const s of samples) {
        const runID = s?.run_id ?? "";
        const date = s?.date ?? "";
        const aic = formatAIC(s?.aic ?? 0);
        lines.push(`| ${escapeCell(wf.workflow_id)} | #${runID} | ${date} | ${aic} |`);
      }
    }

Done in the latest commits. The |
The forecast command downloaded workflow run logs (unnecessary for AIC computation), ignored the timeout context in artifact downloads, discarded all collected results when a timeout fired, and produced a report with insufficient cost detail.
Changes
- Skip workflow log downloads: Introduces `forecastDownloadUsageArtifact` as a focused replacement for the general-purpose `downloadRunArtifacts`. It lists artifacts, downloads only the matching `usage` artifact, and returns `ErrNoArtifacts` immediately when none is found, with no diagnostic log download fallback.
- Context propagation: `loadCachedRunAIC` now accepts `context.Context` and forwards it through to artifact downloads. It previously used `context.Background()`, so downloads ran to completion even after the configured `--timeout` fired.
- Partial results on timeout/cancel: `emitPartialForecastResults` emits JSON/table output for all workflows processed before the interrupt, then returns the timeout exit code. Previously all progress was silently discarded.
- Structured logging: `loadCachedRunAIC` and `forecastDownloadUsageArtifact` now log cache hits, cache misses, download start/completion, artifact counts, and context interruptions via `forecastRunLog`.
- Workflow YAML resilience: On exit 124 (CLI timeout), the step now checks whether `report.json` has content. With partial results: `::warning::` + continue (cache saves, issue is created with partial data). Without any results: `::error::` + `exit 1` as before.
- Enriched forecast report: The CLI table and GitHub issue now show P50/Run (per-run median AIC), P95/Run (95th-percentile per-run AIC), Weekly (P50) and Monthly (P50) projected totals, and a TOTAL row across all workflows. A detailed run-samples section lists every sampled run with its ID, date, and raw AIC for human review (collapsible `<details>` block in the issue body). The "increase the horizon" tip is removed.
- Repo-level aggregate: `ForecastResult` now carries a `totals` field (`weekly_p50`, `monthly_p50`) that sums the weekly and monthly P50 projections across all workflows. This is computed by the new `computeForecastTotals` helper (preferring Monte Carlo P50 over point estimates) and is always present in `report.json`, so consumers never need to aggregate individual workflow entries themselves. The CLI table TOTAL row and the GitHub issue TOTAL row both use this pre-computed value.