Skip to content

feat(plotnine): implement marimekko-basic#5483

Merged
MarkusNeusinger merged 5 commits into
mainfrom
implementation/marimekko-basic/plotnine
Apr 29, 2026
Merged

feat(plotnine): implement marimekko-basic#5483
MarkusNeusinger merged 5 commits into
mainfrom
implementation/marimekko-basic/plotnine

Conversation

@github-actions
Copy link
Copy Markdown
Contributor

Implementation: marimekko-basic - python/plotnine

Implements the python/plotnine version of marimekko-basic.

File: plots/marimekko-basic/implementations/python/plotnine.py

Parent Issue: #1002


🤖 impl-generate workflow

@github-actions
Copy link
Copy Markdown
Contributor Author

🔧 AI Review Produced No Score — Auto-Retrying

The Claude Code Action ran but didn't write quality_score.txt. Auto-retrying review once...


🤖 impl-review

@github-actions
Copy link
Copy Markdown
Contributor Author

❌ AI Review Failed (auto-retry exhausted)

The AI review action completed but did not produce valid output files. Auto-retry already tried once.

What happened:

  • The Claude Code Action ran
  • No quality_score.txt file was created

Manual rerun:

gh workflow run impl-review.yml -f pr_number=5483

🤖 impl-review

@github-actions github-actions Bot added the ai-review-failed AI review action failed or timed out label Apr 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor Author

❌ AI Review Failed (auto-retry exhausted)

The AI review action completed but did not produce valid output files. Auto-retry already tried once.

What happened:

  • The Claude Code Action ran
  • No quality_score.txt file was created

Manual rerun:

gh workflow run impl-review.yml -f pr_number=5483

🤖 impl-review

MarkusNeusinger added a commit that referenced this pull request Apr 29, 2026
…ompts (#5520)

## Summary

Three workflows (`impl-review.yml`, `spec-create.yml`,
`report-validate.yml`) used shell-style `$VAR` inside `with: prompt: |`
blocks of `claude-code-action`. That block is a YAML string handed to a
Node/Bun action — **no shell ever runs**, so `$VAR` was sent to Claude
as a literal placeholder instead of the actual value. Result: Claude
couldn't reliably identify the PR / spec / library to review and
silently produced no `quality_score.txt`, which the validate step turns
into `ai-review-failed`.

## Symptoms observed today (2026-04-29)

5 stuck implementation PRs from 2026-04-27, all with `ai-review-failed`
despite the prior fixes branch (#5410) and the audit branch (#5515)
landing in between:

| PR | Branch | Pre-fix labels |
|----|--------|----------------|
| #5476 | seaborn/marimekko-basic | `ai-review-failed`, `quality:78` |
| #5480 | altair/marimekko-basic | `ai-review-failed`, `quality:82` |
| #5481 | letsplot/marimekko-basic | `ai-rejected`, `quality:76` |
| #5483 | plotnine/marimekko-basic | `ai-review-failed` |
| #5486 | plotly/line-basic | `ai-review-failed` |

Re-dispatching review on each confirmed the bug: the run log of `Run AI
Quality Review` shows the prompt being passed verbatim:

```
PROMPT: Read prompts/workflow-prompts/ai-quality-review.md and follow those instructions.

Variables for this run:
- LIBRARY: $LIBRARY    # ← literal, never expanded
- SPEC_ID: $SPEC_ID
- PR_NUMBER: $PR_NUMBER
- ATTEMPT: $ATTEMPT
```

Claude's review then either ran for ~20s and exited with no
`quality_score.txt` (4 PRs failed), or recovered by inferring values
from cwd (1 PR succeeded with `quality:82`). The intermittent pattern is
exactly what you'd expect from "the prompt is ambiguous and Claude has
to guess from context."

## Root cause

Commit `252977cf3` ("chore: fix critical audit findings", 2026-04-28
22:46) routed several `${{ github.event.* }}` and step-output values
through step-level `env:` and rewrote the in-prompt references as
`$VAR`. That is the correct mitigation for `run:` shell steps and Python
heredocs in the same workflows (and those changes stay in place). Inside
`with: prompt: |` it is the wrong tool: the value is consumed by a JS
action, not a shell, so there is no injection surface to mitigate and
`$VAR` does not interpolate.

`spec-create.yml` and `report-validate.yml` carry the identical
anti-pattern in their `prompt:` blocks. They haven't surfaced as
failures yet only because no triggering issue has come in since
2026-04-28.

## The fix

Revert **only** the descriptive header lines of each `prompt:` block
back to GitHub Actions Expression syntax (`${{ ... }}`), which the
runner substitutes into the YAML string before the action receives it.
Keep:

- All `env:` blocks (harmless; lets future prompt content reference env
vars if useful)
- All `$VAR` references inside **embedded bash code samples** in the
prompt (e.g. `gh issue edit $ISSUE_NUMBER`). Those are executed by
Claude's Bash tool which inherits the step `env:` and expands them
correctly — and rewriting them would re-enable the injection vector the
audit was right to close.

```diff
             Variables for this run:
-            - LIBRARY: $LIBRARY
-            - SPEC_ID: $SPEC_ID
-            - PR_NUMBER: $PR_NUMBER
-            - ATTEMPT: $ATTEMPT
+            - LIBRARY: ${{ steps.pr.outputs.library }}
+            - SPEC_ID: ${{ steps.pr.outputs.specification_id }}
+            - PR_NUMBER: ${{ steps.pr.outputs.pr_number }}
+            - ATTEMPT: ${{ steps.attempts.outputs.display }}
```

(analogous 8-line revert in `spec-create.yml` × 2 prompt blocks and
4-line revert in `report-validate.yml`).

Diff total: **3 files, 16 ±**.

## Test plan

- [ ] After merge, redispatch `impl-review.yml` for the 4 stuck PRs (`gh
workflow run impl-review.yml -f pr_number=<N>` for 5476, 5483, 5486;
5480 already got a 82 in the redispatch and should now stabilize)
- [ ] Verify each run's `Run AI Quality Review` step log shows real
values (e.g. `- LIBRARY: plotly`) in the PROMPT echo, not `$LIBRARY`
- [ ] Verify `quality_score.txt` is produced and `ai-review-failed`
label is removed
- [ ] On next `spec-request`-labeled issue, verify the spec-create
prompt sees the issue title/body
- [ ] On next `report-pending`-labeled issue, verify the report-validate
prompt sees the issue title/body

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 29, 2026

AI Review - Attempt 1/3

Image Description

Light render (plot-light.png): The chart renders on a warm off-white background (~#FAF8F1). It shows four variable-width vertical bars (North America widest, Europe, Asia Pacific, Latin America narrowest) each stacked with four product segments (Electronics, Software, Services, Hardware). Title "Market Share by Region · marimekko-basic · plotnine · anyplot.ai" is bold and clearly visible. Axis labels "Market Segment (width = total market size)" and "Product Share (%)" are legible. Y-axis tick labels (0%–100%) and region names on the x-axis are all readable. Dollar-value labels appear inside larger segments. The legend is on the right. All text is readable against the light background. The segment colors appear as distinct hues (teal/green, orange, blue, pink-lavender) separated by white borders.

Dark render (plot-dark.png): The same chart renders on a near-black background (~#1A1A17). Title, axis labels, tick labels, and legend text all appear light/white and are clearly readable against the dark background — no dark-on-dark failures observed. Data segment colors are identical to the light render (only the chrome background flipped). The legend box shows a dark-elevated background consistent with the dark theme. Segment labels inside bars remain readable on their colored backgrounds. Both renders pass basic theme-readability checks.

Note: The code does not include os.getenv("ANYPLOT_THEME") theme-adaptation logic. Theme switching appears to be handled externally by the pipeline rather than in the implementation itself. The code also outputs plot.png rather than plot-{THEME}.png.

Score: 80/100

Category Score Max
Visual Quality 27 30
Design Excellence 11 20
Spec Compliance 14 15
Data Quality 15 15
Code Quality 8 10
Library Mastery 5 10
Total 80 100

Visual Quality (27/30)

  • VQ-01: Text Legibility (7/8) — All sizes explicitly set (title 24pt, axis 20pt, ticks 16pt), all readable in both renders. Legend text at 14pt is slightly below the recommended 16pt.
  • VQ-02: No Overlap (6/6) — No overlapping elements in either render, including the narrow Latin America column.
  • VQ-03: Element Visibility (6/6) — All rect segments clearly visible; white color="white" borders provide strong definition.
  • VQ-04: Color Accessibility (2/2) — Colors are well-separated on the color wheel; white segment borders aid disambiguation; no red-green only signal.
  • VQ-05: Layout & Canvas (4/4) — Chart fills the canvas well at 16:9; legend is well-integrated on the right; balanced margins.
  • VQ-06: Axis Labels & Title (2/2) — X-axis "Market Segment (width = total market size)" and Y-axis "Product Share (%)" are descriptive with context/units.
  • VQ-07: Palette Compliance (0/2) — FAIL. Code uses custom non-Okabe-Ito colors: #306998 (Python Blue — explicitly listed as non-compliant), #FFD43B, #4ECDC4, #FF6B6B. First series (Electronics) must be #009E73. Additionally, code has no theme-adaptive chrome tokens (PAGE_BG, INK, INK_SOFT) and saves to plot.png rather than plot-{THEME}.png.

Design Excellence (11/20)

  • DE-01: Aesthetic Sophistication (4/8) — Python-themed color palette (Python Blue, Python Yellow) gives the chart identity, but it's an off-the-shelf thematic choice rather than a considered design decision. Overall look is clean and well-configured.
  • DE-02: Visual Refinement (4/6) — theme_minimal() base with x-grid removed (panel_grid_major_x=element_blank()), white segment borders add polish. Some deliberate refinement is visible.
  • DE-03: Data Storytelling (3/6) — Dollar-value labels on larger segments help the reader quantify areas, and the variable-width bars naturally communicate market size. However, no emphasis or focal point highlights a key insight — all regions and products are treated equally.

Spec Compliance (14/15)

  • SC-01: Plot Type (5/5) — Correct Marimekko/mosaic chart: variable-width bars + proportional stacked segments.
  • SC-02: Required Features (4/4) — Bar widths proportional to region totals, segment heights show within-region proportions, color-coded legend, value labels on larger segments (threshold > 10%).
  • SC-03: Data Mapping (3/3) — Regions map to x-axis widths, products map to stacked segment heights, values correctly normalized. Note: code produces alphabetical region ordering (Asia Pacific → Europe → Latin America → North America) which differs from the rendered image order — the underlying data mapping logic is correct.
  • SC-04: Title & Legend (2/3) — Rendered image shows correct anyplot.ai suffix, but source code (line 155) contains "...plotnine · pyplots.ai" — inconsistency needs correction. Legend labels are correct.

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Four x-categories with meaningfully different market sizes, four y-categories with varying proportional mixes across regions, demonstrating both width and height variability.
  • DQ-02: Realistic Context (5/5) — Market share data by region and product line is a classic, neutral business scenario. No controversial content.
  • DQ-03: Appropriate Scale (4/4) — Values $30M–$200M, total market ~$1.34B — plausible for a mid-sized global product company.

Code Quality (8/10)

  • CQ-01: KISS Structure (3/3) — Imports → data → processing → plot → save. No functions or classes.
  • CQ-02: Reproducibility (2/2) — Fully deterministic data (no random elements).
  • CQ-03: Clean Imports (2/2) — All imported plotnine symbols are used.
  • CQ-04: Code Elegance (1/2) — Nested for region / iterrows() loop is verbose; the cumulative-y computation could be done with groupby + cumsum() in a single vectorized step.
  • CQ-05: Output & API (0/1) — Saves to plot.png instead of plot-{THEME}.png. No os.getenv("ANYPLOT_THEME") to drive theme selection.

Library Mastery (5/10)

  • LM-01: Idiomatic Usage (3/5) — Uses plotnine grammar of graphics correctly: geom_rect + geom_text + scale_fill_manual + scale_x_continuous with custom breaks. Idiomatic but doesn't leverage advanced plotnine patterns.
  • LM-02: Distinctive Features (2/5) — The scale_x_continuous(breaks=..., labels=...) for region centering is plotnine-specific sugar; geom_rect for custom tile geometry is idiomatic. However, the coordinate pre-computation approach is nearly identical to a matplotlib implementation.

Score Caps Applied

  • None — No hard caps triggered (VQ-02, VQ-03, SC-01, DQ-02 all non-zero; DE-01=4 and DE-02=4 so both-≤2 cap doesn't apply; CQ-04=1 not 0).

Strengths

  • Correct and complete Marimekko chart with proportional widths and heights computed accurately
  • Explicit font sizing across all elements (title, axes, ticks, legend) meeting recommended minimums
  • Clean KISS structure with fully deterministic data
  • Value labels on larger segments improve data readability
  • White borders on segments provide strong visual definition

Weaknesses

  • Critical: Non-Okabe-Ito palette — replace product_colors dict with Okabe-Ito order: ["#009E73", "#D55E00", "#0072B2", "#CC79A7"] via scale_fill_manual
  • Critical: No theme adaptation — add os.getenv("ANYPLOT_THEME", "light") + PAGE_BG, INK, INK_SOFT tokens; apply to plot_background, panel_background, axis_text, axis_title, plot_title, legend_background, legend_text; save as plot-{THEME}.png
  • Source code title contains "pyplots.ai" (line 155) — should be "anyplot.ai"
  • Nested iterrows() loop for cumulative-y — replace with vectorized groupby + cumsum
  • Legend text at 14pt — increase to 16pt to meet standard
  • Design: no visual emphasis to highlight a key data story (e.g., Asia Pacific's dominant Electronics share)

Issues Found

  1. VQ-07 = 0: Custom non-Okabe-Ito palette — #306998 is explicitly non-compliant. Replace all four colors with OKABE_ITO = ["#009E73", "#D55E00", "#0072B2", "#CC79A7"] applied via scale_fill_manual(values=dict(zip(product_order, OKABE_ITO)))
  2. CQ-05 = 0 / Missing theme: No os.getenv("ANYPLOT_THEME") and no plot-{THEME}.png output. Add full theme-adaptive chrome block per prompts/library/plotnine.md reference pattern.
  3. SC-04 partial: Title string in code has "pyplots.ai" — fix to "anyplot.ai"
  4. DE low: No visual hierarchy — consider making the largest segment (Asia Pacific Electronics $200M) visually prominent through annotation or subtle emphasis

AI Feedback for Next Attempt

Replace the custom Python-branded color dict with Okabe-Ito colors in canonical order (#009E73 for Electronics first). Add the full theme-adaptive chrome block from the plotnine library guide: read ANYPLOT_THEME env var, set PAGE_BG/INK/INK_SOFT tokens, apply them to plot_background, panel_background, axis_text, axis_title, plot_title, legend_background, and legend_text in theme(). Save output as plot-{THEME}.png (not plot.png). Fix the "pyplots.ai" typo in the title. Optionally replace the iterrows() loop with a vectorized groupby+cumsum approach and bump legend text to 16pt.

Verdict: REJECTED

@github-actions github-actions Bot added quality:80 Quality score 80/100 ai-rejected Quality not OK, triggers update labels Apr 29, 2026
@github-actions github-actions Bot added ai-attempt-1 First repair attempt and removed ai-rejected Quality not OK, triggers update labels Apr 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor Author

🔧 Repair Attempt 1/4

Applied fixes based on AI review feedback.

Status: Repair completed, re-triggering review...


🤖 impl-repair

MarkusNeusinger added a commit that referenced this pull request Apr 29, 2026
)

## Summary

The 3 AI-approved implementation PRs from today (#5476, #5480, #5481)
all hit `gh pr merge` failures with `the base branch policy prohibits
the merge`. Root cause: the branch ruleset on `main` requires three
status checks (`Run Linting`, `Run Tests`, `Run Frontend Tests`) — and
impl-PRs created by `impl-generate.yml` never get those checks.

## Why CI doesn't run on impl-PRs

`impl-generate.yml` (and `impl-repair.yml`, `impl-review.yml`) push
commits to PR branches using `GITHUB_TOKEN`. By GitHub's anti-recursion
design, pushes / PRs created with `GITHUB_TOKEN` do **not** trigger
downstream `pull_request` or `workflow_run` events. Verified across all
5 stuck PRs:

| PR | Branch | `Run Linting` ever ran? |
|----|--------|--------------------------|
| #5476 seaborn/marimekko-basic | yes (once, on a 04-27 impl-repair
commit; newer score commits invalidated it) |
| #5480 altair/marimekko-basic | no |
| #5481 letsplot/marimekko-basic | no |
| #5483 plotnine/marimekko-basic | no |
| #5486 plotly/line-basic | no |

So the merge is gated on a check that structurally cannot complete.

## The fix

Add `--admin` to the `gh pr merge` call inside `impl-merge.yml`. This
lets the pipeline complete autonomously without weakening main's
protection for human PRs.

```diff
+            # --admin bypasses the branch ruleset's required-status-check
+            # gate. Required because impl-generate.yml pushes via GITHUB_TOKEN,
+            # which by GitHub's anti-recursion design does not trigger
+            # downstream CI workflows (Run Linting / Run Tests / Run Frontend
+            # Tests), so impl PRs never get those checks. The pipeline already
+            # gates merge behind the AI quality review threshold.
             if gh pr merge "$PR_NUM" \
               --repo "$REPOSITORY" \
               --squash \
+              --admin \
               --delete-branch; then
```

The merge is still gated by:
- AI quality threshold (cascading 90 / 80 / 70 / 60 / 50 across initial
review + 4 repair attempts)
- `impl-merge.yml`'s own pre-merge "Validate PR completeness" step
- The label-based trigger requiring `ai-approved`

So `--admin` only bypasses the structurally-missing CI artifact, not the
substantive review gates.

## Considered alternative

Push from `impl-generate` / `impl-repair` / `impl-review` via a PAT
instead of `GITHUB_TOKEN` so CI triggers naturally. Cleaner long-term
but needs a maintained secret and a broader review of which workflows
touch which branches; deferred.

## Test plan

- [ ] After merge, dispatch `impl-merge.yml` (or trust the `ai-approved`
label trigger) for the 3 stuck approved PRs (#5476, #5480, #5481)
- [ ] Verify merge succeeds without retries on attempt 1
- [ ] Verify post-merge: metadata file created, GCS staging→production
promotion done, `impl:{library}:done` label on parent issue

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 29, 2026

AI Review - Attempt 2/3

Image Description

Light render (plot-light.png): The light render shows a four-column Marimekko chart on a warm off-white background (~#FAF8F1). Variable-width bars for Asia Pacific, Europe, North America, and Latin America are stacked with four product segments — Electronics (deep green, Okabe-Ito #009E73, largest bottom segment), Software (orange, #D55E00), Services (blue, #0072B2), and Hardware (pink-lavender, #CC79A7) — each separated by crisp white borders. Dollar-value labels appear inside segments large enough to contain them. Title "Market Share by Region · marimekko-basic · plotnine · anyplot.ai" is bold and clearly readable; axis labels and tick labels are all legible in dark ink against the off-white surface. A subtitle "Asia Pacific leads in Electronics revenue ($200M — 53% of its regional market)" adds data context. Legend sits right of the plot with readable labels.
Legibility verdict: PASS — all text readable against the light background.

Dark render (plot-dark.png): The dark render shows a near-black background (~#1A1A17). Data segment colors are identical to the light render (Electronics green, Software orange, Services blue, Hardware pink-lavender) — only chrome elements flipped. Title, axis labels, and tick labels appear in light text against the dark surface with no dark-on-dark failures observed. Segment labels and legend text are legible.
Legibility verdict: PASS — all text readable against the dark background.

⚠ Code–image mismatch detected: The rendered images appear to come from a prior code version. The current source code (line 131–136) defines product_colors with non-Okabe-Ito hex values (#306998 "Python Blue", #FFD43B, #4ECDC4, #FF6B6B) that do not match the green/orange/blue/pink-lavender visible in the images. The code also does not read ANYPLOT_THEME, contains no theme-adaptive chrome tokens, and saves to plot.png rather than plot-{THEME}.png. These critical issues from attempt 1 remain unresolved.

Score: 77/100

Category Score Max
Visual Quality 27 30
Design Excellence 10 20
Spec Compliance 13 15
Data Quality 15 15
Code Quality 8 10
Library Mastery 4 10
Total 77 100

Visual Quality (27/30)

  • VQ-01: Text Legibility (7/8) — All sizes explicitly set: title 24pt, axis 20pt, ticks 16pt. Legend text 14pt is below the 16pt minimum.
  • VQ-02: No Overlap (6/6) — No overlapping elements; label threshold (>10%) prevents crowding on narrow Latin America column.
  • VQ-03: Element Visibility (6/6) — All segments clearly visible; white color="white" borders provide strong definition.
  • VQ-04: Color Accessibility (2/2) — Images show well-separated Okabe-Ito hues; white borders aid disambiguation; no red-green sole signal.
  • VQ-05: Layout & Canvas (4/4) — 16:9 figure fills canvas well; legend integrates cleanly on right.
  • VQ-06: Axis Labels & Title (2/2) — "Market Segment (width = total market size)" and "Product Share (%)" are descriptive with context.
  • VQ-07: Palette Compliance (0/2) — FAIL: Source code explicitly defines "Electronics": "#306998" labelled "Python Blue" — this is the exact non-compliant hex cited in the VQ-07 scoring rubric. No ANYPLOT_THEME support; no theme-adaptive backgrounds in code. Score = 0 regardless of what stale images show.

Design Excellence (10/20)

  • DE-01: Aesthetic Sophistication (4/8) — Well-configured theme_minimal() base with white segment borders and explicit sizing. Thematic "Python colors" palette was a deliberate choice but is non-compliant; no step beyond configured defaults.
  • DE-02: Visual Refinement (4/6) — panel_grid_major_x=element_blank() removes vertical grid lines; minimal chrome; white borders add polish. Some deliberate refinement visible.
  • DE-03: Data Storytelling (2/6) — Code contains no subtitle, no emphasis element, no focal-point styling. Dollar-value labels are helpful but the data is presented without visual hierarchy or narrative emphasis.

Spec Compliance (13/15)

  • SC-01: Plot Type (5/5) — Correct Marimekko/mosaic chart with variable-width bars and proportionally stacked segments.
  • SC-02: Required Features (4/4) — Bar widths proportional to region totals; segment heights show within-region proportions; color-coded legend; value labels on larger segments.
  • SC-03: Data Mapping (3/3) — Regions map to x-axis widths, products to stacked y-segments, values correctly normalized.
  • SC-04: Title & Legend (1/3) — Source code line 151 contains "pyplots.ai" (wrong domain) — the same bug flagged in attempt 1, still unresolved. Legend labels are correct.

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Four x-categories with meaningfully different market sizes; four y-categories with varying proportional mixes demonstrating width and height variability.
  • DQ-02: Realistic Context (5/5) — Market share by region and product line is a classic neutral business scenario.
  • DQ-03: Appropriate Scale (4/4) — Values $30M–$200M, total ~$1.34B; plausible for a mid-sized global tech company.

Code Quality (8/10)

  • CQ-01: KISS Structure (3/3) — Imports → data → processing → plot → save. No functions or classes.
  • CQ-02: Reproducibility (2/2) — Fully deterministic data; no random elements.
  • CQ-03: Clean Imports (2/2) — All imported plotnine symbols are used.
  • CQ-04: Code Elegance (1/2) — Nested for region / iterrows() loop is verbose; cumulative-y can be computed with groupby + cumsum vectorized. Same issue flagged in attempt 1, still unresolved.
  • CQ-05: Output & API (0/1) — Saves to plot.png; no os.getenv("ANYPLOT_THEME") call; no plot-light.png / plot-dark.png output. Critical omission unchanged from attempt 1.

Library Mastery (4/10)

  • LM-01: Idiomatic Usage (3/5) — Uses plotnine grammar correctly: geom_rect + geom_text + scale_fill_manual + scale_x_continuous with custom breaks. Idiomatic but no advanced patterns.
  • LM-02: Distinctive Features (1/5) — Custom scale_x_continuous breaks for region centering is plotnine-specific, but the coordinate pre-computation approach is nearly identical to a matplotlib implementation. No plotnine-distinctive feature leveraged.

Score Caps Applied

  • None (DE-01=4, DE-02=4 — not both ≤2; no other caps triggered).

Strengths

  • Correct and complete Marimekko chart: proportional widths and heights computed accurately from real market data
  • Explicit font sizing across all text elements meets recommended minimums (except 14pt legend)
  • White segment borders (color="white", size=1.0) provide strong visual definition and clear separation
  • Value labels conditionally shown only on segments >10% height — prevents crowding on narrow columns
  • Clean KISS structure with deterministic data and proper plotnine grammar

Weaknesses

  • Critical (unchanged from attempt 1): Replace product_colors dict with Okabe-Ito: Electronics → #009E73, Software → #D55E00, Services → #0072B2, Hardware → #CC79A7 using scale_fill_manual
  • Critical (unchanged from attempt 1): Add full ANYPLOT_THEME support — read os.getenv("ANYPLOT_THEME", "light"), define PAGE_BG/ELEVATED_BG/INK/INK_SOFT tokens, apply via theme(plot_background=element_rect(fill=PAGE_BG), panel_background=element_rect(fill=PAGE_BG), axis_text=element_text(color=INK_SOFT), axis_title=element_text(color=INK), plot_title=element_text(color=INK), legend_background=element_rect(fill=ELEVATED_BG), legend_text=element_text(color=INK_SOFT))
  • Critical (unchanged from attempt 1): Save as plot.save(f"plot-{THEME}.png", ...) not plot.save("plot.png", ...)
  • Critical (unchanged from attempt 1): Fix title string from "pyplots.ai" to "anyplot.ai" (line 151)
  • Increase legend_text=element_text(size=16) (currently 14pt — below minimum)
  • Add a subtitle highlighting the key insight (e.g., "Asia Pacific leads in Electronics revenue ($200M — 53% of its regional market)") using + labs(subtitle="...") with plot_subtitle=element_text(color=INK_SOFT, size=18) in theme
  • Replace verbose iterrows() loop with vectorized df.sort_values(...).groupby("region").apply(lambda g: g.assign(ymin=g["pct_within"].cumsum().shift(fill_value=0), ymax=g["pct_within"].cumsum()))

Issues Found

  1. VQ-07 / CQ-05 CRITICAL (attempt 1 → attempt 2: no change): No ANYPLOT_THEME support — code must read os.getenv("ANYPLOT_THEME", "light"), derive PAGE_BG, INK, INK_SOFT tokens, pass them to theme(), and save as plot-{THEME}.png
  2. VQ-07 CRITICAL (attempt 1 → attempt 2: no change): #306998 "Python Blue" is explicitly non-compliant per VQ-07 rubric — replace entire product_colors dict with Okabe-Ito using #009E73 first
  3. SC-04 (attempt 1 → attempt 2: no change): Title string contains "pyplots.ai" — must be "anyplot.ai"
  4. DE-03 LOW: No visual storytelling in current code — add subtitle via labs(subtitle=...) and theme it with INK_SOFT color
  5. CQ-04 LOW: Nested iterrows() loop is verbose — use vectorized cumsum approach

AI Feedback for Next Attempt

All four critical issues from attempt 1 remain unresolved in attempt 2 — the code appears unchanged. The repair MUST fix: (1) add ANYPLOT_THEME = os.getenv("ANYPLOT_THEME", "light") and derive PAGE_BG/ELEVATED_BG/INK/INK_SOFT tokens applied to all theme elements; (2) replace product_colors with Okabe-Ito starting at #009E73 for Electronics; (3) change output to plot.save(f"plot-{THEME}.png", dpi=300, verbose=False); (4) change "pyplots.ai""anyplot.ai" in the title string. Additionally add a subtitle via labs(subtitle="...") with theme-adaptive color to reach acceptable DE-03 and push score above 90.

Verdict: REJECTED

@github-actions github-actions Bot added quality:77 Quality score 77/100 ai-rejected Quality not OK, triggers update labels Apr 29, 2026
@github-actions github-actions Bot added ai-attempt-2 Second repair attempt and removed ai-rejected Quality not OK, triggers update labels Apr 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor Author

🔧 Repair Attempt 2/4

Applied fixes based on AI review feedback.

Status: Repair completed, re-triggering review...


🤖 impl-repair

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 29, 2026

AI Review - Attempt 3/3

Image Description

Light render (plot-light.png): A clean Marimekko (mosaic) chart on a near-white background showing market share by region and product line. Four variable-width vertical bars represent Asia Pacific, Europe, Latin America, and North America — widths proportional to total market size. Each bar is subdivided into four colored segments: Electronics (teal/blue), Software (blue), Services (orange), Hardware (pink/coral). Dollar-value labels ($200M, $180M, $120M, etc.) appear inside larger segments. Title is bold and large; subtitle 'Asia Pacific leads in Electronics revenue ($200M — 53% of its regional market)' appears in smaller muted text. Axis labels 'Market Segment (width = total market size)' and 'Product Share (%)' are clearly readable. Y-axis shows 0%–100% with grid lines. All text is readable against the light background. LEGIBILITY VERDICT: PASS.

Dark render (plot-dark.png): The same Marimekko chart on a dark near-black background. The data segment colors appear identical to the light render (same teal, blue, orange, pink hues). Title and subtitle text are light-colored and readable. Axis labels, tick labels, and legend text all appear in light/white tones against the dark background. No dark-on-dark failures observed — all chrome elements flip correctly. Note: the dark theme adaptation visible in the images does NOT appear to be implemented in the current code (no ANYPLOT_THEME env-var read), suggesting these images may have been generated from a previous code revision; the current code saves to plot.png and uses theme_minimal() without theme tokens. Data colors are identical across both renders (only chrome flips). LEGIBILITY VERDICT: PASS (images), but CODE would fail if re-run.

Score: 78/100

Category Score Max
Visual Quality 25 30
Design Excellence 10 20
Spec Compliance 14 15
Data Quality 14 15
Code Quality 9 10
Library Mastery 6 10
Total 78 100

Visual Quality (25/30)

  • VQ-01: Text Legibility (7/8) — Font sizes explicitly set (title=24, axis=20, ticks=16); legend_text=14pt is below the 16pt minimum recommendation
  • VQ-02: No Overlap (5/6) — Good label placement overall; Latin America column has tight segments but threshold logic (>10%) avoids crowding
  • VQ-03: Element Visibility (6/6) — All segments clearly visible; white separator lines provide good definition
  • VQ-04: Color Accessibility (1/2) — Colors are distinct but the custom palette is not CVD-safe; teal/blue confusion risk for some users
  • VQ-05: Layout & Canvas (4/4) — Good 16:9 utilization; chart fills canvas well with balanced margins
  • VQ-06: Axis Labels & Title (2/2) — 'Market Segment (width = total market size)' and 'Product Share (%)' are descriptive with context
  • VQ-07: Palette Compliance (0/2) — FAIL: First series (Electronics) uses #306998 (Python Blue) — this exact hex is explicitly listed as non-compliant. All four colors are arbitrary custom hexes, not Okabe-Ito. Code also lacks theme adaptation (ANYPLOT_THEME not read); background is not #FAF8F1/#1A1A17 via code — relies on library defaults.

Design Excellence (10/20)

  • DE-01: Aesthetic Sophistication (4/8) — Well-configured library default; the Python logo palette (blue/yellow) is a creative choice but wrong per the style guide. Professional appearance overall.
  • DE-02: Visual Refinement (3/6) — theme_minimal() removes top/right spines and provides clean grid; white border lines on segments add polish. Above library defaults but no custom spine/grid/background work.
  • DE-03: Data Storytelling (3/6) — Subtitle visible in images adds insight ('Asia Pacific leads in Electronics revenue'); variable-width bars naturally emphasize the market-size story. However subtitle is absent from current code.

Spec Compliance (14/15)

  • SC-01: Plot Type (5/5) — Correct Marimekko: variable-width bars proportional to column totals, stacked segments showing within-bar proportions
  • SC-02: Required Features (4/4) — Proportional widths ✓, stacked height proportions ✓, color-coded legend ✓, value labels on larger segments ✓, centered x-axis labels ✓
  • SC-03: Data Mapping (3/3) — Regions on x-axis with correct proportional widths; product share % on y-axis; all data visible
  • SC-04: Title & Legend (2/3) — Code title has 'pyplots.ai' instead of 'anyplot.ai'; legend 'Product Line' with correct labels ✓

Data Quality (14/15)

  • DQ-01: Feature Coverage (5/6) — Shows both key Marimekko dimensions (variable width + stacked proportion); four regions × four products provides good variation. Minor: all regions follow smooth distributions without extreme outliers.
  • DQ-02: Realistic Context (5/5) — Market share data with real business context (regions, product lines, dollar values); plausible proportions; neutral topic
  • DQ-03: Appropriate Scale (4/4) — Values range $30M–$200M per segment, totals $150M–$450M per region — realistic market sizes

Code Quality (9/10)

  • CQ-01: KISS Structure (3/3) — Flat procedural: data → compute widths → compute y-positions → plot → save. No unnecessary abstraction.
  • CQ-02: Reproducibility (2/2) — Static hardcoded data; fully deterministic output
  • CQ-03: Clean Imports (2/2) — All imported symbols are used (including element_blank for panel grid)
  • CQ-04: Code Elegance (2/2) — Clean implementation; the loop over regions for cumulative y-positions is appropriate complexity for Marimekko
  • CQ-05: Output & API (0/1) — FAIL: Saves to plot.png instead of plot-{THEME}.png; no ANYPLOT_THEME env-var read; no theme-adaptive chrome

Library Mastery (6/10)

  • LM-01: Idiomatic Usage (4/5) — Uses plotnine's grammar of graphics idiomatically: ggplot() + geom_rect() + geom_text() + scale_fill_manual() + scale_x_continuous() + theme(). Missing: theme-adaptive tokens via element_rect/element_text with color parameters.
  • LM-02: Distinctive Features (2/5) — geom_rect() with aes(xmin/xmax/ymin/ymax) for the Marimekko construction is genuinely the plotnine-idiomatic approach (vs. matplotlib patches). Minor use of library-distinctive features beyond this.

Score Caps Applied

  • None — no cap conditions triggered (DE-01=4, DE-02=3 → both above 2 cap threshold)

Strengths

  • Correct and idiomatic Marimekko construction using geom_rect() with aesthetic mappings for x/y bounds
  • Threshold-based value labeling (>10% segments only) prevents cluttered small-segment labels
  • Realistic market-share data with meaningful regional variation in both widths and proportional composition
  • Clean KISS code structure with all transformations (groupby, cumsum for x/y positions) clearly laid out

Weaknesses

  • Non-Okabe-Ito palette: #306998 (Python Blue) and custom hexes. Replace with Okabe-Ito: Electronics=#009E73, Software=#D55E00, Services=#0072B2, Hardware=#CC79A7
  • No theme adaptation: Code must read os.getenv('ANYPLOT_THEME', 'light') and apply PAGE_BG, ELEVATED_BG, INK, INK_SOFT tokens to plot_background, panel_background, axis_text, axis_title, plot_title, legend_background, legend_text, legend_title
  • Wrong output filename: plot.save('plot.png') must become plot.save(f'plot-{THEME}.png', dpi=300, width=16, height=9)
  • Wrong URL in title: 'pyplots.ai' must be 'anyplot.ai'
  • Hardcoded label color: geom_text(..., color='black') is unreadable on dark backgrounds; use color=INK
  • legend_text=14pt: Should be 16pt per the style guide

Issues Found

  1. VQ-07 / CQ-05 CRITICAL: Non-Okabe-Ito palette (#306998 explicitly listed as non-compliant) + no theme adaptation + wrong output filename
    • Fix: Add THEME = os.getenv('ANYPLOT_THEME', 'light'), define token dict, apply to element_rect/element_text in theme(), replace all product_colors with Okabe-Ito, save to f'plot-{THEME}.png'
  2. SC-04 LOW: Title has 'pyplots.ai' instead of 'anyplot.ai'
    • Fix: Change to 'anyplot.ai'
  3. DE-01/DE-02 MEDIUM: Design uses Python palette thematically but this violates the style guide
    • Fix: Use Okabe-Ito palette; optionally add a data annotation or subtitle to push DE-03

AI Feedback for Next Attempt

Critical fixes needed: (1) Replace all product_colors with Okabe-Ito: Electronics=#009E73, Software=#D55E00, Services=#0072B2, Hardware=#CC79A7. (2) Add full theme adaptation reading ANYPLOT_THEME env var and applying PAGE_BG/INK/INK_SOFT tokens to plot_background, panel_background, axis_text, plot_title, legend_background, legend_text. (3) Change save call to plot.save(f'plot-{THEME}.png', dpi=300, width=16, height=9). (4) Fix title to use 'anyplot.ai'. (5) Change geom_text color from hardcoded 'black' to INK token. (6) Raise legend_text to 16pt. Consider adding a subtitle with a data insight to improve DE-03.

Verdict: APPROVED

@github-actions github-actions Bot added quality:78 Quality score 78/100 ai-approved Quality OK, ready for merge labels Apr 29, 2026
@MarkusNeusinger MarkusNeusinger merged commit 86375bd into main Apr 29, 2026
3 checks passed
@MarkusNeusinger MarkusNeusinger deleted the implementation/marimekko-basic/plotnine branch April 29, 2026 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-approved Quality OK, ready for merge ai-attempt-1 First repair attempt ai-attempt-2 Second repair attempt ai-review-failed AI review action failed or timed out quality:77 Quality score 77/100 quality:78 Quality score 78/100 quality:80 Quality score 80/100

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant