
feat(matplotlib): implement line-multi #5796

Merged: MarkusNeusinger merged 3 commits into main from implementation/line-multi/matplotlib on May 6, 2026



@github-actions
Contributor

@github-actions github-actions Bot commented May 6, 2026

Implementation: line-multi - python/matplotlib

Implements the python/matplotlib version of line-multi.

File: plots/line-multi/implementations/python/matplotlib.py

Parent Issue: #1825


🤖 impl-generate workflow

@claude
Contributor

claude Bot commented May 6, 2026

AI Review - Attempt 1/3

Image Description

Light render (plot-light.png): The plot displays three product sales trends over 12 months on a warm off-white background (#FAF8F1). The title "line-multi · matplotlib · anyplot.ai" is prominent and clearly readable in dark text. The x-axis shows month names (Jan-Dec) and y-axis shows "Sales ($ thousands)" with values ranging from ~20 to ~110. Three distinct lines represent:

  • Product A (solid line with circle markers) in brand green (#009E73)
  • Product B (dashed line with square markers) in vermillion (#D55E00)
  • Product C (dash-dot line with triangle markers) in blue (#0072B2)

A subtle y-axis grid provides reference without overwhelming the data. The legend in the upper left has a styled frame and light background. An annotation box highlighting the peak value ($107K) appears at the maximum point of Product A. All text (title, axis labels, tick labels, legend, annotation) is clearly readable against the light background. Legibility verdict: PASS

Dark render (plot-dark.png): The same plot renders on a warm near-black background (#1A1A17). All data colors remain identical to the light render: #009E73, #D55E00, #0072B2 (only chrome inverts). Title and axis labels are now in light text, clearly visible against the dark background. Tick labels, grid, and legend frame have appropriately inverted colors. The annotation box maintains good contrast and readability. No text is dark-on-dark; all elements are clearly distinguishable. Legibility verdict: PASS

Score: 90/100

| Category | Score | Max |
|---|---|---|
| Visual Quality | 30 | 30 |
| Design Excellence | 14 | 20 |
| Spec Compliance | 15 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 10 | 10 |
| Library Mastery | 6 | 10 |
| Total | 90 | 100 |

Visual Quality (30/30)

  • VQ-01: Text Legibility (8/8) - All font sizes explicitly set (title=24pt, labels=20pt, ticks=16pt); perfectly readable in both themes
  • VQ-02: No Overlap (6/6) - All text well-positioned, no collisions, annotation doesn't interfere with data
  • VQ-03: Element Visibility (6/6) - Lines, markers (circle/square/triangle), and data points all clearly distinguishable
  • VQ-04: Color Accessibility (2/2) - Okabe-Ito palette ensures CVD-safe colors; good contrast between series
  • VQ-05: Layout & Canvas (4/4) - Perfect 16:9 aspect ratio, plot fills canvas well, balanced margins
  • VQ-06: Axis Labels & Title (2/2) - Descriptive axis labels with units, correct title format
  • VQ-07: Palette Compliance (2/2) - First series is #009E73 ✓, multi-series follows Okabe-Ito order (#009E73, #D55E00, #0072B2) ✓, backgrounds are #FAF8F1 (light) / #1A1A17 (dark) ✓, both renders theme-correct ✓

Design Excellence (14/20)

  • DE-01: Aesthetic Sophistication (5/8) - Clean, professional styling with intentional choices (three distinct line styles: solid, dashed, dash-dot; three distinct markers). Peak annotation adds thoughtful polish. Could be more sophisticated overall.
  • DE-02: Visual Refinement (4/6) - Top and right spines removed, subtle y-axis grid (alpha=0.15), generous whitespace, styled legend frame with subtle border. Good refinement visible.
  • DE-03: Data Storytelling (5/6) - Three products tell distinct stories: Product A shows steady growth with Q4 bump, Product B shows dip and recovery pattern, Product C shows exponential growth. Visual hierarchy through color and markers. Annotation highlights the focal peak value.

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) - Correct multi-line comparison plot showing three series for direct comparison
  • SC-02: Required Features (4/4) - Multiple series ✓, distinct colors ✓, legend ✓, varying line styles ✓, markers at data points ✓
  • SC-03: Data Mapping (3/3) - X-axis shows months 1-12, y-axis shows sales values, all data visible and correctly scaled
  • SC-04: Title & Legend (3/3) - Title format "line-multi · matplotlib · anyplot.ai" ✓, legend labels are descriptive ("Product A (Electronics)", "Product B (Appliances)", "Product C (Software)") ✓
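The required features listed above (three series, distinct colors, line styles, markers, legend) can be sketched roughly as follows. This is an illustration, not the merged `matplotlib.py`: the data arrays are placeholders that only imitate the described trends, and `linewidth=3` / `markersize=10` are the values the second review in this thread reports.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Okabe-Ito colors in the order the review lists them.
COLORS = ["#009E73", "#D55E00", "#0072B2"]
LINESTYLES = ["-", "--", "-."]  # solid, dashed, dash-dot
MARKERS = ["o", "s", "^"]       # circle, square, triangle
LABELS = [
    "Product A (Electronics)",
    "Product B (Appliances)",
    "Product C (Software)",
]

months = np.arange(1, 13)
month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Placeholder data (assumption): shapes loosely follow the described trends.
series = [
    40 + 5.5 * months,                              # steady growth
    70 - 25 * np.sin(np.pi * (months - 1) / 11),    # mid-year dip and recovery
    20 * np.exp(0.12 * months),                     # exponential growth
]

fig, ax = plt.subplots(figsize=(16, 9))
for y, color, ls, marker, label in zip(series, COLORS, LINESTYLES, MARKERS, LABELS):
    ax.plot(months, y, color=color, linestyle=ls, marker=marker,
            linewidth=3, markersize=10, label=label)

ax.set_xticks(months)
ax.set_xticklabels(month_names)
ax.set_xlabel("Month", fontsize=20)
ax.set_ylabel("Sales ($ thousands)", fontsize=20)
ax.set_title("line-multi · matplotlib · anyplot.ai", fontsize=24)
ax.tick_params(labelsize=16)
ax.legend(loc="upper left", fontsize=16)
```

Encoding each series three ways (color, line style, marker) keeps the lines distinguishable even when color alone is not reliable, for example in grayscale print.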

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) - Demonstrates all multi-line plot aspects: multiple series with different characteristics, distinct trends (growth, dip/recovery, exponential), peak annotation shows data insight
  • DQ-02: Realistic Context (5/5) - Product sales scenario is real-world, comprehensible, and neutral. Monthly sales in thousands is a familiar business context.
  • DQ-03: Appropriate Scale (4/4) - Values range 20-107 thousands (sensible product sales range), time period is 12 months (standard planning cycle)

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) - Straightforward sequential code: imports → data generation → plot → save; no functions or classes
  • CQ-02: Reproducibility (2/2) - Uses np.random.seed(42) for deterministic noise generation
  • CQ-03: Clean Imports (2/2) - Only necessary imports used: os (for THEME env var), matplotlib, numpy
  • CQ-04: Code Elegance (2/2) - Clean, Pythonic code; no over-engineering; no fake interactive features
  • CQ-05: Output & API (1/1) - Correctly saves as plot-{THEME}.png with proper DPI and facecolor settings
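A flat script of the kind described above (imports, data, plot, save) might look roughly like this. It is a sketch under assumptions, not the merged file: the data is placeholder, the `"light"` default for `THEME` is assumed, and the output goes to a temporary directory so the example leaves nothing in the working tree. `dpi=300` and `bbox_inches="tight"` are the values the second review in this thread reports.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Theme comes from the environment; the "light" default is an assumption.
THEME = os.getenv("THEME", "light")
PAGE_BG = "#FAF8F1" if THEME == "light" else "#1A1A17"

# Fixed seed so the noise is identical on every run.
np.random.seed(42)
months = np.arange(1, 13)
sales = 40 + 5 * months + np.random.normal(0, 2, size=12)  # placeholder data

fig, ax = plt.subplots(figsize=(16, 9), facecolor=PAGE_BG)
ax.set_facecolor(PAGE_BG)
ax.plot(months, sales, color="#009E73", marker="o")

# Sketch only: the real script is described as writing plot-{THEME}.png;
# a temporary directory is used here to avoid leaving files behind.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, f"plot-{THEME}.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight", facecolor=PAGE_BG)
plt.close(fig)
```

Passing `facecolor` to `savefig` as well as to the figure makes sure the saved image keeps the page background rather than falling back to the default.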

Library Mastery (6/10)

  • LM-01: Idiomatic Usage (4/5) - Proper use of Axes methods (ax.plot, ax.annotate), correct theme token implementation, proper spine and grid styling
  • LM-02: Distinctive Features (2/5) - Uses matplotlib-specific line styles (solid, dashed, dash-dot) and markers (o, s, ^), but overall usage is standard

Score Caps Applied

  • None. The plot passes all thresholds without caps.

Strengths

  • Perfect visual quality execution with readable text in both light and dark themes
  • Well-implemented theme-adaptive styling with proper color tokens
  • Thoughtful use of line styles and markers for series differentiation
  • Informative peak value annotation that adds insight without faking interactivity
  • Realistic and relatable product sales data showing diverse trends
  • Clean, reproducible code with explicit font sizing and styling

Weaknesses

  • Design could be more sophisticated with additional visual enhancements (beyond default Okabe-Ito styling)
  • Limited use of distinctive matplotlib features beyond standard line/scatter API
  • Could leverage more advanced matplotlib capabilities for enhanced visual hierarchy

Issues Found

  • None identified. All technical requirements met.

AI Feedback for Next Attempt

N/A — Implementation meets approval threshold.

Verdict: APPROVED

@claude
Contributor

claude Bot commented May 6, 2026

AI Review - Attempt 1/3

Image Description

Light render (plot-light.png): The plot renders on a warm off-white (#FAF8F1) background with three clearly distinct lines: Product A (Electronics) in brand green (#009E73, solid line with circle markers), Product B (Appliances) in vermillion (#D55E00, dashed line with square markers), and Product C (Software) in blue (#0072B2, dash-dot line with triangle markers). The x-axis runs Jan–Dec with clear month labels; the y-axis shows "Sales ($ thousands)" with a range of ~28–110. A peak annotation "$107K" appears in the top-right at the December maximum of Product A, with a styled callout box. The legend is positioned upper-left with a themed frame. Title "line-multi · matplotlib · anyplot.ai" is bold at the top. All text (title, axis labels, tick labels, legend, annotation) is clearly readable against the light background — PASS.

Dark render (plot-dark.png): The same plot on a warm near-black (#1A1A17) background. All three data series retain their identical Okabe-Ito colors (green, vermillion, blue), and the chrome elements flip correctly: the title, axis labels, tick labels, and legend text are all rendered in light tones (near-white/light-gray), confirming correct theme-adaptive token usage. Grid lines are subtle against the dark background. The "$107K" annotation box uses the elevated dark background (#242420) with light-gray text and is readable. No dark-on-dark failures detected. All text is clearly readable against the dark background: PASS.


Score: 92/100

| Category | Score | Max |
|---|---|---|
| Visual Quality | 30 | 30 |
| Design Excellence | 14 | 20 |
| Spec Compliance | 15 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 10 | 10 |
| Library Mastery | 8 | 10 |
| Total | 92 | 100 |

Visual Quality (30/30)

  • VQ-01: Text Legibility (8/8) — All font sizes explicitly set: title 24pt, axis labels 20pt, ticks 16pt, legend 16pt; all perfectly readable in both themes
  • VQ-02: No Overlap (6/6) — No overlapping text; month labels well-spaced, legend placed away from dense data region
  • VQ-03: Element Visibility (6/6) — linewidth=3 and markersize=10 are well-suited for 12 monthly data points; all three series clearly distinguishable
  • VQ-04: Color Accessibility (2/2) — Okabe-Ito is CVD-safe; additional distinction via line styles and marker shapes exceeds color-alone requirement
  • VQ-05: Layout & Canvas (4/4) — Plot fills canvas well with balanced margins; annotation fits without crowding
  • VQ-06: Axis Labels & Title (2/2) — Y-axis "Sales ($ thousands)" has units; X-axis "Month" is descriptive
  • VQ-07: Palette Compliance (2/2) — First series #009E73, second #D55E00, third #0072B2 (canonical Okabe-Ito order 1→3); backgrounds correctly #FAF8F1 / #1A1A17; all chrome theme-correct in both renders

Design Excellence (14/20)

  • DE-01: Aesthetic Sophistication (6/8) — Strong design clearly above defaults: custom Okabe-Ito palette, three distinct line styles (solid/dashed/dash-dot), three distinct marker shapes, themed peak annotation with styled callout box — well above a generic default configuration
  • DE-02: Visual Refinement (4/6) — Top and right spines removed; Y-axis-only grid at alpha=0.15 (subtle); legend frame themed with ELEVATED_BG and INK_SOFT edge; spine colors use INK_SOFT token — good refinement, some room to tighten (e.g., tick mark removal, slightly lower grid alpha)
  • DE-03: Data Storytelling (4/6) — Three lines with genuinely different narratives (steady growth, mid-year dip/recovery, exponential launch curve); peak annotation highlights the key insight; viewer is guided to the story rather than left to find it

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) — Correct multi-line comparison plot with 3 series on shared axes
  • SC-02: Required Features (4/4) — Distinct colors per line, legend, varying line styles, markers at data points — all spec recommendations implemented
  • SC-03: Data Mapping (3/3) — X = shared sequential month values; Y = sales; all 12 months visible with no clipping
  • SC-04: Title & Legend (3/3) — Title "line-multi · matplotlib · anyplot.ai" format correct; legend labels "Product A (Electronics)", "Product B (Appliances)", "Product C (Software)" are descriptive and accurate

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Demonstrates growth trend, decline+recovery pattern, and exponential growth curve — covers full diversity of multi-line behaviors
  • DQ-02: Realistic Context (5/5) — Monthly sales for product lines (Electronics, Appliances, Software) is a real, neutral, comprehensible business scenario
  • DQ-03: Appropriate Scale (4/4) — Sales values in ~$28K–$110K range are plausible for product line monthly revenues; relationships and seasonal patterns are realistic

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) — Flat: imports → theme tokens → data → plot → save; no functions or classes
  • CQ-02: Reproducibility (2/2) — np.random.seed(42) set before all data generation
  • CQ-03: Clean Imports (2/2) — Only os, matplotlib.pyplot, numpy — all used
  • CQ-04: Code Elegance (2/2) — Clean, Pythonic; annotation complexity is appropriate for the visualization; no over-engineering
  • CQ-05: Output & API (1/1) — Saves as plot-{THEME}.png with dpi=300, bbox_inches='tight', facecolor=PAGE_BG

Library Mastery (8/10)

  • LM-01: Idiomatic Usage (5/5) — Uses Axes methods throughout (not pyplot); ax.annotate() with arrowprops; plt.setp() for legend text; tight_layout() — all idiomatic matplotlib patterns
  • LM-02: Distinctive Features (3/5) — Uses ax.annotate() with styled arrowprops and bbox (matplotlib-distinctive); custom spine color control; plt.setp() on legend texts — library-specific but not a deeply distinctive feature that couldn't be replicated elsewhere
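The annotation and legend techniques named above could be sketched as follows. This is an assumption-laden illustration rather than the reviewed code: the series is a placeholder that peaks at 107 in December, the `INK` and `INK_SOFT` hex values are invented for the example (only `#242420` is quoted in the review), and the box style and text offsets are guesses.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Dark-theme chrome. ELEVATED_BG is quoted in the review;
# INK and INK_SOFT hex values are assumptions for illustration.
ELEVATED_BG = "#242420"
INK = "#F0EFE8"
INK_SOFT = "#B8B7B0"

months = np.arange(1, 13)
product_a = np.linspace(45, 107, 12)  # placeholder series peaking in December

fig, ax = plt.subplots(figsize=(16, 9))
ax.plot(months, product_a, color="#009E73", marker="o",
        label="Product A (Electronics)")

# Callout at the maximum of the series, with a styled box and arrow.
peak_idx = int(np.argmax(product_a))
annotation = ax.annotate(
    f"${product_a[peak_idx]:.0f}K",
    xy=(months[peak_idx], product_a[peak_idx]),
    xytext=(months[peak_idx] - 2.5, product_a[peak_idx] - 3),
    fontsize=14,
    color=INK,
    bbox=dict(boxstyle="round,pad=0.4", facecolor=ELEVATED_BG, edgecolor=INK_SOFT),
    arrowprops=dict(arrowstyle="->", color=INK_SOFT),
)

# Theme the legend frame, then recolor all legend labels in one call.
legend = ax.legend(loc="upper left", fontsize=16)
legend.get_frame().set_facecolor(ELEVATED_BG)
legend.get_frame().set_edgecolor(INK_SOFT)
plt.setp(legend.get_texts(), color=INK)
```

`plt.setp` applies one property to every legend text object at once, which avoids a manual loop over `legend.get_texts()`.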

Score Caps Applied

  • None — no caps triggered

Strengths

  • Perfect theme-adaptive implementation: all chrome tokens (PAGE_BG, ELEVATED_BG, INK, INK_SOFT) correctly applied to every element in both light and dark renders
  • Data tells three genuinely distinct stories with seasonal/trend variation; annotation highlights the key peak insight
  • Triple distinction on series (color + line style + marker shape) exceeds spec requirements
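One way to organize the chrome tokens named above is a per-theme lookup, sketched here. Only the page backgrounds and the dark elevated background are quoted in this thread; every value marked "assumed" is a placeholder, and the `theme_tokens` helper is hypothetical rather than part of the implementation.

```python
import os

# Chrome tokens per theme. Values marked "assumed" are placeholders,
# not values taken from the reviewed implementation.
TOKENS = {
    "light": {
        "PAGE_BG": "#FAF8F1",
        "ELEVATED_BG": "#FFFDF6",  # assumed
        "INK": "#1A1A17",          # assumed
        "INK_SOFT": "#4A4A44",     # assumed
    },
    "dark": {
        "PAGE_BG": "#1A1A17",
        "ELEVATED_BG": "#242420",
        "INK": "#F0EFE8",          # assumed
        "INK_SOFT": "#B8B7B0",     # assumed
    },
}

# Data colors stay the same in both themes; only the chrome changes.
OKABE_ITO = ["#009E73", "#D55E00", "#0072B2"]


def theme_tokens(theme: str) -> dict:
    """Return the chrome tokens for a theme, falling back to light if unknown."""
    return TOKENS.get(theme, TOKENS["light"])


tokens = theme_tokens(os.getenv("THEME", "light"))
```

Keeping the data palette outside the per-theme table mirrors the behavior the review checks for: series colors are identical across renders while backgrounds and text colors invert.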

Weaknesses

  • DE-01 ceiling: annotation fontsize (14pt) is below tick label size (16pt); could raise to 16pt for consistency
  • LM-02: could leverage more distinctive matplotlib capabilities (e.g., ax.fill_between() for confidence bands, mpl.patheffects for line glow, or inset_axes for detail view)
  • DE-02: tick marks could be hidden (ax.tick_params(length=0)) for a cleaner minimal look; grid alpha=0.15 is at upper end of recommended range

Issues Found

  1. LM-02 MODERATE: Implementation uses correct but generic matplotlib features — no truly distinctive capability
    • Fix: Consider ax.fill_between() for trend bands or mpl.patheffects for visual emphasis
  2. MINOR: Annotation fontsize=14 is below the 16pt tick label standard
    • Fix: Set annotation fontsize=16 for typographic consistency

AI Feedback for Next Attempt

This is a high-quality implementation that meets the threshold. If regenerating: raise annotation fontsize to 16pt; add ax.tick_params(length=0) for cleaner tick removal; consider ax.fill_between() with alpha confidence/range bands to use a more distinctive matplotlib feature; lower grid alpha to 0.10 for subtlety. Data and structure are excellent — focus improvements on design polish and library-distinctive technique.
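The suggested tweaks could be applied along these lines. This is a sketch only: the series and the width of the range band are invented for illustration, and the band's alpha is a guess, since the feedback does not specify how a band should be derived or styled.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
product_a = np.linspace(45, 107, 12)  # placeholder series
band = 4.0                            # hypothetical +/- range, not from the source

fig, ax = plt.subplots(figsize=(16, 9))
ax.plot(months, product_a, color="#009E73", linewidth=3,
        marker="o", markersize=10)

# Range band around the line, as suggested for a more distinctive feature.
ax.fill_between(months, product_a - band, product_a + band,
                color="#009E73", alpha=0.15)

# Hide tick marks but keep the tick labels at 16pt.
ax.tick_params(length=0, labelsize=16)

# Y-axis-only grid at the lower alpha suggested in the feedback.
ax.yaxis.grid(True, alpha=0.10)
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Annotation raised to 16pt to match the tick labels.
annotation = ax.annotate(
    "$107K",
    xy=(12, 107),
    xytext=(9.5, 104),
    fontsize=16,
    arrowprops=dict(arrowstyle="->"),
)
```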

Verdict: APPROVED

@github-actions github-actions Bot added the quality:92 (Quality score 92/100) and ai-approved (Quality OK, ready for merge) labels May 6, 2026
@MarkusNeusinger MarkusNeusinger merged commit e3ae5bf into main May 6, 2026
3 checks passed
@MarkusNeusinger MarkusNeusinger deleted the implementation/line-multi/matplotlib branch May 6, 2026 21:48
MarkusNeusinger added a commit that referenced this pull request May 6, 2026
…view (#5819)

## Summary

Two minimal resilience patches for the dominant transient failures
observed in the daily-regen audit (2026-05-06):

1. **`impl-review.yml`: "Extract PR info" step** (5 failures in 24h).
Wrap `gh pr view` in a 3-attempt retry with exponential backoff. When
the GitHub API blips, the entire review job aborts and the PR ends up
unlabeled — blocking the review → repair → merge cascade. Concretely
caused 4 of the 14 stuck PRs we recovered today (#5696, #5789, #5796,
#5801 had no labels at all because review never made it past step 1).

2. **`impl-generate.yml`: "Create library metadata file" step** (6
failures in 24h). Wrap the final `git push origin "$BRANCH"` in a
3-attempt retry that does `fetch + rebase` between attempts. The
dominant failure mode is racing against Claude's earlier push to the
same branch — when the metadata commit hits a non-fast-forward, the
whole generation aborts and the PR never opens.

Both fixes are inline bash retries — no new action dependency. Each step
still hard-fails after 3 attempts so persistent issues still surface (we
don't want to mask real bugs).
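The retry pattern described above can be illustrated in Python. The actual patches are inline bash in the workflow files and are not shown here, so the `retry` helper, its parameter names, and the 2-second base delay are all assumptions made for this sketch.

```python
import time


def retry(action, attempts=3, base_delay=2.0, between=None, sleep=time.sleep):
    """Run `action` up to `attempts` times, doubling the wait after each failure.

    `between` is an optional hook run before each retry (for example a
    fetch + rebase before pushing again). The final failure is re-raised,
    so persistent errors still surface instead of being masked.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # hard-fail after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, ...
            if between is not None:
                between()


# Usage: a stand-in for a flaky API call that succeeds on the third try.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = retry(flaky, sleep=delays.append)
```

Injecting `sleep` keeps the example fast to run and makes the backoff schedule observable; in this run the first two attempts fail and the third returns normally.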

## Out of scope (deferred)

- **`daily-regen.yml` "pick" job clean exit**: the existing logic
already writes `count=0` when no specs are eligible, and downstream is
gated on `if: needs.pick.outputs.count != '0'`. The 2 reported
"cancellations" in the audit period look unrelated to the pick step
itself (likely scheduler-level events).
- **Auto-retry on `ai-review-failed`**: would require a more invasive
`if: failure()` job-level step. Holding until we see whether the
Extract-PR-info retry alone reduces the rate enough.

## Context

This branch is the Stage 5 follow-up to today's recovery work, which
manually shepherded 13 of 14 stuck PRs through the review/repair/merge
pipeline. The full investigation + recovery plan lives at
`/home/tirao/.claude/plans/bitte-schaue-dir-alle-peppy-bunny.md`
(local).

## Test plan

- [x] YAML-validate both edited workflows (passes)
- [x] No-op for the happy path: first attempt of each retry preserves
existing behavior exactly
- [ ] CI green on this PR
- [ ] Post-merge: watch the next 24h of impl-generate / impl-review run
conclusions; expect failure rate to drop from ~10% to near 0% on these
two steps

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Labels

ai-approved (Quality OK, ready for merge), quality:92 (Quality score 92/100)
