Gpt image 2 support by yangjunx21 · Pull Request #193 · OpenCoworkAI/open-codesign

yangjunx21 · 2026-04-23T05:24:48Z

Summary

Lets the agent call gpt-image-2 (or OpenRouter image models) on demand to generate bitmap assets (logos, hero images, illustrations) while producing a design, and embeds them seamlessly in preview + exports. Off by default; users opt in from Settings.

Type of change

New feature

Linked issue

Checklist

I read docs/VISION.md, docs/PRINCIPLES.md, and CLAUDE.md before starting
Commits are signed with DCO (git commit -s)
pnpm lint && pnpm typecheck && pnpm test passes locally
Added/updated tests for the change
Added a changeset (pnpm changeset) if user-visible
Updated docs if behavior changed

Dependency additions (if any)

Screenshots / recordings (UI changes)

Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>

POSIX-specific assertions were baked into six tests, which made the full `pnpm test` suite fail on Windows even though the production code is already cross-platform:

- token-store: `0o600` mode bits aren't enforceable on NTFS (reports 0o666); guard the assertion with `process.platform !== 'win32'`.
- skills/loader: `new URL(...).pathname` yields `/D:/...` on Windows, so `readdir` sees zero files; use `fileURLToPath()` instead.
- opencode-config, locale-ipc, preferences-ipc: replace hard-coded forward-slash path strings with `path.join()`-built expectations that mirror whatever separator the host OS uses.
- boot-fallback: `/dev/null/...` is only guaranteed-unwritable on POSIX; build a parent-is-a-regular-file path instead so `mkdirSync` throws ENOTDIR on both platforms.

All 10 workspace packages' tests now pass on Windows.

Made-with: Cursor
Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>
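The portability fixes listed in this commit message can be sketched as below. This is a minimal illustration, not the PR's test code: `resolveFromModule`, `expectedPath`, `canAssertPosixMode`, and the `src/loader.ts` location are hypothetical names chosen for the example.

```typescript
import * as path from 'node:path';
import { fileURLToPath, pathToFileURL } from 'node:url';

// Hypothetical helper (not from the PR): resolve a path relative to a module URL.
// fileURLToPath() returns a native path on every platform, whereas
// `new URL(...).pathname` keeps the leading slash before a Windows drive letter.
export function resolveFromModule(moduleUrl: string, relative: string): string {
  return fileURLToPath(new URL(relative, moduleUrl));
}

// Hypothetical expectation builder: tests compare against path.join() output
// instead of hard-coded forward-slash strings.
export function expectedPath(root: string, ...segments: string[]): string {
  return path.join(root, ...segments);
}

// Mode-bit assertions such as 0o600 are only meaningful off Windows.
export const canAssertPosixMode: boolean = process.platform !== 'win32';

// Usage: pretend a module lives at <cwd>/src/loader.ts (an assumed location).
const moduleUrl = pathToFileURL(path.join(process.cwd(), 'src', 'loader.ts')).href;
export const skillsDir = resolveFromModule(moduleUrl, './skills');
```

A test would then wrap its permission check in `if (canAssertPosixMode) { ... }` and compare directory results against `expectedPath(...)` rather than a literal string.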

Made-with: Cursor Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>

CodeQL flagged the anchored-quantifier pair `/\/+$/` + `/^\/+/` inside `joinEndpoint` (packages/providers/src/images.ts) as a potential ReDoS on library-supplied input. Replace both regex calls with explicit single-pass scans over the trailing/leading `/` characters: same behaviour, trivially linear, no CodeQL alert.

Also unblock the Windows test run on this branch:

- `token-store.test.ts`: a new 0o600-mode assertion added on main fails on NTFS (always reports 0o666); guard it the same way the existing sibling assertion is guarded.
- `safe-read.test.ts`: the symlink-acceptance case requires admin / Developer Mode on Windows and otherwise throws EPERM; skip the case when symlink creation is denied, keeping full coverage on POSIX CI.

Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>
Made-with: Cursor
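A single-pass replacement for the two regexes might look like the sketch below. The two-argument signature and the parameter names are assumptions; the actual `joinEndpoint` in `packages/providers/src/images.ts` may differ.

```typescript
const SLASH = 47; // char code of '/'

// Hypothetical signature: join a base URL and a path with exactly one '/'.
export function joinEndpoint(base: string, suffix: string): string {
  // Walk back from the end of `base` past any trailing slashes.
  let end = base.length;
  while (end > 0 && base.charCodeAt(end - 1) === SLASH) end--;

  // Walk forward from the start of `suffix` past any leading slashes.
  let start = 0;
  while (start < suffix.length && suffix.charCodeAt(start) === SLASH) start++;

  return `${base.slice(0, end)}/${suffix.slice(start)}`;
}
```

Each loop visits every character at most once, so the cost is linear in the input length and there is no regular expression left for a scanner to flag.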

Resolve conflict in Settings.tsx: close JSX tags around the new ImageGenerationPanel Save button/wrappers that were dropped during the merge, and drop hardcoded hex fallbacks (#16a34a, #d97706) and text-[10px] in the image-generation status badge in favor of --color-success / --color-warning / text-[var(--text-xs)] tokens.

Made-with: Cursor

github-actions

Findings

[Major] Enabled image generation silently degrades by disabling the image tool when credentials are missing. This violates the project rule to avoid silent fallbacks and can produce non-image outputs without explicit user-facing failure. Evidence: apps/desktop/src/main/image-generation-settings.ts:124, apps/desktop/src/main/index.ts:378, apps/desktop/src/main/index.ts:433.
Suggested fix:

const cfg = getCachedConfig();
const imageConfig = cfg ? resolveImageGenerationConfig(cfg) : null;
if (cfg?.imageGeneration?.enabled === true && imageConfig === null) {
  throw new CodesignError(
    'Image generation is enabled but credentials are missing or invalid. Update Settings > Image Generation.',
    ERROR_CODES.PROVIDER_AUTH_MISSING,
  );
}

Summary

Review mode: initial
Not found in repo/docs: docs/VISION.md, docs/PRINCIPLES.md.

Testing

Not run (automation)

open-codesign Bot

github-actions · 2026-04-23T05:28:11Z

+            throw err;
+          }
+        }
+      : undefined;


[Major] This branch silently disables generate_image_asset when image generation is enabled but credentials are unavailable (resolveImageGenerationConfig returns null). That is a silent fallback rather than a surfaced error.

Suggested fix:

const cfg = getCachedConfig();
const imageConfig = cfg ? resolveImageGenerationConfig(cfg) : null;
if (cfg?.imageGeneration?.enabled === true && imageConfig === null) {
  throw new CodesignError(
    'Image generation is enabled but credentials are missing or invalid. Update Settings > Image Generation.',
    ERROR_CODES.PROVIDER_AUTH_MISSING,
  );
}

… 1 of #225) (#241)

## Summary

Phase 1 of #225: a single-image → componentized `ui_kit/` decomposition pipeline that emits a coding-agent-ready bundle, plus deterministic + vision verifiers that self-check parity using a 12-question boolean rubric and re-iterate on gaps. Uses existing `userImages` plumbing (PR #193) and adds three new agent tools that mirror existing patterns (`done.ts` / `generate-image-asset.ts`). Ends in the chat sidebar with a one-click trigger that fires a structured prompt, walks the agent through decompose → verify → reconcile → done, and surfaces per-decompose cost as a toast. No new prod deps, no SQLite schema change, in-memory output via the Files panel.

This PR addresses Phase 1 of #225 only. The Phase 2 (gpt-image-2 generation in the loop) and Phase 3 (multi-page flow) cuts I committed to in the issue thread are intentionally not included.

## 2026-06-06 rebase update

Rebased onto current `OpenCoworkAI/open-codesign:main` at `b2d020d` and force-pushed the PR branch to `eed7cbc`. GitHub now reports the PR as mergeable again.

The conflict resolution preserves current `main` architecture:

- `packages/core/src/index.ts` keeps the current `inspect_workspace` public exports and only appends the visual parity types/functions. The legacy `read_design_system` core public export was not restored.
- Generate IPC wiring now lives in `apps/desktop/src/main/ipc/generate.ts`; runtime FS source-image seeding lives in `apps/desktop/src/main/ipc/runtime-fs.ts`.
- Renderer cost-toast logic now lives in the sliced store at `apps/desktop/src/renderer/src/store/slices/chat.ts`.
- The first image attachment is seeded as `source.png` for `verify_ui_kit_visual_parity`, with regression coverage in `apps/desktop/src/main/index.workspace.test.ts`.

Local verification after rebase:

- `pnpm lint`
- `pnpm --filter @open-codesign/core typecheck`
- `pnpm --filter @open-codesign/desktop typecheck`
- `pnpm --filter @open-codesign/core test`
- `pnpm --filter @open-codesign/desktop test -- src/main/index.workspace.test.ts src/main/ipc/generate.workspace-rename.test.ts`
- `pnpm --filter @open-codesign/providers test`

Note: local pre-push full `pnpm test` hit a transient timeout in `packages/providers/src/codex/oauth-server.test.ts` during the concurrent turbo run; the same providers test passed immediately when rerun directly. GitHub CI is now the source of truth for the full matrix on the pushed head.

## Type of change

- [x] New feature

## Linked issue

Refs #225 (Phase 1 only; Phase 2/3 deferred per my comment on #225)

## What's in here

**3 new agent tools** in `packages/core/src/tools/`:

1. `decompose-to-ui-kit.ts`: orchestrator. Takes a source image (from chat context) + design brief, emits `ui_kits/<slug>/{index.html, components/*.tsx, tokens.css, manifest.json, README.md}` to the virtual FS. Output carries `schemaVersion: 1` so downstream coding agents (Claude Code, Cursor) can evolve safely.
2. `verify-ui-kit-parity.ts`: deterministic verifier. 3 signals: element-count parity, visible-text coverage, token coverage. Returns a `ParityReport` with `passCount/totalChecks` derived score (no LLM in the loop, no floats).
3. `verify-ui-kit-visual-parity.ts`: vision-LLM judge wrapper. Takes a host-injected `judgeVisualParity` callback, runs a 12-check boolean rubric across 5 dimensions (layout / color / typography / content / components), returns `parityScore = passCount / totalChecks` and a bounded-enum `status` (`verified | needs_review | needs_iteration | failed | unavailable`).

**Host wiring** in `apps/desktop/src/main/`:

- `render-ui-kit.ts`: offscreen `BrowserWindow.capturePage()` for the rendered ui_kit
- `judge-visual-parity.ts`: vision-judge prompt builder + LLM dispatcher using the existing `complete()` provider abstraction
- `ipc/generate.ts`: injects `renderUiKit` + `judgeVisualParity` into the agent runtime alongside `generate_image_asset`
- `ipc/runtime-fs.ts`: seeds image attachments into the runtime FS, including default `source.png` for visual parity

**Renderer**:

- `AddMenu.tsx`: new "Decompose to UI Kit" entry, disabled when no artifact / generation in flight
- `Sidebar.tsx`: `triggerDecompose(designId, locale)` action wired to the menu item
- `store.ts` / `store/slices/chat.ts`: 3-branch toast feedback (busy / unavailable / started) + per-tool-call cost row when the visual judge resolves
- `hooks/decomposePrompt.ts`: locale-aware (EN/ZH) structured prompt that walks the agent through decompose → verify → reconcile → iterate (max 2) → done with HONEST cost summary

**Tests**: full vitest coverage in `*.test.ts` next to each tool:

- `decompose-to-ui-kit.test.ts` (263 LOC)
- `verify-ui-kit-parity.test.ts` (180 LOC)
- `verify-ui-kit-visual-parity.test.ts` (295 LOC)

**i18n**: 9 new keys × EN + ZH for the menu entry, toast titles/descriptions, and cost row.

## Design decisions

**Boolean rubric, not floats.** Every visual parity check is `{passed: boolean}`, derived `parityScore = passCount / totalChecks`. The `status` field is a bounded enum derived from thresholds (100% → `verified`, ≥85% → `needs_review`, ≥60% → `needs_iteration`, <60% → `failed`). No LLM-fabricated confidence floats, no scoring inflation. Aligns with the project's `HONEST_SCORES` precedent (`done.ts`'s `verified: boolean` field).

**Host-injected callbacks, not framework lock-in.** `verify-ui-kit-visual-parity.ts` doesn't import any LLM SDK or any Electron API. It takes `RenderUiKitFn` and `JudgeVisualParityFn` as deps. If the host doesn't inject them (e.g. a future headless CLI), the tool returns `status: 'unavailable'` honestly instead of crashing. Mirrors how `generate_image_asset` is keyed on `deps.generateImageAsset`.

**In-memory output via Files panel, no schema bump.** Per my open binary in the issue thread, this PR ships option (a): the `ui_kits/<slug>/` lands in the design's virtual FS, surfaces in the existing Files panel, and uses the existing ZIP export for handoff to a coding agent. No SQLite migration, smallest blast radius, consistent with how `polishPrompt.ts`'s second-pass mutates only in-memory state.

**`schemaVersion: 1` on the manifest.** Downstream consumers (Claude Code, Cursor) need a stable contract. Adding fields requires no version bump; renaming or removing fields requires `schemaVersion: 2` and a parallel-emit window.

## Anti-hallucination guardrails

The deterministic verifier (`verify-ui-kit-parity.ts`) checks visible-text coverage on the emitted ui_kit vs the source brief: if the agent dropped any text content, it fails BEFORE the LLM judge runs. This catches data hallucination cheap. The LLM judge then handles only semantic-quality dimensions (visual hierarchy, color harmony, typography pairing, etc.).

## Cost surfacing

Every `verify_ui_kit_visual_parity` resolution pushes a toast with `passCount/totalChecks · status · $cost.NNNN`. Reads defensively from `result.details` so future contract drift degrades silently rather than crashing the renderer. The `done` tool's prompt-driven summary additionally requires the agent to report total run cost, per the `HONEST_STATUS` precedent.

## Checklist

- [x] I read [`docs/VISION.md`](../docs/VISION.md), [`docs/PRINCIPLES.md`](../docs/PRINCIPLES.md), and [`CLAUDE.md`](../CLAUDE.md) before starting
- [x] Commits are signed with DCO (`git commit -s`)
- [x] Rebased onto current `main`; `pnpm lint`, targeted typechecks, core test, desktop runtime/generate tests, and providers test pass locally (full GitHub CI is re-running on `eed7cbc`)
- [x] Added/updated tests for the change (738 LOC across 3 new test files)
- [x] Added a changeset (`pnpm changeset`); see `.changeset/decompose-to-ui-kit.md`
- [x] Updated docs if behavior changed: `BENCHMARKS.md` (new), `README.md` + `README.zh-CN.md` (Decompose to UI Kit feature card + hero PNG + iter-reel GIF)

## Dependency additions (if any)

None. All three new tools use only `@mariozechner/pi-agent-core`'s `AgentTool` factory pattern that's already a prod dep.

## Screenshots / recordings (UI changes)

**Side-by-side hero: source vs agent-emitted ui_kit (`e2e-opus-final` run, parityScore 0.90):**

![Decompose to UI Kit hero](https://raw.githubusercontent.com/HomenShum/open-codesign/feat/decompose-to-ui-kit/website/public/screenshots/decompose-to-ui-kit.png)

**4-frame reconcile reel from the `e2e-nodebench-iter` run (iter-0 → iter-1 with honest score drift 0.82 → 0.78; the boolean rubric exposes the regression instead of hiding it):**

![Iter reel](https://raw.githubusercontent.com/HomenShum/open-codesign/feat/decompose-to-ui-kit/website/public/demos/decompose-iter-reel.gif)

[MP4 version](https://raw.githubusercontent.com/HomenShum/open-codesign/feat/decompose-to-ui-kit/website/public/demos/decompose-iter-reel.mp4) for higher fidelity.

**Live-recorded session demo** (real Electron app, no stitching): recording in progress, will edit this PR description when the GIF is ready. ETA same day.

## Cross-tier benchmarks

`BENCHMARKS.md` at repo root has the full methodology + run-by-run real-data results across model tiers (Opus, Pro+Pro+iterate, Kimi+Gemini3, NodeBench iter), reproducibility instructions, honest non-claims, and research citations (WebDevJudge, Prometheus-Vision, Trust-but-Verify ICCV 2025).

| Run | Decompose | Judge | parityScore | Gaps surfaced |
|---|---|---|---:|---:|
| e2e-opus-final | claude-opus-4-1 | claude-opus-4-1 | 0.90 | 4 |
| e2e-nodebench-iter (iter-0) | gemini-3-pro-preview | gemini-3-pro-preview | 0.82 | 6 |
| e2e-nodebench-iter (iter-1) | gemini-3-pro-preview | gemini-3-pro-preview | 0.78 | 5 |
| e2e-bank-kimi-gemini3 | kimi-k2.6 | gemini-3-pro-preview | 0.78 | 8 |
| e2e-nodebench-B | kimi-k2.6 | gemini-3-pro-preview | 0.60 | 7 |

Note the iter-0 → iter-1 regression on the same source: agent fixed some gaps but introduced new layout drift. The boolean rubric exposes this honestly rather than fudging the score upward. This is the intended behavior, not a bug.

## Scope discipline notes

- **PR size**: ~1500 LOC of substantive change (3 tools + 3 test files + agent wiring + i18n + 1 hook). Most of the diff stat (`pnpm-lock.yaml`) is mechanical regen. This is over the soft 400-LOC bar in CONTRIBUTING.md, but it's been pre-discussed in #225 and the change is a single concern (one new feature path, no refactor mixed in). Happy to split into 3 PRs (per-tool) if maintainer prefers; say the word.
- **What's NOT in scope** (from #225 thread): multi-page flow (Phase 3, separate issue), gpt-image-2 generation step (Phase 2, separate Discussion), persistence-to-disk (option (b) from the binary I posed; staying with option (a) for blast radius)
- **Three systemic dependencies surfaced during dogfood** (rollback / capability-aware failover / spiral-detector): filing as separate Discussions in `Ideas` category, not bundling here. Each is a meaningful subsystem that deserves alignment before code.

## Branch state at PR open

- 9 commits ahead of `upstream/main`
- 11 commits behind (mostly `chore(deps)` bumps including pi-agent-core 0.67.68 → 0.70.2; my branch is on 0.67.68)
- **Will rebase against latest main on request**: wanted to open the PR with the as-built state for clarity first. The pi-agent-core 0.70.2 bump may require small adjustments to the new tools' `AgentTool` shape; I'll handle that in the rebase pass.

## Why this is ready to review now

- Real cross-tier benchmarks in `BENCHMARKS.md`, not synthetic
- Visual proof embedded above (hero + reel)
- Test coverage matches existing tools
- Pattern conformance: every new file mirrors an existing precedent
- Deliberate scope: closes Phase 1 of the issue cleanly, defers the rest visibly

Looking forward to feedback. Happy to address structural concerns first before iterating on smaller polish.

---------

Signed-off-by: homen <hshum2018@gmail.com>
Signed-off-by: Sun-sunshine06 <Sun-sunshine06@users.noreply.github.com>
Co-authored-by: Sun-sunshine06 <Sun-sunshine06@users.noreply.github.com>
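The boolean-rubric thresholds described in the #241 commit message (100% → `verified`, ≥85% → `needs_review`, ≥60% → `needs_iteration`, <60% → `failed`, and `unavailable` when the host injects no judge) can be sketched as follows. The names `RubricCheck`, `ParityResult`, and `deriveParity` are hypothetical, and treating an empty check list as `unavailable` is an assumption made only for this example.

```typescript
type ParityStatus = 'verified' | 'needs_review' | 'needs_iteration' | 'failed' | 'unavailable';

interface RubricCheck {
  passed: boolean;
}

interface ParityResult {
  passCount: number;
  totalChecks: number;
  parityScore: number;
  status: ParityStatus;
}

// Hypothetical helper: `checks` is undefined when no judge callback was injected.
export function deriveParity(checks: readonly RubricCheck[] | undefined): ParityResult {
  if (checks === undefined || checks.length === 0) {
    // Assumption: an empty rubric is reported like a missing judge.
    return { passCount: 0, totalChecks: 0, parityScore: 0, status: 'unavailable' };
  }
  const totalChecks = checks.length;
  const passCount = checks.filter((c) => c.passed).length;

  // Compare in integers so the thresholds do not depend on float rounding.
  let status: ParityStatus;
  if (passCount === totalChecks) status = 'verified';
  else if (passCount * 100 >= 85 * totalChecks) status = 'needs_review';
  else if (passCount * 100 >= 60 * totalChecks) status = 'needs_iteration';
  else status = 'failed';

  return { passCount, totalChecks, parityScore: passCount / totalChecks, status };
}

// Build a 12-check rubric with `n` passing checks, for illustration.
export function rubricWithPasses(n: number): RubricCheck[] {
  return Array.from({ length: 12 }, (_, i) => ({ passed: i < n }));
}
```

With a 12-check rubric, 11 passes fall in the `needs_review` band and 10 passes fall in `needs_iteration`, since 10/12 is below 85%.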

yangjunx21 and others added 7 commits April 23, 2026 11:31

feat(providers): add image generation client

d017617

Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>

feat(core): add image asset generation tool

42487a1

Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>

feat(desktop): add image generation settings

a19b76a

Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>

fix(image-gen): harden image-asset pipeline and polish settings UX

02f85a4

Made-with: Cursor Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>

github-actions Bot added area:desktop apps/desktop (Electron shell, renderer) area:core packages/core (generation orchestration) area:providers packages/providers (pi-ai adapter, model calls) labels Apr 23, 2026

github-actions Bot reviewed Apr 23, 2026


yangjunx21 merged commit 1377b4f into main Apr 23, 2026
7 checks passed

yangjunx21 deleted the gpt-image-2-support branch April 23, 2026 05:32

hqhq1025 mentioned this pull request Apr 23, 2026

feat(desktop): workspace for each design projects #173

Merged

7 tasks

This was referenced Apr 26, 2026

[Feature]: image 2 已经够厉害了，最需要的是如何把生成好的UI 变成组件化，再到原型的过程！ #225

Open

feat(core): add decompose-to-ui-kit + boolean parity verifiers (Phase 1 of #225) #241

Merged

yangjunx21 merged 7 commits into main from gpt-image-2-support

1 participant