
Gpt image 2 support #193

Merged: yangjunx21 merged 7 commits into main from gpt-image-2-support on Apr 23, 2026

@yangjunx21 (Collaborator)

Summary

Lets the agent call gpt-image-2 (or OpenRouter image models) on demand to generate bitmap assets (logos, hero images, illustrations) while producing a design, and embeds them seamlessly in preview + exports. Off by default; users opt in from Settings.

Type of change

  • New feature

Linked issue

Checklist

  • I read docs/VISION.md, docs/PRINCIPLES.md, and CLAUDE.md before starting
  • Commits are signed with DCO (git commit -s)
  • pnpm lint && pnpm typecheck && pnpm test passes locally
  • Added/updated tests for the change
  • Added a changeset (pnpm changeset) if user-visible
  • Updated docs if behavior changed

Dependency additions (if any)

Screenshots / recordings (UI changes)

yangjunx21 and others added 7 commits April 23, 2026 11:31
Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>
Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>
Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>
POSIX-specific assertions were baked into six tests, which made the full
`pnpm test` suite fail on Windows even though the production code is
already cross-platform:

- token-store: `0o600` mode bits aren't enforceable on NTFS (reports
  0o666); guard the assertion with `process.platform !== 'win32'`.
- skills/loader: `new URL(...).pathname` yields `/D:/...` on Windows, so
  `readdir` sees zero files; use `fileURLToPath()` instead.
- opencode-config, locale-ipc, preferences-ipc: replace hard-coded
  forward-slash path strings with `path.join()`-built expectations that
  mirror whatever separator the host OS uses.
- boot-fallback: `/dev/null/...` is only guaranteed-unwritable on POSIX;
  build a parent-is-a-regular-file path instead so `mkdirSync` throws
  ENOTDIR on both platforms.

All 10 workspace packages' tests now pass on Windows.
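A minimal sketch of three of the guards described above, assuming only Node.js built-ins (`fs`, `os`, `path`, `url`). The helper names `shouldAssertPosixMode`, `makeUnwritableDirPath`, `mkdirErrorCode`, and `dirFromFileUrl` are hypothetical and are not taken from the repository; the claim that a file-as-parent path yields ENOTDIR on Windows too comes from the commit message.

```typescript
import { mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { fileURLToPath } from "node:url";

// NTFS does not enforce POSIX permission bits (stat reports 0o666), so a
// "mode is 0o600" assertion is only meaningful off Windows.
export function shouldAssertPosixMode(platform: string = process.platform): boolean {
  return platform !== "win32";
}

// `/dev/null/...` is only guaranteed unwritable on POSIX. A path whose parent
// is a regular file makes a non-recursive mkdirSync fail with ENOTDIR instead.
export function makeUnwritableDirPath(): string {
  const root = mkdtempSync(join(tmpdir(), "boot-fallback-"));
  const parentFile = join(root, "not-a-dir");
  writeFileSync(parentFile, "");
  return join(parentFile, "child");
}

// Returns the errno code mkdirSync raises for `dir`, or null if it succeeded.
export function mkdirErrorCode(dir: string): string | null {
  try {
    mkdirSync(dir);
    return null;
  } catch (err) {
    return (err as { code?: string }).code ?? null;
  }
}

// `new URL(...).pathname` yields "/D:/..." on Windows and leaves %-escapes
// encoded; fileURLToPath returns a native path that readdir can consume.
export function dirFromFileUrl(url: string | URL): string {
  return fileURLToPath(url);
}
```

In a test, the mode assertion would be wrapped in `if (shouldAssertPosixMode()) { ... }`, and the boot-fallback case would assert that `mkdirErrorCode(makeUnwritableDirPath())` is `"ENOTDIR"`.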

Made-with: Cursor
Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>
Made-with: Cursor
Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>
CodeQL flagged the anchored-quantifier pair `/\/+$/` + `/^\/+/` inside
`joinEndpoint` (packages/providers/src/images.ts) as a potential ReDoS
on library-supplied input. Replace both regex calls with explicit
single-pass scans over the trailing/leading `/` characters — same
behaviour, trivially linear, no CodeQL alert.
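The replacement can be pictured roughly as follows. This is a sketch, assuming `joinEndpoint(base, path)` previously computed `base.replace(/\/+$/, '') + '/' + path.replace(/^\/+/, '')`; the real signature in `packages/providers/src/images.ts` may differ. CodeQL's concern with the `/\/+$/` form is that a backtracking engine can retry the quantifier from every start position inside a long run of slashes that is not at the end of the string; the index scans below visit each character at most once.

```typescript
const SLASH = 0x2f; // char code of "/"

// Join a base URL and a path with exactly one "/" between them, using two
// linear index scans instead of the /\/+$/ and /^\/+/ regexes.
export function joinEndpoint(base: string, path: string): string {
  let end = base.length;
  while (end > 0 && base.charCodeAt(end - 1) === SLASH) end--; // trim trailing "/"
  let start = 0;
  while (start < path.length && path.charCodeAt(start) === SLASH) start++; // trim leading "/"
  return `${base.slice(0, end)}/${path.slice(start)}`;
}
```

A base made only of slashes trims to an empty prefix, which matches what the regex version would produce.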

Also unblock the Windows test run on this branch:
- `token-store.test.ts`: a new 0o600-mode assertion added on main
  fails on NTFS (always reports 0o666); guard it the same way the
  existing sibling assertion is guarded.
- `safe-read.test.ts`: the symlink-acceptance case requires admin /
  Developer Mode on Windows and otherwise throws EPERM; skip the
  case when symlink creation is denied, keeping full coverage on
  POSIX CI.
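One way to implement "skip when symlink creation is denied" is a one-time probe; this is a sketch, and the helper name `canCreateSymlinks` is hypothetical rather than the test file's actual code. Only EPERM turns into a skip, so an unrelated failure still surfaces.

```typescript
import { mkdtempSync, rmSync, symlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Probe whether this process may create symlinks. On Windows without admin
// rights or Developer Mode, symlinkSync throws EPERM; any other error is
// unexpected and is re-thrown rather than silently converted into a skip.
export function canCreateSymlinks(): boolean {
  const dir = mkdtempSync(join(tmpdir(), "symlink-probe-"));
  try {
    const target = join(dir, "target.txt");
    writeFileSync(target, "");
    symlinkSync(target, join(dir, "link.txt"));
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EPERM") return false;
    throw err;
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```

With Vitest, the probe result can gate the case via `it.skipIf(!canCreateSymlinks())(...)`, so POSIX CI keeps running it.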

Signed-off-by: 杨峻骁 <yangjunx21@mails.tsinghua.edu.cn>
Made-with: Cursor
Resolve conflict in Settings.tsx: close JSX tags around the new ImageGenerationPanel Save button/wrappers that were dropped during the merge, and drop hardcoded hex fallbacks (#16a34a, #d97706) and text-[10px] in the image-generation status badge in favor of --color-success / --color-warning / text-[var(--text-xs)] tokens.

Made-with: Cursor
@github-actions (bot) added the labels area:desktop (apps/desktop: Electron shell, renderer), area:core (packages/core: generation orchestration), and area:providers (packages/providers: pi-ai adapter, model calls) on Apr 23, 2026

@github-actions (bot) left a comment

Findings

  • [Major] Enabled image generation silently degrades by disabling the image tool when credentials are missing. This violates the project rule to avoid silent fallbacks and can produce non-image outputs without explicit user-facing failure. Evidence: apps/desktop/src/main/image-generation-settings.ts:124, apps/desktop/src/main/index.ts:378, apps/desktop/src/main/index.ts:433.
    Suggested fix:
    const cfg = getCachedConfig();
    const imageConfig = cfg ? resolveImageGenerationConfig(cfg) : null;
    if (cfg?.imageGeneration?.enabled === true && imageConfig === null) {
      throw new CodesignError(
        'Image generation is enabled but credentials are missing or invalid. Update Settings > Image Generation.',
        ERROR_CODES.PROVIDER_AUTH_MISSING,
      );
    }
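The three states the finding distinguishes (disabled, enabled with usable credentials, enabled without them) can be made explicit with a self-contained sketch. Every name here (`ImageGenerationSettings`, `ImageConfigError`, `resolveConfig`, `imageToolsFor`) is a hypothetical stand-in for the project's `getCachedConfig`, `resolveImageGenerationConfig`, `CodesignError`, and `ERROR_CODES`, whose real shapes may differ.

```typescript
// Hypothetical stand-ins for the project's settings type and error class.
interface ImageGenerationSettings {
  enabled: boolean;
  apiKey?: string;
}

interface ImageGenerationConfig {
  apiKey: string;
}

class ImageConfigError extends Error {
  readonly code: string;
  constructor(message: string, code: string) {
    super(message);
    this.name = "ImageConfigError";
    this.code = code;
  }
}

// Plays the role of resolveImageGenerationConfig: null when credentials are unusable.
function resolveConfig(settings: ImageGenerationSettings): ImageGenerationConfig | null {
  const key = settings.apiKey?.trim();
  return key ? { apiKey: key } : null;
}

// Disabled: register nothing. Enabled with credentials: register the tool.
// Enabled without credentials: fail loudly instead of quietly dropping the tool.
export function imageToolsFor(settings: ImageGenerationSettings): string[] {
  if (!settings.enabled) return [];
  if (resolveConfig(settings) === null) {
    throw new ImageConfigError(
      "Image generation is enabled but credentials are missing or invalid. Update Settings > Image Generation.",
      "PROVIDER_AUTH_MISSING",
    );
  }
  return ["generate_image_asset"];
}
```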

Summary

  • Review mode: initial
  • Not found in repo/docs: docs/VISION.md, docs/PRINCIPLES.md.

Testing

  • Not run (automation)

open-codesign (bot) left an inline review comment on the following diff context:

throw err;
}
}
: undefined;


[Major] This branch silently disables generate_image_asset when image generation is enabled but credentials are unavailable (resolveImageGenerationConfig returns null). That is a silent fallback rather than a surfaced error.

Suggested fix:

const cfg = getCachedConfig();
const imageConfig = cfg ? resolveImageGenerationConfig(cfg) : null;
if (cfg?.imageGeneration?.enabled === true && imageConfig === null) {
  throw new CodesignError(
    'Image generation is enabled but credentials are missing or invalid. Update Settings > Image Generation.',
    ERROR_CODES.PROVIDER_AUTH_MISSING,
  );
}

@yangjunx21 merged commit 1377b4f into main on Apr 23, 2026 (7 checks passed)
@yangjunx21 deleted the gpt-image-2-support branch on April 23, 2026 at 05:32
Sun-sunshine06 added a commit that referenced this pull request on Jun 6, 2026:
… 1 of #225) (#241)

## Summary

Phase 1 of #225: a single-image → componentized `ui_kit/` decomposition
pipeline that emits a coding-agent-ready bundle, plus deterministic and
vision verifiers that self-check parity against a 12-question boolean
rubric and iterate on gaps. Uses the existing `userImages` plumbing (PR
#193) and adds three new agent tools that mirror existing patterns
(`done.ts` / `generate-image-asset.ts`). The feature is exposed in the chat
sidebar as a one-click trigger that fires a structured prompt, walks the
agent through decompose → verify → reconcile → done, and surfaces the
per-decompose cost as a toast. No new prod deps, no SQLite schema
change; output stays in memory and is browsable via the Files panel.

This PR addresses Phase 1 of #225 only. The Phase 2 (gpt-image-2
generation in the loop) and Phase 3 (multi-page flow) cuts I committed
to in the issue thread are intentionally not included.

## 2026-06-06 rebase update

Rebased onto current `OpenCoworkAI/open-codesign:main` at `b2d020d` and
force-pushed the PR branch to `eed7cbc`. GitHub now reports the PR as
mergeable again.

The conflict resolution preserves current `main` architecture:
- `packages/core/src/index.ts` keeps the current `inspect_workspace`
public exports and only appends the visual parity types/functions. The
legacy `read_design_system` core public export was not restored.
- Generate IPC wiring now lives in
`apps/desktop/src/main/ipc/generate.ts`; runtime FS source-image seeding
lives in `apps/desktop/src/main/ipc/runtime-fs.ts`.
- Renderer cost-toast logic now lives in the sliced store at
`apps/desktop/src/renderer/src/store/slices/chat.ts`.
- The first image attachment is seeded as `source.png` for
`verify_ui_kit_visual_parity`, with regression coverage in
`apps/desktop/src/main/index.workspace.test.ts`.

Local verification after rebase:
- `pnpm lint`
- `pnpm --filter @open-codesign/core typecheck`
- `pnpm --filter @open-codesign/desktop typecheck`
- `pnpm --filter @open-codesign/core test`
- `pnpm --filter @open-codesign/desktop test -- src/main/index.workspace.test.ts src/main/ipc/generate.workspace-rename.test.ts`
- `pnpm --filter @open-codesign/providers test`

Note: local pre-push full `pnpm test` hit a transient timeout in
`packages/providers/src/codex/oauth-server.test.ts` during the
concurrent turbo run; the same providers test passed immediately when
rerun directly. GitHub CI is now the source of truth for the full matrix
on the pushed head.

## Type of change

- [x] New feature

## Linked issue

Refs #225 (Phase 1 only; Phase 2/3 deferred per my comment in the #225
thread)

## What's in here

**3 new agent tools** in `packages/core/src/tools/`:
1. `decompose-to-ui-kit.ts` — orchestrator. Takes a source image (from
chat context) + design brief, emits `ui_kits/<slug>/{index.html,
components/*.tsx, tokens.css, manifest.json, README.md}` to the virtual
FS. The manifest carries `schemaVersion: 1` so the contract with downstream
coding agents (Claude Code, Cursor) can evolve safely.
2. `verify-ui-kit-parity.ts` — deterministic verifier. 3 signals:
element-count parity, visible-text coverage, token coverage. Returns a
`ParityReport` with `passCount/totalChecks` derived score (no LLM in the
loop, no floats).
3. `verify-ui-kit-visual-parity.ts` — vision-LLM judge wrapper. Takes a
host-injected `judgeVisualParity` callback, runs a 12-check boolean
rubric across 5 dimensions (layout / color / typography / content /
components), returns `parityScore = passCount / totalChecks` and a
bounded-enum `status` (`verified | needs_review | needs_iteration |
failed | unavailable`).
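To make the deterministic verifier's three signals concrete, here is a sketch under stated assumptions: the `ParityInput` shape, the exact-match element rule, and the `"<token>:"` declaration test are illustrative simplifications, not the logic in `verify-ui-kit-parity.ts`. What it does preserve is the shape described above: boolean checks only, with the score derived as `passCount/totalChecks`.

```typescript
// Hypothetical input shape; the real tool reads these from the virtual FS.
interface ParityInput {
  sourceElementCount: number; // elements identified in the source image
  kitElementCount: number; // elements emitted in ui_kits/<slug>/index.html
  sourceTexts: string[]; // visible strings expected from the source
  kitText: string; // concatenated visible text of the emitted kit
  requiredTokens: string[]; // e.g. "--color-primary"
  tokensCss: string; // contents of tokens.css
}

interface ParityCheck {
  name: string;
  passed: boolean;
}

interface ParityReport {
  checks: ParityCheck[];
  passCount: number;
  totalChecks: number;
}

// Three boolean signals, no model in the loop.
export function verifyParity(input: ParityInput): ParityReport {
  const missingText = input.sourceTexts.filter((t) => !input.kitText.includes(t));
  const missingTokens = input.requiredTokens.filter((t) => !input.tokensCss.includes(`${t}:`));
  const checks: ParityCheck[] = [
    { name: "element-count", passed: input.sourceElementCount === input.kitElementCount },
    { name: "visible-text", passed: missingText.length === 0 },
    { name: "tokens", passed: missingTokens.length === 0 },
  ];
  const passCount = checks.filter((c) => c.passed).length;
  return { checks, passCount, totalChecks: checks.length };
}
```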

**Host wiring** in `apps/desktop/src/main/`:
- `render-ui-kit.ts` — offscreen `BrowserWindow.capturePage()` for the
rendered ui_kit
- `judge-visual-parity.ts` — vision-judge prompt builder + LLM
dispatcher using the existing `complete()` provider abstraction
- `ipc/generate.ts` — injects `renderUiKit` + `judgeVisualParity` into
the agent runtime alongside `generate_image_asset`
- `ipc/runtime-fs.ts` — seeds image attachments into the runtime FS,
including default `source.png` for visual parity

**Renderer**:
- `AddMenu.tsx` — new "Decompose to UI Kit" entry, disabled when no
artifact / generation in flight
- `Sidebar.tsx` — `triggerDecompose(designId, locale)` action wired to
the menu item
- `store.ts` / `store/slices/chat.ts` — 3-branch toast feedback (busy /
unavailable / started) + per-tool-call cost row when the visual judge
resolves
- `hooks/decomposePrompt.ts` — locale-aware (EN/ZH) structured prompt
that walks the agent through decompose → verify → reconcile → iterate
(max 2) → done with an honest cost summary

**Tests** — full vitest coverage in `*.test.ts` next to each tool:
- `decompose-to-ui-kit.test.ts` (263 LOC)
- `verify-ui-kit-parity.test.ts` (180 LOC)
- `verify-ui-kit-visual-parity.test.ts` (295 LOC)

**i18n** — 9 new keys × EN + ZH for the menu entry, toast
titles/descriptions, and cost row.

## Design decisions

**Boolean rubric, not floats.** Every visual parity check is `{passed:
boolean}`, derived `parityScore = passCount / totalChecks`. The `status`
field is a bounded enum derived from thresholds (100% → `verified`, ≥85%
→ `needs_review`, ≥60% → `needs_iteration`, <60% → `failed`). No
LLM-fabricated confidence floats, no scoring inflation. Aligns with the
project's `HONEST_SCORES` precedent (`done.ts`'s `verified: boolean`
field).
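The threshold mapping can be sketched as follows, assuming the thresholds listed above; `deriveStatus` is a hypothetical name. Cross-multiplying integers keeps the comparison exact, which fits the "no floats" rule: no floating-point ratio is ever compared.

```typescript
export type ParityStatus = "verified" | "needs_review" | "needs_iteration" | "failed" | "unavailable";

// Map a boolean-rubric tally onto the bounded enum: 100% -> verified,
// >= 85% -> needs_review, >= 60% -> needs_iteration, < 60% -> failed.
// "unavailable" is not produced here; it is reserved for a host with no judge.
export function deriveStatus(passCount: number, totalChecks: number): Exclude<ParityStatus, "unavailable"> {
  if (
    !Number.isInteger(passCount) ||
    !Number.isInteger(totalChecks) ||
    totalChecks <= 0 ||
    passCount < 0 ||
    passCount > totalChecks
  ) {
    throw new RangeError(`invalid tally ${passCount}/${totalChecks}`);
  }
  if (passCount === totalChecks) return "verified";
  if (passCount * 100 >= totalChecks * 85) return "needs_review";
  if (passCount * 100 >= totalChecks * 60) return "needs_iteration";
  return "failed";
}
```

With the 12-check rubric, 11/12 (about 0.917) lands in `needs_review`, 10/12 (about 0.833) drops to `needs_iteration`, and 7/12 (about 0.583) is `failed`.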

**Host-injected callbacks, not framework lock-in.**
`verify-ui-kit-visual-parity.ts` doesn't import any LLM SDK or any
Electron API. It takes `RenderUiKitFn` and `JudgeVisualParityFn` as
deps. If the host doesn't inject them (e.g. a future headless CLI), the
tool returns `status: 'unavailable'` honestly instead of crashing.
Mirrors how `generate_image_asset` is keyed on
`deps.generateImageAsset`.
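A sketch of the dependency-injection shape, assuming illustrative signatures: the PR names `RenderUiKitFn` and `JudgeVisualParityFn`, but the parameter and return types here, the `checkVisualParity` function, and the intermediate `"scored"` outcome (which a real tool would then map onto the status enum) are hypothetical.

```typescript
// Assumed signatures; the real ones may carry more context.
type RenderUiKitFn = (slug: string) => Promise<Uint8Array>; // PNG bytes
type JudgeVisualParityFn = (source: Uint8Array, rendered: Uint8Array) => Promise<Array<{ passed: boolean }>>;

interface VisualParityDeps {
  renderUiKit?: RenderUiKitFn;
  judgeVisualParity?: JudgeVisualParityFn;
}

type VisualParityOutcome =
  | { status: "unavailable"; reason: string }
  | { status: "scored"; passCount: number; totalChecks: number };

// No SDK or Electron import: the tool only calls what the host injected and
// reports "unavailable" when either dependency is missing.
export async function checkVisualParity(
  slug: string,
  source: Uint8Array,
  deps: VisualParityDeps,
): Promise<VisualParityOutcome> {
  const { renderUiKit, judgeVisualParity } = deps;
  if (!renderUiKit || !judgeVisualParity) {
    return { status: "unavailable", reason: "host did not inject renderUiKit/judgeVisualParity" };
  }
  const rendered = await renderUiKit(slug);
  const checks = await judgeVisualParity(source, rendered);
  const passCount = checks.filter((c) => c.passed).length;
  return { status: "scored", passCount, totalChecks: checks.length };
}
```

Because both callbacks are plain async functions, tests can pass stubs instead of a real `BrowserWindow` or model client.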

**In-memory output via Files panel, no schema bump.** Per the open binary
choice I posed in the issue thread, this PR ships option (a): `ui_kits/<slug>/`
lands in the design's virtual FS, surfaces in the existing Files panel,
and uses the existing ZIP export for handoff to a coding agent. No
SQLite migration, smallest blast radius, consistent with how
`polishPrompt.ts`'s second pass mutates only in-memory state.

**`schemaVersion: 1` on the manifest.** Downstream consumers (Claude
Code, Cursor) need a stable contract. Adding fields requires no version
bump; renaming or removing fields requires `schemaVersion: 2` and a
parallel-emit window.
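On the consumer side, the versioning rule above implies accepting unknown (additive) keys while rejecting any other `schemaVersion`. A sketch, assuming hypothetical `slug` and `components` fields; only `schemaVersion` comes from the PR text.

```typescript
// Hypothetical v1 manifest fields; only schemaVersion is taken from the PR.
interface UiKitManifestV1 {
  schemaVersion: 1;
  slug: string;
  components: string[];
}

// Accept schemaVersion 1, ignore unknown keys (additive changes need no bump),
// and refuse any other version instead of guessing at renamed fields.
export function parseUiKitManifest(json: string): UiKitManifestV1 {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) throw new TypeError("manifest must be a JSON object");
  const m = raw as Record<string, unknown>;
  if (m.schemaVersion !== 1) {
    throw new Error(`unsupported ui_kit manifest schemaVersion: ${String(m.schemaVersion)}`);
  }
  if (typeof m.slug !== "string" || !Array.isArray(m.components)) {
    throw new TypeError("manifest v1 requires slug and components");
  }
  const components = m.components.filter((c): c is string => typeof c === "string");
  return { schemaVersion: 1, slug: m.slug, components };
}
```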

## Anti-hallucination guardrails

The deterministic verifier (`verify-ui-kit-parity.ts`) checks
visible-text coverage of the emitted ui_kit against the source brief: if
the agent dropped any text content, the check fails BEFORE the LLM judge
runs. This catches data hallucination cheaply. The LLM judge then handles
only semantic-quality dimensions (visual hierarchy, color harmony,
typography pairing, etc.).

## Cost surfacing

Every `verify_ui_kit_visual_parity` resolution pushes a toast with
`passCount/totalChecks · status · $cost.NNNN`. Reads defensively from
`result.details` so future contract drift degrades silently rather than
crashing the renderer. The `done` tool's prompt-driven summary
additionally requires the agent to report total run cost, per the
`HONEST_STATUS` precedent.
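A sketch of the defensive read described above: any missing or mistyped field yields `null` (no row) instead of throwing inside the renderer. The name `formatCostRow` and the `costUsd` field are hypothetical; the row format follows the `passCount/totalChecks · status · $cost.NNNN` pattern given in the text.

```typescript
// Build "passCount/totalChecks · status · $cost" from an untyped
// result.details payload, returning null rather than throwing on drift.
export function formatCostRow(details: unknown): string | null {
  if (typeof details !== "object" || details === null) return null;
  const { passCount, totalChecks, status, costUsd } = details as Record<string, unknown>;
  if (typeof passCount !== "number" || typeof totalChecks !== "number" || typeof status !== "string") {
    return null;
  }
  const cost = typeof costUsd === "number" && Number.isFinite(costUsd) ? ` · $${costUsd.toFixed(4)}` : "";
  return `${passCount}/${totalChecks} · ${status}${cost}`;
}
```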

## Checklist

- [x] I read [`docs/VISION.md`](../docs/VISION.md),
[`docs/PRINCIPLES.md`](../docs/PRINCIPLES.md), and
[`CLAUDE.md`](../CLAUDE.md) before starting
- [x] Commits are signed with DCO (`git commit -s`)
- [x] Rebased onto current `main`; `pnpm lint`, targeted typechecks,
core test, desktop runtime/generate tests, and providers test pass
locally (full GitHub CI is re-running on `eed7cbc`)
- [x] Added/updated tests for the change (738 LOC across 3 new test
files)
- [x] Added a changeset (`pnpm changeset`) — see
`.changeset/decompose-to-ui-kit.md`
- [x] Updated docs if behavior changed — `BENCHMARKS.md` (new),
`README.md` + `README.zh-CN.md` (Decompose to UI Kit feature card + hero
PNG + iter-reel GIF)

## Dependency additions (if any)

None. All three new tools use only `@mariozechner/pi-agent-core`'s
`AgentTool` factory pattern that's already a prod dep.

## Screenshots / recordings (UI changes)

**Side-by-side hero — source vs agent-emitted ui_kit (`e2e-opus-final`
run, parityScore 0.90):**

![Decompose to UI Kit hero](https://raw.githubusercontent.com/HomenShum/open-codesign/feat/decompose-to-ui-kit/website/public/screenshots/decompose-to-ui-kit.png)

**4-frame reconcile reel from the `e2e-nodebench-iter` run (iter-0 →
iter-1 with honest score drift 0.82 → 0.78 — boolean rubric exposes the
regression instead of hiding it):**

![Iter reel](https://raw.githubusercontent.com/HomenShum/open-codesign/feat/decompose-to-ui-kit/website/public/demos/decompose-iter-reel.gif)

[MP4 version](https://raw.githubusercontent.com/HomenShum/open-codesign/feat/decompose-to-ui-kit/website/public/demos/decompose-iter-reel.mp4) for higher fidelity.

**Live-recorded session demo** (real Electron app, no stitching) —
recording in progress, will edit this PR description when the GIF is
ready. ETA same day.

## Cross-tier benchmarks

`BENCHMARKS.md` at repo root has the full methodology + run-by-run
real-data results across model tiers (Opus, Pro+Pro+iterate,
Kimi+Gemini3, NodeBench iter), reproducibility instructions, honest
non-claims, and research citations (WebDevJudge, Prometheus-Vision,
Trust-but-Verify ICCV 2025).

| Run | Decompose model | Judge model | parityScore | Gaps surfaced |
|---|---|---|---:|---:|
| e2e-opus-final | claude-opus-4-1 | claude-opus-4-1 | 0.90 | 4 |
| e2e-nodebench-iter (iter-0) | gemini-3-pro-preview | gemini-3-pro-preview | 0.82 | 6 |
| e2e-nodebench-iter (iter-1) | gemini-3-pro-preview | gemini-3-pro-preview | 0.78 | 5 |
| e2e-bank-kimi-gemini3 | kimi-k2.6 | gemini-3-pro-preview | 0.78 | 8 |
| e2e-nodebench-B | kimi-k2.6 | gemini-3-pro-preview | 0.60 | 7 |

Note the iter-0 → iter-1 regression on the same source: agent fixed some
gaps but introduced new layout drift. The boolean rubric exposes this
honestly rather than fudging the score upward. This is the intended
behavior, not a bug.

## Scope discipline notes

- **PR size**: ~1500 LOC of substantive change (3 tools + 3 test files +
agent wiring + i18n + 1 hook). Most of the remaining diff stat is the
mechanically regenerated `pnpm-lock.yaml`. This is over the soft 400-LOC
bar in CONTRIBUTING.md, but it was pre-discussed in #225 and the change is
a single concern (one new feature path, no refactor mixed in). Happy to
split it into 3 PRs (one per tool) if the maintainer prefers; just say the word.
- **What's NOT in scope** (from the #225 thread): multi-page flow (Phase 3,
separate issue), the gpt-image-2 generation step (Phase 2, separate
Discussion), and persistence to disk (option (b) from the binary I posed;
staying with option (a) for blast radius).
- **Three systemic dependencies surfaced during dogfood** (rollback /
capability-aware failover / spiral-detector): filing as separate
Discussions in `Ideas` category, not bundling here. Each is a meaningful
subsystem that deserves alignment before code.

## Branch state at PR open

- 9 commits ahead of `upstream/main`
- 11 commits behind (mostly `chore(deps)` bumps including pi-agent-core
0.67.68 → 0.70.2; my branch is on 0.67.68)
- **Will rebase against latest main on request** — wanted to open the PR
with the as-built state for clarity first. The pi-agent-core 0.70.2 bump
may require small adjustments to the new tools' `AgentTool` shape; I'll
handle that in the rebase pass.

## Why this is ready to review now

- Real cross-tier benchmarks in `BENCHMARKS.md`, not synthetic
- Visual proof embedded above (hero + reel)
- Test coverage matches existing tools
- Pattern conformance: every new file mirrors an existing precedent
- Deliberate scope: closes Phase 1 of the issue cleanly, defers the rest
visibly

Looking forward to feedback. Happy to address structural concerns first
before iterating on smaller polish.

---------

Signed-off-by: homen <hshum2018@gmail.com>
Signed-off-by: Sun-sunshine06 <Sun-sunshine06@users.noreply.github.com>
Co-authored-by: Sun-sunshine06 <Sun-sunshine06@users.noreply.github.com>