fix(projects-list,doctor): dedup project ids; ASCII glyphs for cp1252 stdout #2

Open
martinduncanson wants to merge 1 commit into malphas-gh:main from martinduncanson:upstream-pr/projects-list-dedup-and-encoding
Goal

Two long-standing bugs that surface on real-world portfolios but are easy to miss in single-project test setups. Both reproduce on a clean upstream checkout — no Phase-N features required.

Bug 1 — duplicate rows in projects list --all

discover_projects returned one row per directory rather than per id. When a portfolio has sibling directories that each carry a .project/settings.toml with the same id (e.g. a canonical clone alongside two worktree clones, or arb-prd / arb-prd-experiment / arb-prd-backup), the list emits the id N times — confusing for the operator and noisy for any consumer of the JSON output.

Fix: dedup by id, preferring the canonical directory (name == id) over worktree clones. When no directory matches the id, the fallback is sorted-first-seen: Path.iterdir() ordering is not guaranteed across filesystems, so we sort by directory name before iterating to keep the choice reproducible.
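
The selection rule can be sketched in isolation. This is a minimal illustration, not the code in discovery.py: the helper name `dedup_by_id` and its input shape (pairs of directory name and project id) are assumptions made for the example.

```python
from typing import Iterable


def dedup_by_id(entries: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Map each project id to the one directory name chosen for it.

    `entries` holds (directory_name, project_id) pairs in arbitrary order,
    standing in for whatever order the filesystem returns (hypothetical shape).
    """
    chosen: dict[str, str] = {}
    # Sort by directory name first so the fallback choice does not depend
    # on filesystem iteration order.
    for dir_name, project_id in sorted(entries):
        if project_id not in chosen:
            chosen[project_id] = dir_name  # first seen in sorted order
        elif dir_name == project_id:
            chosen[project_id] = dir_name  # canonical directory wins
    return chosen


# A canonical clone next to two siblings that share its id.
siblings = [
    ("arb-prd-experiment", "arb-prd"),
    ("arb-prd", "arb-prd"),
    ("arb-prd-backup", "arb-prd"),
]
print(dedup_by_id(siblings))   # {'arb-prd': 'arb-prd'}

# No directory is named after the id: the sorted-first directory is kept.
worktrees = [("wt-b", "demo"), ("wt-a", "demo")]
print(dedup_by_id(worktrees))  # {'demo': 'wt-a'}
```

Sorting before the loop is what makes the second case reproducible: the result is the same whichever order the pairs arrive in.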

Bug 2 — UnicodeEncodeError on Windows cp1252 stdout

Five hand-written click.echo lines emit non-ASCII glyphs:

| Line | Glyph | Code | Command path |
|---|---|---|---|
| cli.py:276 | ○ | U+25CB | projects list --all (untracked-repo block) |
| cli.py:481 | ✓ | U+2713 | doctor success line |
| cli.py:946 | → | U+2192 | next (in-progress block) |
| cli.py:951 | ✗ | U+2717 | next (blocked block) |
| cli.py:1566 | ✓ | U+2713 | setup --check success line |
| cli.py:1762 | ✓ / ○ | U+2713 / U+25CB | issues list status column |

On Windows, the default stdout encoding is cp1252, which can't encode these characters. Running clawpm projects list --all on Windows prints a few rows of the table and then crashes mid-render with:

UnicodeEncodeError: 'charmap' codec can't encode character '\u25cb' in position 2

Fix: ASCII-only in hand-written click.echo lines (-, [OK], x, ->). Tabulate-style table rendering is unchanged: those formatters adapt their box-drawing chars to the live stdout encoding at runtime, so they emit Unicode on Linux/Mac and ASCII on Windows automatically.
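
The failure can be reproduced on any platform by writing through a text stream that uses cp1252 with strict error handling. This is a small sketch under that assumption; the helper `write_as_cp1252` and the sample strings are illustrative and are not taken from cli.py.

```python
import io


def write_as_cp1252(text: str) -> bytes:
    """Write text through a cp1252 text stream and return the encoded bytes."""
    raw = io.BytesIO()
    # errors defaults to "strict", so unencodable characters raise.
    stream = io.TextIOWrapper(raw, encoding="cp1252")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# U+25CB has no cp1252 mapping, so the write raises UnicodeEncodeError.
try:
    write_as_cp1252("\u25cb untracked-repo")
except UnicodeEncodeError as exc:
    print(f"failed as expected: {exc}")

# The ASCII replacement encodes cleanly.
print(write_as_cp1252("- untracked-repo"))  # b'- untracked-repo'
```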

Tests

tests/test_dedup_and_encoding.py (4 new tests, all pass alongside the existing suite):

  • test_discover_projects_dedups_by_id: three worktree-style siblings produce one row
  • test_canonical_dir_wins_over_worktree: name == id wins over siblings
  • test_fallback_is_deterministic_when_no_canonical_dir: sorted-first-seen when nothing matches the id
  • test_untracked_block_is_cp1252_safe: untracked-repo block contains no banned glyphs
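
A glyph check of the kind the last test describes might look like the sketch below. The names `BANNED_GLYPHS`, `find_banned_glyphs`, and `is_cp1252_safe` are hypothetical, and the sample strings stand in for real CLI output; the actual test in tests/test_dedup_and_encoding.py may be structured differently.

```python
# The four code points listed in the glyph table (hypothetical constant name).
BANNED_GLYPHS = {"\u25cb", "\u2713", "\u2717", "\u2192"}


def find_banned_glyphs(text: str) -> list[str]:
    """Return the banned glyphs that occur in text, sorted by code point."""
    return sorted(set(text) & BANNED_GLYPHS)


def is_cp1252_safe(text: str) -> bool:
    """Stricter check: True only if every character encodes under cp1252."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


# Sample lines standing in for rendered output (not real clawpm output).
old_line = "\u2713 All checks passed"
new_line = "[OK] All checks passed"

print(find_banned_glyphs(old_line) == ["\u2713"])  # True
print(find_banned_glyphs(new_line))                # []
print(is_cp1252_safe(new_line))                    # True
```

The encode-based check is broader than the banned-glyph set: it also catches any other character that cp1252 cannot represent.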

Scope

  • src/clawpm/discovery.py: discover_projects dedup + sorted iteration
  • src/clawpm/cli.py: five glyph replacements in hand-written echo lines
  • tests/test_dedup_and_encoding.py: new

No changes to public command shapes, JSON output keys, or settings.toml schema. Tabulated output rendering is unchanged.

Note

This PR ports two upstream-native bugs surfaced by our fork's heavier dogfooding (multi-project portfolios with worktree clones, daily Windows usage). The fork carries additional Phase-N features that depend on errors="replace" reads etc. — those stay fork-only; this PR keeps to fixes that apply to upstream as-is.

… stdout

Two long-standing bugs that surface on real-world portfolios but are
relatively easy to miss in single-project test setups.

1. `clawpm projects list --all` could emit the same project id multiple
   times when sibling directories each carried a `.project/settings.toml`
   with the same `id` (e.g. a canonical clone alongside two worktree
   clones). `discover_projects` now dedups by id, preferring the
   canonical directory (`name == id`) over worktree-style siblings.
   When no directory's name matches the id, the fallback is
   sorted-first-seen — `Path.iterdir()` order is not guaranteed across
   filesystems, so we sort by directory name before iterating.

2. Five `click.echo` lines emit non-ASCII glyphs (`○` U+25CB, `✓` U+2713,
   `✗` U+2717, `→` U+2192), which raise `UnicodeEncodeError` on Windows
   cp1252 stdout. Symptom: `projects list --all`, `doctor`, `next`, and
   `issues list` print a few rows then crash mid-render. Swapped each
   for ASCII (`-`, `[OK]`, `x`, `->`). Tabulate-style table rendering
   is unchanged — it adapts to live stdout encoding at runtime.

Tests in `tests/test_dedup_and_encoding.py` (4 new) cover:
- one row per id with three worktree-style siblings
- canonical-dir-wins over worktree clones
- deterministic fallback when no directory matches the id
- untracked-repo block uses ASCII glyphs only

All existing tests pass.