fix(projects-list,doctor): dedup project ids; ASCII glyphs for cp1252 stdout#2
Open
martinduncanson wants to merge 1 commit into
Conversation
… stdout Two long-standing bugs that surface on real-world portfolios but are relatively easy to miss in single-project test setups. 1. `clawpm projects list --all` could emit the same project id multiple times when sibling directories each carried a `.project/settings.toml` with the same `id` (e.g. a canonical clone alongside two worktree clones). `discover_projects` now dedups by id, preferring the canonical directory (`name == id`) over worktree-style siblings. When no directory's name matches the id, the fallback is sorted-first-seen — `Path.iterdir()` order is not guaranteed across filesystems, so we sort by directory name before iterating. 2. Five `click.echo` lines emit non-ASCII glyphs (`○` U+25CB, `✓` U+2713, `✗` U+2717, `→` U+2192) which `UnicodeEncodeError` on Windows cp1252 stdout. Symptom: `projects list --all`, `doctor`, `next`, and `issues list` print a few rows then crash mid-render. Swapped each for ASCII (`-`, `[OK]`, `x`, `->`). Tabulate-style table rendering is unchanged — it adapts to live stdout encoding at runtime. Tests in `tests/test_dedup_and_encoding.py` (4 new) cover: - one row per id with three worktree-style siblings - canonical-dir-wins over worktree clones - deterministic fallback when no directory matches the id - untracked-repo block uses ASCII glyphs only All existing tests pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Goal
Two long-standing bugs that surface on real-world portfolios but are easy to miss in single-project test setups. Both reproduce on a clean upstream checkout — no Phase-N features required.
Bug 1 — duplicate rows in
projects list --alldiscover_projectsreturned one row per directory rather than per id. When a portfolio has sibling directories that each carry a.project/settings.tomlwith the sameid(e.g. a canonical clone alongside two worktree clones, orarb-prd / arb-prd-experiment / arb-prd-backup), the list emits the id N times — confusing for the operator and noisy for any consumer of the JSON output.Fix: dedup by id, preferring the canonical directory (
name == id) over worktree clones. When no directory matches the id, the fallback is sorted-first-seen —Path.iterdir()ordering is not guaranteed across filesystems, so we sort by directory name before iterating to keep the choice reproducible.Bug 2 —
UnicodeEncodeErroron Windows cp1252 stdoutFive hand-written
click.echolines emit non-ASCII glyphs:cli.py:276○projects list --all(untracked-repo block)cli.py:481✓doctorsuccess linecli.py:946→next(in-progress block)cli.py:951✗next(blocked block)cli.py:1566✓setup --checksuccess linecli.py:1762✓/○issues liststatus columnOn Windows, the default stdout encoding is
cp1252, which can't encode these characters. Runningclawpm projects list --allon Windows prints a few rows of the table and then crashes mid-render with:Fix: ASCII-only in hand-written
click.echolines (-,[OK],x,->). Tabulate-style table rendering is unchanged — those formatters adapt their box-drawing chars to live stdout encoding at runtime, so they emit Unicode in Linux/Mac and ASCII on Windows automatically.Tests
tests/test_dedup_and_encoding.py(4 new tests, all pass alongside the existing suite):test_discover_projects_dedups_by_id— three worktree-style siblings produce one rowtest_canonical_dir_wins_over_worktree—name == idwins over siblingstest_fallback_is_deterministic_when_no_canonical_dir— sorted-first-seen when nothing matches the idtest_untracked_block_is_cp1252_safe— untracked-repo block contains no banned glyphsScope
src/clawpm/discovery.py—discover_projectsdedup + sorted iterationsrc/clawpm/cli.py— five glyph replacements in hand-written echo linestests/test_dedup_and_encoding.py— newNo changes to public command shapes, JSON output keys, or settings.toml schema. Tabulated output rendering is unchanged.
Note
This PR ports two upstream-native bugs surfaced by our fork's heavier dogfooding (multi-project portfolios with worktree clones, daily Windows usage). The fork carries additional Phase-N features that depend on
errors="replace"reads etc. — those stay fork-only; this PR keeps to fixes that apply to upstream as-is.