
Skill: CLI Behaviour testing #197

Draft
sacOO7 wants to merge 1 commit into main from feature/skill-behavior-testing

Conversation

@sacOO7
Contributor

@sacOO7 sacOO7 commented Mar 27, 2026

  • Added a skill to perform behavioural testing of a given CLI command group.
  • Generates a report for it under CLAUDE-BEHAVIOR-TESTING.

@vercel

vercel bot commented Mar 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| cli-web-cli | Ready | Preview, Comment | Mar 27, 2026 2:36pm |


@maratal
Contributor

maratal commented Mar 30, 2026

Nice, prompted with "test only push send with device-id ...".

All Tests Passed

| # | Command | Scenario | Result |
| --- | --- | --- | --- |
| T1 | `push publish --help` | Help output complete, all flags listed | ✓ PASS |
| T2 | `push publish --device-id --title --body` | Basic notification | ✓ PASS |
| T3 | `push publish --device-id --title --body --data` | With custom data | ✓ PASS |
| T4 | `push publish --device-id --payload <json>` | Inline JSON payload | ✓ PASS |
| T5 | `push publish --device-id --payload ./file` | File payload | ✓ PASS |
| T6 | `push publish --json` | JSON envelope valid, stdout clean | ✓ PASS |
| T7 | `push publish` (no target) | Error: "A target is required" | ✓ PASS |
| T8 | `push publish --device-id` (no payload) | Error: "No push payload provided" | ✓ PASS |
| T9 | `push publish --json` (no target) | JSON error envelope, valid JSON | ✓ PASS |
| T10 | `push publish --device-id --channel` | Channel-ignored warning emitted | ✓ PASS* |
| T11 | `push batch-publish --help` | Help complete, all flags/examples listed | ✓ PASS |
| T12 | `push batch-publish <json> --force` | Inline JSON batch | ✓ PASS |
| T13 | `push batch-publish ./file --force` | File-based batch | ✓ PASS |
| T14 | `push batch-publish <json> --json` | JSON output, valid envelope | ✓ PASS |
| T15 | `push batch-publish` (empty stdin) | Error: invalid JSON array | ✓ PASS |
| T16 | `push batch-publish "not-json"` | Error: invalid JSON array | ✓ PASS |
| T17 | `push batch-publish '{"not":"array"}'` | Error: must be JSON array | ✓ PASS |
| T18 | `push batch-publish` (missing routing key) | Error: "recipient" or "channels" required | ✓ PASS |
| T19 | `push batch-publish --json` (invalid) | JSON error envelope, valid JSON | ✓ PASS |

*T10: channel-ignored warning appears on stdout

All sent pushes received by browser:

Screenshot 2026-03-30 at 02 51 35

@sacOO7
Contributor Author

sacOO7 commented Mar 30, 2026

@maratal, great, so based on the --help section, it's able to run behavioural tests properly. Let me know if there are any edge cases we need to improve upon.

Contributor

@AndyTWF AndyTWF left a comment


Is there a reason why this has been done as an LLM workflow, and not as strengthening our existing testing infrastructure?

LLMs are inherently nondeterministic, there's no guarantee that they're going to spot the same things each time and they can't be effectively CI gated. There's no protection against regression (a markdown report doesn't protect us, a failing test does).

For example:

  • JSON/stdout cleanliness, this belongs in unit tests.
  • Cross-command workflows (subscribe + publish), that's what our E2E tests cover.

@sacOO7
Contributor Author

sacOO7 commented Mar 30, 2026

> Is there a reason why this has been done as an LLM workflow, and not as strengthening our existing testing infrastructure?
>
> LLMs are inherently nondeterministic, there's no guarantee that they're going to spot the same things each time and they can't be effectively CI gated. There's no protection against regression (a markdown report doesn't protect us, a failing test does).
>
> For example:
>
>   • JSON/stdout cleanliness, this belongs in unit tests.
>   • Cross-command workflows (subscribe + publish), that's what our E2E tests cover.

There are a couple of reasons to do this.

  1. We don't need to manually run commands each time a feature is updated/refactored or a bug fix is addressed.
  2. It provides direct insight into CLI behavior through black-box testing by simulating diverse test cases, including edge cases based on the skill configuration.
  3. It generates 3 reports: REPORT_NON_JSON.md, REPORT_JSON.md and REPORT_PRIMARY.md.
  4. REPORT_NON_JSON.md contains all sub-commands run without --json/--pretty-json flags. This makes it easy to review commands, edge cases, and their corresponding outputs, helping quickly identify errors.
  5. REPORT_JSON.md contains all sub-commands run with --pretty-json flags, so similarly, we can manually go through each command with its input and output.
  6. REPORT_PRIMARY.md is the only nondeterministic part, since the LLM needs to compare the reports against CLAUDE.md conventions and add its findings to the review report. Other than this, both REPORT_NON_JSON.md and REPORT_JSON.md are very deterministic and give direct insight into the actual command execution and its respective output.

We already found a few output-specific formatting issues with this skill.

For prompts:

1. Test `ably rooms` command group:

Issues (actionable)

| # | Severity | Command | Issue | Output Modes Affected |
| --- | --- | --- | --- | --- |
| 1 | Minor | `rooms reactions send` | Domain fields (`emoji`, `metadata`, `room`) are spread at the top level of the JSON envelope instead of nested under a `reaction` domain key, deviating from the JSON data nesting convention in CLAUDE.md. Expected: `{"type":"result","command":"rooms:reactions:send","success":true,"reaction":{"emoji":"thumbs_up","metadata":null},"room":"..."}`. Actual: `{"type":"result","command":"rooms:reactions:send","success":true,"emoji":"thumbs_up","metadata":null,"room":"..."}` | `--json`, `--pretty-json` |
| 2 | Minor | `rooms reactions subscribe` | Domain fields (`clientId`, `metadata`, `name`, `room`, `timestamp`) are spread at the top level instead of nested under a `reaction` domain key. Expected: data nested under `"reaction":{...}`. Actual: all fields at envelope level. | `--json`, `--pretty-json` |
| 3 | Minor | `rooms messages subscribe` | "Duration elapsed" message printed before the final received message during shutdown: a race condition in output ordering. No data loss occurs, but the output sequence is confusing to read. Steps to reproduce: subscribe with `--duration 8`, send a message near the end of the duration window. | Human-readable |
| 4 | Minor | `rooms messages subscribe` | "Listening for messages. Press Ctrl+C to exit." printed once per room in multi-room subscribe (e.g., printed twice when subscribing to 2 rooms). Could consolidate into a single message after all rooms are subscribed. Steps to reproduce: `ably rooms messages subscribe room-a room-b --duration 5` | Human-readable |

2. Test `ably channels` command group:

Issue 1: Empty channel name returns exit code 0

  • Severity: Major
  • Affected command(s): channels publish
  • Output mode(s): Both (human-readable and JSON)
  • Description: pnpm cli channels publish "" "test" shows error "Could not find path: /channels/messages" but exits with code 0.
  • Expected: Non-zero exit code when all messages fail to publish. Ideally, validate empty channel name before API call with a clearer error message.
  • Steps to reproduce: pnpm cli channels publish "" "test"

Issue 2: SDK decode error with --encoding utf-8

  • Severity: Medium
  • Affected command(s): channels history (when retrieving messages published with --encoding utf-8)
  • Output mode(s): Both
  • Description: When messages are published with --encoding utf-8, retrieving them via channels history triggers an SDK error on stderr: [AblySDK Error] Error processing the utf-8 encoding, decoder returned 'Expected input of utf8Decode to be a buffer, arraybuffer, or view'; statusCode=400; code=40013. Messages still display correctly but the error pollutes output.
  • Steps to reproduce:
    1. pnpm cli channels publish my-channel "test" --encoding utf-8
    2. pnpm cli channels history my-channel
    3. Observe stderr SDK error

Issue 3: Occupancy subscribe human-readable format inconsistency

  • Severity: Minor
  • Affected command(s): channels occupancy subscribe
  • Output mode(s): Human-readable only
  • Description: The subscribe command dumps occupancy data as raw JSON (Occupancy Data: { "metrics": {...} }) while the get command uses clean labeled format (Connections: 0, Publishers: 0, etc.). Subscribe should use the same labeled format for consistency.
  • Steps to reproduce: pnpm cli channels occupancy subscribe my-channel --duration 5

Issue 4: Occupancy get human-readable missing fields present in JSON

  • Severity: Low
  • Affected command(s): channels occupancy get
  • Output mode(s): Human-readable only
  • Description: Human-readable output omits objectSubscribers and objectPublishers fields that are present in JSON output. Per conventions, non-JSON output should expose the same fields as JSON (omit only null/undefined/empty). All fields are 0, so this is an inconsistency in which zero-valued fields are shown.
  • Steps to reproduce: Compare pnpm cli channels occupancy get my-channel vs pnpm cli channels occupancy get my-channel --json

Issue 5: Occupancy subscribe JSON exposes internal event name

  • Severity: Low
  • Affected command(s): channels occupancy subscribe
  • Output mode(s): JSON only
  • Description: The event field in JSON output contains [meta]occupancy which is an internal Ably meta-channel event name. Consider using a more user-friendly label or omitting it.
  • Steps to reproduce: pnpm cli channels occupancy subscribe my-channel --duration 5 --json

Issue 6: Annotations publish/delete JSON nesting convention

  • Severity: Low
  • Affected command(s): channels annotations publish, channels annotations delete
  • Output mode(s): JSON only
  • Description: Domain data (channel, serial, name) is spread at the top level alongside envelope fields (type, command, success). Per JSON nesting conventions, domain data should be nested under a domain key (e.g., "annotation": {"channel": ..., "serial": ..., "name": ...}).
  • Steps to reproduce: pnpm cli channels annotations publish my-channel "serial" "reactions:unique.v1" --name test --json

Issue 7: Presence enter JSON includes "data": null

  • Severity: Low
  • Affected command(s): channels presence enter
  • Output mode(s): JSON only
  • Description: When no --data is provided, JSON output includes "data": null. Per conventions, null/undefined fields should be omitted for cleaner output.
  • Steps to reproduce: pnpm cli channels presence enter my-channel --client-id test --duration 3 --json

@maratal
Contributor

maratal commented Mar 31, 2026

> @maratal , great, so based on the --help section, it's able to run behavioural tests properly. Let me know if there are any edge cases we need to improve upon.

Make sure that it includes a complete list of the EXACT commands it executed. I would even expect them in a separate file for easy access and repeat runs.
