
Skill: CLI Behaviour testing #197

Draft
sacOO7 wants to merge 1 commit into main from feature/skill-behavior-testing

Conversation

@sacOO7
Contributor

@sacOO7 sacOO7 commented Mar 27, 2026

  • Added a skill to perform behavioural testing of a given CLI command group.
  • Generates a report for it under CLAUDE-BEHAVIOR-TESTING.

@vercel

vercel bot commented Mar 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| cli-web-cli | Ready | Preview, Comment | Mar 27, 2026 2:36pm |


@maratal
Contributor

maratal commented Mar 30, 2026

Nice, prompted with "test only push send with device-id ...".

All Tests Passed

| # | Command | Scenario | Result |
| --- | --- | --- | --- |
| T1 | `push publish --help` | Help output complete, all flags listed | ✓ PASS |
| T2 | `push publish --device-id --title --body` | Basic notification | ✓ PASS |
| T3 | `push publish --device-id --title --body --data` | With custom data | ✓ PASS |
| T4 | `push publish --device-id --payload <json>` | Inline JSON payload | ✓ PASS |
| T5 | `push publish --device-id --payload ./file` | File payload | ✓ PASS |
| T6 | `push publish --json` | JSON envelope valid, stdout clean | ✓ PASS |
| T7 | `push publish` (no target) | Error: "A target is required" | ✓ PASS |
| T8 | `push publish --device-id` (no payload) | Error: "No push payload provided" | ✓ PASS |
| T9 | `push publish --json` (no target) | JSON error envelope, valid JSON | ✓ PASS |
| T10 | `push publish --device-id --channel` | Channel-ignored warning emitted | ✓ PASS* |
| T11 | `push batch-publish --help` | Help complete, all flags/examples listed | ✓ PASS |
| T12 | `push batch-publish <json> --force` | Inline JSON batch | ✓ PASS |
| T13 | `push batch-publish ./file --force` | File-based batch | ✓ PASS |
| T14 | `push batch-publish <json> --json` | JSON output, valid envelope | ✓ PASS |
| T15 | `push batch-publish` (empty stdin) | Error: invalid JSON array | ✓ PASS |
| T16 | `push batch-publish "not-json"` | Error: invalid JSON array | ✓ PASS |
| T17 | `push batch-publish '{"not":"array"}'` | Error: must be JSON array | ✓ PASS |
| T18 | `push batch-publish` (missing routing key) | Error: "recipient" or "channels" required | ✓ PASS |
| T19 | `push batch-publish --json` (invalid) | JSON error envelope, valid JSON | ✓ PASS |

*T10: channel-ignored warning appears on stdout

All sent pushes received by browser:

Screenshot 2026-03-30 at 02 51 35

@sacOO7
Contributor Author

sacOO7 commented Mar 30, 2026

@maratal, great, so based on the --help section, it's able to run behavioural tests properly. Let me know if there are any edge cases we need to improve upon.

Contributor

@AndyTWF AndyTWF left a comment


Is there a reason why this has been done as an LLM workflow, and not as strengthening our existing testing infrastructure?

LLMs are inherently nondeterministic, there's no guarantee that they're going to spot the same things each time and they can't be effectively CI gated. There's no protection against regression (a markdown report doesn't protect us, a failing test does).

For example:

  • JSON/stdout cleanliness, this belongs in unit tests.
  • Cross-command workflows (subscribe + publish), that's what our E2E tests cover.

@sacOO7
Contributor Author

sacOO7 commented Mar 30, 2026

> Is there a reason why this has been done as an LLM workflow, and not as strengthening our existing testing infrastructure?
>
> LLMs are inherently nondeterministic, there's no guarantee that they're going to spot the same things each time and they can't be effectively CI gated. There's no protection against regression (a markdown report doesn't protect us, a failing test does).
>
> For example:
>
>   • JSON/stdout cleanliness, this belongs in unit tests.
>   • Cross-command workflows (subscribe + publish), that's what our E2E tests cover.

There are a couple of reasons to do this.

  1. We don't need to manually run commands each time a feature is updated/refactored or a bug fix is addressed.
  2. It provides direct insight into CLI behavior through black-box testing by simulating diverse test cases, including edge cases based on the skill configuration.
  3. It generates 3 reports: REPORT_NON_JSON.md, REPORT_JSON.md and REPORT_PRIMARY.md.
  4. REPORT_NON_JSON.md contains all sub-commands run without --json/--pretty-json flags. This makes it easy to review commands, edge cases, and their corresponding outputs, helping quickly identify errors.
  5. REPORT_JSON.md contains all sub-commands run with --pretty-json flags, so similarly, we can manually go through each command with its input and output.
  6. REPORT_PRIMARY.md is the only nondeterministic part, since the LLM needs to compare the reports against CLAUDE.md conventions and add its findings to the review report. Other than this, both REPORT_NON_JSON.md and REPORT_JSON.md are very deterministic and give direct insight into the actual command execution and its respective output.

We already found a few output-specific formatting issues with this skill.

For prompts:

1. Test `ably rooms` command group:

Issues (actionable)

| # | Severity | Command | Issue | Output Modes Affected |
| --- | --- | --- | --- | --- |
| 1 | Minor | `rooms reactions send` | Domain fields (`emoji`, `metadata`, `room`) are spread at the top level of the JSON envelope instead of nested under a `reaction` domain key, deviating from the JSON data nesting convention in CLAUDE.md. Expected: `{"type":"result","command":"rooms:reactions:send","success":true,"reaction":{"emoji":"thumbs_up","metadata":null},"room":"..."}`. Actual: `{"type":"result","command":"rooms:reactions:send","success":true,"emoji":"thumbs_up","metadata":null,"room":"..."}` | `--json`, `--pretty-json` |
| 2 | Minor | `rooms reactions subscribe` | Domain fields (`clientId`, `metadata`, `name`, `room`, `timestamp`) are spread at the top level instead of nested under a `reaction` domain key. Expected: data nested under `"reaction":{...}`. Actual: all fields at envelope level. | `--json`, `--pretty-json` |
| 3 | Minor | `rooms messages subscribe` | "Duration elapsed" message printed before the final received message during shutdown: a race condition in output ordering. No data loss occurs, but the output sequence is confusing to read. Steps to reproduce: subscribe with `--duration 8`, send a message near the end of the duration window. | Human-readable |
| 4 | Minor | `rooms messages subscribe` | "Listening for messages. Press Ctrl+C to exit." printed once per room in multi-room subscribe (e.g., printed twice when subscribing to 2 rooms). Could consolidate into a single message after all rooms are subscribed. Steps to reproduce: `ably rooms messages subscribe room-a room-b --duration 5` | Human-readable |

2. Test `ably channels` command group:

Issue 1: Empty channel name returns exit code 0

  • Severity: Major
  • Affected command(s): channels publish
  • Output mode(s): Both (human-readable and JSON)
  • Description: pnpm cli channels publish "" "test" shows error "Could not find path: /channels/messages" but exits with code 0.
  • Expected: Non-zero exit code when all messages fail to publish. Ideally, validate empty channel name before API call with a clearer error message.
  • Steps to reproduce: pnpm cli channels publish "" "test"

Issue 2: SDK decode error with --encoding utf-8

  • Severity: Medium
  • Affected command(s): channels history (when retrieving messages published with --encoding utf-8)
  • Output mode(s): Both
  • Description: When messages are published with --encoding utf-8, retrieving them via channels history triggers an SDK error on stderr: [AblySDK Error] Error processing the utf-8 encoding, decoder returned 'Expected input of utf8Decode to be a buffer, arraybuffer, or view'; statusCode=400; code=40013. Messages still display correctly but the error pollutes output.
  • Steps to reproduce:
    1. pnpm cli channels publish my-channel "test" --encoding utf-8
    2. pnpm cli channels history my-channel
    3. Observe stderr SDK error

Issue 3: Occupancy subscribe human-readable format inconsistency

  • Severity: Minor
  • Affected command(s): channels occupancy subscribe
  • Output mode(s): Human-readable only
  • Description: The subscribe command dumps occupancy data as raw JSON (Occupancy Data: { "metrics": {...} }) while the get command uses clean labeled format (Connections: 0, Publishers: 0, etc.). Subscribe should use the same labeled format for consistency.
  • Steps to reproduce: pnpm cli channels occupancy subscribe my-channel --duration 5

Issue 4: Occupancy get human-readable missing fields present in JSON

  • Severity: Low
  • Affected command(s): channels occupancy get
  • Output mode(s): Human-readable only
  • Description: Human-readable output omits objectSubscribers and objectPublishers fields that are present in JSON output. Per conventions, non-JSON output should expose the same fields as JSON (omit only null/undefined/empty). All fields are 0, so this is an inconsistency in which zero-valued fields are shown.
  • Steps to reproduce: Compare pnpm cli channels occupancy get my-channel vs pnpm cli channels occupancy get my-channel --json

Issue 5: Occupancy subscribe JSON exposes internal event name

  • Severity: Low
  • Affected command(s): channels occupancy subscribe
  • Output mode(s): JSON only
  • Description: The event field in JSON output contains [meta]occupancy which is an internal Ably meta-channel event name. Consider using a more user-friendly label or omitting it.
  • Steps to reproduce: pnpm cli channels occupancy subscribe my-channel --duration 5 --json

Issue 6: Annotations publish/delete JSON nesting convention

  • Severity: Low
  • Affected command(s): channels annotations publish, channels annotations delete
  • Output mode(s): JSON only
  • Description: Domain data (channel, serial, name) is spread at the top level alongside envelope fields (type, command, success). Per JSON nesting conventions, domain data should be nested under a domain key (e.g., "annotation": {"channel": ..., "serial": ..., "name": ...}).
  • Steps to reproduce: pnpm cli channels annotations publish my-channel "serial" "reactions:unique.v1" --name test --json

Issue 7: Presence enter JSON includes "data": null

  • Severity: Low
  • Affected command(s): channels presence enter
  • Output mode(s): JSON only
  • Description: When no --data is provided, JSON output includes "data": null. Per conventions, null/undefined fields should be omitted for cleaner output.
  • Steps to reproduce: pnpm cli channels presence enter my-channel --client-id test --duration 3 --json

@maratal
Contributor

maratal commented Mar 31, 2026

> @maratal , great, so based on the --help section, it's able to run behavioural tests properly. Let me know if there are any edge cases we need to improve upon.

Make sure that it includes a complete list of the EXACT commands it executed. I would even expect them in a separate file for easy access and repeat runs.
