Skip to content

test: add 196 tests across MCP, services, utils, autocomplete#855

Merged
senamakel merged 6 commits intotinyhumansai:mainfrom
jwalin-shah:claude/improve-openhuman-testing-wqThC
Apr 24, 2026
Merged

test: add 196 tests across MCP, services, utils, autocomplete#855
senamakel merged 6 commits intotinyhumansai:mainfrom
jwalin-shah:claude/improve-openhuman-testing-wqThC

Conversation

@jwalin-shah
Copy link
Copy Markdown
Contributor

@jwalin-shah jwalin-shah commented Apr 23, 2026

Adds 2,137 lines of unit test coverage across previously untested modules. Pure test addition — no source changes.

Coverage added

  • MCP: errorHandler, rateLimiter, transport, validation
  • Services: tunnelsApi, coreCommandClient, coreStateApi, errorReportQueue
  • Utils: cryptoKeys, withTimeout
  • Rust: autocomplete/ops.rs, json_rpc_e2e scenarios

Notes

  • Original commit also added integrationTokensCrypto.test.ts; dropped in a follow-up since that crypto module is being removed as dead code (see companion cleanup PR).

Test plan

  • yarn test:unit green
  • cargo test green

Summary by CodeRabbit

  • Tests
    • Added extensive unit tests for error handling, rate limiting, transport, validation, timeout and crypto utilities.
    • Expanded service-level tests for APIs, core clients, state APIs, tunnels and error reporting behavior.
    • Added end-to-end and integration tests covering JSON-RPC flows and autocomplete operations across platforms.

claude and others added 2 commits April 23, 2026 21:03
…tials E2E

Fills the largest coverage gaps identified in a repo-wide audit of untested
hot paths.

Frontend (Vitest, 182 tests / 11 files):
- lib/mcp/: transport (15), validation (24), errorHandler (10), rateLimiter (14)
  — the JSON-RPC client layer had zero prior tests
- services/: coreCommandClient (8), coreStateApi (13), errorReportQueue (25),
  api/tunnelsApi (17)
- utils/: cryptoKeys (18), integrationTokensCrypto (27), withTimeout (11)

Rust (13 tests):
- openhuman::autocomplete::ops — was the only domain with a public ops.rs
  and no #[cfg(test)] module; covers status/stop/history/clear_history and
  the non-macOS error paths on start/start_cli.

JSON-RPC E2E (1 test):
- credentials_crud_roundtrip exercises auth_store_provider_credentials /
  auth_list_provider_credentials / auth_remove_provider_credentials.
  Note: the credentials domain is namespaced under openhuman.auth_*, not
  openhuman.credentials_*.

Notes for follow-up (not addressed here):
- lib/mcp/transport.ts:off() passes the original (unwrapped) handler to
  socket.off(), so handlers registered via on() leak. Tests document the
  current behavior rather than masking the bug.
- errorReportQueue.test.ts uses an explicit @vitest-environment jsdom
  pragma because errorReportQueue.ts calls window.addEventListener() at
  module-load time; the module would crash in any non-browser runtime.

https://claude.ai/code/session_01AXu4XbzT1tYtJy9dzert73
…opped as dead code in review-codebase-improvements
@jwalin-shah jwalin-shah requested a review from a team April 23, 2026 21:27
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

Adds comprehensive test suites across the TypeScript/Node.js app and the Rust backend, introducing Vitest and Tokio tests for MCP utilities, services, APIs, crypto utilities, timeout helpers, error reporting, socket transport, autocomplete ops, and a JSON-RPC end-to-end credential CRUD flow.

Changes

Cohort / File(s) Summary
MCP Library Tests
app/src/lib/mcp/errorHandler.test.ts, app/src/lib/mcp/rateLimiter.test.ts, app/src/lib/mcp/transport.test.ts, app/src/lib/mcp/validation.test.ts
Adds Vitest suites covering error formatting/handling, rate-limiter classification/enforcement, SocketIO transport request/response flows and listener switching, and validation utilities with ValidationError behavior.
API & Service Tests
app/src/services/api/tunnelsApi.test.ts, app/src/services/coreCommandClient.test.ts, app/src/services/coreStateApi.test.ts
Adds tests that mock core RPC/command clients to assert correct method names, param shapes, envelope unwrapping, and delegation for tunnels and core state APIs.
Error Reporting Queue Tests
app/src/services/errorReportQueue.test.ts
Introduces comprehensive queue lifecycle, deduplication (time-windowed), subscription/unsubscribe semantics, tagging by Sentry event_id, and send-to-Sentry behavior, including sender invocation and queue mutation.
Utility Function Tests
app/src/utils/cryptoKeys.test.ts, app/src/utils/withTimeout.test.ts
Adds tests for mnemonic generation/validation, deterministic AES/EVM derivations with pinned vectors, and withTimeout behavior across timing/race scenarios (uses fake timers and config mocking).
Rust Autocomplete Tests
src/openhuman/autocomplete/ops.rs
Adds #[cfg(test)] tokio unit/integration tests for autocomplete start/status/stop, CLI error paths, and JSON (de)serialization round-trips for params/results.
JSON-RPC E2E Test
tests/json_rpc_e2e.rs
Adds an end-to-end JSON-RPC test exercising provider-credential CRUD flow against a started core server and mock upstream, validating create/list/filter/remove semantics.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • graycyrus

Poem

"🐇 I hopped through code with test-filled paws,
Checking sockets, mnemonics, and rate-limit laws.
I queued and I tagged, I timed out with glee,
Now tunnels and RPCs are safer — thanks to me! ✨"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly describes the main change: adding 196 tests across multiple areas (MCP, services, utils, autocomplete).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

senamakel
senamakel previously approved these changes Apr 24, 2026
…ypes

TeamWithRole uses nested team._id, TeamMember uses _id, TeamInvite uses _id.
Assertions at listTeams/getTeamMembers/getTeamInvites were checking the old
flat id field and would never have caught a shape regression.
Comment thread app/src/lib/mcp/transport.test.ts
Comment thread app/src/utils/cryptoKeys.test.ts
senamakel
senamakel previously approved these changes Apr 24, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (8)
tests/json_rpc_e2e.rs (1)

2281-2420: LGTM — solid CRUD coverage for the provider-credential RPCs.

The happy path (store → list → filtered list → remove → verify empty) is well-structured and consistent with the envelope/abort patterns used by the other tests in this file. Comments explaining why a mock upstream is still needed (config validation) and why no session is required (local storage) are helpful.

One optional enhancement to consider for follow-up coverage: exercising a filter that does not match (e.g., {"provider": "anthropic"} after storing openai) to lock down the filter semantics, and asserting profileName / hasToken on the entries returned by auth_list_provider_credentials so that a future drift between the store-summary shape and the list-summary shape would be caught. Not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/json_rpc_e2e.rs` around lines 2281 - 2420, Add two extra assertions in
the credentials_crud_roundtrip test: after storing the openai credential, call
the JSON-RPC method "openhuman.auth_list_provider_credentials" (use
post_json_rpc as used elsewhere) with a non-matching filter like
{"provider":"anthropic"} and assert the returned array is empty to verify filter
semantics; and when asserting the successful list results (the profiles array
from auth_list_provider_credentials) also assert that each returned entry
contains profileName == "default" and hasToken == true to lock down the summary
shape (refer to the variables list_result/profiles and
filtered_result/filtered_profiles used in the test).
app/src/services/errorReportQueue.test.ts (1)

48-51: Optional: listener accumulation across vi.resetModules() re-imports.

Each describe's beforeEach calls vi.resetModules() and re-imports the module, which re-runs module-level initGlobalListeners() and registers fresh error/unhandledrejection listeners on the persistent jsdom window. Listeners accumulate across all tests in this file. No current test dispatches those events, so it doesn't cause failures today, but it's a latent foot-gun if future tests simulate window errors. Consider either stubbing window.addEventListener via vi.spyOn for these suites, or exposing a __resetForTests() hook in the module so state and listeners can be cleared deterministically without repeated module loads.

Also applies to: 117-121, 162-165, 224-227, 268-271, 325-328

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/services/errorReportQueue.test.ts` around lines 48 - 51, The tests
re-import the module after vi.resetModules(), causing initGlobalListeners() to
re-register persistent window 'error'/'unhandledrejection' listeners and
accumulate them across tests; to fix, add a test-only cleanup hook exported from
the module (e.g., __resetForTests()) that removes the listeners and clears any
module-level state (undo what initGlobalListeners() registers) and call that
hook from the test suite (before importing or in afterEach), or alternatively
stub window.addEventListener via vi.spyOn in the test before each import to
prevent actual listener registration; reference initGlobalListeners and
export/use __resetForTests in errorReportQueue and update the test
beforeEach/afterEach to invoke it.
app/src/services/api/tunnelsApi.test.ts (1)

208-226: Consider adding a test for double or embedded trailing slashes in ingressUrl.

The current trailing-slash case only covers a single trailing /. If callers can pass URLs like https://api.example.com// (e.g., misconfigured VITE_* values), it would be worth pinning behavior. Not a blocker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/services/api/tunnelsApi.test.ts` around lines 208 - 226, Add a test
case covering double or embedded trailing slashes for tunnelsApi.ingressUrl:
call tunnelsApi.ingressUrl with inputs like 'https://api.example.com//' (and
optionally a backend URL containing a path with trailing slashes, e.g.,
'https://api.example.com/base//') and assert the returned URL normalizes to a
single slash before '/webhooks/ingress/<uuid>' (e.g.,
'https://api.example.com/webhooks/ingress/test-uuid' and
'https://api.example.com/base/webhooks/ingress/test-uuid'); place this near the
existing tests that import tunnelsApi and use the same pattern as the other
it(...) blocks so the suite verifies handling of multiple trailing slashes.
app/src/services/coreCommandClient.test.ts (1)

67-80: Near-duplicate tests exercise the same code path.

Both cases mock { result: null, logs: [] } and assert the promise resolves to null; they differ only in naming/intent. The "handles undefined params gracefully" case doesn't assert anything about param forwarding (that's already covered by the test at Lines 23-33). Consider dropping one or rewriting Lines 67-72 to actually assert that the forwarded call object contains params: undefined when no params are passed.

♻️ Suggested tightening of the "undefined params" case
   it('handles undefined params gracefully (does not throw)', async () => {
-    mockCallCoreRpc.mockResolvedValueOnce({ result: null, logs: [] });
+    mockCallCoreRpc.mockResolvedValueOnce({ result: 'ok', logs: [] });

     const { callCoreCommand } = await import('./coreCommandClient');
-    await expect(callCoreCommand('openhuman.app_state_snapshot')).resolves.toBeNull();
+    await callCoreCommand('openhuman.app_state_snapshot');
+    const call = mockCallCoreRpc.mock.calls[0][0] as { params: unknown };
+    expect(call).toHaveProperty('params', undefined);
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/services/coreCommandClient.test.ts` around lines 67 - 80, The two
tests both mock mockCallCoreRpc to return { result: null, logs: [] } and assert
null, so remove the duplicate by either deleting one test or changing the
"handles undefined params gracefully" test to assert that callCoreCommand
forwards params: undefined to mockCallCoreRpc; specifically, keep the
mockResolvedValueOnce and resolve assertion but also add an assertion like
expect(mockCallCoreRpc).toHaveBeenCalledWith(expect.objectContaining({ method:
'openhuman.app_state_snapshot', params: undefined })) after calling
callCoreCommand so you verify param forwarding while still ensuring the call
resolves to null.
app/src/lib/mcp/rateLimiter.test.ts (1)

93-148: Functions are tested but not yet integrated into the request handler.

The tests provide good coverage of the rate-limiting logic, but enforceRateLimit and resetRequestCallCount are not called from any production code. These need to be wired into the actual MCP request lifecycle before the tests will catch regressions in real-world behavior. Track as a follow-up integration task.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/lib/mcp/rateLimiter.test.ts` around lines 93 - 148, The tests call
enforceRateLimit and resetRequestCallCount but the MCP request pipeline never
invokes them; integrate these into the live request lifecycle by calling
enforceRateLimit(actionName, permission) at the start of each incoming MCP
request (e.g., in the request handler or middleware that processes actions) to
enforce per-request caps, and ensure resetRequestCallCount() is called at
request completion/cleanup (after response sent or on finally block) so the
per-request counter is cleared; update the MCP handler that dispatches actions
to call these functions with the appropriate action identifier and permission
context.
src/openhuman/autocomplete/ops.rs (2)

408-413: logs.len() == 2 is over-specified and will cascade-fail on any future log addition.

Adding one extra log line in any of these ops (very likely as observability improves — the coding guidelines explicitly ask for "substantial development-oriented logging at entry/exit points, branch decisions, external calls…") breaks four tests simultaneously with no behavioral regression. Prefer asserting >= 2 plus the specific tokens you care about (already done for the content checks).

Suggested relaxation
-        assert_eq!(
-            outcome.logs.len(),
-            2,
-            "expected exactly 2 log lines, got: {:?}",
-            outcome.logs
-        );
+        assert!(
+            outcome.logs.len() >= 2,
+            "expected at least 2 log lines, got: {:?}",
+            outcome.logs
+        );

Apply the same pattern in stop_without_reason_returns_stopped_true_and_two_logs (L462-467), history_returns_outcome_with_two_log_lines (L598-603), and clear_history_returns_outcome_with_two_log_lines (L643-648). The test names can stay — the invariant being checked is "at least the confirmation + structured line".

Also applies to: 462-467, 598-603, 643-648

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/autocomplete/ops.rs` around lines 408 - 413, Replace the
exact-count log assertions that require logs.len() == 2 with a relaxed check
logs.len() >= 2 in the specified tests so future additional logs don't break
tests; update the assertion in
stop_without_reason_returns_stopped_true_and_two_logs,
history_returns_outcome_with_two_log_lines,
clear_history_returns_outcome_with_two_log_lines and the current test (the block
using outcome.logs.len() equality) to use assert!(outcome.logs.len() >= 2, ...)
and keep the existing content/token assertions unchanged to validate the
important log lines.

394-662: File is now ~662 lines — consider moving tests to a sibling tests.rs child module.

As per coding guidelines "Source files should be ≤ ~500 lines; split modules when growing to improve maintainability". Splitting the test module into src/openhuman/autocomplete/ops/tests.rs (via #[cfg(test)] mod tests;) keeps the production module under the limit and leaves room to grow coverage. Optional — tests are lower risk to oversize, but the guideline applies to src/**/*.rs generally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/autocomplete/ops.rs` around lines 394 - 662, The file is too
large because the inline #[cfg(test)] mod tests contains many integration/unit
tests; extract that entire test module into a sibling file named tests.rs and
replace the inline block with a one-line declaration (#[cfg(test)] mod tests;).
Move all test functions and their imports (e.g., use super::*;, tokio tests,
structs like AutocompleteHistoryParams, functions like
autocomplete_status/autocomplete_start/autocomplete_history/autocomplete_clear_history/autocomplete_stop/etc.)
into the new tests.rs, preserving attributes and visibility, and run cargo test
to ensure any path/import adjustments (use super::* vs crate paths) are
corrected.
app/src/utils/cryptoKeys.test.ts (1)

120-129: EIP-55 check only verifies "some casing exists".

A regression that produced, say, a single uppercase letter would still pass. Combined with the lowercase comparison on line 110, there is no test that actually pins the EIP-55 output. If you adopt the literal-checksum assertion suggested above for line 110, this test becomes redundant and can be removed; otherwise consider pinning the exact mixed-case string here instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/utils/cryptoKeys.test.ts` around lines 120 - 129, The current test
for deriveEvmAddressFromMnemonic only checks for mixed-case which is too weak;
either remove this test if you've added a literal EIP-55 checksum assertion
already (so it would be redundant), or replace it by asserting the exact
checksummed address string for KNOWN_MNEMONIC returned by
deriveEvmAddressFromMnemonic to pin the expected EIP-55 output (use the exact
mixed-case address literal you expect). Ensure you update the test name
accordingly and reference deriveEvmAddressFromMnemonic and KNOWN_MNEMONIC when
making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src/lib/mcp/transport.test.ts`:
- Around line 111-127: The test doesn't verify that transport.off actually
called socket.off with the expected arguments; add a spy/assertion on the fake
socket's off method: create the socket via makeSocket(), spy on socket.off, call
transport.on('tool_call', handler) and then transport.off('tool_call', handler)
on the SocketIOMCPTransportImpl instance, and assert socket.off was invoked with
the prefixed event name ('mcp:tool_call') and the handler passed to
transport.off (or, if you prefer to mark known-bug behavior, convert the test to
it.todo('off removes the handler') until the wrapped-handler leak is fixed).

In `@app/src/utils/cryptoKeys.test.ts`:
- Around line 85-92: The test "returns a fixed known value for the all-abandon
mnemonic" currently re-calls deriveAesKeyFromMnemonic(KNOWN_MNEMONIC) instead of
asserting a pinned value; replace that self-comparison with a literal expected
hex string (the known 64-char output produced by deriveAesKeyFromMnemonic for
KNOWN_MNEMONIC using the current salt "openhuman-aes-key-v1", iteration count
and dkLen) and assert key === "<PINNED_HEX_VALUE>"; keep the rest of the test
and the KNOWN_MNEMONIC reference so future silent changes to
deriveAesKeyFromMnemonic, its salt, iterations or dkLen will fail.
- Around line 107-111: The test is currently lowercasing the derived address
which hides whether EIP-55 checksum casing is correct; update the assertion for
deriveEvmAddressFromMnemonic(KNOWN_MNEMONIC) to remove .toLowerCase() and assert
the exact expected EIP-55 checksummed address string (pin the canonical
checksummed form returned by deriveEvmAddressFromMnemonic for KNOWN_MNEMONIC) so
the test validates the precise checksum casing rather than just identity.

In `@src/openhuman/autocomplete/ops.rs`:
- Around line 394-507: Tests call the process-global
autocomplete::global_engine() (used by autocomplete_status and
autocomplete_stop), which causes flaky races when tests run in parallel; fix by
serializing engine-touching tests or removing reliance on the global singleton:
either (a) add a process-wide tokio::sync::Mutex guard used by these tests
(acquire/release in each test) or (b) refactor to use a test-only engine
constructor and pass an isolated engine instance into
autocomplete_status/autocomplete_stop, or (c) annotate tests to run on a
single-threaded Tokio runtime and add a serialization guard; implement one of
these options (pick a, b, or c), update the tests to use the chosen
guard/constructor (references: global_engine(), autocomplete_status,
autocomplete_stop, tests module), and add a TODO comment explaining why
serialization is required to prevent future flaky tests.
- Around line 586-661: Tests call clear_history() and list_history() against the
real on-disk KV store because MemoryClient::new_local() uses
default_root_openhuman_dir(), so update the tests
(history_returns_outcome_with_two_log_lines,
history_default_limit_appears_in_log,
clear_history_returns_outcome_with_two_log_lines) to use an isolated temporary
workspace: create a temp dir and construct the store with
MemoryClient::from_workspace_dir(temp_dir) (or otherwise inject a test-scoped
client), then call autocomplete_history(...) and autocomplete_clear_history()
against that client; alternatively, if isolation isn't possible, mark these
tests #[ignore] and move them to integration tests with an explicit fixture —
ensure you replace any direct MemoryClient::new_local() usage and avoid touching
default_root_openhuman_dir() in test setup.

---

Nitpick comments:
In `@app/src/lib/mcp/rateLimiter.test.ts`:
- Around line 93-148: The tests call enforceRateLimit and resetRequestCallCount
but the MCP request pipeline never invokes them; integrate these into the live
request lifecycle by calling enforceRateLimit(actionName, permission) at the
start of each incoming MCP request (e.g., in the request handler or middleware
that processes actions) to enforce per-request caps, and ensure
resetRequestCallCount() is called at request completion/cleanup (after response
sent or on finally block) so the per-request counter is cleared; update the MCP
handler that dispatches actions to call these functions with the appropriate
action identifier and permission context.

In `@app/src/services/api/tunnelsApi.test.ts`:
- Around line 208-226: Add a test case covering double or embedded trailing
slashes for tunnelsApi.ingressUrl: call tunnelsApi.ingressUrl with inputs like
'https://api.example.com//' (and optionally a backend URL containing a path with
trailing slashes, e.g., 'https://api.example.com/base//') and assert the
returned URL normalizes to a single slash before '/webhooks/ingress/<uuid>'
(e.g., 'https://api.example.com/webhooks/ingress/test-uuid' and
'https://api.example.com/base/webhooks/ingress/test-uuid'); place this near the
existing tests that import tunnelsApi and use the same pattern as the other
it(...) blocks so the suite verifies handling of multiple trailing slashes.

In `@app/src/services/coreCommandClient.test.ts`:
- Around line 67-80: The two tests both mock mockCallCoreRpc to return { result:
null, logs: [] } and assert null, so remove the duplicate by either deleting one
test or changing the "handles undefined params gracefully" test to assert that
callCoreCommand forwards params: undefined to mockCallCoreRpc; specifically,
keep the mockResolvedValueOnce and resolve assertion but also add an assertion
like expect(mockCallCoreRpc).toHaveBeenCalledWith(expect.objectContaining({
method: 'openhuman.app_state_snapshot', params: undefined })) after calling
callCoreCommand so you verify param forwarding while still ensuring the call
resolves to null.

In `@app/src/services/errorReportQueue.test.ts`:
- Around line 48-51: The tests re-import the module after vi.resetModules(),
causing initGlobalListeners() to re-register persistent window
'error'/'unhandledrejection' listeners and accumulate them across tests; to fix,
add a test-only cleanup hook exported from the module (e.g., __resetForTests())
that removes the listeners and clears any module-level state (undo what
initGlobalListeners() registers) and call that hook from the test suite (before
importing or in afterEach), or alternatively stub window.addEventListener via
vi.spyOn in the test before each import to prevent actual listener registration;
reference initGlobalListeners and export/use __resetForTests in errorReportQueue
and update the test beforeEach/afterEach to invoke it.

In `@app/src/utils/cryptoKeys.test.ts`:
- Around line 120-129: The current test for deriveEvmAddressFromMnemonic only
checks for mixed-case which is too weak; either remove this test if you've added
a literal EIP-55 checksum assertion already (so it would be redundant), or
replace it by asserting the exact checksummed address string for KNOWN_MNEMONIC
returned by deriveEvmAddressFromMnemonic to pin the expected EIP-55 output (use
the exact mixed-case address literal you expect). Ensure you update the test
name accordingly and reference deriveEvmAddressFromMnemonic and KNOWN_MNEMONIC
when making the change.

In `@src/openhuman/autocomplete/ops.rs`:
- Around line 408-413: Replace the exact-count log assertions that require
logs.len() == 2 with a relaxed check logs.len() >= 2 in the specified tests so
future additional logs don't break tests; update the assertion in
stop_without_reason_returns_stopped_true_and_two_logs,
history_returns_outcome_with_two_log_lines,
clear_history_returns_outcome_with_two_log_lines and the current test (the block
using outcome.logs.len() equality) to use assert!(outcome.logs.len() >= 2, ...)
and keep the existing content/token assertions unchanged to validate the
important log lines.
- Around line 394-662: The file is too large because the inline #[cfg(test)] mod
tests contains many integration/unit tests; extract that entire test module into
a sibling file named tests.rs and replace the inline block with a one-line
declaration (#[cfg(test)] mod tests;). Move all test functions and their imports
(e.g., use super::*;, tokio tests, structs like AutocompleteHistoryParams,
functions like
autocomplete_status/autocomplete_start/autocomplete_history/autocomplete_clear_history/autocomplete_stop/etc.)
into the new tests.rs, preserving attributes and visibility, and run cargo test
to ensure any path/import adjustments (use super::* vs crate paths) are
corrected.

In `@tests/json_rpc_e2e.rs`:
- Around line 2281-2420: Add two extra assertions in the
credentials_crud_roundtrip test: after storing the openai credential, call the
JSON-RPC method "openhuman.auth_list_provider_credentials" (use post_json_rpc as
used elsewhere) with a non-matching filter like {"provider":"anthropic"} and
assert the returned array is empty to verify filter semantics; and when
asserting the successful list results (the profiles array from
auth_list_provider_credentials) also assert that each returned entry contains
profileName == "default" and hasToken == true to lock down the summary shape
(refer to the variables list_result/profiles and
filtered_result/filtered_profiles used in the test).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 645cca45-e0f6-436c-8702-7304e52799d5

📥 Commits

Reviewing files that changed from the base of the PR and between 1408666 and e51d0e9.

📒 Files selected for processing (12)
  • app/src/lib/mcp/errorHandler.test.ts
  • app/src/lib/mcp/rateLimiter.test.ts
  • app/src/lib/mcp/transport.test.ts
  • app/src/lib/mcp/validation.test.ts
  • app/src/services/api/tunnelsApi.test.ts
  • app/src/services/coreCommandClient.test.ts
  • app/src/services/coreStateApi.test.ts
  • app/src/services/errorReportQueue.test.ts
  • app/src/utils/cryptoKeys.test.ts
  • app/src/utils/withTimeout.test.ts
  • src/openhuman/autocomplete/ops.rs
  • tests/json_rpc_e2e.rs

Comment thread app/src/lib/mcp/transport.test.ts Outdated
Comment thread app/src/utils/cryptoKeys.test.ts
Comment thread app/src/utils/cryptoKeys.test.ts
Comment on lines +394 to +507
#[cfg(test)]
mod tests {
use super::*;

// ── autocomplete_status ────────────────────────────────────────────────────

/// Happy path: `autocomplete_status` always succeeds and produces exactly
/// two log lines with the expected key tokens.
#[tokio::test]
async fn status_returns_outcome_with_two_log_lines() {
let outcome = autocomplete_status()
.await
.expect("autocomplete_status must not return Err");

assert_eq!(
outcome.logs.len(),
2,
"expected exactly 2 log lines, got: {:?}",
outcome.logs
);
assert!(
outcome.logs[0].contains("autocomplete status fetched"),
"first log should confirm fetch: {:?}",
outcome.logs[0]
);
assert!(
outcome.logs[1].contains("[autocomplete] status"),
"second log should contain the structured prefix: {:?}",
outcome.logs[1]
);
}

/// The status payload has the expected boolean/string fields and a non-empty phase.
#[tokio::test]
async fn status_payload_has_expected_fields() {
let outcome = autocomplete_status()
.await
.expect("autocomplete_status must not return Err");

let status = &outcome.value;
// Phase must be a non-empty string (default is "idle").
assert!(
!status.phase.is_empty(),
"phase must not be empty, got: {:?}",
status.phase
);
// debounce_ms is always set to a positive value by the engine default (120 ms).
assert!(
status.debounce_ms > 0,
"debounce_ms must be positive, got {}",
status.debounce_ms
);
}

// ── autocomplete_stop ──────────────────────────────────────────────────────

/// Happy path: stopping a not-yet-running engine reports `stopped: true`
/// and produces two log lines.
#[tokio::test]
async fn stop_without_reason_returns_stopped_true_and_two_logs() {
let outcome = autocomplete_stop(None)
.await
.expect("autocomplete_stop must not return Err");

assert!(
outcome.value.stopped,
"stopped must be true even when engine was already idle"
);
assert_eq!(
outcome.logs.len(),
2,
"expected 2 log lines, got: {:?}",
outcome.logs
);
assert!(
outcome.logs[0].contains("autocomplete stopped"),
"first log should confirm stop: {:?}",
outcome.logs[0]
);
}

/// When a `reason` is supplied, the structured log line must include it.
#[tokio::test]
async fn stop_with_reason_includes_reason_in_log() {
let payload = Some(AutocompleteStopParams {
reason: Some("test-shutdown".to_string()),
});

let outcome = autocomplete_stop(payload)
.await
.expect("autocomplete_stop must not return Err");

let structured_log = &outcome.logs[1];
assert!(
structured_log.contains("test-shutdown"),
"structured log must contain the supplied reason; got: {:?}",
structured_log
);
}

/// When no reason is supplied, the structured log line must record "none".
#[tokio::test]
async fn stop_without_reason_logs_none_as_reason() {
let outcome = autocomplete_stop(None)
.await
.expect("autocomplete_stop must not return Err");

let structured_log = &outcome.logs[1];
assert!(
structured_log.contains("reason=none"),
"structured log must record reason=none when no reason is given; got: {:?}",
structured_log
);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Parallel tests share the global_engine() singleton; asserts will be flaky once start actually succeeds.

By default cargo test runs test functions concurrently across threads, and every test here drives the same process-global autocomplete::global_engine() (status/stop/start). It happens to be stable today on non-macOS because start always errors (so the engine never transitions to running), but the invariant these tests encode (e.g. stopped: true "even when engine was already idle", a specific phase being observable, structured-log contents) is only coincidentally true and will begin to race as soon as:

  • the suite is run on macOS, or
  • any other #[tokio::test] in this module starts/stops the engine concurrently (e.g. a future set-style test), or
  • this file is used as a template for further tests.

Recommend one of: (a) serialize engine-touching tests with a process-wide tokio::sync::Mutex (or the serial_test crate), (b) expose a test-only engine constructor and pass it in explicitly instead of going through global_engine(), or (c) annotate these with #[tokio::test(flavor = "multi_thread")] + a serialization guard. The current setup is fine for landing coverage, but please leave a TODO so the next author doesn't pile more shared-state tests on top.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/autocomplete/ops.rs` around lines 394 - 507, Tests call the
process-global autocomplete::global_engine() (used by autocomplete_status and
autocomplete_stop), which causes flaky races when tests run in parallel; fix by
serializing engine-touching tests or removing reliance on the global singleton:
either (a) add a process-wide tokio::sync::Mutex guard used by these tests
(acquire/release in each test) or (b) refactor to use a test-only engine
constructor and pass an isolated engine instance into
autocomplete_status/autocomplete_stop, or (c) annotate tests to run on a
single-threaded Tokio runtime and add a serialization guard; implement one of
these options (pick a, b, or c), update the tests to use the chosen
guard/constructor (references: global_engine(), autocomplete_status,
autocomplete_stop, tests module), and add a TODO comment explaining why
serialization is required to prevent future flaky tests.

Comment thread src/openhuman/autocomplete/ops.rs
- cryptoKeys.test.ts: pin deriveAesKeyFromMnemonic output to literal hex
  (dce707ee...) so KDF param regressions are caught (addresses @senamakel
  and @coderabbitai on cryptoKeys.test.ts:92)
- cryptoKeys.test.ts: pin deriveEvmAddressFromMnemonic to EIP-55 checksummed
  form, removing .toLowerCase() that hid checksum casing errors (addresses
  @coderabbitai on cryptoKeys.test.ts:111)
- transport.test.ts: convert tautological off-handler test to it.todo with
  explanatory comment documenting the known wrapped-handler leak bug
  (addresses @senamakel on transport.test.ts:151 and @coderabbitai on
  transport.test.ts:127)
- autocomplete/ops.rs: add TODO comment on global_engine() concurrency risk
  so future authors know serialization is required when macOS support lands
  (addresses @coderabbitai on ops.rs:507)
- autocomplete/ops.rs: mark history/clear_history integration tests #[ignore]
  to prevent wiping real on-disk KV store on cargo test runs (addresses
  @coderabbitai on ops.rs:661)
HEAD introduced credentials_crud_roundtrip; main introduced
skills_uninstall_rpc_e2e. Both tests are independent (different RPC
methods, non-overlapping port ranges 5001-5005 vs 6001-6003) and must
be preserved.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/openhuman/autocomplete/ops.rs (1)

419-484: Brittle exact log-count and unchecked logs[1] indexing.

A couple of minor resiliency concerns that would produce confusing failures if the ops layer ever adds a log line:

  • assert_eq!(outcome.logs.len(), 2, …) (lines 419-424, 473-478, 616-621, 663-668) tightly couples these tests to the current log vector length. Any future log addition — including development-oriented logs (entry/exit, state transitions) encouraged by the repo’s logging guideline — will break all of them at once even though behavior is unchanged. Prefer >= 2 (or assert on the content of specific prefixes you care about).
  • Tests stop_with_reason_includes_reason_in_log (line 497), stop_without_reason_logs_none_as_reason (line 512), and history_default_limit_appears_in_log (line 646) index outcome.logs[1] without first asserting outcome.logs.len() >= 2. If the vector is shorter the test will panic with an unhelpful index-out-of-bounds instead of the descriptive assert! message below it.

As per coding guidelines (**/*.{rs,ts,tsx}: Add substantial development-oriented logging at entry/exit points, branch decisions, external calls, retries/timeouts, state transitions, and error paths), future log additions in autocomplete_* ops are explicitly expected, so == on log counts is likely to churn.

♻️ Suggested tightening (illustrative, apply to each affected test)
-        assert_eq!(
-            outcome.logs.len(),
-            2,
-            "expected exactly 2 log lines, got: {:?}",
-            outcome.logs
-        );
+        assert!(
+            outcome.logs.len() >= 2,
+            "expected at least 2 log lines, got: {:?}",
+            outcome.logs
+        );
+        assert!(
+            outcome.logs.len() >= 2,
+            "structured log missing; got: {:?}",
+            outcome.logs
+        );
         let structured_log = &outcome.logs[1];
         assert!(
             structured_log.contains("test-shutdown"),
             "structured log must contain the supplied reason; got: {:?}",
             structured_log
         );

Also applies to: 497-517, 616-650

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/autocomplete/ops.rs` around lines 419 - 484, Change brittle
exact log-count checks and unsafe indexing in the autocomplete op tests: replace
assert_eq!(outcome.logs.len(), 2, ...) with an assertion that allows additional
logs (e.g., assert!(outcome.logs.len() >= 2, ...)) or, better, assert on the
presence of specific log content using outcome.logs.iter().any(|l|
l.contains("...")); additionally, before using outcome.logs[1] in tests like
stop_with_reason_includes_reason_in_log,
stop_without_reason_logs_none_as_reason, history_default_limit_appears_in_log,
add a guard (assert!(outcome.logs.len() >= 2, ...)) or switch to searching for
the expected substring across the log lines to avoid index-out-of-bounds panics
(apply changes to tests named autocomplete_status,
stop_without_reason_returns_stopped_true_and_two_logs and the other affected
test functions).
app/src/utils/cryptoKeys.test.ts (1)

77-83: Optional: extract the second mnemonic into a shared constant.

'zoo zoo zoo zoo zoo zoo zoo zoo zoo zoo zoo wrong' is duplicated across the AES key and EVM address describe blocks. Hoisting it next to KNOWN_MNEMONIC (e.g. OTHER_MNEMONIC) would make the intent clearer and keep both "different-mnemonic" tests in sync if you ever swap it.

♻️ Proposed refactor
 const KNOWN_MNEMONIC =
   'abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about';
+// Another valid BIP39 mnemonic, used for "different input → different output" checks.
+const OTHER_MNEMONIC =
+  'zoo zoo zoo zoo zoo zoo zoo zoo zoo zoo zoo wrong';
@@
-    const other = 'zoo zoo zoo zoo zoo zoo zoo zoo zoo zoo zoo wrong';
-    expect(validateMnemonicPhrase(other)).toBe(true);
     const keyA = deriveAesKeyFromMnemonic(KNOWN_MNEMONIC);
-    const keyB = deriveAesKeyFromMnemonic(other);
+    const keyB = deriveAesKeyFromMnemonic(OTHER_MNEMONIC);
     expect(keyA).not.toBe(keyB);
@@
-    const other = 'zoo zoo zoo zoo zoo zoo zoo zoo zoo zoo zoo wrong';
     const addrA = deriveEvmAddressFromMnemonic(KNOWN_MNEMONIC);
-    const addrB = deriveEvmAddressFromMnemonic(other);
+    const addrB = deriveEvmAddressFromMnemonic(OTHER_MNEMONIC);
     expect(addrA).not.toBe(addrB);

You may want to keep the validateMnemonicPhrase(OTHER_MNEMONIC) pre-assertion inside one of the two tests (or hoist it into a standalone it in the validateMnemonicPhrase block) so the "sanity-check the fixture" intent isn't lost.

Also applies to: 112-117

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/utils/cryptoKeys.test.ts` around lines 77 - 83, Introduce a shared
constant for the second mnemonic (e.g. OTHER_MNEMONIC) next to KNOWN_MNEMONIC
and replace the inline string occurrences in both the AES key test and the EVM
address tests; keep the validateMnemonicPhrase(other) pre-assertion (or move it
to a single sanity-check test) so the fixture is still validated, and update
deriveAesKeyFromMnemonic and any tests referencing the literal mnemonic to use
OTHER_MNEMONIC to avoid duplication and keep tests in sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@app/src/utils/cryptoKeys.test.ts`:
- Around line 77-83: Introduce a shared constant for the second mnemonic (e.g.
OTHER_MNEMONIC) next to KNOWN_MNEMONIC and replace the inline string occurrences
in both the AES key test and the EVM address tests; keep the
validateMnemonicPhrase(other) pre-assertion (or move it to a single sanity-check
test) so the fixture is still validated, and update deriveAesKeyFromMnemonic and
any tests referencing the literal mnemonic to use OTHER_MNEMONIC to avoid
duplication and keep tests in sync.

In `@src/openhuman/autocomplete/ops.rs`:
- Around line 419-484: Change brittle exact log-count checks and unsafe indexing
in the autocomplete op tests: replace assert_eq!(outcome.logs.len(), 2, ...)
with an assertion that allows additional logs (e.g., assert!(outcome.logs.len()
>= 2, ...)) or, better, assert on the presence of specific log content using
outcome.logs.iter().any(|l| l.contains("...")); additionally, before using
outcome.logs[1] in tests like stop_with_reason_includes_reason_in_log,
stop_without_reason_logs_none_as_reason, history_default_limit_appears_in_log,
add a guard (assert!(outcome.logs.len() >= 2, ...)) or switch to searching for
the expected substring across the log lines to avoid index-out-of-bounds panics
(apply changes to tests named autocomplete_status,
stop_without_reason_returns_stopped_true_and_two_logs and the other affected
test functions).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b22cdbf3-6e48-44bf-82e7-86ed72db2dcb

📥 Commits

Reviewing files that changed from the base of the PR and between e51d0e9 and 30e23d3.

📒 Files selected for processing (4)
  • app/src/lib/mcp/transport.test.ts
  • app/src/utils/cryptoKeys.test.ts
  • src/openhuman/autocomplete/ops.rs
  • tests/json_rpc_e2e.rs

@senamakel senamakel merged commit 95504d4 into tinyhumansai:main Apr 24, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants