
feat: onboarding Gmail integration + LinkedIn profile enrichment#524

Merged
senamakel merged 12 commits into tinyhumansai:main from
senamakel:feat/sunday-3
Apr 13, 2026

Conversation


@senamakel senamakel commented Apr 13, 2026

Summary

  • Onboarding now shows Gmail integration — replaced the static "Connect Integrations Later" step with a live Gmail connect card (Notion and others available after setup)
  • Context gathering step — after connecting Gmail, a new onboarding step calls the Rust-side LinkedIn enrichment pipeline and shows progress
  • LinkedIn enrichment pipeline (Rust) — searches Gmail HTML bodies for linkedin.com/in/<username> links, scrapes the profile via Apify (dev_fusion/linkedin-profile-scraper), passes the result through LLM summarisation, and writes PROFILE.md to the workspace
  • PROFILE.md replaces USER.md — the agent prompt system now loads PROFILE.md (generated from real user data) instead of the generic USER.md template
  • Composio tools hidden from main agent — ToolCategory::Skill tools (Composio, Apify) are filtered out of the orchestrator/main agent prompt; only the skills_agent subagent sees them via category_filter = "skill"
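
The username extraction described above can be sketched without HTML parsing; this is a stdlib-only illustration (the actual pipeline reportedly uses regexes, and the function name here is hypothetical):

```rust
/// Extract the first `linkedin.com/in/<username>` slug from an HTML body.
/// Sketch only: scans for the marker and stops at the first character
/// that cannot appear in a LinkedIn username slug.
fn extract_linkedin_username(html: &str) -> Option<String> {
    let marker = "linkedin.com/in/";
    let start = html.find(marker)? + marker.len();
    let rest = &html[start..];
    // Username slugs are alphanumerics plus '-' and '_'; anything else ends it.
    let end = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-' || c == '_'))
        .unwrap_or(rest.len());
    let username = &rest[..end];
    if username.is_empty() {
        None
    } else {
        Some(username.to_string())
    }
}
```

Query strings and closing quotes after the slug are trimmed naturally, since `?` and `"` are not slug characters.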

Key files

  • Onboarding UI — app/src/pages/onboarding/steps/SkillsStep.tsx, ContextGatheringStep.tsx, Onboarding.tsx
  • LinkedIn enrichment — src/openhuman/learning/linkedin_enrichment.rs, schemas.rs
  • Profile → prompt — src/openhuman/context/prompt.rs (USER.md → PROFILE.md)
  • Tool filtering — src/openhuman/agent/harness/instructions.rs, channels/runtime/startup.rs
  • Controller registry — src/core/all.rs

Test plan

  • cargo test --lib -- linkedin_enrichment — 5 regex tests pass
  • cargo check — clean build
  • tsc --noEmit — clean typecheck
  • CLI test: openhuman learning linkedin_enrichment — the full pipeline ran successfully: it scraped a real LinkedIn profile, the LLM summarised it, and PROFILE.md was written
  • Manual: complete onboarding flow with Gmail connected, verify context gathering step shows progress
  • Manual: verify orchestrator agent no longer shows Composio tools in its prompt
  • Manual: verify skills_agent still has access to Composio tools

Summary by CodeRabbit

  • New Features

    • Apify integration: start actors, view run status, and fetch results.
    • LinkedIn enrichment: discover LinkedIn profiles from Gmail and save profile summaries.
    • New onboarding context‑gathering step with staged progress.
  • Improvements

    • Skills onboarding now centers on Gmail connection with clearer states and flow.
    • Added "Run Apify Actors" capability to the catalog.
    • Identity/profile uses PROFILE.md instead of USER.md.
    • Composio toolkit icons render without an extra size wrapper.

…ing steps

- Removed the old `toolkitMeta.ts` file and replaced it with a new `toolkitMeta.tsx` file that includes updated metadata handling for Composio toolkits, enhancing the integration with React components.
- Updated the `ComposioConnectModal` to directly render icons without additional markup, streamlining the component structure.
- Modified the `Skills` page to utilize the new icon rendering method, improving consistency across the application.
- Enhanced the onboarding process by introducing a new `ContextGatheringStep` component, which gathers user context from connected integrations, improving the onboarding experience.
- Updated the `SkillsStep` to reflect changes in toolkit connection handling and display, ensuring a smoother user interaction during onboarding.
…d status retrieval

- Added new tools for running Apify actors and fetching their run statuses, enhancing automation capabilities.
- Updated the integration schema to include an `apify` toggle for user configuration, allowing for flexible integration management.
- Enhanced the onboarding experience by modifying the SkillsStep to focus on Gmail integration, streamlining user interactions.
- Improved documentation and comments for clarity on the new Apify functionalities and their usage.
- Introduced a new `linkedin_enrichment` module for enriching user profiles by scraping LinkedIn data from Gmail.
- Implemented the `run_linkedin_enrichment` function to handle the enrichment pipeline, including Gmail search, scraping via Apify, and data persistence.
- Added controller schemas for the learning domain, enabling integration with the existing controller framework.
- Updated the `all.rs` file to register the new learning controllers and schemas, enhancing the overall functionality of the learning system.
- Added functionality to generate a PROFILE.md file from scraped LinkedIn data, summarizing user profiles for agent context.
- Updated the `run_linkedin_enrichment` function to write PROFILE.md to the workspace, enhancing data persistence.
- Introduced helper functions for rendering and summarizing LinkedIn profiles, improving the overall enrichment process.
- Ensured minimal PROFILE.md creation even when scraping fails, maintaining essential user context.
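The PROFILE.md fallback behaviour described above can be sketched roughly as follows; the struct fields and function names here are hypothetical illustrations, not the actual schemas.rs definitions:

```rust
/// Hypothetical scraped-profile fields; the real schema lives in schemas.rs.
struct ScrapedProfile {
    name: Option<String>,
    headline: Option<String>,
    summary: Option<String>,
}

/// Render PROFILE.md from whatever data is available, falling back to a
/// URL-only stub when scraping failed, mirroring the "minimal PROFILE.md
/// even when scraping fails" behaviour.
fn render_profile_md(profile_url: &str, profile: Option<&ScrapedProfile>) -> String {
    let mut md = String::from("# Profile\n\n");
    md.push_str(&format!("LinkedIn: {}\n", profile_url));
    if let Some(p) = profile {
        if let Some(name) = &p.name {
            md.push_str(&format!("\n## {}\n", name));
        }
        if let Some(headline) = &p.headline {
            md.push_str(&format!("\n{}\n", headline));
        }
        if let Some(summary) = &p.summary {
            md.push_str(&format!("\n{}\n", summary));
        }
    }
    md
}
```

Every field is optional, so a partially scraped profile still produces usable agent context rather than an error.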
…t pipeline

- Updated the ContextGatheringStep to integrate a new pipeline for LinkedIn enrichment, replacing the previous Gmail profile fetching stages.
- Implemented a progress animation and logging for the enrichment process, improving user feedback during data retrieval.
- Refactored stage definitions to align with the new pipeline structure, enhancing clarity and maintainability.
- Introduced error handling and status updates for each stage of the enrichment process, ensuring robust user experience.
…and flexibility

- Introduced helper functions `tool_instructions_preamble` and `append_tool_entry` to streamline the construction of tool instructions.
- Updated `build_tool_instructions` to utilize the new helper functions, improving code readability and maintainability.
- Added `build_tool_instructions_filtered` to allow for generating instructions from a filtered list of tools, enhancing flexibility in tool usage.
- Adjusted the startup process to use the filtered instructions, ensuring only relevant tools are included in the system prompt.
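
A minimal sketch of the filtered-instructions idea, with stand-in types (the real ToolDesc, preamble text, and helper signatures in instructions.rs will differ):

```rust
/// Minimal stand-ins for the harness types; names are assumptions.
#[derive(PartialEq)]
enum ToolCategory {
    Skill,
    General,
}

struct ToolDesc {
    name: &'static str,
    category: ToolCategory,
    blurb: &'static str,
}

fn tool_instructions_preamble() -> String {
    "You have access to the following tools:\n".to_string()
}

fn append_tool_entry(out: &mut String, tool: &ToolDesc) {
    out.push_str(&format!("- {}: {}\n", tool.name, tool.blurb));
}

/// Build instructions from only the tools that pass `keep`, so e.g. the
/// orchestrator prompt can exclude ToolCategory::Skill entries while the
/// skills_agent keeps them.
fn build_tool_instructions_filtered(
    tools: &[ToolDesc],
    keep: impl Fn(&ToolDesc) -> bool,
) -> String {
    let mut out = tool_instructions_preamble();
    for tool in tools {
        if keep(tool) {
            append_tool_entry(&mut out, tool);
        }
    }
    out
}
```

Sharing the preamble and per-entry helpers keeps the filtered and unfiltered builders from drifting apart.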

coderabbitai Bot commented Apr 13, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 23bd8de5-e16c-49da-8a58-b633d8ddfe94

📥 Commits

Reviewing files that changed from the base of the PR and between e97643e and 1cfcd89.

📒 Files selected for processing (6)
  • .github/workflows/test.yml
  • app/src/components/composio/toolkitMeta.tsx
  • app/src/pages/onboarding/Onboarding.tsx
  • src/api/config.rs
  • src/openhuman/learning/linkedin_enrichment.rs
  • src/openhuman/learning/schemas.rs

📝 Walkthrough


Adds a LinkedIn enrichment onboarding step (Gmail → Apify → LLM + memory), implements Apify integration tools and config, migrates Composio toolkit metadata/icons to a React .tsx module and updates icon rendering, and replaces USER.md bootstrap handling with PROFILE.md across prompts/tests.

Changes

  • Composio frontend — app/src/components/composio/toolkitMeta.tsx, app/src/components/composio/ComposioConnectModal.tsx, app/src/pages/Skills.tsx, app/src/pages/__tests__/Skills.composio-catalog.test.tsx: New React toolkitMeta.tsx with branded React-node icons and a fallback; removed the explicit <span className="text-lg"> wrappers and updated the Skills page and tests to use icons directly and include additional fallback names.
  • Onboarding UI — app/src/pages/onboarding/Onboarding.tsx, app/src/pages/onboarding/steps/SkillsStep.tsx, app/src/pages/onboarding/steps/ContextGatheringStep.tsx: Added ContextGatheringStep; reworked SkillsStep for Gmail-focused Composio onboarding with a modal and connectedSources persistence; advanced the onboarding flow and added logging and backend onboarding-complete calls.
  • Apify integration (backend) — src/openhuman/integrations/apify.rs, src/openhuman/integrations/mod.rs, src/openhuman/integrations/types.rs, src/openhuman/tools/ops.rs: Added Apify tools (run actor, get status, get results), unit tests, pricing entry support, and conditional tool registration behind the integrations.apify toggle.
  • Learning pipeline & controller — src/openhuman/learning/linkedin_enrichment.rs, src/openhuman/learning/schemas.rs, src/openhuman/learning/mod.rs, src/core/all.rs: New LinkedIn enrichment pipeline (Gmail mining → Apify scraping → optional LLM summary → workspace/memory persistence), controller schema and registration, and wired the learning namespace into the registry.
  • Prompts / identity / workspace — src/openhuman/context/prompt.rs, src/openhuman/context/channels_prompt.rs, src/openhuman/subconscious/prompt.rs, src/openhuman/workspace/ops.rs, src/openhuman/channels/tests/*, src/openhuman/channels/runtime/startup.rs: Removed USER.md from bootstrap; introduced/expected PROFILE.md in workspace and tests; adjusted prompt injection and startup logic to exclude skill tools from the system prompt.
  • Tool instructions refactor — src/openhuman/agent/harness/instructions.rs, src/openhuman/agent/harness/mod.rs: Extracted a shared preamble and per-tool append helpers; added build_tool_instructions_filtered and re-exported it for filtered instruction generation.
  • Catalog & capabilities — src/openhuman/about_app/catalog.rs: Added the capability entry skills.run_apify_actors to the capabilities catalog.
  • Tests / helpers — src/openhuman/channels/tests/common.rs, src/openhuman/channels/tests/identity.rs, src/openhuman/channels/tests/prompt.rs: Test workspace helper now writes PROFILE.md; tests updated to assert PROFILE.md presence/absence and content.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Frontend
    participant Backend
    participant Composio as Gmail/Composio
    participant Apify as Apify API
    participant LLM as LLM Service
    participant Memory as Memory Store

    User->>Frontend: Trigger ContextGatheringStep / run enrichment
    Frontend->>Backend: callCoreRpc("openhuman.learning_linkedin_enrichment")
    activate Backend
    Backend->>Composio: Search Gmail for linkedin.com
    Composio-->>Backend: Email messages (possible profile URLs)
    Backend->>Backend: Extract canonical LinkedIn URL
    alt profile URL found
        Backend->>Apify: POST run actor (scrape profile)
        Apify-->>Backend: Run ID / status
        Backend->>Apify: Poll run status
        Apify-->>Backend: Final status + results
        Backend->>LLM: Optional summarize profile
        LLM-->>Backend: Summary or error
        Backend->>Memory: Store profile/summary
        Memory-->>Backend: Ack
    else no URL or scrape failed
        Backend->>Backend: Persist URL-only fallback
    end
    Backend-->>Frontend: {profile_url, profile_data, log}
    deactivate Backend
    Frontend->>User: Display progress/results, enable Continue
sequenceDiagram
    actor User
    participant SkillsStep
    participant ComposioModal
    participant Integrations as useComposioIntegrations

    User->>SkillsStep: Click "Connect Gmail" card
    SkillsStep->>ComposioModal: Open modal for gmail
    ComposioModal->>ComposioModal: Run OAuth flow
    ComposioModal-->>SkillsStep: onClose (connected)
    SkillsStep->>Integrations: Re-fetch integrations state
    Integrations-->>SkillsStep: connected=true
    SkillsStep->>User: Show Connected badge, enable Continue
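
The Apify polling step in the first diagram can be sketched as a loop with an injected status source, so the control flow is testable offline; the RunStatus variants are assumptions, and the real code would call the Apify run-status endpoint and sleep between attempts:

```rust
/// Terminal and in-flight states an Apify run can reach (names are assumptions).
#[derive(Clone, Copy, PartialEq, Debug)]
enum RunStatus {
    Running,
    Succeeded,
    Failed,
}

/// Poll a status source until it reaches a terminal state or attempts run out.
/// The status source is injected as a closure so the loop itself can be
/// exercised without network access.
fn poll_until_done(
    mut fetch_status: impl FnMut() -> RunStatus,
    max_attempts: usize,
) -> Option<RunStatus> {
    for _ in 0..max_attempts {
        match fetch_status() {
            RunStatus::Running => continue, // real code: sleep, then re-poll
            terminal => return Some(terminal),
        }
    }
    None // gave up while the run was still in progress
}
```

Returning `None` on exhaustion lets the caller distinguish "timed out" from "run failed" and persist the URL-only fallback accordingly.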

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐰 I hopped in, nose a-tingle at code so spry,

Gmail traces led to Apify in the sky,
Scraped profiles tucked in Memory’s den,
Onboarding’s wiser — I’ll hop back again. 🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main features introduced: onboarding Gmail integration and LinkedIn profile enrichment pipeline.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openhuman/context/prompt.rs (1)

355-375: ⚠️ Potential issue | 🟠 Major

Align identity file usage across agent and channel prompts.

The main agent prompt no longer injects USER.md (replaced with PROFILE.md), but channels_prompt.rs (line 41) still lists USER.md in bootstrap_files, and subconscious/prompt.rs still loads identity context from USER.md. Additionally, workspace/ops.rs (line 11) still includes the default USER.md content.

If USER.md is being phased out in favor of PROFILE.md, update channels_prompt.rs and subconscious/prompt.rs to use PROFILE.md and remove the USER.md default from workspace/ops.rs. If channels and subconscious intentionally retain USER.md while main agents use PROFILE.md, document this separation in code comments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/context/prompt.rs` around lines 355 - 375, The repo is
inconsistent: main prompt code now uses PROFILE.md but channels_prompt.rs's
bootstrap_files and subconscious/prompt.rs still reference USER.md and
workspace/ops.rs still provides a USER.md default; update channels_prompt.rs
(bootstrap_files) and subconscious/prompt.rs (where identity is loaded) to
reference "PROFILE.md" instead of "USER.md", and remove the USER.md default
content from workspace/ops.rs (or if the intent is to keep both, add a clear
code comment in channels_prompt.rs and subconscious/prompt.rs documenting why
channels/subconscious retain USER.md while main agents use PROFILE.md). Ensure
you update the relevant constants/arrays and any calls that read or inject the
identity file to use "PROFILE.md" (or add the explanatory comment) so file usage
is consistent.
🧹 Nitpick comments (5)
app/src/pages/onboarding/steps/ContextGatheringStep.tsx (1)

113-121: Add debug logging for pipeline execution.

Per coding guidelines, add namespaced debug logs for new flows to aid tracing.

🔧 Suggested improvement
 async function runPipeline() {
+  console.debug('[onboarding:context-gathering] starting enrichment pipeline');
   // Mark all stages as active (pipeline runs as one call).
   setStageStatuses(prev => ({ ...prev, 'gmail-search': 'active' }));

   try {
     const raw = await callCoreRpc<unknown>({
       method: 'openhuman.learning_linkedin_enrichment',
     });
+    console.debug('[onboarding:context-gathering] RPC completed', { raw });
     const result = unwrapCliEnvelope<EnrichmentResult>(raw);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx` around lines 113 -
121, The runPipeline function is missing namespaced debug logging for tracing;
add debug logs (using the project's logger or a namespaced console.debug) at key
points in runPipeline: before starting the pipeline (after setStageStatuses),
immediately before and after calling
callCoreRpc('openhuman.learning_linkedin_enrichment'), and after
unwrapCliEnvelope(EnrichmentResult) to log the unwrapped result or error
context; reference the runPipeline function and the
callCoreRpc/unwrapCliEnvelope calls when inserting these concise, namespaced
debug messages to aid tracing.
src/openhuman/learning/linkedin_enrichment.rs (1)

512-546: Consider reusing MemoryClient instance.

Both persist_linkedin_profile and persist_linkedin_url_only create separate MemoryClient::new_local() instances. Since both are called from the same pipeline, consider passing the client as a parameter or creating it once in run_linkedin_enrichment.

This is minor since the pipeline runs once during onboarding.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/learning/linkedin_enrichment.rs` around lines 512 - 546,
persist_linkedin_profile and persist_linkedin_url_only each call
MemoryClient::new_local(), causing duplicate client creation; change the design
to create a single MemoryClient in run_linkedin_enrichment and pass a reference
or owned client into persist_linkedin_profile and persist_linkedin_url_only
(e.g., add a parameter like memory: &MemoryClient or memory: MemoryClient),
update their signatures and call sites in run_linkedin_enrichment accordingly,
and remove the MemoryClient::new_local() calls from those functions so they
reuse the shared client instance.
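The refactor this comment suggests — constructing one client in run_linkedin_enrichment and threading it through both persist helpers — looks roughly like the sketch below; MemoryClient here is a hypothetical stand-in, not the project's actual client:

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the project's MemoryClient.
struct MemoryClient {
    stored: RefCell<Vec<String>>,
}

impl MemoryClient {
    fn new_local() -> Self {
        MemoryClient { stored: RefCell::new(Vec::new()) }
    }
    fn store(&self, entry: &str) {
        self.stored.borrow_mut().push(entry.to_string());
    }
}

/// Both persist paths now borrow one client instead of each constructing
/// their own via MemoryClient::new_local().
fn persist_linkedin_profile(memory: &MemoryClient, summary: &str) {
    memory.store(summary);
}

fn persist_linkedin_url_only(memory: &MemoryClient, url: &str) {
    memory.store(url);
}

/// Simplified pipeline tail: create the client once, pick the persist path.
fn run_linkedin_enrichment_sketch(profile_summary: Option<&str>, url: &str) -> MemoryClient {
    let memory = MemoryClient::new_local(); // created once for the whole pipeline
    match profile_summary {
        Some(summary) => persist_linkedin_profile(&memory, summary),
        None => persist_linkedin_url_only(&memory, url),
    }
    memory
}
```

Passing `&MemoryClient` keeps the helpers' ownership story simple and makes them trivial to test with a shared instance.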
app/src/components/composio/toolkitMeta.tsx (1)

293-305: Inconsistency between CATALOG and KNOWN_COMPOSIO_TOOLKITS.

The CATALOG includes both slug variants (googlecalendar / google_calendar, googledrive / google_drive, googlesheets / google_sheets), but KNOWN_COMPOSIO_TOOLKITS only includes one variant for each. This could cause display issues if the backend returns the alternate variant.

Consider including both variants in KNOWN_COMPOSIO_TOOLKITS or documenting that the list is non-exhaustive:

♻️ Suggested fix
 export const KNOWN_COMPOSIO_TOOLKITS = Object.freeze([
   'gmail',
   'googlecalendar',
+  'google_calendar',
   'googledrive',
+  'google_drive',
   'notion',
   'github',
   'slack',
   'linear',
   'facebook',
   'google_sheets',
+  'googlesheets',
   'instagram',
   'reddit',
 ]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/composio/toolkitMeta.tsx` around lines 293 - 305, The
KNOWN_COMPOSIO_TOOLKITS array is missing alternate slug variants present in
CATALOG (e.g., google_calendar vs googlecalendar, google_drive vs googledrive,
google_sheets vs googlesheets); update KNOWN_COMPOSIO_TOOLKITS to include both
slug variants for each toolkit or explicitly document that the list is
non‑exhaustive and used only as a hint. Locate the KNOWN_COMPOSIO_TOOLKITS
constant and add the alternate strings (google_calendar, google_drive,
google_sheets and any other duplicate variants found in CATALOG) so the frontend
can handle either slug returned by the backend.
app/src/pages/onboarding/Onboarding.tsx (2)

135-160: Add debug logging for context completion flow.

The function handles onboarding completion correctly with good error handling. Consider adding entry debug logging for traceability.

🔧 Suggested improvement
 const handleContextNext = async () => {
+  console.debug('[onboarding] handleContextNext: completing onboarding', {
+    connectedSources: draft.connectedSources,
+  });
   await setOnboardingTasks({
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/Onboarding.tsx` around lines 135 - 160, Add entry
debug logging at the start of handleContextNext to trace the context completion
flow: log a clear message (including relevant state like draft.connectedSources
or a minimal marker) before calling setOnboardingTasks, and add similar debug
logs before the userApi.onboardingComplete call and before
setOnboardingCompletedFlag to aid tracing; use the existing logging mechanism
(console.debug/console.log or the project logger) and include the function name
handleContextNext in each message to make logs searchable.

130-133: Add debug logging for new flow entry point.

Per coding guidelines, new flows should have substantial development-oriented logs. Consider adding a namespaced debug log when handleSkillsNext is invoked.

🔧 Suggested improvement
 const handleSkillsNext = async (connectedSources: string[]) => {
+  console.debug('[onboarding] handleSkillsNext called', { connectedSources });
   setDraft(prev => ({ ...prev, connectedSources }));
   handleNext();
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/Onboarding.tsx` around lines 130 - 133, Add a
namespaced debug log at the start of the handleSkillsNext function so developer
telemetry records when this new flow entry point is invoked; specifically,
inside handleSkillsNext (which calls setDraft and handleNext) log a clear
namespaced message (e.g., "onboarding:handleSkillsNext") along with the
connectedSources payload and any relevant draft state before calling
setDraft/handleNext to aid debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src/components/composio/toolkitMeta.tsx`:
- Line 1: Prettier formatting failed in
app/src/components/composio/toolkitMeta.tsx; run the project's Prettier (and
ESLint autofix) in the app workspace, format this file (and any changed files)
and re-run linting so the file (toolkitMeta.tsx) adheres to the code style
rules, then stage and commit the formatted changes before pushing.

In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx`:
- Line 1: Run Prettier on the ContextGatheringStep.tsx file to resolve
formatting issues reported by the pipeline; specifically run `npm run format` or
`npx prettier --write` in the app workspace, then stage the formatted changes
for commit. Ensure the exported component/function ContextGatheringStep (and any
imports at top of the file) are correctly formatted per project Prettier rules
before pushing.
- Around line 96-111: The effect in useEffect checks hasGmail and synchronously
calls setStageStatuses, setStageDetails, and setFinished which triggers the
react-hooks/set-state-in-effect lint rule; instead, move the "skipped"
derivation out of the effect (derive skipped statuses from STAGES/hasGmail
during render) or wrap the state updates in a microtask so they are async (e.g.,
schedule via setTimeout/queueMicrotask) and keep the existing early-return
behavior in useEffect; update the logic around ranRef, hasGmail, STAGES,
setStageStatuses, setStageDetails, setFinished, and runPipeline accordingly so
no synchronous setState occurs directly in the effect body.

In `@app/src/pages/onboarding/steps/SkillsStep.tsx`:
- Around line 63-76: The step currently hard-codes displayToolkits to Gmail and
computes connectedCount from all connections; update it to derive
displayToolkits from useComposioIntegrations().toolkits by selecting the Gmail
toolkit (or an empty array when not present) so the UI reflects the backend
allowlist, then compute connectedCount and connectedSources only by iterating
connectionByToolkit for the slugs in displayToolkits (use the toolkit.slug to
filter), and change the loading/unavailable logic to show a retry/unavailable
card when composioError is set rather than always rendering an actionable Gmail
card; ensure composioToolkitMeta('gmail') is only used to map metadata for a
toolkit that exists in toolkits before adding to displayToolkits.

In `@src/core/all.rs`:
- Around line 138-139: The new learning controllers registered via
controllers.extend(crate::openhuman::learning::all_learning_registered_controllers())
lack a user-facing description because namespace_description() does not return a
description for the "learning" namespace; update the namespace_description
function to include a descriptive entry for "learning" (and any duplicate
namespace_description match arm referenced later around the other occurrence) so
CLI/help discovery shows a human-readable description for the learning RPC
surface—locate namespace_description and add a case for "learning" (or the exact
namespace string used when registering via all_learning_registered_controllers)
with a short explanatory string.

In `@src/openhuman/agent/harness/instructions.rs`:
- Around line 4-16: The formatting failure is caused by long string literals in
function tool_instructions_preamble(); run cargo fmt to auto-fix or manually
wrap/break the long s.push_str(...) calls into shorter multi-line string
literals (use concatenated or raw/multi-line strings) so they adhere to rustfmt
rules, updating the s.push_str(...) invocations around the code block and
"CRITICAL"/"Example" paragraphs to the shorter, formatted forms suggested in the
review.

In `@src/openhuman/channels/runtime/startup.rs`:
- Around line 220-229: The system prompt still includes Skill-category tool
descriptions because build_system_prompt(...) is being called with the full
tool_descs list; before constructing tool_descs (or before calling
build_system_prompt), filter tools_registry the same way you did for the
appended instruction block: create a non-skill collection (e.g., reuse
non_skill_tools/non_skill_refs logic) and build tool_descs only from those
non-skill tools so build_system_prompt(...) will not receive or include
Skill-category entries like Composio; alternatively, remove Skill entries from
the existing tool_descs vector prior to calling build_system_prompt.

In `@src/openhuman/learning/linkedin_enrichment.rs`:
- Line 1: Run rustfmt on the repository and fix formatting in this module: run
`cargo fmt --all` (and then `cargo check`) and commit the changes; specifically
ensure src/openhuman/learning/linkedin_enrichment.rs is reformatted to comply
with rustfmt rules (fix imports, spacing, line breaks, and doc comment alignment
for the LinkedIn enrichment module and any functions/impls within it) so cargo
fmt no longer reports failures.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 47534480-d111-45b8-a4b5-4902dfdc16de

📥 Commits

Reviewing files that changed from the base of the PR and between ec138c8 and 6bf5e48.

📒 Files selected for processing (22)
  • app/src/components/composio/ComposioConnectModal.tsx
  • app/src/components/composio/toolkitMeta.ts
  • app/src/components/composio/toolkitMeta.tsx
  • app/src/pages/Skills.tsx
  • app/src/pages/__tests__/Skills.composio-catalog.test.tsx
  • app/src/pages/onboarding/Onboarding.tsx
  • app/src/pages/onboarding/steps/ContextGatheringStep.tsx
  • app/src/pages/onboarding/steps/SkillsStep.tsx
  • src/core/all.rs
  • src/openhuman/about_app/catalog.rs
  • src/openhuman/agent/harness/instructions.rs
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/channels/runtime/startup.rs
  • src/openhuman/config/schema/tools.rs
  • src/openhuman/context/prompt.rs
  • src/openhuman/integrations/apify.rs
  • src/openhuman/integrations/mod.rs
  • src/openhuman/integrations/types.rs
  • src/openhuman/learning/linkedin_enrichment.rs
  • src/openhuman/learning/mod.rs
  • src/openhuman/learning/schemas.rs
  • src/openhuman/tools/ops.rs
💤 Files with no reviewable changes (1)
  • app/src/components/composio/toolkitMeta.ts

Comment thread app/src/components/composio/toolkitMeta.tsx
@@ -0,0 +1,315 @@
/**


⚠️ Potential issue | 🟡 Minor

Address Prettier formatting issues.

The pipeline reports Prettier code style issues. Run npm run format or npx prettier --write on this file before merging.

As per coding guidelines: "Run Prettier and ESLint formatting/linting in the app workspace before merging."

🧰 Tools
🪛 GitHub Actions: Type Check

[warning] 1-1: Prettier reported code style issues in this file during --check.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx` at line 1, Run
Prettier on the ContextGatheringStep.tsx file to resolve formatting issues
reported by the pipeline; specifically run `npm run format` or `npx prettier
--write` in the app workspace, then stage the formatted changes for commit.
Ensure the exported component/function ContextGatheringStep (and any imports at
top of the file) are correctly formatted per project Prettier rules before
pushing.

Comment thread app/src/pages/onboarding/steps/ContextGatheringStep.tsx
Comment thread app/src/pages/onboarding/steps/SkillsStep.tsx
Comment thread src/core/all.rs
Comment thread src/openhuman/agent/harness/instructions.rs
Comment thread src/openhuman/channels/runtime/startup.rs
Comment thread src/openhuman/learning/linkedin_enrichment.rs
…bility

- Refactored the `BrandIcon` component to streamline its props definition, enhancing code clarity.
- Consolidated SVG path definitions in various icons for better readability and maintainability.
- Updated the `ContextGatheringStep` to simplify the RPC call syntax, improving code conciseness.
- Enhanced logging in the LinkedIn enrichment process for clearer tracking of Gmail searches and scraping stages.
…l filtering, and quality fixes

- Replace USER.md with PROFILE.md across all prompt paths: channels_prompt.rs,
  subconscious/prompt.rs, workspace/ops.rs bootstrap, and channel tests
- Remove Composio tool description from main agent system prompt (tool_descs)
  so skills_agent is the only agent that sees Skill-category tools
- Add "learning" namespace description for CLI help discovery
- Fix react-hooks/set-state-in-effect: wrap synchronous setState in
  queueMicrotask in ContextGatheringStep
- Derive SkillsStep displayToolkits from backend allowlist with error/retry UI
- Add KNOWN_COMPOSIO_TOOLKITS alternate slug variants (google_calendar, etc.)
- Add namespaced debug logging to onboarding handlers and pipeline
- Deduplicate MemoryClient creation in linkedin_enrichment persist functions

@coderabbitai Bot left a comment


Actionable comments posted: 9

♻️ Duplicate comments (1)
app/src/pages/onboarding/steps/SkillsStep.tsx (1)

71-77: ⚠️ Potential issue | 🟠 Major

The loading state is still unreachable here.

displayToolkits always includes 'gmail' while composioLoading is true, so the displayToolkits.length === 0 branch never renders. Users still get an actionable Gmail card before the allowlist fetch has actually resolved.

Also applies to: 125-129

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/steps/SkillsStep.tsx` around lines 71 - 77, The
current filter always includes 'gmail' while composioLoading is true, so
displayToolkits never becomes empty; change the logic so that while
composioLoading is true you return an empty array and only compute
ONBOARDING_SLUGS.filter(...).map(...) after composioLoading is false (i.e., set
displayToolkits = composioLoading ? [] : ONBOARDING_SLUGS.filter(slug =>
backendToolkits.map(t=>t.toLowerCase()).includes(slug)).map(slug =>
composioToolkitMeta(slug))). Apply the same fix to the other occurrence (lines
referenced around composioLoading/backendToolkits/composioToolkitMeta).
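The gating described in this finding can be reduced to a pure helper; a minimal sketch, assuming the slugs and backend allowlist are plain string arrays (`computeDisplayToolkits` is a hypothetical name, not the component's real API):

```typescript
// Hedged sketch of the suggested fix: while the Composio allowlist is still
// loading, expose no toolkits so the empty/loading branch can actually
// render; only filter the onboarding slugs once the fetch has resolved.
function computeDisplayToolkits(
  composioLoading: boolean,
  onboardingSlugs: string[],
  backendToolkits: string[],
): string[] {
  if (composioLoading) return [];
  const allowed = new Set(backendToolkits.map(t => t.toLowerCase()));
  return onboardingSlugs.filter(slug => allowed.has(slug));
}
```

With this shape, `computeDisplayToolkits(true, ['gmail'], [...])` is always empty, so the `displayToolkits.length === 0` loading branch becomes reachable.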
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src/components/composio/toolkitMeta.tsx`:
- Around line 335-340: The fallback currently builds name by only uppercasing
the first character of key; update the logic in toolkitMeta.tsx (the block that
computes name from key and returns { slug, name, description }) to split key on
both '_' and '-' into tokens, title-case each token (capitalize first letter and
lowercase the rest), join tokens with spaces to form the human-friendly name,
and use that name in the returned object and description (leave slug as key).
Ensure this replaces the single-char uppercase logic so inputs like
"google_calendar" or "hubspot-contacts" render as "Google Calendar" and "Hubspot
Contacts".

In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx`:
- Around line 125-129: The console.debug call in ContextGatheringStep.tsx is
logging PII via result.profile_url; change the console.debug invocation that
references result.profile_url (the enrichment result logging block) to avoid
exposing the full URL—either log a boolean hasProfileUrl (e.g.,
result.profile_url != null) or a redacted form (e.g., mask domain/path) instead
of result.profile_url, and keep the other fields (logLines, hasProfileData)
intact; update the log message key/name to reflect the change (e.g.,
hasProfileUrl) so the frontend never prints raw profile_url.
- Around line 116-130: The runPipeline function currently waits for a single
long callCoreRpc call and only calls applyLogToStages after completion, so
update runPipeline to consume progress incrementally: use a streaming/paginated
RPC or poll the core for progress events (instead of the current callCoreRpc
one-shot) and, as each chunk/event arrives, parse it into the same log format
and call applyLogToStages(...) to update setStageStatuses in real time; keep the
initial setStageStatuses(prev => ({...prev,'gmail-search':'active'})) but also
update stages based on partial logs, and ensure the cleanup/error branch closes
the stream and finalizes statuses after the final envelope is received
(referencing runPipeline, callCoreRpc, applyLogToStages, setStageStatuses,
unwrapCliEnvelope).

In `@app/src/pages/onboarding/steps/SkillsStep.tsx`:
- Around line 107-109: The privacy text in the SkillsStep component
(SkillsStep.tsx) is inaccurate because onboarding now sends data to remote
services (see src/openhuman/learning/linkedin_enrichment.rs which calls an Apify
actor and backend LLM summarization); update the paragraph text rendered in that
<p> to remove “never leaves your device” and instead clearly state that certain
data may be processed by third‑party/remote services for enrichment and
summarization (e.g., “Some data will be sent to our backend and trusted
third‑party services for enrichment and summarization; it is handled according
to our privacy policy”), so the copy matches the actual data flow.

In `@src/openhuman/context/channels_prompt.rs`:
- Around line 46-51: Add debug/tracing logs around the PROFILE.md branch in the
prompt assembly: when checking if workspace_dir.join("PROFILE.md").is_file() (in
the function/module that calls inject_workspace_file), emit a debug/trace log
indicating "PROFILE.md found, injecting into prompt" before calling
inject_workspace_file(prompt, workspace_dir, "PROFILE.md", max_chars_per_file)
and emit a matching debug/trace log "PROFILE.md not found, skipping injection"
in the else branch so callers can see the decision; use the project's
tracing/log macro consistent with other logs in channels_prompt.rs (e.g., debug!
or trace!) and include workspace_dir/display or filename and max_chars_per_file
for richer context.

In `@src/openhuman/learning/linkedin_enrichment.rs`:
- Around line 130-140: In the Err(e) branch of the scrape (inside
linkedin_enrichment.rs) you need to append a PROFILE.md stage entry to
result.log so the UI reflects that the PROFILE.md fallback was attempted and can
show write failures: after calling write_profile_md_url_only(config, &url) push
either a success message like "PROFILE.md written (URL-only fallback)" to
result.log on Ok, or push a failure message like "PROFILE.md write failed:
{err}" when it returns Err (and retain the existing tracing::warn). Also, after
calling persist_linkedin_url_only(mem, &url).await, capture its Result and, on
Err, push a corresponding failure message into result.log so write/persist
failures are surfaced to the UI; reference the functions
write_profile_md_url_only and persist_linkedin_url_only and the result.log
vector to locate where to add these pushes.
- Line 71: The tracing::info! call that currently logs the full LinkedIn URL
(tracing::info!(url = %url, "[linkedin_enrichment] found LinkedIn profile URL"))
and other similar info-level logs must be changed to avoid emitting PII: lower
the level to debug or trace, and replace the full URL/slug with a redacted
identifier or non-PII metric (e.g., a boolean, count, or a deterministic short
hash/prefix), e.g. log url_redacted or url_hash instead of %url and/or log
found_profile = true; update the similar tracing calls throughout the
linkedin_enrichment module (the other tracing::info!/tracing::debug! uses noted
in the review) to follow the same pattern so no full profile identifiers appear
in logs.

In `@src/openhuman/subconscious/prompt.rs`:
- Around line 3-4: Update the module-level doc comment in prompt.rs to
accurately state where each identity file is loaded from: specify that SOUL.md
is read from the resolved prompts directory and PROFILE.md is read from the
workspace root (instead of implying both come from the workspace); edit the top
doc comment that currently mentions SOUL.md and PROFILE.md so it clearly
differentiates the two locations and mirrors the behavior implemented in the
functions that load those files.
- Around line 156-162: Add structured debug/trace logging around the PROFILE.md
branch in the identity context: when calling load_file_excerpt(workspace_dir,
"PROFILE.md") log a trace/debug event with a stable prefix (e.g. target
"subconscious::prompt" and field action="profile_load") that includes whether
the file was loaded or skipped and relevant context (workspace_dir and the
length or existence boolean of profile); also log the path attempted and any
error info if load_file_excerpt returns an Err variant. Place logs immediately
before/after the load_file_excerpt call and before the ctx.push_str calls so the
behavior of load_file_excerpt, ctx.push_str, and the PROFILE.md branch is
observable.

---

Duplicate comments:
In `@app/src/pages/onboarding/steps/SkillsStep.tsx`:
- Around line 71-77: The current filter always includes 'gmail' while
composioLoading is true, so displayToolkits never becomes empty; change the
logic so that while composioLoading is true you return an empty array and only
compute ONBOARDING_SLUGS.filter(...).map(...) after composioLoading is false
(i.e., set displayToolkits = composioLoading ? [] : ONBOARDING_SLUGS.filter(slug
=> backendToolkits.map(t=>t.toLowerCase()).includes(slug)).map(slug =>
composioToolkitMeta(slug))). Apply the same fix to the other occurrence (lines
referenced around composioLoading/backendToolkits/composioToolkitMeta).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8a133e8d-191f-49ea-9ba4-66c0502c4f1b

📥 Commits

Reviewing files that changed from the base of the PR and between 6bf5e48 and b0c37af.

📒 Files selected for processing (14)
  • app/src/components/composio/toolkitMeta.tsx
  • app/src/pages/onboarding/Onboarding.tsx
  • app/src/pages/onboarding/steps/ContextGatheringStep.tsx
  • app/src/pages/onboarding/steps/SkillsStep.tsx
  • src/core/all.rs
  • src/openhuman/agent/harness/instructions.rs
  • src/openhuman/channels/runtime/startup.rs
  • src/openhuman/channels/tests/common.rs
  • src/openhuman/channels/tests/identity.rs
  • src/openhuman/channels/tests/prompt.rs
  • src/openhuman/context/channels_prompt.rs
  • src/openhuman/learning/linkedin_enrichment.rs
  • src/openhuman/subconscious/prompt.rs
  • src/openhuman/workspace/ops.rs
✅ Files skipped from review due to trivial changes (1)
  • src/openhuman/channels/tests/identity.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/agent/harness/instructions.rs

Comment on lines +335 to +340
  // Fallback: title-case the slug and bucket it under "Other".
  const name = key.charAt(0).toUpperCase() + key.slice(1);
  return {
    slug: key,
    name,
    description: `Composio integration for ${name}.`,

⚠️ Potential issue | 🟡 Minor

Humanize fallback slugs before rendering them.

This fallback only uppercases the first character, so unknown slugs like google_calendar or hubspot-contacts render as raw backend identifiers instead of the “title-cased” names promised in the module docs. Split on _/- and title-case each token before building name.

Possible fix
-  const name = key.charAt(0).toUpperCase() + key.slice(1);
+  const name = key
+    .split(/[_-]+/)
+    .filter(Boolean)
+    .map(part => part.charAt(0).toUpperCase() + part.slice(1))
+    .join(' ');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/composio/toolkitMeta.tsx` around lines 335 - 340, The
fallback currently builds name by only uppercasing the first character of key;
update the logic in toolkitMeta.tsx (the block that computes name from key and
returns { slug, name, description }) to split key on both '_' and '-' into
tokens, title-case each token (capitalize first letter and lowercase the rest),
join tokens with spaces to form the human-friendly name, and use that name in
the returned object and description (leave slug as key). Ensure this replaces
the single-char uppercase logic so inputs like "google_calendar" or
"hubspot-contacts" render as "Google Calendar" and "Hubspot Contacts".

Comment on lines +116 to +130
  async function runPipeline() {
    console.debug('[onboarding:context] runPipeline started');
    // Mark all stages as active (pipeline runs as one call).
    setStageStatuses(prev => ({ ...prev, 'gmail-search': 'active' }));

    try {
      console.debug('[onboarding:context] calling learning_linkedin_enrichment');
      const raw = await callCoreRpc<unknown>({ method: 'openhuman.learning_linkedin_enrichment' });
      const result = unwrapCliEnvelope<EnrichmentResult>(raw);
      console.debug('[onboarding:context] enrichment result', {
        profileUrl: result.profile_url,
        logLines: result.log.length,
        hasProfileData: result.profile_data !== null,
      });
      applyLogToStages(result.log, result.profile_url);

⚠️ Potential issue | 🟠 Major

This does not provide live stage progress yet.

The UI waits for one long callCoreRpc() and only calls applyLogToStages() after the RPC returns. During the entire Apify/LLM run, the step stays on 'gmail-search', so users never see the stage transitions this screen is trying to represent. If live progress is part of the feature, this needs streamed progress events or polling rather than post-hoc log parsing.

Also applies to: 150-187

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx` around lines 116 -
130, The runPipeline function currently waits for a single long callCoreRpc call
and only calls applyLogToStages after completion, so update runPipeline to
consume progress incrementally: use a streaming/paginated RPC or poll the core
for progress events (instead of the current callCoreRpc one-shot) and, as each
chunk/event arrives, parse it into the same log format and call
applyLogToStages(...) to update setStageStatuses in real time; keep the initial
setStageStatuses(prev => ({...prev,'gmail-search':'active'})) but also update
stages based on partial logs, and ensure the cleanup/error branch closes the
stream and finalizes statuses after the final envelope is received (referencing
runPipeline, callCoreRpc, applyLogToStages, setStageStatuses,
unwrapCliEnvelope).
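One way to realize the incremental-progress suggestion is a small polling loop; this is a hedged sketch with hypothetical names (`fetchProgress`, `PollResult`), not the real core RPC surface. Each poll returns the full log so far, and only the unseen tail is fed to the existing stage-mapping logic:

```typescript
// Hypothetical progress shape: the full log so far plus a completion flag.
type PollResult = { log: string[]; done: boolean };

// Poll a progress source until done, forwarding only newly-arrived log
// lines so applyLogToStages-style handlers can update stages in real time.
async function pollPipeline(
  fetchProgress: () => Promise<PollResult>,
  onLogLines: (lines: string[]) => void,
  intervalMs = 500,
): Promise<void> {
  let seen = 0;
  for (;;) {
    const { log, done } = await fetchProgress();
    if (log.length > seen) {
      onLogLines(log.slice(seen)); // forward only the unapplied tail
      seen = log.length;
    }
    if (done) return;
    await new Promise(res => setTimeout(res, intervalMs));
  }
}
```

The error/cleanup branch in the component would still finalize statuses once the loop resolves or rejects.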

Comment on lines +125 to +129
      console.debug('[onboarding:context] enrichment result', {
        profileUrl: result.profile_url,
        logLines: result.log.length,
        hasProfileData: result.profile_data !== null,
      });

⚠️ Potential issue | 🟠 Major

Don’t log the full LinkedIn URL in the browser console.

result.profile_url contains user-identifying data, and this debug log ships it straight to the frontend console. Log hasProfileUrl or a redacted form instead.

Possible fix
       console.debug('[onboarding:context] enrichment result', {
-        profileUrl: result.profile_url,
+        hasProfileUrl: result.profile_url !== null,
         logLines: result.log.length,
         hasProfileData: result.profile_data !== null,
       });

As per coding guidelines: "Never log secrets, raw JWTs, API keys, or full PII in TypeScript/React app code; redact or omit sensitive fields".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx` around lines 125 -
129, The console.debug call in ContextGatheringStep.tsx is logging PII via
result.profile_url; change the console.debug invocation that references
result.profile_url (the enrichment result logging block) to avoid exposing the
full URL—either log a boolean hasProfileUrl (e.g., result.profile_url != null)
or a redacted form (e.g., mask domain/path) instead of result.profile_url, and
keep the other fields (logLines, hasProfileData) intact; update the log message
key/name to reflect the change (e.g., hasProfileUrl) so the frontend never
prints raw profile_url.
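A redaction helper along the lines this comment suggests might look like the following sketch (the `redactUrl` name and return shape are hypothetical): report presence as a boolean and mask the path, which carries the identifying slug, while keeping the origin for debugging.

```typescript
// Redact a profile URL for debug logging: presence flag plus a masked form
// that keeps the origin but drops the identifying path.
function redactUrl(url: string | null): { hasProfileUrl: boolean; masked: string | null } {
  if (url === null) return { hasProfileUrl: false, masked: null };
  try {
    const u = new URL(url);
    return { hasProfileUrl: true, masked: `${u.origin}/…` };
  } catch {
    // Still confirm presence without echoing an unparseable raw value.
    return { hasProfileUrl: true, masked: '<unparseable>' };
  }
}
```

The component's `console.debug` could then log `redactUrl(result.profile_url)` instead of the raw field.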

Comment on lines 107 to +109
        <p className="text-stone-600 text-sm">
-         OpenHuman no longer installs local QuickJS skills during onboarding. You can connect
-         channels and Composio integrations later from the Integrations page once setup is
-         complete.
+         Connect your Gmail so OpenHuman can learn about you and build context for your agent. Your
+         data is saved locally and never leaves your device.

⚠️ Potential issue | 🟠 Major

This privacy copy is now inaccurate.

The new onboarding flow sends data through remote services (src/openhuman/learning/linkedin_enrichment.rs calls an Apify actor and backend LLM summarization), so “never leaves your device” is no longer true. Please reword this to match the actual data flow before release.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/steps/SkillsStep.tsx` around lines 107 - 109, The
privacy text in the SkillsStep component (SkillsStep.tsx) is inaccurate because
onboarding now sends data to remote services (see
src/openhuman/learning/linkedin_enrichment.rs which calls an Apify actor and
backend LLM summarization); update the paragraph text rendered in that <p> to
remove “never leaves your device” and instead clearly state that certain data
may be processed by third‑party/remote services for enrichment and summarization
(e.g., “Some data will be sent to our backend and trusted third‑party services
for enrichment and summarization; it is handled according to our privacy
policy”), so the copy matches the actual data flow.

Comment on lines +46 to +51
    // PROFILE.md — generated by the onboarding enrichment pipeline (e.g.
    // LinkedIn scrape). Not bundled; only exists after the user completes
    // the context-gathering onboarding step.
    if workspace_dir.join("PROFILE.md").is_file() {
        inject_workspace_file(prompt, workspace_dir, "PROFILE.md", max_chars_per_file);
    }

⚠️ Potential issue | 🟠 Major

Add structured debug logs for PROFILE.md inject/skip branch.

The new optional PROFILE.md path introduces an important branch decision, but there’s no trace/debug signal to troubleshoot onboarding-context state when prompts are assembled.

🔧 Proposed patch
-    if workspace_dir.join("PROFILE.md").is_file() {
+    if workspace_dir.join("PROFILE.md").is_file() {
+        tracing::debug!(
+            target: "openhuman::context::channels_prompt",
+            profile_present = true,
+            max_chars_per_file,
+            "[context][channels_prompt] injecting optional PROFILE.md"
+        );
         inject_workspace_file(prompt, workspace_dir, "PROFILE.md", max_chars_per_file);
+    } else {
+        tracing::debug!(
+            target: "openhuman::context::channels_prompt",
+            profile_present = false,
+            "[context][channels_prompt] skipping optional PROFILE.md"
+        );
     }

As per coding guidelines "Add substantial, development-oriented logs on new/changed flows; include logs at ... branch decisions ... Use log/tracing at debug or trace level for development-oriented diagnostics in Rust."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/context/channels_prompt.rs` around lines 46 - 51, Add
debug/tracing logs around the PROFILE.md branch in the prompt assembly: when
checking if workspace_dir.join("PROFILE.md").is_file() (in the function/module
that calls inject_workspace_file), emit a debug/trace log indicating "PROFILE.md
found, injecting into prompt" before calling inject_workspace_file(prompt,
workspace_dir, "PROFILE.md", max_chars_per_file) and emit a matching debug/trace
log "PROFILE.md not found, skipping injection" in the else branch so callers can
see the decision; use the project's tracing/log macro consistent with other logs
in channels_prompt.rs (e.g., debug! or trace!) and include workspace_dir/display
or filename and max_chars_per_file for richer context.


    let profile_url = match search_gmail_for_linkedin(config).await {
        Ok(Some(url)) => {
            tracing::info!(url = %url, "[linkedin_enrichment] found LinkedIn profile URL");

⚠️ Potential issue | 🟠 Major

Redact profile identifiers from these logs.

These log lines emit the full LinkedIn URL, username slug, and workspace path at info level. That leaks user PII into routine logs and keeps development diagnostics noisier than they need to be. Log booleans/counts or a redacted identifier instead, and keep this flow at debug/trace.

As per coding guidelines: "Never log secrets, API keys, JWTs, credentials, or full PII in Rust logs; redact or omit sensitive fields" and "Use log/tracing at debug or trace level for development-oriented diagnostics in Rust".

Also applies to: 99-99, 177-177, 451-455, 463-466

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/learning/linkedin_enrichment.rs` at line 71, The tracing::info!
call that currently logs the full LinkedIn URL (tracing::info!(url = %url,
"[linkedin_enrichment] found LinkedIn profile URL")) and other similar
info-level logs must be changed to avoid emitting PII: lower the level to debug
or trace, and replace the full URL/slug with a redacted identifier or non-PII
metric (e.g., a boolean, count, or a deterministic short hash/prefix), e.g. log
url_redacted or url_hash instead of %url and/or log found_profile = true; update
the similar tracing calls throughout the linkedin_enrichment module (the other
tracing::info!/tracing::debug! uses noted in the review) to follow the same
pattern so no full profile identifiers appear in logs.

Comment thread src/openhuman/learning/linkedin_enrichment.rs
Comment on lines +3 to 4
//! Injects OpenClaw identity context (SOUL.md, PROFILE.md) so the local model
//! reasons as the agent, not a generic evaluator.

⚠️ Potential issue | 🟡 Minor

Doc comment is inaccurate about SOUL.md location.

Line 143 implies both files are loaded from the workspace, but SOUL.md is loaded from the resolved prompts directory while PROFILE.md is loaded from workspace root. Please adjust wording to avoid misleading future changes.

As per coding guidelines "Add concise rustdoc / code comments where the flow is not obvious; update ... docs when ... user-visible behavior change."

Also applies to: 143-144

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/subconscious/prompt.rs` around lines 3 - 4, Update the
module-level doc comment in prompt.rs to accurately state where each identity
file is loaded from: specify that SOUL.md is read from the resolved prompts
directory and PROFILE.md is read from the workspace root (instead of implying
both come from the workspace); edit the top doc comment that currently mentions
SOUL.md and PROFILE.md so it clearly differentiates the two locations and
mirrors the behavior implemented in the functions that load those files.

Comment on lines +156 to 162
    // PROFILE.md lives in the workspace root (not prompts dir) — it's
    // generated by the onboarding enrichment pipeline, not bundled.
    if let Some(profile) = load_file_excerpt(workspace_dir, "PROFILE.md") {
        ctx.push_str("## User Profile\n\n");
        ctx.push_str(&profile);
        ctx.push_str("\n\n");
    }

⚠️ Potential issue | 🟠 Major

Add debug/trace logging for PROFILE.md load path in identity context.

This new optional identity branch should emit structured debug signals (loaded vs skipped) to make subconscious prompt composition diagnosable.

🔧 Proposed patch
-    if let Some(profile) = load_file_excerpt(workspace_dir, "PROFILE.md") {
+    if let Some(profile) = load_file_excerpt(workspace_dir, "PROFILE.md") {
+        tracing::debug!(
+            target: "openhuman::subconscious::prompt",
+            profile_present = true,
+            "[subconscious][prompt] injecting PROFILE.md into identity context"
+        );
         ctx.push_str("## User Profile\n\n");
         ctx.push_str(&profile);
         ctx.push_str("\n\n");
+    } else {
+        tracing::trace!(
+            target: "openhuman::subconscious::prompt",
+            profile_present = false,
+            "[subconscious][prompt] PROFILE.md absent; using available identity context only"
+        );
     }

As per coding guidelines "Add substantial, development-oriented logs on new/changed flows; include logs at ... branch decisions ... and error handling paths" and "Use log/tracing at debug or trace level ... with stable prefixes."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/subconscious/prompt.rs` around lines 156 - 162, Add structured
debug/trace logging around the PROFILE.md branch in the identity context: when
calling load_file_excerpt(workspace_dir, "PROFILE.md") log a trace/debug event
with a stable prefix (e.g. target "subconscious::prompt" and field
action="profile_load") that includes whether the file was loaded or skipped and
relevant context (workspace_dir and the length or existence boolean of profile);
also log the path attempted and any error info if load_file_excerpt returns an
Err variant. Place logs immediately before/after the load_file_excerpt call and
before the ctx.push_str calls so the behavior of load_file_excerpt,
ctx.push_str, and the PROFILE.md branch is observable.

…unction

- Consolidated debug logging in the handleContextNext function to improve clarity and reduce verbosity.
- Removed unnecessary line breaks for a more concise code structure.

@coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/src/pages/onboarding/Onboarding.tsx (1)

155-163: ⚠️ Potential issue | 🟠 Major

Don’t finish onboarding if the source-of-truth flag failed to persist.

This block treats setOnboardingCompletedFlag(true) as best-effort, but the comment says it is the source of truth. If it fails, onComplete?.() still runs and the user can leave onboarding only to be forced back into it on the next launch. ContextGatheringStep.tsx:199-205 already surfaces onNext() errors, so this failure should be propagated instead of swallowed.

Proposed fix
-    try {
-      await setOnboardingCompletedFlag(true);
-    } catch {
-      console.warn('[onboarding] Failed to persist onboarding_completed to core config');
-    }
+    try {
+      await setOnboardingCompletedFlag(true);
+    } catch (e) {
+      console.warn('[onboarding] Failed to persist onboarding_completed to core config');
+      throw e;
+    }
 
     onComplete?.();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/Onboarding.tsx` around lines 155 - 163, The
onboarding completion handler currently swallows errors from
setOnboardingCompletedFlag(true) and always calls onComplete, allowing the flow
to finish even when the source-of-truth write failed; change handleContextNext
so that errors from setOnboardingCompletedFlag are not caught/suppressed (or
rethrow after logging) and only call onComplete when setOnboardingCompletedFlag
resolves successfully—i.e., remove the empty catch or throw the caught error so
setOnboardingCompletedFlag and onComplete behavior is consistent with
ContextGatheringStep’s onNext error propagation (referencing
setOnboardingCompletedFlag and onComplete).
♻️ Duplicate comments (1)
src/openhuman/learning/linkedin_enrichment.rs (1)

490-494: ⚠️ Potential issue | 🟠 Major

Redact the LinkedIn URL from this Apify invocation log.

This debug event still emits the full profile_url, which is user PII. Keep the actor name, but replace the URL with a redacted/hash field or a non-PII boolean/counter.

As per coding guidelines: "Never log secrets, API keys, JWTs, credentials, or full PII in Rust logs; redact or omit sensitive fields" and "Use log/tracing at debug or trace level for development-oriented diagnostics in Rust".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/learning/linkedin_enrichment.rs` around lines 490 - 494, The
tracing::debug! call that logs LINKEDIN_SCRAPER_ACTOR currently emits the full
profile_url (PII); change it to avoid logging raw URLs by replacing profile_url
with either a redacted value or a stable non-PII hash/count: compute a short
hash (e.g., SHA-256 truncated) or a boolean/counter in the surrounding function
(e.g., inside linkedin_enrichment invocation) and log that instead, keeping
actor = LINKEDIN_SCRAPER_ACTOR and a descriptive message but never include the
full profile_url string in the tracing::debug! invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/learning/linkedin_enrichment.rs`:
- Around line 102-103: The current code silently discards memory client init
errors by calling build_memory_client().ok(); change this to handle the Err
explicitly: call build_memory_client() and match or use if let Err(e) to log a
sanitized debug/warn with context (including which operation failed and brief
error details) before falling back, and only set memory to None on explicit
fallback; reference build_memory_client() and the memory variable so the branch
logs the error and decision to continue without memory persistence.
- Around line 39-45: The current LinkedInEnrichmentResult makes UI state fragile
by encoding stage state into freeform text in the log; change the RPC contract
to add a typed stages field (e.g. pub stages: Vec<EnrichmentStage>) alongside
keeping pub log: Vec<String> for display-only use. Define EnrichmentStage with
unique identifiers and typed status (e.g. pub struct EnrichmentStage { pub id:
String, pub status: StageStatus, pub detail: Option<String> } and a
serde-friendly enum StageStatus { Success, Failed, Skipped }), update all places
that currently push human-readable messages into result.log (and any functions
like the LinkedIn enrichment runner that mutate LinkedInEnrichmentResult) to
instead append structured EnrichmentStage entries to result.stages while
preserving the original log entries only for display, and ensure the new types
are serializable/deserializable for the RPC boundary.
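The first inline comment above (handling `build_memory_client()` errors explicitly instead of `.ok()`) amounts to the following shape. The `MemoryClient` type and the failing constructor here are hypothetical stand-ins; only the match-and-log pattern is the point, and the real code would use `tracing::warn!` with sanitized detail rather than `eprintln!`.

```rust
// Hypothetical stand-ins for the real memory client and constructor;
// the error-handling shape is what the review comment asks for.
struct MemoryClient;

fn build_memory_client() -> Result<MemoryClient, String> {
    Err("memory backend unreachable".to_string())
}

fn init_memory() -> Option<MemoryClient> {
    match build_memory_client() {
        Ok(client) => Some(client),
        Err(e) => {
            // Real code: tracing::warn!(error = %e, "memory client init failed; continuing without persistence");
            eprintln!("memory client init failed, continuing without persistence: {e}");
            None
        }
    }
}

fn main() {
    let memory = init_memory();
    println!("memory enabled: {}", memory.is_some());
}
```

Compared with `.ok()`, this makes the fallback decision observable in logs while keeping the same `Option<MemoryClient>` downstream contract.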

---

Outside diff comments:
In `@app/src/pages/onboarding/Onboarding.tsx`:
- Around line 155-163: The onboarding completion handler currently swallows
errors from setOnboardingCompletedFlag(true) and always calls onComplete,
allowing the flow to finish even when the source-of-truth write failed; change
handleContextNext so that errors from setOnboardingCompletedFlag are not
caught/suppressed (or rethrow after logging) and only call onComplete when
setOnboardingCompletedFlag resolves successfully—i.e., remove the empty catch or
throw the caught error so setOnboardingCompletedFlag and onComplete behavior is
consistent with ContextGatheringStep’s onNext error propagation (referencing
setOnboardingCompletedFlag and onComplete).

---

Duplicate comments:
In `@src/openhuman/learning/linkedin_enrichment.rs`:
- Around line 490-494: The tracing::debug! call that logs LINKEDIN_SCRAPER_ACTOR
currently emits the full profile_url (PII); change it to avoid logging raw URLs
by replacing profile_url with either a redacted value or a stable non-PII
hash/count: compute a short hash (e.g., SHA-256 truncated) or a boolean/counter
in the surrounding function (e.g., inside linkedin_enrichment invocation) and
log that instead, keeping actor = LINKEDIN_SCRAPER_ACTOR and a descriptive message
but never include the full profile_url string in the tracing::debug! invocation.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b37d3408-c3ba-4682-8118-4188c0d76bed

📥 Commits

Reviewing files that changed from the base of the PR and between b0c37af and e97643e.

📒 Files selected for processing (2)
  • app/src/pages/onboarding/Onboarding.tsx
  • src/openhuman/learning/linkedin_enrichment.rs

… results

- Introduced a new `EnrichmentStage` struct to capture detailed results for each stage of the LinkedIn enrichment process.
- Updated the `LinkedInEnrichmentResult` to include a vector of stages, allowing for structured reporting of success, failure, and skipped stages.
- Improved error handling and logging throughout the enrichment pipeline, ensuring better traceability of issues during execution.
- Adjusted the API response to include stage results, enhancing the frontend's ability to display detailed enrichment outcomes.
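The stage contract this commit describes can be sketched as below. Field and type names follow the review comment; the serde derives the RPC boundary would need are omitted here to keep the example dependency-free, so treat this as a shape sketch rather than the merged implementation.

```rust
// Typed stage reporting alongside the display-only log, per the review comment.
#[derive(Debug, Clone, PartialEq)]
enum StageStatus { Success, Failed, Skipped }

#[derive(Debug, Clone)]
struct EnrichmentStage {
    id: String,
    status: StageStatus,
    detail: Option<String>,
}

#[derive(Debug, Default)]
struct LinkedInEnrichmentResult {
    log: Vec<String>,             // human-readable, display-only
    stages: Vec<EnrichmentStage>, // structured state the UI keys off
}

impl LinkedInEnrichmentResult {
    fn push_stage(&mut self, id: &str, status: StageStatus, detail: Option<String>) {
        // Keep the freeform log for display, but record the typed stage too.
        self.log.push(format!("{id}: {status:?}"));
        self.stages.push(EnrichmentStage { id: id.to_string(), status, detail });
    }
}

fn main() {
    let mut result = LinkedInEnrichmentResult::default();
    result.push_stage("gmail_search", StageStatus::Success, None);
    result.push_stage("apify_scrape", StageStatus::Skipped, Some("no profile URL found".into()));
    println!("stages recorded: {}", result.stages.len());
}
```

The frontend can then branch on `stages[i].status` by stable `id` instead of pattern-matching freeform log strings.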
… Linux E2E job

- Modified the description and default values in the macOS E2E test input options for clarity.
- Commented out the entire Linux E2E job configuration to prevent execution while maintaining the setup for future use.
@senamakel senamakel merged commit 8057f22 into tinyhumansai:main Apr 13, 2026
7 of 8 checks passed
