Skip to content

feat: Hermes self-observation + skill auto-update + knowledge graph#95

Merged
Gradata merged 3 commits into
mainfrom
feat/behavioral-engine
Apr 16, 2026
Merged

feat: Hermes self-observation + skill auto-update + knowledge graph#95
Gradata merged 3 commits into
mainfrom
feat/behavioral-engine

Conversation

@Gradata

@Gradata Gradata commented Apr 16, 2026

Copy link
Copy Markdown
Owner

Summary

  • Self-observation pipeline: SELF_REVIEW_VIOLATION events become lesson candidates (Phase 1.5 in pipeline)
  • Skill auto-update: SKILL.md only regenerates when confidence changes ≥0.05
  • Knowledge graph API: Brain.knowledge_graph() assembles nodes, clusters, contradictions, cross-domain
  • CodeRabbit fixes: 13 findings addressed (8 bugs + 5 nits)

2841 tests passing. All features wired and audit-verified.

Test plan

  • Full test suite passing (2841)
  • All 12 features wiring audit: WIRED with line-number evidence
  • CodeRabbit round 1 (10 findings) + round 2 (13 findings) addressed

Generated with Gradata

Gradata and others added 3 commits April 16, 2026 11:40
After end_session graduation sweep, automatically sync telemetry
metrics and events to Gradata Cloud if the user has authenticated
via `gradata login`. Reads credentials from ~/.gradata/config.toml
or GRADATA_API_KEY env var. Cloud sync never blocks or crashes the
local learning loop — all failures are silently logged.

Co-Authored-By: Gradata <noreply@gradata.ai>
Finding 4 (HIGH): restrict config file/dir permissions (0o600/0o700)
  with startup check for overly permissive files, Windows-safe
Finding 5 (MEDIUM): reject non-HTTPS GRADATA_API_URL (allow localhost)
Finding 11 (MEDIUM): default sync_mode=metrics_only, skip raw content sync
Finding 12 (LOW): sanitize TOML values to prevent config injection

Co-Authored-By: Gradata <noreply@gradata.ai>
…e graph API

Stolen from Hermes agent pattern, adapted for Gradata SDK:

- Self-observation pipeline (Phase 1.5): SELF_REVIEW_VIOLATION events
  become INSTINCT lesson candidates with pending_approval=True and
  source=self_observation. Bypasses diff engine (synthetic pairs produce
  garbage). Deduplicates against existing lessons.
- Skill auto-update: SKILL.md files only regenerate when source rule
  confidence changes by >=0.05. Skips on unparseable old confidence
  (no infinite delta). Adds updated_at timestamp.
- Knowledge graph API: Brain.knowledge_graph() assembles nodes, clusters,
  contradictions, and cross-domain candidates from existing modules.
  Read-only, no state mutation. Public API for dashboards/agents.

2841 tests passing (+10 new).

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Copy link
Copy Markdown
📝 Walkthrough
  • New Public API: Brain.knowledge_graph() method returns structured knowledge-graph with nodes, clusters, contradictions, cross-domain candidates, and stats for dashboards/agents
  • Self-observation pipeline (Phase 1.5): Converts recent SELF_REVIEW_VIOLATION events into lesson candidates with pending_approval=True and agent_type="self_observation", with automatic deduplication against existing lessons
  • Skill auto-update threshold: SKILL.md files regenerate only when source rule confidence changes by ≥0.05; skips regeneration on unparseable prior confidence and records updated_at timestamp
  • Cloud telemetry sync: Best-effort non-blocking cloud sync on brain_end_session() that uploads metrics and optional full events/corrections based on sync_mode config
  • Security fixes: Stricter config file permissions (0o700 directory, 0o600 file on Unix), reject non-HTTPS GRADATA_API_URL (except localhost), sanitize TOML values, and default sync_mode=metrics_only
  • Config parsing: Added _parse_toml_cloud() helper to extract [cloud] section from ~/.gradata/config.toml
  • Extended PipelineResult: New public counters skills_updated and self_observation_candidates track pipeline progress
  • Knowledge graph assembly: New build_knowledge_graph() function reads lessons and optionally enriches with clusters and cross-domain candidates
  • Test coverage: 10 new tests for Phase 1.5 self-observation, skill auto-update thresholds, and knowledge graph behavior; full test suite passing (2841 tests)

Walkthrough

The pull request adds cloud telemetry integration at session end, introduces a knowledge graph API, implements self-observation lesson candidates from review violations, enhances skill file update logic with confidence delta checks, and strengthens config file security through validation and sanitization of API credentials.

Changes

Cohort / File(s) Summary
Cloud Telemetry and Session Management
src/gradata/_core.py
Added _cloud_sync_session() helper to resolve cloud credentials from environment and TOML config, compute telemetry metrics from session corrections, and perform best-effort cloud metrics/event sync via CloudClient.sync_metrics(). Added _parse_toml_cloud() for minimal TOML [cloud] section parsing. Integrated non-blocking cloud sync into brain_end_session() after session emission.
Knowledge Graph and Brain API
src/gradata/brain.py
Added public Brain.knowledge_graph() method that returns a structured graph dictionary by importing and calling build_knowledge_graph() from rule pipeline, with graceful fallback to empty graph on import or execution failure.
Config Security and Validation
src/gradata/cli.py
Enhanced cmd_login with _sanitize_toml_value() to prevent TOML injection, _check_config_permissions() to warn on overly permissive Unix permissions, stricter validation requiring https:// API URLs (except localhost and 127.0.0.1), and automatic permission tightening to 0o700 (directories) and 0o600 (files).
Rule Pipeline Enhancements
src/gradata/enhancements/rule_pipeline.py
Added Phase 1.5 to query recent SELF_REVIEW_VIOLATION events and convert them into pending lesson candidates with agent_type="self_observation". Enhanced _generate_skill_file() to skip rewrites when confidence delta < 0.05 and include updated_at timestamps. Extended PipelineResult with skills_updated and self_observation_candidates counters. Added build_knowledge_graph() function to parse lessons and enrich graph with clusters/contradictions/cross-domain candidates.
Rule Pipeline Tests
tests/test_rule_pipeline.py
Expanded test coverage to directly test _generate_skill_file(), build_knowledge_graph(), self-observation candidate creation, skill confidence delta behavior, and knowledge graph node assembly with optional clustering support.

Sequence Diagrams

sequenceDiagram
    participant Brain
    participant CloudSync as _cloud_sync_session()
    participant CredResolver as Credential Resolver
    participant CloudClient
    participant CloudAPI as Cloud API

    Brain->>CloudSync: Call with session_corrections, all_lessons
    CloudSync->>CredResolver: Resolve GRADATA_API_KEY + ~/.gradata/config.toml
    CredResolver-->>CloudSync: API key (if present)
    CloudSync->>CloudSync: Compute telemetry payload<br/>(rewrite rate, edit distance,<br/>correction density, rule metrics)
    CloudSync->>CloudClient: sync_metrics(telemetry)
    CloudClient->>CloudAPI: POST metrics
    CloudAPI-->>CloudClient: 200 OK
    CloudClient-->>CloudSync: Success
    CloudSync->>CloudSync: Log errors as debug<br/>(never raise/block)
    CloudSync-->>Brain: Return (no exceptions)
    Brain->>Brain: Return sweep result
Loading
sequenceDiagram
    participant Session
    participant DB as Database
    participant Pipeline as Rule Pipeline
    participant Lessons as Lessons Store
    participant Result as PipelineResult

    Session->>DB: Query SELF_REVIEW_VIOLATION<br/>events (current session)
    DB-->>Session: Recent violations
    Session->>Pipeline: Phase 1.5: Convert violations<br/>to Lesson candidates<br/>(state=INSTINCT, confidence=0.40)
    Pipeline->>Lessons: Deduplicate candidates<br/>against existing all_lessons
    Pipeline->>Pipeline: Append new candidates<br/>to all_lessons
    Pipeline->>Result: Increment self_observation_candidates
    Pipeline-->>Session: Updated lessons + stats
    Note over Pipeline,Result: On error: append to<br/>result.errors (no raise)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • PR #42: Directly extends build_knowledge_graph() and rule-pipeline enhancements for knowledge graph construction.
  • PR #86: Modifies the same end-of-session control flow in brain_end_session() by adding post-session operations (health checks vs. telemetry sync).
  • PR #19: Also modifies brain_end_session() implementation in the same critical session lifecycle function.

Suggested labels

feature, security

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and clearly summarizes the three main features being added: self-observation pipeline, skill auto-update threshold logic, and knowledge graph API.
Description check ✅ Passed The description is directly related to the changeset, covering self-observation pipeline, skill auto-update, knowledge graph API, code quality fixes, and test verification.
Docstring Coverage ✅ Passed Docstring coverage is 96.30% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/behavioral-engine

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gradata/_core.py`:
- Around line 1000-1006: The _cloud_sync_session() function currently hardcodes
config_path = Path.home() / ".gradata" / "config.toml" which ignores
GRADATA_CONFIG/--config used by cmd_login(); replace that hardcoded lookup with
the shared config-path resolver used by cmd_login() (call the existing resolver
function used by cmd_login(), e.g., get_config_path() or
resolve_gradata_config()), and then read that resolved path to parse TOML and
populate api_key/api_url/brain_id_from_config so end-of-session sync respects
custom config locations; update references in _cloud_sync_session() (and any
helper code in the same module) to use the resolver instead of the hardcoded
Path.home(...) value.

In `@src/gradata/cli.py`:
- Around line 513-516: The _sanitize_toml_value function is currently mutating
valid credentials by stripping TOML-significant chars; instead of removing
characters like [, ], ", and \, change _sanitize_toml_value to produce a
TOML-safe encoded string (escape quotes/backslashes and preserve content) or
delegate to a TOML encoder (e.g., use toml.dumps / tomlkit or a string-quoting
helper) so credentials like "sk-proj-[abc]" are preserved exactly when written;
update the function to only normalize newlines (or escape them) and to return a
properly escaped/quoted TOML string rather than deleting characters.

In `@src/gradata/enhancements/rule_pipeline.py`:
- Around line 494-500: The node id generation in the graph["nodes"].append uses
a brittle f"{lesson.category}:{lesson.description[:40]}", which can collide;
replace it with a stable full identifier (e.g. call the existing helper
_make_rule_id(lesson) if available, or compute a reproducible hash of
lesson.category + lesson.description, or use a persisted lesson.id) and ensure
the same id scheme is used whenever node ids are produced (inspect other uses in
rule_pipeline.py such as edge creation). Update the id field to use that stable
identifier (and remove the 40-char truncation) so graph consumers keyed by "id"
remain correct.
- Around line 281-285: The SELECT in the rule pipeline uses the wrong column
name; change the query executed by conn.execute that currently selects "data" to
select the SDK payload column "data_json" (same convention used in
src/gradata/_core.py), then parse/deserialize that JSON payload as the code
expects before building result.errors; update the SQL in the block that fetches
rows for session (the conn.execute(...) call using current_session and rows
variable) so it reads "SELECT data_json FROM events WHERE type =
'SELF_REVIEW_VIOLATION' AND session = ? ORDER BY id DESC LIMIT 20" and handle
the JSON content accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c5c3166b-d405-4596-b0a9-44eef6aea027

📥 Commits

Reviewing files that changed from the base of the PR and between e09d1e9 and 482c119.

📒 Files selected for processing (5)
  • src/gradata/_core.py
  • src/gradata/brain.py
  • src/gradata/cli.py
  • src/gradata/enhancements/rule_pipeline.py
  • tests/test_rule_pipeline.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.13)
🧰 Additional context used
📓 Path-based instructions (2)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/cli.py
  • src/gradata/enhancements/rule_pipeline.py
  • src/gradata/_core.py
  • src/gradata/brain.py
tests/**

⚙️ CodeRabbit configuration file

tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.

Files:

  • tests/test_rule_pipeline.py
🪛 GitHub Actions: SDK CI
src/gradata/cli.py

[error] 519-521: ruff check: UP037 Remove quotes from type annotation


[error] 537-542: ruff check: I001 Import block is un-sorted or un-formatted. Organize imports


[error] 547-549: ruff check: SIM102 Use a single if statement instead of nested if statements (combine with and)


[error] 617-620: ruff check: SIM105 Use contextlib.suppress(OSError, AttributeError) instead of try-except-pass


[error] 637-640: ruff check: SIM105 Use contextlib.suppress(OSError, AttributeError) instead of try-except-pass


[error] 646-646: ruff check: F541 f-string without any placeholders. Remove extraneous f prefix

src/gradata/enhancements/rule_pipeline.py

[error] 38-41: ruff check: UP037 Remove quotes from type annotation


[error] 122-122: ruff check: UP037 Remove quotes from type annotation


[error] 305-306: ruff check: I001 Import block is un-sorted or un-formatted. Organize imports


[error] 328-330: ruff check: SIM102 Use a single if statement instead of nested if statements (combine with and)


[error] 335-339: ruff check: I001 Import block is un-sorted or un-formatted. Organize imports


[error] 390-390: ruff check: I001 Import block is un-sorted or un-formatted. Organize imports


[error] 533-533: ruff check: I001 Import block is un-sorted or un-formatted. Organize imports

src/gradata/_core.py

[error] 990-993: ruff check: I001 Import block is un-sorted or un-formatted. Organize imports

src/gradata/brain.py

[error] 1-1: Ruff check failed: Found 67 errors (58 fixable with --fix). Process completed with exit code 1.

Comment thread src/gradata/_core.py
Comment on lines +1000 to +1006
config_path = Path.home() / ".gradata" / "config.toml"
if config_path.is_file():
try:
_cfg = _parse_toml_cloud(config_path)
api_key = api_key or _cfg.get("api_key", "")
api_url = _cfg.get("api_url", "")
brain_id_from_config = _cfg.get("brain_id", "")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use the shared config-path resolver here.

cmd_login() can persist credentials outside ~/.gradata/config.toml via GRADATA_CONFIG / --config, but _cloud_sync_session() only reads the hardcoded default path. That makes login succeed while end-of-session sync silently never sees the saved credentials for those users.

Suggested fix
-        config_path = Path.home() / ".gradata" / "config.toml"
+        config_env = os.environ.get("GRADATA_CONFIG", "")
+        config_path = Path(config_env) if config_env else Path.home() / ".gradata" / "config.toml"

As per coding guidelines, src/gradata/**/*.py: "no hardcoded paths."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_core.py` around lines 1000 - 1006, The _cloud_sync_session()
function currently hardcodes config_path = Path.home() / ".gradata" /
"config.toml" which ignores GRADATA_CONFIG/--config used by cmd_login(); replace
that hardcoded lookup with the shared config-path resolver used by cmd_login()
(call the existing resolver function used by cmd_login(), e.g.,
get_config_path() or resolve_gradata_config()), and then read that resolved path
to parse TOML and populate api_key/api_url/brain_id_from_config so
end-of-session sync respects custom config locations; update references in
_cloud_sync_session() (and any helper code in the same module) to use the
resolver instead of the hardcoded Path.home(...) value.

Comment thread src/gradata/cli.py
Comment on lines +513 to +516
def _sanitize_toml_value(val: str) -> str:
"""Finding 12: strip characters that could inject TOML structure."""
# Remove newlines, brackets, and unbalanced quotes to prevent injection
return val.replace("\n", "").replace("\r", "").replace("[", "").replace("]", "").replace('"', "").replace("\\", "").strip()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't mutate credentials while "sanitizing" TOML.

Deleting [, ], \, and " changes the API key / endpoint bytes before they are persisted. A valid value like sk-proj-[abc] is written back as a different credential, so later cloud auth fails. Escape TOML syntax instead of stripping user/server-supplied characters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/cli.py` around lines 513 - 516, The _sanitize_toml_value function
is currently mutating valid credentials by stripping TOML-significant chars;
instead of removing characters like [, ], ", and \, change _sanitize_toml_value
to produce a TOML-safe encoded string (escape quotes/backslashes and preserve
content) or delegate to a TOML encoder (e.g., use toml.dumps / tomlkit or a
string-quoting helper) so credentials like "sk-proj-[abc]" are preserved exactly
when written; update the function to only normalize newlines (or escape them)
and to return a properly escaped/quoted TOML string rather than deleting
characters.

Comment on lines +281 to +285
rows = conn.execute(
"SELECT data FROM events WHERE type = 'SELF_REVIEW_VIOLATION' "
"AND session = ? ORDER BY id DESC LIMIT 20",
(current_session,),
).fetchall()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Query the real events payload column.

The SDK's events table stores payload JSON in data_json, not data (see the existing queries in src/gradata/_core.py). Against a real DB this SELECT fails, the broad except turns it into result.errors, and Phase 1.5 never materializes any self-observation lessons.

Suggested fix
-                "SELECT data FROM events WHERE type = 'SELF_REVIEW_VIOLATION' "
+                "SELECT data_json FROM events WHERE type = 'SELF_REVIEW_VIOLATION' "
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rows = conn.execute(
"SELECT data FROM events WHERE type = 'SELF_REVIEW_VIOLATION' "
"AND session = ? ORDER BY id DESC LIMIT 20",
(current_session,),
).fetchall()
rows = conn.execute(
"SELECT data_json FROM events WHERE type = 'SELF_REVIEW_VIOLATION' "
"AND session = ? ORDER BY id DESC LIMIT 20",
(current_session,),
).fetchall()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/rule_pipeline.py` around lines 281 - 285, The SELECT
in the rule pipeline uses the wrong column name; change the query executed by
conn.execute that currently selects "data" to select the SDK payload column
"data_json" (same convention used in src/gradata/_core.py), then
parse/deserialize that JSON payload as the code expects before building
result.errors; update the SQL in the block that fetches rows for session (the
conn.execute(...) call using current_session and rows variable) so it reads
"SELECT data_json FROM events WHERE type = 'SELF_REVIEW_VIOLATION' AND session =
? ORDER BY id DESC LIMIT 20" and handle the JSON content accordingly.

Comment on lines +494 to +500
graph["nodes"].append({
"id": f"{lesson.category}:{lesson.description[:40]}",
"description": lesson.description,
"category": lesson.category,
"confidence": lesson.confidence,
"state": lesson.state.name,
"fire_count": getattr(lesson, "fire_count", 0),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Make graph node IDs collision-safe.

f"{lesson.category}:{lesson.description[:40]}" is not unique. Two rules with the same category and 40-character prefix collapse onto the same node ID, which will corrupt any graph consumer that keys by id. Use a stable full identifier instead (_make_rule_id, a hash of the full description, or a persisted lesson ID).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/rule_pipeline.py` around lines 494 - 500, The node
id generation in the graph["nodes"].append uses a brittle
f"{lesson.category}:{lesson.description[:40]}", which can collide; replace it
with a stable full identifier (e.g. call the existing helper
_make_rule_id(lesson) if available, or compute a reproducible hash of
lesson.category + lesson.description, or use a persisted lesson.id) and ensure
the same id scheme is used whenever node ids are produced (inspect other uses in
rule_pipeline.py such as edge creation). Update the id field to use that stable
identifier (and remove the 40-char truncation) so graph consumers keyed by "id"
remain correct.

@Gradata Gradata merged commit 72be40f into main Apr 16, 2026
10 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant