Phase A reliability + ADR 0008/0009 personal-PKI direction convergence #2
Open
MakiDevelop wants to merge 16 commits into main
Evidence: pytest -q; ruff check src tests
Constraint: keep pending_only semantics; no changes to auth/principal, RRF, or the health endpoint split
Not-tested: the macOS Docker Desktop VirtioFS + SQLite 3.53.0 bind-mount reproduction on real hardware was not verified from this machine
Evidence: pytest -q; ruff check src tests
Constraint: no new dependency; retry exactly once, and only for transient sqlite3.OperationalError cases
Not-tested: a real macOS Docker Desktop VirtioFS disk I/O error was not replayed directly on this machine
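The retry-once constraint above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `run_with_single_retry`, `is_transient`, and the `TRANSIENT_MARKERS` list are hypothetical names, and which error messages count as transient is an assumption.

```python
import sqlite3
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: message fragments treated as transient. The PR does not list them.
TRANSIENT_MARKERS = ("disk i/o error", "database is locked")


def is_transient(exc: sqlite3.OperationalError) -> bool:
    """Return True when the error message looks like a transient failure."""
    message = str(exc).lower()
    return any(marker in message for marker in TRANSIENT_MARKERS)


def run_with_single_retry(operation: Callable[[], T], recycle: Callable[[], None]) -> T:
    """Run `operation`; on a transient OperationalError, recycle and retry once."""
    try:
        return operation()
    except sqlite3.OperationalError as exc:
        if not is_transient(exc):
            raise  # non-transient errors propagate immediately
        recycle()  # e.g. close and reopen the connection
        return operation()  # a second failure propagates to the caller
```

A second failure is deliberately not caught, which keeps the "retry only once" constraint visible in the control flow.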
Evidence: pytest -q
Constraint: only adjust the runtime-stage loader/symlink; leave the build dependency graph unchanged
Not-tested: docker build/run verification, because the sandbox could not connect to /Users/maki/.docker/run/docker.sock
Evidence: pytest -q; ruff check src tests; pytest -q tests/test_smoke.py tests/test_vec0.py
Constraint: no new dependency; the checkpoint interval is controlled by MH_WAL_CHECKPOINT_INTERVAL_S and defaults to 300 seconds
Not-tested: the actual background task + bind-mount scenario inside a Docker container was not replayed via the docker API on this machine
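A periodic WAL checkpoint of this kind might look like the sketch below. Only the env var name and the 300-second default come from the commit message; `checkpoint_once`, `checkpoint_loop`, and the choice of PASSIVE checkpoint mode are assumptions made for illustration.

```python
import asyncio
import os
import sqlite3


def checkpoint_interval_s() -> float:
    # MH_WAL_CHECKPOINT_INTERVAL_S and the 300 s default are from the commit message.
    return float(os.environ.get("MH_WAL_CHECKPOINT_INTERVAL_S", "300"))


def checkpoint_once(db_path: str) -> tuple:
    """Run one checkpoint; returns SQLite's (busy, log_frames, checkpointed_frames) row."""
    conn = sqlite3.connect(db_path)
    try:
        # Assumption: PASSIVE mode. The PR does not say which mode is used.
        return conn.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()
    finally:
        conn.close()


async def checkpoint_loop(db_path: str, stop: asyncio.Event) -> None:
    """Checkpoint every interval until `stop` is set."""
    interval = checkpoint_interval_s()
    while not stop.is_set():
        try:
            # Wake up early if shutdown is requested during the wait.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            # Interval elapsed: run the blocking checkpoint off the event loop.
            await asyncio.to_thread(checkpoint_once, db_path)
```

Waiting on the stop event rather than sleeping lets shutdown interrupt a long interval immediately.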
PR 1: 4 commits (paginated reindex / aiosqlite recycle / Dockerfile symlink / WAL checkpoint). pytest: 45 passed, 1 skipped; ruff all green.

Latent bugs found by Codex:
- The SqliteStore.open() startup phase is not covered by recycle+retry (follow-up)
- The vector DB connection has no explicit locking (fixed along the way in patch D)

Phase B Dissent verdict: APPROVE WITH MODIFICATIONS
Key point: seal the E26 admin gate (B.2) first, then do the HMAC migration (B.1). Otherwise the hole merely changes from "no gate" to "every valid key is admin". The corrected implementation order is written into Proposal v2, to be ratified in the next session.
Codex Phase A.5 patch C (3e7d2ce) placed the ln -sf before apt-get install curl, so the dpkg post-install trigger reset the libsqlite3.so.0 link back to the system path.

Measured: after building the image, a subprocess that does not inherit LD_LIBRARY_PATH sees SQLite 3.40.1 (the older Debian system version) instead of the expected 3.53.0. Patch C had no effect at all.

Fix: move the ln block after apt-get install (making it the last build step of the runtime stage), so the libsqlite3 symlink is overwritten only after the dpkg trigger has run.

Verification: docker run --rm sh -c 'unset LD_LIBRARY_PATH && python -c "import sqlite3; print(sqlite3.sqlite_version)"'
post-fix: 3.53.0 ✓
post-fix: /lib/aarch64-linux-gnu/libsqlite3.so.0 -> /opt/sqlite/lib/libsqlite3.so.3.53.0 ✓

Constraint: the ln command itself is untouched; only the order of the RUN blocks changes
…ss split

Codex's 3 sub-patches could not be committed because of a sandbox .git/index.lock permission error, so the Architect staged and committed them together (the council report still distinguishes the 3 patches).

Sub-patches:

E.0: _normalize_bm25 inverted-logic latent bug fix (Gemini E39)
- src/memory_hall/storage/sqlite_store.py: 1/(1+abs(s)) → -bm25/(1.0-bm25)
- The better the BM25 score, the higher the normalized output (monotonic with quality)
- RRF uses ranks and so escaped this bug; weighted linear (E) depends strongly on this fix
- tests/test_fts_tokenization.py: rank-order assertions over 5 BM25 inputs

E: Hybrid search switched to a weighted linear combination (E37 Max)
- src/memory_hall/server/app.py search_entries: α·s_lex + (1-α)·s_sem
- α defaults to 0.3, overridable via the MH_HYBRID_ALPHA env var
- Edge cases: semantic_status fail → pure lex / lex empty → pure sem
- RRF kept as legacy (MH_HYBRID_MODE=rrf) for a smooth migration
- score_breakdown gains hybrid_mode + alpha fields so clients can see them
- tests/test_hybrid_search.py added: 3 cases (lexical target / semantic paraphrase / conflict resolution)

F: Liveness /healthz ↔ Readiness /ready split (E36 Max)
- src/memory_hall/server/routes/health.py adds the /v1/healthz endpoint (process alive)
- /v1/ready reuses _health_cache (sub-checks: storage/vector/embedder)
- /v1/health aliases to /v1/ready (backward compat for 6 callers)
- Dockerfile HEALTHCHECK now hits /v1/healthz, to avoid contending with the reindex worker for the embedder
- src/memory_hall/server/app.py auth middleware lets the 3 health endpoints through

Evidence: E36 (Max: liveness/readiness industry convention), E37 (Max: RRF k=60 flattens scores mathematically), E39 (Gemini: BM25 normalize logic inverted)
Pytest: 53 passed, 1 skipped, 4 warnings
Constraint: auth / principal logic untouched (Phase B scope)
Constraint: no new dependency
Not-tested: production deploy verification (the Phase A.5 deploy at 23:49 hit schema migration corruption and was rolled back. The PR 1+2 code stays on the branch until the startup migration issue is debugged tomorrow, then it will be deployed)
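To make the E.0 inversion and the weighted linear formula concrete, here is a small sketch. The two normalization formulas and the α default are taken from the commit message; the function names and the use of `None` to signal a missing lexical or semantic score are assumptions. It relies on SQLite FTS5 `bm25()` returning non-positive scores where more negative means a better match.

```python
from typing import Optional


def normalize_bm25_old(bm25: float) -> float:
    """Pre-fix formula: a better (more negative) score yields a LOWER output."""
    return 1.0 / (1.0 + abs(bm25))


def normalize_bm25(bm25: float) -> float:
    """Post-fix formula: maps bm25 <= 0 into [0, 1), rising with match quality."""
    return -bm25 / (1.0 - bm25)


def hybrid_score(s_lex: Optional[float], s_sem: Optional[float], alpha: float = 0.3) -> float:
    """alpha * s_lex + (1 - alpha) * s_sem, with the two edge cases from the commit."""
    if s_sem is None:  # semantic side failed: pure lexical
        return s_lex if s_lex is not None else 0.0
    if s_lex is None:  # no lexical hit: pure semantic
        return s_sem
    return alpha * s_lex + (1.0 - alpha) * s_sem


# A strong match (-3.0) versus a weak match (-1.0):
print(normalize_bm25_old(-3.0), normalize_bm25_old(-1.0))  # 0.25 0.5  (inverted)
print(normalize_bm25(-3.0), normalize_bm25(-1.0))          # 0.75 0.5  (monotonic)
```

Because RRF consumes only rank positions, the inverted magnitudes never affected it, which matches the note that weighted linear is the mode that depends on this fix.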
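This makes the design philosophy explicit, so that reliability incidents stop dragging in "industry best practices" (k8s liveness/readiness split, weighted linear hybrid with a tuning knob, HMAC + principal registry) that push memhall's complexity toward a production-grade memory platform.

Four north stars, in priority order: associative recall quality > stability > speed > lightness.

An explicit not-doing list + sunset criteria; future PRs must pass the five-question "personal PKI checkup".

Constraint: single user / single deployment / ~10² entries / caller < 10 / everything inside Maki's tailnet
Rejected: writing this as rules/ instead of an ADR | the scope is limited to memhall, so it belongs in an ADR
Directive: any PR that introduces a new config knob / schema field / auth mechanism must cite this ADR
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>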
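Responds to the criterion in ADR 0008: "without benchmark evidence, fall back to RRF".

Scaffold design:
- Synthetic mode: a built-in synthetic test with a 25-entry corpus + 15 queries (including a synonym-group embedder that simulates semantic similarity); it can be rerun with the repo as a regression baseline
- Real-corpus mode: can point at a running memhall + a jsonl query file, so results can be compared once Maki provides real annotated queries

Metrics: MRR / Recall@5 / nDCG@10.
Run cost: ~5 seconds, no external dependencies (only the existing httpx / pytest fixtures).
No new dependency, in line with the "lightweight" requirement of ADR 0008.

Constraint: synthetic results are only directional; the final decision needs a real-corpus benchmark
Directive: scripts/ is a new directory for small experiment scripts; large ops tools do not belong there
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>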
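For readers unfamiliar with the three metrics, a per-query sketch follows. It assumes binary relevance (a document is either relevant or not); how scripts/bench_hybrid.py actually grades relevance is not stated in the commit, and the function names are illustrative. MRR is the mean of `reciprocal_rank` over all queries.

```python
import math
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-gain nDCG: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```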
scripts/bench_hybrid.py synthetic results (25 entries × 15 queries):

mode                     MRR     R@5     nDCG@10
rrf                      0.967   0.928   0.924    ← best or tied for best on all three
weighted_linear(α=0.1)   0.800   0.822   0.813
weighted_linear(α=0.3)   0.867   0.911   0.890    ← the original default
weighted_linear(α=0.5)   0.967   0.906   0.923
weighted_linear(α=0.7)   0.933   0.906   0.910
weighted_linear(α=0.9)   0.967   0.928   0.924

Key failure cases:
- "撞牆" ("hit a wall", a short CJK query): weighted_linear with α≤0.1 scores 0 outright. RRF has a lexical boost (×2.0) for short CJK queries; weighted_linear does not
- "混合排序" ("hybrid ranking"): weighted_linear with α≤0.3 scores 0 across the board

Per ADR 0008 ("without benchmark evidence, fall back to RRF"), the default changes to RRF. The weighted_linear code path is kept (MH_HYBRID_MODE=weighted_linear opt-in) and can be re-evaluated if Maki's real-corpus benchmark later yields evidence for a better α. The weighted_linear tests now opt in explicitly; conftest.app_factory gains a hybrid_mode kwarg.

Tests: 53 passed, 1 skipped
Constraint: the synthetic embedder is a synonym-group bag-of-words and behaves differently from bge-m3
Rejected: deleting the whole weighted_linear code path | kept as opt-in for a future real benchmark
Directive: default = RRF; switching to weighted_linear requires explicit opt-in + benchmark evidence
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
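For reference, reciprocal rank fusion with k=60 can be sketched as below. The k value and the ×2.0 lexical boost for short CJK queries are mentioned in the commits; how memhall applies that boost is not shown, so multiplying the lexical contribution by `lexical_boost` is an assumption, as is the function name.

```python
from typing import Dict, List, Sequence


def rrf_fuse(
    lexical: Sequence[str],
    semantic: Sequence[str],
    k: int = 60,
    lexical_boost: float = 1.0,
) -> List[str]:
    """Fuse two rankings: each list contributes weight / (k + rank) per document."""
    scores: Dict[str, float] = {}
    for rank, doc_id in enumerate(lexical, start=1):
        # Assumption: the boost scales only the lexical contribution.
        scores[doc_id] = scores.get(doc_id, 0.0) + lexical_boost / (k + rank)
    for rank, doc_id in enumerate(semantic, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


print(rrf_fuse(["a", "b"], ["b", "c"]))  # ['b', 'a', 'c']
```

Only rank positions enter the score, so RRF has no α to tune and is insensitive to how raw BM25 or cosine scores are scaled.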
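ADR 0009 implementation. /v1/admin/* is isolated from the shared api_token behind a dedicated admin_token.

Middleware behavior:
- /v1/health* → always public (carried over from ADR 0007)
- /v1/admin/* with admin_token set → admin_token required; api_token is rejected (401)
- /v1/admin/* with admin_token unset → falls back to api_token (backward compatible)
- other paths → existing api_token logic
- admin_token cannot be used on non-admin paths (least privilege in both directions)

Explicitly not doing (per ADR 0008's lightweight personal-PKI stance):
- HMAC + nonce + replay window
- Principal registry / role mapping
- Per-key rotation infra
- 14-day coexistence period / retirement flow after 7 consecutive days of zero bearer writes
- Tailscale ACL committed to the repo (it is infra config; a recommendation in the docs is enough)

External sanity check (2026-04-28):
- The minimal subset of Option E from Codex Phase B Dissent D2
- SuperGrok DeepSearch: no recent incident matching this scenario found worldwide for 2025-2026; a dedicated admin bearer is the community-recommended least-privilege approach; verdict GO

Uniform 401 (never 403) to avoid a token validity oracle.
hmac.compare_digest for constant-time comparison (carried over from ADR 0007).

Tests: full suite 59 passed (53 → 59, 6 new admin gate cases)
Constraint: admin_token and api_token must have different values (operator responsibility; the code does not validate this)
Rejected: using 403 to distinguish "valid api_token used on an admin path" | token validity oracle
Rejected: 14-day coexistence period / retirement flow | there is no old mechanism to retire
Directive: when admin_token is unset, fall back to api_token; never return 401 outright (backward compatible)
Not-tested: a real production deploy (mini Tailscale); the test environment only goes as far as unit tests
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>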
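The middleware rules listed in this commit can be condensed into a single decision function, sketched below. This is an illustration of the described behavior rather than the actual middleware: the `authorize` name, its signature, and the "no token configured means open" branch are assumptions.

```python
import hmac
from typing import Optional


def _matches(presented: str, expected: str) -> bool:
    # Constant-time comparison, as the commit specifies.
    return hmac.compare_digest(presented.encode(), expected.encode())


def authorize(
    path: str,
    bearer: Optional[str],
    api_token: Optional[str],
    admin_token: Optional[str],
) -> int:
    """Return the HTTP status the gate would produce: 200 (allowed) or 401."""
    if path.startswith("/v1/health"):
        return 200  # health endpoints are always public
    if path.startswith("/v1/admin/") and admin_token:
        expected = admin_token  # dedicated admin tier
    else:
        expected = api_token  # non-admin paths, or admin fallback
    if not expected:
        # Assumption: no token configured for this path means an open server.
        return 200
    if bearer is None or not _matches(bearer, expected):
        return 401  # uniform 401, never 403, to avoid a validity oracle
    return 200
```

Because each path has exactly one expected token, a valid admin_token presented on a non-admin path fails the comparison, which gives the two-way least-privilege property without an extra branch.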
Codex review caught two silent fail-opens:
1. [HIGH] admin_token set but api_token unset → non-admin paths fail open. Measured: POST /v1/memory/write returns 201 (no auth)
2. [MEDIUM] admin_token == api_token → silently cancels the two-tier separation. Measured: the same token passes both /v1/admin/audit and /v1/memory/write

Fix (the minimal path Codex suggested):
- Settings gains a _validate_auth_tokens model_validator that fails fast at config load
- Reject "admin_token set + api_token unset"
- Reject "admin_token == api_token"
- Middleware logic unchanged (kept simple; the invariant is guarded at the config layer)

ADR 0009 updates:
- Remove the original "operator responsibility, code does not enforce" hand-wave
- Add a fail-fast invariant section + why 5 lines do not violate ADR 0008
- Add a Round 1 review history trail

Tests: 61 passed (59 → 61, 2 new invariant tests)
Constraint: must fail at config load, not be discovered at runtime
Rejected: adding the invariant check to the middleware | Codex's advice, "don't make the branches more complex", is correct
Directive: an empty-string api_token with admin_token set is also blocked (pydantic falsy check)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
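The two invariants can be illustrated with a dependency-free sketch. The PR implements them as a pydantic model_validator on Settings; the frozen dataclass and the `AuthSettings` name below are stand-ins chosen so the example runs with the standard library only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthSettings:
    """Hypothetical stand-in for the auth-related part of Settings."""

    api_token: Optional[str] = None
    admin_token: Optional[str] = None

    def __post_init__(self) -> None:
        # Invariant 1: an admin tier without a base tier leaves non-admin paths open.
        # The falsy check also rejects an empty-string api_token.
        if self.admin_token and not self.api_token:
            raise ValueError("admin_token is set but api_token is not")
        # Invariant 2: identical tokens silently cancel the two-tier separation.
        if self.admin_token and self.admin_token == self.api_token:
            raise ValueError("admin_token must differ from api_token")


AuthSettings(api_token="api", admin_token="adm")  # valid two-tier config
AuthSettings(api_token="api")                     # valid single-tier config
AuthSettings()                                    # valid: auth-less dev server
```

Validating at construction time means a misconfigured process never starts, so the middleware can stay free of these branches.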
… nit

Codex review round 2 caught a LOW, non-blocking finding: the autouse fixture cleared only MH_API_TOKEN, not MH_ADMIN_TOKEN. With the fail-fast invariant in place, an MH_ADMIN_TOKEN leaking from the shell env into the tests makes Settings() construction fail (14 failures).

Repro: MH_ADMIN_TOKEN=leaked-admin pytest tests/test_auth.py
Fix: the fixture clears one more env var.
Verification: MH_ADMIN_TOKEN=leaked-admin pytest tests/test_auth.py → 16 passed
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
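During the ADR 0008 checkup, Patch F (k8s-style /v1/healthz + /v1/ready split) was marked "frozen rather than reverted". Maki then asked for a fully clean state, so this commit performs the strict revert.

Reasons for reverting:
- mini uses restart: unless-stopped, so an unhealthy health status does not trigger an automatic restart
- A single launchd container (not k8s) needs no liveness/readiness split
- One endpoint carries a lower operational cognitive cost for a personal PKI
- net -48 lines (tests deleted + routes simplified + middleware allowlist simplified)

Behavior changes:
- /v1/health returns to "200 when all sub-checks are ok, 503 when degraded" (the body carries the full status / storage / vector_store / embedder / last_error)
- /v1/healthz and /v1/ready removed
- Dockerfile HEALTHCHECK goes back to hitting /v1/health
- middleware public allowlist shrinks from 3 paths back to 1

Untouched:
- runtime._refresh_health_cache, _health_cache_ttl_s, and the other Phase A.5 PR1 improvements (health sub-check errors no longer swallowed + 60s TTL cache) are kept. These fixed real bugs and are unrelated to Patch F's k8s convention

Tests: 58 passed (61 → 58; removed 3: healthz/alias/dockerfile-uses-healthz)
Constraint: production already runs 0.2.1-pr1 (which includes Patch F), so a rebuild is needed after this revert
Rejected: the hybrid option "keep the endpoints but always return 200" | Maki wants it clean, not mixed
Directive: any future new health endpoint must first file a dissent against ADR 0008's third item, "explicitly not doing"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>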
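The restored single-endpoint behavior amounts to the mapping sketched here. The body field names come from the commit message; the `health_response` function, the boolean sub-check inputs, and the "ok"/"error" value strings are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

SUB_CHECKS = ("storage", "vector_store", "embedder")


def health_response(
    results: Dict[str, bool], last_error: Optional[str] = None
) -> Tuple[int, Dict[str, Optional[str]]]:
    """Return (status_code, body): 200 when every sub-check passes, else 503."""
    all_ok = all(results.get(name, False) for name in SUB_CHECKS)
    body: Dict[str, Optional[str]] = {"status": "ok" if all_ok else "degraded"}
    for name in SUB_CHECKS:
        # A missing sub-check result is treated as a failure.
        body[name] = "ok" if results.get(name, False) else "error"
    body["last_error"] = last_error
    return (200 if all_ok else 503), body
```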
`_client()` reads MH_API_TOKEN via Settings().api_token. When the value is a truthy string it automatically attaches Authorization: Bearer <token>; when it is unset or an empty string, no header is sent (backward compatible with an auth-less dev server).

All four HTTP commands (write / search / get / tail) go through _client(), so no per-command change is needed.

Adds tests/test_cli_auth.py with three cases (token set / unset / empty-string) to pin the truthy semantics.

Fixes the docs-vs-code inconsistency around docs/agent-integration.md: the CLI previously injected no auth, yet the documentation claimed its Bearer rule matched Path B (Codex hit this in a sandbox session and reported it).

Council: E1 (codex-answer.md) E2 (gemini-answer.md) E3 (codex-answer-r2.md) E4 (gemini-answer-r2.md)
Constraint: scope locked to src/memory_hall/cli/main.py + tests/test_cli_auth.py; docs untouched (separate commit)
Rejected: adding a --token CLI flag | inconsistent with the server-side convention (env-only)
Directive: an empty string is treated as falsy rather than checked with is not None, to prevent a malformed "Authorization: Bearer " header
Co-Authored-By: Codex (codex-cli 0.125.0) <noreply@openai.com>
Co-Authored-By: Gemini (gemini-cli 0.39.1) <noreply@google.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
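The truthy-token rule can be shown in isolation with the sketch below. The real `_client()` reads the token through Settings().api_token and builds an HTTP client; the `auth_headers` helper here is hypothetical and reads the environment directly to stay self-contained.

```python
import os
from typing import Dict, Mapping, Optional


def auth_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build request headers: attach a Bearer token only when it is truthy."""
    env = os.environ if env is None else env
    token = env.get("MH_API_TOKEN")
    if token:  # truthy check: None and "" both mean "send no header"
        return {"Authorization": f"Bearer {token}"}
    return {}


print(auth_headers({"MH_API_TOKEN": "secret"}))  # {'Authorization': 'Bearer secret'}
print(auth_headers({"MH_API_TOKEN": ""}))        # {}
print(auth_headers({}))                          # {}
```

An `is not None` check would let the empty string through and produce the malformed "Authorization: Bearer " header that the Directive warns about.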
- Add AGENTS.md (repo root): the entry pointer for cloned agents. Informational tone, no hard directives, which avoids the prompt-hijack risk described in rules/agent-preflight-check.md.
- Add docs/agent-integration.md: a decision tree over the three paths (Path A embedded Python / Path B HTTP+Bearer / Path C mh CLI), with a status legend (✅ verified / ⚠️ partial), last-verified dates, and a failure-mode lookup table.
- README.md gains an agent guide link under the "Three entry points" table.
- examples/shell/write_memory.sh gains the Authorization: Bearer header (the root cause of the "missing bearer token" error the Codex sandbox session hit).
- examples/codex_cli/README.md gains the uv sync install step, the UV_CACHE_DIR sandbox workaround, and an auth section explaining that the CLI reads MH_API_TOKEN automatically.

Background: while trying to write to memhall from a sandboxed session, Codex hit three pitfalls in sequence (bearer auth required / mh not on PATH / sandbox curl unreliable with an auth header) and reported the root cause as memhall's incomplete onboarding docs for sandboxed agents.

Council: E1 (codex-answer.md) E2 (gemini-answer.md) E3 (codex-answer-r2.md) E4 (gemini-answer-r2.md)
Constraint: no code changes (the previous commit handled CLI auth injection)
Directive: AGENTS.md uses an informational tone, to keep a full-auto agent from treating it as a hard directive and rewriting the repo (lesson from the geo-checker 2026-04-11 incident)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Consolidates the entire reliability effort since the 2026-04-20 embed-queue incident into a single PR: Phase A (real bug fixes) + ADR 0008 (making the "lightweight personal PKI" stance explicit) + Phase B PR1 (admin gate) + two over-design reverts.
- MH_ADMIN_TOKEN two-tier bearer + fail-fast invariant), explicitly not doing HMAC / principal registry / 14-day sunset window
- /v1/healthz + /v1/ready split: strict revert, back to a single /v1/health
- scripts/bench_hybrid.py: synthetic + real-corpus mode; future retrieval tuning must be backed by data
- Seven-in-one trail
Production status
Deployed memory-hall:0.2.1 to mini Tailscale:9100; container healthy. Smoke: write / search (RRF) / admin audit (api_token fallback) all green.
Test plan
- pyproject.toml testpaths=tests)
- /v1/health 200 ok, write 201, search hybrid_mode=rrf, admin/audit fallback 200
- MH_ADMIN_TOKEN + restart + verify that the admin gate locks /v1/admin/*
- sync_status='failed' entries (manual reindex)
ADR index
Council artifacts
The full governance evidence is in ~/Documents/agent-council/2026-04-28-memhall-phase-b/:
Notes
🤖 Generated with Claude Code