feat(retrieval): add BM25 lexical-search branch, factual AC +0.98 (the prescription both negatives pointed to) by TaskerJang · Pull Request #71 · TaskerJang/doc-graph-agent

TaskerJang · 2026-05-30T14:34:00Z

Background: the prescription both negatives pointed to

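The two negative results, chunk-rerank (#70) and answer prompt v2 (#68, reverted in #69), consistently pointed to the same diagnosis: the bottleneck is neither chunk ranking nor the prompt. It is (1) a candidate pool that is tied to the graph through entity → MENTIONS → chunk, plus (2) chunk content garbled by OCR. Prescription: add BM25 lexical search (Neo4j fulltext, cjk analyzer), which bypasses the graph structure, as one branch of the router.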

Changes

retrieval/bm25_retriever.py (new): top-k chunks via db.index.fulltext.queryNodes → answer, bypassing entity/MENTIONS traversal. Read-only + Lucene escaping + graceful fallback + @track.
retrieval/prompts/bm25_answer_v1.md (new): keeps numbers and proper nouns verbatim from the source; returns a clean N/A.
retrieval/router.py: bm25 branch dispatch + RoutedResult.bm25_result + DEFAULT_ROUTE local → bm25 (router_v3).
retrieval/prompts/router_v3.md (new).
fix bm25_retriever.py: Cypher parameter $query → $search_text (resolves the kwarg collision with `query`, the first positional argument of Neo4jClient.read). Commit 70b3063.
fix kg/neo4j_client.py: _ensure_limit missed a LIMIT preceded by a newline and appended a second LIMIT; it now detects LIMIT with a \bLIMIT\b regex. Commit d009910.
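A minimal sketch of the retriever core described above, not the code in this PR. It assumes a `read(query, **params)` callable with the same shape as `Neo4jClient.read`, chunk nodes with hypothetical `chunk_id` and `text` properties, and illustrative names `escape_lucene`, `bm25_search`, and `BM25_CYPHER`. Only the `chunk_fulltext` index name and the `$search_text` parameter come from this PR.

```python
import logging
import re
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Characters with meaning in Lucene query syntax. Escaping each one
# individually also neutralizes the two-character operators && and ||.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')

# $search_text (not $query) so it cannot collide with read()'s own `query` argument.
BM25_CYPHER = """
CALL db.index.fulltext.queryNodes($index_name, $search_text)
YIELD node, score
RETURN node.chunk_id AS chunk_id, node.text AS text, score
ORDER BY score DESC
LIMIT $top_k
"""


def escape_lucene(text: str) -> str:
    """Backslash-escape Lucene special characters so user text matches literally."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def bm25_search(
    read: Callable[..., list[dict[str, Any]]],
    question: str,
    index_name: str = "chunk_fulltext",
    top_k: int = 5,
) -> list[dict[str, Any]]:
    """Return top-k chunks by BM25 score, or [] if the fulltext call fails."""
    try:
        return read(
            BM25_CYPHER,
            index_name=index_name,
            search_text=escape_lucene(question),
            top_k=top_k,
        )
    except Exception as exc:  # graceful fallback: the caller can still answer N/A
        # Log the real exception: a generic "index missing" message once hid a TypeError.
        logger.warning("BM25 fulltext query failed: %r", exc)
        return []
```

Note that `LIMIT $top_k` sits on its own line, which is the query shape that triggered the double-LIMIT bug fixed in d009910.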

Measurement: `gpt52_p1clean` → `gpt52_p1clean_bm25` (80 QA, gpt-5.2 / claude-haiku judge)

VectorRAG / factual: win

Metric	before	after	Δ
Answer Correctness	2.60	3.58	+0.98
Numeric accuracy	0.424	0.634	+0.210
Entity Coverage	0.408	0.609	+0.201
Avg. response time	10.4s	6.4s	−4.0s

The gap to the BM25 baseline (AC ~4.0) narrowed from 1.4 to 0.42.

GraphRAG: quality up slightly, routing accuracy down (a metric trap)

Metric	before	after
Answer Correctness	2.05	2.30
Routing Accuracy	85%	67.5%

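Per-QA analysis: the 9 graph questions routed to bm25 averaged AC 3.22, versus 2.08 for the 24 routed to local. 38% of the routing-accuracy drop (5 of 13 questions) is a penalty bm25 received for answering correctly: the expected_route labels hard-code the assumption "graph question = graph retrieval". No loss in answer quality.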

Post-merge follow-ups (separate issues)

local MENTIONS bottleneck (~11 multi-doc/intersection questions answered "cannot be confirmed"): RRF fusion or wider bm25 routing.
filter_agg → community stub misrouting (graph_029/030): router keyword fix.
OCR corruption (e.g. ROE 10.8% → 70.8%): P3.
Redefine the eval expected_route labels (based on answer quality).
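RRF is listed above as one option for the local MENTIONS bottleneck. A minimal sketch of standard Reciprocal Rank Fusion (Cormack et al., 2009) over ranked chunk-ID lists, for example local's MENTIONS candidates and bm25's top-k. `rrf_fuse` is a hypothetical helper, and `k=60` is the constant commonly used since the original paper, not a value tuned on this eval.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d), ranks from 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk ids: one list from graph traversal, one from BM25.
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "d"]])
print([doc_id for doc_id, _ in fused])  # ['b', 'c', 'a', 'd']
```

Because RRF uses only ranks, BM25 scores and the graph-side ordering need no score normalization before fusion.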

Faithfulness is excluded from the comparison because the metric is flawed (it measures overlap with the gold answer).

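Prescription from the chunk-rerank negative (#70) diagnosis: bypass the graph traversal / entity-MENTIONS bottleneck with lexical search. Neo4j fulltext (chunk_fulltext) → top-k chunks → answer.
- retrieval/bm25_retriever.py: queryNodes (Lucene/BM25) + Lucene escape + graceful fallback
- retrieval/prompts/bm25_answer_v1.md: chunk-based answers (numbers preserved verbatim, honest N/A)

Rationale: BEIR (Thakur 2021) reports BM25 as a robust out-of-distribution baseline, and on our own 80 QA the BM25 baseline won decisively on factual/numerical questions. Korean requires the cjk analyzer (see the docstring for the index DDL). Not yet wired into the router: wiring follows once the fulltext index is created and a sanity check passes.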
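The actual index DDL lives in the retriever's docstring and is not reproduced in this PR. As a hedged illustration only, a Neo4j 5 fulltext index using the cjk analyzer could look like the string below, assuming a `Chunk` label with a `text` property. `sanity_gate` is a hypothetical helper that expresses "the sanity check passes" as "the expected chunk is the rank-1 hit".

```python
from typing import Any, Callable

# Hypothetical DDL (Neo4j 5 syntax); the `Chunk` label and `text` property are assumptions.
# The cjk analyzer indexes Hangul as overlapping bigrams, so a query such as
# "두산밥캣" can match inside a longer token that carries an attached particle.
CREATE_CHUNK_FULLTEXT = (
    "CREATE FULLTEXT INDEX chunk_fulltext IF NOT EXISTS "
    "FOR (n:Chunk) ON EACH [n.text] "
    "OPTIONS {indexConfig: {`fulltext.analyzer`: 'cjk'}}"
)


def sanity_gate(
    search: Callable[[str], list[dict[str, Any]]],
    query: str,
    expected_chunk_id: str,
) -> bool:
    """Return True only if `expected_chunk_id` is the rank-1 hit for `query`."""
    hits = search(query)
    return bool(hits) and hits[0].get("chunk_id") == expected_chunk_id
```

For the gate reported in the next commit, this would be called with the retriever's search function, the query '두산밥캣 목표주가', and the expected chunk `c0009`.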

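Reroute factual/numerical single-fact questions from local → bm25. Relational/causal questions stay on local via the keyword router, top-N/aggregation stays on t2c, and global questions stay on community, giving a clean delta.
router.py:
- add "bm25" to the Route Literal; import bm25_retrieve
- DEFAULT_ROUTE: local → bm25; ROUTER_PROMPT_PATH: v2 → v3
- RoutedResult.bm25_result field; bm25 accepted by _classify_by_llm validation
- bm25 dispatch in route_and_answer
router_v3.md: add the bm25 branch, reroute the factual examples, 4-way classification criteria.
Gate passed: in the cjk fulltext sanity check, the query '두산밥캣 목표주가' (Doosan Bobcat target price) returned the answer chunk c0009 (target price 80,000 won) at rank 1 (score 11.4). Measurement pending (TRIAL EXPIRED; will run shortly).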
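A minimal sketch of the router changes listed above, assuming the four routes named in this PR (local, t2c, community, bm25) and a classifier that returns a free-text label. `validate_route`, the `handlers` mapping, and the `answer` field are hypothetical; `Route`, `DEFAULT_ROUTE`, `RoutedResult.bm25_result`, and `route_and_answer` are names from the PR, but their real signatures may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Literal, Optional, get_args

# The four routes named in this PR; the real Literal may differ.
Route = Literal["local", "t2c", "community", "bm25"]
DEFAULT_ROUTE: Route = "bm25"  # was "local" before router_v3


@dataclass
class RoutedResult:
    route: Route
    answer: Any  # hypothetical generic field for whichever branch ran
    bm25_result: Optional[Any] = None  # filled only when the bm25 branch ran


def validate_route(label: str) -> Route:
    """Accept a classifier label only if it names a known route; else fall back."""
    cleaned = label.strip().lower()
    if cleaned in get_args(Route):
        return cleaned  # type: ignore[return-value]
    return DEFAULT_ROUTE


def route_and_answer(
    question: str,
    classify: Callable[[str], str],
    handlers: dict[str, Callable[[str], Any]],
) -> RoutedResult:
    """Classify, validate, then dispatch to the handler registered for that route."""
    route = validate_route(classify(question))
    answer = handlers[route](question)
    result = RoutedResult(route=route, answer=answer)
    if route == "bm25":
        result.bm25_result = answer
    return result
```

In this sketch an unparseable label falls back to DEFAULT_ROUTE, so after the switch such questions land on bm25 rather than local.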

…lient.read() kwarg collision
Neo4jClient.read(self, query, **params) treats the first positional arg as the Cypher string named `query`. Passing a Cypher parameter also named `query=` via **params collided -> "got multiple values for argument 'query'". The `except` block masked it as a (misleading) "fulltext index missing" warning. Renaming the Lucene search-string parameter to $search_text removes the collision; index_name and top_k were never affected.
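The collision reproduces in plain Python. A hypothetical stand-in `read` with the same parameter shape as `Neo4jClient.read` shows the old `query=` parameter failing and the renamed `search_text=` passing through `**params`; `collides` is an illustrative helper, not project code.

```python
from typing import Any


def read(query: str, **params: Any) -> tuple[str, dict[str, Any]]:
    """Hypothetical stand-in with the same parameter shape as Neo4jClient.read."""
    return query, params


def collides(cypher: str, **params: Any) -> bool:
    """True if passing these Cypher params to read() raises the kwarg collision."""
    try:
        read(cypher, **params)
    except TypeError as exc:
        return "multiple values for argument 'query'" in str(exc)
    return False


# Old parameter name: clashes with read()'s own `query` argument.
print(collides("... queryNodes($index_name, $query) ...", query="목표주가"))  # True
# Renamed parameter: lands in **params untouched.
print(collides("... queryNodes($index_name, $search_text) ...", search_text="목표주가"))  # False
```

An alternative the PR did not take: declaring the client's first parameter positional-only (`def read(self, query, /, **params)`, Python 3.8+) lets a Cypher parameter named `query` land in `**params` without a clash.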

…substring
_ensure_limit used `" LIMIT " in query`, which misses a LIMIT clause that starts on a new line (e.g. `ORDER BY score DESC\nLIMIT $k`), so it appended a second LIMIT -> `LIMIT $k LIMIT 100` -> Cypher SyntaxError 42I63. The BM25 fulltext query hit exactly this. The word-boundary regex catches LIMIT regardless of the preceding whitespace; queries with no LIMIT still get the auto-bound, so there is no regression for existing callers.
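A sketch contrasting the old substring check with the word-boundary regex, assuming the auto-bound appends ` LIMIT 100` as the error above suggests. `ensure_limit_old` and `ensure_limit` are illustrative stand-ins for `_ensure_limit`, not the code from d009910.

```python
import re

# Case-sensitive, as written in the commit message.
_LIMIT_RE = re.compile(r"\bLIMIT\b")


def ensure_limit_old(query: str, default_limit: int = 100) -> str:
    # Pre-fix check: only sees LIMIT when it is surrounded by spaces.
    if " LIMIT " in query:
        return query
    return f"{query} LIMIT {default_limit}"


def ensure_limit(query: str, default_limit: int = 100) -> str:
    # Word boundaries match LIMIT after a newline, tab, or space alike.
    if _LIMIT_RE.search(query):
        return query
    return f"{query} LIMIT {default_limit}"


q = (
    "CALL db.index.fulltext.queryNodes($index_name, $search_text)\n"
    "YIELD node, score\nORDER BY score DESC\nLIMIT $k"
)
print(ensure_limit_old(q).splitlines()[-1])  # LIMIT $k LIMIT 100
print(ensure_limit(q).splitlines()[-1])  # LIMIT $k
```

Keeping the regex case-sensitive means a lowercase `limit` clause would still receive a second LIMIT; adding `re.IGNORECASE` would fix that but would also match a parameter such as `$limit`.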

TaskerJang added 4 commits May 30, 2026 22:08

TaskerJang merged commit 80eaebd into dev May 30, 2026

TaskerJang merged 4 commits into dev from feat/hybrid-bm25