feat(retrieval): add BM25 lexical search route; factual AC +0.98 (the remedy two negative results pointed to) #71
Merged
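Remedy prescribed by the diagnosis of the chunk-rerank negative result (#70): bypass the graph traversal / entity-MENTIONS bottleneck with lexical search. Neo4j fulltext (chunk_fulltext) → top-k chunks → answer.
- retrieval/bm25_retriever.py: queryNodes (Lucene/BM25) + Lucene escape + graceful fallback
- retrieval/prompts/bm25_answer_v1.md: chunk-based answers (numbers preserved verbatim, honest N/A)
Rationale: BEIR (Thakur 2021) reports BM25 as a robust OOD baseline, and on our own 80 QA set the BM25 baseline wins decisively on factual/numerical questions. Korean requires the cjk analyzer (see the docstring for the index DDL). Not yet wired into the router; wiring is planned once the fulltext index has been created and a sanity check passes.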
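A minimal sketch of the Lucene escape step and the shape of the fulltext query, for illustration only. `db.index.fulltext.queryNodes` and the `chunk_fulltext` index name come from this PR; the helper name `escape_lucene`, the constant `BM25_QUERY`, and the `node.id` / `node.text` properties are assumptions, not the contents of `bm25_retriever.py`.

```python
import re

# Characters that Lucene's classic query parser treats as syntax.
# Each one is prefixed with a backslash so user text is matched literally.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_lucene(text: str) -> str:
    """Escape Lucene query syntax characters in free-form user text."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


# Hypothetical Cypher for the BM25 route. The node properties (id, text)
# are assumed; only the procedure call and its YIELD columns are standard.
BM25_QUERY = """
CALL db.index.fulltext.queryNodes($index_name, $search_text)
YIELD node, score
RETURN node.id AS chunk_id, node.text AS text, score
ORDER BY score DESC
LIMIT $top_k
"""

if __name__ == "__main__":
    print(escape_lucene("목표주가 (2024/Q3): 80,000원?"))
```

Escaping single characters does not neutralize the uppercase operator words `AND`, `OR`, and `NOT`; a real retriever may need to handle those separately.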
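Reroute single-fact factual/numerical questions from local to bm25. The keyword router keeps relational/causal questions on local, top-N/aggregation stays on t2c, and global stays on community, which gives a clean delta.
router.py:
- add "bm25" to the Route Literal, import bm25_retrieve
- DEFAULT_ROUTE: local → bm25, ROUTER_PROMPT_PATH: v2 → v3
- RoutedResult.bm25_result field, bm25 accepted in the _classify_by_llm validation
- bm25 dispatch in route_and_answer
router_v3.md: adds the bm25 route, reroutes the factual examples, 4-way classification criteria.
Gate passed: in the cjk fulltext sanity check, '두산밥캣 목표주가' (Doosan Bobcat target price) returned the answer chunk c0009 (target price 80,000 KRW) at rank 1 (score 11.4). Measurement pending (TRIAL EXPIRED; to be run shortly).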
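The router change can be sketched roughly as follows. The names `Route`, `DEFAULT_ROUTE`, `RoutedResult.bm25_result`, and `route_and_answer` appear in the commit message; the exact set of route labels, the `validate_route` helper, and the retriever signature (a callable returning a dict with an `"answer"` key) are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Literal, Optional, get_args

# Assumed 4-way label set, inferred from the routes named in the commit.
Route = Literal["local", "bm25", "t2c", "community"]
DEFAULT_ROUTE: Route = "bm25"  # was "local" before this change


@dataclass
class RoutedResult:
    route: str
    answer: str
    bm25_result: Optional[dict] = None  # populated only on the bm25 route


def validate_route(label: str) -> str:
    """Accept only known labels; anything else falls back to DEFAULT_ROUTE."""
    cleaned = label.strip().lower()
    return cleaned if cleaned in get_args(Route) else DEFAULT_ROUTE


def route_and_answer(
    question: str,
    classify: Callable[[str], str],
    retrievers: Dict[str, Callable[[str], dict]],
) -> RoutedResult:
    """Classify the question, dispatch to the matching retriever, wrap the result."""
    route = validate_route(classify(question))
    result = retrievers[route](question)
    return RoutedResult(
        route=route,
        answer=result["answer"],
        bm25_result=result if route == "bm25" else None,
    )
```

Falling back to the default on an unrecognized label means a malformed classifier response still produces an answer, now via the lexical route rather than graph traversal.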
…lient.read() kwarg collision
Neo4jClient.read(self, query, **params) treats the first positional arg as the Cypher string named `query`. Passing a Cypher parameter also named `query=` via **params collided -> "got multiple values for argument 'query'". The except masked it as a (misleading) "fulltext index missing" warning. Renaming the Lucene search-string param to $search_text removes the collision; index_name and top_k were never affected.
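The collision is ordinary Python argument binding and can be reproduced without Neo4j. In this sketch `FakeClient` is a hypothetical stand-in that only mirrors the `read(self, query, **params)` signature quoted above; it is not the project's client.

```python
class FakeClient:
    """Stand-in with the same signature shape as Neo4jClient.read."""

    def read(self, query, **params):
        # The real client would run the Cypher; here we just echo the inputs.
        return query, params


client = FakeClient()
cypher = "CALL db.index.fulltext.queryNodes($index_name, $search_text)"

# Before the fix: a Cypher parameter named `query` binds to the same
# formal parameter as the positional Cypher string, so Python raises.
try:
    client.read(cypher, query="두산밥캣 목표주가", index_name="chunk_fulltext")
    collision_message = ""
except TypeError as exc:
    collision_message = str(exc)

# After the fix: the renamed parameter lands in **params as intended.
_, params = client.read(
    cypher, search_text="두산밥캣 목표주가", index_name="chunk_fulltext", top_k=5
)
```

Catching a narrower exception type around the query call would have surfaced the `TypeError` instead of reporting a missing index.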
…substring
_ensure_limit used `" LIMIT " in query` which misses a LIMIT clause that starts on a new line (e.g. `ORDER BY score DESC\nLIMIT $k`), so it appended a second LIMIT -> `LIMIT $k LIMIT 100` -> Cypher SyntaxError 42I63. The BM25 fulltext query hit exactly this. Word-boundary regex catches LIMIT regardless of preceding whitespace; queries with no LIMIT still get the auto-bound, so no regression for existing callers.
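A small sketch contrasting the two checks. The function names here are hypothetical and the default bound of 100 is taken from the error text above; case-insensitive matching is an assumption, since the commit message does not say whether the real `_ensure_limit` ignores case.

```python
import re

# Word-boundary match: finds LIMIT after a space, tab, or newline.
_LIMIT_RE = re.compile(r"\bLIMIT\b", re.IGNORECASE)


def ensure_limit_substring(query: str, default: int = 100) -> str:
    """Old behaviour: only sees LIMIT when it is surrounded by spaces."""
    if " LIMIT " in query:
        return query
    return f"{query} LIMIT {default}"


def ensure_limit_regex(query: str, default: int = 100) -> str:
    """Fixed behaviour: any standalone LIMIT token suppresses the auto-bound."""
    if _LIMIT_RE.search(query):
        return query
    return f"{query} LIMIT {default}"


multiline = "RETURN node, score\nORDER BY score DESC\nLIMIT $k"
unbounded = "MATCH (n) RETURN n"

broken = ensure_limit_substring(multiline)  # ends with "LIMIT $k LIMIT 100"
fixed = ensure_limit_regex(multiline)       # returned unchanged
bounded = ensure_limit_regex(unbounded)     # gets the auto-bound appended
```

A token-level check like this still cannot tell a top-level LIMIT from one inside a subquery or a string literal, so it remains a heuristic.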
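Background: the remedy two negative results pointed to
The two negative results, chunk-rerank (#70) and answer prompt v2 (#68, reverted in #69), consistently point to the same diagnosis: the bottleneck is neither chunk ranking nor the prompt. It is that the candidate pool is tied to the graph through entity → MENTIONS → chunk, compounded by chunk content corrupted by OCR. The remedy is to add BM25 lexical search (Neo4j fulltext, cjk analyzer), which bypasses the graph structure, as one more route in the router.
Changes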
- retrieval/bm25_retriever.py (new): top-k chunks via db.index.fulltext.queryNodes → answer. Bypasses entity/MENTIONS traversal. Read-only + Lucene escape + graceful fallback + @track.
- retrieval/prompts/bm25_answer_v1.md (new): preserves numbers and proper nouns verbatim, clean N/A.
- retrieval/router.py: bm25 route dispatch + RoutedResult.bm25_result + DEFAULT_ROUTE local→bm25 (router_v3).
- retrieval/prompts/router_v3.md (new).
- bm25_retriever.py: Cypher parameter $query→$search_text (resolves the kwarg collision with query, the first positional argument of Neo4jClient.read). Commit 70b3063.
- kg/neo4j_client.py: fixes the bug where _ensure_limit missed a LIMIT preceded by a line break and produced a double LIMIT; now uses the \bLIMIT\b regex. Commit d009910.
Measurement: gpt52_p1clean → gpt52_p1clean_bm25 (80 QA, gpt-5.2 / claude-haiku judge)
VectorRAG / factual: win
The gap to the BM25 baseline (AC ~4.0) narrowed from 1.4 to 0.42.
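GraphRAG: quality slightly up, routing accuracy down (a metric trap)
Per-QA analysis: the 9 graph questions routed to bm25 average AC 3.22, versus 2.08 for the 24 routed to local. 38% of the routing-accuracy drop (5 of 13) is a penalty incurred on questions that bm25 answered correctly, a consequence of the expected_route labels baking in the assumption "graph question = graph retrieval". There is no loss in answer quality.
Post-merge follow-ups (separate issues)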
- Redefine the expected_route labels (quality-based).
- faithfulness is excluded from the comparison because of a metric defect (it measures gold overlap).