[codex] Add rollout-backed thread content search#23519
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7373d67e24
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| SortDirection::Asc => StoreSortDirection::Asc, | ||
| SortDirection::Desc => StoreSortDirection::Desc, | ||
| }, | ||
| allowed_sources: compute_source_filters(source_kinds).0, |
There was a problem hiding this comment.
Apply post-filtered source kinds to search results
When sourceKinds contains kinds that compute_source_filters cannot express as rollout SessionSources (for example exec, appServer, or any subagent variant), that helper intentionally returns an empty allowed_sources plus a post-filter, and thread/list applies source_kind_matches afterwards. thread/search only passes .0 here and never applies the post-filter to the returned rows, so those requests search with an empty source filter, which the store treats as “all sources”; a thread/search for sourceKinds: ["exec"] can therefore return CLI/VS Code matches.
Useful? React with 👍 / 👎.
owenlin0
left a comment
There was a problem hiding this comment.
Looks like the current impl does this:
rg --json over the entire sessions/archive root
-> collect all stdout
-> parse every matching line event
-> parse each matched JSONL line into a rollout item
-> keep first valid snippet per file path
-> separately page thread metadata
-> intersect metadata rows with the snippet map
for a common search term it could be a lot of data to pull into memory
Here's codex's suggestion:
One performance concern with this search path is that
rg --jsonemits every matching JSONL line and we collect all stdout before processing it. For broad terms this can get very large even though the API only returns one result per thread. In a local benchmark against~/.codex/sessions(2,248 JSONL files / 1.5 GB), a common-ish query produced ~636 MB ofrg --jsonstdout and took 2.3s.I think this can be simpler and more aligned with the API contract by using
rg -las the first pass:
- Run
rg -l --fixed-strings --no-ignore --glob '*.jsonl' -- <term> <sessions-root>to get matching rollout paths.- Put those paths in a set.
- Page/list threads in the requested API sort order via the existing metadata path.
- For rows whose rollout path is in the match set, open that rollout JSONL and parse until the first valid user/assistant
ThreadSearchPreview::ContentMatchsnippet.- If a path matched only internal/tool/metadata content and does not produce a valid semantic snippet, skip it and continue through ordered metadata rows until
limit + 1semantic results are found or the metadata listing is exhausted.- Stop after
limit + 1semantic results sonextCursorcan be computed.This keeps SQLite/listing as the source of truth for API ordering and uses ripgrep only to answer “which rollout files contain the raw term?”. It also preserves the current semantic filter: a raw ripgrep hit is only a candidate; it becomes a search result only after rollout parsing confirms the match is in user/assistant-visible content.
This should avoid huge output for common terms because
rg -lprints each path once instead of every matching line. In the same benchmark, globalrg -lplus metadata ordering took ~63ms for the common-ish query, ~149ms for a rare query, and ~106ms for a no-match query.
thoughts? I had codex benchmark my own sessions directory with this approach
Results:
Common-ish query:
current rg --json: ~2.321s, ~636 MB stdout
proposed rg -l: ~0.063s, ~178 KB stdout
Rare query:
current rg --json: ~0.163s, ~252 KB stdout
proposed rg -l: ~0.149s, ~110 B stdout
No-match query:
current rg --json: ~0.099s, ~257 B stdout
proposed rg -l: ~0.106s, 0 B stdout
| } | ||
|
|
||
| #[test] | ||
| fn thread_search_is_marked_experimental() { |
There was a problem hiding this comment.
nit: codex likes to add these tests but not super high value IMO - can we remove?
There was a problem hiding this comment.
we probably should update AGENTS.md to not add trivial tests like these and remove the ones below, but no worries about doing it in this PR
owenlin0
left a comment
There was a problem hiding this comment.
also, can we switch to using InstallContext::rg_command() instead of invoking rg from the PATH directly? we do bundle ripgrep in our standalone codex installations
There was a problem hiding this comment.
would love @wiltzius-openai to take a look too since this touches ThreadStore.
@wiltzius-openai for your context, @fc-oai would like to add an experimental thread/search API so the Codex App can expose a "thread search" feature. can you validate if this is something that we should eventually support in ThreadStore? would be nice to have a cloud-based solution for this as well eventually
| let Some(local_store) = self | ||
| .thread_store | ||
| .as_any() | ||
| .downcast_ref::<LocalThreadStore>() |
There was a problem hiding this comment.
btw, we have an existing pattern for ThreadStore and throwing a ThreadStoreError::Unsupported error if unsupported on local or remote as needed. can we follow that?
thread/searchshould be modeled as aThreadStoreoperation instead of downcasting toLocalThreadStorein the app-server processor.Suggested shape:
- Add a
search_threads(...) -> ThreadStoreResult<ThreadSearchPage>method to theThreadStoretrait.- Give the trait default implementation:
Err(ThreadStoreError::Unsupported { operation: "thread/search" })- Implement it for
LocalThreadStoreby calling the existing local rollout search path.- In app-server, call
self.thread_store.search_threads(...)instead ofas_any().downcast_ref::<LocalThreadStore>().- Let the existing
thread_store_list_error/unsupported_thread_store_operationmapping convertThreadStoreError::Unsupportedinto the JSON-RPC error.That keeps the API boundary honest: non-local stores report “unsupported” instead of returning an empty page, while future remote stores can add their own implementation without changing app-server dispatch.
3b46e34 to
58156cd
Compare
58156cd to
34c8b40
Compare
owenlin0
left a comment
There was a problem hiding this comment.
One small note on search semantics.
wiltzius-openai
left a comment
There was a problem hiding this comment.
General direction seems OK but I think there are some implementation improvements possible
For remote thread store I think it would be possible to support the search contract (it'll just require a lot of infra)
| #[serde(rename_all = "camelCase")] | ||
| #[ts(export_to = "v2/")] | ||
| pub struct ThreadSearchResult { | ||
| pub thread: Thread, |
There was a problem hiding this comment.
just to confirm this will be a Thread object that does not contain Turns right, it isn't the full history? I think full history would be too expensive.
There was a problem hiding this comment.
That is correct. It's not full history. turns will be []
| .map_err(thread_store_list_error)?; | ||
|
|
||
| for result in page.items { | ||
| let source = with_thread_spawn_agent_metadata( |
There was a problem hiding this comment.
Some cruft in this search implementation. I think what's happening is:
- it searches in thread store using a search term and some some additional parameters
- it applies a post-filter based on source kind? Is there a reason this filter can't be applied internally?
- then it has to re-loop the search since it no longer necessarily has a full page of results?
| - `cwd` — restrict results to threads whose session cwd exactly matches this path, or one of these paths when an array is provided. Relative paths are resolved against the app-server process cwd before matching. | ||
| - `useStateDbOnly` — when `true`, return from the state DB without scanning JSONL rollouts to repair metadata. Omit or pass `false` to preserve the default scan-and-repair behavior. | ||
| - `searchTerm` — restrict results to threads whose extracted title contains this substring (case-sensitive). | ||
| - `searchTerm` — on `thread/list`, restrict results using the existing stored-thread search behavior. Use experimental `thread/search` when callers need persisted user/assistant content search plus matching snippet context. |
There was a problem hiding this comment.
"using the existing stored-thread search behavior." isn't super helpful, the old text is more precise -- I assume that's what was preserved (case-sensitive title search)
There was a problem hiding this comment.
@fc-oai can we revert this change to the README?
| sort_direction, | ||
| ) | ||
| .await?; | ||
| for item in page.items { |
There was a problem hiding this comment.
It seems a little inefficient to ripgrep over the entire rollout history then enumerate all threads filtering to only those that match the ripgrep results. You already have the paths from the ripgrep search, so why not use that directly in the snippet generation (which takes a path)?
There was a problem hiding this comment.
that approach, while seemingly a bit inefficient in theory, is actually faster when a search term has a lot of matches
The tradeoff is broad terms. The old SQLite-intersection approach is good when a term appears in lots of files, because SQLite can walk newest-first and stop once it has a page. The revised approach has to read metadata for every rg hit before it can sort and page, so common terms like "thread" regress badly.
| let mut items = matching_items | ||
| .into_iter() | ||
| .filter_map(|item| { | ||
| stored_thread_from_rollout_item( |
There was a problem hiding this comment.
this is returning a stored thread but with only a single rollout item as its history? A little funky but I guess that works (its just a "different" shape of thread than anywhere else I think)
There was a problem hiding this comment.
curious if there's a reason to include both the snippet and the item?
| #[derive(Clone, Debug, Serialize, Deserialize)] | ||
| pub struct ThreadSearchPage { | ||
| /// Search results returned for this page. | ||
| pub items: Vec<StoredThreadSearchResult>, |
There was a problem hiding this comment.
nit: slightly confusing to call this "items" since they're actually threads + snippets and the threads themselves contain "items" (if I'm reading correctly, a list of a single item?)
There was a problem hiding this comment.
agreed but also this follows the current pattern for all of the *Page structs in the threadstore contract. wonder if we should call it data instead of items?
267cc10 to
41f247d
Compare
Summary
thread/searchfor local rollout-backed thread search usingrgover JSONL rolloutsStoredThreador ordinaryThreadresponsesthread/listseparate from full-content search and document the new app-server surfaceTesting
cargo test -p codex-app-server-protocolcargo test -p codex-app-server thread_search_returns_content_and_title_matches -- --nocapture