[codex] Add rollout-backed thread content search by fc-oai · Pull Request #23519 · openai/codex

fc-oai · 2026-05-19T18:34:38Z

Summary

add experimental thread/search for local rollout-backed thread search using rg over JSONL rollouts
return search-specific result rows with optional previews instead of storing preview data on StoredThread or ordinary Thread responses
keep thread/list separate from full-content search and document the new app-server surface

Testing

cargo test -p codex-app-server-protocol
cargo test -p codex-app-server thread_search_returns_content_and_title_matches -- --nocapture

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7373d67e24

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T22:10:57Z

+                    SortDirection::Asc => StoreSortDirection::Asc,
+                    SortDirection::Desc => StoreSortDirection::Desc,
+                },
+                allowed_sources: compute_source_filters(source_kinds).0,


Apply post-filtered source kinds to search results

When sourceKinds contains kinds that compute_source_filters cannot express as rollout SessionSources (for example exec, appServer, or any subagent variant), that helper intentionally returns an empty allowed_sources plus a post-filter, and thread/list applies source_kind_matches afterwards. thread/search only passes .0 here and never applies the post-filter to the returned rows, so those requests search with an empty source filter, which the store treats as “all sources”; a thread/search for sourceKinds: ["exec"] can therefore return CLI/VS Code matches.

Useful? React with 👍 / 👎.

owenlin0

Looks like the current impl does this:

rg --json over the entire sessions/archive root
  -> collect all stdout
  -> parse every matching line event
  -> parse each matched JSONL line into a rollout item
  -> keep first valid snippet per file path
  -> separately page thread metadata
  -> intersect metadata rows with the snippet map

for a common search term it could be a lot of data to pull into memory

Here's codex's suggestion:

One performance concern with this search path is that rg --json emits every matching JSONL line and we collect all stdout before processing it. For broad terms this can get very large even though the API only returns one result per thread. In a local benchmark against ~/.codex/sessions (2,248 JSONL files / 1.5 GB), a common-ish query produced ~636 MB of rg --json stdout and took 2.3s.

I think this can be simpler and more aligned with the API contract by using rg -l as the first pass:

Run rg -l --fixed-strings --no-ignore --glob '*.jsonl' -- <term> <sessions-root> to get matching rollout paths.

Put those paths in a set.

Page/list threads in the requested API sort order via the existing metadata path.

For rows whose rollout path is in the match set, open that rollout JSONL and parse until the first valid user/assistant ThreadSearchPreview::ContentMatch snippet.

If a path matched only internal/tool/metadata content and does not produce a valid semantic snippet, skip it and continue through ordered metadata rows until limit + 1 semantic results are found or the metadata listing is exhausted.

Stop after limit + 1 semantic results so nextCursor can be computed.

This keeps SQLite/listing as the source of truth for API ordering and uses ripgrep only to answer “which rollout files contain the raw term?”. It also preserves the current semantic filter: a raw ripgrep hit is only a candidate; it becomes a search result only after rollout parsing confirms the match is in user/assistant-visible content.

This should avoid huge output for common terms because rg -l prints each path once instead of every matching line. In the same benchmark, global rg -l plus metadata ordering took ~63ms for the common-ish query, ~149ms for a rare query, and ~106ms for a no-match query.

thoughts? I had codex benchmark my own sessions directory with this approach

Results:

Common-ish query:
current rg --json:      ~2.321s, ~636 MB stdout
proposed rg -l:         ~0.063s, ~178 KB stdout

Rare query:
current rg --json:      ~0.163s, ~252 KB stdout
proposed rg -l:         ~0.149s, ~110 B stdout

No-match query:
current rg --json:      ~0.099s, ~257 B stdout
proposed rg -l:         ~0.106s, 0 B stdout

owenlin0 · 2026-05-20T16:27:56Z

    }

+    #[test]
+    fn thread_search_is_marked_experimental() {


nit: codex likes to add these tests but not super high value IMO - can we remove?

we probably should update AGENTS.md to not add trivial tests like these and remove the ones below, but no worries about doing it in this PR

owenlin0

also, can we switch to using InstallContext::rg_command() instead of invoking rg from the PATH directly? we do bundle ripgrep in our standalone codex installations

owenlin0

would love @wiltzius-openai to take a look too since this touches ThreadStore.

@wiltzius-openai for your context, @fc-oai would like to add an experimental thread/search API so the Codex App can expose a "thread search" feature. can you validate if this is something that we should eventually support in ThreadStore? would be nice to have a cloud-based solution for this as well eventually

owenlin0 · 2026-05-20T17:34:38Z

+        let Some(local_store) = self
+            .thread_store
+            .as_any()
+            .downcast_ref::<LocalThreadStore>()


btw, we have an existing pattern for ThreadStore and throwing a ThreadStoreError::Unsupported error if unsupported on local or remote as needed. can we follow that?

thread/search should be modeled as a ThreadStore operation instead of downcasting to LocalThreadStore in the app-server processor.

Suggested shape:

Add a search_threads(...) -> ThreadStoreResult<ThreadSearchPage> method to the ThreadStore trait.

Give the trait default implementation:
Err(ThreadStoreError::Unsupported { operation: "thread/search" })

Implement it for LocalThreadStore by calling the existing local rollout search path.

In app-server, call self.thread_store.search_threads(...) instead of as_any().downcast_ref::<LocalThreadStore>().

Let the existing thread_store_list_error / unsupported_thread_store_operation mapping convert ThreadStoreError::Unsupported into the JSON-RPC error.

That keeps the API boundary honest: non-local stores report “unsupported” instead of returning an empty page, while future remote stores can add their own implementation without changing app-server dispatch.

owenlin0

One small note on search semantics.

wiltzius-openai

General direction seems OK but I think there are some implementation improvements possible

For remote thread store I think it would be possible to support the search contract (it'll just require a lot of infra)

wiltzius-openai · 2026-05-20T20:53:08Z

+#[serde(rename_all = "camelCase")]
+#[ts(export_to = "v2/")]
+pub struct ThreadSearchResult {
+    pub thread: Thread,


just to confirm this will be a Thread object that does not contain Turns right, it isn't the full history? I think full history would be too expensive.

That is correct. It's not full history. turns will be []

wiltzius-openai · 2026-05-20T20:58:49Z

+                .map_err(thread_store_list_error)?;
+
+            for result in page.items {
+                let source = with_thread_spawn_agent_metadata(


Some cruft in this search implementation. I think what's happening is:

it searches in thread store using a search term and some some additional parameters

it applies a post-filter based on source kind? Is there a reason this filter can't be applied internally?

then it has to re-loop the search since it no longer necessarily has a full page of results?

wiltzius-openai · 2026-05-20T21:02:28Z

 - `cwd` — restrict results to threads whose session cwd exactly matches this path, or one of these paths when an array is provided. Relative paths are resolved against the app-server process cwd before matching.
 - `useStateDbOnly` — when `true`, return from the state DB without scanning JSONL rollouts to repair metadata. Omit or pass `false` to preserve the default scan-and-repair behavior.
- `searchTerm` — restrict results to threads whose extracted title contains this substring (case-sensitive).
+- `searchTerm` — on `thread/list`, restrict results using the existing stored-thread search behavior. Use experimental `thread/search` when callers need persisted user/assistant content search plus matching snippet context.


"using the existing stored-thread search behavior." isn't super helpful, the old text is more precise -- I assume that's what was preserved (case-sensitive title search)

@fc-oai can we revert this change to the README?

wiltzius-openai · 2026-05-20T21:11:59Z

+            sort_direction,
+        )
+        .await?;
+        for item in page.items {


It seems a little inefficient to ripgrep over the entire rollout history then enumerate all threads filtering to only those that match the ripgrep results. You already have the paths from the ripgrep search, so why not use that directly in the snippet generation (which takes a path)?

that approach, while seemingly a bit inefficient in theory, is actually faster when a search term has a lot of matches

The tradeoff is broad terms. The old SQLite-intersection approach is good when a term appears in lots of files, because SQLite can walk newest-first and stop once it has a page. The revised approach has to read metadata for every rg hit before it can sort and page, so common terms like "thread" regress badly.

wiltzius-openai · 2026-05-20T21:13:20Z

+    let mut items = matching_items
+        .into_iter()
+        .filter_map(|item| {
+            stored_thread_from_rollout_item(


this is returning a stored thread but with only a single rollout item as its history? A little funky but I guess that works (its just a "different" shape of thread than anywhere else I think)

curious if there's a reason to include both the snippet and the item?

wiltzius-openai · 2026-05-20T21:14:02Z

+#[derive(Clone, Debug, Serialize, Deserialize)]
+pub struct ThreadSearchPage {
+    /// Search results returned for this page.
+    pub items: Vec<StoredThreadSearchResult>,


nit: slightly confusing to call this "items" since they're actually threads + snippets and the threads themselves contain "items" (if I'm reading correctly, a list of a single item?)

agreed but also this follows the current pattern for all of the *Page structs in the threadstore contract. wonder if we should call it data instead of items?

fc-oai changed the title ~~Add rollout-backed thread content search~~ [codex] Add rollout-backed thread content search May 19, 2026

fc-oai marked this pull request as ready for review May 19, 2026 22:08

chatgpt-codex-connector Bot reviewed May 19, 2026

View reviewed changes