fix(docs): tag aztec-nr-api records so they pass the search filter #23058
Merged
Despite the previous fixes (#23042, #23049) restoring 14,773 records from 2,222 aztec-nr-api pages, those records still don't surface in the docs site search.

Root cause: docusaurus-theme-search-typesense ANDs `language:=en && docusaurus_tag:=[<plugin-context-tag>]` into every search query. The api-nr records lack both attributes because the typesense-docsearch-scraper only stamps them onto records scraped from a docusaurus-rendered page (via `<meta name="docsearch:...">` tags); rustdoc-style nargo-doc pages don't emit those metas.

Two changes:

1. Add `extra_attributes` on the api-nr start_url so each api-nr record gets `language: "en"` and a `docusaurus_tag` array spanning every plugin context (`docs-developer-v4.2.0`, `docs-network-v4.2.0`, `docs-participate-current`, `docs-root-current`, `default`). Typesense's `:=[...]` array-match then succeeds in any context the user is searching from.

2. Add `field_definitions` to `custom_settings`. The typesense-docsearch-scraper's default schema declares the wildcard `.*_tag` as `string`, so passing an array for `docusaurus_tag` would be rejected at import. `field_definitions` overrides the wildcard for `docusaurus_tag` specifically with `string*`, which accepts both a single string (existing docusaurus records, set from a meta tag) and an array (api-nr records). The scraper replaces the entire default schema when `field_definitions` is set, so the full default field list is reproduced verbatim with only the `docusaurus_tag` entry inserted before the `.*_tag` wildcard.

Note: the version-specific tag values (`v4.2.0`) need to be updated when mainnet/testnet bump versions.

Future improvement: derive these from `developer_version_config.json` and `network_version_config.json` at scrape time.
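To make the filter semantics concrete, here is a minimal TypeScript sketch of the check described above. It is an illustration only, not the theme's or Typesense's code: the `SearchRecord` shape, the function names, and the example URLs are hypothetical, and the matching rule (locale must match, and at least one record tag must appear in the query's tag list) is assumed from the `language:=en && docusaurus_tag:=[...]` filter quoted in this commit message.

```typescript
// Hypothetical record shape: only the two attributes the theme filters on.
interface SearchRecord {
  url: string;
  language?: string;
  docusaurus_tag?: string | string[];
}

// Normalise a scalar, array, or missing tag attribute to an array.
function tagsOf(record: SearchRecord): string[] {
  if (record.docusaurus_tag === undefined) return [];
  return Array.isArray(record.docusaurus_tag)
    ? record.docusaurus_tag
    : [record.docusaurus_tag];
}

// Assumed semantics of `language:=<locale> && docusaurus_tag:=[<contextTags>]`:
// the locale must match AND at least one record tag must be in the query list.
function passesContextualFilter(
  record: SearchRecord,
  locale: string,
  contextTags: string[],
): boolean {
  if (record.language !== locale) return false;
  return tagsOf(record).some((tag) => contextTags.indexOf(tag) !== -1);
}

// A docusaurus-rendered page: attributes come from its docsearch meta tags.
const docusaurusPage: SearchRecord = {
  url: "/developers/overview", // hypothetical URL
  language: "en",
  docusaurus_tag: "docs-developer-v4.2.0",
};

// An api-nr record before this change: neither attribute is present.
const apiNrBefore: SearchRecord = { url: "/aztec-nr-api/mainnet/index.html" };

// The same record after `extra_attributes` stamps both attributes.
const apiNrAfter: SearchRecord = {
  ...apiNrBefore,
  language: "en",
  docusaurus_tag: [
    "docs-developer-v4.2.0",
    "docs-network-v4.2.0",
    "docs-participate-current",
    "docs-root-current",
    "default",
  ],
};
```

Under these assumptions `apiNrBefore` fails the filter from every context because it has no `language` at all, while `apiNrAfter` passes from whichever plugin context supplies the query tag.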
Follow-up to address the brittleness caveat. The previous commit
hardcoded `docs-developer-v4.2.0` and `docs-network-v4.2.0` in
typesense.config.json, which would silently go stale every time
mainnet/testnet versions bump.
Move the version-specific tags out of the static JSON. The workflow
now reads `docs/developer_version_config.json` and
`docs/network_version_config.json`, builds the four
`docs-${pluginId}-${versionName}` strings (filtered for empty/dupes),
and uses jq to append them to the api-nr start_url's
`extra_attributes.docusaurus_tag` array before passing the config to
the scraper. The static JSON keeps only the unversioned tags
(participate, root, default).
Also: `set -euo pipefail` so a jq derivation failure aborts the run
instead of feeding an empty config to docker.
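The workflow does this derivation with jq; the TypeScript sketch below only illustrates the same logic. The `VersionConfig` shape (a `mainnet` and a `testnet` entry per config file) is an assumption inferred from the PR description, and the function names are hypothetical.

```typescript
// Assumed shape of developer_version_config.json / network_version_config.json.
interface VersionConfig {
  mainnet?: string;
  testnet?: string;
}

// Build `docs-${pluginId}-${versionName}` for each plugin and network,
// skipping empty version names and duplicate tags.
function versionedTags(configs: { [pluginId: string]: VersionConfig }): string[] {
  const tags: string[] = [];
  for (const pluginId of Object.keys(configs)) {
    const config = configs[pluginId];
    for (const versionName of [config.mainnet, config.testnet]) {
      if (!versionName) continue; // empty or missing version
      const tag = `docs-${pluginId}-${versionName}`;
      if (tags.indexOf(tag) === -1) tags.push(tag); // drop duplicates
    }
  }
  return tags;
}

// Append the derived tags to the unversioned ones kept in the static JSON.
function withVersionedTags(
  staticTags: string[],
  configs: { [pluginId: string]: VersionConfig },
): string[] {
  const extra = versionedTags(configs).filter((t) => staticTags.indexOf(t) === -1);
  return [...staticTags, ...extra];
}

const derived = withVersionedTags(
  ["docs-participate-current", "docs-root-current", "default"],
  {
    developer: { mainnet: "v4.2.0", testnet: "v4.2.0" }, // same version twice
    network: { mainnet: "v4.2.0", testnet: "" }, // empty testnet entry
  },
);
```

With mainnet and testnet on the same version, the four candidate strings collapse to two, which is why the filter for empty and duplicate values matters.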
alejoamiras
approved these changes
May 7, 2026
spalladino
pushed a commit
that referenced
this pull request
May 8, 2026
#23058 tried to make aztec-nr-api records visible in the docusaurus search dropdown by stamping `docusaurus_tag` as an array of plugin context tags, paired with a `field_definitions` schema override that declared `docusaurus_tag` as `string*` (string-or-array). In practice the override doesn't take effect: every api-nr document import is rejected by Typesense with `'Field 'docusaurus_tag' must be a string.'` The CI guard added in #23042 (MIN_HITS=5000) didn't trip because the ~12k non-api docs still passed.

The fix is much smaller. The docusaurus theme's contextual filter unconditionally prepends the constant `default` (DEFAULT_SEARCH_TAG in docusaurus-theme-common) to every dropdown query's `docusaurus_tag` list. So a single scalar value of `"default"` on api-nr records satisfies the filter from every plugin context, and no schema override is needed: the scraper's default `.*_tag: string` accepts the scalar cleanly.

Changes:

- `docs/typesense.config.json`: drop `field_definitions`; collapse api-nr `extra_attributes.docusaurus_tag` to scalar `"default"`.
- `.github/workflows/docs-typesense.yml`: drop the jq mutation that derived versioned tags (no longer needed). Add a post-index curl smoke check that searches the live alias for `docusaurus_tag:=[default]&&language:=en` and fails the run if fewer than MIN_API_HITS=1000 records are visible. No existing docusaurus page carries the `"default"` tag (each one is stamped with its plugin-context tag from the docsearch meta), so this count is effectively the count of indexed api-nr records.
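The smoke check in the workflow is a curl call; the sketch below restates its logic in TypeScript for illustration. The host value, the function names, and the `per_page` setting are assumptions. The endpoint path follows Typesense's documented `/collections/<name>/documents/search` route with an alias in place of the collection name, and only the `found` field of the response is used.

```typescript
// Threshold quoted in the commit message above.
const MIN_API_HITS = 1000;

// Build the search URL for the smoke check. `host` is a placeholder.
function smokeCheckUrl(host: string, alias: string): string {
  const params: { [key: string]: string } = {
    q: "*",
    filter_by: "docusaurus_tag:=[default]&&language:=en",
    per_page: "1", // only the `found` count matters here
  };
  const query = Object.keys(params)
    .map((key) => `${key}=${encodeURIComponent(params[key])}`)
    .join("&");
  return `${host}/collections/${alias}/documents/search?${query}`;
}

// Fail the run when too few api-nr records are visible under the filter.
function assertApiNrVisible(response: { found: number }): void {
  if (response.found < MIN_API_HITS) {
    throw new Error(
      `only ${response.found} api-nr records visible (threshold: ${MIN_API_HITS})`,
    );
  }
}

const exampleUrl = smokeCheckUrl("https://typesense.example.invalid", "aztec-docs");
```

URL-encoding the filter matters because the literal `&&` would otherwise be parsed as query-string separators and the `language` clause would be silently dropped.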
rangozd
pushed a commit
to rangozd/aztec-packages
that referenced
this pull request
May 16, 2026
rangozd
pushed a commit
to rangozd/aztec-packages
that referenced
this pull request
May 16, 2026
…tecProtocol#23097)

## Summary

Fourth in the series fixing search after AztecProtocol#22861. After AztecProtocol#23058 merged, the production index still has **0 records under `aztec-nr-api/mainnet/...`**. Confirmed by querying the live Typesense collection directly (`filter_by:url:=https://docs.aztec.network/aztec-nr-api/mainnet/*` returns `found: 0`) and by inspecting the most recent scraper run logs.

## Root cause

The schema override added by AztecProtocol#23058 doesn't take effect. Every api-nr document import is rejected by Typesense with HTTP 400 ``'Field `docusaurus_tag` must be a string.'``, even though `custom_settings.field_definitions` lists an explicit `{ "name": "docusaurus_tag", "type": "string*" }` ahead of the wildcard `.*_tag: string`. Per Typesense docs an explicit field should win over a regex pattern field, but in practice the wildcard's `string` type appears to be what's enforced. The CI guard from AztecProtocol#23042 (`MIN_HITS=5000`) didn't trip because the ~12k non-api docs still passed.

## Fix

The PR over-engineered the solution. Reading the docusaurus theme:

```ts
// docusaurus-theme-common/src/utils/searchUtils.ts
export const DEFAULT_SEARCH_TAG = 'default';
```

```ts
// docusaurus-theme-common/src/index.ts
const tags = [DEFAULT_SEARCH_TAG, ...docsTags];
return {locale: i18n.currentLocale, tags};
```

…the theme unconditionally prepends `'default'` to the `docusaurus_tag` filter on every dropdown query, in every plugin context. So api-nr records only need the single scalar value `"default"` to satisfy the filter from anywhere on the docs site. No array, no schema surgery, no version-specific tag derivation.

Three changes:

### 1. `docs/typesense.config.json`

Drop the `custom_settings.field_definitions` override entirely (the scraper's default schema with `.*_tag: string` accepts scalar string values cleanly), and collapse the api-nr `extra_attributes.docusaurus_tag` to scalar `"default"`.

### 2. `.github/workflows/docs-typesense.yml`: remove jq mutation

The jq block that derived versioned tags is no longer needed. The scraper now reads `docs/typesense.config.json` verbatim.

### 3. `.github/workflows/docs-typesense.yml`: log api-nr visibility post-index

After the scraper completes its alias swap, curl the live `aztec-docs` alias for `docusaurus_tag:=[default]&&language:=en` and log the count. No existing docusaurus page carries the `"default"` tag (each is stamped with its plugin-context tag, e.g. `docs-developer-v4.2.0`, from the `<meta name="docsearch:docusaurus_tag">` tag), so this count is effectively the count of indexed api-nr records, and the filter mirrors what the theme actually sends. Informational only; not gated by a threshold.

## Behavior change

api-nr records will now appear in the search dropdown from every plugin context (developer, network, root, participate) and every doc version (mainnet, testnet, nightly), because they're stamped with the always-prepended `"default"` tag rather than version-specific tags. Today we only generate `aztec-nr-api/mainnet/`, so a user browsing testnet developer docs would see mainnet aztec-nr API links in their dropdown. Probably desirable (an aztec-nr API symbol is the same regardless of which doc version you're reading), but a behavior change vs the (non-functional) AztecProtocol#23058 attempt.

## Caveat

api-nr visibility now depends on the docusaurus theme's `DEFAULT_SEARCH_TAG = 'default'` invariant. If a future caller ever issues a search query that doesn't include `'default'` in the tag list (e.g. a custom search page bypassing `useContextualSearchFilters`), api-nr records would silently disappear from that surface.

## Test plan

- [ ] Manually dispatch `Docs Scraper` workflow via `workflow_dispatch` on this branch.
- [ ] Confirm the run logs `Indexed N records (threshold: 5000)` with N >> 5000.
- [ ] Confirm the run logs `api-nr records visible under docusaurus_tag:=[default]: M` with M well above zero (AztecProtocol#23049 indexed 14,773 api-nr records before the schema rejection started silently dropping them, so we expect a similar count).
- [ ] Confirm no ``'Field `docusaurus_tag` must be a string.'`` 400s in the scraper output.
- [ ] After merge, search docs.aztec.network from the homepage, /developers/, /network/, and /participate/ for an Aztec.nr identifier (e.g. `ContractClassId`, `balance_set`, `compute_log_tag`, `address_note`) and confirm API reference pages appear in the dropdown in all four contexts.
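The reasoning in the Fix and Caveat sections can be sketched as follows. This is an illustration, not the theme's source: `contextualTags` paraphrases the quoted `[DEFAULT_SEARCH_TAG, ...docsTags]` line, the `filterBy` string format is assumed from the filter quoted in this series, and `scalarTagMatches` is a hypothetical stand-in for Typesense's `:=[...]` match against a scalar field.

```typescript
const DEFAULT_SEARCH_TAG = "default";

// Paraphrase of the theme behaviour quoted above: `default` always comes first.
function contextualTags(docsTags: string[]): string[] {
  return [DEFAULT_SEARCH_TAG, ...docsTags];
}

// Assumed rendering of the filter the dropdown sends to Typesense.
function filterBy(locale: string, tags: string[]): string {
  return `language:=${locale} && docusaurus_tag:=[${tags.join(",")}]`;
}

// Hypothetical stand-in for `docusaurus_tag:=[...]` against a scalar field:
// the record matches when its single tag is one of the listed values.
function scalarTagMatches(recordTag: string, queryTags: string[]): boolean {
  return queryTags.indexOf(recordTag) !== -1;
}

// Tags for two different plugin contexts.
const developerTags = contextualTags(["docs-developer-v4.2.0"]);
const participateTags = contextualTags(["docs-participate-current"]);

// A custom caller that bypasses the contextual helper (the caveat case).
const customTags = ["docs-developer-v4.2.0"];
```

An api-nr record tagged `"default"` matches both `developerTags` and `participateTags`, but not `customTags`, which is the silent-disappearance case the Caveat section warns about.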
rangozd
pushed a commit
to rangozd/aztec-packages
that referenced
this pull request
May 16, 2026
…rotocol#23109)

## Problem

Search results for aztec-nr-api pages 404 in the browser even though the underlying static HTML exists and a direct fetch returns 200.

Repro: search the docs dropdown for an Aztec.nr identifier (e.g. `SerializeToColumns`), click an api-nr hit, get the Docusaurus 404 page. Curling the same URL works:

```
$ curl -sI https://docs.aztec.network/aztec-nr-api/mainnet/protocol_types/proof/traits/trait.serializetocolumns | head -1
HTTP/2 200
```

So the index from AztecProtocol#23097 is correct (14,773 api-nr records visible under `docusaurus_tag:=[default]`, confirmed in the May 8 scraper run); the file is reachable; the 404 is purely client-side.

## Root cause

`aztec-nr-api/` pages are static HTML generated by `nargo doc` and dropped into `docs/static/aztec-nr-api/`. Netlify serves them as raw files. Docusaurus does **not** register them as React Router routes.

`docusaurus-theme-search-typesense` decides between SPA navigation and full page load via `externalUrlRegex` ([SearchBar/index.tsx#L169-L197](https://github.com/typesense/docusaurus-theme-search-typesense/blob/main/src/theme/SearchBar/index.tsx#L169)):

```ts
// transformItems: keep absolute URL only if it matches the regex,
// otherwise strip to a relative path
if (isRegexpStringMatch(externalUrlRegex, item.url)) return item;
return { ...item, url: withBaseUrl(`${url.pathname}${url.hash}`) };

// navigator: hard-nav only if itemUrl matches the regex
if (isRegexpStringMatch(externalUrlRegex, itemUrl)) {
  window.location.href = itemUrl;
} else {
  history.push(itemUrl);
}
```

`externalUrlRegex` is unset in `docs/docusaurus.config.js`, so api-nr clicks go through `history.push`, which dispatches to React Router, which has no matching route for `/aztec-nr-api/...` and renders the SPA's 404. The static file on disk is never requested.

## Fix

Set `externalUrlRegex: "/aztec-nr-api/"` in the typesense theme config. `isRegexpStringMatch` does a case-insensitive substring match (`new RegExp(s, 'gi').test(value)`), so this:

- matches the absolute URL during `transformItems` → URL stays absolute,
- matches the URL during `navigate` → uses `window.location.href` → real page load → Netlify serves the static file.

Regular Docusaurus results don't match the regex and continue using `history.push`, preserving SPA navigation for in-app routes. The slashes (`/aztec-nr-api/`) prevent accidental substring matches if a future doc slug ever contained "aztec-nr-api".

## Verification

Pre-fix repro is the user-reported bug; post-fix verification requires the deployed Netlify preview. After this lands and a preview is built, manually:

- [ ] Search the dropdown from the homepage, /developers/, /operate/, and /participate/ for an Aztec.nr identifier (e.g. `SerializeToColumns`, `address_note`).
- [ ] Click an api-nr result; confirm the page loads (not the 404).
- [ ] Click a regular Docusaurus result; confirm SPA navigation still feels instant (no full page reload).

## Notes

- Only `aztec-nr-api/` is currently indexed in `docs/typesense.config.json`. `static/typescript-api/` is not crawled by the Typesense scraper, so it doesn't need to be in the regex.
- This is the fifth PR in the search-fix series after AztecProtocol#22861 → AztecProtocol#23042 → AztecProtocol#23049 → AztecProtocol#23058 → AztecProtocol#23097. AztecProtocol#23097 fixed indexing (records exist in Typesense); this one fixes click-through (records resolve in the browser).
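The routing decision described in the Root cause and Fix sections can be exercised in isolation with a small sketch. `isRegexpStringMatch` is reimplemented here from the one-line description in this PR (`new RegExp(s, 'gi').test(value)`), so treat it as an approximation of the theme's helper. `chooseNavigation`, the `Navigation` type, and the regular-page URL are hypothetical names introduced for illustration.

```typescript
// Approximation of the theme helper, per the description in this PR:
// an unset regex never matches; otherwise do a case-insensitive test.
function isRegexpStringMatch(regexAsString: string | undefined, value: string): boolean {
  if (regexAsString === undefined) return false;
  return new RegExp(regexAsString, "gi").test(value);
}

type Navigation = "full-page-load" | "spa-history-push";

// Hypothetical wrapper around the navigator branch quoted above:
// matching URLs get `window.location.href`, the rest get `history.push`.
function chooseNavigation(externalUrlRegex: string | undefined, itemUrl: string): Navigation {
  return isRegexpStringMatch(externalUrlRegex, itemUrl)
    ? "full-page-load"
    : "spa-history-push";
}

const apiNrUrl =
  "https://docs.aztec.network/aztec-nr-api/mainnet/protocol_types/proof/traits/trait.serializetocolumns";
const regularUrl = "https://docs.aztec.network/developers/overview"; // hypothetical page

// Before the fix: regex unset, so the api-nr hit goes to React Router and 404s.
const before = chooseNavigation(undefined, apiNrUrl);

// After the fix: the api-nr hit triggers a real page load, regular hits stay in the SPA.
const afterApiNr = chooseNavigation("/aztec-nr-api/", apiNrUrl);
const afterRegular = chooseNavigation("/aztec-nr-api/", regularUrl);
```

Because a fresh `RegExp` is constructed on every call, the `g` flag's `lastIndex` state does not carry over between tests in this sketch.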
## Summary

Third in the series fixing search after #22861. Previous PRs (#23042, #23049) successfully indexed 14,773 records from 2,222 aztec-nr-api pages, but users still don't see those records in the dropdown search.

## Root cause

The `docusaurus-theme-search-typesense` package ANDs a contextual filter, `language:=en && docusaurus_tag:=[<plugin-context-tag>]`, into every search query.

The api-nr records have neither `language` nor `docusaurus_tag` set, because the typesense-docsearch-scraper only stamps those onto records scraped from docusaurus pages (it reads `<meta name="docsearch:docusaurus_tag" content="...">` tags). Rustdoc-style nargo-doc pages don't emit those metas, so every api-nr record is missing the fields the theme filters on and is filtered out of every dropdown query.

## Fix

Three coordinated changes:

### 1. `extra_attributes` on the api-nr start_url (`docs/typesense.config.json`)

Stamp every api-nr record with the attributes the theme expects: `language: "en"` and a `docusaurus_tag` array holding `docs-participate-current`, `docs-root-current`, and `default`.

These cover the three unversioned plugin contexts. The two versioned ones (`docs-developer-${version}` and `docs-network-${version}`) are appended dynamically by the workflow (see change 3) so the static config doesn't go stale on version bumps.

Typesense's `docusaurus_tag:=[<context-tag>]` matches if the record's array contains the context tag, so the api-nr records will satisfy the filter from any plugin context.

### 2. `field_definitions` schema override (`docs/typesense.config.json`)

The scraper's default schema (`scraper/src/typesense_helper.py` v0.11.0) declares the wildcard `.*_tag` as `string`, so sending an array for `docusaurus_tag` would be rejected at import time. `field_definitions` overrides this, but it REPLACES the entire default schema rather than merging, so the full default field list is reproduced verbatim with one targeted change: `docusaurus_tag` is added with type `string*` (accepts both string and array) before the `.*_tag` wildcard. Existing docusaurus records continue to work because they pass `docusaurus_tag` as a single string from a meta tag, and `string*` accepts that too.

### 3. Derive versioned tags at scrape time (`.github/workflows/docs-typesense.yml`)

Read `developer_version_config.json` and `network_version_config.json`, build the `docs-developer-${mainnet}`, `docs-developer-${testnet}`, `docs-network-${mainnet}`, `docs-network-${testnet}` strings (dropping empty/duplicates), and use `jq` to append them to the api-nr start_url's `docusaurus_tag` array before passing the config to the scraper. This way the static JSON never holds version-specific values that need manual updating.

The workflow run also switches to `set -euo pipefail` so a `jq` derivation failure aborts the run rather than feeding an empty config to docker.

## Caveats

## Test plan

- Dispatch the `Docs Scraper` workflow on this branch via `workflow_dispatch`.
- Confirm `Nb hits` ≈ 27,000 (no regression in record count).
- Confirm api-nr records carry `docusaurus_tag` values matching the current docs versions (e.g. `docs-developer-v4.2.0`, `docs-network-v4.2.0`).
- Search the docs site for an Aztec.nr identifier (e.g. `ContractClassId`, `balance_set`, `compute_log_tag`) and confirm API reference pages appear in the dropdown in all three contexts.