
fix(docs): tag aztec-nr-api records so they pass the search filter#23058

Merged
critesjosh merged 2 commits into next from josh/fix-typesense-api-tags on May 7, 2026
@critesjosh critesjosh commented May 7, 2026


Summary

Third in the series fixing search after #22861. Previous PRs (#23042, #23049) successfully indexed 14,773 records from 2,222 aztec-nr-api pages, but users still don't see those records in the search dropdown.

Root cause

The docusaurus-theme-search-typesense package ANDs a contextual filter into every search query:

```ts
// docusaurus-theme-search-typesense/src/client/useTypesenseContextualFacetFilters.ts
const languageFilter = `language:=${locale}`;
const tagsFilter = `docusaurus_tag:=[${tags.join(',')}]`;
return [languageFilter, tagsFilter].filter(Boolean).join(' && ');
```

The api-nr records have neither language nor docusaurus_tag set. The typesense-docsearch-scraper only stamps those fields onto records scraped from docusaurus pages (it reads <meta name="docsearch:docusaurus_tag" content="..."> tags). Rustdoc-style nargo-doc pages don't emit those metas, so every api-nr record lacks the fields the theme filters on and is therefore filtered out of every dropdown query.

Fix

Three coordinated changes:

1. extra_attributes on the api-nr start_url (docs/typesense.config.json)

Stamp every api-nr record with the attributes the theme expects:

```json
"extra_attributes": {
  "language": "en",
  "docusaurus_tag": [
    "docs-participate-current",
    "docs-root-current",
    "default"
  ]
}
```

These cover the three unversioned contexts. Tags for the two versioned plugins (docs-developer-${version} and docs-network-${version}) are appended dynamically by the workflow (see change 3 below), so the static config doesn't go stale on version bumps.

Typesense's docusaurus_tag:=[<context-tag>] matches if the record's array contains the context tag, so the api-nr records will satisfy the filter from any plugin context.
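
A minimal in-memory model of the filter makes both the failure and the fix concrete. This is a sketch, not Typesense's filter engine: it assumes `field:=[a,b]` matches when any value on the record equals any listed value, and that a missing field never matches. The record shapes and the developer-context tag are illustrative.

```typescript
// Hypothetical model of the theme's ANDed contextual filter:
//   language:=<locale> && docusaurus_tag:=[<tags>]
type SearchRecord = {
  url: string;
  language?: string;
  docusaurus_tag?: string | string[];
};

// `field:=[a,b]`: match if any record value equals any listed value.
// A record without the field never matches.
function matchesAny(value: string | string[] | undefined, wanted: string[]): boolean {
  if (value === undefined) return false;
  const values = Array.isArray(value) ? value : [value];
  return values.some((v) => wanted.includes(v));
}

// Both clauses must hold, mirroring the `' && '` join in the theme.
function passesContextualFilter(record: SearchRecord, locale: string, tags: string[]): boolean {
  return record.language === locale && matchesAny(record.docusaurus_tag, tags);
}

const url = "https://docs.aztec.network/aztec-nr-api/mainnet/protocol_types/proof/traits/trait.serializetocolumns";

// Before: scraped from a nargo-doc page, so neither field is set.
const apiNrBefore: SearchRecord = { url };

// After: stamped via extra_attributes plus the workflow-derived versioned tags.
const apiNrAfter: SearchRecord = {
  url,
  language: "en",
  docusaurus_tag: [
    "docs-participate-current",
    "docs-root-current",
    "default",
    "docs-developer-v4.2.0",
    "docs-network-v4.2.0",
  ],
};

const developerContext = ["docs-developer-v4.2.0"];
console.log(passesContextualFilter(apiNrBefore, "en", developerContext)); // false
console.log(passesContextualFilter(apiNrAfter, "en", developerContext)); // true
```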

2. field_definitions schema override (docs/typesense.config.json)

The scraper's default schema (scraper/src/typesense_helper.py v0.11.0) declares the wildcard .*_tag as string, so sending an array for docusaurus_tag would be rejected at import time. field_definitions overrides this — but it REPLACES the entire default schema rather than merging, so the full default field list is reproduced verbatim with one targeted change: docusaurus_tag is added with type string* (accepts both string and array) before the .*_tag wildcard. Existing docusaurus records continue to work because they pass docusaurus_tag as a single string from a meta tag, and string* accepts that too.
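
The "copy the defaults verbatim, insert one entry before the wildcard" edit can be sketched as a pure function. Only the `.*_tag: string` wildcard and the `docusaurus_tag: string*` entry come from the text above; `DEFAULT_FIELDS` is a hypothetical two-entry stub standing in for the real list in `typesense_helper.py`. As the later commits in this thread report, Typesense still rejected array values in practice, so this only illustrates the intended config shape.

```typescript
// Simplified stand-in for a Typesense field definition.
type FieldDef = { name: string; type: string };

// Hypothetical stub of the scraper's default schema; only the
// `.*_tag` wildcard entry is taken from the PR text.
const DEFAULT_FIELDS: FieldDef[] = [
  { name: "content", type: "string" },
  { name: ".*_tag", type: "string" },
];

// field_definitions REPLACES the defaults, so copy them unchanged and insert
// the explicit docusaurus_tag entry immediately before the wildcard.
function withDocusaurusTagOverride(defaults: FieldDef[]): FieldDef[] {
  const override: FieldDef = { name: "docusaurus_tag", type: "string*" };
  const wildcardIndex = defaults.findIndex((f) => f.name === ".*_tag");
  if (wildcardIndex === -1) return [...defaults, override];
  return [...defaults.slice(0, wildcardIndex), override, ...defaults.slice(wildcardIndex)];
}

const fieldDefinitions = withDocusaurusTagOverride(DEFAULT_FIELDS);
console.log(fieldDefinitions.map((f) => `${f.name}:${f.type}`).join(", "));
// content:string, docusaurus_tag:string*, .*_tag:string
```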

3. Derive versioned tags at scrape time (.github/workflows/docs-typesense.yml)

Read developer_version_config.json and network_version_config.json, build the docs-developer-${mainnet}, docs-developer-${testnet}, docs-network-${mainnet}, docs-network-${testnet} strings (dropping empty/duplicates), and use jq to append them to the api-nr start_url's docusaurus_tag array before passing the config to the scraper. This way the static JSON never holds version-specific values that need manual updating.

The workflow's run step also switches to set -euo pipefail, so a jq derivation failure aborts the run rather than feeding an empty config to docker.
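
The workflow performs the derivation in change 3 with jq; below is the same logic rendered in TypeScript as a sketch. The `VersionConfig` shape (`mainnet`/`testnet` string fields) is an assumption inferred from the `${mainnet}`/`${testnet}` placeholders, not the real schema of `developer_version_config.json` or `network_version_config.json`, and the version values are illustrative.

```typescript
// Assumed shape of developer_version_config.json / network_version_config.json.
type VersionConfig = { mainnet?: string; testnet?: string };

// Build docs-<pluginId>-<version> tags, dropping empty values and duplicates
// (e.g. when mainnet and testnet point at the same version).
function deriveVersionedTags(developer: VersionConfig, network: VersionConfig): string[] {
  const pairs: Array<[string, string | undefined]> = [
    ["developer", developer.mainnet],
    ["developer", developer.testnet],
    ["network", network.mainnet],
    ["network", network.testnet],
  ];
  const tags = pairs
    .filter(([, version]) => version !== undefined && version.trim() !== "")
    .map(([pluginId, version]) => `docs-${pluginId}-${version}`);
  return Array.from(new Set(tags));
}

// Append to the static unversioned tags, mirroring the jq step.
const staticTags = ["docs-participate-current", "docs-root-current", "default"];
const derived = deriveVersionedTags(
  { mainnet: "v4.2.0", testnet: "v4.2.0" },
  { mainnet: "v4.2.0", testnet: "" },
);
const docusaurusTag = Array.from(new Set([...staticTags, ...derived]));
console.log(docusaurusTag);
```

Deduplicating with a `Set` keeps insertion order, so the static tags stay first and the derived ones follow.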

Caveats

  • Existing 14,773 api-nr records in the production collection are stale until the next scraper run rewrites them. The scraper alias-swaps to a fresh collection on each run, so no manual purge is needed.

Test plan

  • Manually dispatch Docs Scraper workflow on this branch via workflow_dispatch.
  • Confirm scraper run reports Nb hits ≈ 27,000 (no regression in record count).
  • Confirm no schema-validation errors in the run log.
  • Confirm the workflow log echoes the derived docusaurus_tag values matching the current docs versions (e.g. docs-developer-v4.2.0, docs-network-v4.2.0).
  • After merge, search docs.aztec.network from the homepage, /developers/, and /operate/ for an Aztec.nr identifier (e.g. ContractClassId, balance_set, compute_log_tag) and confirm API reference pages appear in the dropdown in all three contexts.

critesjosh added 2 commits May 7, 2026 14:24
Despite the previous fixes (#23042, #23049) restoring 14,773 records
from 2,222 aztec-nr-api pages, those records still don't surface in
the docs site search. Root cause: docusaurus-theme-search-typesense
ANDs `language:=en && docusaurus_tag:=[<plugin-context-tag>]` into
every search query. The api-nr records lack both attributes because
the typesense-docsearch-scraper only stamps them onto records scraped
from a docusaurus-rendered page (via `<meta name="docsearch:...">`
tags); rustdoc-style nargo-doc pages don't emit those metas.

Two changes:

1. Add `extra_attributes` on the api-nr start_url so each api-nr
   record gets `language: "en"` and a `docusaurus_tag` array spanning
   every plugin context (`docs-developer-v4.2.0`,
   `docs-network-v4.2.0`, `docs-participate-current`,
   `docs-root-current`, `default`). Typesense's `:=[...]` array-match
   then succeeds in any context the user is searching from.

2. Add `field_definitions` to `custom_settings`. The
   typesense-docsearch-scraper's default schema declares the wildcard
   `.*_tag` as `string`, so passing an array for `docusaurus_tag`
   would be rejected at import. `field_definitions` overrides the
   wildcard for `docusaurus_tag` specifically with `string*`, which
   accepts both a single string (existing docusaurus records, set
   from a meta tag) and an array (api-nr records). The scraper
   replaces the entire default schema when `field_definitions` is
   set, so the full default field list is reproduced verbatim with
   only the `docusaurus_tag` entry inserted before the `.*_tag`
   wildcard.

Note: the version-specific tag values (`v4.2.0`) need to be updated
when mainnet/testnet bump versions. Future improvement: derive these
from `developer_version_config.json` and `network_version_config.json`
at scrape time.

Follow-up to address the brittleness caveat. The previous commit
hardcoded `docs-developer-v4.2.0` and `docs-network-v4.2.0` in
typesense.config.json, which would silently go stale every time
mainnet/testnet versions bump.

Move the version-specific tags out of the static JSON. The workflow
now reads `docs/developer_version_config.json` and
`docs/network_version_config.json`, builds the four
`docs-${pluginId}-${versionName}` strings (filtered for empty/dupes),
and uses jq to append them to the api-nr start_url's
`extra_attributes.docusaurus_tag` array before passing the config to
the scraper. The static JSON keeps only the unversioned tags
(participate, root, default).

Also: `set -euo pipefail` so a jq derivation failure aborts the run
instead of feeding an empty config to docker.
@critesjosh critesjosh requested a review from charlielye as a code owner May 7, 2026 18:31
@critesjosh critesjosh enabled auto-merge May 7, 2026 19:05
@critesjosh critesjosh added this pull request to the merge queue May 7, 2026
@alejoamiras alejoamiras removed this pull request from the merge queue due to a manual request May 7, 2026
@critesjosh critesjosh added this pull request to the merge queue May 7, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 7, 2026
@critesjosh critesjosh added this pull request to the merge queue May 7, 2026
Merged via the queue into next with commit a248d30 May 7, 2026
27 of 29 checks passed
@critesjosh critesjosh deleted the josh/fix-typesense-api-tags branch May 7, 2026 21:27
spalladino pushed a commit that referenced this pull request May 8, 2026
#23058 tried to make aztec-nr-api records visible in the docusaurus
search dropdown by stamping `docusaurus_tag` as an array of plugin
context tags, paired with a `field_definitions` schema override that
declared `docusaurus_tag` as `string*` (string-or-array). In practice
the override doesn't take effect: every api-nr document import is
rejected by Typesense with `'Field 'docusaurus_tag' must be a string.'`
The CI guard added in #23042 (MIN_HITS=5000) didn't trip because the
~12k non-api docs still passed.

The fix is much smaller. The docusaurus theme's contextual filter
unconditionally prepends the constant `default` (DEFAULT_SEARCH_TAG in
docusaurus-theme-common) to every dropdown query's `docusaurus_tag`
list. So a single scalar value of `"default"` on api-nr records
satisfies the filter from every plugin context, and no schema override
is needed: the scraper's default `.*_tag: string` accepts the scalar
cleanly.

Changes:
- `docs/typesense.config.json`: drop `field_definitions`; collapse
  api-nr `extra_attributes.docusaurus_tag` to scalar `"default"`.
- `.github/workflows/docs-typesense.yml`: drop the jq mutation that
  derived versioned tags (no longer needed). Add a post-index curl
  smoke check that searches the live alias for
  `docusaurus_tag:=[default]&&language:=en` and fails the run if
  fewer than MIN_API_HITS=1000 records are visible. No existing
  docusaurus page carries the `"default"` tag (each one is stamped
  with its plugin-context tag from the docsearch meta), so this
  count is effectively the count of indexed api-nr records.
rangozd pushed a commit to rangozd/aztec-packages that referenced this pull request May 16, 2026
rangozd pushed a commit to rangozd/aztec-packages that referenced this pull request May 16, 2026
…tecProtocol#23097)

## Summary

Fourth in the series fixing search after AztecProtocol#22861. After AztecProtocol#23058 merged,
the production index still has **0 records under
`aztec-nr-api/mainnet/...`**. Confirmed by querying the live Typesense
collection directly
(`filter_by:url:=https://docs.aztec.network/aztec-nr-api/mainnet/*`
returns `found: 0`) and by inspecting the most recent scraper run logs.

## Root cause

The schema override added by AztecProtocol#23058 doesn't take effect. Every api-nr
document import is rejected by Typesense with HTTP 400
``'Field `docusaurus_tag` must be a string.'``, even though
`custom_settings.field_definitions` lists an explicit
`{ "name": "docusaurus_tag", "type": "string*" }` ahead of the wildcard
`.*_tag: string`. Per Typesense docs an explicit field should win over a
regex pattern field, but in practice the wildcard's `string` type
appears to be what's enforced. The CI guard from AztecProtocol#23042
(`MIN_HITS=5000`) didn't trip because the ~12k non-api docs still
passed.

## Fix

AztecProtocol#23058 over-engineered the solution. Reading the docusaurus theme:

```ts
// docusaurus-theme-common/src/utils/searchUtils.ts
export const DEFAULT_SEARCH_TAG = 'default';
```

```ts
// docusaurus-theme-common/src/index.ts
const tags = [DEFAULT_SEARCH_TAG, ...docsTags];
return {locale: i18n.currentLocale, tags};
```

…the theme unconditionally prepends `'default'` to the `docusaurus_tag`
filter on every dropdown query, in every plugin context. So api-nr
records only need the single scalar value `"default"` to satisfy the
filter from anywhere on the docs site. No array, no schema surgery, no
version-specific tag derivation.
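
A minimal model of why the scalar suffices, assuming the theme behaves as the snippets above show. `contextualTags` mirrors the `[DEFAULT_SEARCH_TAG, ...docsTags]` line; `scalarTagMatches` is a simplified stand-in for Typesense's `:=[...]` filter on a single-string field. The context tag values are illustrative.

```typescript
// Mirrors the theme constant quoted above.
const DEFAULT_SEARCH_TAG = "default";

// Mirrors `const tags = [DEFAULT_SEARCH_TAG, ...docsTags]`.
function contextualTags(docsTags: string[]): string[] {
  return [DEFAULT_SEARCH_TAG, ...docsTags];
}

// `docusaurus_tag:=[...]` on a scalar field: match if the value is in the list.
function scalarTagMatches(recordTag: string, queryTags: string[]): boolean {
  return queryTags.includes(recordTag);
}

// Illustrative plugin contexts a user might search from.
const contexts: string[][] = [
  ["docs-developer-v4.2.0"],
  ["docs-network-v4.2.0"],
  ["docs-root-current"],
  ["docs-participate-current"],
];

const apiNrTag = "default";
console.log(contexts.every((ctx) => scalarTagMatches(apiNrTag, contextualTags(ctx)))); // true

// A regular docs page keeps its context scoping.
console.log(scalarTagMatches("docs-developer-v4.2.0", contextualTags(["docs-network-v4.2.0"]))); // false
```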

Three changes:

### 1. `docs/typesense.config.json`

Drop the `custom_settings.field_definitions` override entirely (the
scraper's default schema with `.*_tag: string` accepts scalar string
values cleanly), and collapse the api-nr
`extra_attributes.docusaurus_tag` to scalar `"default"`.

### 2. `.github/workflows/docs-typesense.yml` — remove jq mutation

The jq block that derived versioned tags is no longer needed. The
scraper now reads `docs/typesense.config.json` verbatim.

### 3. `.github/workflows/docs-typesense.yml`: log api-nr visibility post-index

After the scraper completes its alias swap, curl the live `aztec-docs`
alias for `docusaurus_tag:=[default]&&language:=en` and log the count.
No existing docusaurus page carries the `"default"` tag (each is
stamped with its plugin-context tag, e.g. `docs-developer-v4.2.0`, from
the `<meta name="docsearch:docusaurus_tag">` tag), so this count is
effectively the count of indexed api-nr records, and the filter mirrors
what the theme actually sends. Informational only; not gated by a
threshold.
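
A sketch of that post-index check, written in TypeScript rather than the workflow's curl. It assumes Typesense's standard search endpoint (`GET /collections/<name>/documents/search`, which accepts an alias in place of a collection name), the `X-TYPESENSE-API-KEY` header, and the `found` count in the response. The host, the `query_by` field, and the injected `SearchFn` are hypothetical details; the HTTP call is injected so the logic runs without a network.

```typescript
// Hypothetical injected HTTP GET returning the parsed JSON body.
type SearchFn = (url: string, headers: Record<string, string>) => Promise<{ found: number }>;

// Wildcard search against the live alias, using the filter the theme sends.
function buildVisibilityQuery(host: string, alias: string): string {
  const params = new URLSearchParams({
    q: "*",
    query_by: "content", // assumption: an indexed text field in the scraper's schema
    filter_by: "docusaurus_tag:=[default]&&language:=en",
  });
  return `${host}/collections/${encodeURIComponent(alias)}/documents/search?${params.toString()}`;
}

// Informational only: log the visible count instead of failing the run.
async function logApiNrVisibility(search: SearchFn, host: string, alias: string, apiKey: string): Promise<number> {
  const { found } = await search(buildVisibilityQuery(host, alias), { "X-TYPESENSE-API-KEY": apiKey });
  console.log(`api-nr records visible under docusaurus_tag:=[default]: ${found}`);
  return found;
}

// Usage with a stub in place of the real HTTP call (host and key are placeholders).
const stubSearch: SearchFn = async () => ({ found: 42 });
void logApiNrVisibility(stubSearch, "https://typesense.example.com", "aztec-docs", "search-only-key");
```

The log line matches the format the test plan below looks for, so the same string can be grepped from the run output.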

## Behavior change

api-nr records will now appear in the search dropdown from every plugin
context (developer, network, root, participate) and every doc version
(mainnet, testnet, nightly), because they're stamped with the
always-prepended `"default"` tag rather than version-specific tags.
Today we only generate `aztec-nr-api/mainnet/`, so a user browsing
testnet developer docs would see mainnet aztec-nr API links in their
dropdown. Probably desirable (an aztec-nr API symbol is the same
regardless of which doc version you're reading), but a behavior change
vs the (non-functional) AztecProtocol#23058 attempt.

## Caveat

api-nr visibility now depends on the docusaurus theme's
`DEFAULT_SEARCH_TAG = 'default'` invariant. If a future caller ever
issues a search query that doesn't include `'default'` in the tag list
(e.g. a custom search page bypassing `useContextualSearchFilters`),
api-nr records would silently disappear from that surface.

## Test plan

- [ ] Manually dispatch `Docs Scraper` workflow via `workflow_dispatch`
on this branch.
- [ ] Confirm the run logs `Indexed N records (threshold: 5000)`
with N well above 5000.
- [ ] Confirm the run logs `api-nr records visible under
docusaurus_tag:=[default]: M` with M well above zero (AztecProtocol#23049 indexed
14,773 api-nr records before the schema rejection started silently
dropping them, so we expect a similar count).
- [ ] Confirm no ``'Field `docusaurus_tag` must be a string.'`` 400s in
the scraper output.
- [ ] After merge, search docs.aztec.network from the homepage,
/developers/, /network/, and /participate/ for an Aztec.nr identifier
(e.g. `ContractClassId`, `balance_set`, `compute_log_tag`,
`address_note`) and confirm API reference pages appear in the dropdown
in all four contexts.
rangozd pushed a commit to rangozd/aztec-packages that referenced this pull request May 16, 2026
…rotocol#23109)

## Problem

Search results for aztec-nr-api pages 404 in the browser even though the
underlying static HTML exists and a direct fetch returns 200.

Repro: search the docs dropdown for an Aztec.nr identifier (e.g.
`SerializeToColumns`), click an api-nr hit, get the Docusaurus 404 page.
Curling the same URL works:

```
$ curl -sI https://docs.aztec.network/aztec-nr-api/mainnet/protocol_types/proof/traits/trait.serializetocolumns | head -1
HTTP/2 200
```

So the index from AztecProtocol#23097 is correct (14,773 api-nr records visible under
`docusaurus_tag:=[default]`, confirmed in the May 8 scraper run); the
file is reachable; the 404 is purely client-side.

## Root cause

`aztec-nr-api/` pages are static HTML generated by `nargo doc` and
dropped into `docs/static/aztec-nr-api/`. Netlify serves them as raw
files. Docusaurus does **not** register them as React Router routes.

`docusaurus-theme-search-typesense` decides between SPA navigation and
full page load via `externalUrlRegex`
([SearchBar/index.tsx#L169-L197](https://github.com/typesense/docusaurus-theme-search-typesense/blob/main/src/theme/SearchBar/index.tsx#L169)):

```ts
// transformItems: keep absolute URL only if it matches the regex,
// otherwise strip to a relative path
if (isRegexpStringMatch(externalUrlRegex, item.url)) return item;
return { ...item, url: withBaseUrl(`${url.pathname}${url.hash}`) };

// navigator: hard-nav only if itemUrl matches the regex
if (isRegexpStringMatch(externalUrlRegex, itemUrl)) {
  window.location.href = itemUrl;
} else {
  history.push(itemUrl);
}
```

`externalUrlRegex` is unset in `docs/docusaurus.config.js`, so api-nr
clicks go through `history.push`, which dispatches to React Router,
which has no matching route for `/aztec-nr-api/...` and renders the
SPA's 404. The static file on disk is never requested.

## Fix

Set `externalUrlRegex: "/aztec-nr-api/"` in the typesense theme config.
`isRegexpStringMatch` does a case-insensitive regex test
(`new RegExp(s, 'gi').test(value)`); `/aztec-nr-api/` contains no regex
metacharacters, so it behaves as a plain substring match. This:

- matches the absolute URL during `transformItems` → URL stays absolute,
- matches the URL during `navigate` → uses `window.location.href` → real
page load → Netlify serves the static file.

Regular Docusaurus results don't match the regex and continue using
`history.push`, preserving SPA navigation for in-app routes.

The slashes (`/aztec-nr-api/`) prevent accidental substring matches if a
future doc slug ever contained "aztec-nr-api".
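
The navigation decision can be checked in isolation with a small sketch. `isRegexpStringMatch` below is re-implemented from the description above (`new RegExp(s, 'gi').test(value)`, returning `false` when no regex is configured), not copied from the theme; `navigationKind` is a hypothetical helper that names which branch of the navigator runs. The api-nr URL is the one from the repro; the other URLs are illustrative.

```typescript
// Re-implemented from the behavior described above; not the theme's source.
function isRegexpStringMatch(regexAsString: string | undefined, value: string): boolean {
  if (typeof regexAsString === "undefined") return false;
  return new RegExp(regexAsString, "gi").test(value);
}

const externalUrlRegex = "/aztec-nr-api/";

// Hypothetical helper: which navigator branch a clicked result takes.
function navigationKind(itemUrl: string, regex: string | undefined): "full page load" | "history.push" {
  return isRegexpStringMatch(regex, itemUrl) ? "full page load" : "history.push";
}

const apiNrUrl = "https://docs.aztec.network/aztec-nr-api/mainnet/protocol_types/proof/traits/trait.serializetocolumns";
const docsUrl = "/developers/"; // illustrative in-app route

console.log(navigationKind(apiNrUrl, undefined)); // history.push (pre-fix: SPA 404)
console.log(navigationKind(apiNrUrl, externalUrlRegex)); // full page load
console.log(navigationKind(docsUrl, externalUrlRegex)); // history.push
// A slug that merely contains the words is not matched, thanks to the slashes.
console.log(navigationKind("/docs/using-aztec-nr-api-macros", externalUrlRegex)); // history.push
```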

## Verification

Pre-fix repro is the user-reported bug; post-fix verification requires
the deployed Netlify preview. After this lands and a preview is built,
manually:

- [ ] Search the dropdown from the homepage, /developers/, /operate/,
and /participate/ for an Aztec.nr identifier (e.g. `SerializeToColumns`,
`address_note`).
- [ ] Click an api-nr result; confirm the page loads (not the 404).
- [ ] Click a regular Docusaurus result; confirm SPA navigation still
feels instant (no full page reload).

## Notes

- Only `aztec-nr-api/` is currently indexed in
`docs/typesense.config.json`. `static/typescript-api/` is not crawled by
the Typesense scraper, so it doesn't need to be in the regex.
- This is the fifth PR in the search-fix series that followed AztecProtocol#22861 (AztecProtocol#23042 → AztecProtocol#23049 → AztecProtocol#23058 → AztecProtocol#23097 → this PR). AztecProtocol#23097 fixed indexing (records exist in Typesense);
this one fixes click-through (records resolve in the browser).