Use native batch URL rewrites for text nodes #284

Draft
adamziel wants to merge 1 commit into codex-native-structured-rewrite from codex/use-native-batch-rewrite


@adamziel
Collaborator

What it does

Stacks on #282 and routes wp_rewrite_urls() text-node rewriting through wp_native_apis_rewrite_text_url_bases() when the native function is available.

The native fast path is intentionally narrow:

  • only current #text tokens are sent to the native batch rewriter;
  • block JSON attributes, URL HTML attributes, style/CSS URLs, and non-ASCII text continue through the structured PHP path;
  • if the native batch call makes no change, the existing per-URL path still handles the token.
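
The three rules above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `sketch_rewrite_text_token()`, `$native_batch`, and `$per_url_fallback` are hypothetical names, and because the signature of `wp_native_apis_rewrite_text_url_bases()` is not shown here, the native rewriter is injected as a callable rather than called directly. In real code the callable would presumably be set only when `function_exists( 'wp_native_apis_rewrite_text_url_bases' )` returns true.

```php
<?php
declare(strict_types=1);

/**
 * Sketch only: rewrite one #text token, preferring a native batch rewriter.
 *
 * $native_batch and $per_url_fallback are hypothetical stand-ins (assumptions),
 * not the functions used by the PR.
 */
function sketch_rewrite_text_token( string $text, ?callable $native_batch, callable $per_url_fallback ): string {
	// Non-ASCII text is never sent to the native fast path.
	$is_ascii = 1 !== preg_match( '/[^\x00-\x7F]/', $text );

	if ( null !== $native_batch && $is_ascii ) {
		$rewritten = $native_batch( $text );
		// Accept the fast-path result only when it actually changed the text.
		if ( is_string( $rewritten ) && $rewritten !== $text ) {
			return $rewritten;
		}
	}

	// No native function, non-ASCII text, or no change: use the per-URL path.
	return $per_url_fallback( $text );
}

// Stand-in rewriters for demonstration only (assumptions, not the PR's code).
$native   = static fn( string $t ): string => str_replace( 'https://old.example', 'https://new.example', $t );
$fallback = static fn( string $t ): string => str_replace( 'https://old.example', 'https://fallback.example', $t );

echo sketch_rewrite_text_token( 'See https://old.example/a', $native, $fallback ), "\n";  // native path
echo sketch_rewrite_text_token( 'Voir https://old.example/é', $native, $fallback ), "\n"; // non-ASCII: fallback
echo sketch_rewrite_text_token( 'See https://old.example/a', null, $fallback ), "\n";     // native unavailable: fallback
```

The distinct replacement hosts in the stand-ins exist only to make it visible which path handled each token; in the real workflow both paths are expected to produce identical output.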

Why

#282 adds the coarse native primitive, but the public wp_rewrite_urls() path does not call it. This PR shows what kind of speedup is available when a public rewrite workflow uses that native primitive without running it over raw block markup wholesale.

Benchmark

Same 5,000 sampled post_content values used in the reprint experiments: 16.85 MB decoded, 5,000 changed values. PHP.wasm 8.4.21 with a locally built #282 wp_native_apis extension.

| Branch | Native batch function | Time | Output hash |
| --- | --- | --- | --- |
| #282 base | available, not used by wp_rewrite_urls() | 95.694 s | 44c9750cb310ad822ce73840290c647e609b78ee047365b60f9f123aeed5d9bc |
| this PR | used for ASCII text nodes | 2.338 s | 44c9750cb310ad822ce73840290c647e609b78ee047365b60f9f123aeed5d9bc |

That is about 40.9x faster than #282 alone on this sample. For context, #283's merged native scanner/cache path measured around 3.5s on the same sample, so this points to another meaningful improvement once #282 is available.
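
For reference, the speedup figure is just the ratio of the two wall-clock times reported above. A minimal check, with variable names chosen here for illustration:

```php
<?php
// Wall-clock times reported in the benchmark above, in seconds.
$base_seconds = 95.694; // #282 base: native function available but unused
$pr_seconds   = 2.338;  // this PR: native batch rewrite used for ASCII text nodes

// Speedup is the ratio of old time to new time.
$speedup = $base_seconds / $pr_seconds;

printf( "%.1fx faster\n", $speedup ); // prints "40.9x faster"
```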

Testing

php -l components/DataLiberation/BlockMarkup/class-blockmarkupurlprocessor.php
php -l components/DataLiberation/URL/functions.php
vendor/bin/phpcbf -d memory_limit=1G components/DataLiberation/BlockMarkup/class-blockmarkupurlprocessor.php components/DataLiberation/URL/functions.php
vendor/bin/phpcs -d memory_limit=1G components/DataLiberation/BlockMarkup/class-blockmarkupurlprocessor.php components/DataLiberation/URL/functions.php
vendor/bin/phpunit -c phpunit.xml components/DataLiberation/Tests/RewriteUrlsTest.php components/DataLiberation/Tests/BlockMarkupUrlProcessorTest.php
PHP_WASM_VERSIONS=8.4 ./extensions/native-apis/build-playground-extension.sh .context/wp_native_apis-stacked-wasm-extension

@adamziel force-pushed the codex-native-structured-rewrite branch from 4903777 to 684a29c on May 19, 2026 20:17
@adamziel force-pushed the codex/use-native-batch-rewrite branch from e4a51e3 to 882768c on May 19, 2026 20:20
@adamziel
Collaborator Author

Rebased this stack on current trunk after #283 landed and re-ran the same post-content sample benchmark.

Sample: 5,000 post_content values, 16.85 MB decoded, PHP.wasm 8.4, wp_native_apis loaded, output hash 44c9750cb310ad822ce73840290c647e609b78ee047365b60f9f123aeed5d9bc for both branches.

| Branch | Runs | Median | Delta |
| --- | --- | --- | --- |
| #282 rebased on trunk/#283 | 3.773 s, 3.655 s, 3.642 s | 3.655 s | baseline |
| #284 stacked on rebased #282 | 2.632 s, 2.532 s, 2.448 s | 2.532 s | 30.7% faster |
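
The median and delta columns can be reproduced from the raw runs. The sketch below assumes "30.7% faster" means a 30.7% reduction in median wall-clock time, which corresponds to roughly a 1.44x throughput ratio; `sketch_median()` is a hypothetical helper written for this example and expects a non-empty list.

```php
<?php
declare(strict_types=1);

/** Median of a non-empty list of numbers (hypothetical helper for this example). */
function sketch_median( array $values ): float {
	sort( $values );
	$n   = count( $values );
	$mid = intdiv( $n, 2 );
	return 1 === $n % 2
		? (float) $values[ $mid ]
		: ( $values[ $mid - 1 ] + $values[ $mid ] ) / 2;
}

// Run times in seconds, copied from the table above.
$base_median = sketch_median( array( 3.773, 3.655, 3.642 ) ); // #282 rebased on trunk/#283
$pr_median   = sketch_median( array( 2.632, 2.532, 2.448 ) ); // #284 stacked on rebased #282

// Relative reduction in wall-clock time, and the equivalent speed ratio.
$time_reduction_pct = ( $base_median - $pr_median ) / $base_median * 100;
$speed_ratio        = $base_median / $pr_median;

printf(
	"median %.3f s vs %.3f s: %.1f%% less time, %.2fx speed ratio\n",
	$base_median,
	$pr_median,
	$time_reduction_pct,
	$speed_ratio
);
```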

This does not support the hypothesis that the stacked native batch text rewrite offers no meaningful improvement once #283 has landed. By median, it is still materially faster on this content sample.

Caveat: on the rebased stack, the native-extension CI job is currently failing its URL text parity tests, so this stack is not merge-ready as-is.
