fix: resolve numeric character reference external entities #826

Open
abhu85 wants to merge 1 commit into NaturalIntelligence:master from abhu85:fix/825-numeric-char-refs

abhu85 commented Apr 29, 2026

Summary

Fix regression where addEntity('#xD', '\r') and similar numeric character reference external entities are silently ignored since v5.7.x.

Problem

The addEntity API documents that numeric character references can be registered as external entities (e.g., parser.addEntity('#xD', '\r') for &#xD;). This worked before v5.7.0 but broke with the migration to @nodable/entities:

  • v5.7.1: setExternalEntities() calls validateEntityName(), which throws: Invalid character '#' in entity name: "#xD"
  • v5.7.2: The validateEntityName error was bypassed by passing external entities through the constructor, but #-prefixed entities are still silently ignored, because the EntityDecoder's numeric character reference (NCR) path intercepts #-prefixed tokens before the named entity map is checked.
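The ordering problem in the v5.7.2 bullet can be illustrated with a toy decoder. This is a minimal sketch, not the @nodable/entities code: decodeToken, decode, namedEntities and decodeNumeric are hypothetical names, and the sketch assumes that when numeric decoding is disabled the NCR branch leaves the reference as literal text.

```javascript
// Hypothetical toy decoder, NOT the @nodable/entities implementation.
// It only shows why a registered "#xD" entry is never consulted when the
// numeric (NCR) branch runs before the named-entity lookup.
function decodeToken(name, namedEntities, decodeNumeric) {
  if (name.startsWith("#")) {
    // NCR branch: claims every "#"-prefixed token and returns early.
    // Assumption: with numeric decoding off, the reference stays literal.
    if (!decodeNumeric) return `&${name};`;
    const isHex = name[1] === "x" || name[1] === "X";
    const codePoint = parseInt(name.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    // Leave malformed or out-of-range references untouched.
    if (!Number.isInteger(codePoint) || codePoint > 0x10ffff) return `&${name};`;
    return String.fromCodePoint(codePoint);
  }
  // Named-entity map: only reached for tokens without a leading "#".
  if (Object.prototype.hasOwnProperty.call(namedEntities, name)) {
    return namedEntities[name];
  }
  return `&${name};`; // unknown entity left untouched
}

// Scan text for "&name;" tokens and decode each one with decodeToken.
function decode(text, namedEntities, decodeNumeric) {
  return text.replace(/&(#?[A-Za-z0-9]+);/g, (_, name) =>
    decodeToken(name, namedEntities, decodeNumeric)
  );
}

// "#xD" registered as an external entity alongside a regular named one.
const external = { "#xD": "\r\n", copy: "(c)" };
console.log(JSON.stringify(decode("a&#xD;b &copy;", external, false))); // "a&#xD;b (c)"
console.log(JSON.stringify(decode("a&#xD;b", external, true))); // "a\rb"
```

In this sketch the registered "\r\n" value is never used: with numeric decoding off the reference passes through unchanged, and with it on the reference decodes to its standard code point ("\r") instead.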

This breaks downstream users (reported by AWS SDK users in #825).

Solution

Separate #-prefixed external entities from regular named entities. Regular entities are passed to the EntityDecoder's named entity map as before. Numeric (#-prefixed) entities are resolved via pre-processing: references such as &#xD; are replaced with their registered values before the EntityDecoder's decode pass runs.
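The split-then-pre-process approach might look roughly like the following. This is a hedged sketch of the idea only: splitExternalEntities and resolveNumericEntities are hypothetical helpers, not functions from fast-xml-parser or @nodable/entities, and the actual patch may be structured differently.

```javascript
// Hypothetical helper: partition registered external entities into
// "#"-prefixed numeric references and regular named entities.
function splitExternalEntities(external) {
  const numeric = new Map(); // "&#xD;" -> registered value
  const named = {};          // still handed to the named-entity map
  for (const [name, value] of Object.entries(external)) {
    if (name.startsWith("#")) numeric.set(`&${name};`, value);
    else named[name] = value;
  }
  return { numeric, named };
}

// Hypothetical helper: replace registered numeric references with their
// values before the decoder's own pass. Unregistered references are left
// for the decoder. A single regex pass means replacement values are never
// re-scanned for further references.
function resolveNumericEntities(text, numeric) {
  if (numeric.size === 0) return text; // nothing registered: input unchanged
  return text.replace(/&#[xX]?[0-9A-Fa-f]+;/g, (ref) =>
    numeric.has(ref) ? numeric.get(ref) : ref
  );
}

const { numeric, named } = splitExternalEntities({ "#xD": "\r\n", "#13": "\r", copy: "(c)" });
const pre = resolveNumericEntities("line1&#xD;line2&#13;end &copy;", numeric);
console.log(JSON.stringify(pre)); // "line1\r\nline2\rend &copy;"
console.log(Object.keys(named)); // [ 'copy' ]
```

Here &copy; survives pre-processing and is left to the decoder's named entity map. Matching is on the exact registered spelling, so in this sketch a '#xD' registration does not also cover &#xd; or &#13;.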

This preserves backward compatibility:

  • addEntity('#xD', '\r') works correctly again
  • Default behavior (no addEntity, no htmlEntities) is unchanged
  • htmlEntities: true continues to decode all numeric character references
  • Custom entityDecoder option is unaffected
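For contrast with the registered-value path, generic numeric character reference decoding (the kind the htmlEntities: true bullet refers to) maps each reference straight to its code point. The sketch below is a standalone illustration of that rule, not the library's decoder; decodeAllNcrs is a hypothetical name.

```javascript
// Hypothetical standalone illustration of generic NCR decoding:
// "&#x41;" (hex) and "&#65;" (decimal) both become U+0041 ("A").
function decodeAllNcrs(text) {
  return text.replace(/&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));/g, (ref, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched rather than throwing.
    if (codePoint > 0x10ffff) return ref;
    return String.fromCodePoint(codePoint);
  });
}

console.log(JSON.stringify(decodeAllNcrs("&#x41;&#65;&#xD;&#13;"))); // "AA\r\r"
```

Under this rule &#xD; always becomes "\r", whereas the re-enabled test registers "\r\n" for the same reference.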

Test Plan

  • Re-enabled the disabled test for addEntity("#xD", "\r\n")
  • Added tests for multiple hex and decimal numeric character reference external entities
  • All 316 existing tests pass
  • Verified with AWS SDK XML response patterns

Fixes #825


The addEntity('#xD', '\r') API was broken since v5.7.0 because the
EntityDecoder's NCR path intercepts #-prefixed tokens before checking
the named entity map. This caused #-prefixed external entities to be
silently ignored (v5.7.2) or throw an error (v5.7.1).

Fix by separating #-prefixed external entities and resolving them via
pre-processing before the EntityDecoder's decode pass.
amitguptagwl (Member)

Numeric entities are already supported internally. I don't understand the purpose of this workaround.


Development

Successfully merging this pull request may close these issues.

v5.7.x rejects valid XML numeric character references like &#xD;
