fix: resolve numeric character reference external entities #826
Open
abhu85 wants to merge 1 commit into NaturalIntelligence:master from
…telligence#825) The addEntity('#xD', '\r') API was broken since v5.7.0 because the EntityDecoder's NCR path intercepts #-prefixed tokens before checking the named entity map. This caused #-prefixed external entities to be silently ignored (v5.7.2) or throw an error (v5.7.1). Fix by separating #-prefixed external entities and resolving them via pre-processing before the EntityDecoder's decode pass.
Member
Numeric entities are already supported internally. I couldn't understand the purpose of this workaround.
Summary
Fix regression where `addEntity('#xD', '\r')` and similar numeric character reference external entities have been silently ignored since v5.7.x.

Problem
The `addEntity` API documents that numeric character references can be registered as external entities (e.g., `parser.addEntity('#xD', '\r')` for `&#xD;`). This worked before v5.7.0 but broke with the migration to `@nodable/entities`:
- In v5.7.1, `setExternalEntities()` calls `validateEntityName()`, which throws `Invalid character '#' in entity name: "#xD"`.
- In v5.7.2, the `validateEntityName` error was bypassed by passing external entities through the constructor, but `#`-prefixed entities are still silently ignored because the `EntityDecoder`'s NCR path intercepts `#`-prefixed tokens before checking the named entity map.

This breaks downstream users (reported by AWS SDK users in #825).
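
To make the lookup-order problem concrete, here is a minimal sketch of a decoder that takes the NCR path before consulting the named entity map. `decodeToken` is a hypothetical stand-in written for illustration; it is not the `@nodable/entities` implementation, and the real NCR path may behave differently (for example, it may leave the reference undecoded).

```javascript
// Hypothetical simplified decoder, NOT the @nodable/entities code.
// Tokens starting with '#' take the numeric character reference (NCR) path
// and return before the named entity map is ever consulted.
function decodeToken(token, namedEntities) {
  if (token.startsWith('#')) {
    const isHex = token[1] === 'x' || token[1] === 'X';
    const codePoint = parseInt(token.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    // Leave malformed or out-of-range references untouched.
    if (Number.isNaN(codePoint) || codePoint > 0x10ffff) return `&${token};`;
    return String.fromCodePoint(codePoint);
  }
  const isRegistered = Object.prototype.hasOwnProperty.call(namedEntities, token);
  return isRegistered ? namedEntities[token] : `&${token};`;
}

// '#xD' is registered with a custom value, but the NCR path wins.
const external = { '#xD': '\r\n', copy: '(c)' };
const viaNcrPath = decodeToken('#xD', external); // registered '\r\n' is never used
const viaNamedMap = decodeToken('copy', external); // named entities still resolve
```

In this sketch the registered value for `#xD` is unreachable no matter how it is stored in the map, which is the shape of the failure described above.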

Solution
Separate `#`-prefixed external entities from regular named entities. Regular entities are passed to the `EntityDecoder`'s named entity map as before. Numeric (`#`-prefixed) entities are resolved via pre-processing: references such as `&#xD;` are replaced with their registered values before the `EntityDecoder`'s decode pass runs.

This preserves backward compatibility:
- `addEntity('#xD', '\r')` works correctly again
- Default behavior (no `addEntity`, no `htmlEntities`) is unchanged
- `htmlEntities: true` continues to decode all numeric character references
- The `entityDecoder` option is unaffected

Test Plan
- Test for `addEntity("#xD", "\r\n")`
- Tests for hexadecimal and decimal numeric character reference external entities

Fixes #825
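
The pre-processing approach described under Solution can be sketched roughly as follows. The helper names `splitExternalEntities` and `resolveNumericExternalEntities` are hypothetical and chosen for illustration; this is not the code in the PR. The sketch assumes references must match the registered name exactly, including case.

```javascript
// Hypothetical helpers sketching the idea; not the PR's actual implementation.

// Split registered external entities into named and '#'-prefixed groups.
function splitExternalEntities(entities) {
  const named = {};
  const numeric = {};
  for (const [name, value] of Object.entries(entities)) {
    (name.startsWith('#') ? numeric : named)[name] = value;
  }
  return { named, numeric };
}

// Replace registered '#'-prefixed references in a single pass, so inserted
// values are not scanned again. Unregistered references are left as-is for
// the later decode pass.
function resolveNumericExternalEntities(text, numeric) {
  return text.replace(/&(#[0-9A-Za-z]+);/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(numeric, name) ? numeric[name] : match
  );
}

const { named, numeric } = splitExternalEntities({ '#xD': '\r\n', copy: '(c)' });
const preprocessed = resolveNumericExternalEntities('a&#xD;b&#xA;c&copy;', numeric);
// Only the registered '#xD' reference is replaced; '&#xA;' and '&copy;' remain.
```

The `named` group would then go to the decoder's named entity map as before, and `preprocessed` is what the decode pass would receive.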