Slightly Better Support for Escaped Characters in Xlsx Reader/Writer #4726
Merged
oleibman merged 10 commits into PHPOffice:master on Dec 3, 2025
See [Discussion 4724](PHPOffice#4724).

PhpSpreadsheet converts all control characters (x00-x1f) in strings to and from a form which Excel recognizes (e.g. `x1c` becomes `_x001C_` when writing, and vice versa when reading). There have historically been 3 exceptions which go unconverted: tab (x09), line feed (new line) (x0a), and carriage return (x0d). PR PHPOffice#4536 removed those exceptions, but that caused some problems; these were fixed by PR PHPOffice#4619, which also restored the exceptions.

The referenced discussion deals with a spreadsheet with a cell containing `_x000D_`, carriage return. Although the writer no longer converts to that string on output, the reader should be able to handle it on input. In fact, the reader ought to handle any string of the form "underscore x 4-hex-digits underscore", whether or not it represents a control character.

And there's an interesting edge case. If a user enters the string `A_x0030_B` into a cell, it needs to be handled as-is. Excel handles this by writing it out as `A_x005F_x0030_B`, i.e. substituting `_x005F_` for the first underscore, so that the reader sees `_x005F_` (converting it to underscore) followed by `x0030_B` (no leading underscore, so no conversion). PhpSpreadsheet could probably handle this by converting all underscores on write, but I am trying to emulate Excel and do it only when needed.
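The round trip described above can be sketched in a few lines. This is a language-neutral illustration in Python, not the PhpSpreadsheet code from this PR: the function names `escape_for_xlsx` and `unescape_from_xlsx` are hypothetical, and accepting lower-case hex digits on input is an assumption rather than something stated here.

```python
import re

# "underscore x 4-hex-digits underscore", e.g. _x000D_
# (accepting lower-case hex digits is an assumption of this sketch)
ESCAPE_RE = re.compile(r"_x([0-9A-Fa-f]{4})_")

# Control characters x00-x1f, minus the three historical exceptions:
# tab (x09), line feed (x0a), carriage return (x0d).
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

# An underscore that begins text which merely looks like an escape.
LITERAL_RE = re.compile(r"_(?=x[0-9A-Fa-f]{4}_)")


def escape_for_xlsx(text: str) -> str:
    """Hypothetical writer-side helper."""
    # Protect look-alike sequences first, and only where needed:
    # A_x0030_B becomes A_x005F_x0030_B.
    text = LITERAL_RE.sub("_x005F_", text)
    # Then encode the remaining control characters as _xHHHH_.
    return CONTROL_RE.sub(lambda m: "_x%04X_" % ord(m.group()), text)


def unescape_from_xlsx(text: str) -> str:
    """Hypothetical reader-side helper."""
    # Decode every _xHHHH_, whether or not it is a control character.
    # re.sub scans left to right and never rescans its own output, so the
    # underscore produced from _x005F_ cannot start a new escape.
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    for sample in ["\x1c", "A_x0030_B", "tab\there"]:
        written = escape_for_xlsx(sample)
        print(repr(sample), "->", repr(written), "->",
              repr(unescape_from_xlsx(written)))
```

The literal underscore is protected before control characters are encoded, so the `_xHHHH_` sequences the writer itself produces are never escaped a second time.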
It is probably very anal of me to do this. Excel does it. I can't see it happening in the wild.
PHP 8.5 problem with iconv //IGNORE.
Make some properties protected rather than private.