gh-130567: Remove optimistic allocation in locale.strxfrm()#137143

Merged
encukou merged 3 commits into
python:main from
serhiy-storchaka:locale-wcsxfrm
Oct 16, 2025

@serhiy-storchaka serhiy-storchaka commented Jul 27, 2025

Member

On modern systems, the result of wcsxfrm() is much larger than the size of the input string (from 4+2n on Windows to 4+5n on Linux for simple ASCII strings), so optimistic allocation of a buffer of the same size as the input never works.
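
To make the arithmetic concrete, here is a rough sketch using only the growth figures quoted above. The function names and the `PER_CHAR` table are made up for illustration; real wcsxfrm() output sizes depend on the C library and the locale.

```python
# Growth figures quoted in the PR description for simple ASCII input.
# These are estimates, not a specification of any C library.
PER_CHAR = {"windows": 2, "linux": 5}


def estimated_result_len(n, platform):
    """Estimate the wcsxfrm() result length for n ASCII characters."""
    return 4 + PER_CHAR[platform] * n


def optimistic_buffer_fits(n, platform):
    """Would a buffer sized for the input (n chars + null) hold the result?"""
    needed = estimated_result_len(n, platform) + 1  # + terminating null
    return needed <= n + 1


for n in (0, 1, 3, 100):
    for platform in PER_CHAR:
        print(n, platform, estimated_result_len(n, platform),
              optimistic_buffer_fits(n, platform))
```

Under these figures the input-sized buffer is too small for every n, including the empty string, so the first wcsxfrm() call into it can only report the size that is actually needed.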

Comment thread Modules/_localemodule.c Outdated
@@ -409,33 +409,23 @@ _locale_strxfrm_impl(PyObject *module, PyObject *str)
}

/* assume no change in size, first */

Contributor

The comment should be updated to match the changed code.

@picnixz picnixz changed the title Remove optimistic allocation in locale.strxfrm() gh-130567: Remove optimistic allocation in locale.strxfrm() Jul 27, 2025
@serhiy-storchaka

Member Author

If this is a bug fix, it needs a NEWS entry. If the bug will be fixed in another way, this is just a cleanup and minor optimization that is not worth a NEWS entry.

@encukou encukou left a comment

Member

This does not fix the bug; macOS raises EINVAL in wcsxfrm(NULL, s, 0) on the Czech and Chinese strings.

So, it's just cleanup and minor optimization.

@serhiy-storchaka

Member Author

Actually, optimistic allocation works if the locale was not set or set to "C".

>>> import locale
>>> locale.strxfrm('abc')
'abc'
>>> locale.setlocale(locale.LC_ALL, 'C')
'C'
>>> locale.strxfrm('abc')
'abc'
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
'en_US.UTF-8'
>>> locale.strxfrm('abc')
'šŮŹ\x01\x1d\x1d\x1d\x01\x02\x02\x02\x01\x01悝\x01惹\x01愝'

But why would you use locale.strxfrm() in the C locale? And since the wcsxfrm() call is cheap in the C locale, I believe that the loss in the worst case is less than the gain in the average case.
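
The C-locale case from the session above can be checked with a small helper. This is only a sketch: `strxfrm_in_locale` is a hypothetical name, and it switches just `LC_COLLATE` (the category that governs strxfrm()) rather than `LC_ALL` as in the session.

```python
import locale


def strxfrm_in_locale(text, name):
    """Return locale.strxfrm(text) with LC_COLLATE temporarily set to name."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        return locale.strxfrm(text)
    finally:
        # Restore the previous collation locale even if the call failed.
        locale.setlocale(locale.LC_COLLATE, saved)


result = strxfrm_in_locale("abc", "C")
print(len("abc"), len(result), result == "abc")
```

Passing another name such as 'en_US.UTF-8' should show the growth, but `locale.setlocale()` raises `locale.Error` if that locale is not installed, and the transformed string differs between platforms.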

@serhiy-storchaka

Member Author

This PR should fix a crash discussed in #130567 (comment). So this is a bug fix. If we are not going to backport it, we need another PR to fix it.

encukou commented Sep 10, 2025

Member

Let's backport it [edit: to 3.14.1], even if I can't reproduce the corruption on my system.
“Fix possible crash in strxfrm” should be a good blurb?

@serhiy-storchaka

Member Author

Created a simpler PR #138940 for the fix.

encukou commented Oct 15, 2025

Member

Do you want to update this one?

@encukou encukou merged commit 2a2bc82 into python:main Oct 16, 2025
43 checks passed
StanFromIreland pushed a commit to StanFromIreland/cpython that referenced this pull request Dec 6, 2025
…thonGH-137143)

On modern systems, the result of wcsxfrm() is much larger than the size of
the input string (from 4+2*n on Windows to 4+5*n on Linux for simple
ASCII strings), so optimistic allocation of the buffer of the same size
never works.

The exception is if the locale is "C" (or unset), but in that case the `wcsxfrm`
call should be fast (and calling `locale.strxfrm()` doesn't make too much
sense in the first place).
@serhiy-storchaka serhiy-storchaka deleted the locale-wcsxfrm branch July 1, 2026 16:47