gh-111089: Add PyUnicode_AsUTF8NoNUL() function#111688
gh-111089: Add PyUnicode_AsUTF8NoNUL() function#111688vstinner wants to merge 3 commits intopython:mainfrom
Conversation
|
I'm not sure about For example, if the string is used to open a file, the function doesn't reject Maybe the name should be more explicitly about its purpose, such as: |
|
the clinic.py changes look fine to me; I have no opinion on the C changes :) |
|
Clinic changes are ok with me. I'll leave the C API discussion to the C API WG :) |
|
If this change is merged, it would be interesting to go through PyUnicode_AsUTF8() usage in the Python code base and check if switching PyUnicode_AsUTF8Safe() would be worth it. See #111656 (comment) list for example. |
The working group doesn't exist yet, it's still a draft PEP: https://discuss.python.org/t/pep-731-c-api-working-group-charter/36117 |
|
@encukou @gpshead @Yhg1s @serhiy-storchaka: Would you mind to review this change? |
|
Are there strong objections against simply making |
Tools/clinic/clinic.py
Outdated
| goto exit; | ||
| }}}} | ||
| {paramname} = PyUnicode_AsUTF8({argname}); | ||
| {paramname} = PyUnicode_AsUTF8Safe({argname}); |
There was a problem hiding this comment.
There should be different code for limited_capi is true.
There was a problem hiding this comment.
Oh. I wanted to add PyUnicode_AsUTF8Safe() to the limited C API in a separated PR. Maybe it's more convenient to do it in the PR, since there is a lot of code generated by Argument Clinic impacted by these changes.
I suppose that it's too late to change it, same rationale than changing PyUnicode_AsUTF8(). Moreover, |
613243e to
96b2b8b
Compare
|
I rebased the PR on the main branch to get the test_asyncio fix. I also squashed commits. |
|
Wait until the WG can discuss this. I agree that I wouldn't mind naming it |
|
If "Safe" is too generic, what about "AsUTF8NoNul"? I propose "Nul" instead of "Null" which looks like "NULL pointer". Or maybe "AsUTF8NoNullChars"? "Z" looks too short to me, it's not easy to guess its intent. |
Alternative short name: |
Revert PyUnicode_AsUTF8() change: it no longer rejects embedded null characters: the PyUnicode_AsUTF8Safe() function should be used instead.
96b2b8b to
193c18b
Compare
|
@serhiy-storchaka @erlend-aasland: I renamed the function to PyUnicode_AsUTF8NoNUL(). Are you fine with this name? |
I didn't follow the discussion. My first impression is that I don't understand what that API is supposed to do, based on the name only. |
|
I am fine with added the embedded null character check in If people not fine with this, ask them with what are they fine. |
|
I don't know why we need a new public function. For a new function, if you feel like it needs a strlen check, that's fine. I don't think it's a sensible way to deal with C strings (everyone who works with C strings should know about the importance of NUL), but for new APIs I don't really care either way. However:
|
|
I close my issue: #111089 (comment) |
Revert PyUnicode_AsUTF8() change: it no longer rejects embedded null characters: the PyUnicode_AsUTF8NoNUL() function should be used instead.
📚 Documentation preview 📚: https://cpython-previews--111688.org.readthedocs.build/