gh-150902: Optimize PyCriticalSection2 to skip locking the same locks as the ones held by the current CS2#151085
Conversation
…me locks as the ones held by the current CS2
colesbury
left a comment
There was a problem hiding this comment.
Looks good, a few comments from Claude:
1. Stale comment in _PyCriticalSection2_End (pycore_critical_section.h:199-202). This comment is now inaccurate and should be updated as part of this PR:
// if mutex1 is NULL, we used the fast path in
// _PyCriticalSection_BeginSlow for mutexes that are already held,
// which should only happen when mutex1 and mutex2 were the same mutex,
After this change, _cs_mutex == NULL also occurs when m1 != m2 but both were already held by the current CS2. The "should only happen when mutex1 and mutex2 were the same mutex" claim
no longer holds.
2. Pointer comparison in the asserts. assert(m1 < m2) and assert(prev2->_cs_base._cs_mutex < prev2->_cs_mutex2) compare PyMutex * directly. Relational comparison of unrelated pointers
is technically UB, and the header deliberately casts to (uintptr_t) when sorting (line 163). For consistency, cast here too:
assert((uintptr_t)m1 < (uintptr_t)m2);
4. Minor: ordering differs from the single-mutex version. Here the world-stopped check comes before the recursive-skip; in _PyCriticalSection_BeginSlow it's the reverse. Both produce
identical results (NULL mutexes, no push), so this is harmless — just a slight inconsistency.
|
I'll address them, thanks Claude |
|
... and it still cannot count 😁 |
I deleted 3. because I didn't think it was relevant |
|
Thanks @dpdani for the PR, and @colesbury for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14, 3.15. |
|
Ah ok, sorry Claude, was too quick to judge |
|
@colesbury: You requested 3.14 and 3.15 backports but you didn't merge them yet. You can enable auto-merge.
I hesitated to merge them. Usually I don't backport optimizations. Well, it's up to you ;-) |
|
This particular optimization being missing was causing unintended performance regressions for the FT build compared to the default build in 3.14 and 3.15 (also see attached issue), I think this is why Sam decided to open backports. |
|
Thanks for the reminder. Regarding backporting: As Daniele, wrote, I think we introduced a performance regression in the copy code and the easiest and most reliable fix is this general "optimization" rather than trying to change how critical sections are used in the copy module. |
This mimics an optimization already present for the single-mutex critical section.
PyCriticalSection2to skip locking when the requested locks were acquired in the current CS #150902