perf: improve PHP thread management#898
Conversation
|
I will give them a go. I haven't looked at the PR here, but from my experiments, most of the time was spent initializing PHP. I was looking into making "workers" that implement the most basic behavior (totally reset state after the request since nothing needs to be saved)—much like how FPM works. I have a rough PoC that doesn't quite work yet, but I don't feel like it is too far from working. |
|
@withinboredom if we speak about the same thing, it's what this patch does! |
83914ed to
c5676af
Compare
|
Just got the latest of this branch: workers: 22,660 rps |
|
This is basically what I am running: where |
|
Do you have the numbers without this patch? |
|
from workers: 22,535 rps |
|
IIRC, the main issue is that in non-worker mode, it spawns a totally new thread, inits PHP, inits the request, executes the request, shutsdown PHP full, then kills the thread. What we want: new request inits the request in PHP, executes the request, then just shutdown the request and fully reset state. |
|
I may be missing something, but it's already what FrankenPHP is doing (even without this patch). |
|
It's what it was originally doing, for sure, but it looks like an accidental change. |
|
I'm not sure to follow @withinboredom 😅 |
|
Ah, nope, that wasn't actually the problem! I remember that being something I originally thought was the issue... I just checked out my Example output: # we should expect to see interleaving here, but it is strictly sequential
2024/07/05 06:59:25.150 WARN Executing request {"total": 32, "elapsed": 0.000000697}
2024/07/05 06:59:25.150 WARN finished execution {"total": 31, "elapsed": 0.015291472}
2024/07/05 06:59:25.150 WARN Executing request {"total": 32, "elapsed": 0.000000786}
2024/07/05 06:59:25.151 WARN finished execution {"total": 31, "elapsed": 0.015279653}
2024/07/05 06:59:25.151 WARN Executing request {"total": 32, "elapsed": 0.000000712}
2024/07/05 06:59:25.151 WARN finished execution {"total": 31, "elapsed": 0.015311099}Essentially, in worker mode, request handling is multi-threaded. Each worker processes a request independently, so if you have 16 workers, there are 16 threads processing requests. With cgi-mode, there is only a single-threaded consumer of the request channel, which creates a bottleneck, no matter what you set In other words, IIRC, there is only a single manager thread calling |
|
But this patch should change exactly that! It's not the case? I struggle to show CPU usage per thread on Mac (I'm not even sure that it's possible). I'll use a Linux box to see 😅 |
|
Added to my original comment above, which might help explain more about what I meant. |
|
@withinboredom In this patch, I moved the call to |
c5676af to
a2908e9
Compare
|
Looks like it may do the trick... And it appears to. I threw it up on a wordpress site: nginx + fpm (badly tuned fwiw): 860 rps So, it seems to be g2g |
|
Thanks for the report! Looks great. Remaining issue: find why the tests are failing (they aren't locally)! |
|
Test failures look to affect only workers. We likely have a race condition somewhere. |
|
I can't get it to fail locally on AMD or INTEL x64. If I had to hazard a guess, there is likely a race condition with sending the worker script file to the C side (i.e., it never starts the workers). An easy way to test that would be to add some output to the worker script so you can see that it started up. |
1b1e096 to
284ee23
Compare
|
I finally found the problem: It's currently not a problem (even if a bad practice) because we used |
Alternative to #837.
The main idea is to start a fixed number of threads and handle requests in these threads using an infinite loop.
This fixes the backpressure problem identified in #836, simplifies the code, and removes the need for the
C-Thread-Poollibrary.Fixes #836. Closes #837.
@withinboredom could you run your benchmarks on this PR? I'm curious about the results.