[RFC] Do not set parallelism controlling env vars#8297
Conversation
In #5098 we set a malloc trim threshold by default to more aggressively control memory trimming. also related #7177 At the same time, we included these default settings but didn't have incredibly solid arguments for it. It's been a long standing best practice when using dask to disable this nested parallelism. We haven't received a lot of user feedback about this. However, we had some internal reports of users who were struggling with this because this was quite unexpected behavior for them and non-trivial to debug for the ordinary end user. In apache/arrow#38389 (comment) this also suggests to negatively impact read performance of parquet tables. We should consider removing this again
|
I'm not yet convinced that this is a good idea overall. I think that, even in the arrow case, we could use more testing. When these are set too high (nested parallelism) the result is often quite bad. When these are set too low, the result is often only kinda-bad. Also, I personally have seen more people negatively effected by over-saturation than otherwise. |
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 27 files + 2 27 suites +2 15h 29m 39s ⏱️ + 45m 10s For more details on these failures and errors, see this check. Results for commit f1ae053. ± Comparison against base commit b4eee3f. ♻️ This comment has been updated with latest results. |
|
@fjetter we could also consider setting these to two rather than one, as a more middle-of-the-road choice. |
absolutely. So far, I tested this only on a very small scale. I do not intend to merge this blindly without more through testing. I'll revert this PR to draft state. |
In #5098 we set a malloc trim threshold by default to more aggressively control memory trimming. also related #7177
At the same time, we included these default settings but didn't have incredibly solid arguments for it. It's been a long standing best practice when using dask to disable this nested parallelism.
We haven't received a lot of user feedback about this. However, we had some internal reports of users who were struggling with this because this was quite unexpected behavior for them and non-trivial to debug for the ordinary end user.
In apache/arrow#38389 (comment) this also suggests to negatively impact read performance of parquet tables.
We should consider removing this again
Closes #xxxx
pre-commit run --all-files