Skip to content

[RFC] Do not set parallelism controlling env vars#8297

Closed
fjetter wants to merge 1 commit into
mainfrom
fjetter-patch-1
Closed

[RFC] Do not set parallelism controlling env vars#8297
fjetter wants to merge 1 commit into
mainfrom
fjetter-patch-1

Conversation

@fjetter

@fjetter fjetter commented Oct 23, 2023

Copy link
Copy Markdown
Member

In #5098 we set a malloc trim threshold by default to more aggressively control memory trimming. also related #7177

At the same time, we included these default settings but didn't have incredibly solid arguments for it. It's been a long standing best practice when using dask to disable this nested parallelism.

We haven't received a lot of user feedback about this. However, we had some internal reports of users who were struggling with this because this was quite unexpected behavior for them and non-trivial to debug for the ordinary end user.

In apache/arrow#38389 (comment) this also suggests to negatively impact read performance of parquet tables.

We should consider removing this again

Closes #xxxx

  • Tests added / passed
  • Passes pre-commit run --all-files

In #5098 we set a malloc trim threshold by default to more aggressively control memory trimming.
also related #7177

At the same time, we included these default settings but didn't have incredibly solid arguments for it. It's been a long standing best practice when using dask to disable this nested parallelism.

We haven't received a lot of user feedback about this. However, we had some internal reports of users who were struggling with this because this was quite unexpected behavior for them and non-trivial to debug for the ordinary end user.

In apache/arrow#38389 (comment) this also suggests to negatively impact read performance of parquet tables.

We should consider removing this again
@mrocklin

Copy link
Copy Markdown
Member

I'm not yet convinced that this is a good idea overall. I think that, even in the arrow case, we could use more testing.

When these are set too high (nested parallelism) the result is often quite bad. When these are set too low, the result is often only kinda-bad. Also, I personally have seen more people negatively effected by over-saturation than otherwise.

@github-actions

github-actions Bot commented Oct 23, 2023

Copy link
Copy Markdown
Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       27 files  +       2         27 suites  +2   15h 29m 39s ⏱️ + 45m 10s
  3 932 tests ±       0    3 808 ✔️  -        1     115 💤 ±    0    8 ±0  1 🔥 +1 
49 388 runs  +4 998  47 016 ✔️ +4 862  2 340 💤 +135  31 ±0  1 🔥 +1 

For more details on these failures and errors, see this check.

Results for commit f1ae053. ± Comparison against base commit b4eee3f.

♻️ This comment has been updated with latest results.

@mrocklin

Copy link
Copy Markdown
Member

@fjetter we could also consider setting these to two rather than one, as a more middle-of-the-road choice.

@fjetter

fjetter commented Oct 23, 2023

Copy link
Copy Markdown
Member Author

I think that, even in the arrow case, we could use more testing.

absolutely. So far, I tested this only on a very small scale. I do not intend to merge this blindly without more through testing. I'll revert this PR to draft state.

@fjetter fjetter marked this pull request as draft October 23, 2023 12:33
@fjetter fjetter closed this Jan 8, 2025
@fjetter fjetter deleted the fjetter-patch-1 branch January 8, 2025 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants