Avoid overflow in statistics.mean #7426

Merged
jrbourbeau merged 1 commit into dask:main from mrocklin:dashboard-mean on Dec 20, 2022

@mrocklin

Member

I don't know why, but for some reason statistics.mean was overflowing in
CI.  See https://github.com/dask/distributed/actions/runs/3741526593/jobs/6351258185

I'm trying a naive implementation instead.  It also happens to be faster
and simpler.

```python
In [1]: from statistics import mean

In [2]: x = list(range(1000))

In [3]: %timeit mean(x)
196 µs ± 777 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [4]: %timeit sum(x) / len(x)
4.82 µs ± 15.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```
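
The timing gap is plausibly explained by `statistics.mean` doing exact rational arithmetic internally, while `sum(x) / len(x)` is a single pass of native additions. Below is a minimal sketch for reproducing the comparison outside IPython, using only the standard library. `naive_mean` and `time_per_call` are illustrative names, not code from this PR, and absolute timings will differ by machine.

```python
from statistics import mean
from timeit import timeit


def naive_mean(x):
    """Arithmetic mean via a plain sum; assumes x is a non-empty sequence."""
    return sum(x) / len(x)


def time_per_call(func, data, number=1000):
    """Return the average wall-clock seconds per call of func(data)."""
    return timeit(lambda: func(data), number=number) / number


x = list(range(1000))

# Both implementations agree on this input.
assert naive_mean(x) == mean(x)

print(f"statistics.mean: {time_per_call(mean, x) * 1e6:.1f} µs per call")
print(f"sum(x) / len(x): {time_per_call(naive_mean, x) * 1e6:.1f} µs per call")
```

The sketch only covers plain Python lists; it says nothing about how either approach behaves on other input types.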
@github-actions

Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

20 files +8    20 suites +8    9h 0m 52s ⏱️ +4h 41m 15s
3 271 tests ±0    3 184 ✔️ +9    85 💤 -10    2 ❌ +1
33 337 runs +13 057    31 907 ✔️ +12 397    1 426 💤 +657    4 ❌ +3

For more details on these failures, see this check.

Results for commit 4e90750. ± Comparison against base commit 3ac8631.

@mrocklin

Member Author

This is trivial enough and coming up enough in current PRs' CI that I plan to merge tomorrow US-time if there are no comments.

@jrbourbeau jrbourbeau changed the title Avoid overflow in statitics.mean Avoid overflow in statitics.mean Dec 20, 2022
@jrbourbeau jrbourbeau merged commit c21e715 into dask:main Dec 20, 2022
@mrocklin mrocklin deleted the dashboard-mean branch December 20, 2022 20:01
@mrocklin

Member Author

Woot. Thanks

@fjetter

fjetter commented Jan 3, 2023

Member

This is still an issue but now in the new code...

  File "d:\a\distributed\distributed\distributed\dashboard\components\shared.py", line 569, in update
    self.label_source.data["memory"] = [
  File "d:\a\distributed\distributed\distributed\dashboard\components\shared.py", line 571, in <listcomp>
    f.__name__, dask.utils.format_bytes(f(self.source.data["memory"]))
  File "d:\a\distributed\distributed\distributed\dashboard\components\shared.py", line 562, in mean
    return sum(x) / len(x)

Looks like x is sometimes a NumPy array, and I assume we're overflowing int64? (Or we're using a different dtype somewhere; an int64 overflow sounds crazy, even if not impossible.)

[screenshot]

(The screenshot above does not reproduce the problem; it is just a snapshot showing the data, which I believe is RSS memory, and its dtype.)
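
If the diagnosis above is right, one way to sidestep fixed-width overflow is to convert each element to a Python number before accumulating, since Python `int` and `float` do not wrap the way int64 does. This is a hedged sketch, not the fix adopted in dask/distributed. `overflow_safe_mean` is a hypothetical name, and the conversion to `float` assumes that rounding above 2**53 is acceptable for values such as byte counts.

```python
import math


def overflow_safe_mean(x):
    """Mean of a non-empty sized iterable of real numbers.

    Each element is converted to a Python float first, so fixed-width
    integer scalars (for example NumPy int64) are never added together.
    """
    n = len(x)
    if n == 0:
        raise ValueError("mean requires at least one data point")
    # math.fsum tracks partial sums to limit floating-point rounding error.
    return math.fsum(float(v) for v in x) / n


# Plain Python ints standing in for int64 values near the type's maximum.
assert overflow_safe_mean([2**63 - 1, 1]) == 2.0**62
assert overflow_safe_mean(range(1000)) == 499.5
```

The trade-off is an extra conversion per element, which should be negligible for the handful of values a dashboard label summarizes.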

@fjetter

fjetter commented Jan 3, 2023

Member

At the very least, this gives exactly the same warning:

In [1]: import numpy as np

In [2]: sum(np.array([2**63-1, 1], dtype=np.int64))
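
For readers without NumPy at hand, the arithmetic behind that warning can be worked through in pure Python. This sketch assumes that the int64 addition wraps around in two's complement when it overflows, which is what the warning signals. `wrap_int64` is a hypothetical helper, not a NumPy function.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def wrap_int64(n):
    """Reduce an arbitrary Python int to the value an int64 would hold."""
    return (n - INT64_MIN) % 2**64 + INT64_MIN


# Same operands as the NumPy expression above.
exact = INT64_MAX + 1            # Python ints never overflow
wrapped = wrap_int64(exact)      # what a 64-bit signed add would store

assert exact == 2**63
assert wrapped == INT64_MIN      # the sum flips to the most negative int64
```

A mean computed from the wrapped sum would come out negative, which is clearly wrong for a memory reading.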
