Hi,
I'm trying to compute some function over a wide range of parameters using LSFCluster.
The general outline is as follows:
import itertools

def f(x, y):
    ...

futures = [f(x, y) for x, y in itertools.product(range(X), range(Y))]
x = progress(client.compute(futures))
x
When I try to compute with X=Y=20, everything goes smoothly.
However, when I increase the range of parameters over which I'm computing f(x,y) (for example, X=Y=100), I get an error message I don't understand:
distributed.scheduler - ERROR - '856313'
Traceback (most recent call last):
  File "/home/adamh/miniconda3/lib/python3.5/site-packages/distributed/scheduler.py", line 1267, in add_worker
    plugin.add_worker(scheduler=self, worker=address)
  File "/home/adamh/miniconda3/lib/python3.5/site-packages/dask_jobqueue/core.py", line 61, in add_worker
    self.running_jobs[job_id] = self.pending_jobs.pop(job_id)
KeyError: '856313'
Just to be sure, I ran bjobs -r, and there is indeed a job running with job id 856313. I get similar error messages for many other workers.
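For reference, the failing line in the traceback moves a job id from a "pending" dict to a "running" dict with `dict.pop`. The sketch below is plain Python, not the dask_jobqueue code itself; `JobTracker` and its methods are hypothetical names. It shows one way such a KeyError could arise: a second registration for a job id that has already been moved. Whether that is what actually happens on the cluster is an assumption.

```python
class JobTracker:
    """Hypothetical stand-in that mimics pending -> running bookkeeping."""

    def __init__(self):
        self.pending_jobs = {}
        self.running_jobs = {}

    def submit(self, job_id):
        # Record a job that has been submitted but has no worker yet.
        self.pending_jobs[job_id] = "job"

    def add_worker(self, job_id):
        # Same shape as the line in the traceback: pop() raises KeyError
        # when job_id is not (or no longer) in pending_jobs.
        self.running_jobs[job_id] = self.pending_jobs.pop(job_id)


tracker = JobTracker()
tracker.submit("856313")
tracker.add_worker("856313")  # first registration succeeds

try:
    tracker.add_worker("856313")  # second registration: id is no longer pending
    second_registration_failed = False
except KeyError as err:
    second_registration_failed = True
    error_key = err.args[0]
```

If this is the mechanism, the job would still show up as running in bjobs, because the error comes from the bookkeeping and not from the job itself.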
Some more info which might be relevant:
- When I change f to be some really simple function (f(x,y)=x+y), the problem disappears.
- During runtime, f writes temporary files to a tmp directory; each f(x,y) creates its own temp directory. Perhaps this involves too many disk operations?
- When I run everything locally, it takes forever (hence dask :-)) but doesn't crash and doesn't fill up the memory.
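To make the temp-directory point concrete, here is a minimal sketch of the per-call pattern described above, using only the standard library. The body of `f` is a hypothetical stand-in (the real function is not shown), and the file name `scratch.txt` is made up for illustration.

```python
import os
import tempfile


def f(x, y):
    # Hypothetical stand-in for the real f: each call gets its own
    # temporary directory, which is deleted when the with-block exits.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "scratch.txt")
        with open(path, "w") as fh:
            fh.write("{},{}".format(x, y))
        with open(path) as fh:
            a, b = fh.read().split(",")
        result = int(a) + int(b)
    return result, tmpdir


value, used_dir = f(2, 3)
```

With this pattern each call cleans up after itself, so the number of directories on disk at any moment is bounded by the number of calls running concurrently.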
Any help would be much appreciated!