Hi team,
I'm having an issue launching the pretraining job with tensorflow 2.15 or above. Tensorflow 2.15 immediately segdumps. With the latest tensorflow 2.16.1 I see there is an unbound or near-100% video memory growth of one of the data loading process, leading to CUDA OOM and cascading failures. One quick workaround is to locally install lower tensorflow versions e.g.
pip install tensorflow==2.13.1 tensorflow-text==2.13.0
Also works:
pip install tensorflow==2.14.1 tensorflow-text==2.14.0
Hi team,
I'm having an issue launching the pretraining job with tensorflow 2.15 or above. Tensorflow 2.15 immediately segdumps. With the latest tensorflow 2.16.1 I see there is an unbound or near-100% video memory growth of one of the data loading process, leading to CUDA OOM and cascading failures. One quick workaround is to locally install lower tensorflow versions e.g.
pip install tensorflow==2.13.1 tensorflow-text==2.13.0Also works:
pip install tensorflow==2.14.1 tensorflow-text==2.14.0