Description
Expected behavior
Conda environments needed for running the SRW should build properly when using devbuild.sh to install conda.
Current behavior
I've found a bug in the conda installation process: the conda environments are not installed correctly when the base path to the SRW App is too long (i.e., has too many characters).
This bug is likely present in any app that uses a similar conda installation procedure. I originally found it in mpas_app and was able to verify that it's also in the SRW App.
Specifically, when the base path (e.g., /path/to/ufs-srweather-app) is longer than 108 characters, the conda environments needed to run the SRW App are not installed. This happens because various scripts under the ufs-srweather-app/conda directory point to the system installation of python instead of the one in the ufs-srweather-app/conda directory.
I verified this by trying to run one of the WE2E tests (grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR) in each of the following two clones of the SRW:
/scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_123/ufs-srweather-app/modulefiles
/scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_1234/ufs-srweather-app/modulefiles
The test succeeded in the first clone, whose base path (up to and including ufs-srweather-app) is 108 characters long, but failed in the second clone, whose base path is 109 characters long.
When the path is too long, the following error messages appear in the output of devbuild.sh:
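The 108/109-character boundary is consistent with a shebang-length limit. One plausible explanation, which is an assumption and not verified in this report: on Linux, conda does not keep shebang lines longer than 127 characters and instead rewrites them to a `#!/usr/bin/env ...` form, which then picks up whatever python is first on PATH. Assuming the conda scripts use `<base path>/conda/bin/python` as their interpreter, the shebang is 2 + (base path length) + 17 characters long. The sketch below computes this for the two clones above.

```shell
#!/usr/bin/env bash
# Sketch: length of the shebang line "#!<base path>/conda/bin/python" that
# conda entry points such as conda/bin/conda would start with.
# ASSUMPTIONS: the interpreter lives at <base path>/conda/bin/python, and
# conda keeps shebangs as-is only up to 127 characters (including "#!").
MAX_SHEBANG_LEN=127

shebang_length() {
  # $1: absolute path to the ufs-srweather-app clone
  local shebang="#!${1}/conda/bin/python"
  echo "${#shebang}"
}

# The two clone paths from this report.
for clone in \
  /scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_123/ufs-srweather-app \
  /scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_1234/ufs-srweather-app
do
  len=$(shebang_length "${clone}")
  if [ "${len}" -gt "${MAX_SHEBANG_LEN}" ]; then status="too long"; else status="ok"; fi
  echo "${#clone}-char base path -> ${len}-char shebang (${status})"
done
```

Running it prints `108-char base path -> 127-char shebang (ok)` and `109-char base path -> 128-char shebang (too long)`, so the working and failing clones fall exactly on either side of the assumed 127-character limit.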
...
Linking conda-libmamba-solver-23.3.0-pyhd8ed1ab_0
Linking mamba-1.4.2-py310h51d5547_0
Transaction finished
installation finished.
Traceback (most recent call last):
File "/scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_1234/ufs-srweather-app/conda/bin/conda", line 12, in <module>
from conda.cli import main
ModuleNotFoundError: No module named 'conda'
Traceback (most recent call last):
File "/scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_1234/ufs-srweather-app/conda/bin/conda", line 12, in <module>
from conda.cli import main
ModuleNotFoundError: No module named 'conda'
Traceback (most recent call last):
File "/scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_1234/ufs-srweather-app/conda/condabin/mamba", line 7, in <module>
from mamba.mamba import main
ModuleNotFoundError: No module named 'mamba'
real 0m0.029s
user 0m0.019s
sys 0m0.008s
Traceback (most recent call last):
File "/scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_1234/ufs-srweather-app/conda/bin/conda", line 12, in <module>
from conda.cli import main
ModuleNotFoundError: No module named 'conda'
Traceback (most recent call last):
File "/scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_1234/ufs-srweather-app/conda/condabin/mamba", line 7, in <module>
from mamba.mamba import main
ModuleNotFoundError: No module named 'mamba'
real 0m0.028s
user 0m0.018s
sys 0m0.008s
Traceback (most recent call last):
File "/scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_1234/ufs-srweather-app/conda/bin/conda", line 12, in <module>
from conda.cli import main
ModuleNotFoundError: No module named 'conda'
Traceback (most recent call last):
File "/scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_1234/ufs-srweather-app/conda/condabin/mamba", line 7, in <module>
from mamba.mamba import main
ModuleNotFoundError: No module named 'mamba'
Traceback (most recent call last):
File "/scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_1234/ufs-srweather-app/conda/bin/conda", line 12, in <module>
from conda.cli import main
ModuleNotFoundError: No module named 'conda'
Traceback (most recent call last):
File "/scratch4/BMC/fv3lam/Gerard.Ketefian/DTC_MPAS_stoch/tmp2/123456789_123456789_123456789_1234/ufs-srweather-app/conda/condabin/mamba", line 7, in <module>
from mamba.mamba import main
ModuleNotFoundError: No module named 'mamba'
COMPILER=intel
MODULE_FILE=build_hera_intel
... Load MODULE_FILE and create BUILD directory ...
Currently Loaded Modules:
...
Despite these errors, the build process does not stop, and various scripts under ufs-srweather-app/conda end up pointing to the system installation of python instead of the one in the ufs-srweather-app/conda directory. As a result, when one attempts to run the workflow, the required python environments are not found, and the following message appears:
$ conda activate srw_app
EnvironmentNameNotFound: Could not find conda environment: srw_app
You can list all discoverable environments with `conda info --envs`.
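To confirm that a given clone is affected, one can inspect the first line of the conda entry points that appear in the tracebacks above (conda/bin/conda and conda/condabin/mamba). The helper below is a hypothetical sketch, not part of the SRW App. It assumes that a healthy script starts with `#!<conda prefix>/...` and flags anything else, such as a `#!/usr/bin/env python` fallback.

```shell
#!/usr/bin/env bash
# Sketch: check whether a conda entry-point script still uses the interpreter
# inside the given conda prefix, or has fallen back to another shebang
# (e.g. "#!/usr/bin/env python", which runs whatever python is first on PATH).
# check_shebang is a hypothetical helper, not part of the SRW App.

check_shebang() {
  # $1: conda prefix (e.g. /path/to/ufs-srweather-app/conda)
  # $2: script path relative to the prefix (e.g. bin/conda)
  local prefix="$1"
  local script="$1/$2"
  local first_line=""
  if [ ! -f "${script}" ]; then
    echo "missing: ${script}"
    return 2
  fi
  # Read only the first line of the script (the shebang).
  IFS= read -r first_line < "${script}" || true
  case "${first_line}" in
    "#!${prefix}/"*) echo "ok: ${first_line}" ;;
    *) echo "suspect: ${first_line}"; return 1 ;;
  esac
}

# Example (placeholder path):
#   check_shebang /path/to/ufs-srweather-app/conda bin/conda
#   check_shebang /path/to/ufs-srweather-app/conda condabin/mamba
```

A `suspect:` result for these scripts would match the `ModuleNotFoundError: No module named 'conda'` tracebacks, since a python outside the conda prefix cannot import the conda package.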
Tracing this message back to its cause takes a lot of digging. At a minimum, devbuild.sh should therefore warn the user or exit with an error when the path is too long, so that users know something went wrong (or are prevented from building in directories whose paths are too long). Ideally, of course, there would be no limit on the path length at all.
Machines affected
So far, I've only seen this bug on Hera; I haven't tested other machines.
Steps To Reproduce
Clone the SRW App into a directory such that the base path (up to and including /ufs-srweather-app) is longer than 108 characters, i.e., the parent directory path is longer than 90 characters. Build the app, then try to run any of the WE2E tests. This produces an error like the following:
$ conda activate srw_app
EnvironmentNameNotFound: Could not find conda environment: srw_app
You can list all discoverable environments with `conda info --envs`.
Detailed Description of Fix (optional)
I'm not sure of the proper fix, but at a minimum devbuild.sh should exit with an error if the path is too long.
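A minimal sketch of such an error-out mechanism, written as a shell function that devbuild.sh could call before installing conda. The function name, the `SRW_DIR` variable, and the call site are hypothetical. The 108-character limit comes from the behavior reported above and is consistent with an assumed 127-character shebang limit for `<base path>/conda/bin/python`; it has not been confirmed against the conda or devbuild.sh source.

```shell
#!/usr/bin/env bash
# Sketch of a path-length guard for devbuild.sh (hypothetical; not existing code).
# ASSUMPTIONS: conda is installed under "<base path>/conda", its entry points
# use "<base path>/conda/bin/python" as the interpreter, and shebangs longer
# than 127 characters are not kept as-is, so <base path> may be at most
# 127 - 2 ("#!") - 17 ("/conda/bin/python") = 108 characters.
MAX_SHEBANG_LEN=127
PYTHON_SUFFIX="/conda/bin/python"

check_conda_prefix_length() {
  # $1: absolute path to the ufs-srweather-app clone
  local base_dir="$1"
  local max_len=$(( MAX_SHEBANG_LEN - 2 - ${#PYTHON_SUFFIX} ))
  if [ "${#base_dir}" -gt "${max_len}" ]; then
    echo "ERROR: The path to the SRW App is ${#base_dir} characters long:" >&2
    echo "         ${base_dir}" >&2
    echo "       conda may not install correctly if it is longer than" >&2
    echo "       ${max_len} characters. Please clone the app into a shorter path." >&2
    return 1
  fi
  return 0
}

# Possible call site in devbuild.sh, before conda is installed
# (SRW_DIR is assumed to hold the path to the clone):
#   check_conda_prefix_length "${SRW_DIR}" || exit 1
```

Failing early with an explicit message replaces the current silent fallback, which only surfaces later as `EnvironmentNameNotFound`. Under the shebang assumption, what matters is the length of the conda installation directory, so if that directory can be placed somewhere other than the clone, the check would need to use that directory instead.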
Additional Information (optional)
See description above.
Output (optional)
See description above.