We recommend using [mamba](https://mamba.readthedocs.io/en/latest/) rather than conda to install the dependencies, as it is significantly faster. First, install `mamba` following the instructions in the [mamba repository](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html).
Install dependencies via `mamba`:
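For example, assuming the repository ships a conda-style environment file (the filename `env.yml` and the environment name `mofdiff` are assumptions; use whatever the repository actually provides):

```shell
# Create the environment from the repository's environment file,
# then activate it (file and environment names assumed).
mamba env create -f env.yml
mamba activate mofdiff
```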
Then install `mofdiff` as a package:
```
pip install -e .
```
We use [MOFid](https://github.com/snurr-group/mofid) for preprocessing and analysis. To perform these steps, install MOFid following the instructions in the [MOFid repository](https://github.com/snurr-group/mofid/blob/master/compiling.md). The generative modeling and MOF simulation portions of this codebase do not depend on MOFid.
Configure the `.env` file to set correct paths to various directories, depending on the desired functionality. An example [`.env`](./.env) file is provided in the repository.
For model training, please set the learning-related paths.
- PROJECT_ROOT: the parent MOFDiff directory
- DATASET_DIR: the directory containing the .lmdb file produced by processing the data
- LOG_DIR: the directory to which logs will be written
- HYDRA_JOBS: the directory to which Hydra output will be written
- WANDB_DIR: the directory to which WandB output will be written
For MOF relaxation and structural property calculations, please additionally set the Zeo++ path.
- ZEO_PATH: path to the Zeo++ "network" binary
For GCMC simulations, please additionally set the GCMC-related paths.
- RASPA_PATH: the RASPA2 parent directory
- RASPA_SIM_PATH: path to the RASPA2 "simulate" binary
- EGULP_PATH: path to the eGULP "egulp" binary
- EGULP_PARAMETER_PATH: the directory containing the eGULP "MEPO.param" file
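Putting the variables above together, a complete `.env` might look like the following. All paths shown are illustrative placeholders; substitute the locations on your own system.

```
PROJECT_ROOT="/home/user/MOFDiff"
DATASET_DIR="/home/user/MOFDiff/data/lmdb"
LOG_DIR="/home/user/MOFDiff/logs"
HYDRA_JOBS="/home/user/MOFDiff/hydra"
WANDB_DIR="/home/user/MOFDiff/wandb"
ZEO_PATH="/home/user/zeo++/network"
RASPA_PATH="/home/user/RASPA2"
RASPA_SIM_PATH="/home/user/RASPA2/bin/simulate"
EGULP_PATH="/home/user/egulp/egulp"
EGULP_PARAMETER_PATH="/home/user/egulp/parameters"
```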
## Process data
You can download the preprocessed `BW-DB` data from [Zenodo](https://zenodo.org/uploads/10467288) (recommended).
Alternatively, you can download the `BW-DB` raw data from [Materials Cloud](https://archive.materialscloud.org/record/2018.0016/v3) to `${raw_path}` and preprocess with the following command. This step requires MOFid.
The entire preprocessing of `BW-DB` may take several days, depending on available compute resources.
## Training
### training the building block encoder
Before training the diffusion model, we need to train the building block encoder, a graph neural network that encodes the building blocks of MOFs. Train the building block encoder with the following command:
The default output directory is `${oc.env:HYDRA_JOBS}/bb/${expname}/`. `oc.env:HYDRA_JOBS` resolves to the HYDRA_JOBS path set in `.env`.
Logging is done with [wandb](https://wandb.ai/site) by default. You need to log in with `wandb login` before training. The training logs will be saved to the wandb project `mofdiff`; you can override the project name with command line arguments. You can also disable wandb logging by removing the `wandb` entry in the [config](./conf/logging/default.yaml), as demonstrated [here](./conf/logging/no_wandb_logging.yaml).
### training coarse-grained diffusion model for MOFs
Training the diffusion model requires `bb_encoder_path`, the output directory where the building block encoder is saved. By default, this path is `${oc.env:HYDRA_JOBS}/bb/${expname}/`, as defined [above](#training-the-building-block-encoder). Train/validation splits are defined in [splits](./splits), with examples provided for BW-DB. With the building block encoder trained to convergence, train the CG diffusion model with the following command:
For BW-DB, training the building block encoder takes roughly 3 days.
Pretrained models can be found [here](https://zenodo.org/record/10467288).
With a trained CG diffusion model `${diffusion_model_path}`, generate random CG MOF structures with the following command, where `${bb_cache_path}` is the path to the trained building block encoder, as described [above](#training-the-building-block-encoder):
To optimize MOF structures for a property defined in BW-DB (e.g., CO2 adsorption working capacity), use the following command:
Available arguments for `sample.py` and `optimize.py` can be found in the respective files. The generated CG MOF structures will be saved in `${sample_path}=${diffusion_model_path}/${sample_tag}` as `samples.pt`.
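The generated samples are saved with PyTorch serialization, so they can be inspected with `torch.load`. A minimal sketch of the round trip, using a stand-in dictionary rather than the actual `samples.pt` contents (which this sketch does not assume):

```python
import torch

# Build a stand-in samples object and save it the way generated
# samples are saved, then load it back for inspection.
samples = {"num_mofs": 3, "coords": torch.zeros(3, 4, 3)}
torch.save(samples, "samples.pt")

loaded = torch.load("samples.pt")
print(loaded["num_mofs"], tuple(loaded["coords"].shape))  # 3 (3, 4, 3)
```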
The CG structures generated with the diffusion model are not guaranteed to be realizable as all-atom structures.
## Assemble all-atom MOFs
Assemble all-atom MOF structures from the CG MOF structures with the following command:
This command will assemble the CG MOF structures in `${sample_path}` and save the assembled all-atom structures as cif files in `${sample_path}/cif`.
## Relax MOFs and compute structural properties
The assembled structures may not be physically plausible. These MOF structures are relaxed using the UFF force field with LAMMPS. LAMMPS has already been installed as part of the environment if you have followed the installation instructions in this README. The script for relaxing the MOF structures also computes structural properties (e.g., pore volume, surface area, etc.) with [Zeo++](https://www.zeoplusplus.org/download.html) and the MOFids of the generated MOFs with [MOFid](https://github.com/snurr-group/mofid/tree/master). The respective packages should be installed following the instructions in their repositories, and the corresponding paths should be added to `.env` as outlined [above](#installation). Each step should take no more than a few minutes to complete on a single CPU. We use multiprocessing to parallelize the computation.
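The per-structure steps are independent, so they parallelize cleanly across CPUs. A minimal sketch of the multiprocessing pattern (the worker below is a stand-in, not the actual relax/Zeo++/MOFid pipeline):

```python
from multiprocessing import Pool

def process_structure(cif_name):
    """Stand-in worker: in the real script this would relax the MOF
    with LAMMPS/UFF, then run Zeo++ and MOFid on the result."""
    return {"name": cif_name, "status": "done"}

if __name__ == "__main__":
    cif_names = ["mof_0.cif", "mof_1.cif", "mof_2.cif"]
    # One structure per worker process; results return in input order.
    with Pool(processes=2) as pool:
        results = pool.map(process_structure, cif_names)
    print([r["status"] for r in results])  # ['done', 'done', 'done']
```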
Relax MOFs and compute structural properties with the following command:
This command will relax the assembled MOFs in `${sample_path}/cif` and save the relaxed structures.
## GCMC simulation for gas adsorption
### additional installation
To run GCMC simulations, first install RASPA2 (simulation software) and eGULP (charge calculation software). The paths to both should additionally be added to `.env` as outlined [above](#installation).
Finally, RASPA2 requires a set of forcefield parameters with which to run the simulations. To use our default simulation settings, copy the UFF parameter set from [ForceFields](https://github.com/lipelopesoliveira/ForceFields/tree/main) into the RASPA2 forcefield definition directory, typically located at `$RASPA_PATH/share/raspa/forcefield`.
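For example (a sketch only; the subdirectory name inside the ForceFields repository is an assumption, so check the repository layout before copying):

```shell
git clone https://github.com/lipelopesoliveira/ForceFields
# Copy the UFF parameter set into RASPA2's forcefield directory
# (source subdirectory name assumed; adjust to the actual layout).
cp -r ForceFields/UFF "$RASPA_PATH/share/raspa/forcefield/"
```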
### running simulations
Calculate charges for relaxed samples in `${sample_path}` with the following command:
This command will output cif files with charge information under `${sample_path}`.