Commit e510105

author
Jake Smith
committed
cleaning up README
1 parent ea49191 commit e510105

7 files changed

+76
-30
lines changed

.env

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+### An example .env file. Please configure the paths for your local machine.
+## For model training, please set the learning-related paths.
+## For MOF relaxation and structural property calculations, please additionally set the Zeo++ path.
+## For GCMC simulations, please set the MOF-related software paths.
+
+# learning-related paths
+export PROJECT_ROOT="/home/example_user/MOFDiff"
+export DATASET_DIR="/data/mofdiff/mof_data/lmdbs"
+export LOG_DIR="/data/mofdiff/mof_models"
+export HYDRA_JOBS="/data/mofdiff/mof_models"
+export WANDB_DIR="/data/mofdiff/mof_models"
+
+# Zeo++ path
+export ZEO_PATH="/usr/local/bin/zeo++-0.3/network"
+
+# MOF-related software paths
+export RASPA_PATH="/anaconda/envs/mofdiff/lib/python3.8/site-packages/RASPA2"
+export RASPA_SIM_PATH="/anaconda/envs/mofdiff/bin/simulate"
+export EGULP_PATH="/usr/local/bin/egulp/src/egulp"
+export EGULP_PARAMETER_PATH="/usr/local/bin/egulp/data"
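
The example `.env` groups its variables by workflow (training, relaxation, GCMC). As a minimal sketch, assuming the variables have already been exported into the process environment, a check like the following could report which ones are still unset before a run. The `REQUIRED_VARS` grouping keys and the `missing_vars` helper are hypothetical names for illustration and are not part of the repository; only the variable names come from the file above.

```python
import os

# Variable names are taken from the example .env file in this commit.
# The workflow keys ("training", "relaxation", "gcmc") are a hypothetical
# grouping chosen for this sketch.
REQUIRED_VARS = {
    "training": ["PROJECT_ROOT", "DATASET_DIR", "LOG_DIR", "HYDRA_JOBS", "WANDB_DIR"],
    "relaxation": ["ZEO_PATH"],
    "gcmc": ["RASPA_PATH", "RASPA_SIM_PATH", "EGULP_PATH", "EGULP_PARAMETER_PATH"],
}


def missing_vars(workflow, environ=None):
    """Return the variables for `workflow` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS[workflow] if not env.get(name)]


if __name__ == "__main__":
    # Print one status line per workflow group.
    for workflow in REQUIRED_VARS:
        missing = missing_vars(workflow)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{workflow}: {status}")
```

Each group is checked on its own here; since the file says the Zeo++ and GCMC paths are set "additionally", a stricter check might also require the training group for those workflows.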

README.md

Lines changed: 42 additions & 27 deletions
@@ -18,15 +18,15 @@ If you find this code useful, please consider referencing our paper:
 ## Table of Contents
 
 - [Installation](#installation)
-- [Dowlnload data](#download-data)
+- [Process data](#process-data)
 - [Training](#training)
-- [Generating MOF structures](#generating-mof-structures)
+- [Generating MOF structures](#generating-cg-mof-structures)
 - [Assemble all-atom MOFs](#assemble-all-atom-mofs)
-- [Relax MOFs](#relax-mofs)
+- [Relax MOFs](#relax-mofs-and-compute-structural-properties)
 
 ## Installation
 
-We recommend using [mamba](https://mamba.readthedocs.io/en/latest/) (much faster than conda) to install the dependencies. First install `mamba` following the intructions in the [mamba repository](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html).
+We recommend using [mamba](https://mamba.readthedocs.io/en/latest/) rather than conda to install the dependencies because it is faster. First install `mamba` following the instructions in the [mamba repository](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html).
 
 
 Install dependencies via `mamba`:
@@ -41,18 +41,36 @@ Then install `mofdiff` as a package:
 pip install -e .
 ```
 
-We use [MOFid](https://github.com/snurr-group/mofid) for preprocessing and analysis. Install MOFid following the instruction in the [MOFid repository](https://github.com/snurr-group/mofid/blob/master/compiling.md). The generative modeling part of this codebase does not depend on MOFid.
+We use [MOFid](https://github.com/snurr-group/mofid) for preprocessing and analysis. To perform these steps, install MOFid following the instructions in the [MOFid repository](https://github.com/snurr-group/mofid/blob/master/compiling.md). The generative modeling and MOF simulation portions of this codebase do not depend on MOFid.
 
-## Download data
+Configure the `.env` file with the correct paths for the functionality you need. An [example](./.env) `.env` file is provided in the repository.
 
-You can download the preprocessed data from [Zenodo](https://zenodo.org/uploads/10467288) (recommended).
+For model training, please set the learning-related paths.
+- PROJECT_ROOT: the parent MOFDiff directory
+- DATASET_DIR: the directory containing the .lmdb file produced by processing the data
+- LOG_DIR: the directory to which logs will be written
+- HYDRA_JOBS: the directory to which Hydra output will be written
+- WANDB_DIR: the directory to which WandB output will be written
 
-Alternatively, you can download the `BW-DB` raw data from [Materials Cloud](https://archive.materialscloud.org/record/2018.0016/v3) and preprocess the data with the following command (assuming the data is downloaded to `${raw_path}`, this step requires MOFid):
+For MOF relaxation and structural property calculations, please additionally set the Zeo++ path.
+- ZEO_PATH: path to the Zeo++ "network" binary
+
+For GCMC simulations, please additionally set the GCMC-related paths.
+- RASPA_PATH: the RASPA2 parent directory
+- RASPA_SIM_PATH: path to the RASPA2 "simulate" binary
+- EGULP_PATH: path to the eGULP "egulp" binary
+- EGULP_PARAMETER_PATH: the directory containing the eGULP "MEPO.param" file
+
+## Process data
+
+You can download the preprocessed `BW-DB` data from [Zenodo](https://zenodo.org/uploads/10467288) (recommended).
+
+Alternatively, you can download the `BW-DB` raw data from [Materials Cloud](https://archive.materialscloud.org/record/2018.0016/v3) to `${raw_path}` and preprocess it with the following commands. This step requires MOFid.
 
 ```
-python preprocessing/extract_mofid.py --df_path ${raw_path}/all_MOFs_screening_data.csv --cif_path ${raw_path}/cifs --save_path ${raw_path}/mofid
-python preprocessing/preprocess.py --dataset_path
-python preprocessing/save_to_lmdb.py
+python mofdiff/preprocessing/extract_mofid.py --df_path ${raw_path}/all_MOFs_screening_data.csv --cif_path ${raw_path}/cifs --save_path ${raw_path}/mofid
+python mofdiff/preprocessing/preprocess.py --df_path ${raw_path}/all_MOFs_screening_data.csv --mofid_path ${raw_path}/mofid --save_path ${raw_path}/graphs
+python mofdiff/preprocessing/save_to_lmdb.py --graph_path ${raw_path}/graphs --save_path ${raw_path}/lmdbs
 ```
 
 The preprocessing involves 3 steps:
@@ -64,8 +82,6 @@ The entire preprocessing process for `BW-DB` may take several days (depending on
 
 ## Training
 
-First, configure the `.env` file to set correct paths to various directories. An [example](./.env) `.env` file is provided in the repository.
-
 ### training the building block encoder
 
 Before training the diffusion model, we need to train the building block encoder. The building block encoder is a graph neural network that encodes the building blocks of MOFs. The building block encoder is trained with the following command:
@@ -80,11 +96,11 @@ The default output directory is `${oc.env:HYDRA_JOBS}/bb/${expname}/`. `oc.env:H
 python mofdiff/scripts/train.py --config-name=bb expname=bwdb_bb_dim_64 model.latent_dim=64
 ```
 
-Logging is done with [wandb](https://wandb.ai/site) by default. You need to login to wandb with `wandb login` before training. The training logs will be saved to the wandb project `mofdiff`. You can also override the wandb project with command line arguments. You can also disable wandb logging by removing the `wandb` entry in the [config](./conf/logging/default.yaml).
+Logging is done with [wandb](https://wandb.ai/site) by default. Log in with `wandb login` before training. The training logs will be saved to the wandb project `mofdiff`. You can override the wandb project with command line arguments, or disable wandb logging by removing the `wandb` entry in the [config](./conf/logging/default.yaml), as demonstrated [here](./conf/logging/no_wandb_logging.yaml).
 
 ### training coarse-grained diffusion model for MOFs
 
-The output directory where the building block encoder is saved: `bb_encoder_path` is needed for training the diffusion model. With the building block encoder trained to convergence, train the CG diffusion model with the following command:
+Training the diffusion model requires `bb_encoder_path`, the output directory where the building block encoder is saved. By default, this path is `${oc.env:HYDRA_JOBS}/bb/${expname}/`, as defined [above](#training-the-building-block-encoder). Train/validation splits are defined in [splits](./splits), with examples provided for BW-DB. With the building block encoder trained to convergence, train the CG diffusion model with the following command:
 
 ```
 python mofdiff/scripts/train.py data.bb_encoder_path=${bb_encoder_path}
@@ -96,16 +112,16 @@ For BW-DB, training the building block encoder takes roughly 3 days and training
 
 Pretrained models can be found [here](https://zenodo.org/record/10467288).
 
-With a trained CG diffusion model `${diffusion_model_path}`, generate random CG MOF structures with the following command:
+With a trained CG diffusion model `${diffusion_model_path}`, generate random CG MOF structures with the following command, where `${bb_cache_path}` is the path to the trained building block encoder, as described [above](#training-the-building-block-encoder).
 
 ```
 python mofdiff/scripts/sample.py --model_path ${diffusion_model_path} --bb_cache_path ${bb_cache_path}
 ```
 
-`${bb_cache_path}` is the path to the building block embedding space, saved at the beginning of CG diffusion model training. To optimize MOF structures for a property (e.g., CO2 adsorption working capacity), use the following command:
+To optimize MOF structures for a property defined in BW-DB (e.g., CO2 adsorption working capacity), use the following command:
 
 ```
-python mofdiff/scripts/optimize.py --model_path ${diffusion_model_path} --bb_cache_path ${bb_cache_path} --data_path ${data_path}
+python mofdiff/scripts/optimize.py --model_path ${diffusion_model_path} --bb_cache_path ${bb_cache_path} --data_path ${data_path} --property "working_capacity_vacuum_swing [mmol/g]" --target_v 15.0
 ```
 
 Available arguments for `sample.py` and `optimize.py` can be found in the respective files. The generated CG MOF structures will be saved in `${sample_path}=${diffusion_model_path}/${sample_tag}` as `samples.pt`.
@@ -114,7 +130,7 @@ The CG structures generated with the diffusion model are not guaranteed to be re
 
 ## Assemble all-atom MOFs
 
-Assembled the CG MOF structures with the following command:
+Assemble all-atom MOF structures from the CG MOF structures with the following command:
 
 ```
 python mofdiff/scripts/assemble.py --input ${sample_path}/samples.pt
@@ -124,7 +140,7 @@ This command will assemble the CG MOF structures in `${sample_path}` and save th
 
 ## Relax MOFs and compute structural properties
 
-The assembled structures may not be physically plausible. These MOF structures are relaxed uses the UFF force field with LAMMPS. LAMMPS is already installed if you have followed the installation instructions in this README. The script for relaxing the MOF structures also compute structural properties (e.g., pore volume, surface area, etc.) with [Zeo++](https://www.zeoplusplus.org/download.html) and the mofids of the generated MOFs with [MOFid](https://github.com/snurr-group/mofid/tree/master). The respective packages should be installed following the instructions in the respective repositories, and the corresponding paths should be added to `.env` before running the following command. Each step should take no more than a few minutes to complete on a single CPU. We use multiprocessing to parallelize the computation.
+The assembled structures may not be physically plausible. These MOF structures are relaxed using the UFF force field with LAMMPS. LAMMPS has already been installed as part of the environment if you have followed the installation instructions in this README. The script for relaxing the MOF structures also computes structural properties (e.g., pore volume, surface area) with [Zeo++](https://www.zeoplusplus.org/download.html) and the MOFids of the generated MOFs with [MOFid](https://github.com/snurr-group/mofid/tree/master). Install both packages following the instructions in their respective repositories, and add the corresponding paths to `.env` as outlined [above](#installation). Each step should take no more than a few minutes to complete on a single CPU. We use multiprocessing to parallelize the computation.
 
 Relax MOFs and compute structural properties with the following command:
 
@@ -137,7 +153,9 @@ This command will relax the assembled MOFs in `${sample_path}/cif` and save the
 
 ## GCMC simulation for gas adsorption
 
-To run GCMC simulations, first install RASPA2 (simulation software) and eGULP (charge calculation software).
+### additional installation
+
+To run GCMC simulations, first install RASPA2 (simulation software) and eGULP (charge calculation software). Add the paths to both to `.env` as outlined [above](#installation).
 
 RASPA2 can be installed with `pip`:
 
@@ -159,11 +177,9 @@ mkdir /usr/local/bin/egulp && tar -xf egulp.tar -C /usr/local/bin/egulp
 cd /usr/local/bin/egulp/src && make && cd -
 ```
 
-Then, decompress the [force field parameters](./mofdiff/gcmc/UFF-TraPPe-scaled.tar) to the RASPA directory using the following commands (assuming RASPA2 installed in `RASPA_PATH=PYTHONPATH/site-packages/RASPA2` with `pip`):
+Finally, RASPA2 requires a set of force field parameters with which to run the simulations. To use our default simulation settings, copy the UFF parameter set from [ForceFields](https://github.com/lipelopesoliveira/ForceFields/tree/main) into the RASPA2 force field definition directory, typically located at `$RASPA_PATH/share/raspa/forcefield`.
 
-```
-tar -xf UFF-TraPPe-scaled.tar -C RASPA_PATH/share/raspa/forcefield/UFF-TraPPe
-```
+### running simulations
 
 Calculate charges for relaxed samples in `${sample_path}` with the following command:
 
@@ -176,7 +192,6 @@ This command will output cif files with charge information under `${sample_path}
 
 Run GCMC simulations with the following command:
 
-
 ```
 python mofdiff/scripts/gcmc_screen.py --input ${sample_path}/mepo_qeq_charges
 ```
@@ -192,4 +207,4 @@ This codebase is based on several existing repositories:
 - [PyTorch Geometric](https://github.com/pyg-team/pytorch_geometric)
 - [PyTorch](https://github.com/pytorch/pytorch)
 - [Lightning](https://github.com/Lightning-AI/pytorch-lightning/)
-- [Hydra](https://github.com/facebookresearch/hydra)
+- [Hydra](https://github.com/facebookresearch/hydra)

conf/logging/no_wandb_logging.yaml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# log frequency
+val_check_interval: 3
+progress_bar_refresh_rate: 10
+
+tensorboard:
+  save_dir: ${oc.env:LOG_DIR}/tensorboard
+
+lr_monitor:
+  logging_interval: "step"
+  log_momentum: False

mofdiff/common/relaxation.py

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ def extract_info_from_log(log_file_path):
 
 
 @timeout(7200)
-def initiate_lammps_with_force_field(cif_file, force_field="UFF4MOF"):
+def initiate_lammps_with_force_field(cif_file, force_field="UFF"):
     """
     This function initiates a lammps instance where energies and forces are calculated
     using a force field implemented within lammps. The returned lammps instance can then be

mofdiff/common/sys_utils.py

Lines changed: 2 additions & 1 deletion
@@ -7,6 +7,7 @@
 import pytorch_lightning as pl
 from omegaconf import DictConfig, OmegaConf
 
+from mofdiff import __path__ as mofdiff_path
 
 # This is the function that will be called when the timeout happens
 def timeout_handler(signum, frame):
@@ -65,7 +66,7 @@ def get_env(env_name: str, default: Optional[str] = None) -> str:
     return env_value
 
 
-def load_envs(env_file: Optional[str] = ".env") -> None:
+def load_envs(env_file: Optional[str] = os.path.join(os.path.dirname(mofdiff_path[0]), ".env")) -> None:
     """
     Load all the environment variables defined in the `env_file`.
     This is equivalent to `. env_file` in bash.
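
The change to `load_envs` moves the default `.env` location from the current working directory to the directory that contains the `mofdiff` package. The body of `load_envs` is not shown in this diff, so the following is only an illustrative sketch of the two ideas involved: resolving a path next to a package directory, and reading `export KEY="value"` lines of the kind used in the example `.env`. The helpers `default_env_path` and `parse_env_lines` are hypothetical names and are not the repository's implementation.

```python
import os
import shlex


def default_env_path(package_dir):
    """Return the `.env` path in the parent directory of `package_dir`.

    Mirrors os.path.join(os.path.dirname(mofdiff_path[0]), ".env") from the diff,
    with `package_dir` standing in for `mofdiff_path[0]`.
    """
    return os.path.join(os.path.dirname(package_dir), ".env")


def parse_env_lines(lines):
    """Parse `export KEY="value"` or `KEY=value` lines into a dict.

    Blank lines and lines starting with `#` are skipped. This is a simplified
    stand-in for sourcing the file in bash, not a full shell parser.
    """
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        tokens = shlex.split(value)  # strips surrounding quotes
        values[key.strip()] = tokens[0] if tokens else ""
    return values
```

One detail worth noting about the new signature: Python evaluates default arguments once, when the function is defined, so the default path is fixed at import time of `sys_utils`.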

mofdiff/gcmc/UFF-TraPPe-scaled.tar

-36 KB
Binary file not shown.

mofdiff/preprocessing/save_to_lmdb.py

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ def save_to_lmdb(graph_path, save_path):
     parser.add_argument(
         "--save_path",
         type=str,
-        help="path to processed graphs",
+        help="output path for lmdb file",
     )
     args = parser.parse_args()
     save_to_lmdb(Path(args.graph_path), Path(args.save_path))
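
For context, the corrected help string belongs to a two-argument command-line interface. A minimal sketch of that interface is given below, assuming a `--graph_path` argument defined the same way as `--save_path`; only `--save_path`, its `type=str`, its new help text, and the use of `args.graph_path` are visible in the hunk, so the `--graph_path` definition and the `build_parser` and `parse_paths` helpers are assumptions for illustration.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with the two path arguments used by the script."""
    parser = argparse.ArgumentParser(description="Save processed graphs to LMDB (sketch).")
    # Assumed definition: the hunk only shows that args.graph_path is read.
    parser.add_argument("--graph_path", type=str, help="path to processed graphs")
    # Shown in the diff, with the corrected help text.
    parser.add_argument("--save_path", type=str, help="output path for lmdb file")
    return parser


def parse_paths(argv=None):
    """Parse `argv` (or sys.argv when None) and return both paths as Path objects."""
    args = build_parser().parse_args(argv)
    return Path(args.graph_path), Path(args.save_path)
```

Called as `parse_paths(["--graph_path", "raw/graphs", "--save_path", "raw/lmdbs"])`, this returns the two `Path` objects that the real script passes on to `save_to_lmdb`.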
