Commit 787e98a (1 parent: 865b310)

added obc generation workflow (NOAA-GFDL#128)

* added obc generation workflow
* Update README.md
* address comments
* Update README.md

File tree: 7 files changed (+869, −0 lines changed)

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
# MOM6 Open Boundary Conditions (OBC) Generation Workflow

This repository provides an example workflow for generating Open Boundary Conditions (OBC) for MOM6 using daily GLORYS data on PPAN.

## Overview

The main script, `mom6_obc_workflow.sh`, orchestrates the entire OBC generation process. The workflow includes the following steps:

1. **Spatial Subsetting of GLORYS Data**
   - Iterates through each day within a specified date range to spatially subset the original GLORYS dataset on UDA.
   - Reduces computational cost by limiting input data to the regional domain of interest instead of the entire global GLORYS domain.

2. **Filling Missing Values in Subset GLORYS Files**
   - Processes each daily subset file with CDO to fill missing values.
   - Compresses processed files with `ncks -4 -L 5`.
   - Combines all variables (e.g., `thetao`, `so`, `zos`, `uo`, `vo`) into a single NetCDF file.

3. **Daily Boundary Condition Generation**
   - Submits jobs that execute the `write_glorys_boundary_daily.py` script for each day.
   - Regrids GLORYS data and generates daily OBC files.
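Steps 1 and 2 can be sketched for a single day with standard CDO/NCO commands. This is an illustrative dry run (the `echo` prefixes only print the commands); the file-name pattern, the `fillmiss` operator choice, and the output names are assumptions, not the exact contents of the templates:

```shell
#!/bin/bash
# Dry-run sketch of the per-day subset -> fill -> compress -> merge chain
# (steps 1-2). Remove the "echo" prefixes to run for real (requires CDO and NCO).
set -eu
day="2022-01-01"                                        # example date
src="/uda/Global_Ocean_Physics_Reanalysis/global/daily" # _UDA_GLORYS_DIR

for var in thetao so zos uo vo; do
  # Step 1: subset the global GLORYS file to the regional box (lon1,lon2,lat1,lat2)
  echo cdo sellonlatbox,-100.0,-30.0,5.0,60.0 \
      "${src}/mercatorglorys12v1_gl12_mean_${day}_${var}.nc" "${var}_sub.nc"
  # Step 2a: fill missing (land) values
  echo cdo fillmiss "${var}_sub.nc" "${var}_filled.nc"
  # Step 2b: recompress as netCDF-4 with deflate level 5
  echo ncks -4 -L 5 "${var}_filled.nc" "${var}_filled_c.nc"
done
# Step 2c: merge all variables into a single daily file
echo cdo merge "./thetao_filled_c.nc ./so_filled_c.nc ./zos_filled_c.nc ./uo_filled_c.nc ./vo_filled_c.nc" "GLORYS_${day}.nc"
```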
Template scripts for these steps are provided in the `template` directory. User-specific parameters are configured with [uwtools](https://github.com/ufs-community/uwtools), which renders the templates from a `config.yaml` file built from user input.
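The rendering step can be exercised directly; the loop below mirrors the one in `mom6_obc_workflow.sh` (the `echo` prefix makes it a dry run that just prints the `uw` invocations):

```shell
#!/bin/bash
# Dry-run of the template-rendering loop from mom6_obc_workflow.sh:
# uwtools substitutes the config.yaml values into each template.
# Drop the "echo" prefix to render for real (requires the uwtools environment).
set -eu
mkdir -p scripts
for template in subset_glorys fill_glorys submit_python_make_obc_day ncrcat_obc; do
  echo uw template render \
      --input-file "template/${template}_template.sh" \
      --values-file config.yaml \
      --output-file "scripts/${template}.sh"
done
```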
---
## Configuration Example

Below is an example `config.yaml` file that sets the parameters for the workflow:

```yaml
# General parameters for template scripts
_WALLTIME: "1440"                     # Wall time (in minutes) for SLURM jobs
_NPROC: "1"                           # Number of processes for each job
_EMAIL_NOTIFICATION: "fail"           # SLURM email notification option
_USER_EMAIL: "your.email@example.com" # Email address for error notifications
_LOG_PATH: "./log/$CURRENT_DATE/%x.o%j" # Path for job logs
_UDA_GLORYS_DIR: "/uda/Global_Ocean_Physics_Reanalysis/global/daily" # Path to original GLORYS data
_UDA_GLORYS_FILENAME: "mercatorglorys12v1_gl12_mean" # File name prefix for GLORYS data
_REGIONAL_GLORYS_ARCHIVE: "/archive/user/datasets/glorys" # Archive path for processed daily files
_BASIN_NAME: "NWA12"          # Regional domain name
_OUTPUT_PREFIX: "GLORYS"      # Prefix for output files
_VARS: "thetao so uo vo zos"  # Variables to process
_LON_MIN: "-100.0"            # Minimum longitude for subsetting
_LON_MAX: "-30.0"             # Maximum longitude for subsetting
_LAT_MIN: "5.0"               # Minimum latitude for subsetting
_LAT_MAX: "60.0"              # Maximum latitude for subsetting
_PYTHON_SCRIPT: "$PYTHON_SCRIPT" # Path to the Python script for daily OBC generation

# Date range for processing
first_date: "$START_DATE"
last_date: "$END_DATE"

# Python script parameters
glorys_dir: "/archive/user/datasets/glorys/NWA12/filled" # Daily GLORYS subsets after filling NaNs
output_dir: "./outputs"   # Output path for the OBC files
hgrid: "./ocean_hgrid.nc" # Model horizontal grid file
ncrcat_names:
  - "thetao"
  - "so"
  - "zos"
  - "uv"
segments:
  - id: 1
    border: "south"
  - id: 2
    border: "north"
  - id: 3
    border: "east"
variables:
  - "thetao"
  - "so"
  - "zos"
  - "uv"
```
# Workflow Usage

## Step 1: Modify Configuration

Update the `cat <<EOF > config.yaml` section in `mom6_obc_workflow.sh` with parameters specific to your domain and workflow requirements:

```bash
cat <<EOF > config.yaml
_WALLTIME: "1440"
_NPROC: "1"
_EMAIL_NOTIFICATION: "fail"
_USER_EMAIL: "yi-cheng.teng@noaa.gov"
_LOG_PATH: "./log/$CURRENT_DATE/%x.o%j"
_UDA_GLORYS_DIR: "/uda/Global_Ocean_Physics_Reanalysis/global/daily"
_UDA_GLORYS_FILENAME: "mercatorglorys12v1_gl12_mean"
_REGIONAL_GLORYS_ARCHIVE: "/archive/ynt/datasets/glorys"
_BASIN_NAME: "NWA12"
_OUTPUT_PREFIX: "GLORYS"
_VARS: "thetao so uo vo zos"
_LON_MIN: "-100.0"
_LON_MAX: "-30.0"
_LAT_MIN: "5.0"
_LAT_MAX: "60.0"
_PYTHON_SCRIPT: "$PYTHON_SCRIPT"
first_date: "$START_DATE"
last_date: "$END_DATE"
glorys_dir: "/archive/ynt/datasets/glorys/NWA12/filled"
output_dir: "./outputs"
hgrid: './ocean_hgrid.nc'
ncrcat_names:
  - 'thetao'
  - 'so'
  - 'zos'
  - 'uv'
segments:
  - id: 1
    border: 'south'
  - id: 2
    border: 'north'
  - id: 3
    border: 'east'
variables:
  - 'thetao'
  - 'so'
  - 'zos'
  - 'uv'
EOF
```
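After editing, a quick grep-based sanity check can confirm the generated `config.yaml` contains the required keys (an illustrative check, not part of the workflow; the stand-in file below is a minimal assumption):

```shell
#!/bin/bash
# Minimal sanity check: verify required keys exist in a generated config.yaml.
set -eu
cat > config.yaml <<'EOF'
_WALLTIME: "1440"
_BASIN_NAME: "NWA12"
first_date: "2022-01-01"
last_date: "2022-12-31"
EOF
# Each required key must start a line in the file, or we abort.
for key in _WALLTIME _BASIN_NAME first_date last_date; do
  grep -q "^${key}:" config.yaml || { echo "Missing key: $key"; exit 1; }
done
echo "config.yaml looks complete"
```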
## Step 2: Generate OBC Files

Run the workflow for a specific year or date range:

```bash
./mom6_obc_workflow.sh 2022-01-01 2022-12-31
./mom6_obc_workflow.sh 2023-01-01 2023-12-31
```
## Step 3: Concatenate Multiple Years of OBC Files

To merge OBC files from multiple years into a single file, use the `--ncrcat` option. Ensure the dates in the command match the range for which you generated OBC files:

```bash
./mom6_obc_workflow.sh 2022-01-01 2023-12-31 --ncrcat
```

### Adjust Timestamps (Optional Substep)

If you need to adjust the timestamps of the first and last records for compatibility with MOM6 yearly simulations, use the `--adjust-timestamps` option together with `--ncrcat`. Note that this is an alternative to the command above, not a follow-up step:

```bash
./mom6_obc_workflow.sh 2022-01-01 2023-12-31 --ncrcat --adjust-timestamps
```

**Note**: Ensure the date range in your command corresponds to the dates for which you generated OBC files. This step will fail if files for any of the specified dates are missing.

TODO:
Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
#!/bin/bash

# Load required modules and environments
source $MODULESHOME/init/sh
module load miniforge
conda activate /nbhome/role.medgrp/.conda/envs/uwtools || { echo "Error activating conda environment. Exiting."; exit 1; }

set -eu

# Helper functions
print_usage() {
    echo "Usage: $0 START_DATE END_DATE [--ncrcat] [--adjust-timestamps]"
    echo "  START_DATE and END_DATE must be in YYYY-MM-DD format."
    echo "  --ncrcat: Enable ncrcat step (skips subset, fill, and submit_python steps)."
    echo "  --adjust-timestamps: Adjust timestamps during ncrcat step."
}

validate_date_format() {
    if [[ ! "$1" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
        echo "Error: Date $1 must be in YYYY-MM-DD format."
        exit 1
    fi
}

log_message() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}

# Default options
DO_NCRCAT=false
ADJUST_TIMESTAMPS=false
PYTHON_SCRIPT="../write_glorys_boundary_daily.py"

# Parse arguments (require the two positional dates before reading them,
# since "set -u" would otherwise abort with an unhelpful unbound-variable error)
if [[ $# -lt 2 ]]; then
    print_usage
    exit 1
fi
START_DATE="$1"
END_DATE="$2"
shift 2

while [[ $# -gt 0 ]]; do
    case "$1" in
        --ncrcat)
            DO_NCRCAT=true
            ;;
        --adjust-timestamps)
            ADJUST_TIMESTAMPS=true
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit 1
            ;;
    esac
    shift
done

validate_date_format "$START_DATE"
validate_date_format "$END_DATE"

start_date_epoch=$(date -d "$START_DATE" +%s)
end_date_epoch=$(date -d "$END_DATE" +%s)
if [[ $start_date_epoch -gt $end_date_epoch ]]; then
    log_message "Error: START_DATE ($START_DATE) must not be after END_DATE ($END_DATE). Exiting."
    exit 1
fi

# Ensure --adjust-timestamps is only used with --ncrcat
if $ADJUST_TIMESTAMPS && ! $DO_NCRCAT; then
    echo "Error: --adjust-timestamps can only be used with --ncrcat."
    exit 1
fi

# Warn user when --ncrcat is enabled
if $DO_NCRCAT; then
    log_message "WARNING: --ncrcat is enabled. The script will SKIP subset, fill, and submit_python steps."
    log_message "Ensure that all daily outputs already exist for the specified date range."
fi

# Prepare directories
CURRENT_DATE=$(date +%Y-%m-%d-%H-%M)
mkdir -p ./log/$CURRENT_DATE ./outputs scripts

# Define user configurations
log_message "Generating config.yaml..."
cat <<EOF > config.yaml
_WALLTIME: "1440"
_NPROC: "1"
_EMAIL_NOTIFICATION: "fail"
_USER_EMAIL: "$USER@noaa.gov"
_LOG_PATH: "./log/$CURRENT_DATE/%x.o%j"
_UDA_GLORYS_DIR: "/uda/Global_Ocean_Physics_Reanalysis/global/daily"
_UDA_GLORYS_FILENAME: "mercatorglorys12v1_gl12_mean"
_REGIONAL_GLORYS_ARCHIVE: "/archive/$USER/datasets/glorys"
_BASIN_NAME: "NWA12"
_OUTPUT_PREFIX: "GLORYS"
_VARS: "thetao so uo vo zos"
_LON_MIN: "-100.0"
_LON_MAX: "-30.0"
_LAT_MIN: "5.0"
_LAT_MAX: "60.0"
_PYTHON_SCRIPT: "$PYTHON_SCRIPT"
first_date: "$START_DATE"
last_date: "$END_DATE"
glorys_dir: "/archive/$USER/datasets/glorys/NWA12/filled"
output_dir: "./outputs"
hgrid: './ocean_hgrid.nc'
ncrcat_names:
  - 'thetao'
  - 'so'
  - 'zos'
  - 'uv'
segments:
  - id: 1
    border: 'south'
  - id: 2
    border: 'north'
  - id: 3
    border: 'east'
variables:
  - 'thetao'
  - 'so'
  - 'zos'
  - 'uv'
EOF

log_message "Preparing scripts directory..."
for template in subset_glorys fill_glorys submit_python_make_obc_day ncrcat_obc; do
    rm -f scripts/${template}.sh
    uw template render --input-file template/${template}_template.sh \
        --values-file config.yaml \
        --output-file scripts/${template}.sh || { log_message "Error rendering ${template}. Exiting."; exit 1; }
done

# Skip main steps if --ncrcat is enabled
if ! $DO_NCRCAT; then

    # Submit one job chain per day: subset -> fill -> Python OBC generation
    current_date_epoch=$start_date_epoch
    job_ids=()

    while [[ $current_date_epoch -le $end_date_epoch ]]; do
        current_date=$(date -d "@$current_date_epoch" +%Y-%m-%d)
        year=$(date -d "$current_date" +%Y)
        month=$(date -d "$current_date" +%m)
        day=$(date -d "$current_date" +%d)

        log_message "Submitting subset job for $current_date..."
        subset_job_id=$(sbatch --job-name="glorys_subset_${year}_${month}_${day}" \
            scripts/subset_glorys.sh "$year" "$month" "$day" | awk '{print $4}')

        log_message "Submitting fill_nan job for $current_date..."
        fill_job_id=$(sbatch --dependency=afterok:$subset_job_id \
            --job-name="glorys_fill_${year}_${month}_${day}" \
            scripts/fill_glorys.sh "$year" "$month" "$day" | awk '{print $4}')

        log_message "Submitting Python job for $current_date..."
        python_job_id=$(sbatch --dependency=afterok:$fill_job_id \
            --job-name="python_make_obc_day_${year}_${month}_${day}" \
            scripts/submit_python_make_obc_day.sh "$year" "$month" "$day" | awk '{print $4}')

        # Record job IDs (not currently consumed downstream, since the ncrcat
        # step is only reachable in a separate --ncrcat invocation)
        job_ids+=($python_job_id)
        current_date_epoch=$((current_date_epoch + 86400))
    done
fi

# Optional ncrcat step
if $DO_NCRCAT; then
    log_message "Submitting ncrcat job..."
    if $ADJUST_TIMESTAMPS; then
        sbatch --job-name="obc_ncrcat" scripts/ncrcat_obc.sh --config config.yaml --ncrcat_years --adjust_timestamps
    else
        sbatch --job-name="obc_ncrcat" scripts/ncrcat_obc.sh --config config.yaml --ncrcat_years
    fi
fi

log_message "Workflow completed."
