ml4convection is an end-to-end package that uses U-nets, a type of machine learning, to predict the spatial coverage of thunderstorms (henceforth, "convection") from satellite data. Inputs to the U-net are a time series of multispectral brightness-temperature maps from the Himawari-8 satellite. Available spectral bands are band 8 (central wavelength of 6.25 μm), 9 (6.95 μm), 10 (7.35 μm), 11 (8.60 μm), 13 (10.45 μm), 14 (11.20 μm), and 16 (13.30 μm). Labels are created by applying an echo-classification algorithm to reflectivity maps from four weather radars in Taiwan. The echo-classification algorithm is a modified version of Storm-labeling in 3 Dimensions (SL3D); you can read about the original version here and all except one of the modifications here. Before the end of April 2021, we (Ryan Lagerquist, Jebb Stewart, Imme Ebert-Uphoff, and Christina Kumler) plan to submit a journal article on this work to Monthly Weather Review, titled "Using deep learning to nowcast the spatial coverage of convection from Himawari-8 satellite data". If you would like to see the manuscript before it is accepted for publication, please contact me at ryan dot lagerquist at noaa dot gov.
Detailed documentation (each file and method) can be found here.
Documentation for important scripts, which you can run from the Unix command line, is provided below. Please note that this package is not intended for Windows and I provide no support for Windows. Also, though I have included many unit tests (every file ending in _test.py), I provide no guarantee that the code is free of bugs. If you choose to use this package, you do so at your own risk.
Before training a U-net (or any model in Keras), you must set up the model. "Setting up" includes four things: choosing the architecture, choosing the loss function, choosing the metrics (evaluation scores other than the loss function, used alongside the loss to monitor the model's performance after each training epoch), and compiling the model. For each lead time (0, 30, 60, 90, 120 minutes), I have created a script that sets up the chosen U-net (based on the hyperparameter experiment presented in the Monthly Weather Review paper). These scripts, which you can find in the directory ml4convection/scripts, are as follows:
- make_best_architecture_0minutes.py
- make_best_architecture_30minutes.py
- make_best_architecture_60minutes.py
- make_best_architecture_90minutes.py
- make_best_architecture_120minutes.py
Each script will set up the model (model_object) and print the model's architecture in a text-only flow chart to the command window, using the command model_object.summary(). If you want to save the model (which is still untrained) to a file, add the following command, replacing output_path with the desired file name.
model_object.save(filepath=output_path, overwrite=True, include_optimizer=True)
Once you have set up a U-net, you can train the U-net, using the script train_neural_net.py in the directory ml4convection/scripts. Below is an example of how you would call train_neural_net.py from a Unix terminal. For some input arguments I have suggested a default (where I include an actual value), and for some I have not. In this case, the lead time is 3600 seconds (60 minutes) and the lag times are 0 and 1200 and 2400 seconds (0 and 20 and 40 minutes). Thus, if the forecast issue time is 1200 UTC, the valid time will be 1300 UTC, while the predictors (brightness-temperature maps) will come from 1120 and 1140 and 1200 UTC.
python train_neural_net.py \
--training_predictor_dir_name="your directory name here" \
--training_target_dir_name="your directory name here" \
--validn_predictor_dir_name="your directory name here" \
--validn_target_dir_name="your directory name here" \
--input_model_file_name="file with untrained, but set-up, model" \
--output_model_dir_name="where you want trained model to be saved" \
--band_numbers 8 9 10 11 13 14 16 \
--lead_time_seconds=3600 \
--lag_times_seconds 0 1200 2400 \
--include_time_dimension=0 \
--first_training_date_string="20160101" \
--last_training_date_string="20161224" \
--first_validn_date_string="20170101" \
--last_validn_date_string="20171224" \
--normalize=1 \
--uniformize=1 \
--add_coords=0 \
--num_examples_per_batch=60 \
--max_examples_per_day_in_batch=8 \
--use_partial_grids=1 \
--omit_north_radar=1 \
--num_epochs=1000 \
--num_training_batches_per_epoch=64 \
--num_validn_batches_per_epoch=32 \
--plateau_lr_multiplier=0.6
More details on the input arguments are provided below.
- `training_predictor_dir_name` is a string, naming the directory with predictor files (containing brightness-temperature maps). Files therein will be found by `example_io.find_predictor_file` and read by `example_io.read_predictor_file`, where `example_io.py` is in the directory `ml4convection/io`. `example_io.find_predictor_file` will only look for files named like `[training_predictor_dir_name]/[yyyy]/predictors_[yyyymmdd]_radar[k].nc` and `[training_predictor_dir_name]/[yyyy]/predictors_[yyyymmdd]_radar[k].nc.gz`, where `[yyyy]` is the 4-digit year; `[yyyymmdd]` is the date; and `[k]` is the radar number, ranging from 1-3. An example of a good file name, assuming the top-level directory is `foo`, is `foo/2016/predictors_20160101_radar1.nc`.
- `training_target_dir_name` is a string, naming the directory with target files (containing labels, which are binary convection masks with 0 or 1 at each grid point). Files therein will be found by `example_io.find_target_file` and read by `example_io.read_target_file`. `example_io.find_target_file` will only look for files named like `[training_target_dir_name]/[yyyy]/targets_[yyyymmdd]_radar[k].nc` and `[training_target_dir_name]/[yyyy]/targets_[yyyymmdd]_radar[k].nc.gz`.
- `validn_predictor_dir_name` is the same as `training_predictor_dir_name` but for validation data.
- `validn_target_dir_name` is the same as `training_target_dir_name` but for validation data.
- `input_model_file_name` is a string, containing the full path to the untrained but set-up model. This file will be read by `neural_net.read_model`, where `neural_net.py` is in the directory `ml4convection/machine_learning`.
- `output_model_dir_name` is a string, naming the output directory. The trained model will be saved here.
- `band_numbers` is a list of band numbers to use in the predictors. I suggest using all bands (8, 9, 10, 11, 13, 14, 16).
- `lead_time_seconds` is the lead time in seconds.
- `lag_times_seconds` is a list of lag times for the predictors.
- `include_time_dimension` is a Boolean flag (0 or 1), determining whether or not the spectral bands and lag times will be represented on separate axes. For vanilla U-nets, always make this 0; for temporal U-nets and U-net++ models, always make this 1.
- `first_training_date_string` is the first date in the training period, in the format `yyyymmdd`.
- `last_training_date_string` is the last date in the training period, in the format `yyyymmdd`.
- `first_validn_date_string` is the same as `first_training_date_string` but for validation data.
- `last_validn_date_string` is the same as `last_training_date_string` but for validation data.
- `normalize` is a Boolean flag (0 or 1), determining whether or not predictors will be normalized to z-scores. Please always make this 1.
- `uniformize` is a Boolean flag (0 or 1), determining whether or not predictors will be uniformized before normalization. Please always make this 1.
- `add_coords` is a Boolean flag (0 or 1), determining whether or not latitude-longitude coordinates will be used as predictors. Please always make this 0.
- `num_examples_per_batch` is the number of examples per training or validation batch. Based on hyperparameter experiments presented in the Monthly Weather Review paper, I suggest making this 60.
- `max_examples_per_day_in_batch` is the maximum number of examples in a given batch that can come from the same day. The smaller you make this, the less temporal autocorrelation there will be in each batch. However, smaller numbers also increase the training time, because they increase the number of daily files from which data must be read.
- `use_partial_grids` is a Boolean flag (0 or 1), determining whether the model will be trained on the full Himawari-8 grid or partial radar-centered grids. Please always make this 1.
- `omit_north_radar` is a Boolean flag (0 or 1), determining whether or not the northernmost radar in Taiwan will be omitted from training. Please always make this 1.
- `num_epochs` is the number of training epochs. I suggest making this 1000, as early stopping always occurs before 1000 epochs.
- `num_training_batches_per_epoch` is the number of training batches per epoch. I suggest making this 64 (so that 64 batches per epoch are used to update model weights), but you might find a better value.
- `num_validn_batches_per_epoch` is the number of validation batches per epoch. I suggest making this 32 (so that 32 batches per epoch are used to compute metrics other than the loss function).
- `plateau_lr_multiplier` is a floating-point value between 0 and 1 (exclusive). During training, if the validation loss has not improved over the last 10 epochs (i.e., validation performance has reached a "plateau"), the learning rate will be multiplied by this value.
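The lead-time and lag-time bookkeeping described earlier (issue time 1200 UTC, lead time 3600 seconds, lag times 0/1200/2400 seconds) can be sketched in plain Python. This is an illustration only, not code from the package; the variable names simply mirror the script arguments:

```python
from datetime import datetime, timedelta

# Mirrors the example above: 60-minute lead time, lag times of 0/20/40 min.
issue_time = datetime(2017, 1, 1, 12, 0)  # forecast issue time, 1200 UTC
lead_time_seconds = 3600
lag_times_seconds = [0, 1200, 2400]

# The valid time (when convection is predicted) is issue time + lead time.
valid_time = issue_time + timedelta(seconds=lead_time_seconds)

# Predictor (brightness-temperature) times are issue time minus each lag.
predictor_times = [
    issue_time - timedelta(seconds=lag) for lag in lag_times_seconds
]

print(valid_time.strftime("%H%M"))                    # 1300
print([t.strftime("%H%M") for t in predictor_times])  # ['1200', '1140', '1120']
```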
Once you have trained the U-net, you can apply it to make predictions on new data. This is called the "inference phase," as opposed to the "training phase". You can do this with the script apply_neural_net.py in the directory ml4convection/scripts. Below is an example of how you would call apply_neural_net.py from a Unix terminal.
python apply_neural_net.py \
--input_model_file_name="file with trained model" \
--input_predictor_dir_name="your directory name here" \
--input_target_dir_name="your directory name here" \
--apply_to_full_grids=[0 or 1] \
--overlap_size_px=90 \
--first_valid_date_string="date in format yyyymmdd" \
--last_valid_date_string="date in format yyyymmdd" \
--output_dir_name="your directory name here"
More details on the input arguments are provided below.
- `input_model_file_name` is a string, containing the full path to the trained model. This file will be read by `neural_net.read_model`.
- `input_predictor_dir_name` is a string, naming the directory with predictor files. Files therein will be found by `example_io.find_predictor_file` and read by `example_io.read_predictor_file`, as for the input arguments `training_predictor_dir_name` and `validn_predictor_dir_name` to `train_neural_net.py`.
- `input_target_dir_name` is a string, naming the directory with target files. Files therein will be found by `example_io.find_target_file` and read by `example_io.read_target_file`, as for the input arguments `training_target_dir_name` and `validn_target_dir_name` to `train_neural_net.py`.
- `apply_to_full_grids` is a Boolean flag (0 or 1), determining whether the model will be applied to full or partial grids. If the model was trained on full grids, `apply_to_full_grids` will be ignored and the model will be applied to full grids regardless. Thus, `apply_to_full_grids` is used only if the model was trained on partial (radar-centered) grids.
- `overlap_size_px` is an integer, determining the overlap (in pixels) between adjacent partial grids. This argument is used only if the model was trained on partial grids and `apply_to_full_grids` is 1. I suggest making `overlap_size_px` 90.
- `first_valid_date_string` and `last_valid_date_string` are the first and last days in the inference period. In other words, the model will be used to make predictions for all days from `first_valid_date_string` to `last_valid_date_string`.
- `output_dir_name` is a string, naming the output directory. Predictions will be saved here.
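To make the role of `overlap_size_px` concrete, here is a sketch of how start positions for overlapping radar-sized windows might be laid out along one axis of the full grid. The function name and the grid/window sizes are hypothetical, not taken from the package:

```python
def window_starts(full_size_px, window_size_px, overlap_size_px):
    """Start indices of partial-grid windows slid across one full-grid axis.

    Each window overlaps its neighbour by overlap_size_px pixels, and the
    final window is pinned to the grid edge so nothing is left uncovered.
    """
    stride = window_size_px - overlap_size_px
    starts = list(range(0, full_size_px - window_size_px, stride))
    starts.append(full_size_px - window_size_px)  # last window flush with edge
    return starts

# Hypothetical sizes: a 1000-pixel full-grid axis, 400-pixel partial grids,
# and the suggested 90-pixel overlap.
print(window_starts(1000, 400, 90))  # [0, 310, 600]
```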
Once you have run apply_neural_net.py to make predictions, you can plot the predictions with the script plot_predictions.py in the directory ml4convection/scripts. Below is an example of how you would call plot_predictions.py from a Unix terminal.
python plot_predictions.py \
--input_prediction_dir_name="your directory name here" \
--first_date_string="date in format yyyymmdd" \
--last_date_string="date in format yyyymmdd" \
--use_partial_grids=[0 or 1] \
--smoothing_radius_px=2 \
--daily_times_seconds 0 7200 14400 21600 28800 36000 43200 50400 57600 64800 72000 79200 \
--plot_deterministic=0 \
--probability_threshold=-1 \
--output_dir_name="your directory name here"
More details on the input arguments are provided below.
- `input_prediction_dir_name` is a string, naming the directory with prediction files. Files therein will be found by `prediction_io.find_file` and read by `prediction_io.read_file`, where `prediction_io.py` is in the directory `ml4convection/io`. `prediction_io.find_file` will only look for files named like `[input_prediction_dir_name]/[yyyy]/predictions_[yyyymmdd]_radar[k].nc` and `[input_prediction_dir_name]/[yyyy]/predictions_[yyyymmdd]_radar[k].nc.gz`, where `[yyyy]` is the 4-digit year; `[yyyymmdd]` is the date; and `[k]` is the radar number, ranging from 1-3. An example of a good file name, assuming the top-level directory is `foo`, is `foo/2016/predictions_20160101_radar1.nc`.
- `first_date_string` and `last_date_string` are the first and last days to plot. In other words, the script will plot predictions for all days from `first_date_string` to `last_date_string`.
- `use_partial_grids` is a Boolean flag (0 or 1), indicating whether you want to plot predictions on the full Himawari-8 grid or partial (radar-centered) grids. If `use_partial_grids` is 1, `plot_predictions.py` will plot partial grids centered on every radar (but in separate plots, so you will get one plot per time step per radar).
- `smoothing_radius_px`, used for full-grid predictions (if `use_partial_grids` is 0), is the e-folding radius for Gaussian smoothing (pixels). Each probability field will be filtered by this amount before plotting. Smoothing is useful for full-grid predictions created from a model trained on partial grids: in this case the full-grid predictions are created by sliding the partial grid to various "windows" inside the full grid, and sometimes there is a sharp cutoff at the boundary between two adjacent windows.
- `daily_times_seconds` is a list of daily times at which to plot predictions. The data are available at 10-minute (600-second) time steps, but you may not want to plot the predictions every 10 minutes. In the above code example, the list of times provided will force the script to plot predictions at {0000, 0200, 0400, 0600, 0800, 1000, 1200, 1400, 1600, 1800, 2000, 2200} UTC every day.
- `plot_deterministic` is a Boolean flag (0 or 1), indicating whether you want to plot deterministic (binary) or probabilistic predictions.
- `probability_threshold` is the probability threshold (ranging from 0 to 1) used to convert probabilistic to binary predictions. This argument is ignored if `plot_deterministic` is 0, which is why I make it -1 in the above code example.
- `output_dir_name` is a string, naming the output directory. Plots will be saved here as JPEG images.
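As a quick check of the `daily_times_seconds` list above, this stand-alone snippet (not package code) converts seconds since midnight to HHMM strings:

```python
# The daily times from the example above, in seconds since midnight UTC.
daily_times_seconds = [0, 7200, 14400, 21600, 28800, 36000,
                       43200, 50400, 57600, 64800, 72000, 79200]

# Convert each value to an HHMM string; 7200 s -> "0200", 79200 s -> "2200".
daily_time_strings = [
    "{0:02d}{1:02d}".format(s // 3600, (s % 3600) // 60)
    for s in daily_times_seconds
]
print(daily_time_strings)  # every 2 hours from '0000' to '2200'
```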
If you want to just plot satellite data, use the script plot_satellite.py in the directory ml4convection/scripts. Below is an example of how you would call plot_satellite.py from a Unix terminal.
python plot_satellite.py \
--input_satellite_dir_name="your directory name here" \
--first_date_string="date in format yyyymmdd" \
--last_date_string="date in format yyyymmdd" \
--band_numbers 8 9 10 11 13 14 16 \
--daily_times_seconds 0 7200 14400 21600 28800 36000 43200 50400 57600 64800 72000 79200 \
--output_dir_name="your directory name here"
More details on the input arguments are provided below.
- `input_satellite_dir_name` is a string, naming the directory with satellite files. Files therein will be found by `satellite_io.find_file` and read by `satellite_io.read_file`, where `satellite_io.py` is in the directory `ml4convection/io`. `satellite_io.find_file` will only look for files named like `[input_satellite_dir_name]/[yyyy]/satellite_[yyyymmdd].nc` and `[input_satellite_dir_name]/[yyyy]/satellite_[yyyymmdd].nc.gz`, where `[yyyy]` is the 4-digit year and `[yyyymmdd]` is the date. An example of a good file name, assuming the top-level directory is `foo`, is `foo/2016/satellite_20160101.nc`.
- `first_date_string` and `last_date_string` are the first and last days to plot. In other words, the script will plot brightness-temperature maps for all days from `first_date_string` to `last_date_string`.
- `band_numbers` is a list of band numbers to plot. `plot_satellite.py` will create one image per time step per band.
- `daily_times_seconds` is a list of daily times at which to plot brightness-temperature maps, same as the input for `plot_predictions.py`.
- `output_dir_name` is a string, naming the output directory. Plots will be saved here as JPEG images.
If you want to just plot radar data, use the script plot_radar.py in the directory ml4convection/scripts. Below is an example of how you would call plot_radar.py from a Unix terminal.
python plot_radar.py \
--input_reflectivity_dir_name="your directory name here" \
--input_echo_classifn_dir_name="your directory name here" \
--first_date_string="date in format yyyymmdd" \
--last_date_string="date in format yyyymmdd" \
--plot_all_heights=[0 or 1] \
--daily_times_seconds 0 7200 14400 21600 28800 36000 43200 50400 57600 64800 72000 79200 \
--expand_to_satellite_grid=[0 or 1] \
--output_dir_name="your directory name here"
More details on the input arguments are provided below.
- `input_reflectivity_dir_name` is a string, naming the directory with reflectivity files. Files therein will be found by `radar_io.find_file` and read by `radar_io.read_reflectivity_file`, where `radar_io.py` is in the directory `ml4convection/io`. `radar_io.find_file` will only look for files named like `[input_reflectivity_dir_name]/[yyyy]/reflectivity_[yyyymmdd].nc` and `[input_reflectivity_dir_name]/[yyyy]/reflectivity_[yyyymmdd].nc.gz`, where `[yyyy]` is the 4-digit year and `[yyyymmdd]` is the date. An example of a good file name, assuming the top-level directory is `foo`, is `foo/2016/reflectivity_20160101.nc`.
- `input_echo_classifn_dir_name` is a string, naming the directory with echo-classification files. If you specify this input argument, then `plot_radar.py` will plot black dots on top of the reflectivity map, one for each convective grid point. If you leave this input argument alone, echo classification will not be plotted. Files in `input_echo_classifn_dir_name` will be found by `radar_io.find_file` and read by `radar_io.read_echo_classifn_file`. `radar_io.find_file` will only look for files named like `[input_echo_classifn_dir_name]/[yyyy]/echo_classification_[yyyymmdd].nc` and `[input_echo_classifn_dir_name]/[yyyy]/echo_classification_[yyyymmdd].nc.gz`, where `[yyyy]` is the 4-digit year and `[yyyymmdd]` is the date. An example of a good file name, assuming the top-level directory is `foo`, is `foo/2016/echo_classification_20160101.nc`.
- `first_date_string` and `last_date_string` are the first and last days to plot. In other words, the script will plot reflectivity maps for all days from `first_date_string` to `last_date_string`.
- `plot_all_heights` is a Boolean flag (0 or 1). If 1, the script will plot reflectivity at all heights, producing one plot per time step per height. If 0, the script will plot composite (column-maximum) reflectivity, producing one plot per time step.
- `daily_times_seconds` is a list of daily times at which to plot reflectivity maps, same as the input for `plot_predictions.py`.
- `expand_to_satellite_grid` is a Boolean flag (0 or 1). If 1, the script will plot reflectivity on the Himawari-8 grid, which is slightly larger than the radar grid. In this case values around the edge of the grid will all be 0 dBZ.
- `output_dir_name` is a string, naming the output directory. Plots will be saved here as JPEG images.
Evaluation scripts are split into those that compute "basic" scores and those that compute "advanced" scores. Basic scores are written to one file per day, whereas advanced scores are written to one file for a whole time period (e.g., the validation period, which is Jan 1 2017 - Dec 24 2017 in the Monthly Weather Review paper). For any time period T, basic scores can be aggregated over T to compute advanced scores. This documentation does not list all the basic and advanced scores (there are many), but below is an example:
- The fractions skill score (FSS) is an advanced score, defined as 1 - SSE / SSEref.
- SSE (the actual sum of squared errors) and SSEref (the reference sum of squared errors) are basic scores, each with one value per time step.
- To compute the FSS for a period T, SSE and SSEref are summed over T and then the following equation is applied: FSS = 1 - SSE / SSEref.
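A toy example of this aggregation, with made-up numbers:

```python
# Toy basic scores: one (SSE, SSE_ref) pair per time step.
basic_scores = [
    {"sse": 4.0, "sse_ref": 10.0},
    {"sse": 2.0, "sse_ref": 8.0},
    {"sse": 6.0, "sse_ref": 12.0},
]

# Advanced score: sum the basic scores over the period T, THEN apply the
# FSS equation.  Note this differs from averaging per-time-step FSS values.
sse_total = sum(s["sse"] for s in basic_scores)          # 12.0
sse_ref_total = sum(s["sse_ref"] for s in basic_scores)  # 30.0
fss = 1.0 - sse_total / sse_ref_total
print(fss)  # 0.6
```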
If you want to compute basic ungridded scores (averaged over the whole domain), use the script compute_basic_scores_ungridded.py in the directory ml4convection/scripts. Below is an example of how you would call compute_basic_scores_ungridded.py from a Unix terminal.
python compute_basic_scores_ungridded.py \
--input_prediction_dir_name="your directory name here" \
--first_date_string="date in format yyyymmdd" \
--last_date_string="date in format yyyymmdd" \
--time_interval_steps=[integer] \
--use_partial_grids=[0 or 1] \
--smoothing_radius_px=2 \
--matching_distances_px 1 2 3 4 \
--num_prob_thresholds=21 \
--prob_thresholds -1 \
--output_dir_name="your directory name here"
More details on the input arguments are provided below.
- `input_prediction_dir_name` is a string, naming the directory with prediction files. Files therein will be found by `prediction_io.find_file` and read by `prediction_io.read_file`, as for the input argument to `plot_predictions.py`.
- `first_date_string` and `last_date_string` are the first and last days for which to compute scores. In other words, the script will compute basic scores for all days from `first_date_string` to `last_date_string`.
- `time_interval_steps` is used to reduce computing time. If you want to compute scores for every kth time step, make `time_interval_steps` be k.
- `use_partial_grids` is a Boolean flag (0 or 1), indicating whether you want to compute scores for predictions on the full Himawari-8 grid or partial (radar-centered) grids.
- `smoothing_radius_px`, used for full-grid predictions (if `use_partial_grids` is 0), is the e-folding radius for Gaussian smoothing (pixels). Each probability field will be filtered by this amount before computing scores. I suggest making this 2.
- `matching_distances_px` is a list of neighbourhood distances (pixels) for evaluation. Basic scores will be computed for each neighbourhood distance, and one set of files will be written for each neighbourhood distance.
- `num_prob_thresholds` is the number of probability thresholds at which to compute scores based on binary (rather than probabilistic) forecasts. These thresholds will be equally spaced from 0.0 to 1.0. If you instead want to specify probability thresholds, make `num_prob_thresholds` -1 and use the argument `prob_thresholds`.
- `prob_thresholds` is a list of probability thresholds (between 0.0 and 1.0) at which to compute scores based on binary forecasts. If you instead want equally spaced thresholds from 0.0 to 1.0, make `prob_thresholds` -1 and use the argument `num_prob_thresholds`.
- `output_dir_name` is a string, naming the output directory. Basic scores will be saved here as NetCDF files.
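For example, the spacing implied by `num_prob_thresholds=21` can be reproduced with this stand-alone snippet (an illustration of the spacing, not package code):

```python
num_prob_thresholds = 21

# 21 equally spaced thresholds from 0.0 to 1.0 inclusive -> spacing of 0.05.
prob_thresholds = [
    i / (num_prob_thresholds - 1) for i in range(num_prob_thresholds)
]
print(prob_thresholds[:3], prob_thresholds[-1])  # [0.0, 0.05, 0.1] 1.0
```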
If you want to compute basic gridded scores (one set of scores for each grid point), use the script compute_basic_scores_gridded.py in the directory ml4convection/scripts. Below is an example of how you would call compute_basic_scores_gridded.py from a Unix terminal.
python compute_basic_scores_gridded.py \
--input_prediction_dir_name="your directory name here" \
--first_date_string="date in format yyyymmdd" \
--last_date_string="date in format yyyymmdd" \
--smoothing_radius_px=2 \
--matching_distances_px 1 2 3 4 \
--climo_file_names "climatology/climo_neigh-distance-px=1.p" "climatology/climo_neigh-distance-px=2.p" "climatology/climo_neigh-distance-px=3.p" "climatology/climo_neigh-distance-px=4.p" \
--prob_thresholds -1 \
--output_dir_name="your directory name here"
More details on the input arguments are provided below.
- `input_prediction_dir_name` is the same as for `compute_basic_scores_ungridded.py`.
- `first_date_string` and `last_date_string` are the same as for `compute_basic_scores_ungridded.py`.
- `smoothing_radius_px` is the same as for `compute_basic_scores_ungridded.py`.
- `matching_distances_px` is the same as for `compute_basic_scores_ungridded.py`.
- `climo_file_names` is a list of paths to climatology files, one for each matching distance. Each file will be read by `climatology_io.read_file`, where `climatology_io.py` is in the directory `ml4convection/io`. Each file specifies the climatology (i.e., convection frequency in the training data at each pixel), which is ultimately used to compute the Brier skill score at each pixel. The climatology is different for each matching distance, because a matching distance (radius) of N pixels turns each convective label (one pixel) into roughly πN² labels (pixels). The climatology depends on the matching distance and training period, which is why I have not included climatology files in this package.
- `prob_thresholds` is the same as for `compute_basic_scores_ungridded.py`.
- `output_dir_name` is the same as for `compute_basic_scores_ungridded.py`.
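The neighbourhood inflation described above (one convective pixel becoming roughly π times the squared matching distance, in pixels) can be checked on a toy grid. The grid size and radius below are hypothetical:

```python
import math

# One convective pixel at the centre of a small grid; a matching distance
# (radius) of N pixels labels every pixel within N of it as convective.
N = 4
grid_size = 21
centre = grid_size // 2

# Count grid points within Euclidean distance N of the convective pixel.
n_convective = sum(
    1
    for row in range(grid_size)
    for col in range(grid_size)
    if (row - centre) ** 2 + (col - centre) ** 2 <= N ** 2
)

print(n_convective, math.pi * N ** 2)  # 49 labels vs. pi*N^2 ~ 50.3
```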
If you want to compute advanced ungridded scores (averaged over the whole domain), use the script compute_advanced_scores_ungridded.py in the directory ml4convection/scripts. Below is an example of how you would call compute_advanced_scores_ungridded.py from a Unix terminal.
python compute_advanced_scores_ungridded.py \
--input_basic_score_dir_name="your directory name here" \
--first_date_string="date in format yyyymmdd" \
--last_date_string="date in format yyyymmdd" \
--num_bootstrap_reps=[integer] \
--use_partial_grids=[0 or 1] \
--desired_month=[integer] \
--split_by_hour=[0 or 1] \
--input_climo_file_name="your file name here" \
--output_dir_name="your directory name here"
More details on the input arguments are provided below.
- `input_basic_score_dir_name` is a string, naming the directory with basic ungridded scores. Files therein will be found by `evaluation.find_basic_score_file` and read by `evaluation.read_basic_score_file`, where `evaluation.py` is in the directory `ml4convection/utils`. `evaluation.find_basic_score_file` will only look for files named like `[input_basic_score_dir_name]/[yyyy]/basic_scores_gridded=0_[yyyymmdd].nc` (if `use_partial_grids` is 0) or `[input_basic_score_dir_name]/[yyyy]/basic_scores_gridded=0_[yyyymmdd]_radar[k].nc` (if `use_partial_grids` is 1), where `[yyyy]` is the 4-digit year; `[yyyymmdd]` is the date; and `[k]` is the radar number, an integer from 1-3. An example of a good file name, assuming the top-level directory is `foo`, is `foo/2016/basic_scores_gridded=0_20160101.nc` or `foo/2016/basic_scores_gridded=0_20160101_radar1.nc`.
- `first_date_string` and `last_date_string` are the first and last days for which to aggregate basic scores into advanced scores. In other words, the script will compute advanced scores for all days from `first_date_string` to `last_date_string`.
- `num_bootstrap_reps` is the number of replicates (sometimes called "iterations") for bootstrapping, used to compute uncertainty. If you do not want to bootstrap, make `num_bootstrap_reps` 1.
- `use_partial_grids` is a Boolean flag (0 or 1), indicating whether you want to compute scores for predictions on the full Himawari-8 grid or partial (radar-centered) grids.
- `desired_month` is an integer from 1 to 12, indicating the month for which you want to compute advanced scores. If you want to include all months, make this -1.
- `split_by_hour` is a Boolean flag (0 or 1), indicating whether or not you want to compute one set of advanced scores for each hour of the day (0000-0059 UTC, 0100-0159 UTC, etc.).
- `input_climo_file_name` is the path to the climatology file. For more details on this (admittedly weird) input argument, see the documentation above for the input argument `climo_file_names` to the script `compute_basic_scores_gridded.py`.
- `output_dir_name` is a string, naming the output directory. Advanced scores will be saved here as NetCDF files.
If you want to compute advanced gridded scores (one set of scores for each grid point), use the script compute_advanced_scores_gridded.py in the directory ml4convection/scripts. Below is an example of how you would call compute_advanced_scores_gridded.py from a Unix terminal.
python compute_advanced_scores_gridded.py \
--input_basic_score_dir_name="your directory name here" \
--first_date_string="date in format yyyymmdd" \
--last_date_string="date in format yyyymmdd" \
--num_subgrids_per_dim=3 \
--output_dir_name="your directory name here"
More details on the input arguments are provided below.
- `input_basic_score_dir_name` is a string, naming the directory with basic gridded scores. Files therein will be found by `evaluation.find_basic_score_file` and read by `evaluation.read_basic_score_file`. `evaluation.find_basic_score_file` will only look for files named like `[input_basic_score_dir_name]/[yyyy]/basic_scores_gridded=1_[yyyymmdd].nc` (if `use_partial_grids` is 0) or `[input_basic_score_dir_name]/[yyyy]/basic_scores_gridded=1_[yyyymmdd]_radar[k].nc` (if `use_partial_grids` is 1), where `[yyyy]` is the 4-digit year; `[yyyymmdd]` is the date; and `[k]` is the radar number, an integer from 1-3. An example of a good file name, assuming the top-level directory is `foo`, is `foo/2016/basic_scores_gridded=1_20160101.nc` or `foo/2016/basic_scores_gridded=1_20160101_radar1.nc`.
- `first_date_string` and `last_date_string` are the same as for `compute_advanced_scores_ungridded.py`.
- `num_subgrids_per_dim` (an integer) is the number of subgrids per spatial dimension. For example, if `num_subgrids_per_dim` is 3, the script will use 3 * 3 = 9 subgrids. It will aggregate basic scores for one subgrid at a time. Although this input argument is weird, it greatly reduces the memory requirements.
- `output_dir_name` is the same as for `compute_advanced_scores_ungridded.py`.
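A sketch of how a grid axis might be split into equal subgrids for this kind of chunked aggregation. The function name and grid size are hypothetical, and the package's actual partitioning may differ:

```python
def subgrid_slices(num_rows, num_subgrids_per_dim):
    """Row ranges (start, stop) for each subgrid along one grid dimension."""
    edges = [
        round(i * num_rows / num_subgrids_per_dim)
        for i in range(num_subgrids_per_dim + 1)
    ]
    # Pair consecutive edges into half-open (start, stop) ranges.
    return list(zip(edges[:-1], edges[1:]))

# Hypothetical 500-row grid split into 3 subgrids per dimension:
print(subgrid_slices(500, 3))  # [(0, 167), (167, 333), (333, 500)]
```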
ml4convection contains plotting code only for advanced evaluation scores (aggregated over a time period), not for basic scores (one set of scores per time step).
If you want to plot ungridded scores (averaged over the whole domain) with no separation by month or hour, use the script plot_evaluation.py in the directory ml4convection/scripts. plot_evaluation.py creates an attributes diagram (evaluating probabilistic forecasts) and a performance diagram (evaluating binary forecasts at various probability thresholds). Below is an example of how you would call plot_evaluation.py from a Unix terminal.
python plot_evaluation.py \
--input_advanced_score_file_name="your file name here" \
--best_prob_threshold=[float] \
--confidence_level=0.95 \
--output_dir_name="your directory name here"
More details on the input arguments are provided below.
- `input_advanced_score_file_name` is a string, giving the full path to the file with advanced ungridded scores. This file will be read by `evaluation.read_advanced_score_file`, where `evaluation.py` is in the directory `ml4convection/utils`.
- `best_prob_threshold` is the optimal probability threshold, which will be marked with a star in the performance diagram. If you have not yet chosen the optimal threshold and want it to be determined "on the fly," make this argument -1.
- `confidence_level` is the confidence level for plotting uncertainty. This argument will be used only if `input_advanced_score_file_name` contains bootstrapped scores. For example, if the file contains scores for 1000 bootstrap replicates and `confidence_level` is 0.95, the 95% confidence interval will be plotted (ranging from the 2.5th to 97.5th percentile over all bootstrap replicates).
- `output_dir_name` is a string, naming the output directory. Plots will be saved here as JPEG files.
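The percentile-based confidence interval described for `confidence_level` can be illustrated with toy bootstrap replicates (stand-alone Python, not package code):

```python
import random

# Toy bootstrap: 1000 replicates of some score (e.g., FSS).  The 95%
# confidence interval spans the 2.5th-97.5th percentile of the replicates.
random.seed(0)
bootstrap_scores = sorted(random.gauss(0.6, 0.05) for _ in range(1000))

confidence_level = 0.95
low_idx = int((1.0 - confidence_level) / 2 * len(bootstrap_scores))   # 25
high_idx = int((1.0 + confidence_level) / 2 * len(bootstrap_scores))  # 975
ci_lower = bootstrap_scores[low_idx]
ci_upper = bootstrap_scores[high_idx - 1]
print(ci_lower < 0.6 < ci_upper)  # the interval brackets the true mean
```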
If you want to plot gridded scores (one set of scores per grid point), use the script plot_gridded_evaluation.py in the directory ml4convection/scripts. plot_gridded_evaluation.py plots a gridded map for each of the following scores: Brier score, Brier skill score, fractions skill score, label climatology (event frequency in the training data, which isn't an evaluation score), model climatology (mean forecast probability, which also isn't an evaluation score), probability of detection, success ratio, frequency bias, and critical success index. Below is an example of how you would call plot_gridded_evaluation.py from a Unix terminal.
python plot_gridded_evaluation.py \
--input_advanced_score_file_name="your file name here" \
--probability_threshold=[float] \
--output_dir_name="your directory name here"
More details on the input arguments are provided below.
- `input_advanced_score_file_name` is a string, giving the full path to the file with advanced gridded scores. This file will be read by `evaluation.read_advanced_score_file`.
- `probability_threshold` is the probability threshold for binary forecasts. This is required for plotting probability of detection (POD), success ratio, frequency bias, and critical success index (CSI), which are scores based on binary forecasts.
- `output_dir_name` is a string, naming the output directory. Plots will be saved here as JPEG files.
If you want to plot ungridded scores separated by month and hour, use the script plot_evaluation_by_time.py in the directory ml4convection/scripts. plot_evaluation_by_time.py creates a monthly attributes diagram, hourly attributes diagram, monthly performance diagram, and hourly performance diagram. plot_evaluation_by_time.py also plots fractions skill score (FSS), CSI, and frequency bias as a function of month and hour. Thus, plot_evaluation_by_time.py plots 6 figures. Below is an example of how you would call plot_evaluation_by_time.py from a Unix terminal.
python plot_evaluation_by_time.py \
--input_dir_name="your directory name here" \
--probability_threshold=[float] \
--confidence_level=0.95 \
--output_dir_name="your directory name here"
More details on the input arguments are provided below.
- `input_dir_name` is a string, naming the directory with advanced ungridded scores separated by month and hour. Files therein will be found by `evaluation.find_advanced_score_file` and read by `evaluation.read_advanced_score_file`. `evaluation.find_advanced_score_file` will only look for files named like `[input_dir_name]/advanced_scores_month=[mm]_gridded=0.p` and `[input_dir_name]/advanced_scores_hour=[hh]_gridded=0.p`, where `[mm]` is the 2-digit month and `[hh]` is the 2-digit hour. An example of a good file name, assuming the directory is `foo`, is `foo/advanced_scores_month=03_gridded=0.p` or `foo/advanced_scores_hour=12_gridded=0.p`.
- `probability_threshold` is the probability threshold for binary forecasts. This is required for plotting CSI and frequency bias versus hour and month.
- `confidence_level` is the confidence level for plotting uncertainty. This argument will be used only if files in `input_dir_name` contain bootstrapped scores. For example, if the files contain scores for 1000 bootstrap replicates and `confidence_level` is 0.95, the 95% confidence interval will be plotted (ranging from the 2.5th to 97.5th percentile over all bootstrap replicates).
- `output_dir_name` is a string, naming the output directory. Plots will be saved here as JPEG files.