
Commit 7e89c9d

Merge pull request NOAA-ORR-ERD#55 from kthyng/agg_fix
Agg fix
2 parents 3b83918 + 162c2ed commit 7e89c9d

4 files changed, +21 -8 lines changed


docs/aggregations.md

Lines changed: 3 additions & 3 deletions
@@ -34,7 +34,7 @@ As of October 2022, developmental versions of LSOFS and LOOFS have started to re

 ##### General

-Usually, nowcast and forecast files are created four times a day, and output is hourly in individual files. So, each update generates 6 nowcast files and 48 forecast files (for a 48 hour forecast; the forecast length varies by model). The update cycle time will be the last model output timestep in the nowcast files and the first timestep in the forecast files.
+Usually, nowcast and forecast files are created four times a day, and output is hourly in individual files. (WCOFS is updated once a day which changes these details for that model.) So, each update generates 6 nowcast files and 48 forecast files (for a 48 hour forecast; the forecast length varies by model). The update cycle time will be the last model output timestep in the nowcast files and the first timestep in the forecast files.

 Example filenames from one update cycle (`20141027.t15z`):

@@ -103,11 +103,11 @@ The datetimes associated with a given NOAA OFS file is not obvious from the file

 #### General

-The ``n006`` file for timing cycle ``t00z`` is at midnight of the day listed in the filename. Files ``n000`` to ``n005`` for timing cycle ``t00z`` count backward in time from there. Forecast files do not have the 6 hour shift backward. The hour in the timing cycle should be added to this convention. Datetime translations are given in the following table for sample files.
+For most models, the ``n006`` file for timing cycle ``t00z`` is at midnight of the day listed in the filename. Files ``n000`` to ``n005`` for timing cycle ``t00z`` count backward in time from there. Forecast files do not have the 6 hour shift backward. The hour in the timing cycle should be added to this convention. Datetime translations are given in the following table for sample files. These are for a filename with the pattern `nos.MODELNAME.fields.[n|f]HHH.YYYYMMDD.tCCz.nc`. Note that the WCOFS model is distinct in that it updates only once a day and so has nowcast files up to ``n024`` and you need to subtract 24 hours from the formula instead of 6.

 The formula are:

-- Nowcast files: time shift from midnight on date listed = CC + HHH - 6
+- Nowcast files: time shift from midnight on date listed = CC + HHH - update period in hours (6 for most, 24 for WCOFS)
 - Forecast files: time shift from midnight on date listed = CC + HHH

 <details>
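
The nowcast and forecast formulas in the documentation change above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the package's `file2dt`: the function name `ofs_filename_to_datetime` and the regular expression are hypothetical, the filename is assumed to follow `nos.MODELNAME.fields.[n|f]HHH.YYYYMMDD.tCCz.nc`, and WCOFS is detected by a plain substring check as in the diff.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern for ".[n|f]HHH.YYYYMMDD.tCCz." inside an OFS filename.
_OFS_PATTERN = re.compile(
    r"\.(?P<kind>[nf])(?P<hhh>\d{3})\.(?P<date>\d{8})\.t(?P<cc>\d{2})z\."
)


def ofs_filename_to_datetime(filename: str) -> datetime:
    """Translate an OFS filename to the datetime of its model output.

    Illustrative helper (an assumption for this sketch), applying:
      nowcast:  shift = CC + HHH - update period (6 h, or 24 h for WCOFS)
      forecast: shift = CC + HHH
    where the shift is counted from midnight on the date in the filename.
    """
    match = _OFS_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"unrecognized OFS filename: {filename}")

    # Update period in hours: 24 for WCOFS, 6 for the other models.
    update_period = 24 if "wcofs" in filename else 6

    shift = int(match["cc"]) + int(match["hhh"])
    if match["kind"] == "n":
        # Nowcast files count backward from the update cycle time.
        shift -= update_period

    midnight = datetime.strptime(match["date"], "%Y%m%d")
    # The shift may be negative, which lands on the previous day.
    return midnight + timedelta(hours=shift)


# The last nowcast file and the first forecast file share a datetime.
print(ofs_filename_to_datetime("nos.creofs.fields.n006.20220920.t15z.nc"))  # 2022-09-20 15:00:00
print(ofs_filename_to_datetime("nos.creofs.fields.f000.20220920.t15z.nc"))  # 2022-09-20 15:00:00
```

Because `n006` and `f000` from the same cycle map to the same datetime, any aggregation over these files has to pick one of the two, which is what the sorting change in `filedates2df` addresses.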

docs/whats_new.rst

Lines changed: 5 additions & 0 deletions
@@ -1,6 +1,11 @@
 :mod:`What's New`
 ----------------------------

+v0.7.0 (March 17, 2023)
+=======================
+* The sorting in filedates2df was slightly wrong and would not consistently return the desired nowcast file over the forecast file. Seems to be correct now.
+* WCOFS has a different update frequency which leads to a different conversion from filename to datetime. This is now included in file2dt.
+
 v0.6.0 (February 17, 2023)
 ==========================
 * Updated docs.

model_catalogs/tests/test_catalogs.py

Lines changed: 1 addition & 1 deletion
@@ -535,7 +535,7 @@ def test_filedates2df():
         "nos.creofs.fields.n003.20220920.t15z.nc",
         "nos.creofs.fields.n004.20220920.t15z.nc",
         "nos.creofs.fields.n005.20220920.t15z.nc",
-        "nos.creofs.fields.f000.20220920.t15z.nc",
+        "nos.creofs.fields.n006.20220920.t15z.nc",
         "nos.creofs.fields.f001.20220920.t15z.nc",
         "nos.creofs.fields.f002.20220920.t15z.nc",
         "nos.creofs.fields.f003.20220920.t15z.nc",

model_catalogs/utils.py

Lines changed: 12 additions & 4 deletions
@@ -103,16 +103,22 @@ def file2dt(filename):
     # Main style of NOAA OFS files, 1 file per time step
     elif fnmatch.fnmatch(filename, "*.n???.*") or fnmatch.fnmatch(filename, "*.f???.*"):

+        # number of hours in repeat cycle for forecast
+        if "wcofs" in filename:
+            tshift = 24
+        else:
+            tshift = 6
+
         # pull hours from filename
         regex = re.compile(".[n,f][0-9]{3}.")
         hour = int(regex.findall(filename)[0][2:-1])

         # calculate hours
         dt = cycle + hour

-        # if nowcast file, subtract 6 hours
+        # if nowcast file, subtract dt hours
         if fnmatch.fnmatch(filename, "*.n???.*"):
-            dt -= 6
+            dt -= tshift

         # construct datetime. dt might be negative.
         date += pd.Timedelta(f"{dt} hours")
@@ -323,10 +329,12 @@ def filedates2df(filelocs):
     df = pd.DataFrame(index=filedates, data={"filenames": filenames})

     # Sort resulting df by filenames and then by index which is the datetime of each file
-    df = df.sort_values(axis="index", by="filenames").sort_index()
+    df = df.reset_index().sort_values(by=["index", "filenames"]).set_index("index")
+
+    # df = df.sort_values(axis="index", by="filenames").sort_index()

     # remove rows if index is duplicated, sorting makes it so nowcast files are kept
-    df = df[~df.index.duplicated(keep="first")]
+    df = df[~df.index.duplicated(keep="last")]

     return df
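
The effect of the new sort and de-duplication in `filedates2df` can be reproduced on a small hand-built frame. The sketch below is an illustration, not the package's test: the three filenames and their datetimes are typed in by hand (assuming the `t15z` cycle translation from the documentation), and only the two changed lines from the diff are applied. Sorting on the datetime and the filename together places `n006` after `f000` within a shared timestamp, since `"f"` sorts before `"n"`, so `keep="last"` retains the nowcast file.

```python
import pandas as pd

# Hand-built example: f000 and n006 from the same cycle share a datetime.
filenames = [
    "nos.creofs.fields.f000.20220920.t15z.nc",
    "nos.creofs.fields.n006.20220920.t15z.nc",
    "nos.creofs.fields.f001.20220920.t15z.nc",
]
filedates = pd.to_datetime(
    ["2022-09-20 15:00", "2022-09-20 15:00", "2022-09-20 16:00"]
)

df = pd.DataFrame(index=filedates, data={"filenames": filenames})

# Sort on datetime and filename in one pass; the unnamed index becomes
# a column called "index" after reset_index().
df = df.reset_index().sort_values(by=["index", "filenames"]).set_index("index")

# Within a duplicated datetime the nowcast file now sorts last, so keep it.
df = df[~df.index.duplicated(keep="last")]

print(df["filenames"].tolist())
```

One plausible reason the earlier chained `sort_values(...).sort_index()` was inconsistent is that `sort_index` does not use a stable sort by default, so the order among rows with equal datetimes was not guaranteed; this explanation is an assumption and is not stated in the commit.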
