Description
Originally posted by @weiji14 in #2730 (comment)
I had a look at refactoring x2sys_cross to use virtualfiles instead of temporary files, but it's a little tricky because:
- Input: Cannot pass in virtualfiles as input, as mentioned at "Wrap x2sys_init and x2sys_cross" #546 (comment) and "Passing in virtual files into the supplementary x2sys modules" gmt#3717, since GMT doesn't support passing virtualfiles to X2SYS modules.
- Output: The `virtualfile_to_dataset` method from "clib: Add virtualfile_to_dataset method for converting virtualfile to a dataset" #3083 was able to produce a `pandas.DataFrame` output, but the column names were missing. The logic for handling `x2sys_cross`'s output is actually complicated:

pygmt/pygmt/src/x2sys_cross.py
Lines 231 to 250 in bcbbcad

    # Read temporary csv output to a pandas table
    if outfile == tmpfile.name:  # if outfile isn't set, return pd.DataFrame
        # Read the tab-separated ASCII table
        date_format_kwarg = (
            {"date_format": "ISO8601"}
            if Version(pd.__version__) >= Version("2.0.0")
            else {}
        )
        table = pd.read_csv(
            tmpfile.name,
            sep="\t",
            header=2,  # Column names are on 2nd row
            comment=">",  # Skip the 3rd row with a ">"
            parse_dates=[2, 3],  # Datetimes on 3rd and 4th column
            **date_format_kwarg,  # Parse dates in ISO8601 format on pandas>=2
        )
        # Remove the "# " from "# x" in the first column
        table = table.rename(columns={table.columns[0]: table.columns[0][2:]})
    elif outfile != tmpfile.name:  # if outfile is set, output in outfile only
        table = None
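To make the parsing steps in that snippet concrete, here is a minimal stdlib-only sketch of the same idea: take the column names from the third line (dropping the leading `# `), skip the `>` segment lines so all segments merge into one table, and parse the 3rd and 4th columns as datetimes. The sample text, its header lines, and the `parse_crossover_text` helper are all made up for illustration; this is not PyGMT's implementation and the real `x2sys_cross` output has more columns.

```python
import csv
from datetime import datetime

# Made-up sample text; header lines and values are illustrative assumptions only.
SAMPLE = (
    "# Tag: X2SYS_EXAMPLE\n"
    "# Command: (omitted)\n"
    "# x\ty\tt_1\tt_2\tz_X\tz_M\n"
    "> trackA 0 trackB 0\n"
    "1.0\t2.0\t2020-01-01T00:00:00\t2020-01-02T00:00:00\t0.5\t10.0\n"
    "> trackA 0 trackC 0\n"
    "3.0\t4.0\t2020-01-03T00:00:00\t2020-01-04T12:30:00\t-0.25\t11.0\n"
)


def parse_crossover_text(text):
    """Hypothetical helper: return (column_names, rows) with segments merged."""
    lines = text.splitlines()
    # The third line holds the column names; strip the leading "# "
    names = lines[2][2:].split("\t")
    # Drop empty lines and ">" segment headers, which merges all segments
    data_lines = [ln for ln in lines[3:] if ln and not ln.startswith(">")]
    rows = []
    for fields in csv.reader(data_lines, delimiter="\t"):
        row = []
        for idx, value in enumerate(fields):
            if idx in (2, 3):  # 3rd and 4th columns are ISO 8601 datetimes
                row.append(datetime.fromisoformat(value))
            else:
                row.append(float(value))
        rows.append(row)
    return names, rows


names, rows = parse_crossover_text(SAMPLE)
```

A virtualfile-based version would need to reproduce each of these three behaviours (names, merged segments, datetime parsing) without the text round trip.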
Important things to handle are:
1. Datetime columns need to be parsed correctly as `datetime64` dtype.
2. `x2sys_cross` may output multi-segment parts (see https://docs.generic-mapping-tools.org/6.5/supplements/x2sys/x2sys_cross.html#remarks) when multiple tracks are passed in and `-Qe` (external COEs) is selected. Unsure how this is handled in GMT virtualfiles (note that we actually just merge all the multi-segments into one table when `pandas.DataFrame` output is selected, output to file will preserve the segments though).
3. Last two column names can either be `z_X`/`z_M` or `z_1`/`z_2` depending on whether the `trackvalues`/`-Z` argument is set.
It should be possible to handle 1 and 3 somehow, but I'm not so sure about 2 since it will involve checking how GMT outputs virtualfiles in x2sys_cross. We'll need to do some careful checking to ensure the refactoring doesn't modify the output and makes it incorrect.
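As a rough sketch of how the missing column names could be supplied by hand, the helper below builds the expected names from the `trackvalues` setting. Everything here is an assumption for illustration: the function name `expected_column_names` is hypothetical, the ten leading column names follow my reading of the GMT `x2sys_cross` documentation for time-based tracks, and I am assuming that `-Z` is what switches the last two columns from `z_X`/`z_M` to `z_1`/`z_2` for a single data column named `z`.

```python
def expected_column_names(trackvalues=False, data_column="z"):
    """Hypothetical helper: guess the column names of x2sys_cross output.

    Assumes time-based tracks and a single data column; not a PyGMT API.
    """
    fixed = [
        "x", "y",
        "t_1", "t_2",        # assumed datetime columns (3rd and 4th)
        "dist_1", "dist_2",
        "head_1", "head_2",
        "vel_1", "vel_2",
    ]
    if trackvalues:
        # Assumed: with -Z, the value on each track is reported
        last_two = [f"{data_column}_1", f"{data_column}_2"]
    else:
        # Assumed default: crossover value and mean value
        last_two = [f"{data_column}_X", f"{data_column}_M"]
    return fixed + last_two


default_names = expected_column_names()
trackvalue_names = expected_column_names(trackvalues=True)
```

The resulting list could then be assigned to a nameless table, with the two time columns converted to `datetime64` afterwards, but whether the column layout holds for every `x2sys_cross` option combination would need checking against real output.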