Skip to content

x2sys_cross: Refactor to get rid of temporary files and have consistent table-like output behavior #3160

@seisman

Description

@seisman

Originally posted by @weiji14 in #2730 (comment)

I had a look at refactoring x2sys_cross to use virtualfiles instead of temporary files, but it's a little tricky because:

  • Input: Cannot pass in virtualfiles as input as mentioned at Wrap x2sys_init and x2sys_cross #546 (comment) and Passing in virtual files into the supplementary x2sys modules gmt#3717, since GMT doesn't support virtualfiles to X2SYS modules

  • Output: The virtualfile_to_dataset method from clib: Add virtualfile_to_dataset method for converting virtualfile to a dataset #3083 was able to produce a pandas.DataFrame output, but the column names were missing. The logic for handling x2sys_cross's output is actually complicated:

    # Read temporary csv output to a pandas table
    if outfile == tmpfile.name: # if outfile isn't set, return pd.DataFrame
    # Read the tab-separated ASCII table
    date_format_kwarg = (
    {"date_format": "ISO8601"}
    if Version(pd.__version__) >= Version("2.0.0")
    else {}
    )
    table = pd.read_csv(
    tmpfile.name,
    sep="\t",
    header=2, # Column names are on 2nd row
    comment=">", # Skip the 3rd row with a ">"
    parse_dates=[2, 3], # Datetimes on 3rd and 4th column
    **date_format_kwarg, # Parse dates in ISO8601 format on pandas>=2
    )
    # Remove the "# " from "# x" in the first column
    table = table.rename(columns={table.columns[0]: table.columns[0][2:]})
    elif outfile != tmpfile.name: # if outfile is set, output in outfile only
    table = None

Important things to handle are:

  1. Datetime columns need to be parsed correctly as datetime64 dtype
  2. x2sys_cross may output multi-segment parts (see https://docs.generic-mapping-tools.org/6.5/supplements/x2sys/x2sys_cross.html#remarks) when multiple tracks are passed in and -Qe (external COEs) is selected. Unsure how this is handled in GMT virtualfiles (note that we actually just merge all the multi-segments into one table when pandas.DataFrame output is selected, output to file will preserve the segments though).
  3. Last two column names can either be z_X/z_M or z_1/z_2 depending on whether trackvalues/-Z argument is set.

It should be possible to handle 1 and 3 somehow, but I'm not so sure about 2 since it will involve checking how GMT outputs virtualfiles in x2sys_cross. We'll need to do some careful checking to ensure the refactoring doesn't modify the output and makes it incorrect.

Metadata

Metadata

Assignees

No one assigned

    Labels

    maintenanceBoring but important stuff for the core devs

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions