[C++] Add a strptime option to control the cutoff between 1900 and 2000 when %y  #31951

Description

@asfimport

When parsing a string with a two-digit year (%y) to a datetime, it would be great to have control over the cutoff point between 1900 and 2000. Currently it is implicitly set to 68, so two-digit years up to and including 68 are parsed as 20xx, and 69 and above as 19xx:

library(arrow, warn.conflicts = FALSE)

a <- Array$create(c("68-05-17", "69-05-17"))
call_function("strptime", a, options = list(format = "%y-%m-%d", unit = 0L))
#> Array
#> <timestamp[s]>
#> [
#>   2068-05-17 00:00:00,
#>   1969-05-17 00:00:00
#> ]

For example, lubridate calls this argument cutoff_2000 (e.g. in fast_strptime()). It works as follows:

library(lubridate, warn.conflicts = FALSE)

dates_vector <- c("68-05-17", "69-05-17", "55-05-17")
fast_strptime(dates_vector, format = "%y-%m-%d")
#> [1] "2068-05-17 UTC" "1969-05-17 UTC" "2055-05-17 UTC"
fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 50)
#> [1] "1968-05-17 UTC" "1969-05-17 UTC" "1955-05-17 UTC"
fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 70)
#> [1] "2068-05-17 UTC" "2069-05-17 UTC" "2055-05-17 UTC"

In the lubridate::fast_strptime() documentation it is described as follows:

cutoff_2000
integer. For y format, two-digit numbers smaller or equal to cutoff_2000 are parsed as though starting with 20, otherwise parsed as though starting with 19. Available only for functions relying on lubridates internal parser.
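
The rule quoted above could be sketched in C++ roughly as follows, since the request targets the C++ kernel. This is only an illustration: `ExpandTwoDigitYear` and its `cutoff_2000` parameter are hypothetical names, not an existing Arrow API, and the default of 68 is assumed here to mirror the current implicit behaviour shown earlier.

```cpp
#include <cassert>

// Hypothetical helper (not part of Arrow): expand a two-digit year using a
// lubridate-style cutoff. Values <= cutoff_2000 are read as 20yy, all other
// values as 19yy. The default of 68 is an assumption matching the current
// implicit behaviour described in this issue.
int ExpandTwoDigitYear(int yy, int cutoff_2000 = 68) {
  assert(yy >= 0 && yy <= 99);  // %y only ever yields 00..99
  return (yy <= cutoff_2000 ? 2000 : 1900) + yy;
}
```

With the default cutoff, 68 expands to 2068 and 69 to 1969. With a cutoff of 50, the value 55 expands to 1955, and with a cutoff of 70, the value 69 expands to 2069, matching the lubridate output above.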

Reporter: Dragoș Moldovan-Grünfeld / @dragosmg

Note: This issue was originally created as ARROW-16596. Please see the migration documentation for further details.
