Skip to content

[Python] Specify behavior for converting tz-aware datetime.datetime objects to Arrow format #17080

Description

@asfimport

The original description is at pandas-dev/pandas#32587

Code Sample, a copy-pastable example if possible

import pandas as pd
from datetime import datetime, timezone

df = pd.DataFrame.from_records([
    (1, datetime.now().replace(tzinfo=timezone.utc)),
    (2, datetime.now().replace(tzinfo=timezone.min))],
    columns=["1", "2"])

print(df["2"])
print()

df.to_feather("/tmp/1") 
df2 = pd.read_feather("/tmp/1")

print(df2["2"])

This code will output:


0    2020-03-10 18:13:49.405598+00:00
1    2020-03-10 18:13:49.405626-23:59
Name: 2, dtype: object

0   2020-03-10 18:13:49.405598
1   2020-03-10 18:13:49.405626
Name: 2, dtype: datetime64[ns]

Problem description

The round-trip dtype changed from the correct object to incorrect datetime64. Thus the timezones were discarded in Arrow and the timestamps became invalid.

Expected Output

(identical)


0    2020-03-10 18:13:49.405598+00:00
1    2020-03-10 18:13:49.405626-23:59
Name: 2, dtype: object

0    2020-03-10 18:13:49.405598+00:00
1    2020-03-10 18:13:49.405626-23:59
Name: 2, dtype: object

Output of pd.show_versions()


INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.3.0-40-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.1
numpy            : 1.17.4
pytz             : 2019.2
dateutil         : 2.7.3
pip              : 19.3.1
setuptools       : 42.0.1
Cython           : 0.29.14
pytest           : 5.3.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.5.0
html5lib         : None
pymysql          : None
psycopg2         : 2.8.4 (dt dec pq3 ext lo64)
jinja2           : 2.10.3
IPython          : 7.10.0
pandas_datareader: None
bs4              : 4.8.1
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.1.2
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.16.0
pytables         : None
pytest           : 5.3.1
pyxlsb           : None
s3fs             : None
scipy            : 1.2.1
sqlalchemy       : 1.3.12
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

Reporter: Markovtsev Vadim

Related issues:

Note: This issue was originally created as ARROW-8066. Please see the migration documentation for further details.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions