Describe the bug, including details regarding any error messages, version, and platform.
from datetime import datetime

import pyarrow as pa
import pyarrow.compute as pc
import duckdb
print(duckdb.sql(
"""
from values (timestamp '1970-01-01') df(a)
select time_bucket('3 years', "a", timestamp '1970-01-01')
"""
))
print(pc.floor_temporal(pa.array([datetime(1970, 1, 1)]), 3, 'year'))
Outputs:
┌────────────────────────────────────────────────────────────┐
│ time_bucket('3 years', a, CAST('1970-01-01' AS TIMESTAMP)) │
│ timestamp │
├────────────────────────────────────────────────────────────┤
│ 1970-01-01 00:00:00 │
└────────────────────────────────────────────────────────────┘
[
1968-01-01 00:00:00.000000
]
The DuckDB output differs from the PyArrow one. The PyArrow docs say:
"By default, the origin is 1970-01-01T00:00:00."
Given that, I would expect PyArrow to agree with DuckDB when DuckDB is given timestamp '1970-01-01' as the origin.
In fact, if I use 36, 'month', then PyArrow also returns '1970-01-01'. The fact that 3, 'year' differs from 3*12, 'month' suggests to me that there's a bug (arr below is the same single-element array as above):
In [6]: pc.floor_temporal(arr, 3, 'year')
Out[6]:
<pyarrow.lib.TimestampArray object at 0x7fe44180fca0>
[
1968-01-01 00:00:00.000000
]
In [7]: pc.floor_temporal(arr, 3*12, 'month')
Out[7]:
<pyarrow.lib.TimestampArray object at 0x7fe443a39540>
[
1970-01-01 00:00:00.000000
]
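One arithmetic that would reproduce both results, as a pure-Python sketch: this is only a guess that matches the outputs above, I have not confirmed it against the Arrow source, and the two helper names are hypothetical. It assumes the 'year' path floors the calendar year number itself to a multiple of 3, while the 'month' path floors the number of months elapsed since 1970-01.

```python
from datetime import datetime


def floor_year_by_calendar_number(ts, multiple):
    # Assumption: floor the calendar year number itself to a multiple,
    # which ignores the 1970 origin (1970 // 3 * 3 == 1968).
    return datetime((ts.year // multiple) * multiple, 1, 1)


def floor_month_since_epoch(ts, multiple):
    # Assumption: count whole months elapsed since 1970-01, floor that
    # count to a multiple, then convert back to a calendar date.
    months = (ts.year - 1970) * 12 + (ts.month - 1)
    floored = (months // multiple) * multiple
    return datetime(1970 + floored // 12, floored % 12 + 1, 1)


ts = datetime(1970, 1, 1)
print(floor_year_by_calendar_number(ts, 3))  # 1968-01-01 00:00:00
print(floor_month_since_epoch(ts, 36))       # 1970-01-01 00:00:00
```

If that is what happens, the two units are anchored differently (year 0 versus 1970-01), which would explain why 3, 'year' and 36, 'month' disagree.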
Component(s)
Python