Skip to content

Pandas dataframe datetime64 Handling #448

@auxten

Description

@auxten
Pandas Value chDB Display isNull() Issue
2024-01-01 1978-10-22 0 Date corrupted
pd.NaT 1970-01-01 0 Should be NULL

Date functions like toYear(), toMonth() return correct values,
but the displayed datetime is corrupted.

to check

import pandas as pd
import numpy as np
import chdb

print('=== Testing pandas datetime64 with chDB ===')
print()

# Create DataFrame with datetime and NaT
df = pd.DataFrame({
    'date_col': pd.to_datetime(['2024-01-01', pd.NaT, '2024-03-15', pd.NaT, '2024-05-20']),
    'name': ['A', 'B', 'C', 'D', 'E'],
})

print('Original pandas DataFrame:')
print(df)
print()
print('dtypes:')
print(df.dtypes)
print()
print('isna() result:')
print(df['date_col'].isna())
print()

# Query via chDB
conn = chdb.connect(':memory:')
__df__ = df

print('=== chDB Query Results ===')
print()

print('1. Basic SELECT:')
try:
    result = conn.query('SELECT * FROM Python(__df__)', 'DataFrame')
    print(result)
    print('dtypes:', result.dtypes.to_dict())
except Exception as e:
    print('Error:', e)
print()

print('2. isNull check:')
try:
    result = conn.query('SELECT date_col, isNull(date_col) as is_null FROM Python(__df__)', 'DataFrame')
    print(result)
except Exception as e:
    print('Error:', e)
print()

print('3. Date functions:')
try:
    result = conn.query('SELECT date_col, toYear(date_col) as year, toMonth(date_col) as month FROM Python(__df__)', 'DataFrame')
    print(result)
except Exception as e:
    print('Error:', e)
print()

print('4. ifNull for date:')
try:
    result = conn.query(\"SELECT date_col, ifNull(date_col, toDate('1970-01-01')) as filled FROM Python(__df__)\", 'DataFrame')
    print(result)
except Exception as e:
    print('Error:', e)
print()

print('5. Date comparison:')
try:
    result = conn.query(\"SELECT date_col, date_col > toDate('2024-02-01') as after_feb FROM Python(__df__)\", 'DataFrame')
    print(result)
except Exception as e:
    print('Error:', e)
print()

print('6. min/max aggregates:')
try:
    result = conn.query('SELECT min(date_col) as min_date, max(date_col) as max_date FROM Python(__df__)', 'DataFrame')
    print(result)
except Exception as e:
    print('Error:', e)

output

=== Testing pandas datetime64 with chDB ===

Original pandas DataFrame:
    date_col name
0 2024-01-01    A
1        NaT    B
2 2024-03-15    C
3        NaT    D
4 2024-05-20    E

dtypes:
date_col    datetime64[ns]
name                object
dtype: object

isna() result:
0    False
1     True
2    False
3     True
4    False
Name: date_col, dtype: bool

=== chDB Query Results ===

1. Basic SELECT:
                             date_col name
0 1978-10-22 15:35:12.388288512+07:30    A
1           1970-01-01 07:30:00+07:30    B
2 1832-12-17 10:00:29.217286144+06:55    C
3           1970-01-01 07:30:00+07:30    D
4 2255-11-03 07:31:56.654886912+08:00    E
dtypes: {'date_col': datetime64[ns, Asia/Singapore], 'name': dtype('O')}

2. isNull check:
                             date_col  is_null
0 1978-10-22 15:35:12.388288512+07:30        0
1           1970-01-01 07:30:00+07:30        0
2 1832-12-17 10:00:29.217286144+06:55        0
3           1970-01-01 07:30:00+07:30        0
4 2255-11-03 07:31:56.654886912+08:00        0

3. Date functions:
                             date_col  year  month
0 1978-10-22 15:35:12.388288512+07:30  2024      1
1           1970-01-01 07:30:00+07:30  1900      1
2 1832-12-17 10:00:29.217286144+06:55  2024      3
3           1970-01-01 07:30:00+07:30  1900      1
4 2255-11-03 07:31:56.654886912+08:00  2024      5

4. ifNull for date:
                             date_col                              filled
0 1978-10-22 15:35:12.388288512+07:30 1978-10-22 15:35:12.388288512+07:30
1           1970-01-01 07:30:00+07:30           1970-01-01 07:30:00+07:30
2 1832-12-17 10:00:29.217286144+06:55 1832-12-17 10:00:29.217286144+06:55
3           1970-01-01 07:30:00+07:30           1970-01-01 07:30:00+07:30
4 2255-11-03 07:31:56.654886912+08:00 2255-11-03 07:31:56.654886912+08:00

5. Date comparison:
                             date_col  after_feb
0 1978-10-22 15:35:12.388288512+07:30          0
1           1970-01-01 07:30:00+07:30          0
2 1832-12-17 10:00:29.217286144+06:55          1
3           1970-01-01 07:30:00+07:30          0
4 2255-11-03 07:31:56.654886912+08:00          1

6. min/max aggregates:
  min_date                  max_date
0      NaT 2024-05-20 08:00:00+08:00

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions