Skip to content

ARROW-11243: [C++] Recognize time types in CSV files#10782

Closed
pitrou wants to merge 2 commits into
apache:masterfrom
pitrou:ARROW-11243-csv-times
Closed

ARROW-11243: [C++] Recognize time types in CSV files#10782
pitrou wants to merge 2 commits into
apache:masterfrom
pitrou:ARROW-11243-csv-times

Conversation

@pitrou

@pitrou pitrou commented Jul 22, 2021

Copy link
Copy Markdown
Member
  • Allow reading CSV columns as time32 and time64
  • Automatically infer "hh:mm" and "hh:mm:ss" as time32[s]

@pitrou pitrou requested review from bkietz and nealrichardson July 22, 2021 16:32
@github-actions

Copy link
Copy Markdown

@pitrou

pitrou commented Jul 22, 2021

Copy link
Copy Markdown
Member Author

@ursabot please benchmark

@ursabot

ursabot commented Jul 22, 2021

Copy link
Copy Markdown

Benchmark runs are scheduled for baseline = 169b249 and contender = 25b5c1e445f04b53a0beb40afabb3e8012c31f50. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2 (mimalloc)
[Finished ⬇️0.0% ⬆️1.53%] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.91% ⬆️0.1%] ursa-thinkcentre-m75q (mimalloc)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

@bkietz bkietz left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I'd just like the unit tests for inference to be expanded a bit

Comment thread cpp/src/arrow/csv/column_builder_test.cc Outdated

@nealrichardson nealrichardson left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me, just one question about time64

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there also be an inferring test that results in time64?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, there is no inference to time64. Should there be one (for times with nanosecond precision perhaps)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, I was looking at the wrong case statement, you're right. Maybe there should be one to catch any times with fractional seconds, though I don't feel strongly about it since you can explicitly declare the type/schema you want.

@pitrou pitrou force-pushed the ARROW-11243-csv-times branch from 25b5c1e to 301c1c1 Compare July 26, 2021 13:10
@pitrou

pitrou commented Jul 26, 2021

Copy link
Copy Markdown
Member Author

(rebased)

@westonpace

Copy link
Copy Markdown
Member

This could be a future JIRA or "wait until it's asked for" but technically ISO-8601 allows the omission of the colons if you add a leading T but I've never seen it in practice...

Thh
Thhmm
Thhmmss
Thhmmss.sss

pitrou added 2 commits July 27, 2021 09:57
* Allow reading CSV columns as time32 and time64
* Automatically infer "hh:mm" and "hh:mm:ss" as time32[s]
@pitrou pitrou force-pushed the ARROW-11243-csv-times branch from 301c1c1 to b103dd8 Compare July 27, 2021 08:04
@pitrou

pitrou commented Jul 27, 2021

Copy link
Copy Markdown
Member Author

@westonpace I'd rather wait for someone to request it.

@pitrou pitrou closed this in 2a2c330 Jul 27, 2021
@pitrou pitrou deleted the ARROW-11243-csv-times branch July 27, 2021 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants