CSV reader cannot parse dates or times

### Describe the bug, including details regarding any error messages, version, and platform.

I thought I would gather the same issues below..

Right now if you have any dates or times in a CSV file,  the CSV reader will only successfully parse them if they are in a ISO format like YYYY-MM-DD or HH:MM:SS. Alternative formats like DD-MM-YYYY or MM/DD/YYYY or -H:MM:SS will just fail..
There is a timestamp_parsers option in CSV.ConvertOptions, but those formats only work on timestamp[x] columns.

The error below is common because only YYYY-MM-DD formatted strings are convertable to dates.
**CSV conversion error to date32[day]: invalid value '01-20-2000'**
Adding "%m-%d-%Y %H:%M:%S" or "%m-%d-%Y" to timestamp_parsers doesn't do anything
since this is a date32[day] column.

https://github.com/apache/arrow/issues/26224
https://github.com/apache/arrow/issues/28303
https://github.com/apache/arrow/issues/33357
https://github.com/apache/arrow/issues/37180

I think the best solution is to add date_parsers and time_parsers options to CSV.ConvertOptions..

I have a current hack I implemented to be able to parse DATEs out of CSV files..
If the dataset schema being used to read a CSV file has any column data types that start with "DATE"..
Change the schema and replace any date columns with a timestamp type.
Include the alternative Date formats in timestamp_parsers.. i.e. "%d-%m-%Y", "%m/%d/%Y", etc..
Read the CSV file which will read the date string values in as timestamps..
Convert the timestamp[s] columns of the result back to date32/64[day] using pyarrow.compute.cast().

**Code to swap out date columns with timestamp columns in a schema for dataset api**
```
        new_fields = []
        for field in self.arrow_schema.names:
            new_field = self.arrow_schema.field(field)
            if str(new_field.type).startswith("date"):
                new_fields.append(pa.field(field, pa.timestamp("s")))
            else:
                new_fields.append(self.arrow_schema.field(field))
        new_schema = pa.schema(new_fields)

```

**Expression code to cast timestamp columns to dates when reading CSV files using dataset.to_table**
```
                # convert column list into column dict selection
                if isinstance(columns, List):
                    columns = {column: dataset.field(column) for column in columns}

                # cast timestamps to date32 or date64 in schema definition
                columns = {
                    column: (
                        dataset.field(column).cast(
                            str(self.arrow_schema.field(column).type)
                        )
                        if column in self.arrow_schema.names
                        and str(self.arrow_schema.field(column).type).startswith("date")
                        else expr
                    )
                    for column, expr in columns.items()
                }

```




### Component(s)

C++, Python

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CSV reader cannot parse dates or times #41488

Describe the bug, including details regarding any error messages, version, and platform.

Component(s)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

CSV reader cannot parse dates or times #41488

Description

Describe the bug, including details regarding any error messages, version, and platform.

Component(s)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions