Implement support for reading CSV files with charsets other than UTF-8#9468

Open
Rafferty97 wants to merge 8 commits into apache:main from Rafferty97:non-utf8-csv2

Rafferty97 (Contributor) commented Feb 23, 2026

Implement support for reading CSV files with charsets other than UTF-8, via an optional dependency on encoding_rs and a corresponding configuration option.

Which issue does this PR close?

Closes #9465

What changes are included in this PR?

  • Add optional dependency on encoding_rs
  • Add a configuration option called "encoding" to the CSV reader
  • When an encoding is set, input data is pre-processed before being handed to csv-core
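
The pre-processing step in the list above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: the PR relies on encoding_rs and csv-core, whereas this example uses ISO-8859-1 as a stand-in encoding (each byte maps directly to the Unicode code point of the same value) and a naive comma split as a stand-in parser. The names `latin1_to_utf8` and `split_record` are hypothetical.

```rust
/// Hypothetical helper: transcode ISO-8859-1 bytes to a UTF-8 `String`.
/// Every ISO-8859-1 byte maps to the Unicode code point with the same value,
/// so no lookup table is needed (unlike windows-1252 or Shift-JIS).
fn latin1_to_utf8(input: &[u8]) -> String {
    input.iter().map(|&b| b as char).collect()
}

/// Stand-in for the CSV parser: splits one record on commas.
/// Quoting and escaping are deliberately ignored in this sketch.
fn split_record(line: &str) -> Vec<&str> {
    line.split(',').collect()
}

fn main() {
    // "café,naïve" encoded as ISO-8859-1: é = 0xE9, ï = 0xEF.
    let raw: &[u8] = b"caf\xE9,na\xEFve";

    // The raw bytes are not valid UTF-8, so a UTF-8-only parser
    // cannot treat them as text directly.
    assert!(std::str::from_utf8(raw).is_err());

    // Pre-process: transcode to UTF-8 first, then hand the result to the parser.
    let utf8 = latin1_to_utf8(raw);
    let fields = split_record(&utf8);
    assert_eq!(fields, vec!["café", "naïve"]);
    println!("{:?}", fields);
}
```

The point of the sketch is the ordering: decoding happens before parsing, so the parser only ever sees UTF-8.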

Are these changes tested?

I have written tests that exercise the decoder on windows-1252 and Shift-JIS encoded CSV files, with various batch and buffer sizes to ensure that the various buffering mechanisms are working. I'm fairly confident in the test coverage, but open to suggestions for making the tests more resilient.
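
As a rough illustration of why varying the buffer size matters, the sketch below feeds the same encoded input to a stateful decoder in every possible chunk size and checks that the output never changes. It is an assumption-laden stand-in, not the PR's test suite: it uses UTF-16LE instead of windows-1252 or Shift-JIS because UTF-16LE can be decoded with the standard library alone, and `Utf16LeDecoder` and `decode_in_chunks` are hypothetical names.

```rust
/// Hypothetical streaming transcoder: UTF-16LE bytes in, UTF-8 text out.
/// It keeps state between calls so that a code unit or a surrogate pair
/// split across two chunks is still decoded correctly.
#[derive(Default)]
struct Utf16LeDecoder {
    pending_byte: Option<u8>,  // first half of a code unit cut by a chunk boundary
    pending_high: Option<u16>, // high surrogate still waiting for its low half
}

impl Utf16LeDecoder {
    fn feed(&mut self, chunk: &[u8], out: &mut String) {
        for &byte in chunk {
            match self.pending_byte.take() {
                None => self.pending_byte = Some(byte),
                Some(low) => {
                    let unit = u16::from_le_bytes([low, byte]);
                    self.push_unit(unit, out);
                }
            }
        }
    }

    fn push_unit(&mut self, unit: u16, out: &mut String) {
        if let Some(high) = self.pending_high.take() {
            if (0xDC00..=0xDFFF).contains(&unit) {
                // Complete surrogate pair: decode both units together.
                for decoded in char::decode_utf16([high, unit]) {
                    out.push(decoded.unwrap_or('\u{FFFD}'));
                }
                return;
            }
            // Unpaired high surrogate: emit a replacement character,
            // then fall through and handle the current unit on its own.
            out.push('\u{FFFD}');
        }
        if (0xD800..=0xDBFF).contains(&unit) {
            self.pending_high = Some(unit);
        } else {
            // `from_u32` rejects a lone low surrogate, which becomes U+FFFD.
            out.push(char::from_u32(unit as u32).unwrap_or('\u{FFFD}'));
        }
    }

    /// Flush at end of input: leftover partial data is malformed.
    fn finish(self, out: &mut String) {
        if self.pending_byte.is_some() || self.pending_high.is_some() {
            out.push('\u{FFFD}');
        }
    }
}

/// Decode `bytes` by feeding them to the decoder `chunk_size` bytes at a time.
fn decode_in_chunks(bytes: &[u8], chunk_size: usize) -> String {
    let mut decoder = Utf16LeDecoder::default();
    let mut out = String::new();
    for chunk in bytes.chunks(chunk_size) {
        decoder.feed(chunk, &mut out);
    }
    decoder.finish(&mut out);
    out
}

fn main() {
    let text = "id,name\n1,日本語\n2,😀\n";
    let bytes: Vec<u8> = text.encode_utf16().flat_map(|u| u.to_le_bytes()).collect();

    // Every chunk size, including ones that split code units and
    // surrogate pairs, must produce the same text.
    for chunk_size in 1..=bytes.len() {
        assert_eq!(decode_in_chunks(&bytes, chunk_size), text);
    }
    println!("all chunk sizes round-trip");
}
```

Sweeping the chunk size from one byte up to the full input forces every possible split point, which is one way to gain confidence that no state is lost at buffer boundaries.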

Are there any user-facing changes?

The public API is only changed when the new optional feature is enabled, and even then, it's just a new optional configuration parameter.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 23, 2026
@Rafferty97 Rafferty97 marked this pull request as ready for review February 24, 2026 14:25

Development

Successfully merging this pull request may close these issues.

Support CSV files encoded with charsets other than UTF-8