Fix compatibility with Julia 1.13+ memhash removal #1170

vtjnash · 2025-10-03T18:54:56Z

Remove hash method definition when Base.memhash is not available.
On Julia 1.13+, these AbstractString types will use the default
AbstractString hash implementation which is now efficient and
zero-copy based on codeunit/iterate.

For Julia <1.13, continue using the memhash-based implementation
for compatibility.

Related to JuliaLang/julia#59697
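
For readers unfamiliar with the pattern, here is a minimal sketch of gating a `hash` method on whether `Base.memhash` exists. It is illustrative only: `ByteString` is a hypothetical ASCII-only type, not one of this package's string types, and the `ccall` form follows my understanding of how older Julia versions hashed raw string bytes, not the exact diff in this PR.

```julia
# Hypothetical AbstractString subtype backed by raw bytes (ASCII-only assumption).
struct ByteString <: AbstractString
    data::Vector{UInt8}
end

# Minimal AbstractString interface so the generic fallbacks work.
Base.ncodeunits(s::ByteString) = length(s.data)
Base.codeunit(::ByteString) = UInt8
Base.codeunit(s::ByteString, i::Integer) = s.data[i]
Base.isvalid(s::ByteString, i::Integer) = 1 <= i <= length(s.data)
Base.iterate(s::ByteString, i::Int=1) =
    i > length(s.data) ? nothing : (Char(s.data[i]), i + 1)

@static if isdefined(Base, :memhash)
    # Older Julia: hash the bytes directly without first copying into a String.
    function Base.hash(s::ByteString, h::UInt)
        h += Base.memhash_seed
        ccall(Base.memhash, UInt, (Ptr{UInt8}, Csize_t, UInt32),
              s.data, length(s.data), h % UInt32) + h
    end
end
# Otherwise no method is defined and the generic AbstractString hash applies.

a = ByteString(Vector{UInt8}("hello"))
b = ByteString(Vector{UInt8}("hello"))
println(hash(a) == hash(b))
```

Either branch should give equal hashes for equal contents; the sketch makes no claim that the value matches `hash("hello")` on every Julia version.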

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Improve CSV.Rows performance

* Added some code comments to help clarify things
* Update out-dated variable name usage (e.g. tapes)
* Cleaned up dependencies (WeakRefStrings is test-only now)
* Added lots of tests to increase coverage
* Made multithreaded chunk identification more robust by checking we have correct # of columns for 5 consecutive rows instead of just 1
* Made sure we sync Int64 sentinels in multithreaded parsing
* Removed some unused functions
* Made sure we're testing type promoting when multithreaded parsing
* Add a `tasks::Integer` keyword argument to allow controlling how many tasks will be spawned for multithreaded parsing
* Clean up keyword arg docs
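
The "5 consecutive rows" check in the list above can be sketched roughly as follows. This is a simplified illustration, not the package's implementation: `looks_like_row_start` is a hypothetical helper, and it ignores quoted fields, escaped delimiters, and `\r\n` line endings.

```julia
# Hypothetical helper: does `pos` look like the start of a row? Check that the
# next `nrows` lines each contain exactly `ncols` delimiter-separated fields.
function looks_like_row_start(buf::Vector{UInt8}, pos::Int, ncols::Int;
                              delim::UInt8=UInt8(','), nrows::Int=5)
    for _ in 1:nrows
        pos > length(buf) && return true  # out of data: nothing contradicts the guess
        cols = 1
        while pos <= length(buf) && buf[pos] != UInt8('\n')
            buf[pos] == delim && (cols += 1)
            pos += 1
        end
        cols == ncols || return false
        pos += 1  # step past the newline
    end
    return true
end

buf = Vector{UInt8}("1,2,3\n4,5,6\n7,8,9\n10,11,12\n13,14,15\n16,17,18\n")
println(looks_like_row_start(buf, 1, 3))  # candidate at a real row boundary
println(looks_like_row_start(buf, 3, 3))  # candidate in the middle of a row
```

Requiring several consecutive rows to agree makes it less likely that a position inside a multi-line field is mistaken for a row boundary.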

Lots of cleanup

Add CSV.Chunks for iterating over chunks of large files

Fixes JuliaData#464 (or at least improves it quite a bit). A new precompile.jl file is a script I ran to get some precompile statements for CSV.jl, Parsers.jl, and SentinelArrays.jl (which seem to be the biggest targets and ones that live in JuliaData). The Parsers.jl output ended up being insignificant for now, so that wasn't committed, but SentinelArrays.jl was. I've added two new "precompile.csv" and "precompile_small.csv" files that are used for snooping; they include a column of each type, which should hopefully cover a good chunk of the codepaths we're compiling. We can adjust more later if there are certain paths that could use it and are causing people problems. All in all, this cuts TTFP (time-to-first-parse) in half on my machine.

Add some precompiles

Fixes JuliaData#668. The issue here is that when column names were passed manually, the code path that "skipped" to the datarow passed in the starting position as 1 instead of the `pos` variable. This used to not be an issue because `pos` was almost always 1 anyway. With `IOBuffer`, we now start `pos` at `io.ptr`, so we'll have more cases where it's critical to start reading at the right position.

When column names passed manually, ensure we respect starting position

I've wanted to do this for a while; previously we were only using the estimate from the first 10 rows. This hooks into the "chunking" code, which looks at `tasks` # of chunks of a file to find the start of rows for each; we now keep track of the # of bytes we saw when doing those row checks and use those totals plus the original 10 rows to form a better estimate of the total # of rows.
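
As a rough illustration of the arithmetic described above (not the actual code): pool the bytes and rows seen during the initial sample and the per-chunk row checks, then scale by the file size. `estimate_rows` and all the numbers below are hypothetical, and 5 checked rows per chunk is an assumption made for the example.

```julia
# Hypothetical helper: estimate the total row count from a pooled sample.
function estimate_rows(total_bytes::Integer, sampled_bytes::Integer, sampled_rows::Integer)
    (sampled_rows > 0 && sampled_bytes > 0) || return 0
    bytes_per_row = sampled_bytes / sampled_rows
    return ceil(Int, total_bytes / bytes_per_row)
end

# Made-up numbers: the first 10 rows took 400 bytes, and 4 chunks each
# checked 5 rows, consuming 230, 260, 250, and 260 bytes respectively.
sampled_bytes = 400 + sum((230, 260, 250, 260))
sampled_rows = 10 + 4 * 5
println(estimate_rows(1_000_000, sampled_bytes, sampled_rows))
```

Sampling from several places in the file helps when row widths drift, which a first-10-rows estimate cannot see.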

Improve accuracy of estimated rows for multithreaded parsing

Make the automatic pooled=>string column promotion more efficient

…r invalid rows

added documentation for the dateformats option

Fixes JuliaData#679; alternative fix to JuliaData#681. When a column is dropped, we essentially turn it into a `Missing` column type and ignore it when parsing. There was a check later in file parsing, however, that said if no missing values were found in a column, to ensure its type is `Vector{T}` instead of `Vector{Union{Missing, T}}`. The core problem in issue JuliaData#679 was that these dropped columns, while completely `missing`, didn't get "flagged" as having `missing` values.

Fixes JuliaData#680. Before custom types, the `typemap` keyword argument was really only about mapping between the standard, supported types. With custom types, we still only support certain type mappings (Int to Float, any type to String), but we also want to support type mappings like `Int64 => Int32`. This PR readjusts how typemap works when detecting column types to account for the possibility of custom Integer or AbstractFloat type mappings for Int64 & Float64, and to move directly to String if that's specified.

Ensure dropped columns are ignored in later file processing

* support groupmark * add more documentation --------- Co-authored-by: Lilith Hafner <Lilith.Hafner@gmail.com>

Ref: JuliaData#1093 (comment)

JuliaData#1123)

…liaData#1126)

* support for IOBuffer containing memory * fix errors caught in tests * use Base.wrap if available --------- Co-authored-by: Viral B. Shah <ViralBShah@users.noreply.github.com>

* fix breakage caused by JuliaLang/julia/pull/53896 * make __wrap compatible with 1.11 RC

* Update to 0.10.14 * Update julia setup action * Add dependabot

* finalize memory if 1.11 * don't download busybox in test * test windows on lts

* fix decchar handling in writecell() for AbstractFloat * test for JuliaData#1109 fix decchar handling in writecell() for AbstractFloat * fix newline format --------- Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>

* fix INT128_MIN write * add write INT128_MIN test * use Base.uabs Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com> * add BigInt test --------- Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>

* Bump codecov/codecov-action from 4 to 5

  Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.
  - [Release notes](https://github.com/codecov/codecov-action/releases)
  - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
  - [Commits](codecov/codecov-action@v4...v5)

  ---
  updated-dependencies:
  - dependency-name: codecov/codecov-action
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...

  Signed-off-by: dependabot[bot] <support@github.com>

* Update .github/workflows/ci.yml

  Co-authored-by: Chengyu Han <cyhan.dev@outlook.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>
Co-authored-by: Chengyu Han <cyhan.dev@outlook.com>


codecov · 2025-10-03T19:00:45Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.55%. Comparing base (04ec1cf) to head (f346ccf).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1170   +/-   ##
=======================================
  Coverage   90.55%   90.55%           
=======================================
  Files           9        9           
  Lines        2319     2319           
=======================================
  Hits         2100     2100           
  Misses        219      219


quinnj and others added 30 commits June 26, 2020 14:02

Fix docs

2924126

Merge pull request JuliaData#663 from JuliaData/jq/rowperf

3d2ad1f

Improve CSV.Rows performance

Merge pull request JuliaData#664 from JuliaData/jq/cleanup

0075065

Lots of cleanup

Quick fix for testing file

d74c2fc

Add CSV.Chunks for iterating over chunks of large files

c0426a8

fix 32-bit

27a4af3

fix windows

f5023fd

fix travis

ed3dab6

Merge pull request JuliaData#665 from JuliaData/jq/chunks

bd4c8f3

Add CSV.Chunks for iterating over chunks of large files

Merge pull request JuliaData#666 from JuliaData/jq/precompile

138e323

Add some precompiles

Add SentinelArrays compat

4671e57

Merge pull request JuliaData#671 from JuliaData/jq/668

da0a5dc

When column names passed manually, ensure we respect starting position

Bump version

e9651d9

Fix tests

5b06308

Merge pull request JuliaData#673 from JuliaData/jq/estrows

ed48ad2

Improve accuracy of estimated rows for multithreaded parsing

Make the automatic pooled=>string column promotion more efficient

b9cb5d4

remove debug

2933624

fix nightly test

88cf907

Merge pull request JuliaData#676 from JuliaData/jq/pool

e21beb3

Make the automatic pooled=>string column promotion more efficient

Fix JuliaData#678 by ensuring pooled columns get missing value set fo…

478670f

…r invalid rows

Bump version

131f233

added documentation for the dateformats option

b96868a

Merge pull request JuliaData#682 from kragol/document_dateformats

d4ce5b6

added documentation for the dateformats option

Merge pull request JuliaData#683 from JuliaData/jq/679

c53b274

Ensure dropped columns are ignored in later file processing

LilithHafner and others added 26 commits June 4, 2023 22:50

Fix typo in reading.md (JuliaData#1094)

631e456

Support groupmark (JuliaData#1093)

03c22d9

* support groupmark * add more documentation --------- Co-authored-by: Lilith Hafner <Lilith.Hafner@gmail.com>

Update Project.toml

2f0e4a5

Bump Parsers compat to 2.5 (JuliaData#1097)

07fb6c2

Ref: JuliaData#1093 (comment)

Selectively reduce multithreaded parsing @error (JuliaData#1099)

4e6a332

Fix multithreaded fail on trailing empty column (JuliaData#1098)

bbaeec0

doc(examples.md): fix extraneous ``` (JuliaData#1100)

058fa68

docs: fixing Example.md render with @ref => @id (JuliaData#1106)

c81a1af

Add zenodo badge to README; fixes JuliaData#1112

cb1b411

typos (JuliaData#1119)

c6efb45

Update Project.toml

849f17f

Update keyworddocs.jl for limit to remove use of deprecated "threaded" (

00f5510

JuliaData#1123)

Add compat to Documenter.jl, use warnonly = Documenter.except() (Ju…

66a3a65

…liaData#1126)

support for IOBuffer containing Memory (JuliaData#1125)

141e2e4

* support for IOBuffer containing memory * fix errors caught in tests * use Base.wrap if available --------- Co-authored-by: Viral B. Shah <ViralBShah@users.noreply.github.com>

Update Project.toml

ba1f4d2

Update ci.yml: Add mac aarch64 CI, codecov v4 (JuliaData#1127)

acd36a6

Fix breakage caused by JuliaLang/julia/pull/53896 (JuliaData#1133)

67424ce

* fix breakage caused by JuliaLang/julia/pull/53896 * make __wrap compatible with 1.11 RC

Update Project.toml to 0.10.14 (JuliaData#1134)

3d61294

* Update to 0.10.14 * Update julia setup action * Add dependabot

Fix CI badge in README

57eca79

Fix reading gzipped file in Julia 1.11 on Windows (JuliaData#1144)

80936af

* finalize memory if 1.11 * don't download busybox in test * test windows on lts

Bump version to 0.10.15

41a6875

fix INT128_MIN write (JuliaData#1152)

8207959

* fix INT128_MIN write * add write INT128_MIN test * use Base.uabs Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com> * add BigInt test --------- Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>

Update examples.md to use ZipArchives (JuliaData#1158)

04ec1cf

vtjnash mentioned this pull request Oct 3, 2025

remove Base.memhash global JuliaLang/julia#59697

Merged

t-bltg mentioned this pull request Jan 11, 2026

restructure repository #1176

Closed

quinnj force-pushed the main branch from 04ec1cf to 4f8c505 Compare January 12, 2026 15:48
