@vtjnash vtjnash commented Oct 3, 2025

Remove hash method definition when Base.memhash is not available.
On Julia 1.13+, these AbstractString types will use the default
AbstractString hash implementation which is now efficient and
zero-copy based on codeunit/iterate.

For Julia <1.13, continue using the memhash-based implementation
for compatibility.
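
A minimal sketch of the gating described above, not the code from this PR. `DemoString` is a hypothetical type invented for illustration and assumes ASCII-only content; the `ccall` mirrors the pattern `Base` used for `String` hashing on older Julia versions, which is an assumption about what "memhash-based" refers to here.

```julia
# Hypothetical string type for illustration only; not a type from this package.
struct DemoString <: AbstractString
    data::Vector{UInt8}
end

# Minimal AbstractString interface, assuming ASCII-only content for brevity.
Base.ncodeunits(s::DemoString) = length(s.data)
Base.codeunit(::DemoString) = UInt8
Base.codeunit(s::DemoString, i::Integer) = s.data[i]
Base.isvalid(s::DemoString, i::Int) = 1 <= i <= length(s.data)
function Base.iterate(s::DemoString, i::Int=1)
    i > length(s.data) && return nothing
    return Char(s.data[i]), i + 1
end

# Define a custom hash only where Base.memhash exists. Where it does not,
# no method is defined and the generic AbstractString hash is used instead.
@static if isdefined(Base, :memhash)
    function Base.hash(s::DemoString, h::UInt)
        h += Base.memhash_seed
        GC.@preserve s begin
            ccall(Base.memhash, UInt, (Ptr{UInt8}, Csize_t, UInt32),
                  pointer(s.data), length(s.data), h % UInt32) + h
        end
    end
end

a = DemoString(collect(codeunits("abc")))
b = DemoString(collect(codeunits("abc")))
@assert hash(a) == hash(b)  # equal contents hash equally on either branch
```

Checking `isdefined(Base, :memhash)` rather than comparing `VERSION` keeps the condition tied to the feature itself, which matches the wording "when Base.memhash is not available".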

Related to JuliaLang/julia#59697

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

quinnj and others added 30 commits June 26, 2020 14:02
* Added some code comments to help clarify things
* Update out-dated variable name usage (e.g. tapes)
* Cleaned up dependencies (WeakRefStrings is test-only now)
* Added lots of tests to increase coverage
* Made multithreaded chunk identification more robust by checking we
have correct # of columns for 5 consecutive rows instead of just 1
* Made sure we sync Int64 sentinels in multithreaded parsing
* Removed some unused functions
* Made sure we're testing type promoting when multithreaded parsing
* Add a `tasks::Integer` keyword argument to allow controlling how many
tasks will be spawned for multithreaded parsing
* Clean up keyword arg docs
Add CSV.Chunks for iterating over chunks of large files
Fixes JuliaData#464 (or at least improves it quite a bit).

A new precompile.jl file is a script I ran to get some precompile
statements for CSV.jl, Parsers.jl, and SentinelArrays.jl (which seem to
be the biggest targets and ones that live in JuliaData). The Parsers.jl
output ended up being insignificant for now, so that wasn't committed,
but SentinelArrays.jl was. I've added two new "precompile.csv" and
"precompile_small.csv" files that are used for snooping; they include a
column of each type, which should hopefully cover a good chunk of
codepaths we're compiling. We can adjust more later if there are certain
paths that could use it and are causing people problems.

All in all, this cuts TTFP (time-to-first-parse) in half on my machine.
Fixes JuliaData#668. The issue here is that when column names were passed
manually, the code path that "skipped" to the datarow passed in the
starting position as 1 instead of the `pos` variable. This used to not be
an issue because `pos` was almost always 1 anyway. With `IOBuffer`, we
now start `pos` at `io.ptr`, so we'll have more cases where it's
critical to start reading at the right position.
When column names passed manually, ensure we respect starting position
I've wanted to do this for a while; previously we were only using the
estimate from the first 10 rows. This hooks into the "chunking" code,
which looks at `tasks` # of chunks of a file to find the start of rows
for each; we now keep track of the # of bytes we saw when doing those
row checks and use those totals plus the original 10 rows to form a
better estimate of the total # of rows.
Improve accuracy of estimated rows for multithreaded parsing
Make the automatic pooled=>string column promotion more efficient
added documentation for the dateformats option
Fixes JuliaData#679; alternative fix to JuliaData#681. When a column is dropped, we
essentially turn it into a `Missing` column type and ignore it when
parsing. There was a check later in file parsing, however, that said if
no missing values were found in a column, to ensure its type is
`Vector{T}` instead of `Vector{Union{Missing, T}}`. The core problem in
issue JuliaData#679 was that these dropped columns, while completely `missing`,
didn't get "flagged" as having `missing` values.
Fixes JuliaData#680. Before custom types, the `typemap` keyword argument was
really only about mapping between the standard, supported types. With
custom types, we still only support certain type mappings (Int to Float,
Any type to String), but we also want to support type mappings like
`Int64 => Int32`. This PR readjusts how typemap works when detecting
column types to account for the possibility of custom Integer or
AbstractFloat type mappings for Int64 & Float64, and to move directly to
String if that's specified.
Ensure dropped columns are ignored in later file processing
LilithHafner and others added 26 commits June 4, 2023 22:50
* support groupmark

* add more documentation

---------

Co-authored-by: Lilith Hafner <Lilith.Hafner@gmail.com>
* support for IOBuffer containing memory

* fix errors caught in tests

* use Base.wrap if available

---------

Co-authored-by: Viral B. Shah <ViralBShah@users.noreply.github.com>
* fix breakage caused by JuliaLang/julia/pull/53896

* make __wrap compatible with 1.11 RC
* Update to 0.10.14

* Update julia setup action

* Add dependabot
* finalize memory if 1.11

* don't download busybox in test

* test windows on lts
* fix decchar handling in writecell() for AbstractFloat

* test for JuliaData#1109 fix decchar handling in writecell() for AbstractFloat

* fix newline format

---------

Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>
* fix INT128_MIN write

* add write INT128_MIN test

* use Base.uabs

Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>

* add BigInt test

---------

Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>
* Bump codecov/codecov-action from 4 to 5

Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v4...v5)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update .github/workflows/ci.yml

Co-authored-by: Chengyu Han <cyhan.dev@outlook.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>
Co-authored-by: Chengyu Han <cyhan.dev@outlook.com>
Remove hash method definition when Base.memhash is not available.
On Julia 1.13+, these AbstractString types will use the default
AbstractString hash implementation which is now efficient and
zero-copy based on codeunit/iterate.

For Julia <1.13, continue using the memhash-based implementation
for compatibility.

Related to JuliaLang/julia#59697

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
codecov bot commented Oct 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.55%. Comparing base (04ec1cf) to head (f346ccf).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1170   +/-   ##
=======================================
  Coverage   90.55%   90.55%           
=======================================
  Files           9        9           
  Lines        2319     2319           
=======================================
  Hits         2100     2100           
  Misses        219      219           
