Fix compatibility with Julia 1.13+ memhash removal #1170
Open: vtjnash wants to merge 780 commits into JuliaData:main from vtjnash:fix-memhash-compat
Improve CSV.Rows performance
* Added some code comments to help clarify things
* Updated outdated variable name usage (e.g. tapes)
* Cleaned up dependencies (WeakRefStrings is test-only now)
* Added lots of tests to increase coverage
* Made multithreaded chunk identification more robust by checking that we have the correct # of columns for 5 consecutive rows instead of just 1
* Made sure we sync Int64 sentinels in multithreaded parsing
* Removed some unused functions
* Made sure we test type promotion during multithreaded parsing
* Added a `tasks::Integer` keyword argument to control how many tasks are spawned for multithreaded parsing
* Cleaned up keyword arg docs
Lots of cleanup
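The 5-consecutive-rows check can be illustrated with a small Julia sketch. `find_row_start` is a hypothetical helper, not CSV.jl's implementation: it assumes `\n` line endings, a single-byte delimiter, and no quoted fields, and it rejects any candidate row start whose next `nrows` rows don't all have `ncols` fields.

```julia
# Hypothetical sketch (not CSV.jl's code): starting from a guessed byte offset
# `pos`, skip to the byte after the next newline and accept that row start only
# if the next `nrows` rows all have `ncols` delimited fields.
# Assumes '\n' line endings and no quoted fields.
function find_row_start(data::Vector{UInt8}, pos::Int, ncols::Int;
                        nrows::Int=5, delim::UInt8=UInt8(','))
    newline = UInt8('\n')
    while true
        nl = findnext(==(newline), data, pos)
        nl === nothing && return nothing          # no further candidate row starts
        start = nl + 1
        p = start
        ok = true
        for _ in 1:nrows
            if p > length(data)                   # ran out of rows to verify
                ok = false
                break
            end
            stop = something(findnext(==(newline), data, p), length(data) + 1)
            nfields = count(==(delim), view(data, p:stop-1)) + 1
            if nfields != ncols
                ok = false
                break
            end
            p = stop + 1
        end
        ok && return start
        pos = start                               # try the next candidate row
    end
end

data = Vector{UInt8}("a,b,c\n1,2,3\n4,5,6\n7,8,9\n")
find_row_start(data, 3, 3; nrows=3)  # 7, the first byte of "1,2,3"
```

With `nrows = 1`, a single line that happens to have the right field count is enough to accept a bad split point; requiring several consecutive matches makes that much less likely.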
Add CSV.Chunks for iterating over chunks of large files
Fixes JuliaData#464 (or at least improves it quite a bit). A new precompile.jl file is a script I ran to get some precompile statements for CSV.jl, Parsers.jl, and SentinelArrays.jl (which seem to be the biggest targets and the ones that live in JuliaData). The Parsers.jl output ended up being insignificant for now, so that wasn't committed, but the SentinelArrays.jl output was. I've added two new files, "precompile.csv" and "precompile_small.csv", that are used for snooping; they include a column of each type, which should hopefully cover a good chunk of the code paths we're compiling. We can adjust later if specific code paths would benefit and are causing people problems. All in all, this cuts TTFP (time-to-first-parse) in half on my machine.
Add some precompiles
Fixes JuliaData#668. The issue here is that when column names were passed manually, the code path that "skipped" to the datarow passed in 1 as the starting position instead of the `pos` variable. This didn't use to be an issue because `pos` was almost always 1 anyway. With `IOBuffer`, we now start `pos` at `io.ptr`, so there will be more cases where it's critical to start reading at the right position.
When column names passed manually, ensure we respect starting position
I've wanted to do this for a while; previously we were only using the estimate from the first 10 rows. This hooks into the "chunking" code, which looks at `tasks` # of chunks of a file to find the start of rows for each; we now keep track of the # of bytes we saw when doing those row checks and use those totals plus the original 10 rows to form a better estimate of the total # of rows.
Improve accuracy of estimated rows for multithreaded parsing
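A rough sketch of the arithmetic, assuming the estimate is the file size divided by the average bytes per sampled row. `estimate_rows` is hypothetical, and the byte counts and the choice of 4 tasks (each checking 5 rows) are illustrative values, not measurements.

```julia
# Hypothetical helper (not CSV.jl's API): estimate the total row count from
# the bytes spanned by all sampled rows.
#   filesize      : total bytes of data to parse
#   sampled_bytes : bytes spanned by the sampled rows
#   sampled_rows  : number of rows sampled
function estimate_rows(filesize::Integer, sampled_bytes::Integer, sampled_rows::Integer)
    # rows ≈ filesize / (sampled_bytes / sampled_rows); integer ceiling
    # division avoids floating-point rounding.
    return cld(filesize * sampled_rows, sampled_bytes)
end

# First 10 rows only: 500 bytes.
estimate_rows(1_000_000, 500, 10)                  # 20000
# Plus 4 chunk checks of 5 rows each, spanning another 1,100 bytes.
estimate_rows(1_000_000, 500 + 1_100, 10 + 4 * 5)  # 18750
```

Pooling the bytes seen during the per-chunk row checks with the first 10 rows samples rows from across the whole file, so an unrepresentative header region skews the estimate less.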
Make the automatic pooled=>string column promotion more efficient
…r invalid rows
added documentation for the dateformats option
Fixes JuliaData#679; alternative fix to JuliaData#681. When a column is dropped, we essentially turn it into a `Missing` column type and ignore it when parsing. There was a check later in file parsing, however, that said if no missing values were found in a column, to ensure its type is `Vector{T}` instead of `Vector{Union{Missing, T}}`. The core problem in issue JuliaData#679 was that these dropped columns, while completely `missing`, didn't get "flagged" as having `missing` values.
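A minimal sketch of why unflagged dropped columns broke, assuming the narrowing step amounts to calling `nonmissingtype` when no missing values were seen (`narrowed_eltype` is a hypothetical name, not CSV.jl's code): `nonmissingtype(Missing)` is `Union{}`, so a dropped column that isn't flagged as containing `missing` narrows to an element type that no value can have.

```julia
# Hypothetical sketch of the post-parse narrowing step: drop Missing from a
# column's element type when no missing values were actually seen.
narrowed_eltype(T::Type, anymissing::Bool) = anymissing ? T : nonmissingtype(T)

narrowed_eltype(Union{Missing, Int64}, false)  # Int64
narrowed_eltype(Missing, false)  # Union{}: dropped column never flagged as missing
narrowed_eltype(Missing, true)   # Missing: flagging dropped columns avoids this
```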
Fixes JuliaData#680. Before custom types, the `typemap` keyword argument was really only about mapping between the standard, supported types. With custom types, we still only support certain type mappings (Int to Float, any type to String), but we also want to support type mappings like `Int64 => Int32`. This PR readjusts how typemap works when detecting column types to account for the possibility of custom Integer or AbstractFloat type mappings for Int64 and Float64, and for moving directly to String if that's specified.
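The mapping rules described above can be sketched as follows. `mapped_type` is hypothetical and only encodes the allowed mappings listed in this message (Int64 to a custom Integer or AbstractFloat, Float64 to a custom AbstractFloat, any type to String); it is not CSV.jl's actual detection code.

```julia
# Hypothetical sketch: which typemap entries are honored after a column type
# has been detected.
function mapped_type(detected::Type, typemap::AbstractDict)
    target = get(typemap, detected, detected)
    target === detected && return detected
    target === String && return String  # any type -> String
    # Int64 -> custom Integer (e.g. Int32) or to a float type
    detected === Int64 && target <: Union{Integer, AbstractFloat} && return target
    # Float64 -> custom AbstractFloat (e.g. Float32)
    detected === Float64 && target <: AbstractFloat && return target
    throw(ArgumentError("unsupported typemap entry: $detected => $target"))
end

mapped_type(Int64, Dict{Type, Type}(Int64 => Int32))  # Int32
```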
Ensure dropped columns are ignored in later file processing
* support groupmark
* add more documentation
---------
Co-authored-by: Lilith Hafner <Lilith.Hafner@gmail.com>

* support for IOBuffer containing memory
* fix errors caught in tests
* use Base.wrap if available
---------
Co-authored-by: Viral B. Shah <ViralBShah@users.noreply.github.com>

* fix breakage caused by JuliaLang/julia/pull/53896
* make __wrap compatible with 1.11 RC

* Update to 0.10.14
* Update julia setup action
* Add dependabot

* finalize memory if 1.11
* don't download busybox in test
* test windows on lts

* fix decchar handling in writecell() for AbstractFloat
* test for JuliaData#1109 fix decchar handling in writecell() for AbstractFloat
* fix newline format
---------
Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>

* fix INT128_MIN write
* add write INT128_MIN test
* use Base.uabs
  Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>
* add BigInt test
---------
Co-authored-by: Nathan Zimmerberg <39104088+nhz2@users.noreply.github.com>

* Bump codecov/codecov-action from 4 to 5
  Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.
  - [Release notes](https://github.com/codecov/codecov-action/releases)
  - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
  - [Commits](codecov/codecov-action@v4...v5)
  ---
  updated-dependencies:
  - dependency-name: codecov/codecov-action
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
* Update .github/workflows/ci.yml
  Co-authored-by: Chengyu Han <cyhan.dev@outlook.com>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>
Co-authored-by: Chengyu Han <cyhan.dev@outlook.com>
Remove hash method definition when Base.memhash is not available.

On Julia 1.13+, these AbstractString types will use the default AbstractString hash implementation, which is now efficient and zero-copy based on codeunit/iterate.

For Julia <1.13, continue using the memhash-based implementation for compatibility.

Related to JuliaLang/julia#59697

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
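A minimal sketch of the version-conditional pattern, using a hypothetical `WrappedString` type rather than the package's real string types. The PR keeps a `Base.memhash`-based method on older Julia; here, as a simplifying assumption, the pre-1.13 method instead delegates to the wrapped `String`'s hash, which hashes the same bytes without touching Base internals. Either way, the method has to agree with `hash` of an equal `String`, because `isequal` values must hash equally.

```julia
# Hypothetical AbstractString wrapper, used only to illustrate the pattern.
struct WrappedString <: AbstractString
    s::String
end

# Minimal AbstractString interface: delegate everything to the wrapped String.
Base.ncodeunits(x::WrappedString) = ncodeunits(x.s)
Base.codeunit(x::WrappedString) = codeunit(x.s)
Base.codeunit(x::WrappedString, i::Integer) = codeunit(x.s, i)
Base.isvalid(x::WrappedString, i::Integer) = isvalid(x.s, i)
Base.iterate(x::WrappedString, i::Int=1) = iterate(x.s, i)

# Define a specialized hash only while Base.memhash still exists (Julia < 1.13).
# On newer Julia, no method is defined and the generic AbstractString hash is used.
@static if isdefined(Base, :memhash)
    # Assumption: delegating to the wrapped String's hash stands in for the
    # memhash-based byte hash; both agree with hash of an equal String.
    Base.hash(x::WrappedString, h::UInt) = hash(x.s, h)
end

hash(WrappedString("abc")) == hash("abc")  # true on either side of the cutoff
```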
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:
@@           Coverage Diff           @@
##             main    #1170   +/-   ##
=======================================
  Coverage   90.55%   90.55%
=======================================
  Files           9        9
  Lines        2319     2319
=======================================
  Hits         2100     2100
  Misses        219      219
Closed