ARROW-9718: [Python] ParquetWriter to work with new FileSystem API #7991
jorisvandenbossche wants to merge 8 commits into
089d84e to fb9a773
37ac79f to 12017a5
pitrou left a comment
Thank you @jorisvandenbossche. A couple of comments below.
If it's the name of an argument, then put backquotes around it.
Yeah, this was copy-pasted from the implementation in pyarrow.filesystem, but I agree it can be improved. Will update.
The keyword name itself may vary depending on where this helper function is called, so I will keep the message general, e.g. "the specified path" or so.
These are legacy filesystem imports? Do we still need them?
Perhaps `import pyarrow.filesystem as legacyfs` would make the code easier to read below.
Yes, we still need them because the full ParquetDataset/ParquetManifest (Python) implementation here is based on the legacy filesystems.
But I switched to the `legacyfs.` prefix for the old ones, and plain imports for the new ones.
Also `from pyarrow import filesystem as legacyfs`?
Here I am going to leave it as is for now, because the old ones are still used a lot. Changing them would make the diff much larger, so I will keep that for a later PR, e.g. when actually deprecating.
This must be lifted out of the with block.
Note that it may be simpler to use pytest.raises(ValueError, match="...")
Yes, indeed, this was copied from another test, but I updated it to use match.
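For reference, a minimal sketch of the two suggestions above: using `pytest.raises(..., match=...)` and keeping follow-up assertions outside the `with` block. The helper `resolve_path` and its error message are hypothetical stand-ins for illustration, not the code under review.

```python
import pytest


def resolve_path(path):
    # Hypothetical stand-in for the helper under review; not pyarrow's code.
    if not isinstance(path, str):
        raise ValueError("the specified path must be a string")
    return path


def check_invalid_path():
    # `match` is applied with re.search against the exception message.
    with pytest.raises(ValueError, match="specified path") as excinfo:
        resolve_path(123)
        # Any statement placed here would be skipped once the call raises.
    # Follow-up assertions live outside the block so they actually run.
    assert "string" in str(excinfo.value)
    return str(excinfo.value)


message = check_invalid_path()
```

An assertion left inside the `with` block after the raising call is never executed, which is why it has to be lifted out.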
Should we also test ParquetWriter(path=uri)?
I added one, but it is segfaulting locally (maybe similar to ARROW-9814).
If you pass -s to pytest, you should be able to see the C++ crash message (if any).
OK to merge it with the commented-out test for now? (I opened an issue for it at https://issues.apache.org/jira/browse/ARROW-9906)
jorisvandenbossche left a comment
Thanks for the review! Pushed some updates
b772535 to 5bafd1d
@jorisvandenbossche Do you want to merge this?
Yep