ARROW-10420: [C++] Refactor io and filesystem APIs to take an IOContext #9474
pitrou wants to merge 3 commits into
A couple things to discuss:
@westonpace @bkietz, input welcome.
Note: explicitly closing output files is preferable, especially with remote filesystems, where closing might plausibly fail.
(and MockFileSystem will deliberately not write anything out if you don't close the file explicitly)
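To make the point about explicit closing concrete, here is a minimal sketch of the semantics described for MockFileSystem: writes are buffered and only committed by an explicit `Close()`, which reports success or failure. `MockStore`, `MockOutputStream`, and their methods are hypothetical stand-ins written for this illustration, not Arrow's actual MockFileSystem API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory "filesystem": maps a path to its committed contents.
using MockStore = std::map<std::string, std::string>;

// Hypothetical output stream that buffers writes and only commits them on an
// explicit Close(). The destructor deliberately does NOT commit, so data
// written to a stream that is never closed is silently dropped.
class MockOutputStream {
 public:
  MockOutputStream(MockStore* store, std::string path)
      : store_(store), path_(std::move(path)) {}

  // Append to the in-memory buffer; writing after Close() is an error.
  bool Write(const std::string& data) {
    if (closed_) return false;
    buffer_ += data;
    return true;
  }

  // Commit the buffered data. Returns false if the stream was already closed.
  // With a real remote filesystem this is also where a final upload could fail,
  // which is why callers should check the result.
  bool Close() {
    if (closed_) return false;
    (*store_)[path_] = buffer_;
    closed_ = true;
    return true;
  }

  // No implicit commit on destruction.
  ~MockOutputStream() = default;

 private:
  MockStore* store_;
  std::string path_;
  std::string buffer_;
  bool closed_ = false;
};
```

Destroying such a stream without calling `Close()` leaves nothing in the store, so tests that forget to close their output fail loudly instead of passing by accident.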
Same comment here about explicitly closing files.
Note that this was scheduling file copies on the CPU thread pool.
Force-pushed from 443569d to bb09c58.
My gut says no, but I could be convinced otherwise. For example, the S3 filesystem (if I understand correctly) would plug the executor into the AWS client configuration, and the random access files etc. would rely on that rather than on the Executor directly. At best it would just be a convenience, right? A way to give implementors easy access to the context instead of having to pass it around themselves?
naming nit: IoContext?
Force-pushed from d4608a9 to 356c300.
@westonpace I think I misphrased my question. This PR adds an owned …

@emkornfield Hmm, I'm not sure what the convention should be. We currently have …
According to the style guide, IoContext:
However, if we want to keep consistency, I'm OK with IO.
Since IOContext only wraps pointers and an integer id, it semantically represents a reference. Therefore, I'd recommend never creating references to it: that would be redundant, and the structure is tiny and trivially copyable anyway.
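A minimal sketch of the pass-by-value recommendation, assuming a context that holds only non-owning pointers and an integer id. `MemoryPool`, `Executor`, `Context`, and `GetExternalId` are hypothetical names used for illustration, not Arrow's actual types. The `static_assert` checks at compile time that the struct stays trivially copyable, so taking it by value is cheap.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for a memory pool and an executor; only their
// addresses matter in this sketch.
struct MemoryPool {};
struct Executor {};

// Hypothetical context: non-owning pointers plus an integer id, so it
// semantically behaves like a reference to shared settings.
struct Context {
  MemoryPool* pool = nullptr;
  Executor* executor = nullptr;
  std::int64_t external_id = -1;
};

// Guard the property that makes by-value passing cheap.
static_assert(std::is_trivially_copyable<Context>::value,
              "Context should stay trivially copyable");

// Take the context by value rather than as `const Context&`.
std::int64_t GetExternalId(Context ctx) { return ctx.external_id; }
```

If a later change added a member that is not trivially copyable (for example a `std::shared_ptr`), the `static_assert` would fail at compile time, prompting a re-evaluation of the by-value convention.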
Force-pushed from bb09c58 to 37373b4.
Note that the CPU and IO executors were passed in the wrong order here.
@ursabot please benchmark |
The `io::IOContext` class allows passing various settings such as the MemoryPool used for allocation and the Executor for async methods.
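The idea of bundling settings into a single object can be sketched as follows. This is a hypothetical, simplified class modeled on the description above, not Arrow's actual `io::IOContext`: `MemoryPool`, `Executor`, `default_memory_pool()`, `default_io_executor()`, and `DescribeRead()` are stand-ins defined only for this example. A default-constructed context falls back to process-wide defaults, and a one-argument constructor overrides only the pool, similar in spirit to the `arrow::io::IOContext(gc_memory_pool())` call in the R bindings diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; not Arrow's MemoryPool / Executor classes.
struct MemoryPool { std::string name; };
struct Executor { std::string name; };

// Hypothetical process-wide defaults.
MemoryPool* default_memory_pool() {
  static MemoryPool pool{"default pool"};
  return &pool;
}
Executor* default_io_executor() {
  static Executor executor{"default IO executor"};
  return &executor;
}

// Hypothetical context: one parameter carries both the allocation pool and
// the executor used for async work, instead of separate arguments per API.
class IOContext {
 public:
  // Use the process-wide pool and IO executor.
  IOContext() : IOContext(default_memory_pool(), default_io_executor()) {}
  // Override only the pool; keep the default IO executor.
  explicit IOContext(MemoryPool* pool) : IOContext(pool, default_io_executor()) {}
  IOContext(MemoryPool* pool, Executor* executor)
      : pool_(pool), executor_(executor) {}

  MemoryPool* pool() const { return pool_; }
  Executor* executor() const { return executor_; }

 private:
  MemoryPool* pool_;
  Executor* executor_;
};

// Hypothetical API taking the context by value, defaulted for convenience.
std::string DescribeRead(const std::string& path, IOContext ctx = IOContext()) {
  return path + " via " + ctx.executor()->name + " using " + ctx.pool()->name;
}
```

Adding a new setting later only requires a new member on the context, not a signature change on every IO and filesystem API.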
Force-pushed from f16a80c to a463936.
@bkietz Do you want to give this another look?
bkietz left a comment:
LGTM, will merge when CI completes
Force-pushed from a463936 to da3ece9.
```diff
-  return ValueOrStop(
-      arrow::csv::TableReader::Make(gc_memory_pool(), arrow::io::AsyncContext(), input,
-                                    *read_options, *parse_options, *convert_options));
+  return ValueOrStop(arrow::csv::TableReader::Make(arrow::io::IOContext(gc_memory_pool()),
```
Benchmark runs are scheduled for baseline = 9a9baf6 and contender = da3ece9. Results will be available as each benchmark for each run completes.
The CI failure is ARROW-11717. Merging.