Describe the usage question you have. Please include as many useful details as possible.
Among the simple HTTP GET client examples in arrow-experiments/http/get_simple:
- Some iterate over the record batches as they stream in from the server (i.e. "streaming" approach).
- Some just make a single function call that collects the full data (i.e. "one-shot" approach).
For example:
- The Python client example shows how to iterate over the batches by calling reader.read_next_batch(), whereas it could have just called reader.read_all(), which would be simpler.
- The Ruby client example goes for the simpler all-at-once approach, whereas it could have used a batch-at-a-time approach like in this example.
For many use cases, it makes no difference which approach is used, and we should just prioritize whatever is syntactically simplest.
But for some use cases, the batch-at-a-time approach will be preferred or needed for specific reasons, such as:
- The receiver wants to start processing batches before the final batch is received.
- The receiver wants to stream the received data to a sink without accumulating it in memory.
We should clarify this in the Arrow-over-HTTP conventions doc, and wherever possible we should provide examples showing both approaches.
Component(s)
Documentation