Chunked transfer encoding is commonly used in HTTP/1.1. In chunked transfer encoding, special byte sequences delimit the chunks and mark the terminating chunk. But what happens if those byte sequences occur inside binary Arrow IPC data, for example in a binary or string array?
I am almost certain (based on an understanding of how HTTP/1.1 clients work) that this will not cause any problems, but we should test to be fully certain.
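The reasoning behind that expectation can be sketched with a toy encoder and decoder. This is only an illustration using the standard library, not how any particular HTTP client is implemented; `encode_chunked` and `decode_chunked` are hypothetical helper names, and chunk extensions and trailers are ignored. The point is that a decoder reads the declared chunk size and then consumes exactly that many bytes, without scanning the payload for delimiters:

```python
def encode_chunked(payload: bytes, chunk_size: int = 16) -> bytes:
    """Frame payload as an HTTP/1.1 chunked body (no extensions or trailers)."""
    out = bytearray()
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        # Each chunk is: hex size, CRLF, chunk data, CRLF.
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    # A zero-size chunk followed by an empty line terminates the body.
    out += b"0\r\n\r\n"
    return bytes(out)


def decode_chunked(body: bytes) -> bytes:
    """Decode a chunked body produced by encode_chunked."""
    out = bytearray()
    pos = 0
    while True:
        # The size line ends at the first CRLF after the current position.
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end], 16)
        pos = line_end + 2
        if size == 0:
            break
        # Take exactly `size` bytes; their content is never inspected.
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


# Payload that itself looks like a complete chunked body plus extra bytes.
tricky = b"4\r\nWiki\r\n7\r\npedia i\r\nB\r\nn \r\nchunks.\r\n0\r\n\r\nabcdefg"
assert decode_chunked(encode_chunked(tricky)) == tricky
```

This only shows that a length-prefixed decoder is unaffected by the payload contents; it does not replace testing against real clients.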
To test, we could for example use the simple Python GET example, replacing the schema and the definition of GetPutData with the following:
schema = pa.schema([('a', pa.binary())])

def GetPutData():
    # pa.array expects a sequence of values, so wrap the single bytes value in a list.
    arrays = [pa.array([b'4\r\nWiki\r\n7\r\npedia i\r\nB\r\nn \r\nchunks.\r\n0\r\n\r\nabcdefg'], type=pa.binary())]
    batches = [pa.record_batch(arrays, schema), pa.record_batch(arrays, schema)]
    return batches
Or this similar version which creates the buffers manually:
schema = pa.schema([('a', pa.binary())])

def GetPutData():
    bytestr = '4\r\nWiki\r\n7\r\npedia i\r\nB\r\nn \r\nchunks.\r\n0\r\n\r\nabcdefg'.encode('ascii')
    data = [bytestr, bytestr]
    offsets_buffer = pa.py_buffer(b''.join([n.to_bytes(4, 'little') for n in [0, 49, 98]]))
    values_buffer = pa.py_buffer(b''.join(data))
    array = pa.BinaryArray.from_buffers(pa.binary(), 2, [None, offsets_buffer, values_buffer])
    arrays = [array]
    batches = [pa.record_batch(arrays, schema), pa.record_batch(arrays, schema)]
    return batches
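The hard-coded offsets `[0, 49, 98]` in the manual version are the cumulative byte lengths of the two 49-byte values. If the test string changes, they could be derived instead of typed in. A minimal stdlib-only sketch (the variable names are illustrative; the resulting bytes could be passed to `pa.py_buffer` in place of the joined `to_bytes` output):

```python
import struct
from itertools import accumulate

bytestr = b"4\r\nWiki\r\n7\r\npedia i\r\nB\r\nn \r\nchunks.\r\n0\r\n\r\nabcdefg"
data = [bytestr, bytestr]

# Offsets are the running total of value lengths, starting at 0.
offsets = [0, *accumulate(len(value) for value in data)]

# Pack as little-endian signed 32-bit integers, matching the 4-byte
# little-endian encoding used in the snippet above.
offsets_bytes = struct.pack(f"<{len(offsets)}i", *offsets)
values_bytes = b"".join(data)
```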
Component(s)
Python