From aaea387cebf0337ccf3e73a32935dec13c366d82 Mon Sep 17 00:00:00 2001
From: Micah Kornfield
Date: Thu, 25 Jul 2019 22:22:27 -0700
Subject: [PATCH 1/4] [Proposal] Add a padding for flatbuffer alignment

---
 docs/source/format/IPC.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/source/format/IPC.rst b/docs/source/format/IPC.rst
index 16567a6744e..73d537f7498 100644
--- a/docs/source/format/IPC.rst
+++ b/docs/source/format/IPC.rst
@@ -24,6 +24,8 @@ Encapsulated message format
 Data components in the stream and file formats are represented as encapsulated
 *messages* consisting of:

+* A 32-bit continuation indicator (0xFFFFFFFF). This allows flatbuffer bytes
+  to begin on an 8-byte boundary. This was added as of release 0.15.0.
 * A length prefix indicating the metadata size
 * The message metadata as a `Flatbuffer`_
 * Padding bytes to an 8-byte boundary
@@ -31,6 +33,7 @@ Data components in the stream and file formats are represented as encapsulated

 Schematically, we have: ::

+    <0xFFFFFFFF>

From 835982a186949a1a249e17e66d7ebb9bdf7f73c5 Mon Sep 17 00:00:00 2001
From: emkornfield
Date: Thu, 1 Aug 2019 20:49:17 -0700
Subject: [PATCH 2/4] Update IPC.rst

---
 docs/source/format/IPC.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/format/IPC.rst b/docs/source/format/IPC.rst
index 73d537f7498..14c9ef513ac 100644
--- a/docs/source/format/IPC.rst
+++ b/docs/source/format/IPC.rst
@@ -26,7 +26,7 @@ Data components in the stream and file formats are represented as encapsulated
 * A 32-bit continuation indicator (0xFFFFFFFF). This allows flatbuffer bytes
   to begin on an 8-byte boundary. This was added as of release 0.15.0.
-* A length prefix indicating the metadata size
+* A 32-bit little-endian length prefix indicating the metadata size
 * The message metadata as a `Flatbuffer`_
 * Padding bytes to an 8-byte boundary
 * The message body, which must be a multiple of 8 bytes

From 254eaa0eec1e4b9755055b9ed3a867e8af44bbfa Mon Sep 17 00:00:00 2001
From: Wes McKinney
Date: Tue, 6 Aug 2019 14:30:53 -0500
Subject: [PATCH 3/4] Add clarifications about 4-byte IPC continuation marker

---
 docs/source/format/IPC.rst | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/docs/source/format/IPC.rst b/docs/source/format/IPC.rst
index 14c9ef513ac..4ca348244d9 100644
--- a/docs/source/format/IPC.rst
+++ b/docs/source/format/IPC.rst
@@ -24,8 +24,9 @@ Encapsulated message format
 Data components in the stream and file formats are represented as encapsulated
 *messages* consisting of:

-* A 32-bit continuation indicator (0xFFFFFFFF). This allows flatbuffer bytes
-  to begin on an 8-byte boundary. This was added as of release 0.15.0.
+* A 32-bit continuation indicator. The value `0xFFFFFFFF` indicates a valid
+  message. This component was introduced in version 0.15.0 in part to address
+  the 8-byte alignment requirement of Flatbuffers
 * A 32-bit little-endian length prefix indicating the metadata size
 * The message metadata as a `Flatbuffer`_
 * Padding bytes to an 8-byte boundary
@@ -33,7 +34,7 @@ Data components in the stream and file formats are represented as encapsulated

 Schematically, we have: ::

-    <0xFFFFFFFF>
+
@@ -82,14 +83,14 @@ in a ``RecordBatch`` it should be defined in a ``DictionaryBatch``. ::
     ...
-
+

 When a stream reader implementation is reading a stream, after each message, it
 may read the next 4 bytes to know how large the message metadata that follows
 is. Once the message flatbuffer is read, you can then read the message body.

-The stream writer can signal end-of-stream (EOS) either by writing a 0 length
-as an ``int32`` or simply closing the stream interface.
+The stream writer can signal end-of-stream (EOS) either by writing the four
+bytes `0x00000000` or closing the stream interface.

 File format
 -----------
@@ -222,8 +223,8 @@ take the form: ::
 Tensor (Multi-dimensional Array) Message Format
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The ``Tensor`` message types provides a way to write a multidimensional array of
-fixed-size values (such as a NumPy ndarray) using Arrow's shared memory
+The ``Tensor`` message types provides a way to write a multidimensional array
+of fixed-size values (such as a NumPy ndarray) using Arrow's shared memory
 tools. Arrow implementations in general are not required to implement this data
 format, though we provide a reference implementation in C++.

From 1f9cff80ae6f3922f673821b7d632368e535fc3a Mon Sep 17 00:00:00 2001
From: Wes McKinney
Date: Wed, 14 Aug 2019 17:38:08 -0500
Subject: [PATCH 4/4] Use 8-byte EOS

---
 docs/source/format/IPC.rst | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/docs/source/format/IPC.rst b/docs/source/format/IPC.rst
index 4ca348244d9..88cab8a288c 100644
--- a/docs/source/format/IPC.rst
+++ b/docs/source/format/IPC.rst
@@ -83,14 +83,15 @@ in a ``RecordBatch`` it should be defined in a ``DictionaryBatch``. ::
     ...
-
+

-When a stream reader implementation is reading a stream, after each message, it
-may read the next 4 bytes to know how large the message metadata that follows
-is. Once the message flatbuffer is read, you can then read the message body.
+When a stream reader implementation is reading a stream, after each
+message, it may read the next 8 bytes to determine both if the stream
+continues and the size of the message metadata that follows. Once the
+message flatbuffer is read, you can then read the message body.

-The stream writer can signal end-of-stream (EOS) either by writing the four
-bytes `0x00000000` or closing the stream interface.
+The stream writer can signal end-of-stream (EOS) either by writing 8
+zero (`0x00`) bytes or closing the stream interface.

 File format
 -----------
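
Reviewer note: the 8-byte prefix handling described in patch 4/4 can be illustrated with a small Python sketch. It is not the Arrow reference implementation. The names `read_prefix` and `write_prefix` are hypothetical, the length is assumed to be a signed 32-bit little-endian integer, and the EOS check follows the wording of patch 4/4 (8 zero bytes, or a closed stream). Parsing the Flatbuffer metadata and reading the message body are left out.

```python
import io
import struct

CONTINUATION = 0xFFFFFFFF  # 32-bit continuation indicator from the patch
PREFIX_SIZE = 8            # 4-byte indicator + 4-byte metadata length


def write_prefix(stream, metadata_size):
    """Write the continuation indicator and the little-endian metadata length."""
    stream.write(struct.pack("<Ii", CONTINUATION, metadata_size))


def read_prefix(stream):
    """Return the metadata size in bytes, or None at end-of-stream.

    End-of-stream is assumed to be either a closed stream (no bytes left)
    or 8 zero bytes, as worded in patch 4/4.
    """
    prefix = stream.read(PREFIX_SIZE)
    if len(prefix) == 0:
        return None  # writer closed the stream
    if len(prefix) < PREFIX_SIZE:
        raise ValueError("truncated message prefix")
    if prefix == b"\x00" * PREFIX_SIZE:
        return None  # explicit EOS marker
    continuation, metadata_size = struct.unpack("<Ii", prefix)
    if continuation != CONTINUATION:
        raise ValueError("missing continuation indicator")
    return metadata_size
```

A reader would call `read_prefix` after each message, then read that many bytes of Flatbuffer metadata before moving on to the message body.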