chore(src): add checked conversions where needed#318
Conversation
There was a problem hiding this comment.
Pull request overview
Adds overflow-checked int64→int32 conversions in IPC/Flight readers to fail fast when Arrow metadata advertises sizes that exceed .NET reader limits.
Changes:
- Use
checked((int)...)when converting IPC record batch length, field node length/null count, and buffer offset/length. - Use
checked((int)...)when converting IPC message body length in the in-memory reader. - Use
checked((int)...)when slicing Flight message bodies byBodyLength.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs | Adds checked conversions for record batch length, field node sizes, and buffer ranges during IPC array materialization. |
| src/Apache.Arrow/Ipc/ArrowMemoryReaderImplementation.cs | Adds checked conversion for IPC Message.BodyLength when reading from an in-memory buffer. |
| src/Apache.Arrow.Flight/Internal/RecordBatchReaderImplementation.cs | Adds checked conversion for Flight Message.BodyLength when slicing the body buffer. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| int fieldLength = checked((int)fieldNode.Length); | ||
| int fieldNullCount = checked((int)fieldNode.NullCount); |
There was a problem hiding this comment.
These checked casts can raise an OverflowException with no indication of which field node was invalid. Consider validating fieldNode.Length/fieldNode.NullCount are within [0, int.MaxValue] and throwing InvalidDataException with the actual values (and ideally the field name) to improve debuggability for corrupt IPC streams.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs:401
buffer.Offsetandbuffer.Lengthare individually range-checked, but an invalid IPC buffer can still causebodyData.ToReadOnlyMemory(offset, length)to throw (e.g.,offset + lengthexceedsbodyData.Length, oroffset + lengthoverflowsint). Consider adding a checkedoffset + lengthcomputation and verifying it is withinbodyData.Length, and throwingInvalidDataExceptionon failure so corrupted input doesn't surface asArgumentOutOfRangeException.
if (buffer.Offset < 0 || buffer.Offset > int.MaxValue)
{
throw new InvalidDataException(
$"IPC buffer offset is out of range for a .NET buffer: offset={buffer.Offset}, length={buffer.Length}.");
}
if (buffer.Length < 0 || buffer.Length > int.MaxValue)
{
throw new InvalidDataException(
$"IPC buffer length is out of range for a .NET buffer: offset={buffer.Offset}, length={buffer.Length}.");
}
int offset = (int)buffer.Offset;
int length = (int)buffer.Length;
var data = bodyData.ToReadOnlyMemory(offset, length);
return bufferCreator.CreateBuffer(data);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (rb.Length > int.MaxValue) | ||
| { | ||
| throw new InvalidDataException( | ||
| $"Cannot read batch. Message body of {rb.Length} bytes " + | ||
| $"is greater than the maximum supported length ({int.MaxValue})"); |
There was a problem hiding this comment.
The overflow guard is validating rb.Length, which is the record batch row count, but the exception message says “Message body … bytes”. This is misleading (and the value is not bytes). Also consider validating rb.Length >= 0 since a malformed IPC message could make Length negative; the subsequent cast to int would then produce an invalid RecordBatch length.
| if (rb.Length > int.MaxValue) | |
| { | |
| throw new InvalidDataException( | |
| $"Cannot read batch. Message body of {rb.Length} bytes " + | |
| $"is greater than the maximum supported length ({int.MaxValue})"); | |
| if (rb.Length < 0 || rb.Length > int.MaxValue) | |
| { | |
| throw new InvalidDataException( | |
| $"Cannot read batch. Record batch length of {rb.Length} rows " + | |
| $"is outside the supported range [0, {int.MaxValue}]"); |
| if (message.BodyLength > int.MaxValue) | ||
| { | ||
| throw new InvalidDataException( | ||
| $"Cannot read batch. Message body of {message.BodyLength} bytes " + | ||
| $"is greater than the maximum supported length ({int.MaxValue})"); | ||
| } | ||
|
|
||
| int bodyLength = (int)message.BodyLength; |
There was a problem hiding this comment.
message.BodyLength is only checked for > int.MaxValue. If it is negative (malformed IPC), the (int)message.BodyLength cast succeeds and _buffer.Slice(..., bodyLength) will throw ArgumentOutOfRangeException. Consider validating 0 <= message.BodyLength && message.BodyLength <= int.MaxValue before casting, and optionally verifying the buffer has at least bodyLength remaining bytes to produce a consistent InvalidDataException for corrupted input.
| if (message.BodyLength > int.MaxValue) | |
| { | |
| throw new InvalidDataException( | |
| $"Cannot read batch. Message body of {message.BodyLength} bytes " + | |
| $"is greater than the maximum supported length ({int.MaxValue})"); | |
| } | |
| int bodyLength = (int)message.BodyLength; | |
| if (message.BodyLength < 0 || message.BodyLength > int.MaxValue) | |
| { | |
| throw new InvalidDataException( | |
| $"Cannot read batch. Message body of {message.BodyLength} bytes " + | |
| $"is outside the supported length range (0 to {int.MaxValue})"); | |
| } | |
| int bodyLength = (int)message.BodyLength; | |
| if (_buffer.Length - _bufferPosition < bodyLength) | |
| { | |
| throw new InvalidDataException( | |
| $"Corrupted IPC message. Message body length of {bodyLength} bytes exceeds the remaining buffer length."); | |
| } |
| if (message.BodyLength > int.MaxValue) | ||
| { | ||
| throw new InvalidDataException( | ||
| $"Cannot read batch. Message body of {message.BodyLength} bytes " + | ||
| $"is greater than the maximum supported length ({int.MaxValue})"); | ||
| } | ||
|
|
||
| var body = _flightDataStream.Current.DataBody.Memory; | ||
| return CreateArrowObjectFromMessage(message, CreateByteBuffer(body.Slice(0, (int)message.BodyLength)), null); | ||
| return CreateArrowObjectFromMessage(message, CreateByteBuffer(body.Slice(0, checked((int)message.BodyLength))), null); |
There was a problem hiding this comment.
message.BodyLength is only checked for > int.MaxValue. If it is negative, checked((int)message.BodyLength) will succeed and body.Slice(0, ...) will throw ArgumentOutOfRangeException. Consider validating message.BodyLength >= 0 (and ideally <= body.Length) before slicing so malformed Flight/IPC payloads fail with a consistent InvalidDataException.
adamreeve
left a comment
There was a problem hiding this comment.
Looks good to me thanks Curt
|
|
||
| var body = _flightDataStream.Current.DataBody.Memory; | ||
| return CreateArrowObjectFromMessage(message, CreateByteBuffer(body.Slice(0, (int)message.BodyLength)), null); | ||
| return CreateArrowObjectFromMessage(message, CreateByteBuffer(body.Slice(0, checked((int)message.BodyLength))), null); |
There was a problem hiding this comment.
Nit: The checked is redundant due to the check above, although doesn't really hurt
There was a problem hiding this comment.
Maybe there's a nanosecond performance hit!
What's Changed
Adds checks to some int64->int32 conversions when reading data through IPC.