chore(src): add checked conversions where needed #318

Merged
CurtHagenlocher merged 3 commits into apache:main from CurtHagenlocher:checked
Apr 24, 2026
Conversation

@CurtHagenlocher
Contributor

What's Changed

Adds checks to some int64->int32 conversions when reading data through IPC.

Copilot AI left a comment

Pull request overview

Adds overflow-checked int64->int32 conversions in IPC/Flight readers to fail fast when Arrow metadata advertises sizes that exceed .NET reader limits.

Changes:

  • Use checked((int)...) when converting IPC record batch length, field node length/null count, and buffer offset/length.
  • Use checked((int)...) when converting IPC message body length in the in-memory reader.
  • Use checked((int)...) when slicing Flight message bodies by BodyLength.
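
For readers less familiar with the C# checked operator, here is a minimal sketch of the behavior these changes rely on. The class and method names are illustrative only and do not appear in the PR; the point is that a checked narrowing cast throws OverflowException when the 64-bit value does not fit in an int, while the unchecked form silently truncates.

```csharp
using System;

public static class CheckedCastDemo
{
    // Narrow a 64-bit size to int, throwing OverflowException if it does not fit.
    public static int ToInt32Checked(long value) => checked((int)value);

    // Same cast without overflow checking: out-of-range values are truncated
    // to their low 32 bits instead of raising an error.
    public static int ToInt32Unchecked(long value) => unchecked((int)value);

    // Returns true when the checked conversion throws OverflowException.
    public static bool Overflows(long value)
    {
        try
        {
            ToInt32Checked(value);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

Note that a small negative value such as -1 fits in an int, so the checked cast accepts it; rejecting negative sizes needs a separate range check.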

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File | Description
src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs | Adds checked conversions for record batch length, field node sizes, and buffer ranges during IPC array materialization.
src/Apache.Arrow/Ipc/ArrowMemoryReaderImplementation.cs | Adds checked conversion for IPC Message.BodyLength when reading from an in-memory buffer.
src/Apache.Arrow.Flight/Internal/RecordBatchReaderImplementation.cs | Adds checked conversion for Flight Message.BodyLength when slicing the body buffer.

Comment thread src/Apache.Arrow/Ipc/ArrowMemoryReaderImplementation.cs Outdated
Comment thread src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs Outdated
Comment on lines +260 to +261
int fieldLength = checked((int)fieldNode.Length);
int fieldNullCount = checked((int)fieldNode.NullCount);
Copilot AI Apr 20, 2026

These checked casts can raise an OverflowException with no indication of which field node was invalid. Consider validating fieldNode.Length/fieldNode.NullCount are within [0, int.MaxValue] and throwing InvalidDataException with the actual values (and ideally the field name) to improve debuggability for corrupt IPC streams.
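
One way the suggestion above could look, as a rough sketch rather than the code that was merged: a small helper that range-checks a 64-bit metadata value and reports the offending value in an InvalidDataException. The helper name ToInt32OrThrow and its description parameter are hypothetical and are not part of Apache.Arrow.

```csharp
using System;
using System.IO;

public static class FieldNodeValidation
{
    // Hypothetical helper (not part of Apache.Arrow): converts a 64-bit IPC
    // metadata value to int, or throws InvalidDataException that names the
    // value and what it was supposed to describe.
    public static int ToInt32OrThrow(long value, string description)
    {
        if (value < 0 || value > int.MaxValue)
        {
            throw new InvalidDataException(
                $"Invalid IPC metadata: {description} is {value}, " +
                $"which is outside the supported range [0, {int.MaxValue}].");
        }

        // Safe: the range check above guarantees the value fits in an int.
        return (int)value;
    }
}
```

Under this assumption, the two casts in the hunk would become calls such as ToInt32OrThrow(fieldNode.Length, "field node length"), with the field name appended to the description where it is available.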

Comment thread src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs Outdated
Comment thread src/Apache.Arrow/Ipc/ArrowMemoryReaderImplementation.cs Outdated
Comment thread src/Apache.Arrow.Flight/Internal/RecordBatchReaderImplementation.cs
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs:401

  • buffer.Offset and buffer.Length are individually range-checked, but an invalid IPC buffer can still cause bodyData.ToReadOnlyMemory(offset, length) to throw (e.g., offset + length exceeds bodyData.Length, or offset + length overflows int). Consider adding a checked offset + length computation and verifying it is within bodyData.Length, and throwing InvalidDataException on failure so corrupted input doesn't surface as ArgumentOutOfRangeException.
            if (buffer.Offset < 0 || buffer.Offset > int.MaxValue)
            {
                throw new InvalidDataException(
                    $"IPC buffer offset is out of range for a .NET buffer: offset={buffer.Offset}, length={buffer.Length}.");
            }

            if (buffer.Length < 0 || buffer.Length > int.MaxValue)
            {
                throw new InvalidDataException(
                    $"IPC buffer length is out of range for a .NET buffer: offset={buffer.Offset}, length={buffer.Length}.");
            }

            int offset = (int)buffer.Offset;
            int length = (int)buffer.Length;

            var data = bodyData.ToReadOnlyMemory(offset, length);
            return bufferCreator.CreateBuffer(data);
        }
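
The combined offset + length check described in this suppressed comment could be sketched as follows. This is an illustration under stated assumptions, not the merged code: ValidateRange is a hypothetical helper, and bodyLength stands in for bodyData.Length. The sum is computed in long, so it cannot overflow once both operands are known to be at most int.MaxValue.

```csharp
using System;
using System.IO;

public static class BufferRangeValidation
{
    // Hypothetical helper: verifies that [offset, offset + length) lies inside
    // a message body of bodyLength bytes and returns the values as ints.
    public static (int Offset, int Length) ValidateRange(long offset, long length, int bodyLength)
    {
        if (offset < 0 || offset > int.MaxValue || length < 0 || length > int.MaxValue)
        {
            throw new InvalidDataException(
                $"IPC buffer is out of range for a .NET buffer: offset={offset}, length={length}.");
        }

        // Both operands are at most int.MaxValue here, so the long sum cannot overflow.
        if (offset + length > bodyLength)
        {
            throw new InvalidDataException(
                $"IPC buffer extends past the message body: offset={offset}, " +
                $"length={length}, bodyLength={bodyLength}.");
        }

        return ((int)offset, (int)length);
    }
}
```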

Comment thread test/Apache.Arrow.Tests/ArrowStreamReaderTests.cs
Comment on lines +144 to +148
if (rb.Length > int.MaxValue)
{
throw new InvalidDataException(
$"Cannot read batch. Message body of {rb.Length} bytes " +
$"is greater than the maximum supported length ({int.MaxValue})");
Copilot AI Apr 23, 2026

The overflow guard is validating rb.Length, which is the record batch row count, but the exception message says “Message body … bytes”. This is misleading (and the value is not bytes). Also consider validating rb.Length >= 0 since a malformed IPC message could make Length negative; the subsequent cast to int would then produce an invalid RecordBatch length.

Suggested change
- if (rb.Length > int.MaxValue)
- {
-     throw new InvalidDataException(
-         $"Cannot read batch. Message body of {rb.Length} bytes " +
-         $"is greater than the maximum supported length ({int.MaxValue})");
+ if (rb.Length < 0 || rb.Length > int.MaxValue)
+ {
+     throw new InvalidDataException(
+         $"Cannot read batch. Record batch length of {rb.Length} rows " +
+         $"is outside the supported range [0, {int.MaxValue}]");

Comment on lines 109 to 116
if (message.BodyLength > int.MaxValue)
{
throw new InvalidDataException(
$"Cannot read batch. Message body of {message.BodyLength} bytes " +
$"is greater than the maximum supported length ({int.MaxValue})");
}

int bodyLength = (int)message.BodyLength;
Copilot AI Apr 23, 2026

message.BodyLength is only checked for > int.MaxValue. If it is negative (malformed IPC), the (int)message.BodyLength cast succeeds and _buffer.Slice(..., bodyLength) will throw ArgumentOutOfRangeException. Consider validating 0 <= message.BodyLength && message.BodyLength <= int.MaxValue before casting, and optionally verifying the buffer has at least bodyLength remaining bytes to produce a consistent InvalidDataException for corrupted input.

Suggested change
- if (message.BodyLength > int.MaxValue)
- {
-     throw new InvalidDataException(
-         $"Cannot read batch. Message body of {message.BodyLength} bytes " +
-         $"is greater than the maximum supported length ({int.MaxValue})");
- }
- int bodyLength = (int)message.BodyLength;
+ if (message.BodyLength < 0 || message.BodyLength > int.MaxValue)
+ {
+     throw new InvalidDataException(
+         $"Cannot read batch. Message body of {message.BodyLength} bytes " +
+         $"is outside the supported length range (0 to {int.MaxValue})");
+ }
+ int bodyLength = (int)message.BodyLength;
+ if (_buffer.Length - _bufferPosition < bodyLength)
+ {
+     throw new InvalidDataException(
+         $"Corrupted IPC message. Message body length of {bodyLength} bytes exceeds the remaining buffer length.");
+ }

Comment on lines +138 to +146
if (message.BodyLength > int.MaxValue)
{
throw new InvalidDataException(
$"Cannot read batch. Message body of {message.BodyLength} bytes " +
$"is greater than the maximum supported length ({int.MaxValue})");
}

var body = _flightDataStream.Current.DataBody.Memory;
- return CreateArrowObjectFromMessage(message, CreateByteBuffer(body.Slice(0, (int)message.BodyLength)), null);
+ return CreateArrowObjectFromMessage(message, CreateByteBuffer(body.Slice(0, checked((int)message.BodyLength))), null);
Copilot AI Apr 23, 2026

message.BodyLength is only checked for > int.MaxValue. If it is negative, checked((int)message.BodyLength) will succeed and body.Slice(0, ...) will throw ArgumentOutOfRangeException. Consider validating message.BodyLength >= 0 (and ideally <= body.Length) before slicing so malformed Flight/IPC payloads fail with a consistent InvalidDataException.
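
A minimal sketch of the validation suggested here, assuming the body is available as a ReadOnlyMemory<byte>. SliceBody is a hypothetical helper and is not taken from the PR. Because body.Length is itself an int, a single comparison against it also covers the int.MaxValue limit.

```csharp
using System;
using System.IO;

public static class BodySliceValidation
{
    // Hypothetical helper: returns the first bodyLength bytes of a message
    // body, throwing InvalidDataException when the advertised length is
    // negative or larger than the data actually received.
    public static ReadOnlyMemory<byte> SliceBody(ReadOnlyMemory<byte> body, long bodyLength)
    {
        if (bodyLength < 0 || bodyLength > body.Length)
        {
            throw new InvalidDataException(
                $"Cannot read batch. Message body length of {bodyLength} bytes " +
                $"is outside the valid range [0, {body.Length}].");
        }

        // Safe: bodyLength is within [0, body.Length], and body.Length is an int.
        return body.Slice(0, (int)bodyLength);
    }
}
```

With a check like this in front of the slice, a malformed BodyLength would surface as InvalidDataException rather than ArgumentOutOfRangeException from Slice.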

Contributor

@adamreeve adamreeve left a comment

Looks good to me thanks Curt


var body = _flightDataStream.Current.DataBody.Memory;
- return CreateArrowObjectFromMessage(message, CreateByteBuffer(body.Slice(0, (int)message.BodyLength)), null);
+ return CreateArrowObjectFromMessage(message, CreateByteBuffer(body.Slice(0, checked((int)message.BodyLength))), null);
Contributor

Nit: the checked is redundant given the check above, although it doesn't really hurt.

Contributor Author

Maybe there's a nanosecond performance hit!

@CurtHagenlocher CurtHagenlocher merged commit 2b2afa2 into apache:main Apr 24, 2026
14 checks passed
3 participants