[SPARK-57430][CORE] Validate length before allocation in Encoders.ByteArrays.decode#56493

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-57430

@dongjoon-hyun dongjoon-hyun commented Jun 14, 2026

Member

What changes were proposed in this pull request?

Encoders.ByteArrays.decode() now validates the length prefix before allocating the array. The check is Objects.checkFromIndexSize(0, length, buf.readableBytes()), which requires 0 <= length <= buf.readableBytes() and throws IndexOutOfBoundsException otherwise. Because readInt() has already advanced the reader index past the prefix, readableBytes() equals the number of remaining payload bytes.
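
The validation described above can be sketched as follows. This is a minimal illustration, not the patched Spark source: it assumes `java.nio.ByteBuffer` as a stand-in for Netty's `ByteBuf` (so `getInt()` and `remaining()` play the roles of `readInt()` and `readableBytes()`), and the class name `CheckedByteArrayDecoder` and the helper `frame` are hypothetical. It needs Java 9 or later for `Objects.checkFromIndexSize`.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

// Hypothetical stand-in for Encoders.ByteArrays; uses java.nio.ByteBuffer
// instead of Netty's ByteBuf so the sketch runs with the JDK alone.
final class CheckedByteArrayDecoder {

  /** Reads a 4-byte length prefix, validates it, then copies the payload. */
  static byte[] decode(ByteBuffer buf) {
    int length = buf.getInt();  // position now points at the payload
    // Requires 0 <= length <= buf.remaining(); otherwise throws
    // IndexOutOfBoundsException before any array is allocated.
    Objects.checkFromIndexSize(0, length, buf.remaining());
    byte[] bytes = new byte[length];
    buf.get(bytes);
    return bytes;
  }

  /** Builds a frame whose declared length may differ from the real payload size. */
  static ByteBuffer frame(int declaredLength, byte[] payload) {
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + payload.length);
    buf.putInt(declaredLength);
    buf.put(payload);
    buf.flip();
    return buf;
  }

  public static void main(String[] args) {
    byte[] ok = decode(frame(3, new byte[] {1, 2, 3}));
    System.out.println("decoded " + ok.length + " bytes");
    try {
      decode(frame(1_000_000, new byte[] {1, 2, 3}));  // prefix larger than payload
    } catch (IndexOutOfBoundsException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The point of the ordering is that the check costs a comparison, while the allocation it guards could otherwise be sized by whatever the peer wrote into the prefix.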

Why are the changes needed?

decode() allocated new byte[length] from an untrusted length.

  • A negative value throws an opaque NegativeArraySizeException.
  • An oversized value can trigger OutOfMemoryError on a corrupt or hostile frame.

Validating first fails fast with a clear error.
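
For contrast, the pre-fix failure mode can be reproduced with a small sketch. It is a hypothetical reconstruction, again assuming `java.nio.ByteBuffer` in place of Netty's `ByteBuf`; `UncheckedDecodeDemo`, `decodeUnchecked`, and `prefixOnly` are illustrative names, not Spark code. Only the negative case is executed; the oversized case is described in a comment because actually running it could exhaust the heap.

```java
import java.nio.ByteBuffer;

// Hypothetical reconstruction of the pre-fix behavior, with java.nio.ByteBuffer
// standing in for Netty's ByteBuf. Not the actual Spark source.
final class UncheckedDecodeDemo {

  /** Allocates directly from the wire-supplied length, with no validation. */
  static byte[] decodeUnchecked(ByteBuffer buf) {
    int length = buf.getInt();
    byte[] bytes = new byte[length];  // allocation driven directly by wire data
    buf.get(bytes);
    return bytes;
  }

  /** A frame that contains only a length prefix and no payload. */
  static ByteBuffer prefixOnly(int declaredLength) {
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
    buf.putInt(declaredLength);
    buf.flip();
    return buf;
  }

  public static void main(String[] args) {
    try {
      decodeUnchecked(prefixOnly(-1));
    } catch (NegativeArraySizeException e) {
      System.out.println("negative prefix: " + e.getClass().getSimpleName());
    }
    // A prefix such as Integer.MAX_VALUE would instead request a roughly 2 GiB
    // array and may end in OutOfMemoryError; it is deliberately not executed here.
  }
}
```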

Note that the OutOfMemoryError can occur even in a secured deployment, because the first message on a channel is parsed before the peer has been authenticated.

AuthRpcHandler.java:

    AuthMessage challenge;
    try {
      challenge = AuthMessage.decodeMessage(message);

AuthMessage.java:

    public static AuthMessage decodeMessage(ByteBuffer buffer) {
      ByteBuf buf = Unpooled.wrappedBuffer(buffer);
      if (buf.readByte() != TAG_BYTE) {
        throw new IllegalArgumentException("Expected ClientChallenge, received something else.");
      }
      return new AuthMessage(
        Encoders.Strings.decode(buf),     // AppID
        Encoders.ByteArrays.decode(buf),  // Salt
        Encoders.ByteArrays.decode(buf)); // Ciphertext
    }
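
To make the exposure concrete, here is a sketch of how small a hostile first message could be. It only mimics the field order visible in `decodeMessage` (tag, app ID, salt, ciphertext). The tag value, the assumption that each field is an `int` length followed by raw bytes, and the names `HostileFrameDemo`, `readChecked`, and `hostileFrame` are all illustrative and not taken from Spark.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Sketch only: mimics the field order of AuthMessage.decodeMessage with
// java.nio.ByteBuffer. TAG and the int-length-prefixed layout are assumptions.
final class HostileFrameDemo {
  static final byte TAG = 0x01;  // hypothetical tag value

  /** Length-validated read, in the spirit of the patched decode(). */
  static byte[] readChecked(ByteBuffer buf) {
    int length = buf.getInt();
    Objects.checkFromIndexSize(0, length, buf.remaining());
    byte[] bytes = new byte[length];
    buf.get(bytes);
    return bytes;
  }

  /** Tag, a valid app ID, then a salt field that claims Integer.MAX_VALUE bytes. */
  static ByteBuffer hostileFrame() {
    byte[] appId = "app".getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(1 + Integer.BYTES + appId.length + Integer.BYTES);
    buf.put(TAG).putInt(appId.length).put(appId).putInt(Integer.MAX_VALUE);
    buf.flip();
    return buf;
  }

  public static void main(String[] args) {
    ByteBuffer frame = hostileFrame();
    System.out.println("frame size on the wire: " + frame.remaining() + " bytes");
    frame.get();  // consume the tag
    String appId = new String(readChecked(frame), StandardCharsets.UTF_8);
    try {
      readChecked(frame);  // salt: declares about 2 GiB, but 0 bytes are present
    } catch (IndexOutOfBoundsException e) {
      System.out.println("salt rejected for " + appId + " before allocation");
    }
  }
}
```

Under these assumptions, a sender needs only a dozen bytes to request a multi-gigabyte array, which is why the check matters on a channel that has not yet been authenticated.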

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs with newly added test cases in EncodersSuite.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

@dongjoon-hyun

Member Author

Thank you, @peter-toth.

dongjoon-hyun added a commit that referenced this pull request Jun 14, 2026
…teArrays.decode`

### What changes were proposed in this pull request?

`Encoders.ByteArrays.decode()` now validates the length prefix before allocating the array. The check is `Objects.checkFromIndexSize(0, length, buf.readableBytes())`, which requires `0 <= length <= buf.readableBytes()` and throws `IndexOutOfBoundsException` otherwise. Because `readInt()` has already advanced the reader index past the prefix, `readableBytes()` equals the number of remaining payload bytes.

- Note that we use the Java built-in `Objects.checkFromIndexSize` because we cannot use **Netty's `checkReadableBytes`**. However, both methods throw `IndexOutOfBoundsException` identically for the error cases.
  - https://netty.io/4.2/api/io/netty/buffer/AbstractByteBuf.html#checkReadableBytes(int)
  - https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Objects.html#checkFromIndexSize(int,int,int)
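
The boundary behavior of the JDK check can be confirmed with a short, standalone sketch. `LengthCheckDemo` and `accepts` are hypothetical names used only for illustration; the only library call is `Objects.checkFromIndexSize`, available since Java 9.

```java
import java.util.Objects;

// Boundary behavior of Objects.checkFromIndexSize(0, length, readable).
final class LengthCheckDemo {

  /** Returns true when a declared length is acceptable for the readable byte count. */
  static boolean accepts(int length, int readable) {
    try {
      Objects.checkFromIndexSize(0, length, readable);
      return true;
    } catch (IndexOutOfBoundsException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    int readable = 8;
    System.out.println(accepts(0, readable));                  // true: empty array is allowed
    System.out.println(accepts(8, readable));                  // true: exactly the payload
    System.out.println(accepts(9, readable));                  // false: one byte too many
    System.out.println(accepts(-1, readable));                 // false: negative prefix
    System.out.println(accepts(Integer.MAX_VALUE, readable));  // false: oversized prefix
  }
}
```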

### Why are the changes needed?

`decode()` allocated `new byte[length]` from an untrusted length.
- A negative value throws an opaque `NegativeArraySizeException`.
- An oversized value can trigger `OutOfMemoryError` on a corrupt or hostile frame.

Validating first fails fast with a clear error.

Note that the `OutOfMemoryError` can occur even in a secured deployment, because the first message on a channel is parsed before the peer has been authenticated.

https://github.com/apache/spark/blob/b6450e6765a2fbe307c0b10d744da3ab4583ba44/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthRpcHandler.java#L88-L90

https://github.com/apache/spark/blob/b6450e6765a2fbe307c0b10d744da3ab4583ba44/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthMessage.java#L53-L64

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with newly added test cases in `EncodersSuite`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes #56493 from dongjoon-hyun/SPARK-57430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5c29a93)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun

Member Author

Merged to master/4.x for now because the Apache Spark 4.2.0 RC3 vote is currently in progress.

If the vote fails, I will consider cherry-picking this to branch-4.2 and older.

cc @huaxingao

@dongjoon-hyun dongjoon-hyun deleted the SPARK-57430 branch June 14, 2026 15:49
dongjoon-hyun added a commit that referenced this pull request Jun 17, 2026
…teArrays.decode`

Closes #56493 from dongjoon-hyun/SPARK-57430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5c29a93)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 704bfef)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Jun 17, 2026
…teArrays.decode`

Closes #56493 from dongjoon-hyun/SPARK-57430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5c29a93)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 704bfef)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 04ca5e6)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Jun 17, 2026
…teArrays.decode`

Closes #56493 from dongjoon-hyun/SPARK-57430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5c29a93)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 704bfef)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 04ca5e6)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 0c1c3f9)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
iemejia pushed a commit to iemejia/spark that referenced this pull request Jun 17, 2026
…teArrays.decode`

Closes apache#56493 from dongjoon-hyun/SPARK-57430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>