[SPARK-57430][CORE] Validate length before allocation in `Encoders.ByteArrays.decode` by dongjoon-hyun · Pull Request #56493 · apache/spark

dongjoon-hyun · 2026-06-14T01:52:39Z

What changes were proposed in this pull request?

Encoders.ByteArrays.decode() now validates the length prefix before allocating the array via Objects.checkFromIndexSize(0, length, buf.readableBytes()), which requires 0 <= length <= buf.readableBytes() and throws IndexOutOfBoundsException otherwise. After readInt(), the reader index is past the prefix, so readableBytes() equals the remaining payload bytes.

Note that we use the Java built-in Objects.checkFromIndexSize because we cannot use Netty's checkReadableBytes. However, both methods throw IndexOutOfBoundsException for the error cases.
- https://netty.io/4.2/api/io/netty/buffer/AbstractByteBuf.html#checkReadableBytes(int)
- https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Objects.html#checkFromIndexSize(int,int,int)
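The precondition described above can be sketched in isolation. This is a minimal illustration, not code from the patch: the class `LengthCheckDemo` and the helper `isValidLength` are hypothetical names, and `readableBytes` is a plain `int` standing in for the value of `buf.readableBytes()`.

```java
import java.util.Objects;

// Hypothetical helper class, for illustration only.
final class LengthCheckDemo {
  /**
   * Returns true when 0 <= length <= readableBytes, which is the condition
   * Objects.checkFromIndexSize(0, length, readableBytes) enforces.
   */
  public static boolean isValidLength(int length, int readableBytes) {
    try {
      // Arguments are (fromIndex, size, length-of-range); the call throws
      // IndexOutOfBoundsException when the sub-range [0, length) does not fit.
      Objects.checkFromIndexSize(0, length, readableBytes);
      return true;
    } catch (IndexOutOfBoundsException e) {
      return false;
    }
  }
}
```

Under these assumptions, a declared length of 3 with 3 readable bytes is accepted, while -1 or `Integer.MAX_VALUE` against a short buffer is rejected before any array is allocated.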

Why are the changes needed?

decode() allocated new byte[length] from an untrusted length.

A negative value throws an opaque NegativeArraySizeException.
An oversized value can trigger OutOfMemoryError on a corrupt or hostile frame.

Validating first fails fast with a clear error.
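The difference between the two behaviors can be sketched as follows. This is an approximation under stated assumptions, not the Spark source: it uses `java.nio.ByteBuffer` in place of Netty's `ByteBuf` (so `remaining()` plays the role of `readableBytes()`), and `UntrustedLengthDemo`, `decodeUnchecked`, `decodeChecked`, and `failureFor` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

// Hypothetical demo class; ByteBuffer stands in for Netty's ByteBuf.
final class UntrustedLengthDemo {
  /** Pre-fix shape: allocates directly from the untrusted length prefix. */
  public static byte[] decodeUnchecked(ByteBuffer buf) {
    int length = buf.getInt();
    // Negative length: NegativeArraySizeException. Huge length: possible OutOfMemoryError.
    byte[] bytes = new byte[length];
    buf.get(bytes);
    return bytes;
  }

  /** Post-fix shape: validates the prefix against the remaining bytes first. */
  public static byte[] decodeChecked(ByteBuffer buf) {
    int length = buf.getInt(); // position is now past the 4-byte prefix
    Objects.checkFromIndexSize(0, length, buf.remaining());
    byte[] bytes = new byte[length];
    buf.get(bytes);
    return bytes;
  }

  /** Decodes a frame holding only the given prefix; returns the exception's simple name, or "none". */
  public static String failureFor(int declaredLength, boolean checked) {
    ByteBuffer frame = ByteBuffer.allocate(4);
    frame.putInt(declaredLength);
    frame.flip();
    try {
      if (checked) {
        decodeChecked(frame);
      } else {
        decodeUnchecked(frame);
      }
      return "none";
    } catch (RuntimeException e) {
      return e.getClass().getSimpleName();
    }
  }
}
```

In this sketch, a prefix of -1 surfaces as `NegativeArraySizeException` on the unchecked path and as `IndexOutOfBoundsException` on the checked path. A prefix of `Integer.MAX_VALUE` is rejected by the checked path without attempting the allocation.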

Note that OutOfMemoryError can happen even in a secure environment at the first parse on an unauthenticated channel.

spark/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthRpcHandler.java

Lines 88 to 90 in b6450e6

    
    AuthMessage challenge;
    try {
      challenge = AuthMessage.decodeMessage(message);

spark/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthMessage.java

Lines 53 to 64 in b6450e6

    
    public static AuthMessage decodeMessage(ByteBuffer buffer) {
      ByteBuf buf = Unpooled.wrappedBuffer(buffer);
      if (buf.readByte() != TAG_BYTE) {
        throw new IllegalArgumentException("Expected ClientChallenge, received something else.");
      }
      return new AuthMessage(
        Encoders.Strings.decode(buf),  // AppID
        Encoders.ByteArrays.decode(buf),  // Salt
        Encoders.ByteArrays.decode(buf));  // Ciphertext
    }

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs with newly added test cases in EncodersSuite.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)


dongjoon-hyun · 2026-06-14T15:47:40Z

Thank you, @peter-toth .

…teArrays.decode`

Closes #56493 from dongjoon-hyun/SPARK-57430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5c29a93)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

dongjoon-hyun · 2026-06-14T15:49:27Z

Merged to master/4.x for now because the Apache Spark 4.2.0 RC3 vote is currently in progress.

If the vote fails, I will consider cherry-picking this to branch-4.2 and older.

cc @huaxingao

…teArrays.decode`

Closes #56493 from dongjoon-hyun/SPARK-57430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5c29a93)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 704bfef)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

…teArrays.decode`

Closes #56493 from dongjoon-hyun/SPARK-57430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5c29a93)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 704bfef)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 04ca5e6)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

…teArrays.decode`

Closes #56493 from dongjoon-hyun/SPARK-57430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5c29a93)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 704bfef)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 04ca5e6)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 0c1c3f9)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

…teArrays.decode`

Closes apache#56493 from dongjoon-hyun/SPARK-57430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

[SPARK-57430] Validate length before allocation in `Encoders.ByteArra…

47ce83b

…ys.decode`

peter-toth approved these changes Jun 14, 2026


dongjoon-hyun closed this in 5c29a93 Jun 14, 2026

dongjoon-hyun deleted the SPARK-57430 branch June 14, 2026 15:49

dongjoon-hyun mentioned this pull request Jun 16, 2026

[SPARK-57496][SQL][BUILD] Keep the Types Framework ops and UDF worker packages out of the published API #56551

Closed

dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-57430


2 participants
