[SPARK-57426][CORE] Validate length before allocation in Encoders.Strings.decode #56488

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-57426

@dongjoon-hyun dongjoon-hyun commented Jun 13, 2026

Member

What changes were proposed in this pull request?

Encoders.Strings.decode() now validates the length prefix before allocating the array, requiring 0 <= length <= buf.readableBytes() and throwing IndexOutOfBoundsException otherwise. After readInt(), the reader index is past the prefix, so readableBytes() equals the remaining payload bytes.
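The check described above can be sketched roughly as follows. This is a minimal illustration, not the patched Spark code: it assumes a plain `java.nio.ByteBuffer` instead of Netty's `ByteBuf` (so `remaining()` stands in for `readableBytes()`), and the class name `LengthPrefixedStrings` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical stand-in for Encoders.Strings, built on java.nio.ByteBuffer
// rather than Netty's ByteBuf so the sketch has no external dependencies.
final class LengthPrefixedStrings {
  private LengthPrefixedStrings() {}

  /** Encodes a string as a 4-byte big-endian length followed by its UTF-8 bytes. */
  static ByteBuffer encode(String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(4 + bytes.length);
    buf.putInt(bytes.length).put(bytes);
    buf.flip();
    return buf;
  }

  /** Decodes a length-prefixed string, rejecting lengths the buffer cannot back. */
  static String decode(ByteBuffer buf) {
    int length = buf.getInt();
    // After getInt() the position is past the prefix, so remaining() is the payload size.
    // Throws IndexOutOfBoundsException unless 0 <= length <= buf.remaining().
    Objects.checkFromIndexSize(0, length, buf.remaining());
    byte[] bytes = new byte[length]; // safe: length is bounded by the bytes actually present
    buf.get(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```

With this shape, `decode(encode("spark"))` round-trips, while a 4-byte buffer whose prefix claims `Integer.MAX_VALUE` bytes is rejected before any array is allocated.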

Why are the changes needed?

decode() allocated new byte[length] from an untrusted length.

  • A negative value throws an opaque NegativeArraySizeException.
  • An oversized value can trigger OutOfMemoryError on a corrupt or hostile frame.

Validating first fails fast with a clear error.
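To make the difference in failure modes concrete, here is a small sketch that contrasts allocating straight from the length with checking it first. The class and method names are hypothetical and only the negative-length case is exercised on the unchecked path, since the oversized case would actually attempt a huge allocation.

```java
import java.util.Objects;

// Hypothetical demo class; not part of Spark.
final class LengthCheckDemo {
  private LengthCheckDemo() {}

  /** Mirrors the pre-fix behaviour: allocate directly from the untrusted length. */
  static String allocateUnchecked(int length) {
    try {
      byte[] bytes = new byte[length];
      return "allocated " + bytes.length;
    } catch (NegativeArraySizeException e) {
      return e.getClass().getSimpleName();
    }
  }

  /** Validates 0 <= length <= readableBytes before allocating anything. */
  static String allocateChecked(int length, int readableBytes) {
    try {
      Objects.checkFromIndexSize(0, length, readableBytes);
      return "allocated " + new byte[length].length;
    } catch (IndexOutOfBoundsException e) {
      return e.getClass().getSimpleName();
    }
  }
}
```

Here `allocateUnchecked(-1)` reports `NegativeArraySizeException`, whereas `allocateChecked(-1, 8)` and `allocateChecked(Integer.MAX_VALUE, 8)` both report `IndexOutOfBoundsException`, and `allocateChecked(3, 8)` allocates normally.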

Note that OutOfMemoryError can happen even in a secured environment at the first parse on an unauthenticated channel.

SaslRpcHandler.java (lines 72-80):

    public boolean doAuthChallenge(
        TransportClient client,
        ByteBuffer message,
        RpcResponseCallback callback) {
      if (saslServer == null || !saslServer.isComplete()) {
        ByteBuf nettyBuf = Unpooled.wrappedBuffer(message);
        SaslMessage saslMessage;
        try {
          saslMessage = SaslMessage.decode(nettyBuf);

SaslMessage.java (lines 68-74):

    public static SaslMessage decode(ByteBuf buf) {
      if (buf.readByte() != TAG_BYTE) {
        throw new IllegalStateException("Expected SaslMessage, received something else"
          + " (maybe your client does not have SASL enabled?)");
      }
      String appId = Encoders.Strings.decode(buf);
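The two excerpts above show why this matters before authentication completes: the first bytes a peer sends are parsed as a tag followed by a length-prefixed string. A rough sketch of such a frame, assuming `java.nio.ByteBuffer` in place of Netty's `ByteBuf` and a made-up tag value (the real `TAG_BYTE` is not shown in this thread), might look like this:

```java
import java.nio.ByteBuffer;
import java.util.Objects;

// Hypothetical demo class; the tag value below is an assumption for illustration only.
final class HostileFrameDemo {
  private HostileFrameDemo() {}

  static final byte TAG = (byte) 0xEA; // assumed placeholder, not taken from the PR

  /** Builds a 5-byte frame: one tag byte plus a length prefix claiming ~2 GiB of payload. */
  static ByteBuffer hostileFrame() {
    ByteBuffer frame = ByteBuffer.allocate(5);
    frame.put(TAG).putInt(Integer.MAX_VALUE);
    frame.flip();
    return frame;
  }

  /** Reads the tag and the length prefix, then validates the length before any allocation. */
  static String parseAppIdLength(ByteBuffer frame) {
    if (frame.get() != TAG) {
      throw new IllegalStateException("Expected tagged message");
    }
    int length = frame.getInt();
    try {
      Objects.checkFromIndexSize(0, length, frame.remaining());
      return "ok";
    } catch (IndexOutOfBoundsException e) {
      return "rejected length " + length + " with " + frame.remaining() + " readable bytes";
    }
  }
}
```

Calling `parseAppIdLength(hostileFrame())` returns `rejected length 2147483647 with 0 readable bytes`: a 5-byte message is enough to request a 2 GiB array if the length is not checked first.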

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs with newly added test cases.

Without this PR, testStringsDecodeShouldFailWhenLengthExceedsReadableBytes fails with OutOfMemoryError.

[info] Test org.apache.spark.network.protocol.EncodersSuite#testStringsDecodeShouldFailWhenLengthExceedsReadableBytes() started
[0.473s][warning][oom,vendor] java.lang.OutOfMemoryError occurred: Requested array size exceeds VM limit
[error] java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

@dongjoon-hyun

Member Author

cc @huaxingao, FYI.

@dongjoon-hyun

Member Author

core module passed already.


@huaxingao huaxingao left a comment

Contributor

LGTM

@dongjoon-hyun

Member Author

Thank you, @huaxingao.

dongjoon-hyun added a commit that referenced this pull request Jun 13, 2026
…rings.decode`

### What changes were proposed in this pull request?

`Encoders.Strings.decode()` now validates the length prefix before allocating the array, requiring `0 <= length <= buf.readableBytes()` and throwing `IndexOutOfBoundsException` otherwise. After `readInt()`, the reader index is past the prefix, so `readableBytes()` equals the remaining payload bytes.

- Note that we use the Java built-in `Objects.checkFromIndexSize` because we cannot use **Netty's `checkReadableBytes`**, which is a protected method of `AbstractByteBuf`. However, both methods throw `IndexOutOfBoundsException` identically for the error cases.
  - https://netty.io/4.2/api/io/netty/buffer/AbstractByteBuf.html#checkReadableBytes(int)
  - https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Objects.html#checkFromIndexSize(int,int,int)

### Why are the changes needed?

`decode()` allocated `new byte[length]` from an untrusted length.
- A negative value throws an opaque `NegativeArraySizeException`.
- An oversized value can trigger `OutOfMemoryError` on a corrupt or hostile frame.

Validating first fails fast with a clear error.

Note that `OutOfMemoryError` can happen even in a secured environment at the first parse on an unauthenticated channel.

https://github.com/apache/spark/blob/67eafbddf7962185b2399b97241a008cf8d0d333/common/network-common/src/main/java/org/apache/spark/network/sasl/SaslRpcHandler.java#L72-L80

https://github.com/apache/spark/blob/67eafbddf7962185b2399b97241a008cf8d0d333/common/network-common/src/main/java/org/apache/spark/network/sasl/SaslMessage.java#L68-L74

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with newly added test cases.

Without this PR, `testStringsDecodeShouldFailWhenLengthExceedsReadableBytes` fails with `OutOfMemoryError`.
```
[info] Test org.apache.spark.network.protocol.EncodersSuite#testStringsDecodeShouldFailWhenLengthExceedsReadableBytes() started
[0.473s][warning][oom,vendor] java.lang.OutOfMemoryError occurred: Requested array size exceeds VM limit
[error] java.lang.OutOfMemoryError: Requested array size exceeds VM limit

```

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes #56488 from dongjoon-hyun/SPARK-57426.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 768569d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun

Member Author

Merged to master/4.x for now because the Apache Spark 4.2.0 RC3 vote is currently in progress.

If the vote fails, I will consider cherry-picking this to branch-4.2 and older.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-57426 branch June 14, 2026 00:00
dongjoon-hyun added a commit that referenced this pull request Jun 17, 2026
…rings.decode`

Closes #56488 from dongjoon-hyun/SPARK-57426.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 768569d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a90d3a9)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Jun 17, 2026
…rings.decode`

Closes #56488 from dongjoon-hyun/SPARK-57426.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 768569d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Jun 17, 2026
…rings.decode`

Closes #56488 from dongjoon-hyun/SPARK-57426.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 768569d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 7070360)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
iemejia pushed a commit to iemejia/spark that referenced this pull request Jun 17, 2026
…rings.decode`

Closes apache#56488 from dongjoon-hyun/SPARK-57426.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>