
ARROW-9861: [Java] Support big-endian in DecimalVector #8056

Closed
kiszk wants to merge 6 commits into apache:master from kiszk:ARROW-9861

kiszk commented Aug 26, 2020

Member

This PR fixes failures in DecimalVectorTest on big-endian platforms.

kiszk changed the title from "ARROW-9861: [java] Support big-endian in DecimalVector" to "ARROW-9861: [Java] Support big-endian in DecimalVector" on Aug 26, 2020

@emkornfield

Contributor

@kiszk you'll probably need to make some fixes for Decimal256 as well, I think.

@BryanCutler you mentioned you could devote some time to reviewing Big Endian changes? Would you mind taking a look through this one and @kiszk's other Java changes?

@BryanCutler

Member

Sure, I can take a look. It might be a day or two before I can, though, @kiszk.

kiszk commented Oct 28, 2020

Member Author

@emkornfield Sure, I will work on supporting Decimal256.

BryanCutler left a comment

Member

LGTM, just a couple minor things

Member

This seems a bit odd that it's writing long values at 2 different indices, but I guess that was here before. Do you know what it's trying to do here?

Member

Hmm, I guess it's to write a long value to be used as BigDecimal? So it writes the long in 8 bytes and then pads the remaining 8 bytes. It would be nice if the doc was a little better, but no big deal.

Member Author

This is used for DecimalVector, which is a fixed-width (16-byte) vector. This routine extends the sign bit into a second long. I will document this in a comment.
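For readers following the thread, the sign extension described here can be sketched in plain Java. This is only an illustration under stated assumptions: the class `LongToDecimalSketch` and its `encode` method are hypothetical names, and a heap `ByteBuffer` stands in for the Arrow buffer that the real code writes to. The point is that the value and its sign-extension long swap places depending on byte order.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch, not Arrow's actual implementation. */
final class LongToDecimalSketch {
  /** Width in bytes of one 128-bit decimal entry. */
  static final int TYPE_WIDTH = 16;

  /** Encodes value as a 16-byte two's-complement integer in the given byte order. */
  static byte[] encode(long value, ByteOrder order) {
    ByteBuffer buf = ByteBuffer.allocate(TYPE_WIDTH).order(order);
    // Sign extension: all ones for a negative value, all zeros otherwise.
    long padValue = value < 0 ? -1L : 0L;
    if (order == ByteOrder.LITTLE_ENDIAN) {
      buf.putLong(0, value);               // least significant 8 bytes come first
      buf.putLong(Long.BYTES, padValue);   // sign extension fills the upper 8 bytes
    } else {
      buf.putLong(0, padValue);            // sign extension comes first
      buf.putLong(Long.BYTES, value);      // least significant 8 bytes come last
    }
    return buf.array();
  }

  public static void main(String[] args) {
    byte[] be = encode(-2L, ByteOrder.BIG_ENDIAN);
    // BigInteger(byte[]) reads big-endian two's complement, so it can check the result.
    System.out.println(new BigInteger(be));
  }
}
```

This also explains the two long writes at different indices that the review asked about: one carries the value, the other carries the padding.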

Member Author

In addition, this PR will support writing it to a 256-bit entry in the future.

Member

why not return if length == TYPE_WIDTH?

Member Author

If we add "if (length == TYPE_WIDTH) return" before line 244, no data is copied from value to outAddress. I will add the comment "copy data from value to outAddress" after line 244.

Member

ok, I see above it is already set during the swap. I guess there is no harm in calling PlatformDependent.setMemory(outAddress, DecimalVector.TYPE_WIDTH - length, pad) if length == TYPE_WIDTH
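To make the copy-and-pad discussion concrete, here is a minimal sketch of writing a big-endian byte array into a fixed-width entry for either byte order. It assumes plain heap arrays in place of `outAddress` and `PlatformDependent.setMemory`; the class `DecimalBytesSketch` and the method `toFixedWidth` are hypothetical names, not the code under review. When the value already fills the entry, the padding range is empty and the fill call does nothing, which matches the observation that padding with a zero length is harmless.

```java
import java.math.BigInteger;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical sketch, not Arrow's actual implementation. */
final class DecimalBytesSketch {
  /** Width in bytes of one 128-bit decimal entry. */
  static final int TYPE_WIDTH = 16;

  /** Lays out an unscaled decimal value as a fixed-width entry in the given byte order. */
  static byte[] toFixedWidth(BigInteger unscaled, ByteOrder order) {
    // toByteArray() yields the minimal big-endian two's-complement representation.
    byte[] value = unscaled.toByteArray();
    int length = value.length;
    if (length > TYPE_WIDTH) {
      throw new IllegalArgumentException("value needs " + length + " bytes, entry holds " + TYPE_WIDTH);
    }
    // Pad with 0xFF for negative values and 0x00 otherwise (sign extension).
    byte pad = (byte) (unscaled.signum() < 0 ? 0xFF : 0x00);
    byte[] out = new byte[TYPE_WIDTH];
    if (order == ByteOrder.LITTLE_ENDIAN) {
      // Reverse the bytes so the least significant byte lands at index 0.
      for (int i = 0; i < length; i++) {
        out[i] = value[length - 1 - i];
      }
      // Empty range (a no-op) when length == TYPE_WIDTH.
      Arrays.fill(out, length, TYPE_WIDTH, pad);
    } else {
      // Padding goes first, then the bytes are copied unchanged to the end.
      Arrays.fill(out, 0, TYPE_WIDTH - length, pad);
      System.arraycopy(value, 0, out, TYPE_WIDTH - length, length);
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] be = toFixedWidth(BigInteger.valueOf(-1), ByteOrder.BIG_ENDIAN);
    System.out.println(Arrays.toString(be));
  }
}
```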

Member

This should probably be in a Preconditions.checkArgument, but we are not trying to change things in this PR so don't need to do that here

Member

you could move padBytes out of the if statements

Member

I don't think you need 2 loops here, just move the Integer.MAX_VALUE and MIN_VALUE here

Member Author

Sure, I will drop lines 49-57.

kiszk commented Nov 2, 2020

Member Author

I will update the Decimal256Vector class later today or tomorrow.

BryanCutler left a comment

Member

Looks good, just a few minor nits

Member

typo: DeciimalUtility -> DecimalUtility

Member

Could you use DecimalVector.TYPE_WIDTH here?

Member

Same here

BryanCutler left a comment

Member

LGTM. Looks like a flaky test, will try again.

@BryanCutler

Member

The test error with Java JNI looks unrelated and seems to be an env issue with ORC, I'll go ahead with merging this.

@BryanCutler

Member

Merged to master, thanks @kiszk!

kiszk commented Nov 5, 2020

Member Author

@BryanCutler Thank you. One comment.
According to the documentation, benchmarking is necessary before merging features (though I think the test code does not affect performance):

Benchmarks for performance critical parts of the code to demonstrate no regression.

I am working on a benchmark bot for Java here. It would be good to merge new features after this bot becomes available.

cc @emkornfield

@BryanCutler

Member

I did not see anything that looks like it would affect performance here, but I agree we should get some benchmarks going to be sure. I will look at your other PR next.

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
This PR fixes failures in DecimalVectorTest on a big-endian platform

Closes apache#8056 from kiszk/ARROW-9861

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>