ARROW-2135: [Python] Fix NaN conversion when casting from Numpy array #1681

Closed
pitrou wants to merge 1 commit into apache:master from pitrou:ARROW-2135-nan-conversion-when-casting

pitrou commented Feb 28, 2018

Member

No description provided.

Comment thread cpp/src/arrow/python/numpy-internal.h Outdated

Member Author

For some reason (macro expansion?) these #ifs wouldn't work correctly here, even though NPY_INT64 is defined to NPY_LONG.

Member Author

Hmm, actually, that must be because NPY_LONGLONG is not a macro...
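
The observation above can be illustrated in isolation. In an #if expression, the preprocessor replaces every identifier that is not a macro with 0, so two enumerators always compare equal there, whatever their real values. The following is a minimal sketch, assuming (as the comment suggests) that NumPy's type numbers are enumerators; DemoTypes, DEMO_LONG, DEMO_LONGLONG and DEMO_INT64 are hypothetical stand-ins, not NumPy names.

```cpp
// Hypothetical stand-ins for type numbers: enumerators, not macros.
enum DemoTypes { DEMO_LONG = 7, DEMO_LONGLONG = 9 };

// A macro alias, analogous to NPY_INT64 being defined to NPY_LONG.
#define DEMO_INT64 DEMO_LONG

// The preprocessor expands DEMO_INT64 to DEMO_LONG, then replaces the
// remaining identifiers with 0, so the condition becomes 0 == 0.
#if DEMO_INT64 == DEMO_LONGLONG
constexpr bool kPreprocessorSaysEqual = true;
#else
constexpr bool kPreprocessorSaysEqual = false;
#endif

// The compiler proper sees the real enumerator values (7 and 9).
constexpr bool kCompilerSaysEqual = (DEMO_INT64 == DEMO_LONGLONG);

static_assert(kPreprocessorSaysEqual, "#if treats both enumerators as 0");
static_assert(!kCompilerSaysEqual, "the enumerators actually differ");
```

A comparison like this has to be done by the compiler (for example through template specialization or a constant expression) rather than by #if.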

pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch 2 times, most recently from 6cbf133 to d602be7, February 28, 2018 19:13
pitrou closed this Feb 28, 2018
pitrou reopened this Feb 28, 2018
pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch 3 times, most recently from 84766bc to cd37393, February 28, 2018 20:01
Comment thread cpp/src/arrow/python/numpy_to_arrow.cc Outdated

Contributor

std::fill(null_bitmap_data_, null_bitmap_data_ + null_bytes, 0) is a bit more idiomatic.

Member Author

Hmm, perhaps. This is really a copy/paste of NumPyConverter::InitNullBitmap()...

Contributor

Possibly time for a subclass then?
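
For reference, the std::fill form suggested earlier in this thread might look like the sketch below. The diff under review is not shown in the conversation, so this is only an illustration: BytesForBits and MakeAllNullBitmap are hypothetical helpers, not Arrow functions, and the local variable names merely echo those in the comment.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bytes needed to hold `length` validity bits.
inline int64_t BytesForBits(int64_t length) { return (length + 7) / 8; }

// Hypothetical helper: build a validity bitmap with every slot marked null.
// In Arrow's validity bitmaps a 0 bit means null and a 1 bit means valid.
inline std::vector<uint8_t> MakeAllNullBitmap(int64_t length) {
  const int64_t null_bytes = BytesForBits(length);
  // Start from 0xFF to stand in for a freshly allocated, unzeroed buffer.
  std::vector<uint8_t> bitmap(static_cast<std::size_t>(null_bytes), 0xFF);
  uint8_t* null_bitmap_data = bitmap.data();
  // Zero the half-open range [data, data + null_bytes).
  std::fill(null_bitmap_data, null_bitmap_data + null_bytes, uint8_t{0});
  return bitmap;
}
```

Calling MakeAllNullBitmap(10) yields two zero bytes, since ten bits need two bytes of storage.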

Contributor

Is there already a test for things like a = [1.0, 2.0, 3.1, np.nan] where a user passes in an integer type?

Member Author

You mean for the truncation behavior? Let me look.

Member Author

No, I don't think so. I'm not sure we specify the truncation mode anywhere either?

Contributor

It looks like it's a hard cast:

In [7]: pa.array([1, 2, 3.190, np.nan], type=pa.int64())
Out[7]:
<pyarrow.lib.Int64Array object at 0x7f537e42dd68>
[
  1,
  2,
  3,
  NA
]
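
The behavior in this output (fractional part dropped, NaN turned into a null) can be mimicked in a few lines of C++. This is a sketch only, not Arrow's conversion code: Int64Column and CastToInt64 are hypothetical names, and values outside the int64 range (including infinities) are deliberately not handled.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical result type: values plus a parallel validity flag per slot.
struct Int64Column {
  std::vector<int64_t> values;
  std::vector<bool> valid;
};

// Hypothetical helper: hard-cast doubles to int64, mapping NaN to null.
// Assumes every non-NaN input is finite and fits in int64_t.
inline Int64Column CastToInt64(const std::vector<double>& input) {
  Int64Column out;
  out.values.reserve(input.size());
  out.valid.reserve(input.size());
  for (double v : input) {
    if (std::isnan(v)) {
      out.values.push_back(0);   // placeholder; the slot is null
      out.valid.push_back(false);
    } else {
      // static_cast truncates toward zero, so 3.19 becomes 3.
      out.values.push_back(static_cast<int64_t>(v));
      out.valid.push_back(true);
    }
  }
  return out;
}
```

Applied to {1, 2, 3.19, NaN}, the sketch gives the values 1, 2, 3 and a null in the last slot, matching the shape of the result above.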

Contributor

That's fine. Was just wondering.

Comment thread cpp/src/arrow/python/type_traits.h Outdated

Contributor

inline is redundant here: http://en.cppreference.com/w/cpp/language/inline.

A function defined entirely inside a class/struct/union definition, whether it's a member function or a non-member friend function, is implicitly an inline function.
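
To make the quoted rule concrete, here is a small sketch. DemoDoubleTraits and IsNullFreeFunction are hypothetical names chosen for illustration; the actual contents of type_traits.h are not shown in this conversation.

```cpp
#include <cmath>

// Hypothetical traits struct, loosely modelled on the pattern under review.
struct DemoDoubleTraits {
  // Defined inside the class body, so it is implicitly inline:
  // adding the keyword here would change nothing.
  static bool isnull(double v) { return std::isnan(v); }
};

// A namespace-scope function defined in a header is different: it needs
// `inline` to avoid multiple-definition errors when the header is
// included from several translation units.
inline bool IsNullFreeFunction(double v) { return DemoDoubleTraits::isnull(v); }
```

Both functions behave identically at the call site; the difference only matters for linkage when the definitions live in a header.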

Member Author

I see. This is really using the same convention as the rest of the file, though.

Contributor

Hm, so that's also called isnull. Shouldn't that mean v == Py_None?

Contributor

Probably needs a test as well since it isn't failing.

Member Author

Nice catch :-) I'm not sure how to test it. Defining isnull is necessary for compiling, but that path isn't taken at runtime as object arrays are handled separately.

Comment thread cpp/src/arrow/python/numpy_to_arrow.cc Outdated

Contributor

At some point we may want an STL-compatible view class that makes working with iterator-based constructs in the STL much easier. We have a lot of code that manually handles iteration using a size/count and a buffer.

Member Author

Which iterators are you thinking about? Do you mean the ndarray 1d iterator?

Contributor

That's one, though I added begin()/end() for that in #1651.
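
As a rough idea of what such an STL-compatible view could look like, here is a minimal sketch. BufferView and CountNaNs are hypothetical names for illustration only; they are not taken from Arrow or from #1651.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical non-owning view over a contiguous buffer (pointer + length).
template <typename T>
class BufferView {
 public:
  BufferView(const T* data, int64_t length) : data_(data), length_(length) {}

  // begin()/end() make the view usable with range-for and <algorithm>.
  const T* begin() const { return data_; }
  const T* end() const { return data_ + length_; }
  int64_t size() const { return length_; }

 private:
  const T* data_;
  int64_t length_;
};

// With the view in place, a manual index loop becomes a standard algorithm.
inline int64_t CountNaNs(const BufferView<double>& view) {
  return static_cast<int64_t>(std::count_if(
      view.begin(), view.end(), [](double v) { return std::isnan(v); }));
}
```

The view does not own its memory, so the underlying buffer must outlive it.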

pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch 3 times, most recently from 73916de to bb56637, March 1, 2018 09:56
Comment thread cpp/src/arrow/python/numpy_to_arrow.cc Outdated

Member Author

By the way, I don't know what that is, but this is required to have the tests pass. Why do we always treat NaT as null but not floating-point NaN? @wesm

Contributor

AFAIU, there's no way to interpret NaT other than as NULL (unless there's a standard that defines it as something other than "missing"). NaN is part of the IEEE floating-point specification (as I'm sure you know) and has a different meaning than null.
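
The distinction can be sketched as follows. NumPy stores datetime64 values as 64-bit integers and reserves the minimum value as the NaT sentinel; kNaT below mirrors that convention. IsNullDatetime, IsNullDouble and the nan_as_null flag are hypothetical, used only to show that NaT handling is unconditional while NaN handling is a policy choice.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// NaT sentinel: the minimum int64 value, following NumPy's convention.
constexpr int64_t kNaT = std::numeric_limits<int64_t>::min();

// NaT has no meaning other than "missing", so it always maps to null.
inline bool IsNullDatetime(int64_t v) { return v == kNaT; }

// NaN is a legitimate IEEE 754 value; whether it counts as null is a
// policy decision, modelled here with a hypothetical flag.
inline bool IsNullDouble(double v, bool nan_as_null) {
  return nan_as_null && std::isnan(v);
}
```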

pitrou commented Mar 1, 2018

Member Author

I addressed some review comments now.

pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch from bb56637 to 375418f, March 1, 2018 12:23
pitrou commented Mar 1, 2018

Member Author

pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch from 375418f to 0af573b, March 5, 2018 11:41
pitrou force-pushed the ARROW-2135-nan-conversion-when-casting branch from 0af573b to 939428d, March 8, 2018 12:33
pitrou commented Mar 8, 2018

Member Author

Rebased.

pitrou commented Mar 8, 2018

Member Author

AppVeyor at https://ci.appveyor.com/project/pitrou/arrow/build/1.0.175

wesm left a comment

Member

+1, thanks for cleaning up the int/uint size issues here, much cleaner now

wesm closed this in 171340f Mar 12, 2018
pitrou deleted the ARROW-2135-nan-conversion-when-casting branch March 12, 2018 19:04
wesm commented Mar 12, 2018

Member

see ARROW-2298 for adding an option about NaN conversions
