Skip to content

[SPARK-19871] [PySpark][SQL] Improve error message in verify_type to indicate the field#17213

Closed
blrnw3 wants to merge 1 commit into
apache:masterfrom
blrnw3:fix/verify_type_err_msg
Closed

[SPARK-19871] [PySpark][SQL] Improve error message in verify_type to indicate the field#17213
blrnw3 wants to merge 1 commit into
apache:masterfrom
blrnw3:fix/verify_type_err_msg

Conversation

@blrnw3

@blrnw3 blrnw3 commented Mar 8, 2017

Copy link
Copy Markdown

https://issues.apache.org/jira/browse/SPARK-19871

What changes were proposed in this pull request?

Improve error message in verify_type to indicate the field which is responsible for the verification error. This is incredibly useful for tracking down type/nullability errors.

Sample changes:

Before: This field is not nullable, but got None
After: This field (my_column, of type BooleanType) is not nullable, but got None

Before: FloatType can not accept object True in type bool
After: FloatType can not accept object True in type bool for field my_column

How was this patch tested?

Unit tests pass

Please review http://spark.apache.org/contributing.html before opening a pull request.

@blrnw3 blrnw3 changed the title [SPARK-13740] [PySpark][SQL] Improve error message in verify_type to indicate the field [SPARK-19871] [PySpark][SQL] Improve error message in verify_type to indicate the field Mar 8, 2017
@AmplabJenkins

Copy link
Copy Markdown

Can one of the admins verify this patch?



def _verify_type(obj, dataType, nullable=True):
def _verify_type(obj, dataType, nullable=True, name=None):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a doctest below.

@HyukjinKwon

HyukjinKwon commented Mar 9, 2017

Copy link
Copy Markdown
Member

I think we should run ./dev/lint-python. It seems some lines do not comply pep8 here. As a bonus, we could add before/after error messages in the PR description maybe.

@HyukjinKwon

Copy link
Copy Markdown
Member

cc @dgingrich who I guess the reporter of SPARK-19507 - what do you think about this?

@blrnw3

blrnw3 commented Mar 9, 2017

Copy link
Copy Markdown
Author

Ah yes it seems SPARK-19507 is a duplicate. My mistake. In any case I'm sure this PR satisfies that ticket

@blrnw3

blrnw3 commented Mar 9, 2017

Copy link
Copy Markdown
Author

Will add doctest and check linting.
Have added before/after error messages to the PR description

@dgingrich dgingrich left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, and basically the same as what I did. I had some additional handling for nested structs and map/array elements which I think would be good to add. Here's my PR (incomplete, tests not done: #17227). I'm fine with either one being merged as long as I get better debug messages in the next release :) . Let me know what you want me to do.

if isinstance(obj, dict):
for f in dataType.fields:
_verify_type(obj.get(f.name), f.dataType, f.nullable)
_verify_type(obj.get(f.name), f.dataType, f.nullable, f.name)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work that well for nested structs:

MySubType = StructType([StructField('value', StringType(), nullable=False)])
MyType = StructType([
    StructField('one', MySubType),
    StructField('two', MySubType)])

_verify_type({'one': {'value': 'good'}, 'two': {'value': None}}, MyType)
# "This field (value, of type StringType) is not nullable, but got None"
# But is it one.value or two.value?

_verify_type(k, dataType.keyType, False)
_verify_type(v, dataType.valueType, dataType.valueContainsNull)
_verify_type(k, dataType.keyType, False, name)
_verify_type(v, dataType.valueType, dataType.valueContainsNull, name)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also want to flag individual array/map elements.

@blrnw3

blrnw3 commented Mar 10, 2017

Copy link
Copy Markdown
Author

Yours looks more complete so I'm willing to close this if you finish it

@HyukjinKwon

HyukjinKwon commented Mar 10, 2017

Copy link
Copy Markdown
Member

Let me then resolve this JIRA as a duplicate and reopen @dgingrich's one as soon as this one is closed.

@blrnw3 blrnw3 closed this Mar 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants