Skip to content

GH-36323: [Python] Fix Timestamp scalar repr error on values outside datetime range#36877

Closed
aboss123 wants to merge 3 commits into
apache:mainfrom
aboss123:main
Closed

GH-36323: [Python] Fix Timestamp scalar repr error on values outside datetime range#36877
aboss123 wants to merge 3 commits into
apache:mainfrom
aboss123:main

Conversation

@aboss123

@aboss123 aboss123 commented Jul 25, 2023

Copy link
Copy Markdown
Contributor

Rationale for this change

#36323

What changes are included in this PR?

Changed the way repr is handled for TimestampScalar

Are these changes tested?

I have added a very basic test for this change to see whether it will error or not if outside the range.

Are there any user-facing changes?

The functionality of TimestampScalar's repr now uses the strftime function.

@kou kou changed the title MINOR: [Python] Fix Timestamp scalar repr error on values outside datetime range GH-36323: [Python] Fix Timestamp scalar repr error on values outside datetime range Jul 26, 2023
Comment thread python/pyarrow/scalar.pxi Outdated
with nogil:
check_status(self.wrapped.get().Validate())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment thread python/pyarrow/scalar.pxi Outdated
# retrieve the scalar from the first position
scalar = GetResultValue(array.get().GetScalar(0))
return Scalar.wrap(scalar)
return Scalar.wrap(scalar) No newline at end of file

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you revert this change too?

@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Jul 26, 2023

@raulcd raulcd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be some linting issues and the following test is failing:

__________________ test_sequence_timestamp_from_int_with_unit __________________

    @pytest.mark.pandas
    def test_sequence_timestamp_from_int_with_unit():
        # TODO(wesm): This test might be rewritten to assert the actual behavior
        # when pandas is not installed
    
        data = [1]
    
        s = pa.timestamp('s')
        ms = pa.timestamp('ms')
        us = pa.timestamp('us')
        ns = pa.timestamp('ns')
    
        arr_s = pa.array(data, type=s)
        assert len(arr_s) == 1
        assert arr_s.type == s
>       assert repr(arr_s[0]) == (
            "<pyarrow.TimestampScalar: datetime.datetime(1970, 1, 1, 0, 0, 1)>"
        )
E       assert "<pyarrow.Tim...01T00:00:01'>" == '<pyarrow.Tim... 1, 0, 0, 1)>'
E         - <pyarrow.TimestampScalar: datetime.datetime(1970, 1, 1, 0, 0, 1)>
E         + <pyarrow.TimestampScalar: '1970-01-01T00:00:01'>

@aboss123

Copy link
Copy Markdown
Contributor Author

There seems to be some linting issues and the following test is failing:

__________________ test_sequence_timestamp_from_int_with_unit __________________

    @pytest.mark.pandas
    def test_sequence_timestamp_from_int_with_unit():
        # TODO(wesm): This test might be rewritten to assert the actual behavior
        # when pandas is not installed
    
        data = [1]
    
        s = pa.timestamp('s')
        ms = pa.timestamp('ms')
        us = pa.timestamp('us')
        ns = pa.timestamp('ns')
    
        arr_s = pa.array(data, type=s)
        assert len(arr_s) == 1
        assert arr_s.type == s
>       assert repr(arr_s[0]) == (
            "<pyarrow.TimestampScalar: datetime.datetime(1970, 1, 1, 0, 0, 1)>"
        )
E       assert "<pyarrow.Tim...01T00:00:01'>" == '<pyarrow.Tim... 1, 0, 0, 1)>'
E         - <pyarrow.TimestampScalar: datetime.datetime(1970, 1, 1, 0, 0, 1)>
E         + <pyarrow.TimestampScalar: '1970-01-01T00:00:01'>

Regarding this error I have a question. The docs appear to want the repr of the TimestampScalar to show datetime but perhaps it could display in a different format? As the initial issue mentions, values that are put in outside the range of the datetime values could break the repr functionality so would it be wise to change docs to show strftime output versus the datetime one?

@danepitkin danepitkin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! We should update documentation as well to reflect this change.

Comment thread python/pyarrow/scalar.pxi Outdated

def __repr__(self):
if isinstance(self, TimestampScalar):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking for TimestampScalar type in the base class, let's override __repr__ in the cdef class TimestampScalar.

Here is the implementation of Array.__repr__, which is mentioned in the issue linked. https://github.com/apache/arrow/blob/main/python/pyarrow/array.pxi#L1222

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! I can make that change. How would you specify the time format exactly. It mentions the default here but the tests specify <pyarrow.TimestampScalar: datetime.datetime(2012, 1, 1, 0, 0, tzinfo=<UTC>)>. This has timezone info would you like the format to include %z & %Z modifiers to view this?

@danepitkin danepitkin Jul 27, 2023

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Yes, let's include the timezone when supplied. Let's try and follow the ISO 8601 standard https://en.wikipedia.org/wiki/ISO_8601 and use %z. Maybe something like %Y-%m-%dT%H:%M:%S%z e.g.

>>> res = pa.scalar("2000-01-01").cast(pa.timestamp("s"))
>>> res_with_tz = pc.assume_timezone(res, "America/New_York")
>>> str(pc.strftime(res_with_tz, format='%Y-%m-%dT%H:%M:%S%z'))
'2000-01-01T00:00:00-0500'

@github-actions github-actions Bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 28, 2023
@aboss123

Copy link
Copy Markdown
Contributor Author

Given the comments I've made some changes. Please let me know if there is anything else needed.

@kou

kou commented Jul 30, 2023

Copy link
Copy Markdown
Member

@github-actions crossbow submit -g python

@github-actions

Copy link
Copy Markdown
Command '['git', '-C', '/tmp/tmpryb2in98/arrow', 'fetch', 'origin', 'pull/36877/head:main']' returned non-zero exit status 128.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/5708248418

@aboss123

Copy link
Copy Markdown
Contributor Author

@github-actions crossbow submit python

Command '['git', '-C', '/tmp/tmpryb2in98/arrow', 'fetch', 'origin', 'pull/36877/head:main']' returned non-zero exit status 128.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/5708248418

Is there something that needs to be changed considering this did not exit properly?

@github-actions

Copy link
Copy Markdown
Only contributors can submit requests to this bot. Please ask someone from the community for help with getting the first commit in.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/5709570640

@kou

kou commented Jul 31, 2023

Copy link
Copy Markdown
Member

Ah, could you re-submit this PR from a feature branch instead of main?
Our CI tool (Crossbow doesn't support main branch on fork.)

@aboss123

Copy link
Copy Markdown
Contributor Author

Ah, could you re-submit this PR from a feature branch instead of main? Our CI tool (Crossbow doesn't support main branch on fork.)

I have created a new PR here: #36942

@kou

kou commented Jul 31, 2023

Copy link
Copy Markdown
Member

I close this in favor of #36942.

@kou kou closed this Jul 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Python] Repr of Timestamp scalars error if they are outside range of stdlib datetime.datetime

4 participants