
GH-45295: [Python][CI] Make download_tzdata_on_windows more robust and use tzdata package for tzinfo database on Windows for ORC #45425

Merged
amoeba merged 15 commits into apache:main from amoeba:fix/GH-45295--python-wheel-tzdata
Feb 11, 2025

Conversation

@amoeba

@amoeba amoeba commented Feb 5, 2025

Member

Rationale for this change

We have two Windows issues and this PR is addressing both:

  1. PyArrow's download_tzdata_on_windows can fail due to TLS issues in certain CI environments.
  2. The Python wheel test infrastructure needs a tzinfo database for ORC and the automation fetching that started failing because the URL was made invalid upstream.

These two issues are separate; they are being solved in one PR simply because they appeared together during the 19.0.1 release process.

What changes are included in this PR?

  1. Makes download_tzdata_on_windows more robust to TLS errors by attempting to use requests if it's available and falling back to urllib otherwise.
  2. Switches our Windows wheel test infrastructure to grab a tzinfo database from the tzdata package on PyPI instead of from a mirror URL. This should be much more stable for us over time.
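
For illustration, the requests-then-urllib fallback described in item 1 could look roughly like the sketch below. This is not the code merged in this PR: `fetch_bytes` and its `timeout` parameter are hypothetical names, and the only behavior taken from the description is "use requests if it's available, otherwise urllib".

```python
import urllib.request


def fetch_bytes(url, timeout=30):
    """Download `url` and return the body as bytes.

    Hypothetical helper: prefers `requests` when it is importable and
    falls back to the standard library's urllib otherwise.
    """
    try:
        import requests  # optional dependency, may not be installed
    except ImportError:
        requests = None

    if requests is not None:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # surface HTTP errors as exceptions
        return response.content

    # Fallback path: standard library only.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

One reason this kind of fallback may help with TLS failures is that requests verifies certificates against the CA bundle from certifi, which can differ from the certificate store available to urllib in a given CI image.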

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the awaiting review Awaiting review label Feb 5, 2025
@github-actions

github-actions Bot commented Feb 5, 2025

⚠️ GitHub issue #45295 has been automatically assigned in GitHub to PR creator.

@amoeba

amoeba commented Feb 5, 2025

Member Author

@github-actions crossbow submit wheel-windows-cp39-amd64

@github-actions

github-actions Bot commented Feb 5, 2025

Unable to match any tasks for `wheel-windows-cp39-amd64`
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/13149682761

Comment thread ci/scripts/python_wheel_windows_test.bat Outdated
@github-actions github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 5, 2025
Comment thread ci/scripts/python_wheel_windows_test.bat Outdated
@amoeba

amoeba commented Feb 5, 2025

Member Author

@kou is there a way to test this PR?

Comment thread ci/scripts/python_wheel_windows_test.bat Outdated
@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Feb 5, 2025
@kou

kou commented Feb 5, 2025

Member

@github-actions crossbow submit wheel-windows-cp39-amd64

@github-actions

github-actions Bot commented Feb 5, 2025

Unable to match any tasks for `wheel-windows-cp39-amd64`
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/13149993032

@kou

kou commented Feb 5, 2025

Member

@github-actions crossbow submit wheel-windows-*

@github-actions

github-actions Bot commented Feb 5, 2025

Revision: 29467e2

Submitted crossbow builds: ursacomputing/crossbow @ actions-435df0c2a2

| Task | Status |
|------|--------|
| wheel-windows-cp310-cp310-amd64 | GitHub Actions |
| wheel-windows-cp311-cp311-amd64 | GitHub Actions |
| wheel-windows-cp312-cp312-amd64 | GitHub Actions |
| wheel-windows-cp313-cp313-amd64 | GitHub Actions |
| wheel-windows-cp313-cp313t-amd64 | GitHub Actions |
| wheel-windows-cp39-cp39-amd64 | GitHub Actions |

@github-actions github-actions Bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 5, 2025
@amoeba

amoeba commented Feb 5, 2025

Member Author

Thanks for getting the jobs running, I'll check on them tomorrow.

@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 5, 2025

@raulcd raulcd left a comment

Member

This seems to be working, which brings us back to the original issue, where a specific Arrow test fails while downloading tzdata:

    def test_download_tzdata_on_windows():
        tzdata_path = os.path.expandvars(r"%USERPROFILE%\Downloads\tzdata")
    
        # Download timezone database and remove data in case it already exists
        if (os.path.exists(tzdata_path)):
            shutil.rmtree(tzdata_path)
>       download_tzdata_on_windows()
...
with urlopen('https://data.iana.org/time-zones/tzdata-latest.tar.gz') as response:

Is the function even necessary?
I understand it is a utility to download tzdata, but I am not sure we want to provide it if users can just use importlib.resources to get the data from the tzdata package.
Should we just remove the utility function?
@AlenkaF @jorisvandenbossche thoughts?
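
As a rough sketch of the importlib.resources approach mentioned above, the snippet below looks up the `zoneinfo` directory shipped inside the tzdata package from PyPI. `find_tzdata_zoneinfo_dir` is a hypothetical helper name, not part of PyArrow, and whether a given consumer can read the files in that layout is a separate question this sketch does not address.

```python
import importlib.resources


def find_tzdata_zoneinfo_dir():
    """Return the path of the tzdata package's zoneinfo directory, or None.

    Hypothetical helper: returns None when the tzdata package is not
    installed or does not contain a zoneinfo directory.
    """
    try:
        package_root = importlib.resources.files("tzdata")
    except ModuleNotFoundError:
        return None
    zoneinfo_dir = package_root / "zoneinfo"
    return str(zoneinfo_dir) if zoneinfo_dir.is_dir() else None


if __name__ == "__main__":
    print(find_tzdata_zoneinfo_dir())
```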

Comment thread ci/scripts/python_wheel_windows_test.bat Outdated
@amoeba

amoeba commented Feb 7, 2025

Member Author

@github-actions crossbow submit wheel-windows-*

@amoeba

amoeba commented Feb 7, 2025

Member Author

The last round of crossbow job failures looks to be due to an issue with GitHub; I'm seeing it on other PRs too. I re-ran the jobs and will keep an eye out.

@raulcd whenever you have a moment, could you please look at my recent change to util.py?

@github-actions

github-actions Bot commented Feb 7, 2025

Revision: 459376c

Submitted crossbow builds: ursacomputing/crossbow @ actions-c246a1f639

| Task | Status |
|------|--------|
| wheel-windows-cp310-cp310-amd64 | GitHub Actions |
| wheel-windows-cp311-cp311-amd64 | GitHub Actions |
| wheel-windows-cp312-cp312-amd64 | GitHub Actions |
| wheel-windows-cp313-cp313-amd64 | GitHub Actions |
| wheel-windows-cp313-cp313t-amd64 | GitHub Actions |
| wheel-windows-cp39-cp39-amd64 | GitHub Actions |

@jorisvandenbossche jorisvandenbossche left a comment

Member

@amoeba thanks for looking into this!

Apologies for the slow reply; I could have saved you some time figuring this out. Indeed, as you summarize in #45425 (comment), our understanding in the past was that we unfortunately cannot use the tzdata as shipped by the Python package. Supporting that format would require some changes upstream in https://github.com/HowardHinnant/date.
See #31472 for the issue about it.

@amoeba could you update the title and top comment description of the PR, as I think it no longer reflects the change entirely?

Comment thread python/pyarrow/util.py Outdated
Comment thread python/pyarrow/tests/conftest.py Outdated
Comment thread ci/scripts/python_wheel_windows_test.bat
@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 7, 2025
@jorisvandenbossche

Member

I also see in the wheel build logs the following message (https://github.com/ursacomputing/crossbow/actions/runs/13191501020/job/36825167067#step:9:1122):

SKIPPED [1] Python310\lib\site-packages\pyarrow\tests\test_compute.py:2301: Timezone database is not installed on Windows

so wondering what is going wrong (or are we not downloading the tzdata files in the wheel builds?)
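
A quick way to check whether a timezone database is present at the download location used by the test quoted earlier in this thread might look like the sketch below. `tzdata_is_present` is a hypothetical helper, and it is an assumption that the skip condition behind the message above checks something similar.

```python
import os


def tzdata_is_present(tzdata_path=None):
    """Return True if `tzdata_path` is an existing, non-empty directory.

    Hypothetical helper. The default path mirrors the one used in
    test_download_tzdata_on_windows and only expands on Windows.
    """
    if tzdata_path is None:
        tzdata_path = os.path.expandvars(r"%USERPROFILE%\Downloads\tzdata")
    return os.path.isdir(tzdata_path) and bool(os.listdir(tzdata_path))


if __name__ == "__main__":
    print(tzdata_is_present())
```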

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@github-actions github-actions Bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 7, 2025
@amoeba amoeba changed the title GH-45295: [Python][CI] Use tzdata package to get tzinfo database when testing Windows wheels GH-45295: [Python][CI] Make download_tzdata_on_windows more robust and use tzdata package for tzinfo database on Windows Feb 7, 2025
@amoeba amoeba changed the title GH-45295: [Python][CI] Make download_tzdata_on_windows more robust and use tzdata package for tzinfo database on Windows GH-45295: [Python][CI] Make download_tzdata_on_windows more robust and use tzdata package for tzinfo database on Windows for ORC Feb 7, 2025
@amoeba

amoeba commented Feb 7, 2025

Member Author

Thanks for the review @jorisvandenbossche. Good catch on the message in CI. I'm not sure yet but will investigate.

@amoeba

amoeba commented Feb 7, 2025

Member Author

@jorisvandenbossche I could be wrong, but I don't think we're downloading the tzdata on the Docker-based Windows builds. On AppVeyor we are, though. So AFAICT there's not a regression here.

I've updated the PR title and body to hopefully accurately reflect the changes here. I think once we approve of that we should merge this.

@kou kou left a comment

Member

+1

@github-actions github-actions Bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Feb 10, 2025
@amoeba amoeba merged commit f6e2cbe into apache:main Feb 11, 2025
@amoeba amoeba removed the awaiting merge Awaiting merge label Feb 11, 2025
@amoeba

amoeba commented Feb 11, 2025

Member Author

Thanks all for the help reviewing and solving the issues here. I put this into 19.0.1 and will work on testing RC1 now.

amoeba added a commit to amoeba/arrow that referenced this pull request Feb 11, 2025
…ust and use tzdata package for tzinfo database on Windows for ORC (apache#45425)

We have two Windows issues and this PR is addressing both:

1. PyArrow's `download_tzdata_on_windows` can fail due to TLS issues in certain CI environments.
2. The Python wheel test infrastructure needs a tzinfo database for ORC and the automation fetching that started failing because the URL was made invalid upstream.

These two issues are being solved in one PR simply because they appeared together during the 19.0.1 release process but they're separate.

1. Makes `download_tzdata_on_windows` more robust to TLS errors by attempting to use `requests` if it's available and falling back to urllib otherwise.
2. Switches our Windows wheel test infrastructure to grab a tzinfo database from the tzdata package on PyPI instead of from a mirror URL. This should be much more stable for us over time.

Yes.

No.
* GitHub Issue: apache#45295

Lead-authored-by: Bryce Mecum <petridish@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Bryce Mecum <petridish@gmail.com>
@jorisvandenbossche

Member

Thanks @amoeba!

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit f6e2cbe.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

5 participants