Skip to content

ARROW-8147: [C++] Add google-cloud-cpp to ThirdpartyToolchain#8757

Closed
josiahyan wants to merge 10 commits into
apache:masterfrom
josiahyan:ARROW-8147-add-gcp-sdk
Closed

ARROW-8147: [C++] Add google-cloud-cpp to ThirdpartyToolchain#8757
josiahyan wants to merge 10 commits into
apache:masterfrom
josiahyan:ARROW-8147-add-gcp-sdk

Conversation

@josiahyan

Copy link
Copy Markdown
Contributor

I'm hoping to work on parts of the GCS filesystem implementation, and this is a prerequisite.

@josiahyan

Copy link
Copy Markdown
Contributor Author

Running the cmake autoformatter heavily modifies the touched cmake files; should I be doing that?

@josiahyan josiahyan marked this pull request as draft November 24, 2020 15:27
@github-actions

Copy link
Copy Markdown

@xhochy

xhochy commented Nov 24, 2020

Copy link
Copy Markdown
Member

Running the cmake autoformatter heavily modifies the touched cmake files; should I be doing that?

Maybe you have a different version of that installed and that formats differently? You can also summon the autoformatter with @github-actions<space>autotune on the PR.

@josiahyan

Copy link
Copy Markdown
Contributor Author

I'll take a look! Didn't know I could do that.

@josiahyan

Copy link
Copy Markdown
Contributor Author

@github-actions autotune

@josiahyan

josiahyan commented Nov 25, 2020

Copy link
Copy Markdown
Contributor Author

@xhochy I can't seem to summon the bot. But I read what the action did, and saw that it required a very specific version of the cmake autoformatter, as you suggested. It works now! Thanks for your help!

@josiahyan josiahyan marked this pull request as ready for review November 25, 2020 04:31
@xhochy

xhochy commented Nov 25, 2020

Copy link
Copy Markdown
Member

@xhochy I can't seem to summon the bot. But I read what the action did, and saw that it required a very specific version of the cmake autoformatter, as you suggested. It works now! Thanks for your help!

The bot did run but didn't make any changes to your CMake files: https://github.com/apache/arrow/runs/1450988053?check_suite_focus=true#step:10:11

@josiahyan

josiahyan commented Nov 25, 2020

Copy link
Copy Markdown
Contributor Author

Oh, I know why! The CI queue was full (my test runs were taking hours to go through). By the time the runner got to my PR (Wed, 25 Nov 2020 02:15:47 GMT), my final commit with the manual cmake autoformat had gone in about an hour before.

Comment thread ci/docker/conda-cpp.dockerfile Outdated
@josiahyan josiahyan force-pushed the ARROW-8147-add-gcp-sdk branch from 7d1014a to 4fd6a36 Compare December 1, 2020 14:43
@josiahyan

Copy link
Copy Markdown
Contributor Author

@github-actions rebase

@josiahyan josiahyan force-pushed the ARROW-8147-add-gcp-sdk branch 2 times, most recently from 979c43a to 953de85 Compare December 1, 2020 16:43
Comment thread cpp/thirdparty/versions.txt Outdated
@josiahyan josiahyan force-pushed the ARROW-8147-add-gcp-sdk branch from 3c26730 to b9313da Compare December 1, 2020 21:21
@github-actions github-actions Bot added the needs-rebase A PR that needs to be rebased by the author label Dec 24, 2020
@nealrichardson

Copy link
Copy Markdown
Member

Hi @josiahyan, apologies for the delay here. Could you please rebase this? Hopefully we can get this reviewed and merged soon after.

@josiahyan

josiahyan commented Jan 29, 2021 via email

Copy link
Copy Markdown
Contributor Author

@josiahyan josiahyan force-pushed the ARROW-8147-add-gcp-sdk branch from b9313da to 850c239 Compare February 28, 2021 01:03
@josiahyan josiahyan marked this pull request as draft February 28, 2021 01:03
@josiahyan josiahyan marked this pull request as ready for review February 28, 2021 04:29
@josiahyan

Copy link
Copy Markdown
Contributor Author

Sorry about the delay! I've rebased the commits and manually merged in the changes made in master.

Comment thread ci/conda_env_unix.yml Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for the exact pin here?

@josiahyan josiahyan Feb 28, 2021

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The currently available version in conda (1.24) is incompatible with the way I initially packaged it. I believe I saw in the CI run that storage_client was missing or something, as the google-cloud-cpp package has started namespacing its exports (finally!).

Given the hard link against abseil libraries (as gRPC does too), to reduce packaging size, I decided that the version was safer pinned. The abseil libraries actually loaded probably depend on the matrix of abseil versions and users of it (gRPC, and the proposed google-cloud-cpp inclusion), making it more brittle. Personally, I extract it by a bazel query (provided in comments) and a Python script to lookup the symbols in the ar files. I believe the gRPC builds do something similar, or just pull them out from the compiler invocations (does @pitrou or @kou know how they were determined in commits like these?). Ideally the linker would do the dead section elimination, but I didn't dig into why the build system wouldn't allow so (I'm guessing the libs have to be bundled in and linked separately somehow?).

I'm hoping to upgrade the dependency when there is code using it (which would be the GCS implementation of the Arrow file interfaces).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I determined required abseil libraries manually. CI error helped me.

@salqadri

Copy link
Copy Markdown

What's the status of this PR?

@josiahyan

Copy link
Copy Markdown
Contributor Author

What's the status of this PR?

Sorry, but I lost track of following up on this PR! I believed it was ready to merge in Feb, but assumed people had lost interest in pursuing (native) GCS support, or that this change was not quite worth making for now. Is there still a need for this? If so, I can attempt to rebase it again.

@nealrichardson

Copy link
Copy Markdown
Member

What's the status of this PR?

Sorry, but I lost track of following up on this PR! I believed it was ready to merge in Feb, but assumed people had lost interest in pursuing (native) GCS support, or that this change was not quite worth making for now. Is there still a need for this? If so, I can attempt to rebase it again.

I think there's interest (I'm interested, at least). I'd like to see someone more skilled in cmake than I to review and approve: @kou @pitrou @xhochy ?

@kou

kou commented Aug 1, 2021

Copy link
Copy Markdown
Member

Could you rebase on the master? Then I'll review this.

@josiahyan

josiahyan commented Aug 1, 2021 via email

Copy link
Copy Markdown
Contributor Author

@emkornfield

Copy link
Copy Markdown
Contributor

@josiahyan do you think you will have time to rebase? Otherwise I can see how difficult this might be.

@josiahyan josiahyan force-pushed the ARROW-8147-add-gcp-sdk branch from 7076b4c to 7531ca5 Compare August 23, 2021 05:44
@josiahyan

Copy link
Copy Markdown
Contributor Author

@josiahyan do you think you will have time to rebase? Otherwise I can see how difficult this might be.

Whoops sorry, I lost track of this. I’ll be trying to rebase this - I don’t see any major changes (so far) that should result in conflicts.

@josiahyan josiahyan marked this pull request as draft August 23, 2021 07:25
Comment thread ci/conda_env_unix.txt
autoconf
ccache
google-cloud-cpp=1.20.0
nlohmann_json>=3.4.0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the new JSON dependency for?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess google's SDK must rely on it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am one of the authors of google's GCS SDK. Yes, the SDK definitely depends on nlohmann_json.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conda has dependency handling, so there is no reason to mention it explicitly, though.

GCPSDK_PREFIX_PATH)

set(GCPSDK_CXX_FLAGS "${EP_CXX_FLAGS}")
# workaround for crc32c not declaring its header interface in its Find*.cmake file

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid this by using crc32c 1.1.1:

google/crc32c@b377ce4

@josiahyan

Copy link
Copy Markdown
Contributor Author

I’ve discussed this PR with @coryan , and we think its better to start over, given the need to upgrade dependencies, and handle the Abseil libraries (which I believe, when used with gRPC are causing the build to fail on some platforms) with more scripting. He will be taking over this work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: C++ needs-rebase A PR that needs to be rebased by the author

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants