Skip to content

Use gcp sdk provided flow for obtaininng application default credenti…#430

Merged
jacobtomlinson merged 9 commits into
dask:mainfrom
dbalabka:patch-1
Sep 20, 2024
Merged

Use gcp sdk provided flow for obtaininng application default credenti…#430
jacobtomlinson merged 9 commits into
dask:mainfrom
dbalabka:patch-1

Conversation

@dbalabka

Copy link
Copy Markdown
Collaborator

…als (#429)

@jacobtomlinson jacobtomlinson left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a great improvement.

  • Do we need to bump the minimum version of the google.auth dependency to support this?
  • Can you fix up the tests, specifically the skip when missing credentials?
  • Can you verify that tests pass locally for you?

@dbalabka

dbalabka commented Aug 8, 2024

Copy link
Copy Markdown
Collaborator Author

@jacobtomlinson, fixing the test requires supporting an older version of google.auth. Also, I've noticed that my changes break the cluster creation. I've been working on this to fix it.

@dbalabka

dbalabka commented Aug 8, 2024

Copy link
Copy Markdown
Collaborator Author

However, I see that the minimum required version supports the used method google.auth.default:
https://github.com/googleapis/google-auth-library-python/blob/v1.23.0/google/auth/_default.py#L251

@jacobtomlinson jacobtomlinson left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed things up in the CI here, but there are now conflicts. @dbalabka would you mind polishing this up and getting it ready for another review?

@dbalabka

dbalabka commented Sep 18, 2024

Copy link
Copy Markdown
Collaborator Author

@jacobtomlinson , thanks for the reminder. I belive I can get back to the PR improvements next week. In our case, I've started to use a service account; however, it is not the best option for local development according to GCP recommendation because the service account isn't tight with the developer's account: https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev:~:text=when%20your%20code%20is%20running%20in%20a%20local%20development%20environment%2C%20such%20as%20a%20development%20workstation%2C%20the%20best%20option%20is%20to%20use%20the%20credentials%20associated%20with%20your%20user%20account.

@dbalabka

dbalabka commented Sep 19, 2024

Copy link
Copy Markdown
Collaborator Author

@jacobtomlinson , seems GCP tests are all skipped in test env. I've tried to run cluster locally and get a weird error. Can it be the old image reason?

dask_cloudprovider/gcp/tests/test_gcp.py:100 (test_create_cluster_sync)
>   ???

/opt/conda/lib/python3.10/site-packages/distributed/scheduler.py:4846: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/conda/lib/python3.10/site-packages/distributed/protocol/serialize.py:452: in deserialize
    ???
/opt/conda/lib/python3.10/site-packages/distributed/protocol/serialize.py:111: in pickle_loads
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: code expected at most 16 arguments, got 18

/opt/conda/lib/python3.10/site-packages/distributed/protocol/pickle.py:92: TypeError

The above exception was the direct cause of the following exception:

    @pytest.mark.asyncio
    @pytest.mark.timeout(1200)
    @pytest.mark.external
    async def test_create_cluster_sync():
        skip_without_credentials()
    
        cluster = GCPCluster(zone="europe-west4-a", n_workers=1)
        client = Client(cluster)
    
        def inc(x):
            return x + 1
    
>       assert client.submit(inc, 10).result() == 11

test_gcp.py:113: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../.venv/lib/python3.11/site-packages/distributed/client.py:402: in result
    return self.client.sync(self._result, callback_timeout=timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Future: error, key: inc-cf34e1fb7d5c448b29abbaab83f72f6a>
raiseit = True

    async def _result(self, raiseit=True):
        await self._state.wait()
        if self.status == "error":
            exc = clean_exception(self._state.exception, self._state.traceback)
            if raiseit:
                typ, exc, tb = exc
>               raise exc.with_traceback(tb)
E               RuntimeError: Error during deserialization of the task graph. This frequently
E               occurs if the Scheduler and Client have different environments.
E               For more information, see
E               https://docs.dask.org/en/stable/deployment-considerations.html#consistent-software-environments

../../../.venv/lib/python3.11/site-packages/distributed/client.py:410: RuntimeError

@jacobtomlinson

Copy link
Copy Markdown
Member

Yeah quite possibly! But if you got that far in the test then the Dask cluster deployed successfully. We should dig into what is happening here (I assume you have a different Python version locally than the one in the container), but I don't want that to hold up this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants