Skip to content

[PF-692] Phase 2 regionality release#2680

Merged
wnojopra merged 22 commits intodevfrom
willyn-p2
Nov 24, 2021
Merged

[PF-692] Phase 2 regionality release#2680
wnojopra merged 22 commits intodevfrom
willyn-p2

Conversation

@wnojopra
Copy link
Contributor

@wnojopra wnojopra commented Nov 4, 2021

Follow up from #2660 . Phase 2 allows Cloud Environments to be created outside of us-central1. This becomes available if your workspace bucket is outside of the US.

To test:

  1. Go to a workspace with a non-US bucket. If you need to create one, modify the availableBucketRegions function in region-common.js.

  2. Open the Cloud Environment modal and click CUSTOMIZE. The location dropdown should be enabled and should have two choices - the bucket region and the US.

  3. If you create the environment in the non-US region, you should see this warning:
    image

  4. Create the environment!

This PR also updates the pricing estimates to be regional. Previously, price estimate data was looked up manually in the documentation and entered into machines.js. I have written a script that looks up the pricing data via Google's API in https://gist.github.com/wnojopra/be5be3ab7e6a09a351b89085992a940d (TODO: re-write this from Python to Node). You can copy-paste the output of that script into machines.js for updated pricing data. Follow the instructions in the script to run.

@wnojopra wnojopra marked this pull request as ready for review November 14, 2021 23:48
@wnojopra wnojopra requested review from a team, kyuksel and petesantos November 14, 2021 23:48
@wnojopra wnojopra changed the title Phase 2 regionality release [PF-692] Phase 2 regionality release Nov 14, 2021
@kyuksel kyuksel removed the request for review from petesantos November 15, 2021 14:22
Copy link
Contributor

@kyuksel kyuksel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't finish going through the PR yet so to be continued…

@kyuksel
Copy link
Contributor

kyuksel commented Nov 17, 2021

I'm seeing an issue when creating a runtime after deleting a runtime without refreshing the page manually, at least that's the only pattern I could identify so far. Using my test workspace with bucket in a non-US region

image

if I open the compute modal right after deleting a runtime, it displays location as us-central1, and if I click CREATE, it submits a runtime creation request to Leo with location as us-central1 too with none of the warning wording due to locations differing. I didn't get to dig into it why. If I manually refresh the page, the modal seems to eventually pick up the right (Asia) location.

@kyuksel
Copy link
Contributor

kyuksel commented Nov 17, 2021

Workspace cloning modal displays Select… for Bucket location when its bucket is in a non-US location:

image

@kyuksel
Copy link
Contributor

kyuksel commented Nov 17, 2021

The warning message below

image

seems incorrect based on Google's documentation?

image

@wnojopra
Copy link
Contributor Author

I reproduced it again at first try and the screencast is below.

Ok! I think I understand the issue. When WorkspaceContainer is first mounted it makes one api call to GCS for the bucket location. But if you do something that calls wrapWorkspace (such as navigating to a different tab, for example), it re-renders WorkspaceContainer but doesn't keep the old state. So when you opened the ComputeModal, you now opened it with the default location instead of the actual location.

So the fix I pushed copies how WorkspaceContainer caches the workspace data. I create a new store for the bucket location, and fetch it when re-rendering WorkspaceContainer.

@kyuksel
Copy link
Contributor

kyuksel commented Nov 21, 2021

It's a bit confusing, but that says to a "different Google Cloud service".

Oh wow - good eye!

Copy link
Contributor

@kyuksel kyuksel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've validated the fixes since my last round of review and reviewed the corresponding code. Good work! One question inline about the new store entry.

Next (Monday unless something unforeseen happens), I want to do some validation on price, etc. changes in machines.js and region-common.js. I'll so ask the team for someone else to do a review.

I'll plan on reviewing the script PR later in the week since it's not a blocker.

Copy link
Contributor

@jdcanas jdcanas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good from a functional testing perspective,

tested it with disks from the same/different region, and cross-region + same-region workspace/VMs.

@wnojopra wnojopra mentioned this pull request Nov 23, 2021
@kyuksel
Copy link
Contributor

kyuksel commented Nov 24, 2021

I see a discrepancy in cost estimates that showed up when I enabled GPUs and started playing around with values.
I'm using the same workspace for both.

Dev:

image

PR:

image

@kyuksel
Copy link
Contributor

kyuksel commented Nov 24, 2021

Another cost estimate discrepancy, this time with Dataproc clusters: Not sure if it could be a rounding difference but even if it is, it'd be good to understand why.

I'm using the same workspace for both.

Dev:

image

PR:

image

@kyuksel
Copy link
Contributor

kyuksel commented Nov 24, 2021

Did some spot-checks on individual resource prices in machines.js vs. Google docs, and they looked good 👍🏼

@wnojopra
Copy link
Contributor Author

The discrepancy with the 4 P4 GPUs is due to a bug on dev. Take a look at the price for 4 P4 GPUs on dev. It is 1.8000. Because the price for one P4 GPU is 0.6000, the price for four should be 4x that, or 2.4000. The price is correct in my PR.

Taking a look at the dataproc discrepancy now.

@wnojopra
Copy link
Contributor Author

The dataproc discrepancy is 100% caused by rounding changes. I believe the cost estimates in my PR are more accurate.

In your example, you have 16 n1-highmem-96 preemtible workers. On dev an n1-highmem-96 preemptible machine costs $1.20 per hour. This makes the total cost for preemptible workers 16 * $1.20 = $19.20

In my PR, the cost of the preemptible machine $1.195488 per hour. This makes the total cost for preemptible workers 16 * $1.195488 = $19.127808. This is nearly an 8 cent difference alone. Then you include the same discrepancies for the non-preemptible workers.

@wnojopra
Copy link
Contributor Author

Thanks @zarsky-broad . I have your changes implemented but I'm running into trouble testing because I'm suddenly getting 403s from sam. I've posted on slack and will update here asap.

@kyuksel
Copy link
Contributor

kyuksel commented Nov 24, 2021

The discrepancy with the 4 P4 GPUs is due to a bug on dev. Take a look at the price for 4 P4 GPUs on dev. It is 1.8000. Because the price for one P4 GPU is 0.6000, the price for four should be 4x that, or 2.4000. The price is correct in my PR.

💯 Thank you!

Copy link
Contributor

@kyuksel kyuksel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 pending Isaac's comments, rebasing and having tests passed.
Thanks for sticking with this! 🏆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants