[SS-69] user-facing docs for Iceberg sink, now with GCP support#36773
ublubu wants to merge 27 commits into
| The name of the user to connect as. | ||
| - name: "syntax-gcp" |
Q: How does this affect https://preview.materialize.com/materialize/36773/sql/create-connection/#s3-compatible-object-storage where we mention Google Cloud Storage ... ?
Interesting! That feature is not connected to this one, and they use different auth mechanisms. Maybe one day we'll make a GCS-native "copy from" operation, and that would use the new GCP connection primitive.
Ah .. thanks ... I think we'll want to then be a little more explicit in the connection page. Will add my comments to that page then.
| ); | ||
| ``` | ||
| ### GCP |
In the S3-compatible object storage section above:
- "You can use an AWS connection to perform bulk exports and bulk imports ..." -> "You can use an AWS connection to perform bulk exports (`COPY TO`) and bulk imports (`COPY FROM`) ..."
- We could also add a disambiguation sentence that points readers who are creating a GCP Iceberg sink to this section.
| A Google Cloud Platform (GCP) connection gives Materialize | ||
| a [service account](https://docs.cloud.google.com/iam/docs/service-account-overview) | ||
| in your GCP project. You can use a GCP connection to authenticate with |
I would break this up and reorder so that the usage (i.e., creating an Iceberg catalog) is more prominent. This might make it easier to reword that service account thing as well.
| ### GCP | ||
| A Google Cloud Platform (GCP) connection gives Materialize |
"gives MZ a service account" makes it seem like creating a connection creates a service account. it just uses the service account when authenticating, yes?
| - name: "`<connection_name>`" | ||
| description: | | ||
| A name for the connection. | ||
| - name: "`<service_account_key>`" |
Ah ... I should have been clearer in my comment.
Since this is the syntax for `CREATE CONNECTION`, we don't need to explain the format of the service account key as part of it. By "incorporating", I meant that the `SERVICE ACCOUNT KEY` description should reference the secret that stores the service account key, and should recommend base64-encoding the key and using `decode` when creating the secret.
Also, did you mean to remove the example?
| - name: "syntax-iceberg-catalog" | ||
| - name: "syntax-gcp" | ||
| code: | | ||
| CREATE SECRET <secret_name> AS decode('<service_account_key>', 'base64'); |
This already assumes that `<service_account_key>` is base64-encoded.
We can probably add a comment saying that `<service_account_key>` is base64-encoded. If you want to have an example, we can remove this `CREATE SECRET ... decode` from here.
| ## Prerequisites | ||
| Google Cloud [documents the Lakehouse/BigLake setup process here](https://docs.cloud.google.com/lakehouse/docs/lakehouse-iceberg-rest-catalog). The parts you'll need: |
Start with the actual prerequisites, the bullet points.
| - A Google Cloud project with the BigLake API enabled. | ||
| - A Google Cloud Storage bucket to serve as the Iceberg warehouse. | ||
| - A Lakehouse runtime catalog backed by your warehouse bucket. | ||
| - _NOTE: Materialize uses a service account key, not catalog-vended credentials, to write Iceberg data files._ |
We generally use {{< note >}} {{< /note >}} blocks for notes.
| - `serviceusage.serviceUsageConsumer` (Service Usage Consumer) | ||
| 3. Grant the service account this role on your **Iceberg warehouse bucket**: | ||
| - `storage.objectUser` (Storage Object User) | ||
| 4. [Create a service account key.](https://docs.cloud.google.com/iam/docs/keys-create-delete#iam-service-account-keys-create-gcloud) |
The Google docs tell people to use JSON and not P12. Not sure if we want to just say "create a service account key (JSON format)" or something.
Do we need to base64-encode it here, or is it already base64-encoded when you get it from Google? That is, the example in step 2 has a comment showing how to base64-encode the `key.json`, but shouldn't we make it an explicit step here if people need to do it?
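To make the question concrete, here is a minimal sketch of what an explicit encoding step would do, using only the Python standard library. It assumes the key downloaded from Google is a plain JSON text file (not already base64-encoded). The `encode_key` helper and the sample key contents are hypothetical and for illustration only. A real key contains more fields and must be kept secret.

```python
import base64
import json


def encode_key(raw: bytes) -> str:
    """Return the base64 text form of the raw key file bytes."""
    return base64.b64encode(raw).decode("ascii")


# Hypothetical stand-in for the bytes of key.json (assumption: plain JSON).
sample_key = json.dumps(
    {"type": "service_account", "project_id": "example-project"}
).encode("utf-8")

encoded = encode_key(sample_key)

# Decoding the base64 text recovers the original JSON bytes, which is the
# round trip that a base64 decode performed at secret-creation time relies on.
assert base64.b64decode(encoded) == sample_key
print(encoded)
```

If the encoded string is what gets pasted in place of `<service_account_key>`, the `decode(..., 'base64')` call in the `CREATE SECRET` example would be expected to store the original JSON bytes.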
| ### Limitations | ||
| {{% include-headless "/headless/iceberg-sinks/limitations-list" %}} |
Can we add the compaction here?
Uh, this got automatically closed when I merged the previous PR. I will reopen after getting it on the correct base branch again.
Because I accidentally pushed instead of updating to point at
You can just open a new PR from the same branch.
Reopening #36773

Adding docs for:
- GCP connection
- GCP Lakehouse/BigLake Iceberg Catalog Connection _(BigLake is still the name of the API and everything in the GCP console, but it lives under a Lakehouse umbrella now.)_

Small changes to docs for Iceberg sink.
stacked on #36695