The code in this directory defines a 'meta-template' (i.e. a template that can be parameterized with sources, sinks, and intermediate transforms).
The prototype comes with a few utilities to bootstrap a simple workflow, though these are meant only for testing the template's capabilities.
This is a prototype and may change significantly. It is not supported for direct use.
The template works by relying on Beam transforms that implement the SchemaTransform interface and are present on the template's classpath.
The template receives a pipeline spec, which is a specification defining the pipeline's source, sink, and intermediate transforms.
The simplest way to configure and launch a template is using the jsonSpecPayload parameter, which expects a
JSON payload with the following shape:
```json
{
  "source": {
    "urn": "beam:source:urn",
    "configurationParameters": {
      "host": "https://somehost",
      "port": 12345
    }
  },
  "sink": {
    "urn": "beam:sink:urn",
    "configurationParameters": {
      "projectId": "sample-project",
      "tableId": "table-for-something"
    }
  }
}
```
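Before launching, it can help to sanity-check a spec with this shape locally. The sketch below is illustrative, not part of the template: the filename `spec.json` and the specific checks are assumptions; it only verifies that the file parses as JSON and that both the source and sink carry a `urn`.

```shell
# Illustrative pre-flight check for a pipeline spec file; spec.json and
# the checks themselves are assumptions, not part of the Syndeo template.
python3 - spec.json <<'EOF'
import json, sys

with open(sys.argv[1]) as f:
    spec = json.load(f)

# Every spec names a source and a sink, each identified by a URN.
assert "urn" in spec["source"], "source is missing a urn"
assert "urn" in spec["sink"], "sink is missing a urn"
print("spec looks well-formed")
EOF
```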
The easiest way to find the supported SchemaTransform implementations, their URNs, and their configuration parameters is to run the GenerateConfiguration script, developed in PR 543. It generates a proto text description of the supported SchemaTransform implementations, along with their requirements and parameters:

```shell
mvn compile exec:java -pl syndeo-template/pom.xml \
    -Dexec.mainClass="com.google.cloud.syndeo.GenerateConfiguration" \
    -Dexec.args="output_file_name.prototext"
```

The supported SchemaTransforms and their configuration parameters are registered in SyndeoTemplate.java, specifically in the SUPPORTED_URNS constant.

To apply spotless formatting rules to the Syndeo template code, run the following command:

```shell
mvn -B spotless:apply compile -f pom.xml -pl syndeo-template/pom.xml
```
To run unit tests for the Syndeo template, run the following command. It skips integration tests and runs only unit tests:

```shell
mvn clean package test -f pom.xml -pl syndeo-template -am -Djib.skip=true
```

The command compiles and installs the dependencies that syndeo-template needs. Note that syndeo depends on the Teleport integration testing framework, so if the previous command does not install it, install it locally with:

```shell
mvn install -DskipTests -f pom.xml -pl .,metadata,it/pom.xml
```

Note that the unit tests use Testcontainers, which relies on Docker, so make sure Docker is configured locally.
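Because a missing Docker daemon surfaces as confusing Testcontainers failures, a quick pre-flight check can save time. This small sketch (the messages are illustrative) verifies the daemon is reachable before you run the tests:

```shell
# Check that the Docker daemon is reachable before running the unit tests;
# Testcontainers will fail without it. Messages here are illustrative.
if docker info > /dev/null 2>&1; then
  echo "Docker is available"
else
  echo "Docker is not reachable; Testcontainers-based tests will fail"
fi
```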
To set up your Google Cloud project for the integration tests, follow the steps below. They assume you have installed and configured gcloud.

- Set your default project:

```shell
gcloud config set project PROJECT
```

For more detailed information on individual tests, check out the TESTING.md file in this directory.
The following commands take the already-built artifacts and push them to GCS and GCR, where they can be used to run the template.
(Note: the project does not currently have an end-to-end test validating template-based runs, because the template would run as part of syndeo, which involves repeated runs.)
```shell
mvn package -DskipTests -f pom.xml -pl syndeo-template/pom.xml

# Push Template to GCP:
gcloud dataflow flex-template build gs://$GCS_BUCKET_NAME/syndeo-template.json \
    --metadata-file syndeo-template/metadata.json --sdk-language "JAVA" \
    --flex-template-base-image JAVA11 --image-gcr-path=gcr.io/$GCP_PROJECT/syndeo-template:latest \
    --jar "syndeo-template/target/syndeo-template-1.0-SNAPSHOT.jar" \
    --env FLEX_TEMPLATE_JAVA_MAIN_CLASS="com.google.cloud.syndeo.SyndeoTemplate"
```

After staging a template in gs://$GCS_BUCKET_NAME/syndeo-template.json, you can use that template to launch a Dataflow job.
You can configure the launch parameters using the Google API Explorer for the Flex Template Launch API. Note that jsonSpecPayload needs to be escaped and passed as a string (see that page).
Another option is the command line ((pabloem) - I have not tested this successfully). Here, too, you need to provide escaped JSON:
```shell
gcloud dataflow flex-template run ${JOB_NAME} \
    --template-file-gcs-location gs://$GCS_BUCKET_NAME/syndeo-template.json \
    --region ${REGION} --temp-location ${TEMP_LOCATION} \
    --parameters "jsonSpecPayload={ \"source\": { \"urn\": \"bigquery:read\", \"configurationParameters\" : { \"table\": \"dataflow-syndeo.taxirides.realtime\" } }, \"sink\": { \"urn\": \"bigtable:write\", \"configurationParameters\": { \"projectId\": \"dataflow-syndeo\", \"instanceId\": \"syndeo-bt-test\", \"tableId\": \"syndeo-demo-table\", \"keyColumns\": [\"ride_id\"] } } },stagingLocation=${TEMP_LOCATION}/staging/"
```

The Syndeo template inspects its classpath for implementations of SchemaTransformProvider.
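Producing that escaped single-line string by hand is error-prone. One hedged approach, assuming you keep the spec in a regular JSON file (the name `spec.json` is illustrative), is to let a one-liner compact it for you:

```shell
# Sketch: compact a multi-line spec file into a single line suitable for
# embedding in --parameters. The filename spec.json is an assumption.
SPEC=$(python3 -c 'import json,sys; print(json.dumps(json.load(sys.stdin)))' < spec.json)
echo "jsonSpecPayload=${SPEC}"
```

When the result is embedded inside a double-quoted shell argument, the inner quotes still need backslash-escaping, as in the command above.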
If you are interested in developing for Beam, check out the Beam wiki: https://cwiki.apache.org/confluence/display/BEAM/Developer+Guides
If you are developing a SchemaTransformProvider in Beam and want to test it with a local Syndeo template, you need to install that Beam module into your local Maven .m2 folder.
To publish the module you are working on to your local .m2 folder, run the publishMavenJavaPublicationToMavenLocal task on the subproject, like so:

```shell
./gradlew :sdks:java:io:${SUBPROJECT}:publishMavenJavaPublicationToMavenLocal -Ppublishing
```

or

```shell
./gradlew -p sdks/java/io/${SUBPROJECT} -Ppublishing PublishToMavenLocal
```

This creates the appropriate artifacts under ~/.m2/repository/org/apache/beam/beam-sdks-java-io-${SUBPROJECT}/.
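To confirm the publish step worked, you can list the locally installed artifacts. This check is illustrative; it uses the google-cloud-platform module as an example value for SUBPROJECT:

```shell
# Illustrative check for locally-published snapshot artifacts; SUBPROJECT
# should match the Gradle module you published above.
SUBPROJECT=google-cloud-platform
DIR="$HOME/.m2/repository/org/apache/beam/beam-sdks-java-io-${SUBPROJECT}"
if [ -d "$DIR" ]; then
  ls "$DIR"
else
  echo "no locally published artifacts for ${SUBPROJECT}"
fi
```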
You can then amend the beam.version parameter in syndeo-template/pom.xml to use the latest snapshot (i.e. 2.${XX}.0-SNAPSHOT)
for that Beam module. For example, for sdks-java-io-google-cloud-platform:
```xml
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
  <version>2.47.0-SNAPSHOT</version>
</dependency>
```

NOTE: It is usually best to upgrade only the library you are testing, to avoid pulling in too many changes from Beam.