Migrate Mlops Stacks to use databricks asset templates for project creation by mingyu89 · Pull Request #96 · databricks/mlops-stacks

mingyu89 · 2023-08-21T22:24:01Z

The project is thoroughly tested by using different input parameter combinations and compare the created project by cookiecutter and asset templates.
There's no need to do bugbash or manual integration test since there's no change to the project output.

For easier review:
The diff of test changes is in #97
The diff of changes is in #98

Design doc and validation Link

https://docs.google.com/document/d/19MOK9o-f0A_-NMmxC1cJGQSQ2WkJXZ4vXQZ4plu-tng/edit#bookmark=id.tsk81t8hk8bf

Sanity Check

Created project successfully run bundle deploy for staging and prod

Test PR feature=>staging of generated project https://github.com/mingyu89/test-repo1/pull/27
Successful unit test and integration test in test(E2-dogfood) environment https://github.com/mingyu89/test-repo1/actions/runs/6104146227/job/16565759289?pr=27

Validation

No.	Test Case	Validation Status	Diff Link
1	All parameters are default	Validated	https://github.com/mingyu89/test-repo1/pull/15/files
2	azure + github + delta	Validated	https://github.com/mingyu89/test-repo1/pull/16/files
3	azure + github + feature store	Validated	https://github.com/mingyu89/test-repo1/pull/17/files
4	azure + github + recipes	Validated	https://github.com/mingyu89/test-repo1/pull/18/files
5	aws + github enterprise + delta	Validated	https://github.com/mingyu89/test-repo1/pull/19/files
6	aws + github enterprise + feature store	Validated	https://github.com/mingyu89/test-repo1/pull/20/files
7	aws + github enterprise + recipes	Validated	https://github.com/mingyu89/test-repo1/pull/21/files
8	azure + ADO + delta	Validated	https://github.com/mingyu89/test-repo1/pull/22/files
9	azure + ADO + feature store	Validated	https://github.com/mingyu89/test-repo1/pull/23/files
10	aws + github + delta	Validated	https://github.com/mingyu89/test-repo1/pull/24/files
11	aws + github + feature store	Validated	https://github.com/mingyu89/test-repo1/pull/25/files
12	Default branch is the same as release branch	Validated	https://github.com/mingyu89/test-repo1/pull/26/files
13	invalid combination of aws + ADO	Validated
14	invalid project name	Validated
15	invalid selection of cloud/platform/feature store/recipes	Validated
16	invalid staging host	Validated
17	invalid prod host	Validated
18	invalid characters in project name	Validated

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

README.md

template/{{.input_root_dir}}/_params_testing_only.txt.tmpl

...ate/{{.input_root_dir}}/.github/workflows/{{.input_project_name}}-bundle-cd-staging.yml.tmpl

README.md

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

* Fix all mlops stacks tests Signed-off-by: Mingyu Li <mingyu.li@databricks.com> * Remove all cookiecutter references from tests Signed-off-by: Mingyu Li <mingyu.li@databricks.com> * black . Signed-off-by: Mingyu Li <mingyu.li@databricks.com> * Remove cookiecutter parameter file Signed-off-by: Mingyu Li <mingyu.li@databricks.com> * Rename more azure_devlops Signed-off-by: Mingyu Li <mingyu.li@databricks.com> * Bump version of databricks cli Signed-off-by: Mingyu Li <mingyu.li@databricks.com> * update docs to remove cookiecutter and add asset bundle templates (#98) * Update docs Signed-off-by: Mingyu Li <mingyu.li@databricks.com> * Add more information about asset templats and link about databricks cli Signed-off-by: Mingyu Li <mingyu.li@databricks.com> * Fix doc issues Signed-off-by: Mingyu Li <mingyu.li@databricks.com> --------- Signed-off-by: Mingyu Li <mingyu.li@databricks.com> --------- Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

lennartkats-db

Added a few comments as I'm working on a reference template for DABs: databricks/cli#685, databricks/cli#686, databricks/cli#700. PTAL. cc @vladimirk-db

lennartkats-db · 2023-08-28T11:46:54Z

stack-customization.md


 ### ML resource configs
-Root ML resource config file can be found as ``{{cookiecutter.root_dir__update_if_you_intend_to_use_monorepo}}/{{cookiecutter.project_name_alphanumeric_underscore}}/bundle.yml``. 
+Root ML resource config file can be found as ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/bundle.yml``. 


Please note that bundle.yml should be replaced by databricks.yml throughout this code.

lennartkats-db · 2023-08-28T11:47:23Z

...te/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/bundle.yml.tmpl

  staging:
    workspace:
-      host: {{cookiecutter.databricks_staging_workspace_host}}
+      host: {{template `databricks_staging_workspace_host` .}}


For the above note that environments was renamed to targets.

lennartkats-db · 2023-08-28T11:48:22Z

...te/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/bundle.yml.tmpl

-      host: {{cookiecutter.databricks_staging_workspace_host}}
+      host: {{template `databricks_staging_workspace_host` .}}

  prod:


Please review if this could use mode: development, mode: production, and the other conventions(databricks/cli#686.

lennartkats-db · 2023-08-28T11:50:40Z

...te/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/bundle.yml.tmpl

+  name: {{template `project_name` .}}


 include:


This should follow the databricks/cli#686 convention and should be called resources. I'd just include resources/*.

lennartkats-db · 2023-08-28T11:51:00Z

...te/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/bundle.yml.tmpl

+    default: ${bundle.environment}-{{template `model_name` .}}


 bundle:


I'd put this at the top. And add a descriptive comment.

lennartkats-db · 2023-08-28T11:52:49Z

...alphanumeric_underscore` .}}/databricks-resources/batch-inference-workflow-resource.yml.tmpl

  jobs:
    batch_inference_job:
-      name: ${bundle.environment}-{{cookiecutter.project_name}}-batch-inference-job
+      name: ${bundle.environment}-{{template `project_name` .}}-batch-inference-job


Could you remove ${bundle.environment} here in favor of mode: development as seen in databricks/cli#686? Note that we plan to make prefixing first-class so it can be customized, but right now mode: development just adds "[dev short-username]` as a prefix. We don't prefix for production yet.

lennartkats-db · 2023-08-28T11:54:34Z

...e/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/__init__.py.tmpl

Would it work to have the Python code in src/ for consistency with the base DAB template of databricks/cli#686? Beyond that, please consider employing a scratch/ directory as well.

Hi @lennartkats-db, this is the layout we are using https://docs.google.com/document/d/1BTFOzxiVzCJ2uKN0f9N3id85pL32Gof6c0cD1ao7fNo/edit?disco=AAAAqSBw8nQ

The reason we have the current layout is because we have both python files and notebooks in the project and want to support both polyrepo and monorepo.

In the reference template, we actually put both notebooks and Python files in in src/. Tests live in tests/. So that seems like it would work for your use case as well? Both for monorepos and polyrepos? See https://github.com/databricks/bundle-examples/tree/main/default_python/src

shreyas-goenka

This look good, thanks for doing this! I am good with moving forward with this for now since we validated the diff with the current mlops-stacks.

We can followup with incremental improvements. @pietern WDYT?

Also, now we have the capability to bundle templates right in the CLI! We should consider moving mlops-stacks there!

shreyas-goenka · 2023-09-06T21:36:21Z

.../{{template `project_name_alphanumeric_underscore` .}}/tests/training/test_notebooks.py.tmpl

We should consider removing the .tmpl extension from files that are not templates, like this one. The benefit is:

It makes it clear which files are templates, are which are not.

You get nice syntax highlighting in the IDE for those files.

shreyas-goenka · 2023-09-06T21:38:55Z

library/template_variables.tmpl

@@ -0,0 +1,150 @@
+# define template variables
+{{ define `root_dir` -}}


We should consider directly using the parameters instead to print text. {{ .input_root_dir }}. This is idiomatic with examples in https://pkg.go.dev/text/template.

I still have the concern of using both {{.input_root_dir}} and {{template staging_host_name .}} in the files.
It will be great if there's a way to define input variable inside library functions. Then we can only use the input parameters.

@mingyu89 Do you mean some kind of aliasing? IIUC you want to call {{ .staging_host_name }} from the templates instead of the explicit {{ template "staging_host_name" . }}.

@pietern Yeah I made the input variables all have the input_ prefix and only use template variables in content output. I feel mixing the two type of variables could be confusing.

input variable .input_staging_host_name

can be used in file content output

can be used in if conditions or functions

template variable template "staging_host_name" .

can only be used in file content output

can not be used in if conditions or functions/parameters

I'm trying to only use template "staging_host_name" . in file content outputs and only use .input_staging_host_name in conditions so that we explicitly separate them into two levels.

Thanks. I agree it's cumbersome you have to use different approaches for output and conditionals.

@shreyas-goenka WDYT? We can discuss outside of this PR.

shreyas-goenka · 2023-09-06T21:39:55Z

library/template_variables.tmpl

+    {{ .input_project_name }}
+{{- end}}
+
+{{ define `cloud` -}}


Enum support coming up! We can then remove this. databricks/cli#668

shreyas-goenka · 2023-09-06T21:45:27Z

template/update_layout.tmpl

+    {{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/ingest_test.py`) }}
+    {{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/split_test.py`) }}
+    {{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/train_test.py`) }}
+    {{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/test_sample.parquet`) }}
+    {{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/transform_test.py`) }}


Skip supports glob patterns! You could replace the python files. Feel free to skip if you would rather have this be explict.

Suggested change

{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/ingest_test.py`) }}

{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/split_test.py`) }}

{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/train_test.py`) }}

{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/test_sample.parquet`) }}

{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/transform_test.py`) }}

{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/*_test.py`) }}

{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `tests/training/test_sample.parquet`) }}

It's great that the skip command support pattern matching. I feel it's less error prone to specify each file in this case instead of using ....../*_test.py

shreyas-goenka · 2023-09-06T21:48:38Z

template/update_layout.tmpl

@@ -0,0 +1,66 @@
+# Remove unrelated CICD platform files


Meta comment, IMO it would be best practice to have the skip patterns in the directories being skipped. Another benefit would be that it would simplify the skip logic since you would not have to specify the full path.

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

pietern

Excellent work!

pietern · 2023-09-07T08:17:14Z

README.md

 ### Prerequisites
 - Python 3.8+
- - [Cookiecutter Python package](http://cookiecutter.readthedocs.org/en/latest/installation.html) >= 2.1.0: This can be installed with pip:
+ - [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) >= v0.203.2


Since you're using the order field in the template schema, you need v0.204.0.

pietern · 2023-09-07T08:20:49Z

template/{{.input_root_dir}}/.github/workflows/{{.input_project_name}}-bundle-cd-prod.yml.tmpl

+jobs:
+  prod:
+    concurrency: {{template `project_name` .}}-prod-bundle-job
+    runs-on: ubuntu-20.04


ubuntu-22.04 is available and newer

vladimirk-db

Outstanding work @mingyu89 !

How do we want to release this? Merge directly into main and make it available for everyone immediately? Should be fine because the user facing changes here are small (databricks CLI instead of cookiecutter, and input names have slightly changed, but the output remains the same.)

vladimirk-db · 2023-09-07T16:14:48Z

README.md

 To create a new project, run:

-    cookiecutter https://github.com/databricks/mlops-stack
+    databricks bundle init https://github.com/databricks/mlops-stack


Love this - big milestone :-)

vladimirk-db · 2023-09-07T16:18:18Z

databricks_template_schema.json

+    "input_cloud": {
+      "order": 3,
+      "type": "string",
+      "description": "Select cloud. \nChoose from azure, aws",


I think we can (and should) support GCP as well. Can you please file a follow-up to verify it works on GCP and update docs?

MLOPS-200 - Investigate and onboard stacks to GCP cloud

…ate project using cookiecutter Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

…generating project Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

mingyu89 added 5 commits August 10, 2023 01:12

WIP

074ff7d

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

Complete the implementation and fix all issues

4ef07a0

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

Skip the test data file when recipe is not used

bd42533

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

Add back support for root dir

5f8627d

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

Fix all inconsistency issues based on validation

e1d2682

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

mingyu89 requested review from arpitjasa-db, vladimirk-db and zhe-db August 21, 2023 23:20

zhe-db reviewed Aug 22, 2023

View reviewed changes

README.md Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

zhe-db reviewed Aug 22, 2023

View reviewed changes

template/{{.input_root_dir}}/_params_testing_only.txt.tmpl Show resolved Hide resolved

zhe-db reviewed Aug 22, 2023

View reviewed changes

...ate/{{.input_root_dir}}/.github/workflows/{{.input_project_name}}-bundle-cd-staging.yml.tmpl Show resolved Hide resolved

mingyu89 commented Aug 22, 2023

View reviewed changes

README.md Show resolved Hide resolved

mingyu89 added 2 commits August 22, 2023 00:45

Revert changes to stacks docs

12813b7

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

Replace azure_devlops with azure_devops

0750d51

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

mingyu89 changed the title ~~Use asset template~~ Migrate Mlops Stacks to use databricks asset templates for project creation Aug 24, 2023

mingyu89 requested a review from shreyas-goenka August 25, 2023 04:38

lennartkats-db reviewed Aug 28, 2023

View reviewed changes

shreyas-goenka approved these changes Sep 6, 2023

View reviewed changes

Add order to the stacks template schema

f9539df

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

mingyu89 requested review from lennartkats-db, vladimirk-db and zhe-db and removed request for vladimirk-db September 6, 2023 23:44

pietern approved these changes Sep 7, 2023

View reviewed changes

lennartkats-db approved these changes Sep 7, 2023

View reviewed changes

vladimirk-db approved these changes Sep 7, 2023

View reviewed changes

mingyu89 added 3 commits September 7, 2023 11:47

Update cli version to v0.204.0. Throw exception when users try to cre…

234b50a

…ate project using cookiecutter Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

Fix version used by test

1b555a5

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

Fix tests by adding a dummy default profile to databricks CLI before …

02c3e90

…generating project Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

mingyu89 merged commit d2f1f66 into main Sep 7, 2023

mingyu89 deleted the use-asset-template branch September 7, 2023 20:34

		default: ${bundle.environment}-{{template `model_name` .}}


		bundle:

		@@ -0,0 +1,150 @@
		# define template variables
		{{ define `root_dir` -}}

Conversation

mingyu89 commented Aug 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Design doc and validation Link

Sanity Check

Validation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lennartkats-db left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shreyas-goenka left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shreyas-goenka Sep 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mingyu89 Sep 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mingyu89 Sep 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mingyu89 commented Aug 21, 2023 •

edited

Loading

shreyas-goenka Sep 6, 2023 •

edited

Loading

mingyu89 Sep 6, 2023 •

edited

Loading

mingyu89 Sep 7, 2023 •

edited

Loading