Merged
18 changes: 9 additions & 9 deletions dbt_sql/README.md
@@ -1,15 +1,15 @@
 # dbt_sql
 
 The 'dbt_sql' project was generated by using the dbt template for
-Databricks Asset Bundles. It follows the standard dbt project structure
+Declarative Automation Bundles. It follows the standard dbt project structure
 and has an additional `resources` directory to define Databricks resources such as jobs
 that run dbt models.
 
 * Learn more about dbt and its standard project structure here: https://docs.getdbt.com/docs/build/projects.
-* Learn more about Databricks Asset Bundles here: https://docs.databricks.com/en/dev-tools/bundles/index.html
+* Learn more about Declarative Automation Bundles here: https://docs.databricks.com/en/dev-tools/bundles/index.html
 
 The remainder of this file includes instructions for local development (using dbt)
-and deployment to production (using Databricks Asset Bundles).
+and deployment to production (using Declarative Automation Bundles).
 
 ## Development setup
 
@@ -88,20 +88,20 @@ $ dbt test
 
 ## Production setup
 
-Your production dbt profiles are defined in dbt_profiles/profiles.yml.
-These profiles define the default catalog, schema, and any other
+Your production dbt profiles are defined in `dbt_profiles/profiles.yml`.
+These profiles define the default warehouse, catalog, schema, and any other
 target-specific settings. Read more about dbt profiles on Databricks at
 https://docs.databricks.com/en/workflows/jobs/how-to/use-dbt-in-workflows.html#advanced-run-dbt-with-a-custom-profile.
 
-The target workspaces for staging and prod are defined in databricks.yml.
+The target workspaces for staging and prod are defined in `databricks.yml`.
 You can manually deploy based on these configurations (see below).
 Or you can use CI/CD to automate deployment. See
 https://docs.databricks.com/dev-tools/bundles/ci-cd.html for documentation
 on CI/CD setup.
 
-## Manually deploying to Databricks with Databricks Asset Bundles
+## Manually deploying to Databricks with Declarative Automation Bundles
 
-Databricks Asset Bundles can be used to deploy to Databricks and to execute
+Declarative Automation Bundles can be used to deploy to Databricks and to execute
 dbt commands as a job using Databricks Workflows. See
 https://docs.databricks.com/dev-tools/bundles/index.html to learn more.
 
@@ -120,7 +120,7 @@ For example, the default template would deploy a job called
 You can find that job by opening your workspace and clicking on **Workflows**.
 
 You can also deploy to your production target directly from the command-line.
-The warehouse, catalog, and schema for that target are configured in databricks.yml.
+The warehouse, catalog, and schema for that target are configured in `dbt_profiles/profiles.yml`.
 When deploying to this target, note that the default job at resources/dbt_sql.job.yml
 has a schedule set that runs every day. The schedule is paused when deploying in development mode
 (see https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).
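For context on the hunk above: a dbt profile for Databricks typically defines the warehouse, catalog, and schema along these lines. This is a hypothetical sketch with placeholder values, not content from this PR:

```yaml
# Hypothetical dbt_profiles/profiles.yml sketch; host, http_path, catalog,
# and schema below are placeholders, not values from this repository.
dbt_sql:
  target: prod
  outputs:
    prod:
      type: databricks
      catalog: main
      schema: dbt_prod
      host: "{{ env_var('DBT_HOST') }}"
      http_path: /sql/1.0/warehouses/abcdef1234567890
      threads: 4
```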
3 changes: 1 addition & 2 deletions dbt_sql/databricks.yml
@@ -1,13 +1,12 @@
 # This file defines the structure of this project and how it is deployed
-# to production using Databricks Asset Bundles.
+# to production using Declarative Automation Bundles.
 # See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
 bundle:
   name: dbt_sql
   uuid: 5e5ca8d5-0388-473e-84a1-1414ed89c5df
 
 include:
   - resources/*.yml
-  - resources/*/*.yml
 
 # Deployment targets.
 # The default schema, catalog, etc. for dbt are defined in dbt_profiles/profiles.yml
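The `include` section edited here controls which resource definition files the bundle loads. Dropping the nested glob means only top-level files under `resources/` are matched, roughly:

```yaml
# Sketch of the resulting include section (not the full file):
include:
  - resources/*.yml  # top-level resource definitions only; nested
                     # resources/*/*.yml files are no longer matched
```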
2 changes: 1 addition & 1 deletion default_minimal/.vscode/extensions.json
@@ -2,6 +2,6 @@
   "recommendations": [
     "databricks.databricks",
     "redhat.vscode-yaml",
-    "ms-python.black-formatter"
+    "charliermarsh.ruff"
   ]
 }
2 changes: 1 addition & 1 deletion default_minimal/.vscode/settings.json
@@ -33,7 +33,7 @@
   "python.testing.unittestEnabled": false,
   "python.testing.pytestEnabled": true,
   "[python]": {
-    "editor.defaultFormatter": "ms-python.black-formatter",
+    "editor.defaultFormatter": "charliermarsh.ruff",
     "editor.formatOnSave": true,
   },
 }
1 change: 0 additions & 1 deletion default_minimal/databricks.yml
@@ -6,7 +6,6 @@ bundle:
 
 include:
   - resources/*.yml
-  - resources/*/*.yml
 
 # Variable declarations. These variables are assigned in the dev/prod targets below.
 variables:
2 changes: 1 addition & 1 deletion default_python/.vscode/extensions.json
@@ -2,6 +2,6 @@
   "recommendations": [
     "databricks.databricks",
     "redhat.vscode-yaml",
-    "ms-python.black-formatter"
+    "charliermarsh.ruff"
  ]
 }
2 changes: 1 addition & 1 deletion default_python/.vscode/settings.json
@@ -33,7 +33,7 @@
   "python.testing.unittestEnabled": false,
   "python.testing.pytestEnabled": true,
   "[python]": {
-    "editor.defaultFormatter": "ms-python.black-formatter",
+    "editor.defaultFormatter": "charliermarsh.ruff",
     "editor.formatOnSave": true,
   },
 }
1 change: 0 additions & 1 deletion default_python/databricks.yml
@@ -6,7 +6,6 @@ bundle:
 
 include:
   - resources/*.yml
-  - resources/*/*.yml
 
 artifacts:
   python_artifact:
6 changes: 4 additions & 2 deletions default_python/pyproject.toml
@@ -14,8 +14,10 @@ dependencies = [
 [dependency-groups]
 dev = [
     "pytest",
+    "ruff",
     "databricks-dlt",
     "databricks-connect>=15.4,<15.5",
+    "ipykernel",
 ]
 
 [project.scripts]
@@ -25,5 +27,5 @@ main = "default_python.main:main"
 requires = ["hatchling"]
 build-backend = "hatchling.build"
 
-[tool.black]
-line-length = 125
+[tool.ruff]
+line-length = 120
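The `[tool.black]` table is swapped for a `[tool.ruff]` table here; ruff accepts the same kind of `line-length` setting, and lint rules can optionally be configured under `[tool.ruff.lint]`. A sketch of what such a configuration could look like (the lint options are illustrative, not part of this PR):

```toml
[tool.ruff]
line-length = 120

# Optional lint configuration ruff supports (illustrative, not in this PR):
[tool.ruff.lint]
select = ["E", "F", "I"]  # pycodestyle errors, pyflakes, import sorting
```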
4 changes: 3 additions & 1 deletion default_python/src/default_python/main.py
@@ -5,7 +5,9 @@
 
 def main():
     # Process command-line arguments
-    parser = argparse.ArgumentParser(description="Databricks job with catalog and schema parameters")
+    parser = argparse.ArgumentParser(
+        description="Databricks job with catalog and schema parameters",
+    )
     parser.add_argument("--catalog", required=True)
     parser.add_argument("--schema", required=True)
     args = parser.parse_args()
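The hunk above only reflows the `ArgumentParser` call. As a standalone illustration of the same argument-parsing pattern (function name and `argv` parameter are mine, not from the template file):

```python
# Hypothetical standalone sketch of the argparse pattern used in main.py;
# parse_args and its argv parameter are illustrative, not the actual file.
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Databricks job with catalog and schema parameters",
    )
    parser.add_argument("--catalog", required=True)
    parser.add_argument("--schema", required=True)
    # argv=None falls back to sys.argv[1:], mirroring how main() would run
    return parser.parse_args(argv)
```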
6 changes: 1 addition & 5 deletions default_python/tests/conftest.py
@@ -1,8 +1,4 @@
-"""This file configures pytest.
-
-This file is in the root since it can be used for tests in any place in this
-project, including tests under resources/.
-"""
+"""This file configures pytest, initializes Databricks Connect, and provides fixtures for Spark and loading test data."""
 
 import os, sys, pathlib
 from contextlib import contextmanager
7 changes: 6 additions & 1 deletion default_sql/README.md
@@ -36,6 +36,11 @@ The 'default_sql' project was generated by using the default-sql template.
 6. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
    https://docs.databricks.com/dev-tools/vscode-ext.html.
 
-7. For documentation on the Databricks Asset Bundles format used
+7. For documentation on the Declarative Automation Bundles format used
    for this project, and for CI/CD configuration, see
    https://docs.databricks.com/dev-tools/bundles/index.html.
+
+## Changing the warehouse, catalog, or schema
+
+The default SQL warehouse, catalog, and schema are configured in `databricks.yml`.
+To change these settings, edit the `variables` section for each target (dev/prod).
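The per-target `variables` section that the new README subsection refers to typically has roughly this shape in `databricks.yml`. This is a hypothetical sketch; the variable names and values are illustrative, not taken from this PR:

```yaml
# Illustrative shape of per-target variable overrides in databricks.yml:
variables:
  warehouse_id:
    description: The SQL warehouse to run statements against
  catalog:
    description: The default catalog
  schema:
    description: The default schema

targets:
  dev:
    variables:
      catalog: catalog
      schema: my_dev_schema
```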
1 change: 0 additions & 1 deletion default_sql/databricks.yml
@@ -6,7 +6,6 @@ bundle:
 
 include:
   - resources/*.yml
-  - resources/*/*.yml
 
 # Variable declarations. These variables are assigned in the dev/prod targets below.
 variables:
2 changes: 1 addition & 1 deletion lakeflow_pipelines_python/.vscode/extensions.json
@@ -2,6 +2,6 @@
   "recommendations": [
     "databricks.databricks",
     "redhat.vscode-yaml",
-    "ms-python.black-formatter"
+    "charliermarsh.ruff"
   ]
 }
2 changes: 1 addition & 1 deletion lakeflow_pipelines_python/.vscode/settings.json
@@ -33,7 +33,7 @@
   "python.testing.unittestEnabled": false,
   "python.testing.pytestEnabled": true,
   "[python]": {
-    "editor.defaultFormatter": "ms-python.black-formatter",
+    "editor.defaultFormatter": "charliermarsh.ruff",
     "editor.formatOnSave": true,
   },
 }
1 change: 0 additions & 1 deletion lakeflow_pipelines_python/databricks.yml
@@ -6,7 +6,6 @@ bundle:
 
 include:
   - resources/*.yml
-  - resources/*/*.yml
 
 # Variable declarations. These variables are assigned in the dev/prod targets below.
 variables:
6 changes: 4 additions & 2 deletions lakeflow_pipelines_python/pyproject.toml
@@ -14,8 +14,10 @@ dependencies = [
 [dependency-groups]
 dev = [
     "pytest",
+    "ruff",
     "databricks-dlt",
     "databricks-connect>=15.4,<15.5",
+    "ipykernel",
 ]
 
 [project.scripts]
@@ -28,5 +30,5 @@ build-backend = "hatchling.build"
 [tool.hatch.build.targets.wheel]
 packages = ["src"]
 
-[tool.black]
-line-length = 125
+[tool.ruff]
+line-length = 120
2 changes: 1 addition & 1 deletion lakeflow_pipelines_sql/.vscode/extensions.json
@@ -2,6 +2,6 @@
   "recommendations": [
     "databricks.databricks",
     "redhat.vscode-yaml",
-    "ms-python.black-formatter"
+    "charliermarsh.ruff"
   ]
 }
2 changes: 1 addition & 1 deletion lakeflow_pipelines_sql/.vscode/settings.json
@@ -33,7 +33,7 @@
   "python.testing.unittestEnabled": false,
   "python.testing.pytestEnabled": true,
   "[python]": {
-    "editor.defaultFormatter": "ms-python.black-formatter",
+    "editor.defaultFormatter": "charliermarsh.ruff",
     "editor.formatOnSave": true,
   },
 }
1 change: 0 additions & 1 deletion lakeflow_pipelines_sql/databricks.yml
@@ -6,7 +6,6 @@ bundle:
 
 include:
   - resources/*.yml
-  - resources/*/*.yml
 
 # Variable declarations. These variables are assigned in the dev/prod targets below.
 variables:
2 changes: 1 addition & 1 deletion pydabs/.vscode/extensions.json
@@ -2,6 +2,6 @@
   "recommendations": [
     "databricks.databricks",
     "redhat.vscode-yaml",
-    "ms-python.black-formatter"
+    "charliermarsh.ruff"
   ]
 }
2 changes: 1 addition & 1 deletion pydabs/.vscode/settings.json
@@ -33,7 +33,7 @@
   "python.testing.unittestEnabled": false,
   "python.testing.pytestEnabled": true,
   "[python]": {
-    "editor.defaultFormatter": "ms-python.black-formatter",
+    "editor.defaultFormatter": "charliermarsh.ruff",
     "editor.formatOnSave": true,
   },
 }
1 change: 0 additions & 1 deletion pydabs/databricks.yml
@@ -12,7 +12,6 @@ python:
 
 include:
   - resources/*.yml
-  - resources/*/*.yml
 
 artifacts:
   python_artifact:
8 changes: 5 additions & 3 deletions pydabs/pyproject.toml
@@ -14,9 +14,11 @@ dependencies = [
 [dependency-groups]
 dev = [
     "pytest",
+    "ruff",
     "databricks-dlt",
     "databricks-connect>=15.4,<15.5",
-    "databricks-bundles==0.279.0",
+    "ipykernel",
+    "databricks-bundles==0.295.0",
 ]
 
 [project.scripts]
@@ -26,5 +28,5 @@ main = "pydabs.main:main"
 requires = ["hatchling"]
 build-backend = "hatchling.build"
 
-[tool.black]
-line-length = 125
+[tool.ruff]
+line-length = 120
2 changes: 1 addition & 1 deletion pydabs/src/pydabs/main.py
@@ -6,7 +6,7 @@
 def main():
     # Process command-line arguments
     parser = argparse.ArgumentParser(
-        description="Databricks job with catalog and schema parameters"
+        description="Databricks job with catalog and schema parameters",
     )
     parser.add_argument("--catalog", required=True)
     parser.add_argument("--schema", required=True)
4 changes: 1 addition & 3 deletions pydabs/src/pydabs_etl/transformations/sample_zones_pydabs.py
@@ -11,7 +11,5 @@
 def sample_zones_pydabs():
     # Read from the "sample_trips" table, then sum all the fares
     return (
-        spark.read.table(f"sample_trips_pydabs")
-        .groupBy(col("pickup_zip"))
-        .agg(sum("fare_amount").alias("total_fare"))
+        spark.read.table(f"sample_trips_pydabs").groupBy(col("pickup_zip")).agg(sum("fare_amount").alias("total_fare"))
     )
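The Spark transformation above groups trips by `pickup_zip` and sums `fare_amount`. A minimal pure-Python analogue of the same aggregation, on hypothetical in-memory data rather than the Spark table:

```python
# Illustrative pure-Python analogue of the aggregation in sample_zones_pydabs;
# the function name and the trip dicts are hypothetical, not from the PR.
from collections import defaultdict


def total_fare_by_zip(trips):
    # Equivalent of groupBy("pickup_zip").agg(sum("fare_amount")):
    totals = defaultdict(float)
    for trip in trips:
        totals[trip["pickup_zip"]] += trip["fare_amount"]
    return dict(totals)
```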
6 changes: 1 addition & 5 deletions pydabs/tests/conftest.py
@@ -1,8 +1,4 @@
-"""This file configures pytest.
-
-This file is in the root since it can be used for tests in any place in this
-project, including tests under resources/.
-"""
+"""This file configures pytest, initializes Databricks Connect, and provides fixtures for Spark and loading test data."""
 
 import os, sys, pathlib
 from contextlib import contextmanager
5 changes: 2 additions & 3 deletions scripts/update_from_templates.sh
@@ -81,15 +81,15 @@ init_bundle "default-sql" "853cd9bc-631c-4d4f-bca0-3195c7540854" '{
   "project_name": "default_sql",
   "http_path": "/sql/1.0/warehouses/abcdef1234567890",
   "default_catalog": "catalog",
-  "personal_schemas": "yes, automatically use a schema based on the current user name during development"
+  "personal_schemas": "yes"
 }'
 
 init_bundle "dbt-sql" "5e5ca8d5-0388-473e-84a1-1414ed89c5df" '{
   "project_name": "dbt_sql",
   "http_path": "/sql/1.0/warehouses/abcdef1234567890",
   "serverless": "yes",
   "default_catalog": "catalog",
-  "personal_schemas": "yes, use a schema based on the current user name during development"
+  "personal_schemas": "yes"
 }'
 
 init_bundle "lakeflow-pipelines" "295000fc-1ea8-4f43-befe-d5fb9f7d4ad4" '{
@@ -99,7 +99,6 @@ init_bundle "lakeflow-pipelines" "295000fc-1ea8-4f43-befe-d5fb9f7d4ad4" '{
   "language": "sql"
 }'
 
-
 init_bundle "lakeflow-pipelines" "87a174ba-60e4-4867-a140-1936bc9b00de" '{
   "project_name": "lakeflow_pipelines_python",
   "default_catalog": "catalog",