From e7623d186d33d923872ce1302c134fab02047786 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:17:26 +0300 Subject: [PATCH 001/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 67 +++++++++++++++++++++++++++++++++++ docs/mint.json | 1 + 2 files changed, 68 insertions(+) create mode 100644 docs/dbt/on-run-end_hooks.mdx diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx new file mode 100644 index 000000000..019e1d6ca --- /dev/null +++ b/docs/dbt/on-run-end_hooks.mdx @@ -0,0 +1,67 @@ +--- +title: "Elementary dbt package on-run-end hooks" +sidebarTitle: "on-run-end hooks" +--- + +Elementary dbt package uses `on-run-end` [hooks](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end) to log results and metadata to tables in the Elementary schema. + +## What happens on the `on-run-end` hooks? + +On the `on-run-end` hooks Elementary extracts data from the dbt `results` and `graph` objects, and runs SQL queries to load this data to the Elementary models. + +There are 2 types of models that Elementary updates : + +1. Metadata models - Such as `dbt_models`, `dbt_tests`, `dbt_sources`. +2. Result models - Such as `dbt_run_results`, `elementary_test_results`, `dbt_invocations`. + +#### Updates of metadata models + +These models store the current resources and configuration in your dbt projects (models, snapshots, sources, tests, etc.). +The metadata in the models only represents the project state on the latest run, so upon changes the metadata is replaced. +The `on-run-end` hook runs SQL queries with the new metadata and updates the relevant tables. + +#### Updates of result models +These models store a log of results of dbt invocations, and of the specific executed resources. +The `on-run-end` hook runs SQL queries with the run results and invocation details. + + +## What's the performance impact of `on-run-end` hooks? 
+ +We give a lot of thought and effort to making Elementary efficient in both cost and performance. +We only run the hooks that are relevant to each run, and each hook creates a minimal amount of queries possible. + +**Metadata models** +For `dbt 1.4.0` and above, we maintain a metadata cache. +This means each of these models are only updated with changes in your project (new model, change in config, etc.). +For this reason, on the first time you execute Elementary the initial update might take a while, but the following updates should be quick. +The performance impact of this update depends on the frequency and volume of changes to your dbt project. + +If you are using `dbt 1.3.0` or lower, these models would be fully updated on each run. +The performance impact depends on the size of your dbt project. +You can also disable the metadata autoupload, and run the same update using the command `dbt run --select elementary.edr.dbt_artifacts`. + +**Result models** +The size of the queries depends on the amount of models/tests executed in the run. +The time the run results adds to the invocation shouldn't be significant. + + +## Can I disable the `on-run-end` hooks? + +Yes, but note that this may cause missing results and/or outdated metadata in Elementary report and alerts. + +**Disable metadata models updates** +Configure the following var: +```yaml dbt_project.yml +vars: + disable_dbt_artifacts_autoupload: true +``` +If you disable the artifacts autoupload, we recommend your run `dbt run --select elementary.edr.dbt_artifacts` every time you deploy changes to your project. 
+ +**Disable result models updates** +Configure the following vars (you can also disable with conditions): +```yaml dbt_project.yml +vars: + disable_run_results: true + disable_tests_results: true + disable_dbt_invocation_autoupload: "{{ target.name != 'prod' }}" +``` diff --git a/docs/mint.json b/docs/mint.json index ca9e0c222..d859ce8b9 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -138,6 +138,7 @@ "pages": [ "understand-elementary/elementary-overview", "guides/modules-overview/dbt-package", + "dbt/on-run-end_hooks", "dbt/dbt-artifacts", "understand-elementary/elementary-report-ui", "understand-elementary/elementary-alerts" From af4b70b243f0a3a637d309186593184a0adfe854 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:22:12 +0300 Subject: [PATCH 002/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index 019e1d6ca..e2633b002 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -11,8 +11,8 @@ On the `on-run-end` hooks Elementary extracts data from the dbt `results` and `g There are 2 types of models that Elementary updates : -1. Metadata models - Such as `dbt_models`, `dbt_tests`, `dbt_sources`. -2. Result models - Such as `dbt_run_results`, `elementary_test_results`, `dbt_invocations`. +**1. Metadata models** - Such as `dbt_models`, `dbt_tests`, `dbt_sources`. +**2. Result models** - Such as `dbt_run_results`, `elementary_test_results`, `dbt_invocations`. #### Updates of metadata models @@ -25,22 +25,21 @@ These models store a log of results of dbt invocations, and of the specific exec The `on-run-end` hook runs SQL queries with the run results and invocation details. -## What's the performance impact of `on-run-end` hooks? 
+## Performance impact of `on-run-end` hooks We give a lot of thought and effort to making Elementary efficient in both cost and performance. We only run the hooks that are relevant to each run, and each hook creates a minimal amount of queries possible. **Metadata models** -For `dbt 1.4.0` and above, we maintain a metadata cache. -This means each of these models are only updated with changes in your project (new model, change in config, etc.). -For this reason, on the first time you execute Elementary the initial update might take a while, but the following updates should be quick. -The performance impact of this update depends on the frequency and volume of changes to your dbt project. -If you are using `dbt 1.3.0` or lower, these models would be fully updated on each run. -The performance impact depends on the size of your dbt project. -You can also disable the metadata autoupload, and run the same update using the command `dbt run --select elementary.edr.dbt_artifacts`. +**For `dbt 1.4.0` and above**, we maintain a metadata cache. This means each of these models are only updated with changes in your project (new model, change in config, etc.). +The first time you execute Elementary the initial update might take a while, but the following updates should be quick. + +**If you are using `dbt 1.3.0`** or lower, these models would be fully updated on each run. +The performance impact depends on the size of your dbt project. **Result models** + The size of the queries depends on the amount of models/tests executed in the run. The time the run results adds to the invocation shouldn't be significant. 
From e479874b60abe91e19dad842589a1901c7e768d0 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:24:01 +0300 Subject: [PATCH 003/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index e2633b002..093a4d689 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -30,15 +30,15 @@ The `on-run-end` hook runs SQL queries with the run results and invocation detai We give a lot of thought and effort to making Elementary efficient in both cost and performance. We only run the hooks that are relevant to each run, and each hook creates a minimal amount of queries possible. -**Metadata models** +#### Metadata models **For `dbt 1.4.0` and above**, we maintain a metadata cache. This means each of these models are only updated with changes in your project (new model, change in config, etc.). The first time you execute Elementary the initial update might take a while, but the following updates should be quick. -**If you are using `dbt 1.3.0`** or lower, these models would be fully updated on each run. +**For `dbt 1.3.0` and lower**, these models would be fully updated on each run. The performance impact depends on the size of your dbt project. -**Result models** +#### Result models The size of the queries depends on the amount of models/tests executed in the run. The time the run results adds to the invocation shouldn't be significant. @@ -48,7 +48,8 @@ The time the run results adds to the invocation shouldn't be significant. Yes, but note that this may cause missing results and/or outdated metadata in Elementary report and alerts. 
-**Disable metadata models updates** +#### Disable metadata models updates + Configure the following var: ```yaml dbt_project.yml vars: @@ -56,7 +57,8 @@ vars: ``` If you disable the artifacts autoupload, we recommend your run `dbt run --select elementary.edr.dbt_artifacts` every time you deploy changes to your project. -**Disable result models updates** +#### Disable result models updates + Configure the following vars (you can also disable with conditions): ```yaml dbt_project.yml vars: From c0067886f0651a9e8b43c0f152938c1dcbb02fd5 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:28:49 +0300 Subject: [PATCH 004/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index 093a4d689..e984c3c8c 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -11,8 +11,8 @@ On the `on-run-end` hooks Elementary extracts data from the dbt `results` and `g There are 2 types of models that Elementary updates : -**1. Metadata models** - Such as `dbt_models`, `dbt_tests`, `dbt_sources`. -**2. Result models** - Such as `dbt_run_results`, `elementary_test_results`, `dbt_invocations`. +1. **Metadata models** - Such as `dbt_models`, `dbt_tests`, `dbt_sources`. +2. **Result models** - Such as `dbt_run_results`, `elementary_test_results`, `dbt_invocations`. 
#### Updates of metadata models From aec5752cdf98eb341ea90872287dfcc2cc29517d Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:36:33 +0300 Subject: [PATCH 005/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index e984c3c8c..01f243a4b 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -5,6 +5,15 @@ sidebarTitle: "on-run-end hooks" Elementary dbt package uses `on-run-end` [hooks](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end) to log results and metadata to tables in the Elementary schema. +## Why Elementary uses `on-run-end` hooks? + +As a data observability solution, the completeness and freshness of the results Elementary collects is critical. + +By leveraging `on-run-end` hooks, we add a built-in collection of the latest results and metadata as part of your runs. +This means the results you see in Elementary report and the alerts you receive are full, up-to-date and accurate. + +We stringly recommend not to disable the hooks for environments you want to minitor using Elementary. + ## What happens on the `on-run-end` hooks? On the `on-run-end` hooks Elementary extracts data from the dbt `results` and `graph` objects, and runs SQL queries to load this data to the Elementary models. From 59a75198a5b1cb2fa7010bf2fc8061d21fc73ab2 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:43:40 +0300 Subject: [PATCH 006/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index 01f243a4b..9deb9223d 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -7,12 +7,13 @@ Elementary dbt package uses `on-run-end` [hooks](https://docs.getdbt.com/referen ## Why Elementary uses `on-run-end` hooks? 
-As a data observability solution, the completeness and freshness of the results Elementary collects is critical. +Elementary report and alerts are generated from the data in the Elementary schema. +The solution relies on the Elementary schema being up-to-date and complete to be able to provide reliable and accurate observability. By leveraging `on-run-end` hooks, we add a built-in collection of the latest results and metadata as part of your runs. This means the results you see in Elementary report and the alerts you receive are full, up-to-date and accurate. -We stringly recommend not to disable the hooks for environments you want to minitor using Elementary. +We strongly recommend not to disable the hooks for environments you want to monitor using Elementary. ## What happens on the `on-run-end` hooks? From f33435cf35a84a5c08488158346e4a4e799b90d8 Mon Sep 17 00:00:00 2001 From: Elon Gliksberg Date: Wed, 17 May 2023 18:14:16 +0300 Subject: [PATCH 007/194] Fixed typos. --- docs/cloud/manage-team.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/cloud/manage-team.mdx b/docs/cloud/manage-team.mdx index d4d5162c7..6c999433a 100644 --- a/docs/cloud/manage-team.mdx +++ b/docs/cloud/manage-team.mdx @@ -7,7 +7,7 @@ sidebarTitle: "Team settings" After you signup, you could invite team members to join you! 🎉 -On the top left buttun select `Account settings`, and you can invite users on the `Team` screen. +On the top left button select `Account settings`, and you can invite users on the `Team` screen. Users you invite will recieve an Email saying you invited them, and will need to accept and activate their account. @@ -18,6 +18,6 @@ Users you invite will recieve an Email saying you invited them, and will need to ### Remove users -On the top left buttun select `Account settings`, and select the `Team` screen. +On the top left button select `Account settings`, and select the `Team` screen. 
You can remove users by clicking selecting this option under the user options. From f573592a87ac900e2badfbb4e1a56b2b441eb233 Mon Sep 17 00:00:00 2001 From: Elon Gliksberg Date: Thu, 18 May 2023 15:38:37 +0300 Subject: [PATCH 008/194] Fixed incorrect test argument name. --- .../anomaly-detection-configuration/anomaly-sensitivity.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx b/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx index be9d46a9c..12a635746 100644 --- a/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx +++ b/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx @@ -34,7 +34,7 @@ models: - name: this_is_a_model tests: - elementary.volume_anomalies: - anomaly_sensitivity: 3 + sensitivity: 3 ``` From 792adb223e36e879520c431ddb4e0ad15c77d177 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 21 May 2023 16:07:51 +0300 Subject: [PATCH 009/194] Update signup.mdx --- docs/cloud/onboarding/signup.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/cloud/onboarding/signup.mdx b/docs/cloud/onboarding/signup.mdx index 0b3f00145..119fb45ab 100644 --- a/docs/cloud/onboarding/signup.mdx +++ b/docs/cloud/onboarding/signup.mdx @@ -1,6 +1,6 @@ --- title: "Quickstart: Signup and connect" -sidebarTitle: "Signup and connect" +sidebarTitle: "Signup and login" --- ### Signup to Elementary cloud @@ -28,4 +28,4 @@ After you connect a data warehouse with an Elementary schema in it, you can star ### What's next? -[Connect your Elementary schema to Elementary cloud](/cloud/saas-onboarding/connect-data-warehouse). +[Connect your Elementary schema to Elementary cloud](/cloud/onboarding/connect-data-warehouse). 
From 86783684692646bdb65d57e47711f3a25a520ff4 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 21 May 2023 16:08:10 +0300 Subject: [PATCH 010/194] Update connect-data-warehouse.mdx --- docs/cloud/onboarding/connect-data-warehouse.mdx | 8 -------- 1 file changed, 8 deletions(-) diff --git a/docs/cloud/onboarding/connect-data-warehouse.mdx b/docs/cloud/onboarding/connect-data-warehouse.mdx index 9783eae96..723e3cb09 100644 --- a/docs/cloud/onboarding/connect-data-warehouse.mdx +++ b/docs/cloud/onboarding/connect-data-warehouse.mdx @@ -11,14 +11,6 @@ Here are the steps needed to enable the connection: Elementary needs authentication details, permissions to read the Elementary schema (and not the rest of your data), and network access enabled by adding the cloud IPs to your data warehouse allowlist. -Here are the guides on how to configure these on each supported data warehouse: - -- Bigquery -- Snowflake -- Redshift -- Databricks -- Postgres - Elementary IP for allowlist: `3.126.156.226` ### Create a `profiles.yml` file From 0543c8acd2d13bf1b7f38df8677ac6b02da3a0d2 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 21 May 2023 16:29:18 +0300 Subject: [PATCH 011/194] Update mint.json --- docs/mint.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/mint.json b/docs/mint.json index d859ce8b9..42b2f70c4 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -22,7 +22,7 @@ }, "topbarCtaButton": { "name": "Try Elementary Cloud", - "url": "https://www.elementary-data.com/cloud-beta" + "url": "https://t2taztilhde.typeform.com/to/oevDtdJn?utm_source=docs&utm_medium=cta&utm_content=v1" }, "topbarLinks": [ { From 92239d6690f234bd464d5450a817ac8aec99c830 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 21 May 2023 16:30:57 +0300 Subject: [PATCH 012/194] Update introduction.mdx --- docs/cloud/introduction.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/cloud/introduction.mdx b/docs/cloud/introduction.mdx index 
ccac2e097..ca36a55ef 100644 --- a/docs/cloud/introduction.mdx +++ b/docs/cloud/introduction.mdx @@ -6,7 +6,7 @@ title: "Introduction" Elementary Cloud is the easiest and fastest way to get the most out of Elementary. - + _The service is currently in private beta_ @@ -39,7 +39,7 @@ alt="Elementary Managed high level flow" 1. [Install the Elementary dbt package in your project](/cloud/onboarding/quickstart-dbt-package). 2. [Signup and setup integrations](/cloud/onboarding/signup). - + _The service is currently in private beta_ From db42d9eabe8e496f047b1a85675e4a2447aee6d5 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 21 May 2023 16:31:41 +0300 Subject: [PATCH 013/194] Update elementary-in-production.mdx --- docs/deployment-and-configuration/elementary-in-production.mdx | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/deployment-and-configuration/elementary-in-production.mdx b/docs/deployment-and-configuration/elementary-in-production.mdx index bca6a2fc0..7d9f6dd32 100644 --- a/docs/deployment-and-configuration/elementary-in-production.mdx +++ b/docs/deployment-and-configuration/elementary-in-production.mdx @@ -2,8 +2,7 @@ title: "Elementary in production" --- - - _The service is currently in private beta_ + Running Elementary in production means to include the dbt package in your production dbt project, From 116ca36e1f9f313ff298f7cf713c29623a18631d Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Sun, 21 May 2023 16:36:34 +0300 Subject: [PATCH 014/194] on-run-end hooks --- docs/cloud/introduction.mdx | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/docs/cloud/introduction.mdx b/docs/cloud/introduction.mdx index ca36a55ef..83acc9c5c 100644 --- a/docs/cloud/introduction.mdx +++ b/docs/cloud/introduction.mdx @@ -6,8 +6,7 @@ title: "Introduction" Elementary Cloud is the easiest and fastest way to get the most out of Elementary. 
- - _The service is currently in private beta_ + ## @@ -39,8 +38,7 @@ alt="Elementary Managed high level flow" 1. [Install the Elementary dbt package in your project](/cloud/onboarding/quickstart-dbt-package). 2. [Signup and setup integrations](/cloud/onboarding/signup). - - _The service is currently in private beta_ + ## Security and privacy From cc579fa874a01f95091afd8e908da1baa10321f7 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 24 May 2023 13:10:24 +0300 Subject: [PATCH 015/194] new typeform --- docs/cloud/introduction.mdx | 4 ++-- .../deployment-and-configuration/elementary-in-production.mdx | 2 +- docs/mint.json | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/cloud/introduction.mdx b/docs/cloud/introduction.mdx index 83acc9c5c..ce75c603a 100644 --- a/docs/cloud/introduction.mdx +++ b/docs/cloud/introduction.mdx @@ -6,7 +6,7 @@ title: "Introduction" Elementary Cloud is the easiest and fastest way to get the most out of Elementary. - + ## @@ -38,7 +38,7 @@ alt="Elementary Managed high level flow" 1. [Install the Elementary dbt package in your project](/cloud/onboarding/quickstart-dbt-package). 2. [Signup and setup integrations](/cloud/onboarding/signup). 
- + ## Security and privacy diff --git a/docs/deployment-and-configuration/elementary-in-production.mdx b/docs/deployment-and-configuration/elementary-in-production.mdx index 7d9f6dd32..22ae484eb 100644 --- a/docs/deployment-and-configuration/elementary-in-production.mdx +++ b/docs/deployment-and-configuration/elementary-in-production.mdx @@ -2,7 +2,7 @@ title: "Elementary in production" --- - + Running Elementary in production means to include the dbt package in your production dbt project, diff --git a/docs/mint.json b/docs/mint.json index 42b2f70c4..7f1a38ff8 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -22,7 +22,7 @@ }, "topbarCtaButton": { "name": "Try Elementary Cloud", - "url": "https://t2taztilhde.typeform.com/to/oevDtdJn?utm_source=docs&utm_medium=cta&utm_content=v1" + "url": "https://t2taztilhde.typeform.com/to/ObfMbxB5?utm_source=docs&utm_medium=cta&utm_content=v1" }, "topbarLinks": [ { From 4ce47fc06dbb2fadbca44ff484da74e0578729d0 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 29 May 2023 11:28:29 +0300 Subject: [PATCH 016/194] tests docs changes --- docs/guides/add-elementary-tests.mdx | 386 ++---------------- ...umn_anomalies.mdx => column-anomalies.mdx} | 0 .../all-columns-anomalies.mdx | 75 ++++ .../column-anomalies.mdx | 111 +++++ .../dimension-anomalies.mdx | 88 ++++ .../event-freshness-anomalies.mdx | 75 ++++ .../freshness-anomalies.mdx | 69 ++++ .../volume-anomalies.mdx | 81 ++++ .../guides/elementary-tests-configuration.mdx | 4 +- docs/guides/how-anomaly-detection-works.mdx | 2 +- docs/mint.json | 13 +- 11 files changed, 547 insertions(+), 357 deletions(-) rename docs/guides/anomaly-detection-configuration/{column_anomalies.mdx => column-anomalies.mdx} (100%) create mode 100644 docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx create mode 100644 docs/guides/anomaly-detection-tests/column-anomalies.mdx create mode 100644 docs/guides/anomaly-detection-tests/dimension-anomalies.mdx create mode 100644 
docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx create mode 100644 docs/guides/anomaly-detection-tests/freshness-anomalies.mdx create mode 100644 docs/guides/anomaly-detection-tests/volume-anomalies.mdx diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 1852e328f..0bfaf105f 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -18,381 +18,61 @@ The tests are configured and executed like any other tests in your project. ### Table (model / source) tests -#### Volume (row count) anomalies - -`elementary.volume_anomalies` - -Monitors the row count of your table over time per time bucket (if configured without `timestamp_column`, will count table total rows). - -Upon running the test, your data is split into time buckets (daily by default, configurable with the `time bucket` -field), and then we compute the row count per bucket for the last [`days_back`](/guides/anomaly-detection-configuration/days-back) days (by default 14). - -The test then compares the row count of each bucket buckets within the detection period (last 2 days by default, controlled by the -`backfill_days` var), and compares it to the row count of the previous time buckets. -If there were any anomalies during the detection period, the test will fail. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- - - -```yml Models -version: 2 - -models: - - name: < model name > - tests: - - elementary.volume_anomalies: - timestamp_column: < timestamp column > - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - config: - elementary: - timestamp_column: "loaded_at" - tests: - - elementary.volume_anomalies: - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: day - count: 1 - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] - config: - # optional - change severity - severity: warn - - - name: users - # if no timestamp is configured, elementary will monitor without time filtering - tests: - - elementary.volume_anomalies: - tags: ["elementary"] + elementary.volume_anomalies ``` + Monitors the row count of your table over time per time bucket (if configured without `timestamp_column`, will count table total rows). + - - -#### Freshness anomalies - -`elementary.freshness_anomalies` - -Monitors the freshness of your table over time, as the expected time between data updates. - -Upon running the test, your data is split into time buckets (daily by default, configurable with the `time bucket` -field), and then we compute the maximum freshness value per bucket for the last `days_back` days (by default 14). - -The test then compares the freshness of each bucket within the detection period (last 2 days by default, controlled by the -`backfill_days` var), and compares it to the freshness of the previous time buckets. -If there were any anomalies during the detection period, the test will fail. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- - - -```yml Models -version: 2 -models: - - name: < model name > - tests: - - elementary.freshness_anomalies: - timestamp_column: < timestamp column > # Mandatory - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - tests: - - elementary.freshness_anomalies: - timestamp_column: "updated_at" - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] - config: - # optional - change severity - severity: warn + elementary.freshness_anomalies ``` + Monitors the freshness of your table over time, as the expected time between data updates. + Requires a [`timestamp_column`](/guides/anomaly-detection-configuration/timestamp-column) configuration. + - - -#### Event freshness anomalies - -`elementary.event_freshness_anomalies` - -Monitors the freshness of event data over time, as the expected time it takes each event to load - -that is, the time between the when the event actually occurs (the event timestamp), and when it is loaded to the -database (the update timestamp). - -This test compliments the `freshness_anomalies` test and is primarily intended for data that is updated in a -continuous / streaming fashion. - -The test can work in a couple of modes: - -- If only an `event_timestamp_column` is supplied, the test measures over time the difference between the current - timestamp ("now") and the most recent event timestamp. -- If both an `event_timestamp_column` and an `update_timestamp_column` are provided, the test will measure over time - the difference between these two columns. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- - - -```yml Models -version: 2 - -models: - - name: < model name > - tests: - - elementary.event_freshness_anomalies: - event_timestamp_column: < timestamp column > # Mandatory - update_timestamp_column: < timestamp column > # Optional - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - tests: - - elementary.event_freshness_anomalies: - event_timestamp_column: "occurred_at" - update_timestamp_column: "updated_at" - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] - config: - # optional - change severity - severity: warn + elementary.event_freshness_anomalies ``` + Monitors the freshness of event data over time, as the expected time it takes each event to load - + that is, the time between when the event actually occurs (the `event timestamp`), and when it is loaded to the + database (the `update timestamp`). The configuration `event_timestamp_column` is required, and `update_timestamp_column` is optional. + - - -#### Dimension anomalies - -`elementary.dimension_anomalies` - -This test monitors the frequency of values in the configured dimension over time, and alerts on unexpected changes in -the distribution. -It is best to configure it on low-cardinality fields. -The test counts rows grouped by given columns/expressions, and can be configured using the `dimensions` -and `where_expression` keys. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- - - -```yml Models -version: 2 - -models: - - name: < model name > - config: - elementary: - timestamp_column: < timestamp column > - tests: - - elementary.dimension_anomalies: - dimensions: < columns or sql expressions of columns > - # optional - configure a where a expression to accurate the dimension monitoring - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - config: - elementary: - timestamp_column: "loaded_at" - tests: - - elementary.dimension_anomalies: - dimensions: - - event_type - - country_name - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: hour - count: 4 - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] - config: - # optional - change severity - severity: warn - - - name: users - # if no timestamp is configured, elementary will monitor without time filtering - tests: - - elementary.dimension_anomalies: - dimensions: - - event_type - tags: ["elementary"] + elementary.dimension_anomalies ``` + This test monitors the frequency of values in the configured dimension over time, and alerts on unexpected changes in the distribution. + It is best to configure it on low-cardinality fields. + The test counts rows grouped by given `dimensions` (columns/expressions). + - - -#### All columns anomalies - -`elementary.all_columns_anomalies` - -Executes column level monitors and anomaly detection on all the columns of the table. Specific monitors -are [detailed here](/guides/data-anomaly-detection#tests-and-monitors-types) and can be configured using -the `all_columns_anomalies` key. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- - - -```yml Models -version: 2 - -models: - - name: < model name > - config: - elementary: - timestamp_column: < timestamp column > - tests: - - elementary.all_columns_anomalies: - column_anomalies: < specific monitors, all if null > - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - config: - elementary: - timestamp_column: "loaded_at" - tests: - - elementary.all_columns_anomalies: - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: day - count: 1 - tags: ["elementary"] - # optional - change global sensitivity - sensitivity: 3.5 + elementary.all_columns_anomalies ``` + Executes column level monitors and anomaly detection on all the columns of the table. + Specific monitors are [detailed here](/guides/anomaly-detection-configuration/column-anomalies). + You can use `column_anomalies` param to override the default monitors, and `exclude_prefix` / `exclude_regexp` to exclude columns from the test. + - - - ### Column tests -#### Column anomalies - -`elementary.column_anomalies` - -Executes column level monitors and anomaly detection. Specific monitors -are [detailed here](/guides/data-anomaly-detection#tests-and-monitors-types) and can be configured using -the `column_anomalies` key. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). - - - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- -```yml Models -version: 2 - -models: - - name: < model name > - config: - elementary: - timestamp_column: < timestamp column > - columns: - - name: < column name > - tests: - - elementary.column_anomalies: - column_anomalies: < specific monitors, all if null > - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > - - - name: < model name > - ## if no timestamp is configured, elementary will monitor without time filtering - columns: - - name: < column name > - tests: - - elementary.column_anomalies: - column_anomalies: < specific monitors, all if null > - where_expression: < sql expression > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - config: - elementary: - timestamp_column: 'loaded_at' - columns: - - name: user_name - tests: - - elementary.column_anomalies: - column_anomalies: - - missing_count - - min_length - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: day - count: 1 - tags: ['elementary'] - - - name: users - ## if no timestamp is configured, elementary will monitor without time filtering - tests: - elementary.table_anomalies - tags: ['elementary'] - columns: - - name: user_id - tests: - - elementary.column_anomalies: - tags: ['elementary'] - timestamp_column: 'updated_at' - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: < time period > - count: < number of periods > - - name: user_name - tests: - - elementary.column_anomalies: - column_anomalies: - - missing_count - - min_length - tags: ['elementary'] + elementary.column_anomalies ``` + Executes column level monitors and anomaly detection on the column. + Specific monitors are [detailed here](/guides/anomaly-detection-configuration/column-anomalies) and can be configured using + the `columns_anomalies` configuration. 
+ - - -#### Column anomalies - - #### Adding tests examples: diff --git a/docs/guides/anomaly-detection-configuration/column_anomalies.mdx b/docs/guides/anomaly-detection-configuration/column-anomalies.mdx similarity index 100% rename from docs/guides/anomaly-detection-configuration/column_anomalies.mdx rename to docs/guides/anomaly-detection-configuration/column-anomalies.mdx diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx new file mode 100644 index 000000000..b3b9a182b --- /dev/null +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -0,0 +1,75 @@ +--- +title: "all_columns_anomalies" +sidebarTitle: "all_columns_anomalies" +--- + +`elementary.all_columns_anomalies` + +Executes column level monitors and anomaly detection on all the columns of the table. +Specific monitors are detailed in the table below and can be configured using the `column_anomalies` configuration. + +The test checks the data type of each column and only executes monitors that are relevant to it. +You can use the `column_anomalies` param to override the default monitors, and `exclude_prefix` / `exclude_regexp` to exclude columns from the test. + + + + +### Test configuration + +No configuration is mandatory, but configuring a `timestamp_column` is highly recommended. + +
+ 
+  tests:
+    elementary.all_columns_anomalies:
+      -- column_anomalies: column monitors list>
+      -- exclude_prefix: string>
+      -- exclude_regexp: regex>
+      -- timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
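+
+A minimal sketch of excluding columns from the test, assuming hypothetical column naming conventions: `exclude_prefix` takes a string prefix and `exclude_regexp` takes a regular expression, as listed in the configuration above.
+
+```yml Exclude columns example
+models:
+  - name: login_events
+    config:
+      elementary:
+        timestamp_column: "loaded_at"
+    tests:
+      - elementary.all_columns_anomalies:
+          # hypothetical prefix: skip columns whose names start with "tmp_"
+          exclude_prefix: "tmp_"
+          # hypothetical pattern: skip columns whose names match this regex
+          exclude_regexp: ".*_raw$"
+          tags: ["elementary"]
+```
+
+The prefix and pattern are illustrative only; either parameter can be dropped if it is not needed.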
+ + + + +```yml Models +models: + - name: < model name > + config: + elementary: + timestamp_column: < timestamp column > + tests: + - elementary.all_columns_anomalies: + column_anomalies: < specific monitors, all if null > + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > +``` + +```yml Models example +models: + - name: login_events + config: + elementary: + timestamp_column: "loaded_at" + tests: + - elementary.all_columns_anomalies: + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: day + count: 1 + tags: ["elementary"] + # optional - change global sensitivity + anomaly_sensitivity: 3.5 +``` + + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx new file mode 100644 index 000000000..7572b6831 --- /dev/null +++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx @@ -0,0 +1,111 @@ +--- +title: "column_anomalies" +sidebarTitle: "column_anomalies" +--- + +`elementary.column_anomalies` + +Executes column level monitors and anomaly detection on the column. +Specific monitors are detailed in the table below and can be configured using the `column_anomalies` configuration. + +The test checks the data type of the column and only executes monitors that are relevant to it. + + + + +### Test configuration + +No configuration is mandatory, but configuring a `timestamp_column` is highly recommended. + +
+ 
+  tests:
+    elementary.column_anomalies:
+      -- column_anomalies: column monitors list>
+      -- timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
+ + + + +```yml Models + +models: + - name: < model name > + config: + elementary: + timestamp_column: < timestamp column > + columns: + - name: < column name > + tests: + - elementary.column_anomalies: + column_anomalies: < specific monitors, all if null > + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > + + - name: < model name > + ## if no timestamp is configured, elementary will monitor without time filtering + columns: + - name: < column name > + tests: + - elementary.column_anomalies: + column_anomalies: < specific monitors, all if null > + where_expression: < sql expression > +``` + +```yml Models example + +models: + - name: login_events + config: + elementary: + timestamp_column: 'loaded_at' + columns: + - name: user_name + tests: + - elementary.column_anomalies: + column_anomalies: + - missing_count + - min_length + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: day + count: 1 + tags: ['elementary'] + + - name: users + ## if no timestamp is configured, elementary will monitor without time filtering + tests: + elementary.table_anomalies + tags: ['elementary'] + columns: + - name: user_id + tests: + - elementary.column_anomalies: + tags: ['elementary'] + timestamp_column: 'updated_at' + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: < time period > + count: < number of periods > + - name: user_name + tests: + - elementary.column_anomalies: + column_anomalies: + - missing_count + - min_length + tags: ['elementary'] +``` + + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx new file mode 100644 index 000000000..7155ae525 --- /dev/null +++ b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx @@ -0,0 +1,88 @@ +--- 
+title: "dimension_anomalies" +sidebarTitle: "dimension_anomalies" +--- + +`elementary.dimension_anomalies` + +This test monitors the frequency of values in the configured dimension over time, and alerts on unexpected changes in the distribution. +It is best to configure it on low-cardinality fields. +The test counts rows grouped by given `dimensions` (columns/expressions). + +If `timestamp_column` is configured, the distribution is collected per `time_bucket`. If not, it counts the total rows per dimension. + + +### Test configuration + +_Required configuration: `dimensions`_ + +
+ 
+  tests:
+    elementary.dimension_anomalies:
+      -- dimensions: sql expression>
+      -- timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
+ + + + + +```yml Models + +models: + - name: < model name > + config: + elementary: + timestamp_column: < timestamp column > + tests: + - elementary.dimension_anomalies: + dimensions: < columns or sql expressions of columns > + # optional - configure a where expression to narrow the dimension monitoring + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > +``` + +```yml Models example + +models: + - name: login_events + config: + elementary: + timestamp_column: "loaded_at" + tests: + - elementary.dimension_anomalies: + dimensions: + - event_type + - country_name + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: hour + count: 4 + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] + config: + # optional - change severity + severity: warn + + - name: users + # if no timestamp is configured, elementary will monitor without time filtering + tests: + - elementary.dimension_anomalies: + dimensions: + - event_type + tags: ["elementary"] +``` + + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx new file mode 100644 index 000000000..c76b159de --- /dev/null +++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx @@ -0,0 +1,75 @@ +--- +title: "event_freshness_anomalies" +sidebarTitle: "event_freshness_anomalies" +--- + +`elementary.event_freshness_anomalies` + +Monitors the freshness of event data over time, as the expected time it takes each event to load - +that is, the time between when the event actually occurs (the `event timestamp`), and when it is loaded to the +database (the `update timestamp`). + +This test complements the `freshness_anomalies` test and is primarily intended for data that is updated in a continuous / streaming fashion. 
+ +The test can work in a couple of modes: + +- If only an `event_timestamp_column` is supplied, the test measures over time the difference between the current + timestamp ("now") and the most recent event timestamp. +- If both an `event_timestamp_column` and an `update_timestamp_column` are provided, the test will measure over time + the difference between these two columns. + +### Test configuration + +_Required configuration: `event_timestamp_column`_ +_Default configuration: `anomaly_direction: spike` to alert only on delays._ + +
+ 
+  tests:
+    elementary.event_freshness_anomalies:
+      -- event_timestamp_column: column name>
+      -- update_timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
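+
+As a sketch of the first mode described above, assuming a hypothetical `occurred_at` column: only `event_timestamp_column` is configured, so the test measures the difference between the most recent event timestamp and the current timestamp.
+
+```yml Event timestamp only
+models:
+  - name: login_events
+    tests:
+      - elementary.event_freshness_anomalies:
+          # only the event timestamp is configured; update_timestamp_column is omitted
+          event_timestamp_column: "occurred_at"
+          tags: ["elementary"]
+```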
+ + + + +```yml Models + +models: + - name: < model name > + tests: + - elementary.event_freshness_anomalies: + event_timestamp_column: < timestamp column > # Mandatory + update_timestamp_column: < timestamp column > # Optional + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > +``` + +```yml Models example + +models: + - name: login_events + tests: + - elementary.event_freshness_anomalies: + event_timestamp_column: "occurred_at" + update_timestamp_column: "updated_at" + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] + config: + # optional - change severity + severity: warn +``` + + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx new file mode 100644 index 000000000..47fa9a787 --- /dev/null +++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx @@ -0,0 +1,69 @@ +--- +title: "freshness_anomalies" +sidebarTitle: "freshness_anomalies" +--- + +`elementary.freshness_anomalies` + +Monitors the freshness of your table over time, as the expected time between data updates. + +Upon running the test, your data is split into time buckets (daily by default, configurable with the `time_bucket` field), +and then we compute the maximum freshness value per bucket for the last `days_back` days (by default 14). + +The test then takes the freshness of each bucket within the detection period (last 2 days by default, controlled by the +`backfill_days` var) and compares it to the freshness of the previous time buckets. +If there were any anomalies during the detection period, the test will fail. + + +### Test configuration + +_Required configuration: `timestamp_column`_ +_Default configuration: `anomaly_direction: spike` to alert only on delays._ + +
+ 
+  tests:
+    elementary.freshness_anomalies:
+      -- timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
+ + + + +```yml Models + +models: + - name: < model name > + tests: + - elementary.freshness_anomalies: + timestamp_column: < timestamp column > # Mandatory + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > +``` + +```yml Models example + +models: + - name: login_events + tests: + - elementary.freshness_anomalies: + timestamp_column: "updated_at" + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] + config: + # optional - change severity + severity: warn +``` + + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx new file mode 100644 index 000000000..5921e41af --- /dev/null +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -0,0 +1,81 @@ +--- +title: "volume_anomalies" +sidebarTitle: "volume_anomalies" +--- + +`elementary.volume_anomalies` + +Monitors the row count of your table over time per time bucket (if configured without `timestamp_column`, it counts the table's total rows). + +Upon running the test, your data is split into time buckets (daily by default, configurable with the `time_bucket` field), +and then we compute the row count per bucket for the last [`days_back`](/guides/anomaly-detection-configuration/days-back) days (by default 14). + +The test then takes the row count of each bucket within the detection period (last 2 days by default, configured as [`backfill_days`](/guides/anomaly-detection-configuration/backfill-days)) +and compares it to the row count of the previous time buckets. + +**The test will only run on completed time buckets**, so if you run it with daily buckets in the middle of today, the test would only count yesterday as a complete bucket. +If there were any anomalies during the detection period, the test will fail. 
+ + +### Test configuration + +No mandatory configuration, however it is highly recommended to configure a `timestamp_column`. + +
+ 
+  tests:
+    elementary.volume_anomalies:
+      -- timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- anomaly_direction: [both | spike | drop]>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
+ + + + +```yml Models +models: + - name: < model name > + tests: + - elementary.volume_anomalies: + timestamp_column: < timestamp column > + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > +``` + +```yml Models example +models: + - name: login_events + config: + elementary: + timestamp_column: "loaded_at" + tests: + - elementary.volume_anomalies: + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: day + count: 1 + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] + config: + # optional - change severity + severity: warn + + - name: users + # if no timestamp is configured, elementary will monitor without time filtering + tests: + - elementary.volume_anomalies: + tags: ["elementary"] +``` + + \ No newline at end of file diff --git a/docs/guides/elementary-tests-configuration.mdx b/docs/guides/elementary-tests-configuration.mdx index e9a3401c7..236567676 100644 --- a/docs/guides/elementary-tests-configuration.mdx +++ b/docs/guides/elementary-tests-configuration.mdx @@ -35,7 +35,7 @@ The anomaly detection tests configuration is defined in `.yml` files in your dbt -- seasonality: day_of_week> all_columns_anomalies test: - -- column_anomalies: column monitors list> + -- column_anomalies: column monitors list> -- exclude_prefix: string> -- exclude_regexp: regex> @@ -70,7 +70,7 @@ The anomaly detection tests configuration is defined in `.yml` files in your dbt -- seasonality: day_of_week> -- anomaly_sensitivity: int> -- anomaly_direction: [both | spike | drop]> - -- column_anomalies: column monitors list> + -- column_anomalies: column monitors list> -- exclude_prefix: string> -- exclude_regexp: regex> -- dimensions: sql expression> diff --git a/docs/guides/how-anomaly-detection-works.mdx b/docs/guides/how-anomaly-detection-works.mdx index 7d76c18ce..eff6bc2be 100644 --- 
a/docs/guides/how-anomaly-detection-works.mdx +++ b/docs/guides/how-anomaly-detection-works.mdx @@ -66,7 +66,7 @@ To detect data issues with high accuracy, it is important to leverage the config Configuration params related directly to the test's core concepts: **Data monitors** -- [column_anomalies](/guides/anomaly-detection-configuration/column_anomalies) +- [column_anomalies](/guides/anomaly-detection-configuration/column-anomalies) **Expected range** - [anomaly_sensitivity](/guides/anomaly-detection-configuration/anomaly-sensitivity) diff --git a/docs/mint.json b/docs/mint.json index 7f1a38ff8..40fc4699a 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -111,6 +111,17 @@ "pages": [ "guides/how-anomaly-detection-works", "guides/data-anomaly-detection", + { + "group": "Anomaly detection tests", + "pages": [ + "guides/anomaly-detection-tests/volume-anomalies", + "guides/anomaly-detection-tests/freshness-anomalies", + "guides/anomaly-detection-tests/event-freshness-anomalies", + "guides/anomaly-detection-tests/dimension-anomalies", + "guides/anomaly-detection-tests/all-columns-anomalies", + "guides/anomaly-detection-tests/column-anomalies" + ] + }, "guides/elementary-tests-configuration", { "group": "Tests params", @@ -123,7 +134,7 @@ "guides/anomaly-detection-configuration/backfill-days", "guides/anomaly-detection-configuration/time-bucket", "guides/anomaly-detection-configuration/seasonality", - "guides/anomaly-detection-configuration/column_anomalies", + "guides/anomaly-detection-configuration/column-anomalies", "guides/anomaly-detection-configuration/exclude_prefix", "guides/anomaly-detection-configuration/exclude_regexp", "guides/anomaly-detection-configuration/dimensions", From b9cd4d0fccb796660f36d4b24428c2fe94672546 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 29 May 2023 11:59:29 +0300 Subject: [PATCH 017/194] test config at all levels --- ...uestion-tests-configuration-priorities.mdx | 22 ++++---- docs/guides/add-elementary-tests.mdx | 2 +- 
.../guides/elementary-tests-configuration.mdx | 53 ++++++++++++++----- 3 files changed, 52 insertions(+), 25 deletions(-) diff --git a/docs/_snippets/faq/question-tests-configuration-priorities.mdx b/docs/_snippets/faq/question-tests-configuration-priorities.mdx index 90d9d5b7e..4c81a2315 100644 --- a/docs/_snippets/faq/question-tests-configuration-priorities.mdx +++ b/docs/_snippets/faq/question-tests-configuration-priorities.mdx @@ -1,20 +1,22 @@ -The configuration of Elementary is dbt native and follows the same priorities of `dbt configuration`. -The more granular and specific configuration overrides the less granular one. +The configuration of Elementary is dbt native and follows the same priorities and inheritance. +The more granular and specific configuration overrides the less granular one. Elementary searches and prioritizes configuration in the following order: -For models: +**For models tests:** 1. Test arguments. -2. Model configuration. -3. Global vars in `dbt_project.yml`. +2. Tests path configuration under `tests` key in `dbt_project.yml`. +3. Model configuration. +4. Path configuration under `models` key in `dbt_project.yml`. +5. Global vars in `dbt_project.yml`. -For sources: +**For sources tests:** 1. Test arguments. -2. Table configuration. -3. Source configuration. -4. Global vars in `dbt_project.yml`. - +2. Tests path configuration under `tests` key in `dbt_project.yml`. +3. Table configuration. +4. Source configuration. +5. Global vars in `dbt_project.yml`. \ No newline at end of file diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 0bfaf105f..4bd341013 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -40,7 +40,7 @@ The tests are configured and executed like any other tests in your project. 
``` Monitors the freshness of event data over time, as the expected time it takes each event to load - that is, the time between when the event actually occurs (the `event timestamp`), and when it is loaded to the - database (the `update timestamp`). The configuration `event_timestamp_column` is required, and `update_timestamp_column` is optional. + database (the `update timestamp`). Configuring `event_timestamp_column` is required, and `update_timestamp_column` is optional.
diff --git a/docs/guides/elementary-tests-configuration.mdx b/docs/guides/elementary-tests-configuration.mdx index 236567676..39538ef2f 100644 --- a/docs/guides/elementary-tests-configuration.mdx +++ b/docs/guides/elementary-tests-configuration.mdx @@ -5,7 +5,24 @@ sidebarTitle: "Tests configuration" The anomaly detection tests configuration is defined in `.yml` files in your dbt project, just like in native dbt tests. - +The configuration of Elementary is dbt native and follows the same priorities and inheritance. +The more granular and specific configuration overrides the less granular one. + +Elementary searches and prioritizes configuration in the following order: + +**For models tests:** +1. Test arguments. +2. Tests path configuration under `tests` key in `dbt_project.yml`. +3. Model configuration. +4. Path configuration under `models` key in `dbt_project.yml`. +5. Global vars in `dbt_project.yml`. + +**For sources tests:** +1. Test arguments. +2. Tests path configuration under `tests` key in `dbt_project.yml`. +3. Table configuration. +4. Source configuration. +5. Global vars in `dbt_project.yml`. --- @@ -17,7 +34,7 @@ The anomaly detection tests configuration is defined in `.yml` files in your dbt - +
      
       All anomaly detection tests:
@@ -50,32 +67,40 @@ The anomaly detection tests configuration is defined in `.yml` files in your dbt
 
   
 
-  
+  
     
      
-      dbt_project.yml vars:
+      Expected range:
        -- anomaly_sensitivity: int>
-       -- days_back: int>
+       -- anomaly_direction: [both | spike | drop]>
+
+      Detection period and detection set:
        -- backfill_days: int>
+       -- seasonality: day_of_week>
 
-      Model / source level:
-       -- timestamp_column: column name>
+      Training period and training set:
+       -- days_back: int>
+       -- seasonality: day_of_week>
 
-      Test level:
+      Time buckets:
        -- timestamp_column: column name>
-       -- where_expression: sql expression>
        -- time_bucket:>
                 period: [hour | day | week | month]
                 count: int
-       -- seasonality: day_of_week>
-       -- anomaly_sensitivity: int>
-       -- anomaly_direction: [both | spike | drop]>
-       -- column_anomalies: column monitors list>
+
+      Monitored data set:
+       -- where_expression: sql expression>
        -- exclude_prefix: string>
        -- exclude_regexp: regex>
        -- dimensions: sql expression>
+      
+      Data monitors:
+       -- column_anomalies: column monitors list>
+
+      Other:
        -- event_timestamp_column: column name>
-       -- update_timestamp_column: column name>
+       -- update_timestamp_column: column name> 
+
      
     
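+
+The precedence order above can be sketched with `anomaly_sensitivity` set at three levels. The model name and the values are illustrative assumptions; per the order listed, the test argument is expected to win for that test, the model configuration for other Elementary tests on the model, and the var everywhere else.
+
+```yml Precedence sketch
+# dbt_project.yml - global default (lowest priority)
+vars:
+  anomaly_sensitivity: 3
+---
+# models yml - model configuration and test argument
+models:
+  - name: this_is_a_model
+    config:
+      elementary:
+        # overrides the var for Elementary tests on this model
+        anomaly_sensitivity: 3.5
+    tests:
+      - elementary.volume_anomalies:
+          # overrides the model configuration for this test only
+          anomaly_sensitivity: 2.5
+      # no test argument here, so the model configuration (3.5) should apply
+      - elementary.freshness_anomalies:
+          timestamp_column: updated_at
+```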
From 05686a9645faffe7f8cd25362ff44d3348257fb4 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 29 May 2023 12:45:35 +0300 Subject: [PATCH 018/194] test config at all levels --- .../anomaly-direction.mdx | 19 +++++++++--- .../anomaly-sensitivity.mdx | 31 ++++++++++++++----- .../backfill-days.mdx | 21 ++++++++----- .../column-anomalies.mdx | 2 +- .../days-back.mdx | 21 +++++++++++-- .../dimensions.mdx | 2 +- .../event_timestamp_column.mdx | 2 +- .../exclude_prefix.mdx | 2 +- .../exclude_regexp.mdx | 2 +- .../seasonality.mdx | 16 ++++++++-- .../time-bucket.mdx | 24 +++++++++++--- .../timestamp-column.mdx | 26 +++++++++------- .../update_timestamp_column.mdx | 2 +- .../where-expression.mdx | 17 ++++++++-- 14 files changed, 140 insertions(+), 47 deletions(-) diff --git a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx index c4b4abf71..4e1e22698 100644 --- a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx +++ b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx @@ -14,7 +14,6 @@ The anomaly_direction configuration is used to configure the direction of the ex - _Default: `both`_ - _Supported values: `both`, `spike`, `drop`_ - _Relevant tests: All anomaly detection tests_ -- _Configuration level: test_ -```yaml test +```yml test models: - name: this_is_a_model - tests: - + tests: - elementary.volume_anomalies: anomaly_direction: drop @@ -42,4 +40,17 @@ models: ``` +```yml model +models: + - name: this_is_a_model + config: + elementary: + anomaly_direction: drop +``` + +```yml dbt_project +vars: + anomaly_direction: both +``` + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx b/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx index 12a635746..7e21fa6e2 100644 --- a/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx +++ 
b/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx @@ -13,7 +13,6 @@ Larger values will have the opposite effect and will reduce the number of anomal - _Default: 3_ - _Relevant tests: All anomaly detection tests_ -- _Configuration level: var, test config_ -```yaml dbt_project.yml -vars: - anomaly_sensitivity: 3 +```yml test +models: + - name: this_is_a_model + tests: + - elementary.volume_anomalies: + anomaly_sensitivity: 2.5 + + - elementary.all_columns_anomalies: + column_anomalies: + - null_count + - missing_count + - zero_count + anomaly_sensitivity: 4 + ``` -```yaml test +```yml model models: - name: this_is_a_model - tests: - - elementary.volume_anomalies: - sensitivity: 3 + config: + elementary: + anomaly_sensitivity: 3.5 +``` + +```yml dbt_project +vars: + anomaly_sensitivity: 3 ``` diff --git a/docs/guides/anomaly-detection-configuration/backfill-days.mdx b/docs/guides/anomaly-detection-configuration/backfill-days.mdx index 29636d74c..37a831839 100644 --- a/docs/guides/anomaly-detection-configuration/backfill-days.mdx +++ b/docs/guides/anomaly-detection-configuration/backfill-days.mdx @@ -15,7 +15,6 @@ This configuration should be changed according to your data delays. 
- _Default: 2_ - _Relevant tests: Anomaly detection tests with `timestamp_column`_ -- _Configuration level: test, var_ -```yaml dbt_project.yml -vars: - backfill_days: 2 -``` - -```yaml test +```yml test models: - name: this_is_a_model tests: @@ -40,6 +34,19 @@ models: backfill_days: 7 ``` +```yml model +models: + - name: this_is_a_model + config: + elementary: + backfill_days: 4 +``` + +```yml dbt_project.yml +vars: + backfill_days: 2 +``` + diff --git a/docs/guides/anomaly-detection-configuration/column-anomalies.mdx b/docs/guides/anomaly-detection-configuration/column-anomalies.mdx index 122230c19..2ab2d5435 100644 --- a/docs/guides/anomaly-detection-configuration/column-anomalies.mdx +++ b/docs/guides/anomaly-detection-configuration/column-anomalies.mdx @@ -13,7 +13,7 @@ Select which monitors to activate as part of the test. -```yaml test +```yml test models: - name: this_is_a_model tests: diff --git a/docs/guides/anomaly-detection-configuration/days-back.mdx b/docs/guides/anomaly-detection-configuration/days-back.mdx index c58e4dba0..f174a6f79 100644 --- a/docs/guides/anomaly-detection-configuration/days-back.mdx +++ b/docs/guides/anomaly-detection-configuration/days-back.mdx @@ -11,7 +11,6 @@ This timeframe includes the training period and detection period. 
- _Default: 14_ - _Relevant tests: Anomaly detection tests with `timestamp_column`_ -- _Configuration level: var_ -```yaml dbt_project.yml +```yml test +models: + - name: this_is_a_model + tests: + - elementary.volume_anomalies: + days_back: 30 +``` + +```yml model +models: + - name: this_is_a_model + config: + elementary: + days_back: 60 +``` + +```yml dbt_project.yml vars: - days_back: 14 + days_back: 45 ``` diff --git a/docs/guides/anomaly-detection-configuration/dimensions.mdx b/docs/guides/anomaly-detection-configuration/dimensions.mdx index 0aadf066b..1aae222ef 100644 --- a/docs/guides/anomaly-detection-configuration/dimensions.mdx +++ b/docs/guides/anomaly-detection-configuration/dimensions.mdx @@ -18,7 +18,7 @@ It is best to configure it on low-cardinality fields. -```yaml test +```yml test models: - name: model_name config: diff --git a/docs/guides/anomaly-detection-configuration/event_timestamp_column.mdx b/docs/guides/anomaly-detection-configuration/event_timestamp_column.mdx index cfbb6701d..35acc4785 100644 --- a/docs/guides/anomaly-detection-configuration/event_timestamp_column.mdx +++ b/docs/guides/anomaly-detection-configuration/event_timestamp_column.mdx @@ -19,7 +19,7 @@ The test can work in a couple of modes: -```yaml test +```yml test models: - name: this_is_a_model tests: diff --git a/docs/guides/anomaly-detection-configuration/exclude_prefix.mdx b/docs/guides/anomaly-detection-configuration/exclude_prefix.mdx index 49432676f..eea95815f 100644 --- a/docs/guides/anomaly-detection-configuration/exclude_prefix.mdx +++ b/docs/guides/anomaly-detection-configuration/exclude_prefix.mdx @@ -13,7 +13,7 @@ Param for the `all_columns_anomalies` test only, which enables to exclude a colu -```yaml test +```yml test models: - name: this_is_a_model tests: diff --git a/docs/guides/anomaly-detection-configuration/exclude_regexp.mdx b/docs/guides/anomaly-detection-configuration/exclude_regexp.mdx index 1ebf3a1c4..f41b5616f 100644 --- 
a/docs/guides/anomaly-detection-configuration/exclude_regexp.mdx +++ b/docs/guides/anomaly-detection-configuration/exclude_regexp.mdx @@ -13,7 +13,7 @@ Param for the `all_columns_anomalies` test only, which enables to exclude a colu -```yaml test +```yml test models: - name: this_is_a_model tests: diff --git a/docs/guides/anomaly-detection-configuration/seasonality.mdx b/docs/guides/anomaly-detection-configuration/seasonality.mdx index eae00084a..0c071d1aa 100644 --- a/docs/guides/anomaly-detection-configuration/seasonality.mdx +++ b/docs/guides/anomaly-detection-configuration/seasonality.mdx @@ -20,7 +20,6 @@ The expected range for Monday will be based on a training set of previous Monday - _Default: none_ - _Supported values: `day_of_week`_ - _Relevant tests: Anomaly detection tests with `timestamp_column` and 1 day `time_bucket`_ -- _Configuration level: test_ -```yaml test +```yml test models: - name: this_is_a_model tests: @@ -39,6 +38,19 @@ models: seasonality: day_of_week ``` +```yml model +models: + - name: this_is_a_model + config: + elementary: + seasonality: day_of_week +``` + +```yml dbt_project.yml +vars: + seasonality: day_of_week +``` + diff --git a/docs/guides/anomaly-detection-configuration/time-bucket.mdx b/docs/guides/anomaly-detection-configuration/time-bucket.mdx index 19478817d..a5e748e49 100644 --- a/docs/guides/anomaly-detection-configuration/time-bucket.mdx +++ b/docs/guides/anomaly-detection-configuration/time-bucket.mdx @@ -19,7 +19,6 @@ For example, if you want to detect volume anomalies in an hourly resolution, you - _Default: daily buckets. 
`time_bucket: {period: day, count: 1}`_ - _Relevant tests: Anomaly detection tests with `timestamp_column`_ -- _Configuration level: test_ -```yaml test +```yml test models: - name: this_is_a_model tests: - elementary.volume_anomalies: time_bucket: - period: hour - count: 4 + period: day + count: 2 +``` + +```yml model +models: + - name: this_is_a_model + config: + elementary: + time_bucket: + period: hour + count: 4 +``` + +```yml dbt_project.yml +vars: + time_bucket: + period: hour + count: 12 ``` diff --git a/docs/guides/anomaly-detection-configuration/timestamp-column.mdx b/docs/guides/anomaly-detection-configuration/timestamp-column.mdx index 54b332ed7..d72390686 100644 --- a/docs/guides/anomaly-detection-configuration/timestamp-column.mdx +++ b/docs/guides/anomaly-detection-configuration/timestamp-column.mdx @@ -15,11 +15,18 @@ If undefined, default is null (no time buckets). - _Default: none_ - _Relevant tests: All anomaly detection tests_ -- _Configuration level: model config, test config_ -```yaml model +```yml test +models: + - name: this_is_a_model + tests: + - elementary.volume_anomalies: + timestamp_column: created_at +``` + +```yml model models: - name: this_is_a_model config: @@ -27,9 +34,8 @@ models: timestamp_column: updated_at ``` - -```yaml source -ources: +```yml source +sources: - name: my_non_dbt_tables schema: raw tables: @@ -39,12 +45,10 @@ ources: timestamp_column: loaded_at ``` -```yaml test -models: - - name: this_is_a_model - tests: - - elementary.volume_anomalies: - timestamp_column: created_at +```yml dbt_project.yml +vars: + timestamp_column: loaded_at ``` + diff --git a/docs/guides/anomaly-detection-configuration/update_timestamp_column.mdx b/docs/guides/anomaly-detection-configuration/update_timestamp_column.mdx index 46542f66d..e7068b1af 100644 --- a/docs/guides/anomaly-detection-configuration/update_timestamp_column.mdx +++ b/docs/guides/anomaly-detection-configuration/update_timestamp_column.mdx @@ -19,7 +19,7 @@ The test can 
work in a couple of modes: -```yaml test +```yml test models: - name: this_is_a_model tests: diff --git a/docs/guides/anomaly-detection-configuration/where-expression.mdx b/docs/guides/anomaly-detection-configuration/where-expression.mdx index 8f0096c6f..7413e4482 100644 --- a/docs/guides/anomaly-detection-configuration/where-expression.mdx +++ b/docs/guides/anomaly-detection-configuration/where-expression.mdx @@ -9,11 +9,10 @@ Filter the tested data using a valid sql expression. - _Default: None_ - _Relevant tests: All anomaly detection tests_ -- _Configuration level: test_ -```yaml test +```yml test models: - name: this_is_a_model tests: @@ -21,4 +20,18 @@ models: where_expression: "user_name != 'test'" ``` +```yml model +models: + - name: this_is_a_model + config: + elementary: + where_expression: "loaded_at is not null" +``` + +```yml dbt_project.yml +vars: + where_expression: "loaded_at > '2022-01-01'" +``` + + \ No newline at end of file From 30411482583ca1b2649d4b50d8ff7a29f3d7cf63 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 29 May 2023 15:17:09 +0300 Subject: [PATCH 019/194] removed `table_anomalies` --- docs/guides/add-elementary-tests.mdx | 13 +++------ .../column-anomalies.mdx | 2 +- .../guides/elementary-tests-configuration.mdx | 6 ++-- docs/tutorial/adding-elementary-tests.mdx | 28 ++++++------------- 4 files changed, 17 insertions(+), 32 deletions(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 4bd341013..4bac22a9f 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -87,10 +87,8 @@ models: elementary: timestamp_column: < timestamp column > tests: - - elementary.table_anomalies: - table_anomalies: < specific monitors, all if null > + - elementary.freshness_anomalies: # optional - configure different freshness column than timestamp column - freshness_column: < freshness_column > where_expression: < sql expression > time_bucket: period: < time period > @@ 
-128,10 +126,7 @@ models: elementary: timestamp_column: 'loaded_at' tests: - - elementary.table_anomalies: - table_anomalies: - - row_count - - freshness + - elementary.volume_anomalies: # optional - use tags to run elementary tests on a dedicated run tags: ['elementary'] config: @@ -160,7 +155,7 @@ models: - name: users ## if no timestamp is configured, elementary will monitor without time filtering tests: - elementary.table_anomalies + elementary.volume_anomalies tags: ['elementary'] columns: - name: user_id @@ -203,7 +198,7 @@ sources: elementary: timestamp_column: "loaded_at" tests: - - elementary.table_anomalies + - elementary.freshness_anomalies - elementary.dimension_anomalies: dimensions: - event_type diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx index 7572b6831..4a828dbc7 100644 --- a/docs/guides/anomaly-detection-tests/column-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx @@ -87,7 +87,7 @@ models: - name: users ## if no timestamp is configured, elementary will monitor without time filtering tests: - elementary.table_anomalies + elementary.volume_anomalies tags: ['elementary'] columns: - name: user_id diff --git a/docs/guides/elementary-tests-configuration.mdx b/docs/guides/elementary-tests-configuration.mdx index 39538ef2f..a085c52ae 100644 --- a/docs/guides/elementary-tests-configuration.mdx +++ b/docs/guides/elementary-tests-configuration.mdx @@ -136,7 +136,7 @@ models: elementary: timestamp_column: updated_at tests: - - elementary.table_anomalies: + - elementary.freshness_anomalies: tags: ["elementary"] - elementary.all_columns_anomalies: tags: ["elementary"] @@ -144,7 +144,7 @@ models: - name: users ## if no timestamp is configured, elementary will monitor without time filtering tests: - - elementary.table_anomalies: + - elementary.volume_anomalies: tags: ["elementary"] ``` @@ -174,7 +174,7 @@ sources: elementary: timestamp_column: 
"loaded_at" tests: - - elementary.table_anomalies + - elementary.volume_anomalies - elementary.all_columns_anomalies: column_anomalies: - null_count diff --git a/docs/tutorial/adding-elementary-tests.mdx b/docs/tutorial/adding-elementary-tests.mdx index 7d4b0a4d1..0d104f130 100644 --- a/docs/tutorial/adding-elementary-tests.mdx +++ b/docs/tutorial/adding-elementary-tests.mdx @@ -21,7 +21,7 @@ A `schema.yml` file that includes all the below tests can be found at the bottom First, we will use the **tables_anomalies** test to perform a **row_count**. This tests counts the number of rows created in a given period to determine if there have been any anomalies in the number of signups in a given time period. -We will add the **table_anomalies** test using **row_count** as a monitor as follows: +We will add the **volume_anomalies** test as follows: ```yaml models: @@ -30,12 +30,10 @@ models: config: tags: ["PII"] tests: - - elementary.table_anomalies: - table_anomalies: - - row_count + - elementary.volume_anomalies ``` -Now that we have selected row_count as our monitor, we must define a column to use for our timestamp. This will be used to create time buckets for anomaly detection. We select the **signup_date** column as seen below: +Now that we have configured a test, we should define a column to use for our timestamp. This will be used to create time buckets for anomaly detection. We select the **signup_date** column as seen below: ```yaml models: @@ -46,9 +44,7 @@ models: elementary: timestamp_column: "signup_date" tests: - - elementary.table_anomalies: - table_anomalies: - - row_count + - elementary.volume_anomalies ``` This test will fail if there are any days (as defined by **signup_date**) where the number of rows exceeds 3 standard deviations above/below the mean. @@ -56,7 +52,7 @@ This test will fail if there are any days (as defined by **signup_date**) where
-Similar to Test 1, we will use the **table_anomalies** test and **row_count** to detect an anomalous number of returned orders in a given time period. In this test, however, we will define the timestamp column at the test level - instead of at the model level. +Similar to Test 1, we will use the **volume_anomalies** test to detect an anomalous number of returned orders in a given time period. In this test, however, we will define the timestamp column at the test level - instead of at the model level. ```yaml - name: returned_orders description: This table contains all of the returned orders @@ -64,10 +60,8 @@ Similar to Test 1, we will use the **table_anomalies** test and **row_count** to tags: ["finance"] tests: - - elementary.table_anomalies: + - elementary.volume_anomalies: tags: ["table_anomalies"] - table_anomalies: - - row_count timestamp_column: "order_date" ```` @@ -140,10 +134,8 @@ models: elementary: timestamp_column: "signup_date" tests: - - elementary.table_anomalies: - table_anomalies: - - row_count - + - elementary.volume_anomalies + columns: - name: customer_id description: This is a unique identifier for a customer @@ -225,10 +217,8 @@ models: tags: ["finance"] tests: - - elementary.table_anomalies: + - elementary.volume_anomalies: tags: ["table_anomalies"] - table_anomalies: - - row_count timestamp_column: "order_date" columns: From 70230a0f8c5549849d9caa4f0f6b6c3dc9b9be4a Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 29 May 2023 15:50:49 +0300 Subject: [PATCH 020/194] docs min training set size --- .../min-training-set-size.mdx | 59 +++++++++++++++++++ .../all-columns-anomalies.mdx | 1 + .../column-anomalies.mdx | 1 + .../dimension-anomalies.mdx | 1 + .../event-freshness-anomalies.mdx | 1 + .../freshness-anomalies.mdx | 1 + .../volume-anomalies.mdx | 1 + .../guides/elementary-tests-configuration.mdx | 2 + docs/mint.json | 1 + 9 files changed, 68 insertions(+) create mode 100644 
docs/guides/anomaly-detection-configuration/min-training-set-size.mdx diff --git a/docs/guides/anomaly-detection-configuration/min-training-set-size.mdx b/docs/guides/anomaly-detection-configuration/min-training-set-size.mdx new file mode 100644 index 000000000..91f9c8ab1 --- /dev/null +++ b/docs/guides/anomaly-detection-configuration/min-training-set-size.mdx @@ -0,0 +1,59 @@ +--- +title: "min_training_set_size" +sidebarTitle: "min_training_set_size" +--- + +`min_training_set_size: [int]` + +The minimum number of data points a test requires for calculating and detecting an anomaly. +We recommend not configuring a value smaller than 14, so that the result can be statistically significant. + +- _Default: 14_ +- _Relevant tests: All anomaly detection tests_ + + + min_training_set_size change impact + + + + + +```yml test +models: + - name: this_is_a_model + tests: + - elementary.volume_anomalies: + min_training_set_size: 20 +``` + +```yml model +models: + - name: this_is_a_model + config: + elementary: + min_training_set_size: 18 +``` + +```yml dbt_project.yml +vars: + min_training_set_size: 15 +``` + + + + + + +#### How does it work? + +If the training set of a test has fewer than `min_training_set_size` data points, the test will pass, as there isn't enough data to determine whether there is an anomaly. +The Elementary report will show the message "Not enough data to calculate anomaly score" instead of a graph. + +#### The impact of changing `min_training_set_size` + +If you **increase `min_training_set_size`**, your test training set will be larger. This means a larger sample size for calculating the expected range, which should make the test less sensitive to outliers. The result is a lower chance of false positive anomalies, but also less sensitivity, as anomalies have a higher threshold. + +If you **decrease `min_training_set_size`**, your test training set will be smaller. This means a smaller sample size for calculating the expected range, which might make the test more sensitive to outliers.
This means more chance of false positive anomalies, but also more sensitivity as anomalies have a lower threshold. \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index b3b9a182b..36a690af7 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -30,6 +30,7 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- anomaly_sensitivity: int> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx index 4a828dbc7..164edac57 100644 --- a/docs/guides/anomaly-detection-tests/column-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx @@ -27,6 +27,7 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- anomaly_sensitivity: int> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx index 7155ae525..b1b1eedae 100644 --- a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx @@ -26,6 +26,7 @@ _Required configuration: `dimensions`_ -- anomaly_sensitivity: int> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx index 
c76b159de..632813176 100644 --- a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx @@ -33,6 +33,7 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ -- anomaly_sensitivity: int> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx index 47fa9a787..18bc080c8 100644 --- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx @@ -29,6 +29,7 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ -- anomaly_sensitivity: int> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 5921e41af..0255d6d7c 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -31,6 +31,7 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- anomaly_direction: [both | spike | drop]> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/elementary-tests-configuration.mdx b/docs/guides/elementary-tests-configuration.mdx index a085c52ae..dab5a401a 100644 --- a/docs/guides/elementary-tests-configuration.mdx +++ b/docs/guides/elementary-tests-configuration.mdx @@ -41,6 +41,7 @@ Elementary searches and prioritizes configuration in the following order: -- timestamp_column: column name> -- 
where_expression: sql expression> -- anomaly_sensitivity: int> + -- min_training_set_size: int> -- anomaly_direction: [both | spike | drop]> Anomaly detection tests with timestamp_column: @@ -80,6 +81,7 @@ Elementary searches and prioritizes configuration in the following order: Training period and training set: -- days_back: int> + -- min_training_set_size: int> -- seasonality: day_of_week> Time buckets: diff --git a/docs/mint.json b/docs/mint.json index 40fc4699a..45eafc16c 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -134,6 +134,7 @@ "guides/anomaly-detection-configuration/backfill-days", "guides/anomaly-detection-configuration/time-bucket", "guides/anomaly-detection-configuration/seasonality", + "guides/anomaly-detection-configuration/min-training-set-size", "guides/anomaly-detection-configuration/column-anomalies", "guides/anomaly-detection-configuration/exclude_prefix", "guides/anomaly-detection-configuration/exclude_regexp", From 2b918d87c22c3c49fc45d9fef2323fd0f59a9359 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 10:59:59 +0300 Subject: [PATCH 021/194] jobs info --- .../collect-job-data.mdx | 74 +++++++++++++++++++ docs/mint.json | 1 + 2 files changed, 75 insertions(+) create mode 100644 docs/deployment-and-configuration/collect-job-data.mdx diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx new file mode 100644 index 000000000..c8f8f5c40 --- /dev/null +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -0,0 +1,74 @@ +--- +title: "Collect jobs info from orchestrator" +sidebarTitle: "Jobs name & info" +--- + +🚧 _Under development_ 🚧 + +Elementary can collect metadata about your jobs from the orchestrator you are using, and enrich the Elementary report with this information. + +The goal is to provide context that is useful to triage and resolve data issues, such as: +- Is my freshness / volume issue related to a job that didn't complete? Which job? 
+- Which tables were built as part of the job that loaded data with issues? +- Which job should I rerun to resolve? + + +Elementary supports collecting the following job details: +- Orchestrator name: `orchestrator` +- Job name: `job_name` +- Job ID: `job_id` +- Job URL: `job_url` +- Job run ID: `job_run_id` + +### How Elementary collects jobs metadata? + +**Environment variables** +Elementary collects jobs metadata in run time from `env_vars`. +Orchestration tools usually have default environment variables, so this might happen automatically. The list of supported orchestrators and default env vars is in the following section. + +To configure `env_var` for your orchestrator, refer to your orchestrator's docs. + +**dbt vars** +Elementary also supports passing job metadata as dbt vars. If `env_var` and `var` exist, the `var` will be prioritized. + +To pass job data to elementary using `var`, use the `--vars` flag in your invocations: +```shell +dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' +``` + +### Which orchestrators are supported? + +Technically you can pass job info to Elementary from any orchestration tool as long as you configure `env_vars` / `vars`. 
+These are the default env var that are collected: + +| Orchestrator | Env vars | +|--------------|--------------------------------------------------------| +| Any | `ORCHESTRATOR`, `JOB_NAME`, `JOB_ID`, `JOB_URL`, `JOB_RUN_ID` | + +The following orchestrators and their default environment variables are supported out of the box: + +| Orchestrator | Env vars | +|----------------|--------------------------------------------------------------------------------------------------------------------| +| dbt cloud | orchestrator name, job_id: `DBT_CLOUD_JOB_ID`, job_run_id: `DBT_CLOUD_RUN_ID` | +| Github actions | orchestrator name, job_run_id: `GITHUB_RUN_ID`, job_url: generated from `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, `GITHUB_RUN_ID` | +| Airflow | orchestrator name | + + +### What if I use dbt cloud + orchestrator? + +By default, Elementary will collect the dbt cloud jobs info. +If you wish to override that, change your dbt cloud invocations to pass the orchestrator job info using `--vars`: +```shell +dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' +``` + +### Where can I see my job info? + +- In your Elementary schema, the fields are stored in the table `dbt_invocations`. +- In the Elementary report, if the info was collected successfully, you can filter the lineage by job and see the details in the node info. + + +### Can't find your orchestrator? Missing info? + +We would love to support more orchestrators and collect more useful info! +Please [open an issue](https://github.com/elementary-data/elementary/issues/new/choose) and tell us what we should add. 
\ No newline at end of file diff --git a/docs/mint.json b/docs/mint.json index 7f1a38ff8..6a2c79a6b 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -102,6 +102,7 @@ "group": "Deployment and Configuration", "pages": [ "deployment-and-configuration/elementary-in-production", + "deployment-and-configuration/collect-job-data", "understand-elementary/cli-install", "understand-elementary/cli-commands" ] From a7f073bfa1439db2f4212ab3f982652669816c93 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:03:06 +0300 Subject: [PATCH 022/194] jobs info --- .../collect-job-data.mdx | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index c8f8f5c40..9a8a8e799 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -13,7 +13,7 @@ The goal is to provide context that is useful to triage and resolve data issues, - Which job should I rerun to resolve? -Elementary supports collecting the following job details: +#### Elementary supports collecting the following job details: - Orchestrator name: `orchestrator` - Job name: `job_name` - Job ID: `job_id` @@ -22,13 +22,13 @@ Elementary supports collecting the following job details: ### How Elementary collects jobs metadata? -**Environment variables** +#### Environment variables Elementary collects jobs metadata in run time from `env_vars`. Orchestration tools usually have default environment variables, so this might happen automatically. The list of supported orchestrators and default env vars is in the following section. To configure `env_var` for your orchestrator, refer to your orchestrator's docs. -**dbt vars** +#### dbt vars Elementary also supports passing job metadata as dbt vars. If `env_var` and `var` exist, the `var` will be prioritized. 
To pass job data to elementary using `var`, use the `--vars` flag in your invocations: @@ -47,11 +47,11 @@ These are the default env var that are collected: The following orchestrators and their default environment variables are supported out of the box: -| Orchestrator | Env vars | -|----------------|--------------------------------------------------------------------------------------------------------------------| -| dbt cloud | orchestrator name, job_id: `DBT_CLOUD_JOB_ID`, job_run_id: `DBT_CLOUD_RUN_ID` | -| Github actions | orchestrator name, job_run_id: `GITHUB_RUN_ID`, job_url: generated from `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, `GITHUB_RUN_ID` | -| Airflow | orchestrator name | +| Orchestrator | Env vars | +|----------------|----------------------------------------------------------------------------------------------------------------------------| +| dbt cloud | orchestrator
job_id: `DBT_CLOUD_JOB_ID`
job_run_id: `DBT_CLOUD_RUN_ID` | +| Github actions | orchestrator
job_run_id: `GITHUB_RUN_ID`
job_url: generated from `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, `GITHUB_RUN_ID` | +| Airflow | orchestrator | ### What if I use dbt cloud + orchestrator? From 98fbdc5f9adc00ec7169623b1ac78666c2b4c2ae Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:03:38 +0300 Subject: [PATCH 023/194] jobs info --- docs/deployment-and-configuration/collect-job-data.mdx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index 9a8a8e799..b17f8871a 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -20,7 +20,7 @@ The goal is to provide context that is useful to triage and resolve data issues, - Job URL: `job_url` - Job run ID: `job_run_id` -### How Elementary collects jobs metadata? +## How Elementary collects jobs metadata? #### Environment variables Elementary collects jobs metadata in run time from `env_vars`. @@ -36,7 +36,7 @@ To pass job data to elementary using `var`, use the `--vars` flag in your invoca dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' ``` -### Which orchestrators are supported? +## Which orchestrators are supported? Technically you can pass job info to Elementary from any orchestration tool as long as you configure `env_vars` / `vars`. These are the default env var that are collected: @@ -54,7 +54,7 @@ The following orchestrators and their default environment variables are supporte | Airflow | orchestrator | -### What if I use dbt cloud + orchestrator? +## What if I use dbt cloud + orchestrator? By default, Elementary will collect the dbt cloud jobs info. 
If you wish to override that, change your dbt cloud invocations to pass the orchestrator job info using `--vars`: @@ -62,13 +62,13 @@ If you wish to override that, change your dbt cloud invocations to pass the orch dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' ``` -### Where can I see my job info? +## Where can I see my job info? - In your Elementary schema, the fields are stored in the table `dbt_invocations`. - In the Elementary report, if the info was collected successfully, you can filter the lineage by job and see the details in the node info. -### Can't find your orchestrator? Missing info? +## Can't find your orchestrator? Missing info? We would love to support more orchestrators and collect more useful info! Please [open an issue](https://github.com/elementary-data/elementary/issues/new/choose) and tell us what we should add. \ No newline at end of file From e99f3ee47dabfc0abe9709174edad0e0fa2a1535 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:04:30 +0300 Subject: [PATCH 024/194] jobs info --- docs/deployment-and-configuration/collect-job-data.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index b17f8871a..6370826ce 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -13,7 +13,7 @@ The goal is to provide context that is useful to triage and resolve data issues, - Which job should I rerun to resolve? 
-#### Elementary supports collecting the following job details: +**Elementary collects the following job details:** - Orchestrator name: `orchestrator` - Job name: `job_name` - Job ID: `job_id` From ced95dcab1192607d333d31f10bb6c8f1255bd28 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:11:37 +0300 Subject: [PATCH 025/194] jobs info --- docs/deployment-and-configuration/collect-job-data.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index 6370826ce..d91686fc9 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -17,7 +17,7 @@ The goal is to provide context that is useful to triage and resolve data issues, - Orchestrator name: `orchestrator` - Job name: `job_name` - Job ID: `job_id` -- Job URL: `job_url` +- Job results URL: `job_url` - Job run ID: `job_run_id` ## How Elementary collects jobs metadata? @@ -64,7 +64,7 @@ dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_loa ## Where can I see my job info? -- In your Elementary schema, the fields are stored in the table `dbt_invocations`. +- In your Elementary schema, the raw fields are stored in the table `dbt_invocations`. You could also use the view `job_run_results` which groups invocation by job. - In the Elementary report, if the info was collected successfully, you can filter the lineage by job and see the details in the node info. 
From e57c0c39e02177235e2b60bd1abaa423b678d349 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:21:36 +0300 Subject: [PATCH 026/194] jobs info --- docs/deployment-and-configuration/collect-job-data.mdx | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index d91686fc9..2bfbedd31 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -36,6 +36,16 @@ To pass job data to elementary using `var`, use the `--vars` flag in your invoca dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' ``` +#### Variables supported format + +| var / env_var | Format | +|-----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| +| orchestrator | one of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster`
Any other string would be collected but not presented in the report. | +| job_name, job_id, job_run_d | string | +| job_url | valid HTTP URL | + + + ## Which orchestrators are supported? Technically you can pass job info to Elementary from any orchestration tool as long as you configure `env_vars` / `vars`. From 02f27019271501c881bd740e452917eac355b90c Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:22:09 +0300 Subject: [PATCH 027/194] jobs info --- docs/deployment-and-configuration/collect-job-data.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index 2bfbedd31..b46cd4fa6 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -40,9 +40,9 @@ dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_loa | var / env_var | Format | |-----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| -| orchestrator | one of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster`
Any other string would be collected but not presented in the report. | -| job_name, job_id, job_run_d | string | -| job_url | valid HTTP URL | +| orchestrator | One of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster`
Any other string would be collected but not presented in the report. | +| job_name, job_id, job_run_d | String | +| job_url | Valid HTTP URL | From e02da98f8cac5e68329bf2b1d5fe915d74f66e2f Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:32:19 +0300 Subject: [PATCH 028/194] jobs info --- .../collect-job-data.mdx | 26 +++++++++---------- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index b46cd4fa6..5990bb2ae 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -18,7 +18,7 @@ The goal is to provide context that is useful to triage and resolve data issues, - Job name: `job_name` - Job ID: `job_id` - Job results URL: `job_url` -- Job run ID: `job_run_id` +- The ID of a specific run execution: `job_run_id` ## How Elementary collects jobs metadata? @@ -26,7 +26,10 @@ The goal is to provide context that is useful to triage and resolve data issues, Elementary collects jobs metadata in run time from `env_vars`. Orchestration tools usually have default environment variables, so this might happen automatically. The list of supported orchestrators and default env vars is in the following section. -To configure `env_var` for your orchestrator, refer to your orchestrator's docs. +These are the env vars that are collected: +`ORCHESTRATOR`, `JOB_NAME`, `JOB_ID`, `JOB_URL`, `JOB_RUN_ID` + +To configure `env_var` for your orchestrator, refer to your orchestrator's docs. #### dbt vars Elementary also supports passing job metadata as dbt vars. If `env_var` and `var` exist, the `var` will be prioritized. 
@@ -38,24 +41,19 @@ dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_loa #### Variables supported format -| var / env_var | Format | -|-----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| -| orchestrator | One of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster`
Any other string would be collected but not presented in the report. | -| job_name, job_id, job_run_d | String | -| job_url | Valid HTTP URL | +| var / env_var | Format | +|------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| +| orchestrator | One of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster`
Any other string would be collected but not presented in the report. | +| job_name, job_id, job_run_id | String | +| job_url | Valid HTTP URL | ## Which orchestrators are supported? -Technically you can pass job info to Elementary from any orchestration tool as long as you configure `env_vars` / `vars`. -These are the default env var that are collected: - -| Orchestrator | Env vars | -|--------------|--------------------------------------------------------| -| Any | `ORCHESTRATOR`, `JOB_NAME`, `JOB_ID`, `JOB_URL`, `JOB_RUN_ID` | +You can pass job info to Elementary from any orchestration tool as long as you configure `env_vars` / `vars`. -The following orchestrators and their default environment variables are supported out of the box: +The following default environment variables are supported out of the box: | Orchestrator | Env vars | |----------------|----------------------------------------------------------------------------------------------------------------------------| From 082641e507c012e2542268da67a838ac46b819a4 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 1 Jun 2023 07:41:29 +0300 Subject: [PATCH 029/194] Update collect-job-data.mdx --- docs/deployment-and-configuration/collect-job-data.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index 5990bb2ae..7ef4aa542 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -3,7 +3,7 @@ title: "Collect jobs info from orchestrator" sidebarTitle: "Jobs name & info" --- -🚧 _Under development_ 🚧 +_Supported in Elementary 0.8.0 and above_ Elementary can collect metadata about your jobs from the orchestrator you are using, and enrich the Elementary report with this information. 
@@ -79,4 +79,4 @@ dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_loa ## Can't find your orchestrator? Missing info? We would love to support more orchestrators and collect more useful info! -Please [open an issue](https://github.com/elementary-data/elementary/issues/new/choose) and tell us what we should add. \ No newline at end of file +Please [open an issue](https://github.com/elementary-data/elementary/issues/new/choose) and tell us what we should add. From 924bda6d712d41822833e1bed56af306dbc25017 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 07:43:19 +0300 Subject: [PATCH 030/194] jobs info --- docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index 36a690af7..dd729dc95 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -32,8 +32,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int +      period: [hour | day | week | month] +      count: int -- seasonality: day_of_week>
From 5e07358e31cb96118ea191c729e5ec3a0db70c82 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 07:43:59 +0300 Subject: [PATCH 031/194] jobs info --- docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index dd729dc95..1c79fceab 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -32,8 +32,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -      period: [hour | day | week | month] -      count: int +   period: [hour | day | week | month] +   count: int -- seasonality: day_of_week> From 93fda01d89ed23a07be7aa5c51a9378b6866a60f Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 07:44:19 +0300 Subject: [PATCH 032/194] jobs info --- docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index 1c79fceab..eec0ccd00 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -32,8 +32,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -   period: [hour | day | week | month] -   count: int +   period: [hour | day | week | month] +   count: int -- seasonality: day_of_week> From 2bcf6d8e7f103b58911254449910b7e433a21f0a Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 07:45:52 +0300 Subject: [PATCH 033/194] jobs info --- 
docs/guides/anomaly-detection-tests/column-anomalies.mdx | 4 ++-- docs/guides/anomaly-detection-tests/dimension-anomalies.mdx | 4 ++-- .../anomaly-detection-tests/event-freshness-anomalies.mdx | 4 ++-- docs/guides/anomaly-detection-tests/freshness-anomalies.mdx | 4 ++-- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 5 files changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx index 164edac57..7fcb550cb 100644 --- a/docs/guides/anomaly-detection-tests/column-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx @@ -29,8 +29,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int +   period: [hour | day | week | month] +   count: int -- seasonality: day_of_week> diff --git a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx index b1b1eedae..a8f78dde6 100644 --- a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx @@ -28,8 +28,8 @@ _Required configuration: `dimensions`_ -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int +   period: [hour | day | week | month] + count: int -- seasonality: day_of_week> diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx index 632813176..d97e2e4b5 100644 --- a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx @@ -35,8 +35,8 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ -- 
backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int + period: [hour | day | week | month] + count: int -- seasonality: day_of_week> diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx index 18bc080c8..9c6093642 100644 --- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx @@ -31,8 +31,8 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int + period: [hour | day | week | month] + count: int -- seasonality: day_of_week> diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 0255d6d7c..9fcc87628 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -33,8 +33,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int + period: [hour | day | week | month] + count: int -- seasonality: day_of_week> From 7e0a7374c00572dbe7a5a320e79a50ac5810c04a Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 17:38:40 +0300 Subject: [PATCH 034/194] jobs info --- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 9fcc87628..cb0e62cca 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -33,8 
+33,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> - period: [hour | day | week | month] - count: int +   period: [hour | day | week | month] +   count: int -- seasonality: day_of_week> From eaf18c3d3570e6b439b5d7ce503f11c8a2a5b863 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 17:41:38 +0300 Subject: [PATCH 035/194] jobs info --- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index cb0e62cca..21b114f51 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -33,8 +33,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -   period: [hour | day | week | month] -   count: int +    period: [hour | day | week | month] +    count: int -- seasonality: day_of_week> From e0a6e7ce9ec35b8b4849de0a84beadc0220f7d9b Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 17:42:10 +0300 Subject: [PATCH 036/194] jobs info --- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 21b114f51..10e68d3fc 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -33,8 +33,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -    period: [hour | day | week | month] -    count: int +     period: [hour | day | week | 
month] +     count: int -- seasonality: day_of_week> From 85834cfcd3c3d44f4a55b7ce6c75eabb8380c37d Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 17:42:56 +0300 Subject: [PATCH 037/194] jobs info --- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 10e68d3fc..c7c03812f 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -24,8 +24,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
   tests:
-    elementary.volume_anomalies:
-      -- timestamp_column: column name>
+        elementary.volume_anomalies:
+          -- timestamp_column: column name>
       -- where_expression: sql expression>
       -- anomaly_sensitivity: int>
       -- anomaly_direction: [both | spike | drop]>

From 63d25b5ff7e771c6a94d2c9be466cf80d5f6965c Mon Sep 17 00:00:00 2001
From: Maayan-s 
Date: Thu, 1 Jun 2023 17:44:39 +0300
Subject: [PATCH 038/194] jobs info

---
 docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
index c7c03812f..b89113aa6 100644
--- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
@@ -24,7 +24,7 @@ No mandatory configuration, however it is highly recommended to configure a `tim
 
  
   tests:
-        elementary.volume_anomalies:
+       elementary.volume_anomalies:
           -- timestamp_column: column name>
       -- where_expression: sql expression>
       -- anomaly_sensitivity: int>

From 32264460b0bbe62cae40e5e19d1c964a9438ded1 Mon Sep 17 00:00:00 2001
From: Maayan-s 
Date: Thu, 1 Jun 2023 17:45:35 +0300
Subject: [PATCH 039/194] jobs info

---
 .../volume-anomalies.mdx                      | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
index b89113aa6..a078083f9 100644
--- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
@@ -26,16 +26,16 @@ No mandatory configuration, however it is highly recommended to configure a `tim
   tests:
        elementary.volume_anomalies:
           -- timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- anomaly_direction: [both | spike | drop]>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-          period: [hour | day | week | month]
-          count: int
-      -- seasonality: day_of_week>
+          -- where_expression: sql expression>
+          -- anomaly_sensitivity: int>
+          -- anomaly_direction: [both | spike | drop]>
+          -- days_back: int>
+          -- backfill_days: int>
+          -- min_training_set_size: int>
+          -- time_bucket:>
+             period: [hour | day | week | month]
+             count: int
+          -- seasonality: day_of_week>
  
 
From d404be9709a8bcbef2902af15340f9995c3c9914 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 18:08:38 +0300 Subject: [PATCH 040/194] jobs info --- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index a078083f9..24b552351 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -33,8 +33,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim     -- backfill_days: int>     -- min_training_set_size: int>     -- time_bucket:> -        period: [hour | day | week | month] -        count: int +         period: [hour | day | week | month] +         count: int     -- seasonality: day_of_week>
From 6ea7e929ac29862676950aa22e5ca437c190112c Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 18:09:35 +0300 Subject: [PATCH 041/194] jobs info --- .../volume-anomalies.mdx | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 24b552351..5990b989b 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -24,18 +24,18 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
   tests:
-       elementary.volume_anomalies:
-          -- timestamp_column: column name>
-          -- where_expression: sql expression>
-          -- anomaly_sensitivity: int>
-          -- anomaly_direction: [both | spike | drop]>
-          -- days_back: int>
-          -- backfill_days: int>
-          -- min_training_set_size: int>
-          -- time_bucket:>
+       -- elementary.volume_anomalies:
+          timestamp_column: column name>
+          where_expression: sql expression>
+          anomaly_sensitivity: int>
+          anomaly_direction: [both | spike | drop]>
+          days_back: int>
+          backfill_days: int>
+          min_training_set_size: int>
+          time_bucket:>
               period: [hour | day | week | month]
               count: int
-          -- seasonality: day_of_week>
+          seasonality: day_of_week>
  
 
From 75bfbcbc0ec80090a17265453a6c104c586da066 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 18:10:33 +0300 Subject: [PATCH 042/194] jobs info --- .../volume-anomalies.mdx | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 5990b989b..507f498c7 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -25,17 +25,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim tests:    -- elementary.volume_anomalies: -     timestamp_column: column name> -     where_expression: sql expression> -     anomaly_sensitivity: int> -     anomaly_direction: [both | spike | drop]> -     days_back: int> -     backfill_days: int> -     min_training_set_size: int> -     time_bucket:> +      timestamp_column: column name> +      where_expression: sql expression> +      anomaly_sensitivity: int> +      anomaly_direction: [both | spike | drop]> +      days_back: int> +      backfill_days: int> +      min_training_set_size: int> +      time_bucket:>         period: [hour | day | week | month]         count: int -     seasonality: day_of_week> +      seasonality: day_of_week> From 8782034ce746097fe3504b21539e69337553e2c4 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 18:16:15 +0300 Subject: [PATCH 043/194] jobs info --- .../volume-anomalies.mdx | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 507f498c7..77de63b93 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -25,17 +25,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim 
tests:    -- elementary.volume_anomalies: -      timestamp_column: column name> -      where_expression: sql expression> -      anomaly_sensitivity: int> -      anomaly_direction: [both | spike | drop]> -      days_back: int> -      backfill_days: int> -      min_training_set_size: int> -      time_bucket:> -         period: [hour | day | week | month] -         count: int -      seasonality: day_of_week> +       timestamp_column: column name> +       where_expression: sql expression> +       anomaly_sensitivity: int> +       anomaly_direction: [both | spike | drop]> +       days_back: int> +       backfill_days: int> +       min_training_set_size: int> +       time_bucket:> +          period: [hour | day | week | month] +          count: int +       seasonality: day_of_week> From b1221345f59f7a2906bd9deef351ce4e8011d34e Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 18:18:14 +0300 Subject: [PATCH 044/194] jobs info --- .../volume-anomalies.mdx | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 77de63b93..b6f05cbb7 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -25,17 +25,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim tests:    -- elementary.volume_anomalies: -       timestamp_column: column name> -       where_expression: sql expression> -       anomaly_sensitivity: int> -       anomaly_direction: [both | spike | drop]> -       days_back: int> -       backfill_days: int> -       min_training_set_size: int> -       time_bucket:> -          period: [hour | day | week | month] -          count: int -       seasonality: day_of_week> +         timestamp_column: column name> +         where_expression: sql expression> +         anomaly_sensitivity: int> +         
anomaly_direction: [both | spike | drop]> +         days_back: int> +         backfill_days: int> +         min_training_set_size: int> +         time_bucket:> +            period: [hour | day | week | month] +            count: int +         seasonality: day_of_week> From b7ef2da07f8241bc678b2bf7423a205532456ec8 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 18:18:41 +0300 Subject: [PATCH 045/194] jobs info --- .../volume-anomalies.mdx | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index b6f05cbb7..d4b1fd77e 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -25,17 +25,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim tests:    -- elementary.volume_anomalies: -         timestamp_column: column name> -         where_expression: sql expression> -         anomaly_sensitivity: int> -         anomaly_direction: [both | spike | drop]> -         days_back: int> -         backfill_days: int> -         min_training_set_size: int> -         time_bucket:> -            period: [hour | day | week | month] -            count: int -         seasonality: day_of_week> +        timestamp_column: column name> +        where_expression: sql expression> +        anomaly_sensitivity: int> +        anomaly_direction: [both | spike | drop]> +        days_back: int> +        backfill_days: int> +        min_training_set_size: int> +        time_bucket:> +           period: [hour | day | week | month] +           count: int +        seasonality: day_of_week> From 3ac229f7e34559e62163177b6cd79e58ba277c2c Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 18:28:27 +0300 Subject: [PATCH 046/194] tests config formating --- .../all-columns-anomalies.mdx | 31 ++++++++++--------- 
.../column-anomalies.mdx | 27 ++++++++-------- .../dimension-anomalies.mdx | 27 ++++++++-------- .../event-freshness-anomalies.mdx | 26 ++++++++-------- .../freshness-anomalies.mdx | 24 +++++++------- 5 files changed, 69 insertions(+), 66 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index eec0ccd00..30642fe87 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -20,21 +20,22 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
-  tests:
-    elementary.volume_anomalies:
-      -- column_anomalies: column monitors list>
-      -- exclude_prefix: string>
-      -- exclude_regexp: regex>
-      -- timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-               period: [hour | day | week | month]
-               count: int
-      -- seasonality: day_of_week>
+  tests:                                                                                                                                                                                                                        
+       -- elementary.all_columns_anomalies:
+             column_anomalies: column monitors list>
+             exclude_prefix: string>
+             exclude_regexp: regex>
+             timestamp_column: column name>                                                   
+             where_expression: sql expression>                                                
+             anomaly_sensitivity: int>                                                     
+             anomaly_direction: [both | spike | drop]>                                       
+             days_back: int>                                                                         
+             backfill_days: int>                                                                 
+             min_training_set_size: int>                                                 
+             time_bucket:>                                                                         
+                period: [hour | day | week | month]                                 
+                count: int                                                          
+             seasonality: day_of_week>           
  
 
diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx index 7fcb550cb..98ca6294e 100644 --- a/docs/guides/anomaly-detection-tests/column-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx @@ -19,19 +19,20 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
-  tests:
-    elementary.volume_anomalies:
-      -- column_anomalies: column monitors list>
-      -- timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-               period: [hour | day | week | month]
-               count: int
-      -- seasonality: day_of_week>
+  tests:                                                                                                                                                                                                                        
+       -- elementary.column_anomalies:
+             column_anomalies: column monitors list>
+             timestamp_column: column name>                                                   
+             where_expression: sql expression>                                                
+             anomaly_sensitivity: int>                                                     
+             anomaly_direction: [both | spike | drop]>                                       
+             days_back: int>                                                                         
+             backfill_days: int>                                                                 
+             min_training_set_size: int>                                                 
+             time_bucket:>                                                                         
+                period: [hour | day | week | month]                                 
+                count: int                                                          
+             seasonality: day_of_week>                                                             
  
 
diff --git a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx index a8f78dde6..61c2cdd8c 100644 --- a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx @@ -18,19 +18,20 @@ _Required configuration: `dimensions`_
  
-  tests:
-    elementary.volume_anomalies:
-      -- dimensions: sql expression>
-      -- timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-               period: [hour | day | week | month]
-               count: int
-      -- seasonality: day_of_week>
+  tests:                                                                                                                                                                                                                        
+       -- elementary.dimension_anomalies:
+             dimensions: sql expression>
+             timestamp_column: column name>                                                   
+             where_expression: sql expression>                                                
+             anomaly_sensitivity: int>                                                     
+             anomaly_direction: [both | spike | drop]>                                       
+             days_back: int>                                                                         
+             backfill_days: int>                                                                 
+             min_training_set_size: int>                                                 
+             time_bucket:>                                                                         
+                period: [hour | day | week | month]                                 
+                count: int                                                          
+             seasonality: day_of_week>                                                             
  
 
diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx index d97e2e4b5..1a0b2bee6 100644 --- a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx @@ -25,19 +25,19 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:
-    elementary.volume_anomalies:
-      -- event_timestamp_column: column name>
-      -- update_timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-              period: [hour | day | week | month]
-              count: int
-      -- seasonality: day_of_week>
+  tests:                                                                                                                                                                                                                        
+       -- elementary.event_freshness_anomalies`:                                                                                                                                                                                
+             event_timestamp_column: column name>
+             update_timestamp_column: column name>
+             where_expression: sql expression>                                                
+             anomaly_sensitivity: int>                                                      
+             days_back: int>                                                                         
+             backfill_days: int>                                                                 
+             min_training_set_size: int>                                                 
+             time_bucket:>                                                                         
+                period: [hour | day | week | month]                                 
+                count: int                                                          
+             seasonality: day_of_week>        
  
 
diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
index 9c6093642..7d03782f9 100644
--- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
@@ -22,18 +22,18 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:
-    elementary.volume_anomalies:
-      -- timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-                period: [hour | day | week | month]
-                count: int
-      -- seasonality: day_of_week>
+  tests:                                                                                                                                                                                                                        
+       -- elementary.freshness_anomalies:                                                                                                                                                                                
+             timestamp_column: column name>                                                   
+             where_expression: sql expression>                                                
+             anomaly_sensitivity: int>                                                      
+             days_back: int>                                                                         
+             backfill_days: int>                                                                 
+             min_training_set_size: int>                                                 
+             time_bucket:>                                                                         
+                period: [hour | day | week | month]                                 
+                count: int                                                          
+             seasonality: day_of_week>                                                             
  
 
From 6ab1988fde13a1c9b6161c838666ceb1a19e343a Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 18:59:45 +0300
Subject: [PATCH 047/194] tests config formating

---
 .../all-columns-anomalies.mdx                 | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index 30642fe87..90b0d3526 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -39,6 +39,24 @@ No mandatory configuration, however it is highly recommended to configure a `tim
+
                                                                                                                                                                                                 
+                                                                                                                                                                                                
+  tests:                                                                                                                                                                                              
+       -- elementary.volume_anomalies:                                                                                                                                                      
+             timestamp_column: column name>                         
+             where_expression: sql expression>                      
+             anomaly_sensitivity: int>                           
+             anomaly_direction: [both | spike | drop]>             
+             days_back: int>                                               
+             backfill_days: int>                                       
+             min_training_set_size: int>                       
+             time_bucket:>                                               
+                period: [hour | day | week | month]       
+                count: int                                
+             seasonality: day_of_week>                                   
+                                                                                                                                                                                               
+
+
From 62cac384d92c4b5ef315cde334d1782a26963f61 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 19:01:13 +0300
Subject: [PATCH 048/194] tests config formating

---
 .../all-columns-anomalies.mdx                 | 50 ++++++-------------
 1 file changed, 16 insertions(+), 34 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index 90b0d3526..89b340366 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -21,41 +21,23 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
   tests:                                                                                                                                                                                                                        
-       -- elementary.all_columns_anomalies:
-             column_anomalies: column monitors list>
-             exclude_prefix: string>
-             exclude_regexp: regex>
-             timestamp_column: column name>                                                   
-             where_expression: sql expression>                                                
-             anomaly_sensitivity: int>                                                     
-             anomaly_direction: [both | spike | drop]>                                       
-             days_back: int>                                                                         
-             backfill_days: int>                                                                 
-             min_training_set_size: int>                                                 
-             time_bucket:>                                                                         
-                period: [hour | day | week | month]                                 
-                count: int                                                          
-             seasonality: day_of_week>           
+      -- elementary.all_columns_anomalies:
+            column_anomalies: column monitors list>
+            exclude_prefix: string>
+            exclude_regexp: regex>
+            timestamp_column: column name>                                                   
+            where_expression: sql expression>                                                
+            anomaly_sensitivity: int>                                                     
+            anomaly_direction: [both | spike | drop]>                                       
+            days_back: int>                                                                         
+            backfill_days: int>                                                                 
+            min_training_set_size: int>                                                 
+            time_bucket:>                                                                         
+               period: [hour | day | week | month]                                 
+               count: int                                                          
+            seasonality: day_of_week>           
  
-
- -
                                                                                                                                                                                                 
-                                                                                                                                                                                                
-  tests:                                                                                                                                                                                              
-       -- elementary.volume_anomalies:                                                                                                                                                      
-             timestamp_column: column name>                         
-             where_expression: sql expression>                      
-             anomaly_sensitivity: int>                           
-             anomaly_direction: [both | spike | drop]>             
-             days_back: int>                                               
-             backfill_days: int>                                       
-             min_training_set_size: int>                       
-             time_bucket:>                                               
-                period: [hour | day | week | month]       
-                count: int                                
-             seasonality: day_of_week>                                   
-                                                                                                                                                                                               
-
+
From c4f4e9f401b65abff63a71973db79d1be18ea757 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 19:01:59 +0300
Subject: [PATCH 049/194] tests config formating

---
 docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index 89b340366..99aa46bd3 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -21,7 +21,7 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
   tests:                                                                                                                                                                                                                        
-      -- elementary.all_columns_anomalies:
+    -- elementary.all_columns_anomalies:
             column_anomalies: column monitors list>
             exclude_prefix: string>
             exclude_regexp: regex>

From 5861c82f99c9dccef907c7b3904dd5f1f2763a15 Mon Sep 17 00:00:00 2001
From: Maayan-s 
Date: Thu, 1 Jun 2023 19:02:35 +0300
Subject: [PATCH 050/194] tests config formating

---
 .../all-columns-anomalies.mdx                 | 31 +++++++++----------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index 99aa46bd3..75ee3aaa3 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -19,23 +19,22 @@ You can use `column_anomalies` param to override the default monitors, and `excl
 No mandatory configuration, however it is highly recommended to configure a `timestamp_column`.
 
 
- 
-  tests:                                                                                                                                                                                                                        
+                                                                                                                                                                                                                  
     -- elementary.all_columns_anomalies:
-            column_anomalies: column monitors list>
-            exclude_prefix: string>
-            exclude_regexp: regex>
-            timestamp_column: column name>                                                   
-            where_expression: sql expression>                                                
-            anomaly_sensitivity: int>                                                     
-            anomaly_direction: [both | spike | drop]>                                       
-            days_back: int>                                                                         
-            backfill_days: int>                                                                 
-            min_training_set_size: int>                                                 
-            time_bucket:>                                                                         
-               period: [hour | day | week | month]                                 
-               count: int                                                          
-            seasonality: day_of_week>           
+         column_anomalies: column monitors list>
+         exclude_prefix: string>
+         exclude_regexp: regex>
+         timestamp_column: column name>                                                   
+         where_expression: sql expression>                                                
+         anomaly_sensitivity: int>                                                     
+         anomaly_direction: [both | spike | drop]>                                       
+         days_back: int>                                                                         
+         backfill_days: int>                                                                 
+         min_training_set_size: int>                                                 
+         time_bucket:>                                                                         
+            period: [hour | day | week | month]                                 
+            count: int                                                          
+         seasonality: day_of_week>           
  
 
From a4707b6b5dbb4308b36bab9f6209b8d57dbbfa0f Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 19:03:07 +0300
Subject: [PATCH 051/194] tests config formating

---
 .../all-columns-anomalies.mdx                 | 28 +++++++++----------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index 75ee3aaa3..6c5d90ec1 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -21,20 +21,20 @@ No mandatory configuration, however it is highly recommended to configure a `tim
                                                                                                                                                                                                                   
     -- elementary.all_columns_anomalies:
-         column_anomalies: column monitors list>
-         exclude_prefix: string>
-         exclude_regexp: regex>
-         timestamp_column: column name>                                                   
-         where_expression: sql expression>                                                
-         anomaly_sensitivity: int>                                                     
-         anomaly_direction: [both | spike | drop]>                                       
-         days_back: int>                                                                         
-         backfill_days: int>                                                                 
-         min_training_set_size: int>                                                 
-         time_bucket:>                                                                         
-            period: [hour | day | week | month]                                 
-            count: int                                                          
-         seasonality: day_of_week>           
+      column_anomalies: column monitors list>
+      exclude_prefix: string>
+      exclude_regexp: regex>
+      timestamp_column: column name>                                                   
+      where_expression: sql expression>                                                
+      anomaly_sensitivity: int>                                                     
+      anomaly_direction: [both | spike | drop]>                                       
+      days_back: int>                                                                         
+      backfill_days: int>                                                                 
+      min_training_set_size: int>                                                 
+      time_bucket:>                                                                         
+      nbsp;   period: [hour | day | week | month]                                 
+      nbsp;   count: int                                                          
+      seasonality: day_of_week>           
  
 
From 9fa5e17f2860e3dde373432b7a04998cfdb402b6 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 19:04:14 +0300
Subject: [PATCH 052/194] tests config formating

---
 .../all-columns-anomalies.mdx                 | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index 6c5d90ec1..ddd173535 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -24,17 +24,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim
       column_anomalies: column monitors list>
       exclude_prefix: string>
       exclude_regexp: regex>
-      timestamp_column: column name>
-      where_expression: sql expression>
-      anomaly_sensitivity: int>
-      anomaly_direction: [both | spike | drop]>
-      days_back: int>
-      backfill_days: int>
-      min_training_set_size: int>
-      time_bucket:>
-      nbsp;   period: [hour | day | week | month]
-      nbsp;   count: int
-      seasonality: day_of_week>
+      timestamp_column: column name>
+      where_expression: sql expression>
+      anomaly_sensitivity: int>
+      anomaly_direction: [both | spike | drop]>
+      days_back: int>
+      backfill_days: int>
+      min_training_set_size: int>
+      time_bucket:>
+      nbsp;   period: [hour | day | week | month]
+      nbsp;   count: int
+      seasonality: day_of_week>
From 984836c2b009ba2ba8524c0d8d7aa643a4789079 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Sat, 3 Jun 2023 16:20:59 +0300
Subject: [PATCH 053/194] tests config formating

---
 .../anomaly-detection-configuration/anomaly-direction.mdx | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx
index 4e1e22698..26ec52192 100644
--- a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx
+++ b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx
@@ -40,6 +40,10 @@ models:
 ```
+
+
+
+
 ```yml model
 models:
 - name: this_is_a_model
@@ -48,6 +52,10 @@ models:
 anomaly_direction: drop
 ```
+
+
+
+
 ```yml dbt_project
 vars:
 anomaly_direction: both
From 31f821dc45a72865c6dee5ac46bb6e3e16fba23c Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Sun, 4 Jun 2023 14:16:43 +0300
Subject: [PATCH 054/194] tests config formating

---
 .../anomaly-detection-configuration/anomaly-direction.mdx | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx
index 26ec52192..4e1e22698 100644
--- a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx
+++ b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx
@@ -40,10 +40,6 @@ models:
 ```
-
-
-
-
 ```yml model
 models:
 - name: this_is_a_model
@@ -52,10 +48,6 @@ models:
 anomaly_direction: drop
 ```
-
-
-
-
 ```yml dbt_project
 vars:
 anomaly_direction: both
From 98e6078505ef60b63bbfb8fb4b236e62f1859028 Mon Sep 17 00:00:00 2001
From: Hahnbee Lee <55263191+hahnbeelee@users.noreply.github.com>
Date: Mon, 5 Jun 2023 04:13:43 -0700
Subject: [PATCH 055/194] Remove hidden new lines in code block

---
 .../event-freshness-anomalies.mdx             | 24 +++++++++----------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
index 1a0b2bee6..14361c2b6 100644
--- a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
@@ -25,19 +25,19 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:                                                                                                                                                                                                                        
-       -- elementary.event_freshness_anomalies`:                                                                                                                                                                                
+  tests:
+       -- elementary.event_freshness_anomalies:
              event_timestamp_column: column name>
              update_timestamp_column: column name>
-             where_expression: sql expression>                                                
-             anomaly_sensitivity: int>                                                      
-             days_back: int>                                                                         
-             backfill_days: int>                                                                 
-             min_training_set_size: int>                                                 
-             time_bucket:>                                                                         
-                period: [hour | day | week | month]                                 
-                count: int                                                          
-             seasonality: day_of_week>        
+             where_expression: sql expression>
+             anomaly_sensitivity: int>     
+             days_back: int>
+             backfill_days: int>
+             min_training_set_size: int>
+             time_bucket:>
+                period: [hour | day | week | month]
+                count: int
+             seasonality: day_of_week>
  
 
@@ -73,4 +73,4 @@ models:
 severity: warn
 ```
-
\ No newline at end of file
+
From c43d75fcfd173903b712a2bd5facbeb1408276a9 Mon Sep 17 00:00:00 2001
From: Maayan Salom
Date: Mon, 5 Jun 2023 14:44:00 +0300
Subject: [PATCH 056/194] Update event-freshness-anomalies.mdx

---
 .../event-freshness-anomalies.mdx             | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
index 14361c2b6..a468fc4a0 100644
--- a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
@@ -27,17 +27,17 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
 tests:
    -- elementary.event_freshness_anomalies:
-        event_timestamp_column: column name>
-        update_timestamp_column: column name>
-        where_expression: sql expression>
-        anomaly_sensitivity: int>
+        event_timestamp_column: column name
+        update_timestamp_column: column name
+        where_expression: sql expression
+        anomaly_sensitivity: int
        days_back: int>
-        backfill_days: int>
-        min_training_set_size: int>
-        time_bucket:>
+        backfill_days: int
+        min_training_set_size: int
+        time_bucket:
           period: [hour | day | week | month]
           count: int
-        seasonality: day_of_week>
+        seasonality: day_of_week
From 1831eef5b40fa32950012ef5c414c2cd940a1e8c Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Mon, 5 Jun 2023 14:46:44 +0300
Subject: [PATCH 057/194] tests config formating

---
 .../event-freshness-anomalies.mdx             |  4 ++--
 .../volume-anomalies.mdx                      | 18 +++++++++---------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
index a468fc4a0..a0d7558b5 100644
--- a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
@@ -30,8 +30,8 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
        event_timestamp_column: column name
        update_timestamp_column: column name
        where_expression: sql expression
-        anomaly_sensitivity: int
-        days_back: int>
+        anomaly_sensitivity: int
+        days_back: int
        backfill_days: int
        min_training_set_size: int
        time_bucket:
diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
index d4b1fd77e..f9c640f66 100644
--- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
@@ -25,17 +25,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim
 tests:
    -- elementary.volume_anomalies:
-        timestamp_column: column name>
-        where_expression: sql expression>
-        anomaly_sensitivity: int>
-        anomaly_direction: [both | spike | drop]>
-        days_back: int>
-        backfill_days: int>
-        min_training_set_size: int>
-        time_bucket:>
+        timestamp_column: column name
+        where_expression: sql expression
+        anomaly_sensitivity: int
+        anomaly_direction: [both | spike | drop]
+        days_back: int
+        backfill_days: int
+        min_training_set_size: int
+        time_bucket:
           period: [hour | day | week | month]
           count: int
-        seasonality: day_of_week>
+        seasonality: day_of_week
From db6b75459e70f1c7867638849156b74edae896ea Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Mon, 5 Jun 2023 14:49:50 +0300
Subject: [PATCH 058/194] tests config formating

---
 .../freshness-anomalies.mdx                   | 24 +++++++++----------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
index 7d03782f9..28f9eaf2a 100644
--- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
@@ -22,18 +22,18 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:                                                                                                                                                                                                                        
-       -- elementary.freshness_anomalies:                                                                                                                                                                                
-             timestamp_column: column name>                                                   
-             where_expression: sql expression>                                                
-             anomaly_sensitivity: int>                                                      
-             days_back: int>                                                                         
-             backfill_days: int>                                                                 
-             min_training_set_size: int>                                                 
-             time_bucket:>                                                                         
-                period: [hour | day | week | month]                                 
-                count: int                                                          
-             seasonality: day_of_week>                                                             
+  tests:
+       -- elementary.freshness_anomalies:
+             timestamp_column: column name
+             where_expression: sql expression
+             anomaly_sensitivity: int
+             days_back: int
+             backfill_days: int
+             min_training_set_size: int
+             time_bucket:
+                period: [hour | day | week | month]
+                count: int
+             seasonality: day_of_week
  
 
From 16a4013aae32d8e28cd263288638b33e6e707a50 Mon Sep 17 00:00:00 2001
From: Maayan Salom
Date: Mon, 5 Jun 2023 14:54:12 +0300
Subject: [PATCH 059/194] Update freshness-anomalies.mdx

---
 docs/guides/anomaly-detection-tests/freshness-anomalies.mdx | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
index 28f9eaf2a..5f8f72afe 100644
--- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
@@ -22,7 +22,7 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:
+  tests:      
        -- elementary.freshness_anomalies:
              timestamp_column: column name
              where_expression: sql expression
@@ -33,7 +33,6 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
              time_bucket:
                 period: [hour | day | week | month]
                 count: int
-             seasonality: day_of_week
  
 
@@ -67,4 +66,4 @@ models:
 severity: warn
 ```
-
\ No newline at end of file
+
From 7d2d8ddfab27461b9b9956bc65c5251760105cf6 Mon Sep 17 00:00:00 2001
From: Maayan Salom
Date: Mon, 5 Jun 2023 14:54:43 +0300
Subject: [PATCH 060/194] Update freshness-anomalies.mdx

---
 docs/guides/anomaly-detection-tests/freshness-anomalies.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
index 5f8f72afe..5817f79ac 100644
--- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
@@ -22,7 +22,7 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:      
+  tests:
        -- elementary.freshness_anomalies:
              timestamp_column: column name
              where_expression: sql expression

From ae31724731eb5974e4326bf9a587196cc4e64dd0 Mon Sep 17 00:00:00 2001
From: Maayan Salom 
Date: Mon, 5 Jun 2023 19:12:40 +0300
Subject: [PATCH 061/194] Update column-anomalies.mdx

---
 .../column-anomalies.mdx                      | 28 +++++++++----------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx
index 98ca6294e..165ccb757 100644
--- a/docs/guides/anomaly-detection-tests/column-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx
@@ -19,20 +19,20 @@ No mandatory configuration, however it is highly recommended to configure a `tim
 
 
  
-  tests:                                                                                                                                                                                                                        
+  tests:
        -- elementary.column_anomalies:
-             column_anomalies: column monitors list>
-             timestamp_column: column name>                                                   
-             where_expression: sql expression>                                                
-             anomaly_sensitivity: int>                                                     
-             anomaly_direction: [both | spike | drop]>                                       
-             days_back: int>                                                                         
-             backfill_days: int>                                                                 
-             min_training_set_size: int>                                                 
-             time_bucket:>                                                                         
-                period: [hour | day | week | month]                                 
-                count: int                                                          
-             seasonality: day_of_week>                                                             
+             column_anomalies: column monitors list
+             timestamp_column: column name
+             where_expression: sql expression
+             anomaly_sensitivity: int
+             anomaly_direction: [both | spike | drop]
+             days_back: int
+             backfill_days: int
+             min_training_set_size: int
+             time_bucket:
+                period: [hour | day | week | month]
+                count: int
+             seasonality: day_of_week
  
 
@@ -110,4 +110,4 @@ models: tags: ['elementary'] ``` - \ No newline at end of file + From 976bffdf5a982c7fa50cc91b38bf9d7bf21e48e2 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 5 Jun 2023 19:19:22 +0300 Subject: [PATCH 062/194] tests config formating --- .../all-columns-anomalies.mdx | 33 +++++++++---------- .../dimension-anomalies.mdx | 26 +++++++-------- 2 files changed, 29 insertions(+), 30 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index ddd173535..2b768d18f 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -19,25 +19,24 @@ You can use `column_anomalies` param to override the default monitors, and `excl No mandatory configuration, however it is highly recommended to configure a `timestamp_column`.
-                                                                                                                                                                                                                  
-    -- elementary.all_columns_anomalies:
-      column_anomalies: column monitors list>
-      exclude_prefix: string>
-      exclude_regexp: regex>
-      timestamp_column: column name>
-      where_expression: sql expression>
-      anomaly_sensitivity: int>
-      anomaly_direction: [both | spike | drop]>
-      days_back: int>
-      backfill_days: int>
-      min_training_set_size: int>
-      time_bucket:>
-      nbsp;   period: [hour | day | week | month]
-      nbsp;   count: int
-      seasonality: day_of_week>
+ 
+       -- elementary.all_columns_anomalies:
+             column_anomalies: column monitors list
+             exclude_prefix: string
+             exclude_regexp: regex
+             timestamp_column: column name
+             where_expression: sql expression
+             anomaly_sensitivity: int
+             anomaly_direction: [both | spike | drop]
+             days_back: int
+             backfill_days: int
+             min_training_set_size: int
+             time_bucket:
+             nbsp;   period: [hour | day | week | month]
+             nbsp;   count: int
+             seasonality: day_of_week
  
 
- diff --git a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx index 61c2cdd8c..42965de14 100644 --- a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx @@ -18,20 +18,20 @@ _Required configuration: `dimensions`_
  
-  tests:                                                                                                                                                                                                                        
+  tests:
        -- elementary.dimension_anomalies:
-             dimensions: sql expression>
-             timestamp_column: column name>                                                   
-             where_expression: sql expression>                                                
-             anomaly_sensitivity: int>                                                     
-             anomaly_direction: [both | spike | drop]>                                       
-             days_back: int>                                                                         
-             backfill_days: int>                                                                 
-             min_training_set_size: int>                                                 
-             time_bucket:>                                                                         
-                period: [hour | day | week | month]                                 
-                count: int                                                          
-             seasonality: day_of_week>                                                             
+             dimensions: sql expression
+             timestamp_column: column name
+             where_expression: sql expression
+             anomaly_sensitivity: int
+             anomaly_direction: [both | spike | drop]
+             days_back: int
+             backfill_days: int
+             min_training_set_size: int
+             time_bucket:
+                period: [hour | day | week | month]
+                count: int
+             seasonality: day_of_week
  
 
From 79c6fba947610ae50e8f48f4db730e3bf0291107 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 6 Jun 2023 11:21:09 +0300 Subject: [PATCH 063/194] Update all-columns-anomalies.mdx --- .../anomaly-detection-tests/all-columns-anomalies.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index 2b768d18f..d8548e969 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -32,8 +32,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim        backfill_days: int        min_training_set_size: int        time_bucket: -        nbsp;   period: [hour | day | week | month] -        nbsp;   count: int +          period: [hour | day | week | month] +          count: int        seasonality: day_of_week
@@ -72,4 +72,4 @@ models: sensitivity: 3.5 ``` - \ No newline at end of file + From 41d5500c5c8001ff7a855a361d9251084b0b88c4 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 6 Jun 2023 11:21:55 +0300 Subject: [PATCH 064/194] Update all-columns-anomalies.mdx --- docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index d8548e969..b4ed53821 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -32,8 +32,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim        backfill_days: int        min_training_set_size: int        time_bucket: -          period: [hour | day | week | month] -          count: int +           period: [hour | day | week | month] +           count: int        seasonality: day_of_week From f53529b18eb3f1e5fe28872ffec0fbbe60d188f6 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 7 Jun 2023 13:55:34 +0300 Subject: [PATCH 065/194] tests config formating --- docs/mint.json | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/docs/mint.json b/docs/mint.json index 55e6f0daa..9c98b5b42 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -63,7 +63,10 @@ "introduction", { "group": "Quickstart", - "pages": ["quickstart", "quickstart-cli"] + "pages": [ + "quickstart", + "quickstart-cli" + ] }, { "group": "Tutorial", @@ -213,14 +216,19 @@ ] }, { - "group": "Getting Started", + "group": "Elementary Cloud", "pages": [ "cloud/introduction", + "cloud/general/security-and-privacy" + ] + }, + { + "group": "Onboarding", + "pages": [ "cloud/onboarding/quickstart-dbt-package", "cloud/onboarding/signup", "cloud/onboarding/connect-data-warehouse", - "cloud/manage-team", - "cloud/general/security-and-privacy" 
+ "cloud/manage-team" ] } ], From 72a7f6858ba4ec1446f69349d0b310afb0d37d27 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 8 Jun 2023 15:50:05 +0300 Subject: [PATCH 066/194] cloud docs changes --- docs/cloud/introduction.mdx | 11 +- .../onboarding/connect-data-warehouse.mdx | 33 ++--- docs/cloud/onboarding/create-profile.mdx | 128 ++++++++++++++++++ .../onboarding/quickstart-dbt-package.mdx | 2 +- docs/mint.json | 3 +- 5 files changed, 143 insertions(+), 34 deletions(-) create mode 100644 docs/cloud/onboarding/create-profile.mdx diff --git a/docs/cloud/introduction.mdx b/docs/cloud/introduction.mdx index ce75c603a..d9de592b7 100644 --- a/docs/cloud/introduction.mdx +++ b/docs/cloud/introduction.mdx @@ -39,13 +39,4 @@ alt="Elementary Managed high level flow" 2. [Signup and setup integrations](/cloud/onboarding/signup). - - -## Security and privacy - -Elementary cloud requires access only to the Elementary schema and the tables in it. -The data in the schema in full is stored in the client's data warehouse. - -We secure Elementary cloud infrastructure with the highest standards. -You can delete your account at any time, and all your configuration and reports will be deleted immediately and permanently from Elementary servers. -For details, refer to our [Terms of Service](https://www.elementary-data.com/terms-of-service). +
\ No newline at end of file diff --git a/docs/cloud/onboarding/connect-data-warehouse.mdx b/docs/cloud/onboarding/connect-data-warehouse.mdx index 723e3cb09..98132d01a 100644 --- a/docs/cloud/onboarding/connect-data-warehouse.mdx +++ b/docs/cloud/onboarding/connect-data-warehouse.mdx @@ -5,34 +5,23 @@ sidebarTitle: "Data warehouse" You can connect Elementary to a data warehouse that has an Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). -Here are the steps needed to enable the connection: +Elementary Cloud needs: +- [`profiles.yml`](/cloud/onboarding/create-profile) with connection details +- Read permissions to the Elementary schema (and not the rest of your data) +- Network access (might require to allowlist Elementary IP address) -### Authentication and IP Allowlist - -Elementary needs authentication details, permissions to read the Elementary schema (and not the rest of your data), and network access enabled by adding the cloud IPs to your data warehouse allowlist. - -Elementary IP for allowlist: `3.126.156.226` - -### Create a `profiles.yml` file - -You will need to provide the connection and authentication details by uploading a YML file with a connection profile named `elementary`. -The profile needs to point at the database and schema name where your elementary tables are. - -The easiest way to generate the profile is to run the following command within the dbt project where you deployed the elementary dbt package (works in dbt cloud as well): +### Connect Elementary cloud -```shell -dbt run-operation elementary.generate_elementary_cli_profile -``` +On the `Account settings` under `Integrations`, press `Connect` on the "Connect Your data warehouse" section. -Save the output to a YML file, update the missing details, and you are ready. +Provide an environment name, select a data warehouse type, and upload the `profiles.yml` file with the `elementary` profile. 
-Here are the formats of profile for each supported data warehouse: - +### Allowlist Elementary IP +Elementary IP for allowlist: `3.126.156.226` -### Connect Elementary cloud -On the `Account settings` under `Integrations`, press `Connect` on the "Connect Your data warehouse" section. +### Need help with onboarding? -Provide an env name, select a data warehouse type, and upload the `profiles.yml` file with the `elementary` profile. +We can provide [support on Slack](https://join.slack.com/t/elementary-community/shared_invite/zt-1b9vogqmq-y~IRhc2396CbHNBXLsrXcA) or hop on an [onboarding call](https://savvycal.com/MaayanSa/df29881c). diff --git a/docs/cloud/onboarding/create-profile.mdx b/docs/cloud/onboarding/create-profile.mdx new file mode 100644 index 000000000..f52b98cc1 --- /dev/null +++ b/docs/cloud/onboarding/create-profile.mdx @@ -0,0 +1,128 @@ +--- +title: "Create `profiles.yml` file" +sidebarTitle: "Create profiles.yml" +--- + +You will need to provide Elementary cloud a `profiles.yml` file with a connection profile named `elementary`. + +- The profile needs to point at the database and schema name where your elementary tables are. +- The provided credentials need to have read permissions to the elementary schema. + +The easiest way to generate the profile is: +1. Run the following command in the dbt project where elementary dbt package is deployed (works in dbt cloud as well): + +```shell +dbt run-operation elementary.generate_elementary_cli_profile +``` + +2. Copy and save the output to a `profiles.yml` file, update the missing details, and you are ready. + +### Permissions and security + +**Elementary cloud doesn't need permissions to your sensitive data.** + +It is recommended to create a read only user for the elementary schema only, and provide it to Elementary Cloud in the profile. +For more details, refer to [security and privacy](/cloud/security-and-privacy). 
+ +### `profiles.yml` examples + +Here is the format of `profiles.yml` for each supported data warehouse: + + + +```yml Snowflake +## SNOWFLAKE ## +## Configure the database and schema of elementary models. + +elementary: + outputs: + default: + type: snowflake + account: [account id] + + ## User/password auth ## + user: [username] + password: [password] + + role: [user role] + database: [database name] + warehouse: [warehouse name] + schema: [schema name]_elementary + threads: 4 + +``` + +```yml BigQuery +## BIGQUERY ## +## Configure the database and schema of elementary models. + +elementary: + outputs: + default: + type: bigquery + + ## Service account auth ## + method: service-account + keyfile: empty + + project: [project id] + dataset: [dataset name] # elementary dataset, usually [dataset name]_elementary + threads: 4 + location: [dataset location] + priority: interactive +``` + +```yml Redshift +## REDSHIFT ## +## Configure the database and schema of elementary models. + +elementary: + outputs: + default: + type: redshift + host: [hostname, like hostname.region.redshift.amazonaws.com] + + ## User/password auth ## + user: [username] + password: [password] + + dbname: [database name] + schema: [schema name] # elementary schema, usually [schema name]_elementary + threads: 4 +``` + +```yml Databricks +## DATABRICKS ## +## Configure the database and schema of elementary models. + +elementary: + outputs: + default: + type: databricks + host: [hostname, like .cloud.databricks.com] + http_path: [like /sql/1.0/endpoints/] + schema: [schema name] # elementary schema, usually [schema name]_elementary + token: [token] + threads: [number of threads like 8] +``` + +```yml Postgres +## POSTGRES ## +## Configure the database and schema of elementary models. 
+ +elementary: + outputs: + default: + type: postgres + host: [hostname] + user: [username] + password: [password] + port: [port] + dbname: [database name] + schema: [schema name] # elementary schema, usually [schema name]_elementary + threads: [1 or more] + +``` + + + diff --git a/docs/cloud/onboarding/quickstart-dbt-package.mdx b/docs/cloud/onboarding/quickstart-dbt-package.mdx index 2dc452b11..af67fe76f 100644 --- a/docs/cloud/onboarding/quickstart-dbt-package.mdx +++ b/docs/cloud/onboarding/quickstart-dbt-package.mdx @@ -1,5 +1,5 @@ --- -title: "Quickstart: Install Elementary dbt package" +title: "Install Elementary dbt package" sidebarTitle: "Install dbt package" --- diff --git a/docs/mint.json b/docs/mint.json index 9c98b5b42..6622a419a 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -223,9 +223,10 @@ ] }, { - "group": "Onboarding", + "group": "Getting Started", "pages": [ "cloud/onboarding/quickstart-dbt-package", + "create-profile", "cloud/onboarding/signup", "cloud/onboarding/connect-data-warehouse", "cloud/manage-team" From 4f83076a934bc3a8684cf70291f4feac0b4161a7 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 8 Jun 2023 15:53:20 +0300 Subject: [PATCH 067/194] cloud docs changes --- docs/cloud/onboarding/connect-data-warehouse.mdx | 2 +- docs/mint.json | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/cloud/onboarding/connect-data-warehouse.mdx b/docs/cloud/onboarding/connect-data-warehouse.mdx index 98132d01a..4831821f7 100644 --- a/docs/cloud/onboarding/connect-data-warehouse.mdx +++ b/docs/cloud/onboarding/connect-data-warehouse.mdx @@ -1,6 +1,6 @@ --- title: "Connect your data warehouse" -sidebarTitle: "Data warehouse" +sidebarTitle: "Connect data warehouse" --- You can connect Elementary to a data warehouse that has an Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). 
diff --git a/docs/mint.json b/docs/mint.json index 6622a419a..e8c27c1c1 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -226,7 +226,7 @@ "group": "Getting Started", "pages": [ "cloud/onboarding/quickstart-dbt-package", - "create-profile", + "cloud/onboarding/create-profile", "cloud/onboarding/signup", "cloud/onboarding/connect-data-warehouse", "cloud/manage-team" From 8ef7b64996191485ca0b202a513c0781203bfe60 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 8 Jun 2023 15:56:33 +0300 Subject: [PATCH 068/194] cloud docs changes --- docs/cloud/onboarding/create-profile.mdx | 4 ++++ docs/cloud/onboarding/quickstart-dbt-package.mdx | 5 +++-- docs/cloud/onboarding/signup.mdx | 2 +- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/docs/cloud/onboarding/create-profile.mdx b/docs/cloud/onboarding/create-profile.mdx index f52b98cc1..413aa9372 100644 --- a/docs/cloud/onboarding/create-profile.mdx +++ b/docs/cloud/onboarding/create-profile.mdx @@ -126,3 +126,7 @@ elementary: +### What's next? + +1. [Sign up to Elementary cloud](/cloud/onboarding/signup). +2. [Connect your Elementary schema to Elementary cloud](/cloud/onboarding/connect-data-warehouse). \ No newline at end of file diff --git a/docs/cloud/onboarding/quickstart-dbt-package.mdx b/docs/cloud/onboarding/quickstart-dbt-package.mdx index af67fe76f..9aeeddac2 100644 --- a/docs/cloud/onboarding/quickstart-dbt-package.mdx +++ b/docs/cloud/onboarding/quickstart-dbt-package.mdx @@ -52,5 +52,6 @@ If you see data in these models you completed the package deployment (Congrats! ### What's next? -1. [Singup to Elementary cloud](/cloud/saas-onboarding/signup). -2. [Connect your Elementary schema to Elementary cloud](/cloud/saas-onboarding/connect-data-warehouse). \ No newline at end of file +1. [Create a connection profile](/cloud/onboarding/create-profile). +2. [Sign up to Elementary cloud](/cloud/onboarding/signup). +3.
[Connect your Elementary schema to Elementary cloud](/cloud/onboarding/connect-data-warehouse). \ No newline at end of file diff --git a/docs/cloud/onboarding/signup.mdx b/docs/cloud/onboarding/signup.mdx index 119fb45ab..189ddebbc 100644 --- a/docs/cloud/onboarding/signup.mdx +++ b/docs/cloud/onboarding/signup.mdx @@ -1,5 +1,5 @@ --- -title: "Quickstart: Signup and connect" +title: "Signup and login" sidebarTitle: "Signup and login" --- From cf95c8af925ada6420cadda04e2b73580ae0f84a Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 8 Jun 2023 15:58:46 +0300 Subject: [PATCH 069/194] cloud docs changes --- docs/cloud/manage-team.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/cloud/manage-team.mdx b/docs/cloud/manage-team.mdx index 6c999433a..35bd956fe 100644 --- a/docs/cloud/manage-team.mdx +++ b/docs/cloud/manage-team.mdx @@ -1,6 +1,6 @@ --- -title: "Quickstart: Invite and remove users" -sidebarTitle: "Team settings" +title: "Invite and remove users" +sidebarTitle: "Invite users" --- ### Invite users From b63035f8eef1ecbc92af1b44f40673b22c12fe4b Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 8 Jun 2023 16:00:00 +0300 Subject: [PATCH 070/194] cloud docs changes --- docs/cloud/manage-team.mdx | 2 +- docs/cloud/onboarding/connect-data-warehouse.mdx | 2 +- docs/cloud/onboarding/create-profile.mdx | 2 +- docs/cloud/onboarding/quickstart-dbt-package.mdx | 2 +- docs/cloud/onboarding/signup.mdx | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/cloud/manage-team.mdx b/docs/cloud/manage-team.mdx index 35bd956fe..380c6ab7c 100644 --- a/docs/cloud/manage-team.mdx +++ b/docs/cloud/manage-team.mdx @@ -1,6 +1,6 @@ --- title: "Invite and remove users" -sidebarTitle: "Invite users" +sidebarTitle: "5️⃣ Invite users" --- ### Invite users diff --git a/docs/cloud/onboarding/connect-data-warehouse.mdx b/docs/cloud/onboarding/connect-data-warehouse.mdx index 4831821f7..4e550fc5b 100644 --- 
a/docs/cloud/onboarding/connect-data-warehouse.mdx +++ b/docs/cloud/onboarding/connect-data-warehouse.mdx @@ -1,6 +1,6 @@ --- title: "Connect your data warehouse" -sidebarTitle: "Connect data warehouse" +sidebarTitle: "4️⃣ Connect data warehouse" --- You can connect Elementary to a data warehouse that has an Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). diff --git a/docs/cloud/onboarding/create-profile.mdx b/docs/cloud/onboarding/create-profile.mdx index 413aa9372..d1687ef0b 100644 --- a/docs/cloud/onboarding/create-profile.mdx +++ b/docs/cloud/onboarding/create-profile.mdx @@ -1,6 +1,6 @@ --- title: "Create `profiles.yml` file" -sidebarTitle: "Create profiles.yml" +sidebarTitle: "2️⃣ Create profiles.yml" --- You will need to provide Elementary cloud a `profiles.yml` file with a connection profile named `elementary`. diff --git a/docs/cloud/onboarding/quickstart-dbt-package.mdx b/docs/cloud/onboarding/quickstart-dbt-package.mdx index 9aeeddac2..3fc71aac1 100644 --- a/docs/cloud/onboarding/quickstart-dbt-package.mdx +++ b/docs/cloud/onboarding/quickstart-dbt-package.mdx @@ -1,6 +1,6 @@ --- title: "Install Elementary dbt package" -sidebarTitle: "Install dbt package" +sidebarTitle: "1️⃣ Install dbt package" --- diff --git a/docs/cloud/onboarding/signup.mdx b/docs/cloud/onboarding/signup.mdx index 189ddebbc..e6326ea01 100644 --- a/docs/cloud/onboarding/signup.mdx +++ b/docs/cloud/onboarding/signup.mdx @@ -1,6 +1,6 @@ --- title: "Signup and login" -sidebarTitle: "Signup and login" +sidebarTitle: "3️⃣ Signup and login" --- ### Signup to Elementary cloud From 1cd72459e42aef6d7156f5b03ae668c5242aae66 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Sat, 10 Jun 2023 13:59:08 +0300 Subject: [PATCH 071/194] cloud docs changes --- docs/cloud/manage-team.mdx | 2 +- docs/cloud/onboarding/connect-data-warehouse.mdx | 2 +- docs/cloud/onboarding/create-profile.mdx | 2 +- docs/cloud/onboarding/quickstart-dbt-package.mdx | 2 +- 
docs/cloud/onboarding/signup.mdx | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/cloud/manage-team.mdx b/docs/cloud/manage-team.mdx index 380c6ab7c..4213e0966 100644 --- a/docs/cloud/manage-team.mdx +++ b/docs/cloud/manage-team.mdx @@ -1,6 +1,6 @@ --- title: "Invite and remove users" -sidebarTitle: "5️⃣ Invite users" +sidebarTitle: "5. Invite users" --- ### Invite users diff --git a/docs/cloud/onboarding/connect-data-warehouse.mdx b/docs/cloud/onboarding/connect-data-warehouse.mdx index 4e550fc5b..8ddcfd327 100644 --- a/docs/cloud/onboarding/connect-data-warehouse.mdx +++ b/docs/cloud/onboarding/connect-data-warehouse.mdx @@ -1,6 +1,6 @@ --- title: "Connect your data warehouse" -sidebarTitle: "4️⃣ Connect data warehouse" +sidebarTitle: "4. Connect data warehouse" --- You can connect Elementary to a data warehouse that has an Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). diff --git a/docs/cloud/onboarding/create-profile.mdx b/docs/cloud/onboarding/create-profile.mdx index d1687ef0b..1d842baa2 100644 --- a/docs/cloud/onboarding/create-profile.mdx +++ b/docs/cloud/onboarding/create-profile.mdx @@ -1,6 +1,6 @@ --- title: "Create `profiles.yml` file" -sidebarTitle: "2️⃣ Create profiles.yml" +sidebarTitle: "2. Create profiles.yml" --- You will need to provide Elementary cloud a `profiles.yml` file with a connection profile named `elementary`. diff --git a/docs/cloud/onboarding/quickstart-dbt-package.mdx b/docs/cloud/onboarding/quickstart-dbt-package.mdx index 3fc71aac1..23afdc280 100644 --- a/docs/cloud/onboarding/quickstart-dbt-package.mdx +++ b/docs/cloud/onboarding/quickstart-dbt-package.mdx @@ -1,6 +1,6 @@ --- title: "Install Elementary dbt package" -sidebarTitle: "1️⃣ Install dbt package" +sidebarTitle: "1. 
Install dbt package" --- diff --git a/docs/cloud/onboarding/signup.mdx b/docs/cloud/onboarding/signup.mdx index e6326ea01..a977beb38 100644 --- a/docs/cloud/onboarding/signup.mdx +++ b/docs/cloud/onboarding/signup.mdx @@ -1,6 +1,6 @@ --- title: "Signup and login" -sidebarTitle: "3️⃣ Signup and login" +sidebarTitle: "3. Signup and login" --- ### Signup to Elementary cloud From 2c5a170f1267caa9340b7ef047025ecad9042933 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 11 Jun 2023 13:39:07 +0300 Subject: [PATCH 072/194] Update manage-team.mdx --- docs/cloud/manage-team.mdx | 7 ------- 1 file changed, 7 deletions(-) diff --git a/docs/cloud/manage-team.mdx b/docs/cloud/manage-team.mdx index 4213e0966..97f9eb3f6 100644 --- a/docs/cloud/manage-team.mdx +++ b/docs/cloud/manage-team.mdx @@ -14,10 +14,3 @@ Users you invite will recieve an Email saying you invited them, and will need to - - -### Remove users - -On the top left button select `Account settings`, and select the `Team` screen. - -You can remove users by clicking selecting this option under the user options. 
From e737ccf2859ba24902d406f1c88c1b8675320005 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Sun, 11 Jun 2023 16:38:05 +0300 Subject: [PATCH 073/194] new release notes --- docs/mint.json | 2 ++ docs/release-notes/releases/0.7.10.mdx | 43 ++++++++++++++++++++++++++ docs/release-notes/releases/0.8.0.mdx | 32 +++++++++++++++++++ 3 files changed, 77 insertions(+) create mode 100644 docs/release-notes/releases/0.7.10.mdx create mode 100644 docs/release-notes/releases/0.8.0.mdx diff --git a/docs/mint.json b/docs/mint.json index e8c27c1c1..a42bd58d0 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -201,6 +201,8 @@ { "group": "Releases", "pages": [ + "release-notes/releases/0.8.0", + "release-notes/releases/0.7.10", "release-notes/releases/0.7.7", "release-notes/releases/0.7.6", "release-notes/releases/0.7.5", diff --git a/docs/release-notes/releases/0.7.10.mdx b/docs/release-notes/releases/0.7.10.mdx new file mode 100644 index 000000000..5132d5c86 --- /dev/null +++ b/docs/release-notes/releases/0.7.10.mdx @@ -0,0 +1,43 @@ +--- +title: "Elementary 0.7.10" +sidebarTitle: "0.7.10" +--- + +_May 17, 2023: [v0.7.10 Python](https://github.com/elementary-data/elementary/releases/tag/v0.7.10), [v0.7.8 dbt package](https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.7.8)_ + +### 🔥 What's new? + +- **New artifacts model - dbt_columns 🆕** + - Many users requested that we add this model with useful information about the columns in your project. + - We also plan to add features to the report and alerts based on this data. Stay tuned... :coming-soon: + - (This is not yet supported for Databricks, [working on it](https://github.com/elementary-data/elementary/issues/872)) + +- **New lineage filters - Tags and owners #️⃣👥** + - You can now filter the lineage to see only the nodes that are relevant to a business department or a specific owner, and their upstream and downstream nodes. 
+ - We also made some general improvements to the filters usability like adding search and clear all. + +- **Test results sample size in the report can be changed 🔢** + - The report is no longer limited to 5 results, you can change this by adding the var: + - `test_sample_row_count: 10` + +- New flag - `edr --version` 🏁 + - Thank you [@Manul Patel](https://elementary-community.slack.com/team/U054N92MU11) for contributing this  🤩 + + +### 💫 More changes + +- Alerts suppression interval is no longer limited to 24 hours. +- Added indicative exceptions to Elementary tests. +- `time_bucket` can now be configured in model and var levels as well. +- Created workarounds to solve breaking changes in dbt 1.5.0 adapters: +- Not rely on Databricks adapter to create temp tables - Thanks [@Joseph Berni](https://elementary-community.slack.com/team/U03NWMS0Y93), [@fitz](https://elementary-community.slack.com/team/U03V5KGR3U3) and [@Dharit Sura](https://elementary-community.slack.com/team/U047ZFZRDCH) for reporting! +- Run queries from `run` instead of `run-operation` due to bug in Redshift adapter - Thank you [@Eugene Sobolev](https://elementary-community.slack.com/team/U054BV7MR0T) for reporting and investigating with us! + + +### 🐞 Bug fixes +- Support dbt run results compiled sql with % on Redshift new adapter - Thanks [@Fabien Traventhal](https://elementary-community.slack.com/team/U03G693L05R) for reporting! +- Fixed ignored backfill_days in no timestamp tests - Thanks [@leila](https://elementary-community.slack.com/team/U04HFCUM2G6) and [@Roland Baranovic](https://elementary-community.slack.com/team/U04K5SUJS8Z) for reporting! 
+- Fixed alerts `-group-by` empty value - Thank you [@Dimosthenis Schizas](https://elementary-community.slack.com/team/U054B4PNACE) for contributing 🤩 +- Owners accept dict format - Thank you [@Stephen Lloyd](https://elementary-community.slack.com/team/U03FQELBBV1) for reporting and [@Manul Patel](https://elementary-community.slack.com/team/U054N92MU11) for contributing 🤩 +- Paginate upload of source freshness data for large results - Thank you [@Fred](https://elementary-community.slack.com/team/U03QXQ3VCF8) for reporting and fixing 🤩 +- Thank you [@winzee](https://github.com/winzee) and [@vinooganesh](https://github.com/vinooganesh) for helping keep our docs accurate and typos free 🎉e Melhuish](https://elementary-community.slack.com/team/U04KWBDTP4J)! \ No newline at end of file diff --git a/docs/release-notes/releases/0.8.0.mdx b/docs/release-notes/releases/0.8.0.mdx new file mode 100644 index 000000000..e80b19e18 --- /dev/null +++ b/docs/release-notes/releases/0.8.0.mdx @@ -0,0 +1,32 @@ +--- +title: "Elementary 0.8.0" +sidebarTitle: "0.8.0" +--- + +_June 1, 2023: [v0.8.0 Python](https://github.com/elementary-data/elementary/releases/tag/v0.8.0), [v0.8.0 dbt package](https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.8.0)_ +_As this is a minor version bump, you need to run `dbt run -s elementary`_ + +### 🔥 What's new? + +- **🆕 Jobs info from Orchestrator 🆕** + - Elementary now supports collecting metadata about your jobs from your orchestration tool! + - The goal is to provide context that is useful to triage and resolve data issues: + - As a first step, you could filter the lineage by job in the Elementary report. + - More orchestrator related features are coming soon 😎 + - Here is the [guide for enabling jobs info collection.](https://docs.elementary-data.com/deployment-and-configuration/collect-job-data). + +- **You can now configure all test params in the project / model / test level 🤯** + - Why is it useful? 
+ - It enables you to tailor the tests to the dataset and get higher level of accuracy! + - You can leverage inheritance, configure at a higher level (like folder of models) and save the need to configure by test. + - Some examples: + - You can set `days_back: 90` to tests with `time_bucket: period: week`, and `days_back: 7` to tests with `time_bucket: period: hour` . + - You can set `timestamp_column: updated_at` in your dbt_project.yml if this is your convention, and override it for models where it's different. + - You can set `seasonality`, `time_bucket` and `timestamp_column` at the source level, and it will apply for all the tests you add to tables of this source. + - We also upgraded our documentation of the [tests configuration](https://docs.elementary-data.com/guides/elementary-tests-configuration) and [how the tests work](https://docs.elementary-data.com/guides/how-anomaly-detection-works), to make it clearer 😇 + + +### 💫 More changes + +- Added `materialization` field to models run results, thank you [@Aril Mavinkere](https://elementary-community.slack.com/team/U058SJFFTEU) for contributing! 🤩 +- Removed `env` from report summary. \ No newline at end of file From 409f5ab538d38c1ec663d4d6ab188f9bcbfbccd9 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 11:24:38 +0300 Subject: [PATCH 074/194] Update add-elementary-tests.mdx --- docs/guides/add-elementary-tests.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 4bac22a9f..3e0c1e4a6 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -18,7 +18,8 @@ The tests are configured and executed like any other tests in your project. 
### Table (model / source) tests - +#### Volume anomalies + ``` elementary.volume_anomalies ``` From 550731c7d47f1420a50acb139011f865cfc9aa14 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 11:25:24 +0300 Subject: [PATCH 075/194] Update add-elementary-tests.mdx --- docs/guides/add-elementary-tests.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 3e0c1e4a6..33ebc5725 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -18,7 +18,7 @@ The tests are configured and executed like any other tests in your project. ### Table (model / source) tests -#### Volume anomalies +#### Volume anomalies ``` elementary.volume_anomalies From 1ca9cd2c347f392b127c6b410c6646a773e04fc3 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 11:26:08 +0300 Subject: [PATCH 076/194] Update add-elementary-tests.mdx --- docs/guides/add-elementary-tests.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 33ebc5725..e033509ed 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -18,7 +18,7 @@ The tests are configured and executed like any other tests in your project. 
### Table (model / source) tests -#### Volume anomalies +#### - Volume anomalies ``` elementary.volume_anomalies From 8f0452cb51e891765c63e0c2cb7cd6c696a3ed84 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 11:28:08 +0300 Subject: [PATCH 077/194] Update add-elementary-tests.mdx --- docs/guides/add-elementary-tests.mdx | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index e033509ed..0d76fab76 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -18,6 +18,7 @@ The tests are configured and executed like any other tests in your project. ### Table (model / source) tests + #### - Volume anomalies ``` @@ -26,8 +27,8 @@ The tests are configured and executed like any other tests in your project. Monitors the row count of your table over time per time bucket (if configured without `timestamp_column`, will count table total rows). - - +#### - Freshness anomalies + ``` elementary.freshness_anomalies ``` @@ -35,7 +36,8 @@ The tests are configured and executed like any other tests in your project. Requires a [`timestamp_column`](/guides/anomaly-detection-configuration/timestamp-column) configuration. - +#### - Event freshness anomalies + ``` elementary.event_freshness_anomalies ``` @@ -44,7 +46,8 @@ The tests are configured and executed like any other tests in your project. database (the `update timestamp`). Configuring `event_timestamp_column` is required, and `update_timestamp_column` is optional. - +#### - Dimension anomalies + ``` elementary.dimension_anomalies ``` @@ -53,7 +56,8 @@ The tests are configured and executed like any other tests in your project. The test counts rows grouped by given `dimensions` (columns/expressions). - +#### - All columns anomalies + ``` elementary.all_columns_anomalies ``` @@ -65,7 +69,9 @@ The tests are configured and executed like any other tests in your project. 
### Column tests - + +#### - Columns anomalies + ``` elementary.column_anomalies ``` @@ -75,7 +81,7 @@ The tests are configured and executed like any other tests in your project. -#### Adding tests examples: +### Adding tests examples From 3247847f2f79f39492aebce9721ee32f4c7a7c97 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 11:28:42 +0300 Subject: [PATCH 078/194] Update add-elementary-tests.mdx --- docs/guides/add-elementary-tests.mdx | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 0d76fab76..5450eca19 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -14,8 +14,6 @@ The tests are configured and executed like any other tests in your project. Demo -## Available anomaly detection tests - ### Table (model / source) tests From 2365e27f7ebb157bd6b68e3c01e390d1e012570e Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 18:02:04 +0300 Subject: [PATCH 079/194] Update where-expression.mdx --- .../anomaly-detection-configuration/where-expression.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/guides/anomaly-detection-configuration/where-expression.mdx b/docs/guides/anomaly-detection-configuration/where-expression.mdx index 7413e4482..8a70b8995 100644 --- a/docs/guides/anomaly-detection-configuration/where-expression.mdx +++ b/docs/guides/anomaly-detection-configuration/where-expression.mdx @@ -28,10 +28,10 @@ models: where_expression: "loaded_at is not null" ``` -```yml dbt_project.yml +```yml dbt_project.yml vars: - timestamp_column: "loaded_at > '2022-01-01'" + where_expression: "loaded_at > '2022-01-01'" ``` - \ No newline at end of file + From 4c30fac71077e96e6f06b2e3a2684bb03c0c1db2 Mon Sep 17 00:00:00 2001 From: Alex Alves Date: Wed, 14 Jun 2023 10:05:17 +0200 Subject: [PATCH 080/194] Update how-anomaly-detection-works.mdx Link to guides/data-anomaly-detection was not 
working --- docs/guides/how-anomaly-detection-works.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/how-anomaly-detection-works.mdx b/docs/guides/how-anomaly-detection-works.mdx index eff6bc2be..aa492a2de 100644 --- a/docs/guides/how-anomaly-detection-works.mdx +++ b/docs/guides/how-anomaly-detection-works.mdx @@ -54,7 +54,7 @@ To calculate how data changes over time and detect issues, we split the data int For example, if we use daily time bucket and monitor for row count anomalies, we will count new rows per day. ### Detection algorithm -Read about it in [data anomaly detection](/guides/data_anomaly_detection). +Read about it in [data anomaly detection](/guides/data-anomaly-detection). @@ -85,4 +85,4 @@ Configuration params related directly to the test's core concepts: **Monitored data set** - [where_expression](/guides/anomaly-detection-configuration/where-expression) -- [dimensions](/guides/anomaly-detection-configuration/dimensions) \ No newline at end of file +- [dimensions](/guides/anomaly-detection-configuration/dimensions) From 22bf1081ab32080dd39958b51b670d3b6dbdaf03 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 14 Jun 2023 15:51:51 +0300 Subject: [PATCH 081/194] new release notes --- docs/guides/alerts-configuration.mdx | 489 ++++++++++++++++++++ docs/guides/data-anomaly-detection.mdx | 3 +- docs/guides/how-anomaly-detection-works.mdx | 2 +- docs/mint.json | 8 +- docs/quickstart/send-slack-alerts.mdx | 450 ++---------------- 5 files changed, 527 insertions(+), 425 deletions(-) create mode 100644 docs/guides/alerts-configuration.mdx diff --git a/docs/guides/alerts-configuration.mdx b/docs/guides/alerts-configuration.mdx new file mode 100644 index 000000000..529ebdba5 --- /dev/null +++ b/docs/guides/alerts-configuration.mdx @@ -0,0 +1,489 @@ +--- +title: "Alerts Configuration and Customization" +sidebarTitle: "Alerts configuration" +--- + +You can enrich your alerts by adding properties to tests and models in your 
`.yml` files. +The supported attributes are: description, tags, owner, subscribers. + +You can configure and customize your alerts by configuring: +custom channel, alert fields, alert grouping, alert filters, suppression interval. + + +## Alert properties in `.yml` files + +Elementary prioritizes configuration in the following order: + +**For models / sources:** +1. Model config block. +2. Model properties. +3. Model path configuration under `models` key in `dbt_project.yml`. + +**For tests:** +1. Test properties. +2. Tests path configuration under `tests` key in `dbt_project.yml`. +3. Parent model configuration. + +
+ 
+  meta:
+       owner: "@jessica.jones"
+       subscribers: ["@jessica.jones", "@joe.joseph"]
+       tags: ["#marketing", "#data_ops"]
+       channel: data_ops
+       description: "This is the test description"
+       alert_suppression_interval: 24
+       alert_fields: ["description", "owners", "tags", "subscribers"]
+       slack_group_alerts_by: table
+ 
+
+ + +### Alert content + +#### Owner + +Elementary enriches alerts with [owners for models or tests](https://docs.getdbt.com/reference/resource-configs/meta#designate-a-model-owner). +- If you want the owner to be tagged on Slack, use '@' and the email prefix of the Slack user (@jessica.jones to tag jessica.jones@marvel.com). +- You can configure a single owner or a list of owners (`["@jessica.jones", "@joe.joseph"]`). + + + +```yml model +models: + - name: my_model_name + meta: + owner: "@jessica.jones" +``` + +```yml test +tests: + - not_null: + meta: + owner: ["@jessica.jones", "@joe.joseph"] +``` + +```yml test/model config block +{{ config( + tags=["Tag1","Tag2"], + meta={ + "description": "This is a description", + "owner": "@jessica.jones" + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +owner: "@jessica.jones" + +tests: + path: + subfolder: + +meta: + +owner: "@jessica.jones" +``` + + + +#### Subscribers + +If you want additional users besides the owner to be tagged on an alert, add them as subscribers. +- If you want the subscriber to be tagged on Slack, use '@' and the email prefix of the Slack user (@jessica.jones to tag jessica.jones@marvel.com). +- You can configure a single subscriber or a list (`["@jessica.jones", "@joe.joseph"]`). + + + +```yml model +models: + - name: my_model_name + meta: + subscribers: "@jessica.jones" +``` + +```yml test +tests: + - not_null: + meta: + subscribers: ["@jessica.jones", "@joe.joseph"] +``` + +```yml test/model config block +{{ config( + meta={ + "subscribers": "@jessica.jones" + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +subscribers: "@jessica.jones" + +tests: + path: + subfolder: + +meta: + +subscribers: "@jessica.jones" +``` + + + + +#### Test description + +Elementary supports configuring description for tests that are included in alerts. 
+It's recommended to explain what a failure of this test means, so the alert includes this context. + + + +```yml test +tests: + - not_null: + meta: + description: "This is the test description" +``` + +```yml test config block +{{ config( + tags=["Tag1","Tag2"], + meta={ + "description": "This is the test description" + } +) }} +``` + +```yml dbt_project.yml +tests: + path: + subfolder: + +meta: + +description: "This is the test description" +``` + + + +#### Tags + +You can use [tags](https://docs.getdbt.com/reference/resource-configs/tags) to provide context to your alerts. + +- You can tag a group or a channel in a Slack alert by adding `#channel_name` as a tag. +- Tags are aggregated, so a test alert will include both the test and the parent model tags. + + + +```yml model +models: + - name: my_model_name + tags: ["#marketing", "#data_ops"] +``` + +```yml test +tests: + - not_null: + tags: ["#marketing", "#data_ops"] +``` + +```yml test/model config block +{{ config( + tags=["#marketing", "#data_ops"] +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +tags: ["#marketing", "#data_ops"] + +tests: + path: + subfolder: + +tags: ["#marketing", "#data_ops"] +``` + + + + +### Alerts distribution + +Elementary allows you to customize alerts to distribute the right information to the right people. +This way you can ensure your alerts are valuable and avoid alert fatigue. + +#### Custom channel + +Elementary supports configuring custom Slack channels for models and tests. +By default, Elementary uses the Slack channel that was configured in the Slack integration. 
+ + +```yml model +models: + - name: my_model_name + meta: + channel: data_ops +``` + +```yml test +tests: + - not_null: + meta: + channel: data_ops +``` + +```yml test/model config block +{{ config( + meta={ + "channel": "data_ops" + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +channel: data_ops + +tests: + path: + subfolder: + +meta: + +channel: data_ops +``` + + + +#### Suppression interval + +Don’t want to get multiple alerts if the same test keeps failing? +You can now configure an `alert_suppression_interval`, which is a “snooze” period for alerts on the same issue. + +The accepted value is in hours, so a 1-day snooze is `alert_suppression_interval: 24`. +Elementary won't send new alerts on the same issue that are generated within the suppression interval. + + + +```yml model +models: + - name: my_model_name + meta: + alert_suppression_interval: 24 +``` + +```yml test +tests: + - not_null: + meta: + alert_suppression_interval: 12 +``` + +```yml test/model config block +{{ config( + meta={ + "alert_suppression_interval": 24 + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +alert_suppression_interval: 24 + +tests: + path: + subfolder: + +meta: + +alert_suppression_interval: 48 +``` + + + +#### Group alerts by table + +By default, Elementary sends a single alert to notify on each failure with extensive information for fast triage. + +Elementary also supports grouping alerts by table. +In this case, a single Slack notification will be generated containing all issues associated with this table. +The created notification will contain a union of the relevant owners, tags and subscribers. + +Due to their nature, grouped alerts will contain less information on each issue. 
+ + + + +```yml model +models: + - name: my_model_name + meta: + slack_group_alerts_by: table +``` + +```yml test +tests: + - not_null: + meta: + slack_group_alerts_by: table +``` + +```yml test/model config block +{{ config( + meta={ + "slack_group_alerts_by": "table" + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +slack_group_alerts_by: table + +tests: + path: + subfolder: + +meta: + +slack_group_alerts_by: table +``` + + + + +#### Alert fields + + +**Currently this feature is supported only by test alerts!** + + +You can decide which fields to include in the alert, and create a format of alert that fits your use case and recipients. +By default, all the fields are included in the alert. + +Supported alert fields: + +- table: Displays the table name of the test +- column: Displays the column name of the test +- description: Displays the description of the test +- owners: Displays the owners of the model on which the test is running +- tags: Displays the dbt tags of the test/model +- subscribers: Displays the subscribers of the test/model +- result_message: Displays the returned message from the test result +- test_parameters: Displays the parameters that were provided to the test +- test_query: Displays the query of the test +- test_results_sample: Displays a sample of the test results + + + +```yml model +models: + - name: my_model_name + meta: + alert_fields: ["description", "owners", "tags", "subscribers"] +``` + +```yml test +tests: + - not_null: + meta: + alert_fields: ["description", "owners", "tags", "subscribers"] +``` + +```yml test/model config block +{{ config( + meta={ + "alert_fields": "['description', 'owners', 'tags', 'subscribers']" + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +alert_fields: ["description", "owners", "tags", "subscribers"] + +tests: + path: + subfolder: + +meta: + +alert_fields: ["description", "owners", "tags", "subscribers"] +``` + + + +## Alerts global 
configuration + +#### Enable/disable alerts + +You can choose to enable / disable alert types by adding a var to your `dbt_project.yml`. + +Below are the available vars and their default config: + +```yml dbt_project.yml +vars: + disable_model_alerts: false + disable_test_alerts: false + disable_warn_alerts: false + disable_skipped_model_alerts: true + disable_skipped_test_alerts: true +``` + +## Alerts CLI flags + +#### Filter alerts + +Elementary supports filtering alerts using a selector, and sending only the selected alerts. +You can filter the alerts by tag, owner or model. + +If you run `edr` from the dbt project directory (or pass `--project-dir`), you can use any of the dbt selectors. + + + +```shell tag filter +edr monitor --select tag:critical +edr monitor --select tag:finance +``` + +```shell owner filter +edr monitor --select config.meta.owner:@jeff +edr monitor --select config.meta.owner:@jessy +``` + +```shell model filter +edr monitor --select model:customers +edr monitor --select model:orders + +edr monitor --select customers +edr monitor --select orders +``` + + + + +#### Group alerts by table + +By default, Elementary sends a single alert to notify on each failure with extensive information for fast triage. + +Elementary also supports grouping alerts by table. +In this case, a single Slack notification will be generated containing all issues associated with this table. +The created notification will contain a union of the relevant owners, tags and subscribers. + +Due to their nature, grouped alerts will contain less information on each issue. 
+ +```shell +edr monitor --group-by table +``` + diff --git a/docs/guides/data-anomaly-detection.mdx b/docs/guides/data-anomaly-detection.mdx index 796c31e95..795996025 100644 --- a/docs/guides/data-anomaly-detection.mdx +++ b/docs/guides/data-anomaly-detection.mdx @@ -1,5 +1,6 @@ --- -title: "Data anomaly detection" +title: "Data anomaly detection method" +sidebarTitle: "Detection method" --- Elementary uses "[standard score](https://en.wikipedia.org/wiki/Standard_score)", also known as "Z-score" for anomaly detection. This score represents the number of standard deviations of a value from the average of a set of values. diff --git a/docs/guides/how-anomaly-detection-works.mdx b/docs/guides/how-anomaly-detection-works.mdx index aa492a2de..04b16330b 100644 --- a/docs/guides/how-anomaly-detection-works.mdx +++ b/docs/guides/how-anomaly-detection-works.mdx @@ -1,6 +1,6 @@ --- title: "Elementary anomaly detection tests" -sidebarTitle: "Core concepts" +sidebarTitle: "Data anomaly detection" --- Elementary dbt package includes **anomaly detection tests, implemented as [dbt tests](https://docs.getdbt.com/docs/building-a-dbt-project/tests)**. 
+ diff --git a/docs/mint.json b/docs/mint.json index a42bd58d0..1171f3e4e 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -95,7 +95,13 @@ "guides/share-observability-report/send-report-summary" ] }, - "quickstart/send-slack-alerts", + { + "group": "Send Slack alerts", + "pages": [ + "quickstart/send-slack-alerts", + "guides/alerts-configuration" + ] + }, "guides/add-elementary-tests", "guides/add-schema-tests", "guides/python-tests" diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index 436d5b935..e0226e07f 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -1,50 +1,45 @@ --- -title: "Send Slack alerts" +title: "Setup Slack alerts" --- -Elementary has a Slack integration to send alerts about failures of dbt tests, Elementary tests, model runs, and source freshness. +Elementary has a Slack integration to send alerts about: +- Failures and/or warnings of dbt tests +- Failures and/or warnings of Elementary tests +- Model run failures +- Source freshness issues -You can customize the alerts in your `.yml` files by configuring: +You can enrich your alerts by adding properties to tests and models in your `.yml` files. +The supported attributes are: description, tags, owner, subscribers. -- **Description** -- **Tags** -- **Owner** -- **Subscribers** -- **Custom channel** -- **Alert fields** -- **Alert filters** -- **Alert grouping** -- **Suppression interval** +You can configure and customize your alerts by configuring: +custom channel, alert fields, alert grouping, alert filters, suppression interval. -New Slack alert format + +
+ New Slack alert format +
+ -## Before you start -Before you can start using the alerts, make sure to install the dbt package, configure a profile and install the CLI. -This is **required for the alerts to work.** - - - - - - - - +## Setup Slack Integration - + - +**Before you start** - +Before you can start using the alerts, make sure to install the dbt package, configure a profile and install the CLI. +This is **required for the alerts to work.** - +1. A working Python installation +2. [pip installer](https://pip.pypa.io/en/stable/) for Python +3. Access and credentials to a data warehouse supported by Elementary - +We also recommend you work with a [Python virtual environment](https://docs.python.org/3/library/venv.html). -## Setup Slack Integration + @@ -61,395 +56,6 @@ Or just `edr monitor` if you used `config.yml`. --- -## Enable/disable alerts - -By default, alerts are sent on failed tests, errored models and errored snapshots. -You can choose to enable / disable alert types by adding a var to your `dbt_project.yml`. - -Below are the available vars and their default config: - -```yml dbt_project.yml -vars: - # Alerts configuration vars # - # All set to false by default # - disable_model_alerts: false - disable_test_alerts: false - disable_warn_alerts: false - disable_skipped_model_alerts: true - disable_skipped_test_alerts: true -``` - - -## Alert properties - -In your `.yml` files, add the following properties to models / tests: - - - - - -Elementary enriches alerts with [table owners](https://docs.getdbt.com/reference/resource-configs/meta#designate-a-model-owner)). - -If you want to tag a model owner in a slack alert: -- Use '@' and the email prefix of the slack user. 
-- For example, if we want to tag a user named Jessica with an email jessica.jones@marvel.com in our Slack workspace, simply add the email prefix (with lower case) jessica.jones as follows to your model schema.yml / properties.yml: - -```yml properties.yml -models: - - name: my_model_name - meta: - owner: "@jessica.jones" -``` - -It is possible to tag multiple owners as well: - -```yml properties.yml -models: - - name: my_model_name - meta: - owner: ["@jessica.jones", "@joe.joseph"] -``` - - - - - -Elementary supports configuring description to tests alerts. - -To set it up, simply add the description to your test in the `properties.yml` - -```yml properties.yml -tests: - - test_name: - meta: - description: "This is the test description" -``` - - - - - - -You can use [tags](https://docs.getdbt.com/reference/resource-configs/tags) to provide context to your alerts. - -You can also use it to tag a group or a channel in a slack alert: - -- Add it as model tag and use '#' as the prefix of the channel name. -- For example, to tag the marketing team's data ops channel add the following to your `model schema.yml` - / `properties.yml`. - -```yml properties.yml -tests: - - test_name: - meta: - tags: ["#marketing", "#support"] -``` - - - - - - If you want to tag users on an alert: -- Use '@' and the email prefix of your slack user, and to 'subscribers' under a meta field to your `properties.yml` file. -- For example, if we want to tag a user named Jessica with an email jessica.jones@marvel.com in our Slack workspace, use "@jessica.jones". 
- -```yml properties.yml -models: - - name: my_model_name - meta: - alerts_config: - subscribers: "@jessica.jones" - columns: - - name: column_name - tests: - - unique: - meta: - alerts_config: - subscribers: "@luke.cage" -``` - -It is possible to tag multiple subscribers as well: - -```yml properties.yml -models: - - name: my_model_name - meta: - alerts_config: - subscribers: ["@jessica.jones", "@luke.cage"] -``` - - - - - - -## Alert configuration - -Elementary allows you to customize alerts to distribute the right information to the right people. This way you can ensure your alerts are valuable and to avoid alert fatigue. - - - - - - -By default Elementary uses the Slack channel that was configured in the Slack integration. -Elementary supports configuring custom slack channels that are configured on your models / sources / tests and snapshots. - -- If you configure a custom slack channel for a model, all the test alerts that belong to this model will be sent to this custom slack channel. -- If you configure a custom slack channel for both a model and a test, the test channel will override the model channel. 
-- If you configure a custom slack channel and you decide to group your alerts by table into a single message, it will be sent to the model channel (even if a differnt channel was configured on the test level) - - -To set it up, simply add the relevant channel to your models in the `properties.yml`: - -```yml properties.yml -models: - - name: marketing_leads - meta: - alerts_config: - channel: marketing_data_ops -``` - -If your models / tests are in folders by department / team, another useful option is to configure the channel in -your `dbt_project.yml` file: - -```yml dbt_project.yml -models: - marketing_bi: - +meta: - alerts_config: - channel: marketing_data_ops - -tests: - marketing_bi: - +meta: - alerts_config: - channel: marketing_data_ops -``` - -You can also configure a custom slack channel for a specific test: - -```yml properties.yml -models: - - name: marketing_leads - columns: - - name: column_name - tests: - - unique: - meta: - alerts_config: - channel: marketing_data_ops -``` - - - - - -**Currently this feature is supported only by test alerts!** - - -Elementary supports the following alert fields: - -- table: Displays the table name of the test -- column: Displays the column name of the test -- description: Displays the description of the test -- owners: Displays the owners of the model on which the test is running -- tags: Displays the dbt tags of the test/model -- subscribers: Displays the subscribers of the test/model -- result_message: Displays the returned message from the test result -- test_parameters: Displays the parameters that were provided to the test -- test_query: Displays the query of the test -- test_results_sample: Displays a sample of the test results - -By default, all of the fields are shown in the alerts. -Elementary supports configuring alert fields on your dbt project / models and tests. -- If you configure alert fields on your dbt project, all the test alerts of all of your tests will display only the configured alert fields. 
-- If you configure alert fields for a model, all the test alerts that belong to this model will display only the configured alert fields. -- If you configure alert fields for both a model and a test, the test configured alert fields will override the model configured alert fields (same as for the dbt project configured alert fields). - -To set it up globaly for your project, add the desired alert fields to your models and tests in the `dbt_project.yml` file: - -```yml dbt_project.yml -models: - marketing_leads: - +meta: - alerts_config: - alert_fields: ["description", "owners", "tags", "subscribers"] - -tests: - marketing_leads: - +meta: - alerts_config: - alert_fields: ["description", "owners", "tags", "subscribers"] -``` - -To set it up for a model, add the desired alert fields to your model in the properties.yml: - -```yml properties.yml -models: - - name: marketing_leads - meta: - alerts_config: - alert_fields: ["description", "owners", "tags", "subscribers"] -``` - -You can also configure alert fields for a specific test: - -```yml properties.yml -models: - - name: marketing_leads - columns: - - name: column_name - tests: - - unique: - meta: - alerts_config: - alert_fields: ["description", "owners", "tags", "subscribers"] -``` - - - - - - - -Elementary supports filtering alerts using a selector. -Elementry `edr monitor` command will notify only on the selector's matched alerts. - -There are 3 selectors supported by elementary: - -- tag - Notify on models/sources/tests that are tagged with the provided tag selector (notice that tests can be matched on their model's/source's tag). -- owner - Notify on models/sources/tests that their owner is provided owner selector (notice that tests can be matched on their model's/source's owner). -- model - Notify on the model/source and its tests. 
- -To filter alerts by tag: - -```shell -edr monitor --select tag:critical -edr monitor --select tag:finance -``` - -To filter alerts by owner: - -```shell -edr monitor --select config.meta.owner:@jeff -edr monitor --select config.meta.owner:@jessy -``` - -To filter alerts by model: - -```shell -edr monitor --select model:customers -edr monitor --select model:orders - -edr monitor --select customers -edr monitor --select orders -``` - - - - - -Elementary support configuring suppression interval for alerts. -By default, the suppression interval for all of the alerts is set to 0. -Elementary won't send any alert that is generated within suppression interval. - -`alert_suppression_interval` can accept values greater than 0, including unrounded numbers - this number represents the number of hours for which alerts will be skipped. - -To set it up globaly for your project, add the alert suppression interval to your models and tests in the `dbt_project.yml` file: - -```yml dbt_project.yml -models: - marketing_leads: - +meta: - alerts_config: - alert_suppression_interval: 24 - - -tests: - marketing_leads: - +meta: - alerts_config: - alert_suppression_interval: 24 -``` - -To set it up for a model, add the desired alert suppression interval to your model in the properties.yml: - -```yml properties.yml -models: - - name: marketing_leads - meta: - alerts_config: - alert_suppression_interval: 24 -``` - -You can also configure alert suppression interval for a specific test: - -```yml properties.yml -models: - - name: marketing_leads - columns: - - name: column_name - tests: - - unique: - meta: - alerts_config: - alert_suppression_interval: 24 -``` - - - - -By default, Elementary sends a single alert to notify on each failure. When using single alerts, the alert will include extensive information for fast triage. - -Elementary also supports grouping alerts by table. 
In this case, a single Slack notification will be generated containing all test warnings/failures/errors as well as the errors associated with the model. The created notification will contain a union of the relevant owners, tags and subscribers. Due to their nature, grouped alerts will contain less information on each issue. As always, you can use our ([detailed report](/quickstart/generate-report-ui)) for easy triage. - -To group alerts by table: - -```shell -edr monitor --group-by table -``` - -Grouping can also be configured through the yml files. To set it up globaly for your project, add the configuration to your models in the dbt_project.yml file: - - ```yml dbt_project.yml -models: - marketing_bi: - +meta: - alerts_config: - # alerts on models in marekting_bi should be grouped by table: - slack_group_alerts_by: table - -``` - -To set it up for a model, add the configuration to your model in the properties.yml: - - ```yml properties.yml -models: - - name: marketing_leads - meta: - alerts_config: - # all alerts on marketing_leads should group together to one slack message: - slack_group_alerts_by: table -``` - -Grouping by table can be configured globally (in the dbt_project.yml) but if you wish to override it for a specific model where you want a single alert for each failure, you can add the configuration to your model in the properties.yml: - - ```yml properties.yml -models: - - name: marketing_leads - meta: - alerts_config: - # alerts on marketing_leads will not be grouped: - slack_group_alerts_by: alert -``` - - - - - - - ## Alert on source freshness failures _Not supported in dbt cloud_ From 03b3d2e4ae174cc670eba054d808b9b5d3c2f9ff Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 14 Jun 2023 15:53:58 +0300 Subject: [PATCH 082/194] new release notes --- docs/guides/alerts-configuration.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/alerts-configuration.mdx b/docs/guides/alerts-configuration.mdx index 
529ebdba5..8cc8f8783 100644 --- a/docs/guides/alerts-configuration.mdx +++ b/docs/guides/alerts-configuration.mdx @@ -26,7 +26,7 @@ Elementary prioritizes configuration in the following order:
  
-  meta:
+        meta:
        owner: "@jessica.jones"
        subscribers: ["@jessica.jones", "@joe.joseph"]
        tags: ["#marketing", "#data_ops"]

From b2492b436a0119c55babb85f5379a02036debe7c Mon Sep 17 00:00:00 2001
From: Maayan-s 
Date: Wed, 14 Jun 2023 15:58:26 +0300
Subject: [PATCH 083/194] new release notes

---
 docs/guides/alerts-configuration.mdx  | 12 ++++++------
 docs/quickstart/send-slack-alerts.mdx |  8 ++++----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/guides/alerts-configuration.mdx b/docs/guides/alerts-configuration.mdx
index 8cc8f8783..9930ccd47 100644
--- a/docs/guides/alerts-configuration.mdx
+++ b/docs/guides/alerts-configuration.mdx
@@ -4,10 +4,10 @@ sidebarTitle: "Alerts configuration"
 ---
 
 You can enrich your alerts by adding properties to tests and models in your `.yml` files.
-The supported attributes are: description, tags, owner, subscribers.
+The supported attributes are: [owner](/guides/alerts-configuration#owner), [subscribers](/guides/alerts-configuration#subscribers), [description](/guides/alerts-configuration#test-description), [tags](/guides/alerts-configuration#tags).
 
 You can configure and customize your alerts by configuring:
-custom channel, alert fields, alert grouping, alert filters, suppression interval.
+[custom channel](/guides/alerts-configuration#custom-channel), [suppression interval](/guides/alerts-configuration#suppression_interval), [alert fields](/guides/alerts-configuration#alert_fields), [alert grouping](/guides/alerts-configuration#group-alerts-by-table), [alert filters](/guides/alerts-configuration#filter-alerts).
 
 
 ## Alert properties in `.yml` files
@@ -29,12 +29,12 @@ Elementary prioritizes configuration in the following order:
         meta:
        owner: "@jessica.jones"
        subscribers: ["@jessica.jones", "@joe.joseph"]
+       description: "This is the test description"
        tags: ["#marketing", "#data_ops"]
-       channel: data_ops
-       description: "This is the test description"
-       alert_suppression_interval: 24
+       channel: data_ops
+       alert_suppression_interval: 24
+       slack_group_alerts_by: table
        alert_fields: ["description", "owners", "tags", "subscribers"]
-       slack_group_alerts_by: table
  
 
diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index e0226e07f..6a1518d42 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -8,11 +8,11 @@ Elementary has a Slack integration to send alerts about: - Model runs failures - Source freshness issues -You can enrich your alerts by adding properties to tests and models in your `.yml` files. -The supported attributes are: description, tags, owner, subscribers. +You can enrich your alerts by adding properties to tests and models in your `.yml` files. +The supported attributes are: [owner](/guides/alerts-configuration#owner), [subscribers](/guides/alerts-configuration#subscribers), [description](/guides/alerts-configuration#test-description), [tags](/guides/alerts-configuration#tags). -You can configure and customize your alerts by configuring: -custom channel, alert fields, alert grouping, alert filters, suppression interval. +You can configure and customize your alerts by configuring: +[custom channel](/guides/alerts-configuration#custom-channel), [suppression interval](/guides/alerts-configuration#suppression_interval), [alert fields](/guides/alerts-configuration#alert_fields), [alert grouping](/guides/alerts-configuration#group-alerts-by-table), [alert filters](/guides/alerts-configuration#filter-alerts).
From 6fc40773cb1cf7e538674ac5978a3c760a73e351 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 14 Jun 2023 15:59:38 +0300 Subject: [PATCH 084/194] new release notes --- docs/guides/alerts-configuration.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/alerts-configuration.mdx b/docs/guides/alerts-configuration.mdx index 9930ccd47..9ff47dc1d 100644 --- a/docs/guides/alerts-configuration.mdx +++ b/docs/guides/alerts-configuration.mdx @@ -26,7 +26,7 @@ Elementary prioritizes configuration in the following order:
  
-        meta:
+     meta:
        owner: "@jessica.jones"
        subscribers: ["@jessica.jones", "@joe.joseph"]
        description: "This is the test description"

From 773b1f49ff86cacb537bc6e32c370f8fc1c4275b Mon Sep 17 00:00:00 2001
From: Maayan-s 
Date: Wed, 14 Jun 2023 16:00:14 +0300
Subject: [PATCH 085/194] new release notes

---
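For reference, a minimal sketch of how the re-indented block might sit in a model's properties file. The keys and values under `meta` are copied from the added lines of this patch; the surrounding `models:` entry, the model name `marketing_leads`, and attaching the block directly to a model are assumptions for illustration only, not something this patch states.

```yml properties.yml
models:
  # Hypothetical model entry; the name is an assumption for illustration.
  - name: marketing_leads
    meta:
      # Keys and values below are taken from the added lines of this patch.
      owner: "@jessica.jones"
      subscribers: ["@jessica.jones", "@joe.joseph"]
      description: "This is the test description"
      tags: ["#marketing", "#data_ops"]
      channel: data_ops
      alert_suppression_interval: 24
      slack_group_alerts_by: table
      alert_fields: ["description", "owners", "tags", "subscribers"]
```

Only the indentation of the block changes in this patch; the keys and their order stay as they were after the previous patch.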
 docs/guides/alerts-configuration.mdx | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/guides/alerts-configuration.mdx b/docs/guides/alerts-configuration.mdx
index 9ff47dc1d..18a454d00 100644
--- a/docs/guides/alerts-configuration.mdx
+++ b/docs/guides/alerts-configuration.mdx
@@ -26,15 +26,15 @@ Elementary prioritizes configuration in the following order:
 
 
  
-     meta:
-       owner: "@jessica.jones"
-       subscribers: ["@jessica.jones", "@joe.joseph"]
-       description: "This is the test description"
-       tags: ["#marketing", "#data_ops"]
-       channel: data_ops
-       alert_suppression_interval: 24
-       slack_group_alerts_by: table
-       alert_fields: ["description", "owners", "tags", "subscribers"]
+   meta:
+     owner: "@jessica.jones"
+     subscribers: ["@jessica.jones", "@joe.joseph"]
+     description: "This is the test description"
+     tags: ["#marketing", "#data_ops"]
+     channel: data_ops
+     alert_suppression_interval: 24
+     slack_group_alerts_by: table
+     alert_fields: ["description", "owners", "tags", "subscribers"]
  
 
From 093e3168bb1b5dc30c40cff658b05a96e05ff091 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 14 Jun 2023 17:13:39 +0300 Subject: [PATCH 086/194] new release notes --- docs/quickstart/send-slack-alerts.mdx | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index 6a1518d42..6d1a2c3f8 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -30,15 +30,9 @@ You can configure and customize your alerts by configuring: **Before you start** -Before you can start using the alerts, make sure to install the dbt package, configure a profile and install the CLI. +Before you can start using the alerts, make sure to [install the dbt package](/quickstart), [configure a profile and install the CLI](/quickstart-cli). This is **required for the alerts to work.** -1. A working Python installation -2. [pip installer](https://pip.pypa.io/en/stable/) for Python -3. Access and credentials to a data warehouse supported by Elementary - -We also recommend you work with a [Python virtual environment](https://docs.python.org/3/library/venv.html). 
- From cb263085ef274940d85e07f01a6d7d054434d2a0 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Wed, 14 Jun 2023 17:16:08 +0300 Subject: [PATCH 087/194] Update send-report-summary.mdx --- docs/guides/share-observability-report/send-report-summary.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/share-observability-report/send-report-summary.mdx b/docs/guides/share-observability-report/send-report-summary.mdx index da5fb59d2..c12be8f59 100644 --- a/docs/guides/share-observability-report/send-report-summary.mdx +++ b/docs/guides/share-observability-report/send-report-summary.mdx @@ -21,7 +21,7 @@ After you [set up a Slack app and token](/integrations/slack#slack-integration-s AWS S3: ```shell -edr send-report --aws-profile-name --s3-bucket-name --slack-token --slack-channel-name +edr send-report --aws-profile-name --s3-bucket-name --slack-token --slack-channel-name --update-bucket-website true ``` GCS: From fed3bf91cbb59671d8426a4941ceaa2141ef9667 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 14 Jun 2023 17:16:45 +0300 Subject: [PATCH 088/194] new release notes --- docs/quickstart/send-slack-alerts.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index 6d1a2c3f8..8ce5b8680 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -32,7 +32,7 @@ You can configure and customize your alerts by configuring: Before you can start using the alerts, make sure to [install the dbt package](/quickstart), [configure a profile and install the CLI](/quickstart-cli). This is **required for the alerts to work.** - +
From 0da61c6658980e2b22aa8d3b4f7dc14ce155b526 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Wed, 14 Jun 2023 22:00:18 +0300 Subject: [PATCH 089/194] Update send-slack-alerts.mdx --- docs/quickstart/send-slack-alerts.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index 8ce5b8680..7dbe7f713 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -32,7 +32,8 @@ You can configure and customize your alerts by configuring: Before you can start using the alerts, make sure to [install the dbt package](/quickstart), [configure a profile and install the CLI](/quickstart-cli). This is **required for the alerts to work.** -
+ + From cc3ad2c17abc3578f4f405ea86e32b91f225f67f Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Wed, 14 Jun 2023 22:23:50 +0300 Subject: [PATCH 090/194] Update send-slack-alerts.mdx --- docs/quickstart/send-slack-alerts.mdx | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index 7dbe7f713..c75a6f1e1 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -32,8 +32,7 @@ You can configure and customize your alerts by configuring: Before you can start using the alerts, make sure to [install the dbt package](/quickstart), [configure a profile and install the CLI](/quickstart-cli). This is **required for the alerts to work.** - - +
From 9518a41456a9cee5062ca71cd4d0daa929aa08e8 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 13:12:12 +0300 Subject: [PATCH 091/194] Update create-profile.mdx --- docs/cloud/onboarding/create-profile.mdx | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/cloud/onboarding/create-profile.mdx b/docs/cloud/onboarding/create-profile.mdx index 1d842baa2..b7a7cb6f6 100644 --- a/docs/cloud/onboarding/create-profile.mdx +++ b/docs/cloud/onboarding/create-profile.mdx @@ -43,7 +43,8 @@ elementary: ## User/password auth ## user: [username] password: [password] - + + port: 5439 role: [user role] database: [database name] warehouse: [warehouse name] @@ -129,4 +130,4 @@ elementary: ### What's next? 1. [Singup to Elementary cloud](/cloud/sonboarding/signup). -2. [Connect your Elementary schema to Elementary cloud](/cloud/onboarding/connect-data-warehouse). \ No newline at end of file +2. [Connect your Elementary schema to Elementary cloud](/cloud/onboarding/connect-data-warehouse). 
From 65f2ff4d951a7862442178ae136cff988028b111 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 13:12:30 +0300 Subject: [PATCH 092/194] Update redshift-profile.mdx --- docs/_snippets/profiles/redshift-profile.mdx | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/_snippets/profiles/redshift-profile.mdx b/docs/_snippets/profiles/redshift-profile.mdx index be90541e1..ccd08aecd 100644 --- a/docs/_snippets/profiles/redshift-profile.mdx +++ b/docs/_snippets/profiles/redshift-profile.mdx @@ -14,6 +14,7 @@ elementary: user: [username] password: [password] + port: 5439 dbname: [database name] schema: [schema name]_elementary threads: 4 From 912f221716ddb161c3b8526130c624fb6f27b2ca Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 15:34:58 +0300 Subject: [PATCH 093/194] Update security-and-privacy.mdx --- docs/cloud/general/security-and-privacy.mdx | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/cloud/general/security-and-privacy.mdx b/docs/cloud/general/security-and-privacy.mdx index 8eeecdfd3..6c1013112 100644 --- a/docs/cloud/general/security-and-privacy.mdx +++ b/docs/cloud/general/security-and-privacy.mdx @@ -45,9 +45,14 @@ To avoid this sampling, set the var `test_sample_rows_count: 0` in your `dbt_pro ## Compliance + + **SOC 2** + Elementary is currently in the process of obtaining SOC2 and ISO27001 compliance. + + [Contact us](mailto:legal@elementary-data.com) for auditing reports and penetration testing results. ## Have more questions? We would be happy to answer! -Reach out to us on [email](mailto:legal@elementary-data.com) or [Slack](https://join.slack.com/t/elementary-community/shared_invite/zt-1b9vogqmq-y~IRhc2396CbHNBXLsrXcA). \ No newline at end of file +Reach out to us on [email](mailto:legal@elementary-data.com) or [Slack](https://join.slack.com/t/elementary-community/shared_invite/zt-1b9vogqmq-y~IRhc2396CbHNBXLsrXcA). 
From 62885584cbb493dd8c43210ce643f9a0e193d9d8 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 15:35:38 +0300 Subject: [PATCH 094/194] Update security-and-privacy.mdx --- docs/cloud/general/security-and-privacy.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/cloud/general/security-and-privacy.mdx b/docs/cloud/general/security-and-privacy.mdx index 6c1013112..ff2ae1979 100644 --- a/docs/cloud/general/security-and-privacy.mdx +++ b/docs/cloud/general/security-and-privacy.mdx @@ -46,7 +46,8 @@ To avoid this sampling, set the var `test_sample_rows_count: 0` in your `dbt_pro ## Compliance - **SOC 2** + **SOC 2 certification** +
Elementary is currently in the process of obtaining SOC2 and ISO27001 compliance.
From c69078d69843c8571137b214d504d3d21ed13ea3 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 15:36:00 +0300 Subject: [PATCH 095/194] Update security-and-privacy.mdx --- docs/cloud/general/security-and-privacy.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/cloud/general/security-and-privacy.mdx b/docs/cloud/general/security-and-privacy.mdx index ff2ae1979..ff8e48bc1 100644 --- a/docs/cloud/general/security-and-privacy.mdx +++ b/docs/cloud/general/security-and-privacy.mdx @@ -47,7 +47,7 @@ To avoid this sampling, set the var `test_sample_rows_count: 0` in your `dbt_pro **SOC 2 certification** -
+ Elementary is currently in the process of obtaining SOC2 and ISO27001 compliance.
From 2deeb54fb5cda3daaaef1e709ab6ef02322ef34f Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 15:36:31 +0300 Subject: [PATCH 096/194] Update security-and-privacy.mdx --- docs/cloud/general/security-and-privacy.mdx | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/cloud/general/security-and-privacy.mdx b/docs/cloud/general/security-and-privacy.mdx index ff8e48bc1..646da0720 100644 --- a/docs/cloud/general/security-and-privacy.mdx +++ b/docs/cloud/general/security-and-privacy.mdx @@ -46,9 +46,7 @@ To avoid this sampling, set the var `test_sample_rows_count: 0` in your `dbt_pro ## Compliance - **SOC 2 certification** - - Elementary is currently in the process of obtaining SOC2 and ISO27001 compliance. + **SOC 2 certification:** Elementary is currently in the process of obtaining SOC2 and ISO27001 compliance. [Contact us](mailto:legal@elementary-data.com) for auditing reports and penetration testing results. From 8a22d94e31e36e9e1d4d3dd638bbc7180ecd3688 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:17:26 +0300 Subject: [PATCH 097/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 67 +++++++++++++++++++++++++++++++++++ docs/mint.json | 1 + 2 files changed, 68 insertions(+) create mode 100644 docs/dbt/on-run-end_hooks.mdx diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx new file mode 100644 index 000000000..019e1d6ca --- /dev/null +++ b/docs/dbt/on-run-end_hooks.mdx @@ -0,0 +1,67 @@ +--- +title: "Elementary dbt package on-run-end hooks" +sidebarTitle: "on-run-end hooks" +--- + +Elementary dbt package uses `on-run-end` [hooks](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end) to log results and metadata to tables in the Elementary schema. + +## What happens on the `on-run-end` hooks? + +On the `on-run-end` hooks Elementary extracts data from the dbt `results` and `graph` objects, and runs SQL queries to load this data to the Elementary models. 
+ +There are 2 types of models that Elementary updates : + +1. Metadata models - Such as `dbt_models`, `dbt_tests`, `dbt_sources`. +2. Result models - Such as `dbt_run_results`, `elementary_test_results`, `dbt_invocations`. + +#### Updates of metadata models + +These models store the current resources and configuration in your dbt projects (models, snapshots, sources, tests, etc.). +The metadata in the models only represents the project state on the latest run, so upon changes the metadata is replaced. +The `on-run-end` hook runs SQL queries with the new metadata and updates the relevant tables. + +#### Updates of result models +These models store a log of results of dbt invocations, and of the specific executed resources. +The `on-run-end` hook runs SQL queries with the run results and invocation details. + + +## What's the performance impact of `on-run-end` hooks? + +We give a lot of thought and effort to making Elementary efficient in both cost and performance. +We only run the hooks that are relevant to each run, and each hook creates a minimal amount of queries possible. + +**Metadata models** +For `dbt 1.4.0` and above, we maintain a metadata cache. +This means each of these models are only updated with changes in your project (new model, change in config, etc.). +For this reason, on the first time you execute Elementary the initial update might take a while, but the following updates should be quick. +The performance impact of this update depends on the frequency and volume of changes to your dbt project. + +If you are using `dbt 1.3.0` or lower, these models would be fully updated on each run. +The performance impact depends on the size of your dbt project. +You can also disable the metadata autoupload, and run the same update using the command `dbt run --select elementary.edr.dbt_artifacts`. + +**Result models** +The size of the queries depends on the amount of models/tests executed in the run. 
+The time the run results adds to the invocation shouldn't be significant. + + +## Can I disable the `on-run-end` hooks? + +Yes, but note that this may cause missing results and/or outdated metadata in Elementary report and alerts. + +**Disable metadata models updates** +Configure the following var: +```yaml dbt_project.yml +vars: + disable_dbt_artifacts_autoupload: true +``` +If you disable the artifacts autoupload, we recommend your run `dbt run --select elementary.edr.dbt_artifacts` every time you deploy changes to your project. + +**Disable result models updates** +Configure the following vars (you can also disable with conditions): +```yaml dbt_project.yml +vars: + disable_run_results: true + disable_tests_results: true + disable_dbt_invocation_autoupload: "{{ target.name != 'prod' }}" +``` diff --git a/docs/mint.json b/docs/mint.json index ca9e0c222..d859ce8b9 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -138,6 +138,7 @@ "pages": [ "understand-elementary/elementary-overview", "guides/modules-overview/dbt-package", + "dbt/on-run-end_hooks", "dbt/dbt-artifacts", "understand-elementary/elementary-report-ui", "understand-elementary/elementary-alerts" From 566a119d999aac9ee8efa4a1d38ca8a289325d4b Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:22:12 +0300 Subject: [PATCH 098/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index 019e1d6ca..e2633b002 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -11,8 +11,8 @@ On the `on-run-end` hooks Elementary extracts data from the dbt `results` and `g There are 2 types of models that Elementary updates : -1. Metadata models - Such as `dbt_models`, `dbt_tests`, `dbt_sources`. -2. Result models - Such as `dbt_run_results`, `elementary_test_results`, `dbt_invocations`. +**1. 
Metadata models** - Such as `dbt_models`, `dbt_tests`, `dbt_sources`. +**2. Result models** - Such as `dbt_run_results`, `elementary_test_results`, `dbt_invocations`. #### Updates of metadata models @@ -25,22 +25,21 @@ These models store a log of results of dbt invocations, and of the specific exec The `on-run-end` hook runs SQL queries with the run results and invocation details. -## What's the performance impact of `on-run-end` hooks? +## Performance impact of `on-run-end` hooks We give a lot of thought and effort to making Elementary efficient in both cost and performance. We only run the hooks that are relevant to each run, and each hook creates a minimal amount of queries possible. **Metadata models** -For `dbt 1.4.0` and above, we maintain a metadata cache. -This means each of these models are only updated with changes in your project (new model, change in config, etc.). -For this reason, on the first time you execute Elementary the initial update might take a while, but the following updates should be quick. -The performance impact of this update depends on the frequency and volume of changes to your dbt project. -If you are using `dbt 1.3.0` or lower, these models would be fully updated on each run. -The performance impact depends on the size of your dbt project. -You can also disable the metadata autoupload, and run the same update using the command `dbt run --select elementary.edr.dbt_artifacts`. +**For `dbt 1.4.0` and above**, we maintain a metadata cache. This means each of these models are only updated with changes in your project (new model, change in config, etc.). +The first time you execute Elementary the initial update might take a while, but the following updates should be quick. + +**If you are using `dbt 1.3.0`** or lower, these models would be fully updated on each run. +The performance impact depends on the size of your dbt project. **Result models** + The size of the queries depends on the amount of models/tests executed in the run. 
The time the run results adds to the invocation shouldn't be significant. From e128a0cbf139a0bb368a9d0b85a77eb91acdb3a4 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:24:01 +0300 Subject: [PATCH 099/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index e2633b002..093a4d689 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -30,15 +30,15 @@ The `on-run-end` hook runs SQL queries with the run results and invocation detai We give a lot of thought and effort to making Elementary efficient in both cost and performance. We only run the hooks that are relevant to each run, and each hook creates a minimal amount of queries possible. -**Metadata models** +#### Metadata models **For `dbt 1.4.0` and above**, we maintain a metadata cache. This means each of these models are only updated with changes in your project (new model, change in config, etc.). The first time you execute Elementary the initial update might take a while, but the following updates should be quick. -**If you are using `dbt 1.3.0`** or lower, these models would be fully updated on each run. +**For `dbt 1.3.0` and lower**, these models would be fully updated on each run. The performance impact depends on the size of your dbt project. -**Result models** +#### Result models The size of the queries depends on the amount of models/tests executed in the run. The time the run results adds to the invocation shouldn't be significant. @@ -48,7 +48,8 @@ The time the run results adds to the invocation shouldn't be significant. Yes, but note that this may cause missing results and/or outdated metadata in Elementary report and alerts. 
-**Disable metadata models updates** +#### Disable metadata models updates + Configure the following var: ```yaml dbt_project.yml vars: @@ -56,7 +57,8 @@ vars: ``` If you disable the artifacts autoupload, we recommend your run `dbt run --select elementary.edr.dbt_artifacts` every time you deploy changes to your project. -**Disable result models updates** +#### Disable result models updates + Configure the following vars (you can also disable with conditions): ```yaml dbt_project.yml vars: From c8ae9e3a315fb463dafb7ddecc9aafbce769a502 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:28:49 +0300 Subject: [PATCH 100/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index 093a4d689..e984c3c8c 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -11,8 +11,8 @@ On the `on-run-end` hooks Elementary extracts data from the dbt `results` and `g There are 2 types of models that Elementary updates : -**1. Metadata models** - Such as `dbt_models`, `dbt_tests`, `dbt_sources`. -**2. Result models** - Such as `dbt_run_results`, `elementary_test_results`, `dbt_invocations`. +1. **Metadata models** - Such as `dbt_models`, `dbt_tests`, `dbt_sources`. +2. **Result models** - Such as `dbt_run_results`, `elementary_test_results`, `dbt_invocations`. 
#### Updates of metadata models From 5f87b957c7a655f1afc8347baf9f7907566f1648 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:36:33 +0300 Subject: [PATCH 101/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index e984c3c8c..01f243a4b 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -5,6 +5,15 @@ sidebarTitle: "on-run-end hooks" Elementary dbt package uses `on-run-end` [hooks](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end) to log results and metadata to tables in the Elementary schema. +## Why Elementary uses `on-run-end` hooks? + +As a data observability solution, the completeness and freshness of the results Elementary collects is critical. + +By leveraging `on-run-end` hooks, we add a built-in collection of the latest results and metadata as part of your runs. +This means the results you see in Elementary report and the alerts you receive are full, up-to-date and accurate. + +We stringly recommend not to disable the hooks for environments you want to minitor using Elementary. + ## What happens on the `on-run-end` hooks? On the `on-run-end` hooks Elementary extracts data from the dbt `results` and `graph` objects, and runs SQL queries to load this data to the Elementary models. From f1ff72eac367cf7f6bf130b4c6d70351d696928d Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 16 May 2023 15:43:40 +0300 Subject: [PATCH 102/194] on-run-end hooks --- docs/dbt/on-run-end_hooks.mdx | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index 01f243a4b..9deb9223d 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -7,12 +7,13 @@ Elementary dbt package uses `on-run-end` [hooks](https://docs.getdbt.com/referen ## Why Elementary uses `on-run-end` hooks? 
-As a data observability solution, the completeness and freshness of the results Elementary collects is critical. +Elementary report and alerts are generated from the data in the Elementary schema. +The solution relies on the Elementary schema being up-to-date and complete to be able to provide reliable and accurate observability. By leveraging `on-run-end` hooks, we add a built-in collection of the latest results and metadata as part of your runs. This means the results you see in Elementary report and the alerts you receive are full, up-to-date and accurate. -We stringly recommend not to disable the hooks for environments you want to minitor using Elementary. +We strongly recommend not to disable the hooks for environments you want to monitor using Elementary. ## What happens on the `on-run-end` hooks? From 946210638162ff20bee84464b569744b758cd7db Mon Sep 17 00:00:00 2001 From: Elon Gliksberg Date: Wed, 17 May 2023 18:14:16 +0300 Subject: [PATCH 103/194] Fixed typos. --- docs/cloud/manage-team.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/cloud/manage-team.mdx b/docs/cloud/manage-team.mdx index d4d5162c7..6c999433a 100644 --- a/docs/cloud/manage-team.mdx +++ b/docs/cloud/manage-team.mdx @@ -7,7 +7,7 @@ sidebarTitle: "Team settings" After you signup, you could invite team members to join you! 🎉 -On the top left buttun select `Account settings`, and you can invite users on the `Team` screen. +On the top left button select `Account settings`, and you can invite users on the `Team` screen. Users you invite will recieve an Email saying you invited them, and will need to accept and activate their account. @@ -18,6 +18,6 @@ Users you invite will recieve an Email saying you invited them, and will need to ### Remove users -On the top left buttun select `Account settings`, and select the `Team` screen. +On the top left button select `Account settings`, and select the `Team` screen. 
You can remove users by clicking selecting this option under the user options. From b6097ba3dc3e818c4bdb41e72733ad3c41f5cc70 Mon Sep 17 00:00:00 2001 From: Elon Gliksberg Date: Thu, 18 May 2023 15:38:37 +0300 Subject: [PATCH 104/194] Fixed incorrect test argument name. --- .../anomaly-detection-configuration/anomaly-sensitivity.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx b/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx index be9d46a9c..12a635746 100644 --- a/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx +++ b/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx @@ -34,7 +34,7 @@ models: - name: this_is_a_model tests: - elementary.volume_anomalies: - anomaly_sensitivity: 3 + sensitivity: 3 ``` From e0cff97462332718c01f04c6ebf0d4e28e04af58 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 21 May 2023 16:07:51 +0300 Subject: [PATCH 105/194] Update signup.mdx --- docs/cloud/onboarding/signup.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/cloud/onboarding/signup.mdx b/docs/cloud/onboarding/signup.mdx index 0b3f00145..119fb45ab 100644 --- a/docs/cloud/onboarding/signup.mdx +++ b/docs/cloud/onboarding/signup.mdx @@ -1,6 +1,6 @@ --- title: "Quickstart: Signup and connect" -sidebarTitle: "Signup and connect" +sidebarTitle: "Signup and login" --- ### Signup to Elementary cloud @@ -28,4 +28,4 @@ After you connect a data warehouse with an Elementary schema in it, you can star ### What's next? -[Connect your Elementary schema to Elementary cloud](/cloud/saas-onboarding/connect-data-warehouse). +[Connect your Elementary schema to Elementary cloud](/cloud/onboarding/connect-data-warehouse). 
From 56c646c4271d0a61691b268905ba3e4d6ad55fa4 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 21 May 2023 16:08:10 +0300 Subject: [PATCH 106/194] Update connect-data-warehouse.mdx --- docs/cloud/onboarding/connect-data-warehouse.mdx | 8 -------- 1 file changed, 8 deletions(-) diff --git a/docs/cloud/onboarding/connect-data-warehouse.mdx b/docs/cloud/onboarding/connect-data-warehouse.mdx index 9783eae96..723e3cb09 100644 --- a/docs/cloud/onboarding/connect-data-warehouse.mdx +++ b/docs/cloud/onboarding/connect-data-warehouse.mdx @@ -11,14 +11,6 @@ Here are the steps needed to enable the connection: Elementary needs authentication details, permissions to read the Elementary schema (and not the rest of your data), and network access enabled by adding the cloud IPs to your data warehouse allowlist. -Here are the guides on how to configure these on each supported data warehouse: - -- Bigquery -- Snowflake -- Redshift -- Databricks -- Postgres - Elementary IP for allowlist: `3.126.156.226` ### Create a `profiles.yml` file From 04c25c9be108bcafd894ad6808c713be638d0810 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 21 May 2023 16:29:18 +0300 Subject: [PATCH 107/194] Update mint.json --- docs/mint.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/mint.json b/docs/mint.json index d859ce8b9..42b2f70c4 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -22,7 +22,7 @@ }, "topbarCtaButton": { "name": "Try Elementary Cloud", - "url": "https://www.elementary-data.com/cloud-beta" + "url": "https://t2taztilhde.typeform.com/to/oevDtdJn?utm_source=docs&utm_medium=cta&utm_content=v1" }, "topbarLinks": [ { From 326114deeb47869b24b5a12a4f9eda7dfb7d6aab Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 21 May 2023 16:30:57 +0300 Subject: [PATCH 108/194] Update introduction.mdx --- docs/cloud/introduction.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/cloud/introduction.mdx b/docs/cloud/introduction.mdx index 
ccac2e097..ca36a55ef 100644 --- a/docs/cloud/introduction.mdx +++ b/docs/cloud/introduction.mdx @@ -6,7 +6,7 @@ title: "Introduction" Elementary Cloud is the easiest and fastest way to get the most out of Elementary. - + _The service is currently in private beta_ @@ -39,7 +39,7 @@ alt="Elementary Managed high level flow" 1. [Install the Elementary dbt package in your project](/cloud/onboarding/quickstart-dbt-package). 2. [Signup and setup integrations](/cloud/onboarding/signup). - + _The service is currently in private beta_ From 4b8dafd8858cf3df56065dc646391aa0da2a11ea Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 21 May 2023 16:31:41 +0300 Subject: [PATCH 109/194] Update elementary-in-production.mdx --- docs/deployment-and-configuration/elementary-in-production.mdx | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/deployment-and-configuration/elementary-in-production.mdx b/docs/deployment-and-configuration/elementary-in-production.mdx index bca6a2fc0..7d9f6dd32 100644 --- a/docs/deployment-and-configuration/elementary-in-production.mdx +++ b/docs/deployment-and-configuration/elementary-in-production.mdx @@ -2,8 +2,7 @@ title: "Elementary in production" --- - - _The service is currently in private beta_ + Running Elementary in production means to include the dbt package in your production dbt project, From 91450e5b250ff40781384c62bfe525d268d4efd2 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Sun, 21 May 2023 16:36:34 +0300 Subject: [PATCH 110/194] on-run-end hooks --- docs/cloud/introduction.mdx | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/docs/cloud/introduction.mdx b/docs/cloud/introduction.mdx index ca36a55ef..83acc9c5c 100644 --- a/docs/cloud/introduction.mdx +++ b/docs/cloud/introduction.mdx @@ -6,8 +6,7 @@ title: "Introduction" Elementary Cloud is the easiest and fastest way to get the most out of Elementary. 
- - _The service is currently in private beta_ + ## @@ -39,8 +38,7 @@ alt="Elementary Managed high level flow" 1. [Install the Elementary dbt package in your project](/cloud/onboarding/quickstart-dbt-package). 2. [Signup and setup integrations](/cloud/onboarding/signup). - - _The service is currently in private beta_ + ## Security and privacy From 4ce6fa164de76907c158b1e4f48103b123036e05 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 24 May 2023 13:10:24 +0300 Subject: [PATCH 111/194] new typeform --- docs/cloud/introduction.mdx | 4 ++-- .../deployment-and-configuration/elementary-in-production.mdx | 2 +- docs/mint.json | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/cloud/introduction.mdx b/docs/cloud/introduction.mdx index 83acc9c5c..ce75c603a 100644 --- a/docs/cloud/introduction.mdx +++ b/docs/cloud/introduction.mdx @@ -6,7 +6,7 @@ title: "Introduction" Elementary Cloud is the easiest and fastest way to get the most out of Elementary. - + ## @@ -38,7 +38,7 @@ alt="Elementary Managed high level flow" 1. [Install the Elementary dbt package in your project](/cloud/onboarding/quickstart-dbt-package). 2. [Signup and setup integrations](/cloud/onboarding/signup). 
- + ## Security and privacy diff --git a/docs/deployment-and-configuration/elementary-in-production.mdx b/docs/deployment-and-configuration/elementary-in-production.mdx index 7d9f6dd32..22ae484eb 100644 --- a/docs/deployment-and-configuration/elementary-in-production.mdx +++ b/docs/deployment-and-configuration/elementary-in-production.mdx @@ -2,7 +2,7 @@ title: "Elementary in production" --- - + Running Elementary in production means to include the dbt package in your production dbt project, diff --git a/docs/mint.json b/docs/mint.json index 42b2f70c4..7f1a38ff8 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -22,7 +22,7 @@ }, "topbarCtaButton": { "name": "Try Elementary Cloud", - "url": "https://t2taztilhde.typeform.com/to/oevDtdJn?utm_source=docs&utm_medium=cta&utm_content=v1" + "url": "https://t2taztilhde.typeform.com/to/ObfMbxB5?utm_source=docs&utm_medium=cta&utm_content=v1" }, "topbarLinks": [ { From 613663f8ebf78658143c825993470d582dfa564e Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 10:59:59 +0300 Subject: [PATCH 112/194] jobs info --- .../collect-job-data.mdx | 74 +++++++++++++++++++ docs/mint.json | 1 + 2 files changed, 75 insertions(+) create mode 100644 docs/deployment-and-configuration/collect-job-data.mdx diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx new file mode 100644 index 000000000..c8f8f5c40 --- /dev/null +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -0,0 +1,74 @@ +--- +title: "Collect jobs info from orchestrator" +sidebarTitle: "Jobs name & info" +--- + +🚧 _Under development_ 🚧 + +Elementary can collect metadata about your jobs from the orchestrator you are using, and enrich the Elementary report with this information. + +The goal is to provide context that is useful to triage and resolve data issues, such as: +- Is my freshness / volume issue related to a job that didn't complete? Which job? 
+- Which tables were built as part of the job that loaded data with issues? +- Which job should I rerun to resolve? + + +Elementary supports collecting the following job details: +- Orchestrator name: `orchestrator` +- Job name: `job_name` +- Job ID: `job_id` +- Job URL: `job_url` +- Job run ID: `job_run_id` + +### How Elementary collects jobs metadata? + +**Environment variables** +Elementary collects jobs metadata in run time from `env_vars`. +Orchestration tools usually have default environment variables, so this might happen automatically. The list of supported orchestrators and default env vars is in the following section. + +To configure `env_var` for your orchestrator, refer to your orchestrator's docs. + +**dbt vars** +Elementary also supports passing job metadata as dbt vars. If `env_var` and `var` exist, the `var` will be prioritized. + +To pass job data to elementary using `var`, use the `--vars` flag in your invocations: +```shell +dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' +``` + +### Which orchestrators are supported? + +Technically you can pass job info to Elementary from any orchestration tool as long as you configure `env_vars` / `vars`. 
+These are the default env var that are collected: + +| Orchestrator | Env vars | +|--------------|--------------------------------------------------------| +| Any | `ORCHESTRATOR`, `JOB_NAME`, `JOB_ID`, `JOB_URL`, `JOB_RUN_ID` | + +The following orchestrators and their default environment variables are supported out of the box: + +| Orchestrator | Env vars | +|----------------|--------------------------------------------------------------------------------------------------------------------| +| dbt cloud | orchestrator name, job_id: `DBT_CLOUD_JOB_ID`, job_run_id: `DBT_CLOUD_RUN_ID` | +| Github actions | orchestrator name, job_run_id: `GITHUB_RUN_ID`, job_url: generated from `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, `GITHUB_RUN_ID` | +| Airflow | orchestrator name | + + +### What if I use dbt cloud + orchestrator? + +By default, Elementary will collect the dbt cloud jobs info. +If you wish to override that, change your dbt cloud invocations to pass the orchestrator job info using `--vars`: +```shell +dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' +``` + +### Where can I see my job info? + +- In your Elementary schema, the fields are stored in the table `dbt_invocations`. +- In the Elementary report, if the info was collected successfully, you can filter the lineage by job and see the details in the node info. + + +### Can't find your orchestrator? Missing info? + +We would love to support more orchestrators and collect more useful info! +Please [open an issue](https://github.com/elementary-data/elementary/issues/new/choose) and tell us what we should add. 
\ No newline at end of file diff --git a/docs/mint.json b/docs/mint.json index 7f1a38ff8..6a2c79a6b 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -102,6 +102,7 @@ "group": "Deployment and Configuration", "pages": [ "deployment-and-configuration/elementary-in-production", + "deployment-and-configuration/collect-job-data", "understand-elementary/cli-install", "understand-elementary/cli-commands" ] From 9a48a4b7b60b36bb95ac7d6eaa26f7aabd394449 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:03:06 +0300 Subject: [PATCH 113/194] jobs info --- .../collect-job-data.mdx | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index c8f8f5c40..9a8a8e799 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -13,7 +13,7 @@ The goal is to provide context that is useful to triage and resolve data issues, - Which job should I rerun to resolve? -Elementary supports collecting the following job details: +#### Elementary supports collecting the following job details: - Orchestrator name: `orchestrator` - Job name: `job_name` - Job ID: `job_id` @@ -22,13 +22,13 @@ Elementary supports collecting the following job details: ### How Elementary collects jobs metadata? -**Environment variables** +#### Environment variables Elementary collects jobs metadata in run time from `env_vars`. Orchestration tools usually have default environment variables, so this might happen automatically. The list of supported orchestrators and default env vars is in the following section. To configure `env_var` for your orchestrator, refer to your orchestrator's docs. -**dbt vars** +#### dbt vars Elementary also supports passing job metadata as dbt vars. If `env_var` and `var` exist, the `var` will be prioritized. 
To pass job data to elementary using `var`, use the `--vars` flag in your invocations: @@ -47,11 +47,11 @@ These are the default env var that are collected: The following orchestrators and their default environment variables are supported out of the box: -| Orchestrator | Env vars | -|----------------|--------------------------------------------------------------------------------------------------------------------| -| dbt cloud | orchestrator name, job_id: `DBT_CLOUD_JOB_ID`, job_run_id: `DBT_CLOUD_RUN_ID` | -| Github actions | orchestrator name, job_run_id: `GITHUB_RUN_ID`, job_url: generated from `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, `GITHUB_RUN_ID` | -| Airflow | orchestrator name | +| Orchestrator | Env vars | +|----------------|----------------------------------------------------------------------------------------------------------------------------| +| dbt cloud | orchestrator
job_id: `DBT_CLOUD_JOB_ID`
job_run_id: `DBT_CLOUD_RUN_ID` | +| Github actions | orchestrator
job_run_id: `GITHUB_RUN_ID`
job_url: generated from `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, `GITHUB_RUN_ID` | +| Airflow | orchestrator | ### What if I use dbt cloud + orchestrator? From 36516730300dbd1fd52c5189d4e3a552a2fde115 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:03:38 +0300 Subject: [PATCH 114/194] jobs info --- docs/deployment-and-configuration/collect-job-data.mdx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index 9a8a8e799..b17f8871a 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -20,7 +20,7 @@ The goal is to provide context that is useful to triage and resolve data issues, - Job URL: `job_url` - Job run ID: `job_run_id` -### How Elementary collects jobs metadata? +## How Elementary collects jobs metadata? #### Environment variables Elementary collects jobs metadata in run time from `env_vars`. @@ -36,7 +36,7 @@ To pass job data to elementary using `var`, use the `--vars` flag in your invoca dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' ``` -### Which orchestrators are supported? +## Which orchestrators are supported? Technically you can pass job info to Elementary from any orchestration tool as long as you configure `env_vars` / `vars`. These are the default env var that are collected: @@ -54,7 +54,7 @@ The following orchestrators and their default environment variables are supporte | Airflow | orchestrator | -### What if I use dbt cloud + orchestrator? +## What if I use dbt cloud + orchestrator? By default, Elementary will collect the dbt cloud jobs info. 
If you wish to override that, change your dbt cloud invocations to pass the orchestrator job info using `--vars`: @@ -62,13 +62,13 @@ If you wish to override that, change your dbt cloud invocations to pass the orch dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' ``` -### Where can I see my job info? +## Where can I see my job info? - In your Elementary schema, the fields are stored in the table `dbt_invocations`. - In the Elementary report, if the info was collected successfully, you can filter the lineage by job and see the details in the node info. -### Can't find your orchestrator? Missing info? +## Can't find your orchestrator? Missing info? We would love to support more orchestrators and collect more useful info! Please [open an issue](https://github.com/elementary-data/elementary/issues/new/choose) and tell us what we should add. \ No newline at end of file From 15f392e00c32925cd608b30c9ddefda5ca7d817b Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:04:30 +0300 Subject: [PATCH 115/194] jobs info --- docs/deployment-and-configuration/collect-job-data.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index b17f8871a..6370826ce 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -13,7 +13,7 @@ The goal is to provide context that is useful to triage and resolve data issues, - Which job should I rerun to resolve? 
-#### Elementary supports collecting the following job details: +**Elementary collects the following job details:** - Orchestrator name: `orchestrator` - Job name: `job_name` - Job ID: `job_id` From 3e6119ac86b714b003715d833b3bf353f280a76c Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:11:37 +0300 Subject: [PATCH 116/194] jobs info --- docs/deployment-and-configuration/collect-job-data.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index 6370826ce..d91686fc9 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -17,7 +17,7 @@ The goal is to provide context that is useful to triage and resolve data issues, - Orchestrator name: `orchestrator` - Job name: `job_name` - Job ID: `job_id` -- Job URL: `job_url` +- Job results URL: `job_url` - Job run ID: `job_run_id` ## How Elementary collects jobs metadata? @@ -64,7 +64,7 @@ dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_loa ## Where can I see my job info? -- In your Elementary schema, the fields are stored in the table `dbt_invocations`. +- In your Elementary schema, the raw fields are stored in the table `dbt_invocations`. You can also use the view `job_run_results`, which groups invocations by job. - In the Elementary report, if the info was collected successfully, you can filter the lineage by job and see the details in the node info. 
From 1a730964f03e72e30c66955de6ca69c2cb0e48c7 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:21:36 +0300 Subject: [PATCH 117/194] jobs info --- docs/deployment-and-configuration/collect-job-data.mdx | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index d91686fc9..2bfbedd31 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -36,6 +36,16 @@ To pass job data to elementary using `var`, use the `--vars` flag in your invoca dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' ``` +#### Variables supported format + +| var / env_var | Format | +|-----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| +| orchestrator | one of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster`
Any other string would be collected but not presented in the report. | +| job_name, job_id, job_run_d | string | +| job_url | valid HTTP URL | + + + ## Which orchestrators are supported? Technically you can pass job info to Elementary from any orchestration tool as long as you configure `env_vars` / `vars`. From 993cea94e0d992e4520173f891fc2994db29021f Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:22:09 +0300 Subject: [PATCH 118/194] jobs info --- docs/deployment-and-configuration/collect-job-data.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index 2bfbedd31..b46cd4fa6 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -40,9 +40,9 @@ dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_loa | var / env_var | Format | |-----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| -| orchestrator | one of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster`
Any other string would be collected but not presented in the report. | -| job_name, job_id, job_run_d | string | -| job_url | valid HTTP URL | +| orchestrator | One of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster`
Any other string would be collected but not presented in the report. | +| job_name, job_id, job_run_d | String | +| job_url | Valid HTTP URL | From 7192caafddaeec6b99def0b6dd048ee26dbd23a4 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 30 May 2023 11:32:19 +0300 Subject: [PATCH 119/194] jobs info --- .../collect-job-data.mdx | 26 +++++++++---------- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index b46cd4fa6..5990bb2ae 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -18,7 +18,7 @@ The goal is to provide context that is useful to triage and resolve data issues, - Job name: `job_name` - Job ID: `job_id` - Job results URL: `job_url` -- Job run ID: `job_run_id` +- The ID of a specific run execution: `job_run_id` ## How Elementary collects jobs metadata? @@ -26,7 +26,10 @@ The goal is to provide context that is useful to triage and resolve data issues, Elementary collects jobs metadata in run time from `env_vars`. Orchestration tools usually have default environment variables, so this might happen automatically. The list of supported orchestrators and default env vars is in the following section. -To configure `env_var` for your orchestrator, refer to your orchestrator's docs. +These are the env vars that are collected: +`ORCHESTRATOR`, `JOB_NAME`, `JOB_ID`, `JOB_URL`, `JOB_RUN_ID` + +To configure `env_var` for your orchestrator, refer to your orchestrator's docs. #### dbt vars Elementary also supports passing job metadata as dbt vars. If `env_var` and `var` exist, the `var` will be prioritized. 
@@ -38,24 +41,19 @@ dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_loa #### Variables supported format -| var / env_var | Format | -|-----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| -| orchestrator | One of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster`
Any other string would be collected but not presented in the report. | -| job_name, job_id, job_run_d | String | -| job_url | Valid HTTP URL | +| var / env_var | Format | +|------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| +| orchestrator | One of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster`
Any other string would be collected but not presented in the report. | +| job_name, job_id, job_run_id | String | +| job_url | Valid HTTP URL | ## Which orchestrators are supported? -Technically you can pass job info to Elementary from any orchestration tool as long as you configure `env_vars` / `vars`. -These are the default env var that are collected: - -| Orchestrator | Env vars | -|--------------|--------------------------------------------------------| -| Any | `ORCHESTRATOR`, `JOB_NAME`, `JOB_ID`, `JOB_URL`, `JOB_RUN_ID` | +You can pass job info to Elementary from any orchestration tool as long as you configure `env_vars` / `vars`. -The following orchestrators and their default environment variables are supported out of the box: +The following default environment variables are supported out of the box: | Orchestrator | Env vars | |----------------|----------------------------------------------------------------------------------------------------------------------------| From 3e9cb8d72501ad45f1360a41f60581a1ec7cfdee Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 29 May 2023 11:28:29 +0300 Subject: [PATCH 120/194] tests docs changes --- docs/guides/add-elementary-tests.mdx | 386 ++---------------- ...umn_anomalies.mdx => column-anomalies.mdx} | 0 .../all-columns-anomalies.mdx | 75 ++++ .../column-anomalies.mdx | 111 +++++ .../dimension-anomalies.mdx | 88 ++++ .../event-freshness-anomalies.mdx | 75 ++++ .../freshness-anomalies.mdx | 69 ++++ .../volume-anomalies.mdx | 81 ++++ .../guides/elementary-tests-configuration.mdx | 4 +- docs/guides/how-anomaly-detection-works.mdx | 2 +- docs/mint.json | 13 +- 11 files changed, 547 insertions(+), 357 deletions(-) rename docs/guides/anomaly-detection-configuration/{column_anomalies.mdx => column-anomalies.mdx} (100%) create mode 100644 docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx create mode 100644 docs/guides/anomaly-detection-tests/column-anomalies.mdx create mode 100644 
docs/guides/anomaly-detection-tests/dimension-anomalies.mdx create mode 100644 docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx create mode 100644 docs/guides/anomaly-detection-tests/freshness-anomalies.mdx create mode 100644 docs/guides/anomaly-detection-tests/volume-anomalies.mdx diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 1852e328f..0bfaf105f 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -18,381 +18,61 @@ The tests are configured and executed like any other tests in your project. ### Table (model / source) tests -#### Volume (row count) anomalies - -`elementary.volume_anomalies` - -Monitors the row count of your table over time per time bucket (if configured without `timestamp_column`, will count table total rows). - -Upon running the test, your data is split into time buckets (daily by default, configurable with the `time bucket` -field), and then we compute the row count per bucket for the last [`days_back`](/guides/anomaly-detection-configuration/days-back) days (by default 14). - -The test then compares the row count of each bucket buckets within the detection period (last 2 days by default, controlled by the -`backfill_days` var), and compares it to the row count of the previous time buckets. -If there were any anomalies during the detection period, the test will fail. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- - - -```yml Models -version: 2 - -models: - - name: < model name > - tests: - - elementary.volume_anomalies: - timestamp_column: < timestamp column > - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - config: - elementary: - timestamp_column: "loaded_at" - tests: - - elementary.volume_anomalies: - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: day - count: 1 - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] - config: - # optional - change severity - severity: warn - - - name: users - # if no timestamp is configured, elementary will monitor without time filtering - tests: - - elementary.volume_anomalies: - tags: ["elementary"] + elementary.volume_anomalies ``` + Monitors the row count of your table over time per time bucket (if configured without `timestamp_column`, will count table total rows). + - - -#### Freshness anomalies - -`elementary.freshness_anomalies` - -Monitors the freshness of your table over time, as the expected time between data updates. - -Upon running the test, your data is split into time buckets (daily by default, configurable with the `time bucket` -field), and then we compute the maximum freshness value per bucket for the last `days_back` days (by default 14). - -The test then compares the freshness of each bucket within the detection period (last 2 days by default, controlled by the -`backfill_days` var), and compares it to the freshness of the previous time buckets. -If there were any anomalies during the detection period, the test will fail. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- - - -```yml Models -version: 2 -models: - - name: < model name > - tests: - - elementary.freshness_anomalies: - timestamp_column: < timestamp column > # Mandatory - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - tests: - - elementary.freshness_anomalies: - timestamp_column: "updated_at" - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] - config: - # optional - change severity - severity: warn + elementary.freshness_anomalies ``` + Monitors the freshness of your table over time, as the expected time between data updates. + Requires a [`timestamp_column`](/guides/anomaly-detection-configuration/timestamp-column) configuration. + - - -#### Event freshness anomalies - -`elementary.event_freshness_anomalies` - -Monitors the freshness of event data over time, as the expected time it takes each event to load - -that is, the time between the when the event actually occurs (the event timestamp), and when it is loaded to the -database (the update timestamp). - -This test compliments the `freshness_anomalies` test and is primarily intended for data that is updated in a -continuous / streaming fashion. - -The test can work in a couple of modes: - -- If only an `event_timestamp_column` is supplied, the test measures over time the difference between the current - timestamp ("now") and the most recent event timestamp. -- If both an `event_timestamp_column` and an `update_timestamp_column` are provided, the test will measure over time - the difference between these two columns. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- - - -```yml Models -version: 2 - -models: - - name: < model name > - tests: - - elementary.event_freshness_anomalies: - event_timestamp_column: < timestamp column > # Mandatory - update_timestamp_column: < timestamp column > # Optional - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - tests: - - elementary.event_freshness_anomalies: - event_timestamp_column: "occurred_at" - update_timestamp_column: "updated_at" - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] - config: - # optional - change severity - severity: warn + elementary.event_freshness_anomalies ``` + Monitors the freshness of event data over time, as the expected time it takes each event to load - + that is, the time between when the event actually occurs (the `event timestamp`), and when it is loaded to the + database (the `update timestamp`). The configuration `event_timestamp_column` is required, and `update_timestamp_column` is optional. + - - -#### Dimension anomalies - -`elementary.dimension_anomalies` - -This test monitors the frequency of values in the configured dimension over time, and alerts on unexpected changes in -the distribution. -It is best to configure it on low-cardinality fields. -The test counts rows grouped by given columns/expressions, and can be configured using the `dimensions` -and `where_expression` keys. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- - - -```yml Models -version: 2 - -models: - - name: < model name > - config: - elementary: - timestamp_column: < timestamp column > - tests: - - elementary.dimension_anomalies: - dimensions: < columns or sql expressions of columns > - # optional - configure a where a expression to accurate the dimension monitoring - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - config: - elementary: - timestamp_column: "loaded_at" - tests: - - elementary.dimension_anomalies: - dimensions: - - event_type - - country_name - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: hour - count: 4 - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] - config: - # optional - change severity - severity: warn - - - name: users - # if no timestamp is configured, elementary will monitor without time filtering - tests: - - elementary.dimension_anomalies: - dimensions: - - event_type - tags: ["elementary"] + elementary.dimension_anomalies ``` + This test monitors the frequency of values in the configured dimension over time, and alerts on unexpected changes in the distribution. + It is best to configure it on low-cardinality fields. + The test counts rows grouped by given `dimensions` (columns/expressions). + - - -#### All columns anomalies - -`elementary.all_columns_anomalies` - -Executes column level monitors and anomaly detection on all the columns of the table. Specific monitors -are [detailed here](/guides/data-anomaly-detection#tests-and-monitors-types) and can be configured using -the `all_columns_anomalies` key. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- - - -```yml Models -version: 2 - -models: - - name: < model name > - config: - elementary: - timestamp_column: < timestamp column > - tests: - - elementary.all_columns_anomalies: - column_anomalies: < specific monitors, all if null > - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - config: - elementary: - timestamp_column: "loaded_at" - tests: - - elementary.all_columns_anomalies: - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: day - count: 1 - tags: ["elementary"] - # optional - change global sensitivity - sensitivity: 3.5 + elementary.all_columns_anomalies ``` + Executes column level monitors and anomaly detection on all the columns of the table. + Specific monitors are [detailed here](/guides/anomaly-detection-configuration/column-anomalies). + You can use `column_anomalies` param to override the default monitors, and `exclude_prefix` / `exclude_regexp` to exclude columns from the test. + - - - ### Column tests -#### Column anomalies - -`elementary.column_anomalies` - -Executes column level monitors and anomaly detection. Specific monitors -are [detailed here](/guides/data-anomaly-detection#tests-and-monitors-types) and can be configured using -the `column_anomalies` key. - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). - - - -For advanced configuration of Elementary anomaly tests, refer to [tests configuration](/guides/elementary-tests-configuration). 
- -```yml Models -version: 2 - -models: - - name: < model name > - config: - elementary: - timestamp_column: < timestamp column > - columns: - - name: < column name > - tests: - - elementary.column_anomalies: - column_anomalies: < specific monitors, all if null > - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > - - - name: < model name > - ## if no timestamp is configured, elementary will monitor without time filtering - columns: - - name: < column name > - tests: - - elementary.column_anomalies: - column_anomalies: < specific monitors, all if null > - where_expression: < sql expression > + ``` - -```yml Models example -version: 2 - -models: - - name: login_events - config: - elementary: - timestamp_column: 'loaded_at' - columns: - - name: user_name - tests: - - elementary.column_anomalies: - column_anomalies: - - missing_count - - min_length - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: day - count: 1 - tags: ['elementary'] - - - name: users - ## if no timestamp is configured, elementary will monitor without time filtering - tests: - elementary.table_anomalies - tags: ['elementary'] - columns: - - name: user_id - tests: - - elementary.column_anomalies: - tags: ['elementary'] - timestamp_column: 'updated_at' - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: < time period > - count: < number of periods > - - name: user_name - tests: - - elementary.column_anomalies: - column_anomalies: - - missing_count - - min_length - tags: ['elementary'] + elementary.column_anomalies ``` + Executes column level monitors and anomaly detection on the column. + Specific monitors are [detailed here](/guides/anomaly-detection-configuration/column-anomalies) and can be configured using + the `columns_anomalies` configuration. 
+ - - -#### Column anomalies - - #### Adding tests examples: diff --git a/docs/guides/anomaly-detection-configuration/column_anomalies.mdx b/docs/guides/anomaly-detection-configuration/column-anomalies.mdx similarity index 100% rename from docs/guides/anomaly-detection-configuration/column_anomalies.mdx rename to docs/guides/anomaly-detection-configuration/column-anomalies.mdx diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx new file mode 100644 index 000000000..b3b9a182b --- /dev/null +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -0,0 +1,75 @@ +--- +title: "all_columns_anomalies" +sidebarTitle: "all_columns_anomalies" +--- + +`elementary.all_columns_anomalies` + +Executes column level monitors and anomaly detection on all the columns of the table. +Specific monitors are detailed in the table below and can be configured using the `column_anomalies` configuration. + +The test checks the data type of each column and only executes monitors that are relevant to it. +You can use `column_anomalies` param to override the default monitors, and `exclude_prefix` / `exclude_regexp` to exclude columns from the test. + + + + +### Test configuration + +No mandatory configuration, however it is highly recommended to configure a `timestamp_column`. + +
+ 
+  tests:
+    elementary.all_columns_anomalies:
+      -- column_anomalies: column monitors list>
+      -- exclude_prefix: string>
+      -- exclude_regexp: regex>
+      -- timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
+ + + + +```yml Models +models: + - name: < model name > + config: + elementary: + timestamp_column: < timestamp column > + tests: + - elementary.all_columns_anomalies: + column_anomalies: < specific monitors, all if null > + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > +``` + +```yml Models example +models: + - name: login_events + config: + elementary: + timestamp_column: "loaded_at" + tests: + - elementary.all_columns_anomalies: + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: day + count: 1 + tags: ["elementary"] + # optional - change global sensitivity + anomaly_sensitivity: 3.5 +``` + + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx new file mode 100644 index 000000000..7572b6831 --- /dev/null +++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx @@ -0,0 +1,111 @@ +--- +title: "column_anomalies" +sidebarTitle: "column_anomalies" +--- + +`elementary.column_anomalies` + +Executes column level monitors and anomaly detection on the column. +Specific monitors are detailed in the table below and can be configured using the `column_anomalies` configuration. + +The test checks the data type of the column and only executes monitors that are relevant to it. + + + + +### Test configuration + +No mandatory configuration, however it is highly recommended to configure a `timestamp_column`. + +
+ 
+  tests:
+    elementary.column_anomalies:
+      -- column_anomalies: column monitors list>
+      -- timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
+ + + + +```yml Models + +models: + - name: < model name > + config: + elementary: + timestamp_column: < timestamp column > + columns: + - name: < column name > + tests: + - elementary.column_anomalies: + column_anomalies: < specific monitors, all if null > + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > + + - name: < model name > + ## if no timestamp is configured, elementary will monitor without time filtering + columns: + - name: < column name > + tests: + - elementary.column_anomalies: + column_anomalies: < specific monitors, all if null > + where_expression: < sql expression > +``` + +```yml Models example + +models: + - name: login_events + config: + elementary: + timestamp_column: 'loaded_at' + columns: + - name: user_name + tests: + - elementary.column_anomalies: + column_anomalies: + - missing_count + - min_length + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: day + count: 1 + tags: ['elementary'] + + - name: users + ## if no timestamp is configured, elementary will monitor without time filtering + tests: + elementary.table_anomalies + tags: ['elementary'] + columns: + - name: user_id + tests: + - elementary.column_anomalies: + tags: ['elementary'] + timestamp_column: 'updated_at' + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: < time period > + count: < number of periods > + - name: user_name + tests: + - elementary.column_anomalies: + column_anomalies: + - missing_count + - min_length + tags: ['elementary'] +``` + + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx new file mode 100644 index 000000000..7155ae525 --- /dev/null +++ b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx @@ -0,0 +1,88 @@ +--- 
+title: "dimension_anomalies" +sidebarTitle: "dimension_anomalies" +--- + +`elementary.dimension_anomalies` + +This test monitors the frequency of values in the configured dimension over time, and alerts on unexpected changes in the distribution. +It is best to configure it on low-cardinality fields. +The test counts rows grouped by given `dimensions` (columns/expressions). + +If `timestamp_column` is configured, the distribution is collected per `time_bucket`. If not, it counts the total rows per dimension. + + +### Test configuration + +_Required configuration: `dimensions`_ + +
+ 
+  tests:
+    elementary.dimension_anomalies:
+      -- dimensions: sql expression>
+      -- timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
+ + + + + +```yml Models + +models: + - name: < model name > + config: + elementary: + timestamp_column: < timestamp column > + tests: + - elementary.dimension_anomalies: + dimensions: < columns or sql expressions of columns > + # optional - configure a where expression to refine the dimension monitoring + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > +``` + +```yml Models example + +models: + - name: login_events + config: + elementary: + timestamp_column: "loaded_at" + tests: + - elementary.dimension_anomalies: + dimensions: + - event_type + - country_name + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: hour + count: 4 + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] + config: + # optional - change severity + severity: warn + + - name: users + # if no timestamp is configured, elementary will monitor without time filtering + tests: + - elementary.dimension_anomalies: + dimensions: + - event_type + tags: ["elementary"] +``` + + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx new file mode 100644 index 000000000..c76b159de --- /dev/null +++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx @@ -0,0 +1,75 @@ +--- +title: "event_freshness_anomalies" +sidebarTitle: "event_freshness_anomalies" +--- + +`elementary.event_freshness_anomalies` + +Monitors the freshness of event data over time, as the expected time it takes each event to load - +that is, the time between when the event actually occurs (the `event timestamp`), and when it is loaded to the +database (the `update timestamp`). + +This test complements the `freshness_anomalies` test and is primarily intended for data that is updated in a continuous / streaming fashion. 
+ +The test can work in a couple of modes: + +- If only an `event_timestamp_column` is supplied, the test measures over time the difference between the current + timestamp ("now") and the most recent event timestamp. +- If both an `event_timestamp_column` and an `update_timestamp_column` are provided, the test will measure over time + the difference between these two columns. + +### Test configuration + +_Required configuration: `event_timestamp_column`_ +_Default configuration: `anomaly_direction: spike` to alert only on delays._ + +
+ 
+  tests:
+    elementary.event_freshness_anomalies:
+      -- event_timestamp_column: column name>
+      -- update_timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
+ + + + +```yml Models + +models: + - name: < model name > + tests: + - elementary.event_freshness_anomalies: + event_timestamp_column: < timestamp column > # Mandatory + update_timestamp_column: < timestamp column > # Optional + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > +``` + +```yml Models example + +models: + - name: login_events + tests: + - elementary.event_freshness_anomalies: + event_timestamp_column: "occurred_at" + update_timestamp_column: "updated_at" + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] + config: + # optional - change severity + severity: warn +``` + + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx new file mode 100644 index 000000000..47fa9a787 --- /dev/null +++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx @@ -0,0 +1,69 @@ +--- +title: "freshness_anomalies" +sidebarTitle: "freshness_anomalies" +--- + +`elementary.freshness_anomalies` + +Monitors the freshness of your table over time, as the expected time between data updates. + +Upon running the test, your data is split into time buckets (daily by default, configurable with the `time bucket` field), +and then we compute the maximum freshness value per bucket for the last `days_back` days (by default 14). + +The test then compares the freshness of each bucket within the detection period (last 2 days by default, controlled by the +`backfill_days` var), and compares it to the freshness of the previous time buckets. +If there were any anomalies during the detection period, the test will fail. + + +### Test configuration + +_Required configuration: `timestamp_column`_ +_Default configuration: `anomaly_direction: spike` to alert only on delays._ + +
+ 
+  tests:
+    elementary.freshness_anomalies:
+      -- timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
+ + + + +```yml Models + +models: + - name: < model name > + tests: + - elementary.freshness_anomalies: + timestamp_column: < timestamp column > # Mandatory + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > +``` + +```yml Models example + +models: + - name: login_events + tests: + - elementary.freshness_anomalies: + timestamp_column: "updated_at" + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] + config: + # optional - change severity + severity: warn +``` + + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx new file mode 100644 index 000000000..5921e41af --- /dev/null +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -0,0 +1,81 @@ +--- +title: "volume_anomalies" +sidebarTitle: "volume_anomalies" +--- + +`elementary.volume_anomalies` + +Monitors the row count of your table over time per time bucket (if configured without `timestamp_column`, will count table total rows). + +Upon running the test, your data is split into time buckets (daily by default, configurable with the `time bucket` field), +and then we compute the row count per bucket for the last [`days_back`](/guides/anomaly-detection-configuration/days-back) days (by default 14). + +The test then compares the row count of each bucket within the detection period (last 2 days by default, configured as [`backfill_days`](/guides/anomaly-detection-configuration/backfill-days)), +and compares it to the row count of the previous time buckets. + +**The test will only run on completed time buckets**, so if you run it with daily buckets in the middle of today, the test would only count yesterday as a complete bucket. +If there were any anomalies during the detection period, the test will fail. 
+ + +### Test configuration + +No mandatory configuration, however it is highly recommended to configure a `timestamp_column`. + +
+ 
+  tests:
+    elementary.volume_anomalies:
+      -- timestamp_column: column name>
+      -- where_expression: sql expression>
+      -- anomaly_sensitivity: int>
+      -- anomaly_direction: [both | spike | drop]>
+      -- days_back: int>
+      -- backfill_days: int>
+      -- time_bucket:>
+              period: [hour | day | week | month]
+              count: int
+      -- seasonality: day_of_week>
+ 
+
+ + + + +```yml Models +models: + - name: < model name > + tests: + - elementary.volume_anomalies: + timestamp_column: < timestamp column > + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > +``` + +```yml Models example +models: + - name: login_events + config: + elementary: + timestamp_column: "loaded_at" + tests: + - elementary.volume_anomalies: + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: day + count: 1 + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] + config: + # optional - change severity + severity: warn + + - name: users + # if no timestamp is configured, elementary will monitor without time filtering + tests: + - elementary.volume_anomalies: + tags: ["elementary"] +``` + + \ No newline at end of file diff --git a/docs/guides/elementary-tests-configuration.mdx b/docs/guides/elementary-tests-configuration.mdx index e9a3401c7..236567676 100644 --- a/docs/guides/elementary-tests-configuration.mdx +++ b/docs/guides/elementary-tests-configuration.mdx @@ -35,7 +35,7 @@ The anomaly detection tests configuration is defined in `.yml` files in your dbt -- seasonality: day_of_week> all_columns_anomalies test: - -- column_anomalies: column monitors list> + -- column_anomalies: column monitors list> -- exclude_prefix: string> -- exclude_regexp: regex> @@ -70,7 +70,7 @@ The anomaly detection tests configuration is defined in `.yml` files in your dbt -- seasonality: day_of_week> -- anomaly_sensitivity: int> -- anomaly_direction: [both | spike | drop]> - -- column_anomalies: column monitors list> + -- column_anomalies: column monitors list> -- exclude_prefix: string> -- exclude_regexp: regex> -- dimensions: sql expression> diff --git a/docs/guides/how-anomaly-detection-works.mdx b/docs/guides/how-anomaly-detection-works.mdx index 7d76c18ce..eff6bc2be 100644 --- 
a/docs/guides/how-anomaly-detection-works.mdx +++ b/docs/guides/how-anomaly-detection-works.mdx @@ -66,7 +66,7 @@ To detect data issues with high accuracy, it is important to leverage the config Configuration params related directly to the test's core concepts: **Data monitors** -- [column_anomalies](/guides/anomaly-detection-configuration/column_anomalies) +- [column_anomalies](/guides/anomaly-detection-configuration/column-anomalies) **Expected range** - [anomaly_sensitivity](/guides/anomaly-detection-configuration/anomaly-sensitivity) diff --git a/docs/mint.json b/docs/mint.json index 6a2c79a6b..63d6245fe 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -112,6 +112,17 @@ "pages": [ "guides/how-anomaly-detection-works", "guides/data-anomaly-detection", + { + "group": "Anomaly detection tests", + "pages": [ + "guides/anomaly-detection-tests/volume-anomalies", + "guides/anomaly-detection-tests/freshness-anomalies", + "guides/anomaly-detection-tests/event-freshness-anomalies", + "guides/anomaly-detection-tests/dimension-anomalies", + "guides/anomaly-detection-tests/all-columns-anomalies", + "guides/anomaly-detection-tests/column-anomalies" + ] + }, "guides/elementary-tests-configuration", { "group": "Tests params", @@ -124,7 +135,7 @@ "guides/anomaly-detection-configuration/backfill-days", "guides/anomaly-detection-configuration/time-bucket", "guides/anomaly-detection-configuration/seasonality", - "guides/anomaly-detection-configuration/column_anomalies", + "guides/anomaly-detection-configuration/column-anomalies", "guides/anomaly-detection-configuration/exclude_prefix", "guides/anomaly-detection-configuration/exclude_regexp", "guides/anomaly-detection-configuration/dimensions", From 3f26e655e9ba23aeedae75615c989cdb0c8b46f9 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 29 May 2023 11:59:29 +0300 Subject: [PATCH 121/194] test config at all levels --- ...uestion-tests-configuration-priorities.mdx | 22 ++++---- docs/guides/add-elementary-tests.mdx | 2 +- 
.../guides/elementary-tests-configuration.mdx | 53 ++++++++++++++----- 3 files changed, 52 insertions(+), 25 deletions(-) diff --git a/docs/_snippets/faq/question-tests-configuration-priorities.mdx b/docs/_snippets/faq/question-tests-configuration-priorities.mdx index 90d9d5b7e..4c81a2315 100644 --- a/docs/_snippets/faq/question-tests-configuration-priorities.mdx +++ b/docs/_snippets/faq/question-tests-configuration-priorities.mdx @@ -1,20 +1,22 @@ -The configuration of Elementary is dbt native and follows the same priorities of `dbt configuration`. -The more granular and specific configuration overrides the less granular one. +The configuration of Elementary is dbt native and follows the same priorities and inheritance. +The more granular and specific configuration overrides the less granular one. Elementary searches and prioritizes configuration in the following order: -For models: +**For models tests:** 1. Test arguments. -2. Model configuration. -3. Global vars in `dbt_project.yml`. +2. Tests path configuration under `tests` key in `dbt_project.yml`. +3. Model configuration. +4. Path configuration under `models` key in `dbt_project.yml`. +5. Global vars in `dbt_project.yml`. -For sources: +**For sources tests:** 1. Test arguments. -2. Table configuration. -3. Source configuration. -4. Global vars in `dbt_project.yml`. - +2. Tests path configuration under `tests` key in `dbt_project.yml`. +3. Table configuration. +4. Source configuration. +5. Global vars in `dbt_project.yml`. \ No newline at end of file diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 0bfaf105f..4bd341013 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -40,7 +40,7 @@ The tests are configured and executed like any other tests in your project. 
``` Monitors the freshness of event data over time, as the expected time it takes each event to load - that is, the time between when the event actually occurs (the `event timestamp`), and when it is loaded to the - database (the `update timestamp`). The configuration `event_timestamp_column` is required, and `update_timestamp_column` is optional. + database (the `update timestamp`). Configuring `event_timestamp_column` is required, and `update_timestamp_column` is optional.
diff --git a/docs/guides/elementary-tests-configuration.mdx b/docs/guides/elementary-tests-configuration.mdx index 236567676..39538ef2f 100644 --- a/docs/guides/elementary-tests-configuration.mdx +++ b/docs/guides/elementary-tests-configuration.mdx @@ -5,7 +5,24 @@ sidebarTitle: "Tests configuration" The anomaly detection tests configuration is defined in `.yml` files in your dbt project, just like in native dbt tests. - +The configuration of Elementary is dbt native and follows the same priorities and inheritance. +The more granular and specific configuration overrides the less granular one. + +Elementary searches and prioritizes configuration in the following order: + +**For models tests:** +1. Test arguments. +2. Tests path configuration under `tests` key in `dbt_project.yml`. +3. Model configuration. +4. Path configuration under `models` key in `dbt_project.yml`. +5. Global vars in `dbt_project.yml`. + +**For sources tests:** +1. Test arguments. +2. Tests path configuration under `tests` key in `dbt_project.yml`. +3. Table configuration. +4. Source configuration. +5. Global vars in `dbt_project.yml`. --- @@ -17,7 +34,7 @@ The anomaly detection tests configuration is defined in `.yml` files in your dbt - +
      
       All anomaly detection tests:
@@ -50,32 +67,40 @@ The anomaly detection tests configuration is defined in `.yml` files in your dbt
 
   
 
-  
+  
     
      
-      dbt_project.yml vars:
+      Expected range:
        -- anomaly_sensitivity: int>
-       -- days_back: int>
+       -- anomaly_direction: [both | spike | drop]>
+
+      Detection period and detection set:
        -- backfill_days: int>
+       -- seasonality: day_of_week>
 
-      Model / source level:
-       -- timestamp_column: column name>
+      Training period and training set:
+       -- days_back: int>
+       -- seasonality: day_of_week>
 
-      Test level:
+      Time buckets:
        -- timestamp_column: column name>
-       -- where_expression: sql expression>
        -- time_bucket:>
                 period: [hour | day | week | month]
                 count: int
-       -- seasonality: day_of_week>
-       -- anomaly_sensitivity: int>
-       -- anomaly_direction: [both | spike | drop]>
-       -- column_anomalies: column monitors list>
+
+      Monitored data set:
+       -- where_expression: sql expression>
        -- exclude_prefix: string>
        -- exclude_regexp: regex>
        -- dimensions: sql expression>
+      
+      Data monitors:
+       -- column_anomalies: column monitors list>
+
+      Other:
        -- event_timestamp_column: column name>
-       -- update_timestamp_column: column name>
+       -- update_timestamp_column: column name> 
+
      
     
From 645dc3b9c047f4db38f6d9ba106563610854c869 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 29 May 2023 12:45:35 +0300 Subject: [PATCH 122/194] test config at all levels --- .../anomaly-direction.mdx | 19 +++++++++--- .../anomaly-sensitivity.mdx | 31 ++++++++++++++----- .../backfill-days.mdx | 21 ++++++++----- .../column-anomalies.mdx | 2 +- .../days-back.mdx | 21 +++++++++++-- .../dimensions.mdx | 2 +- .../event_timestamp_column.mdx | 2 +- .../exclude_prefix.mdx | 2 +- .../exclude_regexp.mdx | 2 +- .../seasonality.mdx | 16 ++++++++-- .../time-bucket.mdx | 24 +++++++++++--- .../timestamp-column.mdx | 26 +++++++++------- .../update_timestamp_column.mdx | 2 +- .../where-expression.mdx | 17 ++++++++-- 14 files changed, 140 insertions(+), 47 deletions(-) diff --git a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx index c4b4abf71..4e1e22698 100644 --- a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx +++ b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx @@ -14,7 +14,6 @@ The anomaly_direction configuration is used to configure the direction of the ex - _Default: `both`_ - _Supported values: `both`, `spike`, `drop`_ - _Relevant tests: All anomaly detection tests_ -- _Configuration level: test_ -```yaml test +```yml test models: - name: this_is_a_model - tests: - + tests: - elementary.volume_anomalies: anomaly_direction: drop @@ -42,4 +40,17 @@ models: ``` +```yml model +models: + - name: this_is_a_model + config: + elementary: + anomaly_direction: drop +``` + +```yml dbt_project +vars: + anomaly_direction: both +``` + \ No newline at end of file diff --git a/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx b/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx index 12a635746..7e21fa6e2 100644 --- a/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx +++ 
b/docs/guides/anomaly-detection-configuration/anomaly-sensitivity.mdx @@ -13,7 +13,6 @@ Larger values will have the opposite effect and will reduce the number of anomal - _Default: 3_ - _Relevant tests: All anomaly detection tests_ -- _Configuration level: var, test config_ -```yaml dbt_project.yml -vars: - anomaly_sensitivity: 3 +```yml test +models: + - name: this_is_a_model + tests: + - elementary.volume_anomalies: + anomaly_sensitivity: 2.5 + + - elementary.all_columns_anomalies: + column_anomalies: + - null_count + - missing_count + - zero_count + anomaly_sensitivity: 4 + ``` -```yaml test +```yml model models: - name: this_is_a_model - tests: - - elementary.volume_anomalies: - sensitivity: 3 + config: + elementary: + anomaly_sensitivity: 3.5 +``` + +```yml dbt_project +vars: + anomaly_sensitivity: 3 ``` diff --git a/docs/guides/anomaly-detection-configuration/backfill-days.mdx b/docs/guides/anomaly-detection-configuration/backfill-days.mdx index 29636d74c..37a831839 100644 --- a/docs/guides/anomaly-detection-configuration/backfill-days.mdx +++ b/docs/guides/anomaly-detection-configuration/backfill-days.mdx @@ -15,7 +15,6 @@ This configuration should be changed according to your data delays. 
- _Default: 2_ - _Relevant tests: Anomaly detection tests with `timestamp_column`_ -- _Configuration level: test, var_ -```yaml dbt_project.yml -vars: - backfill_days: 2 -``` - -```yaml test +```yml test models: - name: this_is_a_model tests: @@ -40,6 +34,19 @@ models: backfill_days: 7 ``` +```yml model +models: + - name: this_is_a_model + config: + elementary: + backfill_days: 4 +``` + +```yml dbt_project.yml +vars: + backfill_days: 2 +``` + diff --git a/docs/guides/anomaly-detection-configuration/column-anomalies.mdx b/docs/guides/anomaly-detection-configuration/column-anomalies.mdx index 122230c19..2ab2d5435 100644 --- a/docs/guides/anomaly-detection-configuration/column-anomalies.mdx +++ b/docs/guides/anomaly-detection-configuration/column-anomalies.mdx @@ -13,7 +13,7 @@ Select which monitors to activate as part of the test. -```yaml test +```yml test models: - name: this_is_a_model tests: diff --git a/docs/guides/anomaly-detection-configuration/days-back.mdx b/docs/guides/anomaly-detection-configuration/days-back.mdx index c58e4dba0..f174a6f79 100644 --- a/docs/guides/anomaly-detection-configuration/days-back.mdx +++ b/docs/guides/anomaly-detection-configuration/days-back.mdx @@ -11,7 +11,6 @@ This timeframe includes the training period and detection period. 
- _Default: 14_ - _Relevant tests: Anomaly detection tests with `timestamp_column`_ -- _Configuration level: var_ -```yaml dbt_project.yml +```yml test +models: + - name: this_is_a_model + tests: + - elementary.volume_anomalies: + days_back: 30 +``` + +```yml model +models: + - name: this_is_a_model + config: + elementary: + days_back: 60 +``` + +```yml dbt_project.yml vars: - days_back: 14 + days_back: 45 ``` diff --git a/docs/guides/anomaly-detection-configuration/dimensions.mdx b/docs/guides/anomaly-detection-configuration/dimensions.mdx index 0aadf066b..1aae222ef 100644 --- a/docs/guides/anomaly-detection-configuration/dimensions.mdx +++ b/docs/guides/anomaly-detection-configuration/dimensions.mdx @@ -18,7 +18,7 @@ It is best to configure it on low-cardinality fields. -```yaml test +```yml test models: - name: model_name config: diff --git a/docs/guides/anomaly-detection-configuration/event_timestamp_column.mdx b/docs/guides/anomaly-detection-configuration/event_timestamp_column.mdx index cfbb6701d..35acc4785 100644 --- a/docs/guides/anomaly-detection-configuration/event_timestamp_column.mdx +++ b/docs/guides/anomaly-detection-configuration/event_timestamp_column.mdx @@ -19,7 +19,7 @@ The test can work in a couple of modes: -```yaml test +```yml test models: - name: this_is_a_model tests: diff --git a/docs/guides/anomaly-detection-configuration/exclude_prefix.mdx b/docs/guides/anomaly-detection-configuration/exclude_prefix.mdx index 49432676f..eea95815f 100644 --- a/docs/guides/anomaly-detection-configuration/exclude_prefix.mdx +++ b/docs/guides/anomaly-detection-configuration/exclude_prefix.mdx @@ -13,7 +13,7 @@ Param for the `all_columns_anomalies` test only, which enables to exclude a colu -```yaml test +```yml test models: - name: this_is_a_model tests: diff --git a/docs/guides/anomaly-detection-configuration/exclude_regexp.mdx b/docs/guides/anomaly-detection-configuration/exclude_regexp.mdx index 1ebf3a1c4..f41b5616f 100644 --- 
a/docs/guides/anomaly-detection-configuration/exclude_regexp.mdx +++ b/docs/guides/anomaly-detection-configuration/exclude_regexp.mdx @@ -13,7 +13,7 @@ Param for the `all_columns_anomalies` test only, which enables to exclude a colu -```yaml test +```yml test models: - name: this_is_a_model tests: diff --git a/docs/guides/anomaly-detection-configuration/seasonality.mdx b/docs/guides/anomaly-detection-configuration/seasonality.mdx index eae00084a..0c071d1aa 100644 --- a/docs/guides/anomaly-detection-configuration/seasonality.mdx +++ b/docs/guides/anomaly-detection-configuration/seasonality.mdx @@ -20,7 +20,6 @@ The expected range for Monday will be based on a training set of previous Monday - _Default: none_ - _Supported values: `day_of_week`_ - _Relevant tests: Anomaly detection tests with `timestamp_column` and 1 day `time_bucket`_ -- _Configuration level: test_ -```yaml test +```yml test models: - name: this_is_a_model tests: @@ -39,6 +38,19 @@ models: seasonality: day_of_week ``` +```yml model +models: + - name: this_is_a_model + config: + elementary: + seasonality: day_of_week +``` + +```yml dbt_project.yml +vars: + seasonality: day_of_week +``` + diff --git a/docs/guides/anomaly-detection-configuration/time-bucket.mdx b/docs/guides/anomaly-detection-configuration/time-bucket.mdx index 19478817d..a5e748e49 100644 --- a/docs/guides/anomaly-detection-configuration/time-bucket.mdx +++ b/docs/guides/anomaly-detection-configuration/time-bucket.mdx @@ -19,7 +19,6 @@ For example, if you want to detect volume anomalies in an hourly resolution, you - _Default: daily buckets. 
`time_bucket: {period: day, count: 1}`_ - _Relevant tests: Anomaly detection tests with `timestamp_column`_ -- _Configuration level: test_ -```yaml test +```yml test models: - name: this_is_a_model tests: - elementary.volume_anomalies: time_bucket: - period: hour - count: 4 + period: day + count: 2 +``` + +```yml model +models: + - name: this_is_a_model + config: + elementary: + time_bucket: + period: hour + count: 4 +``` + +```yml dbt_project.yml +vars: + time_bucket: + period: hour + count: 12 ``` diff --git a/docs/guides/anomaly-detection-configuration/timestamp-column.mdx b/docs/guides/anomaly-detection-configuration/timestamp-column.mdx index 54b332ed7..d72390686 100644 --- a/docs/guides/anomaly-detection-configuration/timestamp-column.mdx +++ b/docs/guides/anomaly-detection-configuration/timestamp-column.mdx @@ -15,11 +15,18 @@ If undefined, default is null (no time buckets). - _Default: none_ - _Relevant tests: All anomaly detection tests_ -- _Configuration level: model config, test config_ -```yaml model +```yml test +models: + - name: this_is_a_model + tests: + - elementary.volume_anomalies: + timestamp_column: created_at +``` + +```yml model models: - name: this_is_a_model config: @@ -27,9 +34,8 @@ models: timestamp_column: updated_at ``` - -```yaml source -ources: +```yml source +sources: - name: my_non_dbt_tables schema: raw tables: @@ -39,12 +45,10 @@ ources: timestamp_column: loaded_at ``` -```yaml test -models: - - name: this_is_a_model - tests: - - elementary.volume_anomalies: - timestamp_column: created_at +```yml dbt_project.yml +vars: + timestamp_column: loaded_at ``` + diff --git a/docs/guides/anomaly-detection-configuration/update_timestamp_column.mdx b/docs/guides/anomaly-detection-configuration/update_timestamp_column.mdx index 46542f66d..e7068b1af 100644 --- a/docs/guides/anomaly-detection-configuration/update_timestamp_column.mdx +++ b/docs/guides/anomaly-detection-configuration/update_timestamp_column.mdx @@ -19,7 +19,7 @@ The test can 
work in a couple of modes: -```yaml test +```yml test models: - name: this_is_a_model tests: diff --git a/docs/guides/anomaly-detection-configuration/where-expression.mdx b/docs/guides/anomaly-detection-configuration/where-expression.mdx index 8f0096c6f..7413e4482 100644 --- a/docs/guides/anomaly-detection-configuration/where-expression.mdx +++ b/docs/guides/anomaly-detection-configuration/where-expression.mdx @@ -9,11 +9,10 @@ Filter the tested data using a valid sql expression. - _Default: None_ - _Relevant tests: All anomaly detection tests_ -- _Configuration level: test_ -```yaml test +```yml test models: - name: this_is_a_model tests: @@ -21,4 +20,18 @@ models: where_expression: "user_name != 'test'" ``` +```yml model +models: + - name: this_is_a_model + config: + elementary: + where_expression: "loaded_at is not null" +``` + +```yml dbt_project.yml +vars: + where_expression: "loaded_at > '2022-01-01'" +``` + + \ No newline at end of file From 7d51aa2cde8741734a4d43b78ffad22a648904cf Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 29 May 2023 15:17:09 +0300 Subject: [PATCH 123/194] removed `table_anomalies` --- docs/guides/add-elementary-tests.mdx | 13 +++------ .../column-anomalies.mdx | 2 +- .../guides/elementary-tests-configuration.mdx | 6 ++-- docs/tutorial/adding-elementary-tests.mdx | 28 ++++++------------- 4 files changed, 17 insertions(+), 32 deletions(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 4bd341013..4bac22a9f 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -87,10 +87,8 @@ models: elementary: timestamp_column: < timestamp column > tests: - - elementary.table_anomalies: - table_anomalies: < specific monitors, all if null > + - elementary.freshness_anomalies: # optional - configure different freshness column than timestamp column - freshness_column: < freshness_column > where_expression: < sql expression > time_bucket: period: < time period > @@ 
-128,10 +126,7 @@ models: elementary: timestamp_column: 'loaded_at' tests: - - elementary.table_anomalies: - table_anomalies: - - row_count - - freshness + - elementary.volume_anomalies: # optional - use tags to run elementary tests on a dedicated run tags: ['elementary'] config: @@ -160,7 +155,7 @@ models: - name: users ## if no timestamp is configured, elementary will monitor without time filtering tests: - elementary.table_anomalies + elementary.volume_anomalies tags: ['elementary'] columns: - name: user_id @@ -203,7 +198,7 @@ sources: elementary: timestamp_column: "loaded_at" tests: - - elementary.table_anomalies + - elementary.freshness_anomalies - elementary.dimension_anomalies: dimensions: - event_type diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx index 7572b6831..4a828dbc7 100644 --- a/docs/guides/anomaly-detection-tests/column-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx @@ -87,7 +87,7 @@ models: - name: users ## if no timestamp is configured, elementary will monitor without time filtering tests: - elementary.table_anomalies + elementary.volume_anomalies tags: ['elementary'] columns: - name: user_id diff --git a/docs/guides/elementary-tests-configuration.mdx b/docs/guides/elementary-tests-configuration.mdx index 39538ef2f..a085c52ae 100644 --- a/docs/guides/elementary-tests-configuration.mdx +++ b/docs/guides/elementary-tests-configuration.mdx @@ -136,7 +136,7 @@ models: elementary: timestamp_column: updated_at tests: - - elementary.table_anomalies: + - elementary.freshness_anomalies: tags: ["elementary"] - elementary.all_columns_anomalies: tags: ["elementary"] @@ -144,7 +144,7 @@ models: - name: users ## if no timestamp is configured, elementary will monitor without time filtering tests: - - elementary.table_anomalies: + - elementary.volume_anomalies: tags: ["elementary"] ``` @@ -174,7 +174,7 @@ sources: elementary: timestamp_column: 
"loaded_at" tests: - - elementary.table_anomalies + - elementary.volume_anomalies - elementary.all_columns_anomalies: column_anomalies: - null_count diff --git a/docs/tutorial/adding-elementary-tests.mdx b/docs/tutorial/adding-elementary-tests.mdx index 7d4b0a4d1..0d104f130 100644 --- a/docs/tutorial/adding-elementary-tests.mdx +++ b/docs/tutorial/adding-elementary-tests.mdx @@ -21,7 +21,7 @@ A `schema.yml` file that includes all the below tests can be found at the bottom First, we will use the **tables_anomalies** test to perform a **row_count**. This tests counts the number of rows created in a given period to determine if there have been any anomalies in the number of signups in a given time period. -We will add the **table_anomalies** test using **row_count** as a monitor as follows: +We will add the **volume_anomalies** test as follows: ```yaml models: @@ -30,12 +30,10 @@ models: config: tags: ["PII"] tests: - - elementary.table_anomalies: - table_anomalies: - - row_count + - elementary.volume_anomalies ``` -Now that we have selected row_count as our monitor, we must define a column to use for our timestamp. This will be used to create time buckets for anomaly detection. We select the **signup_date** column as seen below: +Now that we have configured a test, we should define a column to use for our timestamp. This will be used to create time buckets for anomaly detection. We select the **signup_date** column as seen below: ```yaml models: @@ -46,9 +44,7 @@ models: elementary: timestamp_column: "signup_date" tests: - - elementary.table_anomalies: - table_anomalies: - - row_count + - elementary.volume_anomalies ``` This test will fail if there are any days (as defined by **signup_date**) where the number of rows exceeds 3 standard deviations above/below the mean. @@ -56,7 +52,7 @@ This test will fail if there are any days (as defined by **signup_date**) where
-Similar to Test 1, we will use the **table_anomalies** test and **row_count** to detect an anomalous number of returned orders in a given time period. In this test, however, we will define the timestamp column at the test level - instead of at the model level. +Similar to Test 1, we will use the **volume_anomalies** test to detect an anomalous number of returned orders in a given time period. In this test, however, we will define the timestamp column at the test level - instead of at the model level. ```yaml - name: returned_orders description: This table contains all of the returned orders @@ -64,10 +60,8 @@ Similar to Test 1, we will use the **table_anomalies** test and **row_count** to tags: ["finance"] tests: - - elementary.table_anomalies: + - elementary.volume_anomalies: tags: ["table_anomalies"] - table_anomalies: - - row_count timestamp_column: "order_date" ```` @@ -140,10 +134,8 @@ models: elementary: timestamp_column: "signup_date" tests: - - elementary.table_anomalies: - table_anomalies: - - row_count - + - elementary.volume_anomalies + columns: - name: customer_id description: This is a unique identifier for a customer @@ -225,10 +217,8 @@ models: tags: ["finance"] tests: - - elementary.table_anomalies: + - elementary.volume_anomalies: tags: ["table_anomalies"] - table_anomalies: - - row_count timestamp_column: "order_date" columns: From 7b34e5923bf01378631aa051499d1cabc8edf03e Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 29 May 2023 15:50:49 +0300 Subject: [PATCH 124/194] docs min training set size --- .../min-training-set-size.mdx | 59 +++++++++++++++++++ .../all-columns-anomalies.mdx | 1 + .../column-anomalies.mdx | 1 + .../dimension-anomalies.mdx | 1 + .../event-freshness-anomalies.mdx | 1 + .../freshness-anomalies.mdx | 1 + .../volume-anomalies.mdx | 1 + .../guides/elementary-tests-configuration.mdx | 2 + docs/mint.json | 1 + 9 files changed, 68 insertions(+) create mode 100644 
docs/guides/anomaly-detection-configuration/min-training-set-size.mdx diff --git a/docs/guides/anomaly-detection-configuration/min-training-set-size.mdx b/docs/guides/anomaly-detection-configuration/min-training-set-size.mdx new file mode 100644 index 000000000..91f9c8ab1 --- /dev/null +++ b/docs/guides/anomaly-detection-configuration/min-training-set-size.mdx @@ -0,0 +1,59 @@ +--- +title: "min_training_set_size" +sidebarTitle: "min_training_set_size" +--- + +`min_training_set_size: [int]` + +The minimum number of data points a test requires to calculate and detect an anomaly. +We recommend not configuring a value smaller than 14, so that the result can be statistically significant. + +- _Default: 14_ +- _Relevant tests: All anomaly detection tests_ + + + min_training_set_size change impact + + + + + +```yml test +models: + - name: this_is_a_model + tests: + - elementary.volume_anomalies: + min_training_set_size: 20 +``` + +```yml model +models: + - name: this_is_a_model + config: + elementary: + min_training_set_size: 18 +``` + +```yml dbt_project.yml +vars: + min_training_set_size: 15 +``` + + + + + + +#### How does it work? + +If the test has fewer than `min_training_set_size` data points it will pass, as there isn't enough data to determine if there is an anomaly. +The Elementary report will show a message saying "Not enough data to calculate anomaly score" instead of a graph. + +#### The impact of changing `min_training_set_size` + +If you **increase `min_training_set_size`** your test training set will be larger. This means a larger sample size for calculating the expected range, which should make the test less sensitive to outliers. This means less chance of false positive anomalies, but also less sensitivity so anomalies have a higher threshold. + +If you **decrease `min_training_set_size`** your test training set will be smaller. This means a smaller sample size for calculating the expected range, which might make the test more sensitive to outliers. 
This means more chance of false positive anomalies, but also more sensitivity as anomalies have a lower threshold. \ No newline at end of file diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index b3b9a182b..36a690af7 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -30,6 +30,7 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- anomaly_sensitivity: int> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx index 4a828dbc7..164edac57 100644 --- a/docs/guides/anomaly-detection-tests/column-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx @@ -27,6 +27,7 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- anomaly_sensitivity: int> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx index 7155ae525..b1b1eedae 100644 --- a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx @@ -26,6 +26,7 @@ _Required configuration: `dimensions`_ -- anomaly_sensitivity: int> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx index 
c76b159de..632813176 100644 --- a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx @@ -33,6 +33,7 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ -- anomaly_sensitivity: int> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx index 47fa9a787..18bc080c8 100644 --- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx @@ -29,6 +29,7 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ -- anomaly_sensitivity: int> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 5921e41af..0255d6d7c 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -31,6 +31,7 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- anomaly_direction: [both | spike | drop]> -- days_back: int> -- backfill_days: int> + -- min_training_set_size: int> -- time_bucket:>       period: [hour | day | week | month]       count: int diff --git a/docs/guides/elementary-tests-configuration.mdx b/docs/guides/elementary-tests-configuration.mdx index a085c52ae..dab5a401a 100644 --- a/docs/guides/elementary-tests-configuration.mdx +++ b/docs/guides/elementary-tests-configuration.mdx @@ -41,6 +41,7 @@ Elementary searches and prioritizes configuration in the following order: -- timestamp_column: column name> -- 
where_expression: sql expression> -- anomaly_sensitivity: int> + -- min_training_set_size: int> -- anomaly_direction: [both | spike | drop]> Anomaly detection tests with timestamp_column: @@ -80,6 +81,7 @@ Elementary searches and prioritizes configuration in the following order: Training period and training set: -- days_back: int> + -- min_training_set_size: int> -- seasonality: day_of_week> Time buckets: diff --git a/docs/mint.json b/docs/mint.json index 63d6245fe..55e6f0daa 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -135,6 +135,7 @@ "guides/anomaly-detection-configuration/backfill-days", "guides/anomaly-detection-configuration/time-bucket", "guides/anomaly-detection-configuration/seasonality", + "guides/anomaly-detection-configuration/min-training-set-size", "guides/anomaly-detection-configuration/column-anomalies", "guides/anomaly-detection-configuration/exclude_prefix", "guides/anomaly-detection-configuration/exclude_regexp", From fadd5f9b7652584899a37af0c2d8b9d4fa7711fe Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 1 Jun 2023 07:41:29 +0300 Subject: [PATCH 125/194] Update collect-job-data.mdx --- docs/deployment-and-configuration/collect-job-data.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index 5990bb2ae..7ef4aa542 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ b/docs/deployment-and-configuration/collect-job-data.mdx @@ -3,7 +3,7 @@ title: "Collect jobs info from orchestrator" sidebarTitle: "Jobs name & info" --- -🚧 _Under development_ 🚧 +_Supported in Elementary 0.8.0 and above_ Elementary can collect metadata about your jobs from the orchestrator you are using, and enrich the Elementary report with this information. @@ -79,4 +79,4 @@ dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_loa ## Can't find your orchestrator? Missing info? 
We would love to support more orchestrators and collect more useful info! -Please [open an issue](https://github.com/elementary-data/elementary/issues/new/choose) and tell us what we should add. \ No newline at end of file +Please [open an issue](https://github.com/elementary-data/elementary/issues/new/choose) and tell us what we should add. From d409db8267150fb63de66ab191a860f5f168131f Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 07:43:19 +0300 Subject: [PATCH 126/194] jobs info --- docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index 36a690af7..dd729dc95 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -32,8 +32,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int +      period: [hour | day | week | month] +      count: int -- seasonality: day_of_week>
From 134b0289c45f883665820884a77baec7c5540fed Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 07:43:59 +0300 Subject: [PATCH 127/194] jobs info --- docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index dd729dc95..1c79fceab 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -32,8 +32,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -      period: [hour | day | week | month] -      count: int +   period: [hour | day | week | month] +   count: int -- seasonality: day_of_week>
From c24d888627bd143b6473e8f855c6eb300de5190c Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 07:44:19 +0300 Subject: [PATCH 128/194] jobs info --- docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index 1c79fceab..eec0ccd00 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -32,8 +32,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -   period: [hour | day | week | month] -   count: int +   period: [hour | day | week | month] +   count: int -- seasonality: day_of_week> From 1150725e91cde02b9ad9a428c8bc369429620143 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 07:45:52 +0300 Subject: [PATCH 129/194] jobs info --- docs/guides/anomaly-detection-tests/column-anomalies.mdx | 4 ++-- docs/guides/anomaly-detection-tests/dimension-anomalies.mdx | 4 ++-- .../anomaly-detection-tests/event-freshness-anomalies.mdx | 4 ++-- docs/guides/anomaly-detection-tests/freshness-anomalies.mdx | 4 ++-- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 5 files changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx index 164edac57..7fcb550cb 100644 --- a/docs/guides/anomaly-detection-tests/column-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx @@ -29,8 +29,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int +   period: [hour | day | 
week | month] +   count: int -- seasonality: day_of_week> diff --git a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx index b1b1eedae..a8f78dde6 100644 --- a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx @@ -28,8 +28,8 @@ _Required configuration: `dimensions`_ -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int +   period: [hour | day | week | month] + count: int -- seasonality: day_of_week> diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx index 632813176..d97e2e4b5 100644 --- a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx @@ -35,8 +35,8 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int + period: [hour | day | week | month] + count: int -- seasonality: day_of_week> diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx index 18bc080c8..9c6093642 100644 --- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx @@ -31,8 +31,8 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int + period: [hour | day | week | month] + count: int -- seasonality: day_of_week> diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx 
b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 0255d6d7c..9fcc87628 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -33,8 +33,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -       period: [hour | day | week | month] -       count: int + period: [hour | day | week | month] + count: int -- seasonality: day_of_week> From 077304bc00d9a7dc7b2e614b4d26ec070c43e780 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 17:38:40 +0300 Subject: [PATCH 130/194] jobs info --- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 9fcc87628..cb0e62cca 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -33,8 +33,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> - period: [hour | day | week | month] - count: int +   period: [hour | day | week | month] +   count: int -- seasonality: day_of_week> From 40d3bfb6d05518e47a65e7612e06cc6fe22d3a7a Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 17:41:38 +0300 Subject: [PATCH 131/194] jobs info --- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index cb0e62cca..21b114f51 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -33,8 +33,8 @@ No mandatory 
configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -   period: [hour | day | week | month] -   count: int +    period: [hour | day | week | month] +    count: int -- seasonality: day_of_week> From b17df3fd782a68b5d307f905fb5ffb89949baae8 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 17:42:10 +0300 Subject: [PATCH 132/194] jobs info --- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 21b114f51..10e68d3fc 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -33,8 +33,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim -- backfill_days: int> -- min_training_set_size: int> -- time_bucket:> -    period: [hour | day | week | month] -    count: int +     period: [hour | day | week | month] +     count: int -- seasonality: day_of_week> From 5087841c030bf33b4a91d90d6ef1dcd498bb08e5 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 17:42:56 +0300 Subject: [PATCH 133/194] jobs info --- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 10e68d3fc..c7c03812f 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -24,8 +24,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
   tests:
-    elementary.volume_anomalies:
-      -- timestamp_column: column name>
+        elementary.volume_anomalies:
+          -- timestamp_column: column name>
       -- where_expression: sql expression>
       -- anomaly_sensitivity: int>
       -- anomaly_direction: [both | spike | drop]>

From 9cb909f3bae1688a5dd718c3dc74472242ff71fb Mon Sep 17 00:00:00 2001
From: Maayan-s 
Date: Thu, 1 Jun 2023 17:44:39 +0300
Subject: [PATCH 134/194] jobs info

---
 docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
index c7c03812f..b89113aa6 100644
--- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
@@ -24,7 +24,7 @@ No mandatory configuration, however it is highly recommended to configure a `tim
 
  
   tests:
-        elementary.volume_anomalies:
+       elementary.volume_anomalies:
           -- timestamp_column: column name>
       -- where_expression: sql expression>
       -- anomaly_sensitivity: int>

From b67c5345c4c28bbb575426bf1a67bd84f5a47aa8 Mon Sep 17 00:00:00 2001
From: Maayan-s 
Date: Thu, 1 Jun 2023 17:45:35 +0300
Subject: [PATCH 135/194] jobs info

---
 .../volume-anomalies.mdx                      | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
index b89113aa6..a078083f9 100644
--- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
@@ -26,16 +26,16 @@ No mandatory configuration, however it is highly recommended to configure a `tim
   tests:
        elementary.volume_anomalies:
           -- timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- anomaly_direction: [both | spike | drop]>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-          period: [hour | day | week | month]
-          count: int
-      -- seasonality: day_of_week>
+          -- where_expression: sql expression>
+          -- anomaly_sensitivity: int>
+          -- anomaly_direction: [both | spike | drop]>
+          -- days_back: int>
+          -- backfill_days: int>
+          -- min_training_set_size: int>
+          -- time_bucket:>
+             period: [hour | day | week | month]
+             count: int
+          -- seasonality: day_of_week>
  
 
From cfa9e8778720660626082fca819d0379e05bf9a7 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 18:08:38 +0300 Subject: [PATCH 136/194] jobs info --- docs/guides/anomaly-detection-tests/volume-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index a078083f9..24b552351 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -33,8 +33,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim     -- backfill_days: int>     -- min_training_set_size: int>     -- time_bucket:> -        period: [hour | day | week | month] -        count: int +         period: [hour | day | week | month] +         count: int     -- seasonality: day_of_week>
From 87257e8c203c7d2d0d025ba7fd4c14c6147f4420 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 18:09:35 +0300 Subject: [PATCH 137/194] jobs info --- .../volume-anomalies.mdx | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index 24b552351..5990b989b 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -24,18 +24,18 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
   tests:
-       elementary.volume_anomalies:
-          -- timestamp_column: column name>
-          -- where_expression: sql expression>
-          -- anomaly_sensitivity: int>
-          -- anomaly_direction: [both | spike | drop]>
-          -- days_back: int>
-          -- backfill_days: int>
-          -- min_training_set_size: int>
-          -- time_bucket:>
+       -- elementary.volume_anomalies:
+          timestamp_column: column name>
+          where_expression: sql expression>
+          anomaly_sensitivity: int>
+          anomaly_direction: [both | spike | drop]>
+          days_back: int>
+          backfill_days: int>
+          min_training_set_size: int>
+          time_bucket:>
               period: [hour | day | week | month]
               count: int
-          -- seasonality: day_of_week>
+          seasonality: day_of_week>
  
 
From ba2d0a6882d8f9dae44e758570a628aaa0ca9266 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 18:10:33 +0300
Subject: [PATCH 138/194] jobs info

---
 .../volume-anomalies.mdx | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
index 5990b989b..507f498c7 100644
--- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
@@ -25,17 +25,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim
 tests:
   -- elementary.volume_anomalies:
-     timestamp_column: column name>
-     where_expression: sql expression>
-     anomaly_sensitivity: int>
-     anomaly_direction: [both | spike | drop]>
-     days_back: int>
-     backfill_days: int>
-     min_training_set_size: int>
-     time_bucket:>
+      timestamp_column: column name>
+      where_expression: sql expression>
+      anomaly_sensitivity: int>
+      anomaly_direction: [both | spike | drop]>
+      days_back: int>
+      backfill_days: int>
+      min_training_set_size: int>
+      time_bucket:>
        period: [hour | day | week | month]
        count: int
-     seasonality: day_of_week>
+      seasonality: day_of_week>

From 1a169ba738a7497e4799b7e98679781503bb4e70 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 18:16:15 +0300
Subject: [PATCH 139/194] jobs info

---
 .../volume-anomalies.mdx | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
index 507f498c7..77de63b93 100644
--- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
@@ -25,17 +25,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim
 tests:
   -- elementary.volume_anomalies:
-      timestamp_column: column name>
-      where_expression: sql expression>
-      anomaly_sensitivity: int>
-      anomaly_direction: [both | spike | drop]>
-      days_back: int>
-      backfill_days: int>
-      min_training_set_size: int>
-      time_bucket:>
-         period: [hour | day | week | month]
-         count: int
-      seasonality: day_of_week>
+       timestamp_column: column name>
+       where_expression: sql expression>
+       anomaly_sensitivity: int>
+       anomaly_direction: [both | spike | drop]>
+       days_back: int>
+       backfill_days: int>
+       min_training_set_size: int>
+       time_bucket:>
+          period: [hour | day | week | month]
+          count: int
+       seasonality: day_of_week>

From a9a1c0d23351fa7a68ac0a7780b302fa6566cffc Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 18:18:14 +0300
Subject: [PATCH 140/194] jobs info

---
 .../volume-anomalies.mdx | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
index 77de63b93..b6f05cbb7 100644
--- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
@@ -25,17 +25,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim
 tests:
   -- elementary.volume_anomalies:
-       timestamp_column: column name>
-       where_expression: sql expression>
-       anomaly_sensitivity: int>
-       anomaly_direction: [both | spike | drop]>
-       days_back: int>
-       backfill_days: int>
-       min_training_set_size: int>
-       time_bucket:>
-          period: [hour | day | week | month]
-          count: int
-       seasonality: day_of_week>
+         timestamp_column: column name>
+         where_expression: sql expression>
+         anomaly_sensitivity: int>
+         anomaly_direction: [both | spike | drop]>
+         days_back: int>
+         backfill_days: int>
+         min_training_set_size: int>
+         time_bucket:>
+            period: [hour | day | week | month]
+            count: int
+         seasonality: day_of_week>

From 57f8580bb76324f2ee48e8076bb86f09e8434f80 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 18:18:41 +0300
Subject: [PATCH 141/194] jobs info

---
 .../volume-anomalies.mdx | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
index b6f05cbb7..d4b1fd77e 100644
--- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx
@@ -25,17 +25,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim
 tests:
   -- elementary.volume_anomalies:
-         timestamp_column: column name>
-         where_expression: sql expression>
-         anomaly_sensitivity: int>
-         anomaly_direction: [both | spike | drop]>
-         days_back: int>
-         backfill_days: int>
-         min_training_set_size: int>
-         time_bucket:>
-            period: [hour | day | week | month]
-            count: int
-         seasonality: day_of_week>
+        timestamp_column: column name>
+        where_expression: sql expression>
+        anomaly_sensitivity: int>
+        anomaly_direction: [both | spike | drop]>
+        days_back: int>
+        backfill_days: int>
+        min_training_set_size: int>
+        time_bucket:>
+           period: [hour | day | week | month]
+           count: int
+        seasonality: day_of_week>

From 589f44c2d5abafefd297cc9003f2b4951c34d779 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 18:28:27 +0300
Subject: [PATCH 142/194] tests config formating

---
 .../all-columns-anomalies.mdx | 31 ++++++++++---------
 .../column-anomalies.mdx | 27 ++++++++--------
 .../dimension-anomalies.mdx | 27 ++++++++--------
 .../event-freshness-anomalies.mdx | 26 ++++++++--------
 .../freshness-anomalies.mdx | 24 +++++++-------
 5 files changed, 69 insertions(+), 66 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index eec0ccd00..30642fe87 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -20,21 +20,22 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
-  tests:
-    elementary.volume_anomalies:
-      -- column_anomalies: column monitors list>
-      -- exclude_prefix: string>
-      -- exclude_regexp: regex>
-      -- timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-               period: [hour | day | week | month]
-               count: int
-      -- seasonality: day_of_week>
+  tests:                                                                                                                                                                                                                        
+       -- elementary.all_columns_anomalies:
+             column_anomalies: column monitors list>
+             exclude_prefix: string>
+             exclude_regexp: regex>
+             timestamp_column: column name>                                                   
+             where_expression: sql expression>                                                
+             anomaly_sensitivity: int>                                                     
+             anomaly_direction: [both | spike | drop]>                                       
+             days_back: int>                                                                         
+             backfill_days: int>                                                                 
+             min_training_set_size: int>                                                 
+             time_bucket:>                                                                         
+                period: [hour | day | week | month]                                 
+                count: int                                                          
+             seasonality: day_of_week>           
  
 
diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx
index 7fcb550cb..98ca6294e 100644
--- a/docs/guides/anomaly-detection-tests/column-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx
@@ -19,19 +19,20 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
-  tests:
-    elementary.volume_anomalies:
-      -- column_anomalies: column monitors list>
-      -- timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-               period: [hour | day | week | month]
-               count: int
-      -- seasonality: day_of_week>
+  tests:                                                                                                                                                                                                                        
+       -- elementary.column_anomalies:
+             column_anomalies: column monitors list>
+             timestamp_column: column name>                                                   
+             where_expression: sql expression>                                                
+             anomaly_sensitivity: int>                                                     
+             anomaly_direction: [both | spike | drop]>                                       
+             days_back: int>                                                                         
+             backfill_days: int>                                                                 
+             min_training_set_size: int>                                                 
+             time_bucket:>                                                                         
+                period: [hour | day | week | month]                                 
+                count: int                                                          
+             seasonality: day_of_week>                                                             
  
 
diff --git a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx
index a8f78dde6..61c2cdd8c 100644
--- a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx
@@ -18,19 +18,20 @@ _Required configuration: `dimensions`_
  
-  tests:
-    elementary.volume_anomalies:
-      -- dimensions: sql expression>
-      -- timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-               period: [hour | day | week | month]
-               count: int
-      -- seasonality: day_of_week>
+  tests:                                                                                                                                                                                                                        
+       -- elementary.dimension_anomalies:
+             dimensions: sql expression>
+             timestamp_column: column name>                                                   
+             where_expression: sql expression>                                                
+             anomaly_sensitivity: int>                                                     
+             anomaly_direction: [both | spike | drop]>                                       
+             days_back: int>                                                                         
+             backfill_days: int>                                                                 
+             min_training_set_size: int>                                                 
+             time_bucket:>                                                                         
+                period: [hour | day | week | month]                                 
+                count: int                                                          
+             seasonality: day_of_week>                                                             
  
 
diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
index d97e2e4b5..1a0b2bee6 100644
--- a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx
@@ -25,19 +25,19 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:
-    elementary.volume_anomalies:
-      -- event_timestamp_column: column name>
-      -- update_timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-              period: [hour | day | week | month]
-              count: int
-      -- seasonality: day_of_week>
+  tests:                                                                                                                                                                                                                        
+       -- elementary.event_freshness_anomalies`:                                                                                                                                                                                
+             event_timestamp_column: column name>
+             update_timestamp_column: column name>
+             where_expression: sql expression>                                                
+             anomaly_sensitivity: int>                                                      
+             days_back: int>                                                                         
+             backfill_days: int>                                                                 
+             min_training_set_size: int>                                                 
+             time_bucket:>                                                                         
+                period: [hour | day | week | month]                                 
+                count: int                                                          
+             seasonality: day_of_week>        
  
 
diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
index 9c6093642..7d03782f9 100644
--- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx
@@ -22,18 +22,18 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:
-    elementary.volume_anomalies:
-      -- timestamp_column: column name>
-      -- where_expression: sql expression>
-      -- anomaly_sensitivity: int>
-      -- days_back: int>
-      -- backfill_days: int>
-      -- min_training_set_size: int>
-      -- time_bucket:>
-                period: [hour | day | week | month]
-                count: int
-      -- seasonality: day_of_week>
+  tests:                                                                                                                                                                                                                        
+       -- elementary.freshness_anomalies:                                                                                                                                                                                
+             timestamp_column: column name>                                                   
+             where_expression: sql expression>                                                
+             anomaly_sensitivity: int>                                                      
+             days_back: int>                                                                         
+             backfill_days: int>                                                                 
+             min_training_set_size: int>                                                 
+             time_bucket:>                                                                         
+                period: [hour | day | week | month]                                 
+                count: int                                                          
+             seasonality: day_of_week>                                                             
  
 
From c3d260d43586042ceff3cdcc3282bb9024efa804 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 18:59:45 +0300
Subject: [PATCH 143/194] tests config formating

---
 .../all-columns-anomalies.mdx | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index 30642fe87..90b0d3526 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -39,6 +39,24 @@ No mandatory configuration, however it is highly recommended to configure a `tim
+
                                                                                                                                                                                                 
+                                                                                                                                                                                                
+  tests:                                                                                                                                                                                              
+       -- elementary.volume_anomalies:                                                                                                                                                      
+             timestamp_column: column name>                         
+             where_expression: sql expression>                      
+             anomaly_sensitivity: int>                           
+             anomaly_direction: [both | spike | drop]>             
+             days_back: int>                                               
+             backfill_days: int>                                       
+             min_training_set_size: int>                       
+             time_bucket:>                                               
+                period: [hour | day | week | month]       
+                count: int                                
+             seasonality: day_of_week>                                   
+                                                                                                                                                                                               
+
+

From cc4e3ca00c55557f8147705514388c8b72ff3dd3 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 19:01:13 +0300
Subject: [PATCH 144/194] tests config formating

---
 .../all-columns-anomalies.mdx | 50 ++++++-------------
 1 file changed, 16 insertions(+), 34 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index 90b0d3526..89b340366 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -21,41 +21,23 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
   tests:                                                                                                                                                                                                                        
-       -- elementary.all_columns_anomalies:
-             column_anomalies: column monitors list>
-             exclude_prefix: string>
-             exclude_regexp: regex>
-             timestamp_column: column name>                                                   
-             where_expression: sql expression>                                                
-             anomaly_sensitivity: int>                                                     
-             anomaly_direction: [both | spike | drop]>                                       
-             days_back: int>                                                                         
-             backfill_days: int>                                                                 
-             min_training_set_size: int>                                                 
-             time_bucket:>                                                                         
-                period: [hour | day | week | month]                                 
-                count: int                                                          
-             seasonality: day_of_week>           
+      -- elementary.all_columns_anomalies:
+            column_anomalies: column monitors list>
+            exclude_prefix: string>
+            exclude_regexp: regex>
+            timestamp_column: column name>                                                   
+            where_expression: sql expression>                                                
+            anomaly_sensitivity: int>                                                     
+            anomaly_direction: [both | spike | drop]>                                       
+            days_back: int>                                                                         
+            backfill_days: int>                                                                 
+            min_training_set_size: int>                                                 
+            time_bucket:>                                                                         
+               period: [hour | day | week | month]                                 
+               count: int                                                          
+            seasonality: day_of_week>           
  
-
- -
                                                                                                                                                                                                 
-                                                                                                                                                                                                
-  tests:                                                                                                                                                                                              
-       -- elementary.volume_anomalies:                                                                                                                                                      
-             timestamp_column: column name>                         
-             where_expression: sql expression>                      
-             anomaly_sensitivity: int>                           
-             anomaly_direction: [both | spike | drop]>             
-             days_back: int>                                               
-             backfill_days: int>                                       
-             min_training_set_size: int>                       
-             time_bucket:>                                               
-                period: [hour | day | week | month]       
-                count: int                                
-             seasonality: day_of_week>                                   
-                                                                                                                                                                                               
-
+

From afb40daa6bf465db7d08921cf2446a8917e5b494 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 19:01:59 +0300
Subject: [PATCH 145/194] tests config formating

---
 docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index 89b340366..99aa46bd3 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -21,7 +21,7 @@ No mandatory configuration, however it is highly recommended to configure a `tim
  
   tests:                                                                                                                                                                                                                        
-      -- elementary.all_columns_anomalies:
+    -- elementary.all_columns_anomalies:
             column_anomalies: column monitors list>
             exclude_prefix: string>
             exclude_regexp: regex>

From 311e52a609a6cc73d6559bd60a907e09388f5b5f Mon Sep 17 00:00:00 2001
From: Maayan-s 
Date: Thu, 1 Jun 2023 19:02:35 +0300
Subject: [PATCH 146/194] tests config formating

---
 .../all-columns-anomalies.mdx                 | 31 +++++++++----------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index 99aa46bd3..75ee3aaa3 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -19,23 +19,22 @@ You can use `column_anomalies` param to override the default monitors, and `excl
 No mandatory configuration, however it is highly recommended to configure a `timestamp_column`.
 
 
- 
-  tests:                                                                                                                                                                                                                        
+                                                                                                                                                                                                                  
     -- elementary.all_columns_anomalies:
-            column_anomalies: column monitors list>
-            exclude_prefix: string>
-            exclude_regexp: regex>
-            timestamp_column: column name>                                                   
-            where_expression: sql expression>                                                
-            anomaly_sensitivity: int>                                                     
-            anomaly_direction: [both | spike | drop]>                                       
-            days_back: int>                                                                         
-            backfill_days: int>                                                                 
-            min_training_set_size: int>                                                 
-            time_bucket:>                                                                         
-               period: [hour | day | week | month]                                 
-               count: int                                                          
-            seasonality: day_of_week>           
+         column_anomalies: column monitors list>
+         exclude_prefix: string>
+         exclude_regexp: regex>
+         timestamp_column: column name>                                                   
+         where_expression: sql expression>                                                
+         anomaly_sensitivity: int>                                                     
+         anomaly_direction: [both | spike | drop]>                                       
+         days_back: int>                                                                         
+         backfill_days: int>                                                                 
+         min_training_set_size: int>                                                 
+         time_bucket:>                                                                         
+            period: [hour | day | week | month]                                 
+            count: int                                                          
+         seasonality: day_of_week>           
  
 
From 9719dd11dbf690cddd1e52e4842a0c773b9b1bb3 Mon Sep 17 00:00:00 2001
From: Maayan-s
Date: Thu, 1 Jun 2023 19:03:07 +0300
Subject: [PATCH 147/194] tests config formating

---
 .../all-columns-anomalies.mdx | 28 +++++++++----------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
index 75ee3aaa3..6c5d90ec1 100644
--- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx
@@ -21,20 +21,20 @@ No mandatory configuration, however it is highly recommended to configure a `tim
                                                                                                                                                                                                                   
     -- elementary.all_columns_anomalies:
-         column_anomalies: column monitors list>
-         exclude_prefix: string>
-         exclude_regexp: regex>
-         timestamp_column: column name>                                                   
-         where_expression: sql expression>                                                
-         anomaly_sensitivity: int>                                                     
-         anomaly_direction: [both | spike | drop]>                                       
-         days_back: int>                                                                         
-         backfill_days: int>                                                                 
-         min_training_set_size: int>                                                 
-         time_bucket:>                                                                         
-            period: [hour | day | week | month]                                 
-            count: int                                                          
-         seasonality: day_of_week>           
+      column_anomalies: column monitors list>
+      exclude_prefix: string>
+      exclude_regexp: regex>
+      timestamp_column: column name>                                                   
+      where_expression: sql expression>                                                
+      anomaly_sensitivity: int>                                                     
+      anomaly_direction: [both | spike | drop]>                                       
+      days_back: int>                                                                         
+      backfill_days: int>                                                                 
+      min_training_set_size: int>                                                 
+      time_bucket:>                                                                         
+      nbsp;   period: [hour | day | week | month]                                 
+      nbsp;   count: int                                                          
+      seasonality: day_of_week>           
  
 
From 3313b5ba6b7bd849b272a1b2e7d70087082a7448 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 1 Jun 2023 19:04:14 +0300 Subject: [PATCH 148/194] tests config formating --- .../all-columns-anomalies.mdx | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index 6c5d90ec1..ddd173535 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -24,17 +24,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim column_anomalies: column monitors list> exclude_prefix: string> exclude_regexp: regex> - timestamp_column: column name> - where_expression: sql expression> - anomaly_sensitivity: int> - anomaly_direction: [both | spike | drop]> - days_back: int> - backfill_days: int> - min_training_set_size: int> - time_bucket:> - nbsp;   period: [hour | day | week | month] - nbsp;   count: int - seasonality: day_of_week> + timestamp_column: column name> + where_expression: sql expression> + anomaly_sensitivity: int> + anomaly_direction: [both | spike | drop]> + days_back: int> + backfill_days: int> + min_training_set_size: int> + time_bucket:> + nbsp;   period: [hour | day | week | month] + nbsp;   count: int + seasonality: day_of_week>
From bfd92e523eded9792427f4f4aa838d96bb05894d Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Sat, 3 Jun 2023 16:20:59 +0300 Subject: [PATCH 149/194] tests config formating --- .../anomaly-detection-configuration/anomaly-direction.mdx | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx index 4e1e22698..26ec52192 100644 --- a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx +++ b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx @@ -40,6 +40,10 @@ models: ``` +
+ + + ```yml model models: - name: this_is_a_model @@ -48,6 +52,10 @@ models: anomaly_direction: drop ``` + + + + ```yml dbt_project vars: anomaly_direction: both From 56d9fe7a9b40aa730d5341bdbb9a1ea997a9f417 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Sun, 4 Jun 2023 14:16:43 +0300 Subject: [PATCH 150/194] tests config formating --- .../anomaly-detection-configuration/anomaly-direction.mdx | 8 -------- 1 file changed, 8 deletions(-) diff --git a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx index 26ec52192..4e1e22698 100644 --- a/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx +++ b/docs/guides/anomaly-detection-configuration/anomaly-direction.mdx @@ -40,10 +40,6 @@ models: ``` - - - - ```yml model models: - name: this_is_a_model @@ -52,10 +48,6 @@ models: anomaly_direction: drop ``` - - - - ```yml dbt_project vars: anomaly_direction: both From ae75edf2d4026c76303fd3ad20298a4636e22208 Mon Sep 17 00:00:00 2001 From: Hahnbee Lee <55263191+hahnbeelee@users.noreply.github.com> Date: Mon, 5 Jun 2023 04:13:43 -0700 Subject: [PATCH 151/194] Remove hidden new lines in code block --- .../event-freshness-anomalies.mdx | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx index 1a0b2bee6..14361c2b6 100644 --- a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx @@ -25,19 +25,19 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:                                                                                                                                                                                                                        
-       -- elementary.event_freshness_anomalies`:                                                                                                                                                                                
+  tests:
+       -- elementary.event_freshness_anomalies:
              event_timestamp_column: column name>
              update_timestamp_column: column name>
-             where_expression: sql expression>                                                
-             anomaly_sensitivity: int>                                                      
-             days_back: int>                                                                         
-             backfill_days: int>                                                                 
-             min_training_set_size: int>                                                 
-             time_bucket:>                                                                         
-                period: [hour | day | week | month]                                 
-                count: int                                                          
-             seasonality: day_of_week>        
+             where_expression: sql expression>
+             anomaly_sensitivity: int>     
+             days_back: int>
+             backfill_days: int>
+             min_training_set_size: int>
+             time_bucket:>
+                period: [hour | day | week | month]
+                count: int
+             seasonality: day_of_week>
  
 
@@ -73,4 +73,4 @@ models: severity: warn ``` -
\ No newline at end of file +
From d4a4b45e30edcb3f60579da933bc7aaa4f0627ce Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Mon, 5 Jun 2023 14:44:00 +0300 Subject: [PATCH 152/194] Update event-freshness-anomalies.mdx --- .../event-freshness-anomalies.mdx | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx index 14361c2b6..a468fc4a0 100644 --- a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx @@ -27,17 +27,17 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ tests:    -- elementary.event_freshness_anomalies: -        event_timestamp_column: column name> -        update_timestamp_column: column name> -        where_expression: sql expression> -        anomaly_sensitivity: int> +        event_timestamp_column: column name +        update_timestamp_column: column name +        where_expression: sql expression +        anomaly_sensitivity: int        days_back: int> -        backfill_days: int> -        min_training_set_size: int> -        time_bucket:> +        backfill_days: int +        min_training_set_size: int +        time_bucket:           period: [hour | day | week | month]           count: int -        seasonality: day_of_week> +        seasonality: day_of_week From 77bfe91c54371eca82203eda140b6fece18f4d66 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 5 Jun 2023 14:46:44 +0300 Subject: [PATCH 153/194] tests config formating --- .../event-freshness-anomalies.mdx | 4 ++-- .../volume-anomalies.mdx | 18 +++++++++--------- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx index a468fc4a0..a0d7558b5 100644 --- 
a/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/event-freshness-anomalies.mdx @@ -30,8 +30,8 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._        event_timestamp_column: column name        update_timestamp_column: column name        where_expression: sql expression -        anomaly_sensitivity: int -        days_back: int> +        anomaly_sensitivity: int +        days_back: int        backfill_days: int        min_training_set_size: int        time_bucket: diff --git a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx index d4b1fd77e..f9c640f66 100644 --- a/docs/guides/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/volume-anomalies.mdx @@ -25,17 +25,17 @@ No mandatory configuration, however it is highly recommended to configure a `tim tests:    -- elementary.volume_anomalies: -        timestamp_column: column name> -        where_expression: sql expression> -        anomaly_sensitivity: int> -        anomaly_direction: [both | spike | drop]> -        days_back: int> -        backfill_days: int> -        min_training_set_size: int> -        time_bucket:> +        timestamp_column: column name +        where_expression: sql expression +        anomaly_sensitivity: int +        anomaly_direction: [both | spike | drop] +        days_back: int +        backfill_days: int +        min_training_set_size: int +        time_bucket:           period: [hour | day | week | month]           count: int -        seasonality: day_of_week> +        seasonality: day_of_week From 036e7ce964351d9da90f41f5baadea4c91fa66bb Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 5 Jun 2023 14:49:50 +0300 Subject: [PATCH 154/194] tests config formating --- .../freshness-anomalies.mdx | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git 
a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx index 7d03782f9..28f9eaf2a 100644 --- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx @@ -22,18 +22,18 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:                                                                                                                                                                                                                        
-       -- elementary.freshness_anomalies:                                                                                                                                                                                
-             timestamp_column: column name>                                                   
-             where_expression: sql expression>                                                
-             anomaly_sensitivity: int>                                                      
-             days_back: int>                                                                         
-             backfill_days: int>                                                                 
-             min_training_set_size: int>                                                 
-             time_bucket:>                                                                         
-                period: [hour | day | week | month]                                 
-                count: int                                                          
-             seasonality: day_of_week>                                                             
+  tests:
+       -- elementary.freshness_anomalies:
+             timestamp_column: column name
+             where_expression: sql expression
+             anomaly_sensitivity: int
+             days_back: int
+             backfill_days: int
+             min_training_set_size: int
+             time_bucket:
+                period: [hour | day | week | month]
+                count: int
+             seasonality: day_of_week
  
 
From 02a2c6af3558947d81ec0efc4e707b2987778e07 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Mon, 5 Jun 2023 14:54:12 +0300 Subject: [PATCH 155/194] Update freshness-anomalies.mdx --- docs/guides/anomaly-detection-tests/freshness-anomalies.mdx | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx index 28f9eaf2a..5f8f72afe 100644 --- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx @@ -22,7 +22,7 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:
+  tests:      
        -- elementary.freshness_anomalies:
              timestamp_column: column name
              where_expression: sql expression
@@ -33,7 +33,6 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
              time_bucket:
                 period: [hour | day | week | month]
                 count: int
-             seasonality: day_of_week
  
 
@@ -67,4 +66,4 @@ models: severity: warn ``` - \ No newline at end of file + From fb16a95a289603375050f6c3df8ae01227ee5788 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Mon, 5 Jun 2023 14:54:43 +0300 Subject: [PATCH 156/194] Update freshness-anomalies.mdx --- docs/guides/anomaly-detection-tests/freshness-anomalies.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx index 5f8f72afe..5817f79ac 100644 --- a/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/freshness-anomalies.mdx @@ -22,7 +22,7 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._
  
-  tests:      
+  tests:
        -- elementary.freshness_anomalies:
              timestamp_column: column name
              where_expression: sql expression

From 0665d483c0a4798316d62a9797b9b20f9589b942 Mon Sep 17 00:00:00 2001
From: Maayan Salom 
Date: Mon, 5 Jun 2023 19:12:40 +0300
Subject: [PATCH 157/194] Update column-anomalies.mdx

---
 .../column-anomalies.mdx                      | 28 +++++++++----------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/docs/guides/anomaly-detection-tests/column-anomalies.mdx b/docs/guides/anomaly-detection-tests/column-anomalies.mdx
index 98ca6294e..165ccb757 100644
--- a/docs/guides/anomaly-detection-tests/column-anomalies.mdx
+++ b/docs/guides/anomaly-detection-tests/column-anomalies.mdx
@@ -19,20 +19,20 @@ No mandatory configuration, however it is highly recommended to configure a `tim
 
 
  
-  tests:                                                                                                                                                                                                                        
+  tests:
        -- elementary.column_anomalies:
-             column_anomalies: column monitors list>
-             timestamp_column: column name>                                                   
-             where_expression: sql expression>                                                
-             anomaly_sensitivity: int>                                                     
-             anomaly_direction: [both | spike | drop]>                                       
-             days_back: int>                                                                         
-             backfill_days: int>                                                                 
-             min_training_set_size: int>                                                 
-             time_bucket:>                                                                         
-                period: [hour | day | week | month]                                 
-                count: int                                                          
-             seasonality: day_of_week>                                                             
+             column_anomalies: column monitors list
+             timestamp_column: column name
+             where_expression: sql expression
+             anomaly_sensitivity: int
+             anomaly_direction: [both | spike | drop]
+             days_back: int
+             backfill_days: int
+             min_training_set_size: int
+             time_bucket:
+                period: [hour | day | week | month]
+                count: int
+             seasonality: day_of_week
  
 
@@ -110,4 +110,4 @@ models: tags: ['elementary'] ``` - \ No newline at end of file + From 8f892b84967841663e25f4d572cbd52b7500f001 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Mon, 5 Jun 2023 19:19:22 +0300 Subject: [PATCH 158/194] tests config formating --- .../all-columns-anomalies.mdx | 33 +++++++++---------- .../dimension-anomalies.mdx | 26 +++++++-------- 2 files changed, 29 insertions(+), 30 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index ddd173535..2b768d18f 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -19,25 +19,24 @@ You can use `column_anomalies` param to override the default monitors, and `excl No mandatory configuration, however it is highly recommended to configure a `timestamp_column`.
-                                                                                                                                                                                                                  
-    -- elementary.all_columns_anomalies:
-      column_anomalies: column monitors list>
-      exclude_prefix: string>
-      exclude_regexp: regex>
-      timestamp_column: column name>
-      where_expression: sql expression>
-      anomaly_sensitivity: int>
-      anomaly_direction: [both | spike | drop]>
-      days_back: int>
-      backfill_days: int>
-      min_training_set_size: int>
-      time_bucket:>
-      nbsp;   period: [hour | day | week | month]
-      nbsp;   count: int
-      seasonality: day_of_week>
+ 
+       -- elementary.all_columns_anomalies:
+             column_anomalies: column monitors list
+             exclude_prefix: string
+             exclude_regexp: regex
+             timestamp_column: column name
+             where_expression: sql expression
+             anomaly_sensitivity: int
+             anomaly_direction: [both | spike | drop]
+             days_back: int
+             backfill_days: int
+             min_training_set_size: int
+             time_bucket:
+             nbsp;   period: [hour | day | week | month]
+             nbsp;   count: int
+             seasonality: day_of_week
  
 
- diff --git a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx index 61c2cdd8c..42965de14 100644 --- a/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/dimension-anomalies.mdx @@ -18,20 +18,20 @@ _Required configuration: `dimensions`_
  
-  tests:                                                                                                                                                                                                                        
+  tests:
        -- elementary.dimension_anomalies:
-             dimensions: sql expression>
-             timestamp_column: column name>                                                   
-             where_expression: sql expression>                                                
-             anomaly_sensitivity: int>                                                     
-             anomaly_direction: [both | spike | drop]>                                       
-             days_back: int>                                                                         
-             backfill_days: int>                                                                 
-             min_training_set_size: int>                                                 
-             time_bucket:>                                                                         
-                period: [hour | day | week | month]                                 
-                count: int                                                          
-             seasonality: day_of_week>                                                             
+             dimensions: sql expression
+             timestamp_column: column name
+             where_expression: sql expression
+             anomaly_sensitivity: int
+             anomaly_direction: [both | spike | drop]
+             days_back: int
+             backfill_days: int
+             min_training_set_size: int
+             time_bucket:
+                period: [hour | day | week | month]
+                count: int
+             seasonality: day_of_week
  
 
From e6399cd06bca4e488e3b378c1df3423b583c96cf Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 6 Jun 2023 11:21:09 +0300 Subject: [PATCH 159/194] Update all-columns-anomalies.mdx --- .../anomaly-detection-tests/all-columns-anomalies.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index 2b768d18f..d8548e969 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -32,8 +32,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim        backfill_days: int        min_training_set_size: int        time_bucket: -        nbsp;   period: [hour | day | week | month] -        nbsp;   count: int +          period: [hour | day | week | month] +          count: int        seasonality: day_of_week
@@ -72,4 +72,4 @@ models: sensitivity: 3.5 ``` - \ No newline at end of file + From 2943f95c06b1528d996b57e966cb3ac34b93a8ae Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 6 Jun 2023 11:21:55 +0300 Subject: [PATCH 160/194] Update all-columns-anomalies.mdx --- docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx index d8548e969..b4ed53821 100644 --- a/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/guides/anomaly-detection-tests/all-columns-anomalies.mdx @@ -32,8 +32,8 @@ No mandatory configuration, however it is highly recommended to configure a `tim        backfill_days: int        min_training_set_size: int        time_bucket: -          period: [hour | day | week | month] -          count: int +           period: [hour | day | week | month] +           count: int        seasonality: day_of_week From 168e4905efdaf654abfeff30fd20900b1ad1cd17 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 7 Jun 2023 13:55:34 +0300 Subject: [PATCH 161/194] tests config formating --- docs/mint.json | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/docs/mint.json b/docs/mint.json index 55e6f0daa..9c98b5b42 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -63,7 +63,10 @@ "introduction", { "group": "Quickstart", - "pages": ["quickstart", "quickstart-cli"] + "pages": [ + "quickstart", + "quickstart-cli" + ] }, { "group": "Tutorial", @@ -213,14 +216,19 @@ ] }, { - "group": "Getting Started", + "group": "Elementary Cloud", "pages": [ "cloud/introduction", + "cloud/general/security-and-privacy" + ] + }, + { + "group": "Onboarding", + "pages": [ "cloud/onboarding/quickstart-dbt-package", "cloud/onboarding/signup", "cloud/onboarding/connect-data-warehouse", - "cloud/manage-team", - "cloud/general/security-and-privacy" 
+ "cloud/manage-team" ] } ], From 60c64da333ca56144c588a0dee88bacd1b105bff Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 8 Jun 2023 15:50:05 +0300 Subject: [PATCH 162/194] cloud docs changes --- docs/cloud/introduction.mdx | 11 +- .../onboarding/connect-data-warehouse.mdx | 33 ++--- docs/cloud/onboarding/create-profile.mdx | 128 ++++++++++++++++++ .../onboarding/quickstart-dbt-package.mdx | 2 +- docs/mint.json | 3 +- 5 files changed, 143 insertions(+), 34 deletions(-) create mode 100644 docs/cloud/onboarding/create-profile.mdx diff --git a/docs/cloud/introduction.mdx b/docs/cloud/introduction.mdx index ce75c603a..d9de592b7 100644 --- a/docs/cloud/introduction.mdx +++ b/docs/cloud/introduction.mdx @@ -39,13 +39,4 @@ alt="Elementary Managed high level flow" 2. [Signup and setup integrations](/cloud/onboarding/signup). - - -## Security and privacy - -Elementary cloud requires access only to the Elementary schema and the tables in it. -The data in the schema in full is stored in the client's data warehouse. - -We secure Elementary cloud infrastructure with the highest standards. -You can delete your account at any time, and all your configuration and reports will be deleted immediately and permanently from Elementary servers. -For details, refer to our [Terms of Service](https://www.elementary-data.com/terms-of-service). + \ No newline at end of file diff --git a/docs/cloud/onboarding/connect-data-warehouse.mdx b/docs/cloud/onboarding/connect-data-warehouse.mdx index 723e3cb09..98132d01a 100644 --- a/docs/cloud/onboarding/connect-data-warehouse.mdx +++ b/docs/cloud/onboarding/connect-data-warehouse.mdx @@ -5,34 +5,23 @@ sidebarTitle: "Data warehouse" You can connect Elementary to a data warehouse that has an Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). 
-Here are the steps needed to enable the connection: +Elementary Cloud needs: +- [`profiles.yml`](/cloud/onboarding/create-profile) with connection details +- Read permissions to the Elementary schema (and not the rest of your data) +- Network access (may require adding the Elementary IP address to your allowlist) -### Authentication and IP Allowlist - -Elementary needs authentication details, permissions to read the Elementary schema (and not the rest of your data), and network access enabled by adding the cloud IPs to your data warehouse allowlist. - -Elementary IP for allowlist: `3.126.156.226` - -### Create a `profiles.yml` file - -You will need to provide the connection and authentication details by uploading a YML file with a connection profile named `elementary`. -The profile needs to point at the database and schema name where your elementary tables are. - -The easiest way to generate the profile is to run the following command within the dbt project where you deployed the elementary dbt package (works in dbt cloud as well): +### Connect Elementary cloud -```shell -dbt run-operation elementary.generate_elementary_cli_profile -``` +In `Account settings`, under `Integrations`, press `Connect` in the "Connect Your data warehouse" section. -Save the output to a YML file, update the missing details, and you are ready. +Provide an environment name, select a data warehouse type, and upload the `profiles.yml` file with the `elementary` profile. -Here are the formats of profile for each supported data warehouse: - +### Allowlist Elementary IP +Elementary IP for allowlist: `3.126.156.226` -### Connect Elementary cloud -On the `Account settings` under `Integrations`, press `Connect` on the "Connect Your data warehouse" section. -Provide an env name, select a data warehouse type, and upload the `profiles.yml` file with the `elementary` profile. 
+We can provide [support on Slack](https://join.slack.com/t/elementary-community/shared_invite/zt-1b9vogqmq-y~IRhc2396CbHNBXLsrXcA) or hop on an [onboarding call](https://savvycal.com/MaayanSa/df29881c). diff --git a/docs/cloud/onboarding/create-profile.mdx b/docs/cloud/onboarding/create-profile.mdx new file mode 100644 index 000000000..f52b98cc1 --- /dev/null +++ b/docs/cloud/onboarding/create-profile.mdx @@ -0,0 +1,128 @@ +--- +title: "Create `profiles.yml` file" +sidebarTitle: "Create profiles.yml" +--- + +You will need to provide Elementary cloud with a `profiles.yml` file that contains a connection profile named `elementary`. + +- The profile needs to point at the database and schema name where your elementary tables are. +- The provided credentials need to have read permissions to the elementary schema. + +The easiest way to generate the profile is: +1. Run the following command in the dbt project where the elementary dbt package is deployed (works in dbt cloud as well): + +```shell +dbt run-operation elementary.generate_elementary_cli_profile +``` + +2. Copy and save the output to a `profiles.yml` file, update the missing details, and you are ready. + +### Permissions and security + +**Elementary cloud doesn't need permissions to your sensitive data.** + +It is recommended to create a read-only user for the elementary schema only, and provide it to Elementary Cloud in the profile. +For more details, refer to [security and privacy](/cloud/general/security-and-privacy). + +### `profiles.yml` examples + +Here is the format of `profiles.yml` for each supported data warehouse: + + + +```yml Snowflake +## SNOWFLAKE ## +## Configure the database and schema of elementary models. 
+ +elementary: + outputs: + default: + type: snowflake + account: [account id] + + ## User/password auth ## + user: [username] + password: [password] + + role: [user role] + database: [database name] + warehouse: [warehouse name] + schema: [schema name]_elementary + threads: 4 + +``` + +```yml BigQuery +## BIGQUERY ## +## Configure the database and schema of elementary models. + +elementary: + outputs: + default: + type: bigquery + + ## Service account auth ## + method: service-account + keyfile: empty + + project: [project id] + dataset: [dataset name] # elementary dataset, usually [dataset name]_elementary + threads: 4 + location: [dataset location] + priority: interactive +``` + +```yml Redshift +## REDSHIFT ## +## Configure the database and schema of elementary models. + +elementary: + outputs: + default: + type: redshift + host: [hostname, like hostname.region.redshift.amazonaws.com] + + ## User/password auth ## + user: [username] + password: [password] + + dbname: [database name] + schema: [schema name] # elementary schema, usually [schema name]_elementary + threads: 4 +``` + +```yml Databricks +## DATABRICKS ## +## Configure the database and schema of elementary models. + +elementary: + outputs: + default: + type: databricks + host: [hostname, like .cloud.databricks.com] + http_path: [like /sql/1.0/endpoints/] + schema: [schema name] # elementary schema, usually [schema name]_elementary + token: [token] + threads: [number of threads like 8] +``` + +```yml Postgres +## POSTGRES ## +## Configure the database and schema of elementary models. 
+ +elementary: + outputs: + default: + type: postgres + host: [hostname] + user: [username] + password: [password] + port: [port] + dbname: [database name] + schema: [schema name] # elementary schema, usually [schema name]_elementary + threads: [1 or more] + +``` + + + diff --git a/docs/cloud/onboarding/quickstart-dbt-package.mdx b/docs/cloud/onboarding/quickstart-dbt-package.mdx index 2dc452b11..af67fe76f 100644 --- a/docs/cloud/onboarding/quickstart-dbt-package.mdx +++ b/docs/cloud/onboarding/quickstart-dbt-package.mdx @@ -1,5 +1,5 @@ --- -title: "Quickstart: Install Elementary dbt package" +title: "Install Elementary dbt package" sidebarTitle: "Install dbt package" --- diff --git a/docs/mint.json b/docs/mint.json index 9c98b5b42..6622a419a 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -223,9 +223,10 @@ ] }, { - "group": "Onboarding", + "group": "Getting Started", "pages": [ "cloud/onboarding/quickstart-dbt-package", + "create-profile", "cloud/onboarding/signup", "cloud/onboarding/connect-data-warehouse", "cloud/manage-team" From 097c648b3fada39107e24b2c46a03f8ef0371fd4 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 8 Jun 2023 15:53:20 +0300 Subject: [PATCH 163/194] cloud docs changes --- docs/cloud/onboarding/connect-data-warehouse.mdx | 2 +- docs/mint.json | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/cloud/onboarding/connect-data-warehouse.mdx b/docs/cloud/onboarding/connect-data-warehouse.mdx index 98132d01a..4831821f7 100644 --- a/docs/cloud/onboarding/connect-data-warehouse.mdx +++ b/docs/cloud/onboarding/connect-data-warehouse.mdx @@ -1,6 +1,6 @@ --- title: "Connect your data warehouse" -sidebarTitle: "Data warehouse" +sidebarTitle: "Connect data warehouse" --- You can connect Elementary to a data warehouse that has an Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). 
diff --git a/docs/mint.json b/docs/mint.json index 6622a419a..e8c27c1c1 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -226,7 +226,7 @@ "group": "Getting Started", "pages": [ "cloud/onboarding/quickstart-dbt-package", - "create-profile", + "cloud/onboarding/create-profile", "cloud/onboarding/signup", "cloud/onboarding/connect-data-warehouse", "cloud/manage-team" From 5238e591f614a6b2054479b651766a4964278d4a Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 8 Jun 2023 15:56:33 +0300 Subject: [PATCH 164/194] cloud docs changes --- docs/cloud/onboarding/create-profile.mdx | 4 ++++ docs/cloud/onboarding/quickstart-dbt-package.mdx | 5 +++-- docs/cloud/onboarding/signup.mdx | 2 +- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/docs/cloud/onboarding/create-profile.mdx b/docs/cloud/onboarding/create-profile.mdx index f52b98cc1..413aa9372 100644 --- a/docs/cloud/onboarding/create-profile.mdx +++ b/docs/cloud/onboarding/create-profile.mdx @@ -126,3 +126,7 @@ elementary: +### What's next? + +1. [Sign up for Elementary cloud](/cloud/onboarding/signup). +2. [Connect your Elementary schema to Elementary cloud](/cloud/onboarding/connect-data-warehouse). \ No newline at end of file diff --git a/docs/cloud/onboarding/quickstart-dbt-package.mdx b/docs/cloud/onboarding/quickstart-dbt-package.mdx index af67fe76f..9aeeddac2 100644 --- a/docs/cloud/onboarding/quickstart-dbt-package.mdx +++ b/docs/cloud/onboarding/quickstart-dbt-package.mdx @@ -52,5 +52,6 @@ If you see data in these models you completed the package deployment (Congrats! ### What's next? -1. [Singup to Elementary cloud](/cloud/saas-onboarding/signup). -2. [Connect your Elementary schema to Elementary cloud](/cloud/saas-onboarding/connect-data-warehouse). \ No newline at end of file +1. [Create a connection profile](/cloud/onboarding/create-profile). +2. [Sign up for Elementary cloud](/cloud/onboarding/signup). +3.
[Connect your Elementary schema to Elementary cloud](/cloud/onboarding/connect-data-warehouse). \ No newline at end of file diff --git a/docs/cloud/onboarding/signup.mdx b/docs/cloud/onboarding/signup.mdx index 119fb45ab..189ddebbc 100644 --- a/docs/cloud/onboarding/signup.mdx +++ b/docs/cloud/onboarding/signup.mdx @@ -1,5 +1,5 @@ --- -title: "Quickstart: Signup and connect" +title: "Signup and login" sidebarTitle: "Signup and login" --- From 982760ab1135f8fc2d72ce4b3726ad883a6ce8df Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 8 Jun 2023 15:58:46 +0300 Subject: [PATCH 165/194] cloud docs changes --- docs/cloud/manage-team.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/cloud/manage-team.mdx b/docs/cloud/manage-team.mdx index 6c999433a..35bd956fe 100644 --- a/docs/cloud/manage-team.mdx +++ b/docs/cloud/manage-team.mdx @@ -1,6 +1,6 @@ --- -title: "Quickstart: Invite and remove users" -sidebarTitle: "Team settings" +title: "Invite and remove users" +sidebarTitle: "Invite users" --- ### Invite users From 1db49d04d7550a5740598f1d780cf897170030a4 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Thu, 8 Jun 2023 16:00:00 +0300 Subject: [PATCH 166/194] cloud docs changes --- docs/cloud/manage-team.mdx | 2 +- docs/cloud/onboarding/connect-data-warehouse.mdx | 2 +- docs/cloud/onboarding/create-profile.mdx | 2 +- docs/cloud/onboarding/quickstart-dbt-package.mdx | 2 +- docs/cloud/onboarding/signup.mdx | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/cloud/manage-team.mdx b/docs/cloud/manage-team.mdx index 35bd956fe..380c6ab7c 100644 --- a/docs/cloud/manage-team.mdx +++ b/docs/cloud/manage-team.mdx @@ -1,6 +1,6 @@ --- title: "Invite and remove users" -sidebarTitle: "Invite users" +sidebarTitle: "5️⃣ Invite users" --- ### Invite users diff --git a/docs/cloud/onboarding/connect-data-warehouse.mdx b/docs/cloud/onboarding/connect-data-warehouse.mdx index 4831821f7..4e550fc5b 100644 --- 
a/docs/cloud/onboarding/connect-data-warehouse.mdx +++ b/docs/cloud/onboarding/connect-data-warehouse.mdx @@ -1,6 +1,6 @@ --- title: "Connect your data warehouse" -sidebarTitle: "Connect data warehouse" +sidebarTitle: "4️⃣ Connect data warehouse" --- You can connect Elementary to a data warehouse that has an Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). diff --git a/docs/cloud/onboarding/create-profile.mdx b/docs/cloud/onboarding/create-profile.mdx index 413aa9372..d1687ef0b 100644 --- a/docs/cloud/onboarding/create-profile.mdx +++ b/docs/cloud/onboarding/create-profile.mdx @@ -1,6 +1,6 @@ --- title: "Create `profiles.yml` file" -sidebarTitle: "Create profiles.yml" +sidebarTitle: "2️⃣ Create profiles.yml" --- You will need to provide Elementary cloud a `profiles.yml` file with a connection profile named `elementary`. diff --git a/docs/cloud/onboarding/quickstart-dbt-package.mdx b/docs/cloud/onboarding/quickstart-dbt-package.mdx index 9aeeddac2..3fc71aac1 100644 --- a/docs/cloud/onboarding/quickstart-dbt-package.mdx +++ b/docs/cloud/onboarding/quickstart-dbt-package.mdx @@ -1,6 +1,6 @@ --- title: "Install Elementary dbt package" -sidebarTitle: "Install dbt package" +sidebarTitle: "1️⃣ Install dbt package" --- diff --git a/docs/cloud/onboarding/signup.mdx b/docs/cloud/onboarding/signup.mdx index 189ddebbc..e6326ea01 100644 --- a/docs/cloud/onboarding/signup.mdx +++ b/docs/cloud/onboarding/signup.mdx @@ -1,6 +1,6 @@ --- title: "Signup and login" -sidebarTitle: "Signup and login" +sidebarTitle: "3️⃣ Signup and login" --- ### Signup to Elementary cloud From b0a5a6f77ba117426a4c318aca10658fd04d1d9b Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Sat, 10 Jun 2023 13:59:08 +0300 Subject: [PATCH 167/194] cloud docs changes --- docs/cloud/manage-team.mdx | 2 +- docs/cloud/onboarding/connect-data-warehouse.mdx | 2 +- docs/cloud/onboarding/create-profile.mdx | 2 +- docs/cloud/onboarding/quickstart-dbt-package.mdx | 2 +- 
docs/cloud/onboarding/signup.mdx | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/cloud/manage-team.mdx b/docs/cloud/manage-team.mdx index 380c6ab7c..4213e0966 100644 --- a/docs/cloud/manage-team.mdx +++ b/docs/cloud/manage-team.mdx @@ -1,6 +1,6 @@ --- title: "Invite and remove users" -sidebarTitle: "5️⃣ Invite users" +sidebarTitle: "5. Invite users" --- ### Invite users diff --git a/docs/cloud/onboarding/connect-data-warehouse.mdx b/docs/cloud/onboarding/connect-data-warehouse.mdx index 4e550fc5b..8ddcfd327 100644 --- a/docs/cloud/onboarding/connect-data-warehouse.mdx +++ b/docs/cloud/onboarding/connect-data-warehouse.mdx @@ -1,6 +1,6 @@ --- title: "Connect your data warehouse" -sidebarTitle: "4️⃣ Connect data warehouse" +sidebarTitle: "4. Connect data warehouse" --- You can connect Elementary to a data warehouse that has an Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). diff --git a/docs/cloud/onboarding/create-profile.mdx b/docs/cloud/onboarding/create-profile.mdx index d1687ef0b..1d842baa2 100644 --- a/docs/cloud/onboarding/create-profile.mdx +++ b/docs/cloud/onboarding/create-profile.mdx @@ -1,6 +1,6 @@ --- title: "Create `profiles.yml` file" -sidebarTitle: "2️⃣ Create profiles.yml" +sidebarTitle: "2. Create profiles.yml" --- You will need to provide Elementary cloud a `profiles.yml` file with a connection profile named `elementary`. diff --git a/docs/cloud/onboarding/quickstart-dbt-package.mdx b/docs/cloud/onboarding/quickstart-dbt-package.mdx index 3fc71aac1..23afdc280 100644 --- a/docs/cloud/onboarding/quickstart-dbt-package.mdx +++ b/docs/cloud/onboarding/quickstart-dbt-package.mdx @@ -1,6 +1,6 @@ --- title: "Install Elementary dbt package" -sidebarTitle: "1️⃣ Install dbt package" +sidebarTitle: "1. 
Install dbt package" --- diff --git a/docs/cloud/onboarding/signup.mdx b/docs/cloud/onboarding/signup.mdx index e6326ea01..a977beb38 100644 --- a/docs/cloud/onboarding/signup.mdx +++ b/docs/cloud/onboarding/signup.mdx @@ -1,6 +1,6 @@ --- title: "Signup and login" -sidebarTitle: "3️⃣ Signup and login" +sidebarTitle: "3. Signup and login" --- ### Signup to Elementary cloud From c5c576f13eba84d52385ed974b83876f51224579 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Sun, 11 Jun 2023 13:39:07 +0300 Subject: [PATCH 168/194] Update manage-team.mdx --- docs/cloud/manage-team.mdx | 7 ------- 1 file changed, 7 deletions(-) diff --git a/docs/cloud/manage-team.mdx b/docs/cloud/manage-team.mdx index 4213e0966..97f9eb3f6 100644 --- a/docs/cloud/manage-team.mdx +++ b/docs/cloud/manage-team.mdx @@ -14,10 +14,3 @@ Users you invite will recieve an Email saying you invited them, and will need to - - -### Remove users - -On the top left button select `Account settings`, and select the `Team` screen. - -You can remove users by clicking selecting this option under the user options. 
From 90924288ae14d48a8f993633dbd1f550899b2a8e Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Sun, 11 Jun 2023 16:38:05 +0300 Subject: [PATCH 169/194] new release notes --- docs/mint.json | 2 ++ docs/release-notes/releases/0.7.10.mdx | 43 ++++++++++++++++++++++++++ docs/release-notes/releases/0.8.0.mdx | 32 +++++++++++++++++++ 3 files changed, 77 insertions(+) create mode 100644 docs/release-notes/releases/0.7.10.mdx create mode 100644 docs/release-notes/releases/0.8.0.mdx diff --git a/docs/mint.json b/docs/mint.json index e8c27c1c1..a42bd58d0 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -201,6 +201,8 @@ { "group": "Releases", "pages": [ + "release-notes/releases/0.8.0", + "release-notes/releases/0.7.10", "release-notes/releases/0.7.7", "release-notes/releases/0.7.6", "release-notes/releases/0.7.5", diff --git a/docs/release-notes/releases/0.7.10.mdx b/docs/release-notes/releases/0.7.10.mdx new file mode 100644 index 000000000..5132d5c86 --- /dev/null +++ b/docs/release-notes/releases/0.7.10.mdx @@ -0,0 +1,43 @@ +--- +title: "Elementary 0.7.10" +sidebarTitle: "0.7.10" +--- + +_May 17, 2023: [v0.7.10 Python](https://github.com/elementary-data/elementary/releases/tag/v0.7.10), [v0.7.8 dbt package](https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.7.8)_ + +### 🔥 What's new? + +- **New artifacts model - dbt_columns 🆕** + - Many users requested that we add this model with useful information about the columns in your project. + - We also plan to add features to the report and alerts based on this data. Stay tuned... :coming-soon: + - (This is not yet supported for Databricks, [working on it](https://github.com/elementary-data/elementary/issues/872)) + +- **New lineage filters - Tags and owners #️⃣👥** + - You can now filter the lineage to see only the nodes that are relevant to a business department or a specific owner, and their upstream and downstream nodes. 
+ - We also made some general improvements to the filters usability like adding search and clear all. + +- **Test results sample size in the report can be changed 🔢** + - The report is no longer limited to 5 results, you can change this by adding the var: + - `test_sample_row_count: 10` + +- New flag - `edr --version` 🏁 + - Thank you [@Manul Patel](https://elementary-community.slack.com/team/U054N92MU11) for contributing this  🤩 + + +### 💫 More changes + +- Alerts suppression interval is no longer limited to 24 hours. +- Added indicative exceptions to Elementary tests. +- `time_bucket` can now be configured in model and var levels as well. +- Created workarounds to solve breaking changes in dbt 1.5.0 adapters: +- Not rely on Databricks adapter to create temp tables - Thanks [@Joseph Berni](https://elementary-community.slack.com/team/U03NWMS0Y93), [@fitz](https://elementary-community.slack.com/team/U03V5KGR3U3) and [@Dharit Sura](https://elementary-community.slack.com/team/U047ZFZRDCH) for reporting! +- Run queries from `run` instead of `run-operation` due to bug in Redshift adapter - Thank you [@Eugene Sobolev](https://elementary-community.slack.com/team/U054BV7MR0T) for reporting and investigating with us! + + +### 🐞 Bug fixes +- Support dbt run results compiled sql with % on Redshift new adapter - Thanks [@Fabien Traventhal](https://elementary-community.slack.com/team/U03G693L05R) for reporting! +- Fixed ignored backfill_days in no timestamp tests - Thanks [@leila](https://elementary-community.slack.com/team/U04HFCUM2G6) and [@Roland Baranovic](https://elementary-community.slack.com/team/U04K5SUJS8Z) for reporting! 
+- Fixed alerts `-group-by` empty value - Thank you [@Dimosthenis Schizas](https://elementary-community.slack.com/team/U054B4PNACE) for contributing 🤩 +- Owners accept dict format - Thank you [@Stephen Lloyd](https://elementary-community.slack.com/team/U03FQELBBV1) for reporting and [@Manul Patel](https://elementary-community.slack.com/team/U054N92MU11) for contributing 🤩 +- Paginate upload of source freshness data for large results - Thank you [@Fred](https://elementary-community.slack.com/team/U03QXQ3VCF8) for reporting and fixing 🤩 +- Thank you [@winzee](https://github.com/winzee) and [@vinooganesh](https://github.com/vinooganesh) for helping keep our docs accurate and typos free 🎉e Melhuish](https://elementary-community.slack.com/team/U04KWBDTP4J)! \ No newline at end of file diff --git a/docs/release-notes/releases/0.8.0.mdx b/docs/release-notes/releases/0.8.0.mdx new file mode 100644 index 000000000..e80b19e18 --- /dev/null +++ b/docs/release-notes/releases/0.8.0.mdx @@ -0,0 +1,32 @@ +--- +title: "Elementary 0.8.0" +sidebarTitle: "0.8.0" +--- + +_June 1, 2023: [v0.8.0 Python](https://github.com/elementary-data/elementary/releases/tag/v0.8.0), [v0.8.0 dbt package](https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.8.0)_ +_As this is a minor version bump, you need to run `dbt run -s elementary`_ + +### 🔥 What's new? + +- **🆕 Jobs info from Orchestrator 🆕** + - Elementary now supports collecting metadata about your jobs from your orchestration tool! + - The goal is to provide context that is useful to triage and resolve data issues: + - As a first step, you could filter the lineage by job in the Elementary report. + - More orchestrator related features are coming soon 😎 + - Here is the [guide for enabling jobs info collection.](https://docs.elementary-data.com/deployment-and-configuration/collect-job-data). + +- **You can now configure all test params in the project / model / test level 🤯** + - Why is it useful? 
+ - It enables you to tailor the tests to the dataset and get higher level of accuracy! + - You can leverage inheritance, configure at a higher level (like folder of models) and save the need to configure by test. + - Some examples: + - You can set `days_back: 90` to tests with `time_bucket: period: week`, and `days_back: 7` to tests with `time_bucket: period: hour` . + - You can set `timestamp_column: updated_at` in your dbt_project.yml if this is your convention, and override it for models where it's different. + - You can set `seasonality`, `time_bucket` and `timestamp_column` at the source level, and it will apply for all the tests you add to tables of this source. + - We also upgraded our documentation of the [tests configuration](https://docs.elementary-data.com/guides/elementary-tests-configuration) and [how the tests work](https://docs.elementary-data.com/guides/how-anomaly-detection-works), to make it clearer 😇 + + +### 💫 More changes + +- Added `materialization` field to models run results, thank you [@Aril Mavinkere](https://elementary-community.slack.com/team/U058SJFFTEU) for contributing! 🤩 +- Removed `env` from report summary. \ No newline at end of file From 99243fb662a23add6739f7bd054985c3d5e2470b Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 11:24:38 +0300 Subject: [PATCH 170/194] Update add-elementary-tests.mdx --- docs/guides/add-elementary-tests.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 4bac22a9f..3e0c1e4a6 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -18,7 +18,8 @@ The tests are configured and executed like any other tests in your project. 
### Table (model / source) tests - +#### Volume anomalies + ``` elementary.volume_anomalies ``` From 621b7aa32bbdb5b7e0d29693f7ca8ace164653fd Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 11:25:24 +0300 Subject: [PATCH 171/194] Update add-elementary-tests.mdx --- docs/guides/add-elementary-tests.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 3e0c1e4a6..33ebc5725 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -18,7 +18,7 @@ The tests are configured and executed like any other tests in your project. ### Table (model / source) tests -#### Volume anomalies +#### Volume anomalies ``` elementary.volume_anomalies From e5d92799ad3313fd526ab77ddfff9ed9cbd5be2b Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 11:26:08 +0300 Subject: [PATCH 172/194] Update add-elementary-tests.mdx --- docs/guides/add-elementary-tests.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 33ebc5725..e033509ed 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -18,7 +18,7 @@ The tests are configured and executed like any other tests in your project. 
### Table (model / source) tests -#### Volume anomalies +#### - Volume anomalies ``` elementary.volume_anomalies From 6560566ef0ece123ddf537d72ba27e23887a80eb Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 11:28:08 +0300 Subject: [PATCH 173/194] Update add-elementary-tests.mdx --- docs/guides/add-elementary-tests.mdx | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index e033509ed..0d76fab76 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -18,6 +18,7 @@ The tests are configured and executed like any other tests in your project. ### Table (model / source) tests + #### - Volume anomalies ``` @@ -26,8 +27,8 @@ The tests are configured and executed like any other tests in your project. Monitors the row count of your table over time per time bucket (if configured without `timestamp_column`, will count table total rows). - - +#### - Freshness anomalies + ``` elementary.freshness_anomalies ``` @@ -35,7 +36,8 @@ The tests are configured and executed like any other tests in your project. Requires a [`timestamp_column`](/guides/anomaly-detection-configuration/timestamp-column) configuration. - +#### - Event freshness anomalies + ``` elementary.event_freshness_anomalies ``` @@ -44,7 +46,8 @@ The tests are configured and executed like any other tests in your project. database (the `update timestamp`). Configuring `event_timestamp_column` is required, and `update_timestamp_column` is optional. - +#### - Dimension anomalies + ``` elementary.dimension_anomalies ``` @@ -53,7 +56,8 @@ The tests are configured and executed like any other tests in your project. The test counts rows grouped by given `dimensions` (columns/expressions). - +#### - All columns anomalies + ``` elementary.all_columns_anomalies ``` @@ -65,7 +69,9 @@ The tests are configured and executed like any other tests in your project. 
### Column tests - + +#### - Columns anomalies + ``` elementary.column_anomalies ``` @@ -75,7 +81,7 @@ The tests are configured and executed like any other tests in your project. -#### Adding tests examples: +### Adding tests examples From d87259bb5648132377e39cbaa8c52d98c662ae43 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 11:28:42 +0300 Subject: [PATCH 174/194] Update add-elementary-tests.mdx --- docs/guides/add-elementary-tests.mdx | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/guides/add-elementary-tests.mdx b/docs/guides/add-elementary-tests.mdx index 0d76fab76..5450eca19 100644 --- a/docs/guides/add-elementary-tests.mdx +++ b/docs/guides/add-elementary-tests.mdx @@ -14,8 +14,6 @@ The tests are configured and executed like any other tests in your project. Demo -## Available anomaly detection tests - ### Table (model / source) tests From 7cffee9d04a934b5589a90958c1e2979e356c12d Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Tue, 13 Jun 2023 18:02:04 +0300 Subject: [PATCH 175/194] Update where-expression.mdx --- .../anomaly-detection-configuration/where-expression.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/guides/anomaly-detection-configuration/where-expression.mdx b/docs/guides/anomaly-detection-configuration/where-expression.mdx index 7413e4482..8a70b8995 100644 --- a/docs/guides/anomaly-detection-configuration/where-expression.mdx +++ b/docs/guides/anomaly-detection-configuration/where-expression.mdx @@ -28,10 +28,10 @@ models: where_expression: "loaded_at is not null" ``` -```yml dbt_project.yml +```yml dbt_project.yml vars: - timestamp_column: "loaded_at > '2022-01-01'" + where_expression: "loaded_at > '2022-01-01'" ``` - \ No newline at end of file + From 0e0925b48e794c6cf38ac63a9c52878de614de8a Mon Sep 17 00:00:00 2001 From: Alex Alves Date: Wed, 14 Jun 2023 10:05:17 +0200 Subject: [PATCH 176/194] Update how-anomaly-detection-works.mdx Link to guides/data-anomaly-detection was not 
working --- docs/guides/how-anomaly-detection-works.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/how-anomaly-detection-works.mdx b/docs/guides/how-anomaly-detection-works.mdx index eff6bc2be..aa492a2de 100644 --- a/docs/guides/how-anomaly-detection-works.mdx +++ b/docs/guides/how-anomaly-detection-works.mdx @@ -54,7 +54,7 @@ To calculate how data changes over time and detect issues, we split the data int For example, if we use daily time bucket and monitor for row count anomalies, we will count new rows per day. ### Detection algorithm -Read about it in [data anomaly detection](/guides/data_anomaly_detection). +Read about it in [data anomaly detection](/guides/data-anomaly-detection). @@ -85,4 +85,4 @@ Configuration params related directly to the test's core concepts: **Monitored data set** - [where_expression](/guides/anomaly-detection-configuration/where-expression) -- [dimensions](/guides/anomaly-detection-configuration/dimensions) \ No newline at end of file +- [dimensions](/guides/anomaly-detection-configuration/dimensions) From 5bc8f8df90d84c54e3a604b2ea7f2eaeabd59622 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 14 Jun 2023 15:51:51 +0300 Subject: [PATCH 177/194] new release notes --- docs/guides/alerts-configuration.mdx | 489 ++++++++++++++++++++ docs/guides/data-anomaly-detection.mdx | 3 +- docs/guides/how-anomaly-detection-works.mdx | 2 +- docs/mint.json | 8 +- docs/quickstart/send-slack-alerts.mdx | 450 ++---------------- 5 files changed, 527 insertions(+), 425 deletions(-) create mode 100644 docs/guides/alerts-configuration.mdx diff --git a/docs/guides/alerts-configuration.mdx b/docs/guides/alerts-configuration.mdx new file mode 100644 index 000000000..529ebdba5 --- /dev/null +++ b/docs/guides/alerts-configuration.mdx @@ -0,0 +1,489 @@ +--- +title: "Alerts Configuration and Customization" +sidebarTitle: "Alerts configuration" +--- + +You can enrich your alerts by adding properties to tests and models in your 
`.yml` files. +The supported attributes are: description, tags, owner, subscribers. + +You can configure and customize your alerts by configuring: +custom channel, alert fields, alert grouping, alert filters, suppression interval. + + +## Alert properties in `.yml` files + +Elementary prioritizes configuration in the following order: + +**For models / sources:** +1. Model config block. +2. Model properties. +3. Model path configuration under `models` key in `dbt_project.yml`. + +**For tests:** +1. Test properties. +2. Tests path configuration under `tests` key in `dbt_project.yml`. +3. Parent model configuration. + +
+ 
+  meta:
+       owner: "@jessica.jones"
+       subscribers: ["@jessica.jones", "@joe.joseph"]
+       tags: ["#marketing", "#data_ops"]
+       channel: data_ops
+       description: "This is the test description"
+       alert_suppression_interval: 24
+       alert_fields: ["description", "owners", "tags", "subscribers"]
+       slack_group_alerts_by: table
+ 
+
+ + +### Alert content + +#### Owner + +Elementary enriches alerts with [owners for models or tests](https://docs.getdbt.com/reference/resource-configs/meta#designate-a-model-owner). +- If you want the owner to be tagged on Slack, use '@' and the email prefix of the Slack user (@jessica.jones to tag jessica.jones@marvel.com). +- You can configure a single owner or a list of owners (`["@jessica.jones", "@joe.joseph"]`). + + + +```yml model +models: + - name: my_model_name + meta: + owner: "@jessica.jones" +``` + +```yml test +tests: + - not_null: + meta: + owner: ["@jessica.jones", "@joe.joseph"] +``` + +```yml test/model config block +{{ config( + tags=["Tag1","Tag2"], + meta={ + "description": "This is a description", + "owner": "@jessica.jones" + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +owner: "@jessica.jones" + +tests: + path: + subfolder: + +meta: + +owner: "@jessica.jones" +``` + + + +#### Subscribers + +If you want additional users besides the owner to be tagged on an alert, add them as subscribers. +- If you want the subscriber to be tagged on Slack, use '@' and the email prefix of the Slack user (@jessica.jones to tag jessica.jones@marvel.com). +- You can configure a single subscriber or a list (`["@jessica.jones", "@joe.joseph"]`). + + + +```yml model +models: + - name: my_model_name + meta: + subscribers: "@jessica.jones" +``` + +```yml test +tests: + - not_null: + meta: + subscribers: ["@jessica.jones", "@joe.joseph"] +``` + +```yml test/model config block +{{ config( + meta={ + "subscribers": "@jessica.jones" + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +subscribers: "@jessica.jones" + +tests: + path: + subfolder: + +meta: + +subscribers: "@jessica.jones" +``` + + + + +#### Test description + +Elementary supports configuring a description for tests, which is included in alerts.
+It's recommended to add an explanation of what it means if this test fails, so the alert will include this context. + + + +```yml test +tests: + - not_null: + meta: + description: "This is the test description" +``` + +```yml test config block +{{ config( + tags=["Tag1","Tag2"], + meta={ + "description": "This is the test description" + } +) }} +``` + +```yml dbt_project.yml +tests: + path: + subfolder: + +meta: + +description: "This is the test description" +``` + + + +#### Tags + +You can use [tags](https://docs.getdbt.com/reference/resource-configs/tags) to provide context to your alerts. + +- You can tag a group or a channel in a Slack alert by adding `#channel_name` as a tag. +- Tags are aggregated, so a test alert will include both the test and the parent model tags. + + + +```yml model +models: + - name: my_model_name + tags: ["#marketing", "#data_ops"] +``` + +```yml test +tests: + - not_null: + tags: ["#marketing", "#data_ops"] +``` + +```yml test/model config block +{{ config( + tags=["#marketing", "#data_ops"] + +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +tags: ["#marketing", "#data_ops"] + +tests: + path: + subfolder: + +tags: ["#marketing", "#data_ops"] +``` + + + + +### Alerts distribution + +Elementary allows you to customize alerts to distribute the right information to the right people. +This way you can ensure your alerts are valuable and avoid alert fatigue. + +#### Custom channel + +Elementary supports configuring custom Slack channels for models and tests. +By default, Elementary uses the Slack channel that was configured in the Slack integration.
+ + + +```yml model +models: + - name: my_model_name + meta: + channel: data_ops +``` + +```yml test +tests: + - not_null: + meta: + channel: data_ops +``` + +```yml test/model config block +{{ config( + meta={ + "channel": "data_ops" + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +channel: data_ops + +tests: + path: + subfolder: + +meta: + +channel: data_ops +``` + + + +#### Suppression interval + +Don’t want to get multiple alerts if the same test keeps failing? +You can configure an `alert_suppression_interval`, a “snooze” period for alerts on the same issue. + +The accepted value is in hours, so a 1-day snooze is `alert_suppression_interval: 24`. +Elementary won't send new alerts on the same issue that are generated within the suppression interval. + + + +```yml model +models: + - name: my_model_name + meta: + alert_suppression_interval: 24 +``` + +```yml test +tests: + - not_null: + meta: + alert_suppression_interval: 12 +``` + +```yml test/model config block +{{ config( + meta={ + "alert_suppression_interval": 24 + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +alert_suppression_interval: 24 + +tests: + path: + subfolder: + +meta: + +alert_suppression_interval: 48 +``` + + + +#### Group alerts by table + +By default, Elementary sends a single alert to notify on each failure with extensive information for fast triage. + +Elementary also supports grouping alerts by table. +In this case, a single Slack notification will be generated containing all issues associated with this table. +The created notification will contain a union of the relevant owners, tags and subscribers. + +Due to their nature, grouped alerts will contain less information on each issue.
+ + + + +```yml model +models: + - name: my_model_name + meta: + slack_group_alerts_by: table +``` + +```yml test +tests: + - not_null: + meta: + slack_group_alerts_by: table +``` + +```yml test/model config block +{{ config( + meta={ + "slack_group_alerts_by": "table" + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +slack_group_alerts_by: table + +tests: + path: + subfolder: + +meta: + +slack_group_alerts_by: table +``` + + + + +#### Alert fields + + +**Currently this feature is supported only by test alerts!** + + +You can decide which fields to include in the alert, and create a format of alert that fits your use case and recipients. +By default, all the fields are included in the alert. + +Supported alert fields: + +- table: Displays the table name of the test +- column: Displays the column name of the test +- description: Displays the description of the test +- owners: Displays the owners of the model on which the test is running +- tags: Displays the dbt tags of the test/model +- subscribers: Displays the subscribers of the test/model +- result_message: Displays the returned message from the test result +- test_parameters: Displays the parameters that were provided to the test +- test_query: Displays the query of the test +- test_results_sample: Displays a sample of the test results + + + +```yml model +models: + - name: my_model_name + meta: + alert_fields: ["description", "owners", "tags", "subscribers"] +``` + +```yml test +tests: + - not_null: + meta: + alert_fields: ["description", "owners", "tags", "subscribers"] +``` + +```yml test/model config block +{{ config( + meta={ + "alert_fields": "['description', 'owners', 'tags', 'subscribers']" + } +) }} +``` + +```yml dbt_project.yml +models: + path: + subfolder: + +meta: + +alert_fields: ["description", "owners", "tags", "subscribers"] + +tests: + path: + subfolder: + +meta: + +alert_fields: ["description", "owners", "tags", "subscribers"] +``` + + + +## Alerts global 
configuration + +#### Enable/disable alerts + +You can choose to enable / disable alert types by adding a var to your `dbt_project.yml`. + +Below are the available vars and their default config: + +```yml dbt_project.yml +vars: + disable_model_alerts: false + disable_test_alerts: false + disable_warn_alerts: false + disable_skipped_model_alerts: true + disable_skipped_test_alerts: true +``` + +## Alerts CLI flags + +#### Filter alerts + +Elementary supports filtering alerts using a selector, and sending only the selected alerts. +You can filter the alerts by tag, owner or model. + +If you run `edr` from the dbt project directory (or pass `--project-dir`), you can use any of the dbt selectors. + + + +```shell tag filter +edr monitor --select tag:critical +edr monitor --select tag:finance +``` + +```shell owner filter +edr monitor --select config.meta.owner:@jeff +edr monitor --select config.meta.owner:@jessy +``` + +```shell model filter +edr monitor --select model:customers +edr monitor --select model:orders + +edr monitor --select customers +edr monitor --select orders +``` + + + + +#### Group alerts by table + +By default, Elementary sends a single alert to notify on each failure with extensive information for fast triage. + +Elementary also supports grouping alerts by table. +In this case, a single Slack notification will be generated containing all issues associated with this table. +The created notification will contain a union of the relevant owners, tags and subscribers. + +Due to their nature, grouped alerts will contain less information on each issue. 
+ +```shell +edr monitor --group-by table +``` + diff --git a/docs/guides/data-anomaly-detection.mdx b/docs/guides/data-anomaly-detection.mdx index 796c31e95..795996025 100644 --- a/docs/guides/data-anomaly-detection.mdx +++ b/docs/guides/data-anomaly-detection.mdx @@ -1,5 +1,6 @@ --- -title: "Data anomaly detection" +title: "Data anomaly detection method" +sidebarTitle: "Detection method" --- Elementary uses "[standard score](https://en.wikipedia.org/wiki/Standard_score)", also known as "Z-score" for anomaly detection. This score represents the number of standard deviations of a value from the average of a set of values. diff --git a/docs/guides/how-anomaly-detection-works.mdx b/docs/guides/how-anomaly-detection-works.mdx index aa492a2de..04b16330b 100644 --- a/docs/guides/how-anomaly-detection-works.mdx +++ b/docs/guides/how-anomaly-detection-works.mdx @@ -1,6 +1,6 @@ --- title: "Elementary anomaly detection tests" -sidebarTitle: "Core concepts" +sidebarTitle: "Data anomaly detection" --- Elementary dbt package includes **anomaly detection tests, implemented as [dbt tests](https://docs.getdbt.com/docs/building-a-dbt-project/tests)**. 
diff --git a/docs/mint.json b/docs/mint.json index a42bd58d0..1171f3e4e 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -95,7 +95,13 @@ "guides/share-observability-report/send-report-summary" ] }, - "quickstart/send-slack-alerts", + { + "group": "Send Slack alerts", + "pages": [ + "quickstart/send-slack-alerts", + "guides/alerts-configuration" + ] + }, "guides/add-elementary-tests", "guides/add-schema-tests", "guides/python-tests" diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index 436d5b935..e0226e07f 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -1,50 +1,45 @@ --- -title: "Send Slack alerts" +title: "Setup Slack alerts" --- -Elementary has a Slack integration to send alerts about failures of dbt tests, Elementary tests, model runs, and source freshness. +Elementary has a Slack integration to send alerts about: +- Failures and/or warnings of dbt tests +- Failures and/or warnings of Elementary tests +- Model runs failures +- Source freshness issues -You can customize the alerts in your `.yml` files by configuring: +You can enrich your alerts by adding properties to tests and models in your `.yml` files. +The supported attributes are: description, tags, owner, subscribers. -- **Description** -- **Tags** -- **Owner** -- **Subscribers** -- **Custom channel** -- **Alert fields** -- **Alert filters** -- **Alert grouping** -- **Suppression interval** +You can configure and customize your alerts by configuring: +custom channel, alert fields, alert grouping, alert filters, suppression interval. -New Slack alert format + +
+ New Slack alert format +
+ -## Before you start -Before you can start using the alerts, make sure to install the dbt package, configure a profile and install the CLI. -This is **required for the alerts to work.** - - - - - - - - +## Setup Slack Integration - + - +**Before you start** - +Before you can start using the alerts, make sure to install the dbt package, configure a profile and install the CLI. +This is **required for the alerts to work.** - +1. A working Python installation +2. [pip installer](https://pip.pypa.io/en/stable/) for Python +3. Access and credentials to a data warehouse supported by Elementary - +We also recommend you work with a [Python virtual environment](https://docs.python.org/3/library/venv.html). -## Setup Slack Integration + @@ -61,395 +56,6 @@ Or just `edr monitor` if you used `config.yml`. --- -## Enable/disable alerts - -By default, alerts are sent on failed tests, errored models and errored snapshots. -You can choose to enable / disable alert types by adding a var to your `dbt_project.yml`. - -Below are the available vars and their default config: - -```yml dbt_project.yml -vars: - # Alerts configuration vars # - # All set to false by default # - disable_model_alerts: false - disable_test_alerts: false - disable_warn_alerts: false - disable_skipped_model_alerts: true - disable_skipped_test_alerts: true -``` - - -## Alert properties - -In your `.yml` files, add the following properties to models / tests: - - - - - -Elementary enriches alerts with [table owners](https://docs.getdbt.com/reference/resource-configs/meta#designate-a-model-owner)). - -If you want to tag a model owner in a slack alert: -- Use '@' and the email prefix of the slack user. 
-- For example, if we want to tag a user named Jessica with an email jessica.jones@marvel.com in our Slack workspace, simply add the email prefix (with lower case) jessica.jones as follows to your model schema.yml / properties.yml: - -```yml properties.yml -models: - - name: my_model_name - meta: - owner: "@jessica.jones" -``` - -It is possible to tag multiple owners as well: - -```yml properties.yml -models: - - name: my_model_name - meta: - owner: ["@jessica.jones", "@joe.joseph"] -``` - - - - - -Elementary supports configuring description to tests alerts. - -To set it up, simply add the description to your test in the `properties.yml` - -```yml properties.yml -tests: - - test_name: - meta: - description: "This is the test description" -``` - - - - - - -You can use [tags](https://docs.getdbt.com/reference/resource-configs/tags) to provide context to your alerts. - -You can also use it to tag a group or a channel in a slack alert: - -- Add it as model tag and use '#' as the prefix of the channel name. -- For example, to tag the marketing team's data ops channel add the following to your `model schema.yml` - / `properties.yml`. - -```yml properties.yml -tests: - - test_name: - meta: - tags: ["#marketing", "#support"] -``` - - - - - - If you want to tag users on an alert: -- Use '@' and the email prefix of your slack user, and to 'subscribers' under a meta field to your `properties.yml` file. -- For example, if we want to tag a user named Jessica with an email jessica.jones@marvel.com in our Slack workspace, use "@jessica.jones". 
- -```yml properties.yml -models: - - name: my_model_name - meta: - alerts_config: - subscribers: "@jessica.jones" - columns: - - name: column_name - tests: - - unique: - meta: - alerts_config: - subscribers: "@luke.cage" -``` - -It is possible to tag multiple subscribers as well: - -```yml properties.yml -models: - - name: my_model_name - meta: - alerts_config: - subscribers: ["@jessica.jones", "@luke.cage"] -``` - - - - - - -## Alert configuration - -Elementary allows you to customize alerts to distribute the right information to the right people. This way you can ensure your alerts are valuable and to avoid alert fatigue. - - - - - - -By default Elementary uses the Slack channel that was configured in the Slack integration. -Elementary supports configuring custom slack channels that are configured on your models / sources / tests and snapshots. - -- If you configure a custom slack channel for a model, all the test alerts that belong to this model will be sent to this custom slack channel. -- If you configure a custom slack channel for both a model and a test, the test channel will override the model channel. 
-- If you configure a custom slack channel and you decide to group your alerts by table into a single message, it will be sent to the model channel (even if a differnt channel was configured on the test level) - - -To set it up, simply add the relevant channel to your models in the `properties.yml`: - -```yml properties.yml -models: - - name: marketing_leads - meta: - alerts_config: - channel: marketing_data_ops -``` - -If your models / tests are in folders by department / team, another useful option is to configure the channel in -your `dbt_project.yml` file: - -```yml dbt_project.yml -models: - marketing_bi: - +meta: - alerts_config: - channel: marketing_data_ops - -tests: - marketing_bi: - +meta: - alerts_config: - channel: marketing_data_ops -``` - -You can also configure a custom slack channel for a specific test: - -```yml properties.yml -models: - - name: marketing_leads - columns: - - name: column_name - tests: - - unique: - meta: - alerts_config: - channel: marketing_data_ops -``` - - - - - -**Currently this feature is supported only by test alerts!** - - -Elementary supports the following alert fields: - -- table: Displays the table name of the test -- column: Displays the column name of the test -- description: Displays the description of the test -- owners: Displays the owners of the model on which the test is running -- tags: Displays the dbt tags of the test/model -- subscribers: Displays the subscribers of the test/model -- result_message: Displays the returned message from the test result -- test_parameters: Displays the parameters that were provided to the test -- test_query: Displays the query of the test -- test_results_sample: Displays a sample of the test results - -By default, all of the fields are shown in the alerts. -Elementary supports configuring alert fields on your dbt project / models and tests. -- If you configure alert fields on your dbt project, all the test alerts of all of your tests will display only the configured alert fields. 
-- If you configure alert fields for a model, all the test alerts that belong to this model will display only the configured alert fields. -- If you configure alert fields for both a model and a test, the test configured alert fields will override the model configured alert fields (same as for the dbt project configured alert fields). - -To set it up globaly for your project, add the desired alert fields to your models and tests in the `dbt_project.yml` file: - -```yml dbt_project.yml -models: - marketing_leads: - +meta: - alerts_config: - alert_fields: ["description", "owners", "tags", "subscribers"] - -tests: - marketing_leads: - +meta: - alerts_config: - alert_fields: ["description", "owners", "tags", "subscribers"] -``` - -To set it up for a model, add the desired alert fields to your model in the properties.yml: - -```yml properties.yml -models: - - name: marketing_leads - meta: - alerts_config: - alert_fields: ["description", "owners", "tags", "subscribers"] -``` - -You can also configure alert fields for a specific test: - -```yml properties.yml -models: - - name: marketing_leads - columns: - - name: column_name - tests: - - unique: - meta: - alerts_config: - alert_fields: ["description", "owners", "tags", "subscribers"] -``` - - - - - - - -Elementary supports filtering alerts using a selector. -Elementry `edr monitor` command will notify only on the selector's matched alerts. - -There are 3 selectors supported by elementary: - -- tag - Notify on models/sources/tests that are tagged with the provided tag selector (notice that tests can be matched on their model's/source's tag). -- owner - Notify on models/sources/tests that their owner is provided owner selector (notice that tests can be matched on their model's/source's owner). -- model - Notify on the model/source and its tests. 
- -To filter alerts by tag: - -```shell -edr monitor --select tag:critical -edr monitor --select tag:finance -``` - -To filter alerts by owner: - -```shell -edr monitor --select config.meta.owner:@jeff -edr monitor --select config.meta.owner:@jessy -``` - -To filter alerts by model: - -```shell -edr monitor --select model:customers -edr monitor --select model:orders - -edr monitor --select customers -edr monitor --select orders -``` - - - - - -Elementary support configuring suppression interval for alerts. -By default, the suppression interval for all of the alerts is set to 0. -Elementary won't send any alert that is generated within suppression interval. - -`alert_suppression_interval` can accept values greater than 0, including unrounded numbers - this number represents the number of hours for which alerts will be skipped. - -To set it up globaly for your project, add the alert suppression interval to your models and tests in the `dbt_project.yml` file: - -```yml dbt_project.yml -models: - marketing_leads: - +meta: - alerts_config: - alert_suppression_interval: 24 - - -tests: - marketing_leads: - +meta: - alerts_config: - alert_suppression_interval: 24 -``` - -To set it up for a model, add the desired alert suppression interval to your model in the properties.yml: - -```yml properties.yml -models: - - name: marketing_leads - meta: - alerts_config: - alert_suppression_interval: 24 -``` - -You can also configure alert suppression interval for a specific test: - -```yml properties.yml -models: - - name: marketing_leads - columns: - - name: column_name - tests: - - unique: - meta: - alerts_config: - alert_suppression_interval: 24 -``` - - - - -By default, Elementary sends a single alert to notify on each failure. When using single alerts, the alert will include extensive information for fast triage. - -Elementary also supports grouping alerts by table. 
In this case, a single Slack notification will be generated containing all test warnings/failures/errors as well as the errors associated with the model. The created notification will contain a union of the relevant owners, tags and subscribers. Due to their nature, grouped alerts will contain less information on each issue. As always, you can use our ([detailed report](/quickstart/generate-report-ui)) for easy triage. - -To group alerts by table: - -```shell -edr monitor --group-by table -``` - -Grouping can also be configured through the yml files. To set it up globaly for your project, add the configuration to your models in the dbt_project.yml file: - - ```yml dbt_project.yml -models: - marketing_bi: - +meta: - alerts_config: - # alerts on models in marekting_bi should be grouped by table: - slack_group_alerts_by: table - -``` - -To set it up for a model, add the configuration to your model in the properties.yml: - - ```yml properties.yml -models: - - name: marketing_leads - meta: - alerts_config: - # all alerts on marketing_leads should group together to one slack message: - slack_group_alerts_by: table -``` - -Grouping by table can be configured globally (in the dbt_project.yml) but if you wish to override it for a specific model where you want a single alert for each failure, you can add the configuration to your model in the properties.yml: - - ```yml properties.yml -models: - - name: marketing_leads - meta: - alerts_config: - # alerts on marketing_leads will not be grouped: - slack_group_alerts_by: alert -``` - - - - - - - ## Alert on source freshness failures _Not supported in dbt cloud_ From 9866ec392006d85ab34d1e3fa58aef97d6dfce16 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 14 Jun 2023 15:53:58 +0300 Subject: [PATCH 178/194] new release notes --- docs/guides/alerts-configuration.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/alerts-configuration.mdx b/docs/guides/alerts-configuration.mdx index 
529ebdba5..8cc8f8783 100644 --- a/docs/guides/alerts-configuration.mdx +++ b/docs/guides/alerts-configuration.mdx @@ -26,7 +26,7 @@ Elementary prioritizes configuration in the following order:
  
-  meta:
+        meta:
        owner: "@jessica.jones"
        subscribers: ["@jessica.jones", "@joe.joseph"]
        tags: ["#marketing", "#data_ops"]

From 747a4b1e0fbba6775049d45731a9e96a7b9a3e91 Mon Sep 17 00:00:00 2001
From: Maayan-s 
Date: Wed, 14 Jun 2023 15:58:26 +0300
Subject: [PATCH 179/194] new release notes

---
 docs/guides/alerts-configuration.mdx  | 12 ++++++------
 docs/quickstart/send-slack-alerts.mdx |  8 ++++----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/guides/alerts-configuration.mdx b/docs/guides/alerts-configuration.mdx
index 8cc8f8783..9930ccd47 100644
--- a/docs/guides/alerts-configuration.mdx
+++ b/docs/guides/alerts-configuration.mdx
@@ -4,10 +4,10 @@ sidebarTitle: "Alerts configuration"
 ---
 
 You can enrich your alerts by adding properties to tests and models in your `.yml` files.
-The supported attributes are: description, tags, owner, subscribers.
+The supported attributes are: [owner](/guides/alerts-configuration#owner), [subscribers](/guides/alerts-configuration#subscribers), [description](/guides/alerts-configuration#test-description), [tags](/guides/alerts-configuration#tags).
 
 You can configure and customize your alerts by configuring:
-custom channel, alert fields, alert grouping, alert filters, suppression interval.
+[custom channel](/guides/alerts-configuration#custom-channel), [suppression interval](/guides/alerts-configuration#suppression-interval), [alert fields](/guides/alerts-configuration#alert-fields), [alert grouping](/guides/alerts-configuration#group-alerts-by-table), [alert filters](/guides/alerts-configuration#filter-alerts).
 
 
 ## Alert properties in `.yml` files
@@ -29,12 +29,12 @@ Elementary prioritizes configuration in the following order:
         meta:
        owner: "@jessica.jones"
        subscribers: ["@jessica.jones", "@joe.joseph"]
+       description: "This is the test description"
        tags: ["#marketing", "#data_ops"]
-       channel: data_ops
-       description: "This is the test description"
-       alert_suppression_interval: 24
+       channel: data_ops
+       alert_suppression_interval: 24
+       slack_group_alerts_by: table
        alert_fields: ["description", "owners", "tags", "subscribers"]
-       slack_group_alerts_by: table
  
 
diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index e0226e07f..6a1518d42 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -8,11 +8,11 @@ Elementary has a Slack integration to send alerts about: - Model runs failures - Source freshness issues -You can enrich your alerts by adding properties to tests and models in your `.yml` files. -The supported attributes are: description, tags, owner, subscribers. +You can enrich your alerts by adding properties to tests and models in your `.yml` files. +The supported attributes are: [owner](/guides/alerts-configuration#owner), [subscribers](/guides/alerts-configuration#subscribers), [description](/guides/alerts-configuration#test-description), [tags](/guides/alerts-configuration#tags). -You can configure and customize your alerts by configuring: -custom channel, alert fields, alert grouping, alert filters, suppression interval. +You can configure and customize your alerts by configuring: +[custom channel](/guides/alerts-configuration#custom-channel), [suppression interval](/guides/alerts-configuration#suppression-interval), [alert fields](/guides/alerts-configuration#alert-fields), [alert grouping](/guides/alerts-configuration#group-alerts-by-table), [alert filters](/guides/alerts-configuration#filter-alerts).
From af4fa96ed0a00002488ef68f19056fcda85e4f44 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 14 Jun 2023 15:59:38 +0300 Subject: [PATCH 180/194] new release notes --- docs/guides/alerts-configuration.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/alerts-configuration.mdx b/docs/guides/alerts-configuration.mdx index 9930ccd47..9ff47dc1d 100644 --- a/docs/guides/alerts-configuration.mdx +++ b/docs/guides/alerts-configuration.mdx @@ -26,7 +26,7 @@ Elementary prioritizes configuration in the following order:
  
-        meta:
+     meta:
        owner: "@jessica.jones"
        subscribers: ["@jessica.jones", "@joe.joseph"]
        description: "This is the test description"

From 9254fbc95b610cd789fa1b34185cba0c49e9da03 Mon Sep 17 00:00:00 2001
From: Maayan-s 
Date: Wed, 14 Jun 2023 16:00:14 +0300
Subject: [PATCH 181/194] new release notes

---
 docs/guides/alerts-configuration.mdx | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/guides/alerts-configuration.mdx b/docs/guides/alerts-configuration.mdx
index 9ff47dc1d..18a454d00 100644
--- a/docs/guides/alerts-configuration.mdx
+++ b/docs/guides/alerts-configuration.mdx
@@ -26,15 +26,15 @@ Elementary prioritizes configuration in the following order:
 
 
  
-     meta:
-       owner: "@jessica.jones"
-       subscribers: ["@jessica.jones", "@joe.joseph"]
-       description: "This is the test description"
-       tags: ["#marketing", "#data_ops"]
-       channel: data_ops
-       alert_suppression_interval: 24
-       slack_group_alerts_by: table
-       alert_fields: ["description", "owners", "tags", "subscribers"]
+   meta:
+     owner: "@jessica.jones"
+     subscribers: ["@jessica.jones", "@joe.joseph"]
+     description: "This is the test description"
+     tags: ["#marketing", "#data_ops"]
+     channel: data_ops
+     alert_suppression_interval: 24
+     slack_group_alerts_by: table
+     alert_fields: ["description", "owners", "tags", "subscribers"]
  
 
From d6fc69295178bf6376726c30ada3b3433a6e2fce Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 14 Jun 2023 17:13:39 +0300 Subject: [PATCH 182/194] new release notes --- docs/quickstart/send-slack-alerts.mdx | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index 6a1518d42..6d1a2c3f8 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -30,15 +30,9 @@ You can configure and customize your alerts by configuring: **Before you start** -Before you can start using the alerts, make sure to install the dbt package, configure a profile and install the CLI. +Before you can start using the alerts, make sure to [install the dbt package](/quickstart), [configure a profile and install the CLI](/quickstart-cli). This is **required for the alerts to work.** -1. A working Python installation -2. [pip installer](https://pip.pypa.io/en/stable/) for Python -3. Access and credentials to a data warehouse supported by Elementary - -We also recommend you work with a [Python virtual environment](https://docs.python.org/3/library/venv.html). 
- From 588ed4ce779271c8e0e47ab28f8311e68ff4e3fc Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Wed, 14 Jun 2023 17:16:08 +0300 Subject: [PATCH 183/194] Update send-report-summary.mdx --- docs/guides/share-observability-report/send-report-summary.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/share-observability-report/send-report-summary.mdx b/docs/guides/share-observability-report/send-report-summary.mdx index da5fb59d2..c12be8f59 100644 --- a/docs/guides/share-observability-report/send-report-summary.mdx +++ b/docs/guides/share-observability-report/send-report-summary.mdx @@ -21,7 +21,7 @@ After you [set up a Slack app and token](/integrations/slack#slack-integration-s AWS S3: ```shell -edr send-report --aws-profile-name --s3-bucket-name --slack-token --slack-channel-name +edr send-report --aws-profile-name --s3-bucket-name --slack-token --slack-channel-name --update-bucket-website true ``` GCS: From 87897ace4501e07996b357e1a5ef29ce55df1559 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Wed, 14 Jun 2023 17:16:45 +0300 Subject: [PATCH 184/194] new release notes --- docs/quickstart/send-slack-alerts.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index 6d1a2c3f8..8ce5b8680 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -32,7 +32,7 @@ You can configure and customize your alerts by configuring: Before you can start using the alerts, make sure to [install the dbt package](/quickstart), [configure a profile and install the CLI](/quickstart-cli). This is **required for the alerts to work.** - +
From 573501c2ac3c8e8655ab3ad9e9bb50ae51964925 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Wed, 14 Jun 2023 22:00:18 +0300 Subject: [PATCH 185/194] Update send-slack-alerts.mdx --- docs/quickstart/send-slack-alerts.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index 8ce5b8680..7dbe7f713 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -32,7 +32,8 @@ You can configure and customize your alerts by configuring: Before you can start using the alerts, make sure to [install the dbt package](/quickstart), [configure a profile and install the CLI](/quickstart-cli). This is **required for the alerts to work.** -
+ + From a76c76cdecf72a2b866d7cf4c4d692677a65c410 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Wed, 14 Jun 2023 22:23:50 +0300 Subject: [PATCH 186/194] Update send-slack-alerts.mdx --- docs/quickstart/send-slack-alerts.mdx | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/quickstart/send-slack-alerts.mdx b/docs/quickstart/send-slack-alerts.mdx index 7dbe7f713..c75a6f1e1 100644 --- a/docs/quickstart/send-slack-alerts.mdx +++ b/docs/quickstart/send-slack-alerts.mdx @@ -32,8 +32,7 @@ You can configure and customize your alerts by configuring: Before you can start using the alerts, make sure to [install the dbt package](/quickstart), [configure a profile and install the CLI](/quickstart-cli). This is **required for the alerts to work.** - - +
From 3d4d3128feb6873389588ea0e3f8ce9722e9f379 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 13:12:12 +0300 Subject: [PATCH 187/194] Update create-profile.mdx --- docs/cloud/onboarding/create-profile.mdx | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/cloud/onboarding/create-profile.mdx b/docs/cloud/onboarding/create-profile.mdx index 1d842baa2..b7a7cb6f6 100644 --- a/docs/cloud/onboarding/create-profile.mdx +++ b/docs/cloud/onboarding/create-profile.mdx @@ -43,7 +43,8 @@ elementary: ## User/password auth ## user: [username] password: [password] - + + port: 5439 role: [user role] database: [database name] warehouse: [warehouse name] @@ -129,4 +130,4 @@ elementary: ### What's next? 1. [Singup to Elementary cloud](/cloud/sonboarding/signup). -2. [Connect your Elementary schema to Elementary cloud](/cloud/onboarding/connect-data-warehouse). \ No newline at end of file +2. [Connect your Elementary schema to Elementary cloud](/cloud/onboarding/connect-data-warehouse). 
From 120b671a243d4c24d96eca8da845c846c4bbbcfa Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 13:12:30 +0300 Subject: [PATCH 188/194] Update redshift-profile.mdx --- docs/_snippets/profiles/redshift-profile.mdx | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/_snippets/profiles/redshift-profile.mdx b/docs/_snippets/profiles/redshift-profile.mdx index be90541e1..ccd08aecd 100644 --- a/docs/_snippets/profiles/redshift-profile.mdx +++ b/docs/_snippets/profiles/redshift-profile.mdx @@ -14,6 +14,7 @@ elementary: user: [username] password: [password] + port: 5439 dbname: [database name] schema: [schema name]_elementary threads: 4 From b8aec4a13b82b14ac171f0bb8c2620f61f31d69d Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 15:34:58 +0300 Subject: [PATCH 189/194] Update security-and-privacy.mdx --- docs/cloud/general/security-and-privacy.mdx | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/cloud/general/security-and-privacy.mdx b/docs/cloud/general/security-and-privacy.mdx index 8eeecdfd3..6c1013112 100644 --- a/docs/cloud/general/security-and-privacy.mdx +++ b/docs/cloud/general/security-and-privacy.mdx @@ -45,9 +45,14 @@ To avoid this sampling, set the var `test_sample_rows_count: 0` in your `dbt_pro ## Compliance + + **SOC 2** + Elementary is currently in the process of obtaining SOC2 and ISO27001 compliance. + + [Contact us](mailto:legal@elementary-data.com) for auditing reports and penetration testing results. ## Have more questions? We would be happy to answer! -Reach out to us on [email](mailto:legal@elementary-data.com) or [Slack](https://join.slack.com/t/elementary-community/shared_invite/zt-1b9vogqmq-y~IRhc2396CbHNBXLsrXcA). \ No newline at end of file +Reach out to us on [email](mailto:legal@elementary-data.com) or [Slack](https://join.slack.com/t/elementary-community/shared_invite/zt-1b9vogqmq-y~IRhc2396CbHNBXLsrXcA). 
From 17f2de81449ad723eba6b961d5876769b641858f Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 15:35:38 +0300 Subject: [PATCH 190/194] Update security-and-privacy.mdx --- docs/cloud/general/security-and-privacy.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/cloud/general/security-and-privacy.mdx b/docs/cloud/general/security-and-privacy.mdx index 6c1013112..ff2ae1979 100644 --- a/docs/cloud/general/security-and-privacy.mdx +++ b/docs/cloud/general/security-and-privacy.mdx @@ -46,7 +46,8 @@ To avoid this sampling, set the var `test_sample_rows_count: 0` in your `dbt_pro ## Compliance - **SOC 2** + **SOC 2 certification** +
Elementary is currently in the process of obtaining SOC2 and ISO27001 compliance.
From 826fc605717287f9bfebdde42edb6b53d3d61bba Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 15:36:00 +0300 Subject: [PATCH 191/194] Update security-and-privacy.mdx --- docs/cloud/general/security-and-privacy.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/cloud/general/security-and-privacy.mdx b/docs/cloud/general/security-and-privacy.mdx index ff2ae1979..ff8e48bc1 100644 --- a/docs/cloud/general/security-and-privacy.mdx +++ b/docs/cloud/general/security-and-privacy.mdx @@ -47,7 +47,7 @@ To avoid this sampling, set the var `test_sample_rows_count: 0` in your `dbt_pro **SOC 2 certification** -
+ Elementary is currently in the process of obtaining SOC2 and ISO27001 compliance.
From b07e9e0af25b4cba84b72c83e5bc445e43e76aa4 Mon Sep 17 00:00:00 2001 From: Maayan Salom Date: Thu, 15 Jun 2023 15:36:31 +0300 Subject: [PATCH 192/194] Update security-and-privacy.mdx --- docs/cloud/general/security-and-privacy.mdx | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/cloud/general/security-and-privacy.mdx b/docs/cloud/general/security-and-privacy.mdx index ff8e48bc1..646da0720 100644 --- a/docs/cloud/general/security-and-privacy.mdx +++ b/docs/cloud/general/security-and-privacy.mdx @@ -46,9 +46,7 @@ To avoid this sampling, set the var `test_sample_rows_count: 0` in your `dbt_pro ## Compliance - **SOC 2 certification** - - Elementary is currently in the process of obtaining SOC2 and ISO27001 compliance. + **SOC 2 certification:** Elementary is currently in the process of obtaining SOC2 and ISO27001 compliance. [Contact us](mailto:legal@elementary-data.com) for auditing reports and penetration testing results. From 5eef395cd0803e277e1885078d217d5803043550 Mon Sep 17 00:00:00 2001 From: Maayan-s Date: Tue, 20 Jun 2023 11:05:58 +0300 Subject: [PATCH 193/194] updated on run end info --- docs/dbt/on-run-end_hooks.mdx | 30 +++++++++++++++++++++++++++++- docs/mint.json | 2 +- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/docs/dbt/on-run-end_hooks.mdx b/docs/dbt/on-run-end_hooks.mdx index 9deb9223d..10b540a2f 100644 --- a/docs/dbt/on-run-end_hooks.mdx +++ b/docs/dbt/on-run-end_hooks.mdx @@ -46,7 +46,35 @@ We only run the hooks that are relevant to each run, and each hook creates a min The first time you execute Elementary the initial update might take a while, but the following updates should be quick. **For `dbt 1.3.0` and lower**, these models would be fully updated on each run. -The performance impact depends on the size of your dbt project. +The performance impact depends on the size of your dbt project. 
+ + +**We strongly recommend that Elementary users use dbt 1.4.0 or above.** + + + +Elementary implemented an "artifacts cache" that drastically improves performance. +This required a change to dbt-core, which was first included in the 1.4.0 release. +This means that if you upgrade to dbt 1.4.0 or above, the runtime of the Elementary hooks will improve significantly. + + +**If you can't upgrade, the alternative is:** +1. Disable the automatic upload of artifacts. +2. Upload the artifacts yourself any time you change the project (for example, when you merge a PR). + +**How?** + +1. Add this to `dbt_project.yml`: + +```yaml dbt_project.yml +vars: + disable_dbt_artifacts_autoupload: true +``` + +2. Run `dbt run --select edr.dbt_artifacts` whenever you merge a PR. + + + #### Result models diff --git a/docs/mint.json b/docs/mint.json index 1171f3e4e..6dc801df9 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -111,6 +111,7 @@ "group": "Deployment and Configuration", "pages": [ "deployment-and-configuration/elementary-in-production", + "dbt/on-run-end_hooks", "deployment-and-configuration/collect-job-data", "understand-elementary/cli-install", "understand-elementary/cli-commands" @@ -160,7 +161,6 @@ "pages": [ "understand-elementary/elementary-overview", "guides/modules-overview/dbt-package", - "dbt/on-run-end_hooks", "dbt/dbt-artifacts", "understand-elementary/elementary-report-ui", "understand-elementary/elementary-alerts" From 0c9e814178f81a5efe14c76db8bc57f463784a31 Mon Sep 17 00:00:00 2001 From: Noy Arie Date: Tue, 20 Jun 2023 13:21:02 +0300 Subject: [PATCH 194/194] update urls --- .../collect-job-data.mdx | 31 ++++++++++--------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/docs/deployment-and-configuration/collect-job-data.mdx b/docs/deployment-and-configuration/collect-job-data.mdx index 7ef4aa542..80d6b8c78 100644 --- a/docs/deployment-and-configuration/collect-job-data.mdx +++ 
b/docs/deployment-and-configuration/collect-job-data.mdx @@ -3,13 +3,13 @@ title: "Collect jobs info from orchestrator" sidebarTitle: "Jobs name & info" --- -_Supported in Elementary 0.8.0 and above_ +_Supported in Elementary 0.8.0 and above_ Elementary can collect metadata about your jobs from the orchestrator you are using, and enrich the Elementary report with this information. The goal is to provide context that is useful to triage and resolve data issues, such as: - Is my freshness / volume issue related to a job that didn't complete? Which job? -- Which tables were built as part of the job that loaded data with issues? +- Which tables were built as part of the job that loaded data with issues? - Which job should I rerun to resolve? @@ -19,15 +19,16 @@ The goal is to provide context that is useful to triage and resolve data issues, - Job ID: `job_id` - Job results URL: `job_url` - The ID of a specific run execution: `job_run_id` +- Job run results URL: `job_run_url` -## How Elementary collects jobs metadata? +## How Elementary collects jobs metadata? #### Environment variables Elementary collects jobs metadata in run time from `env_vars`. -Orchestration tools usually have default environment variables, so this might happen automatically. The list of supported orchestrators and default env vars is in the following section. +Orchestration tools usually have default environment variables, so this might happen automatically. The list of supported orchestrators and default env vars is in the following section. These are the env vars that are collected: -`ORCHESTRATOR`, `JOB_NAME`, `JOB_ID`, `JOB_URL`, `JOB_RUN_ID` +`ORCHESTRATOR`, `JOB_NAME`, `JOB_ID`, `JOB_URL`, `JOB_RUN_ID`, `JOB_RUN_URL` To configure `env_var` for your orchestrator, refer to your orchestrator's docs. 
@@ -43,13 +44,13 @@ dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_loa | var / env_var | Format | |------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| -| orchestrator | One of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster`
Any other string would be collected but not presented in the report. | +| orchestrator | One of: `airflow`, `dbt_cloud`, `github_actions`, `prefect`, `dagster` | | job_name, job_id, job_run_id | String | -| job_url | Valid HTTP URL | +| job_url, job_run_url | Valid HTTP URL | -## Which orchestrators are supported? +## Which orchestrators are supported? You can pass job info to Elementary from any orchestration tool as long as you configure `env_vars` / `vars`. @@ -57,26 +58,26 @@ The following default environment variables are supported out of the box: | Orchestrator | Env vars | |----------------|----------------------------------------------------------------------------------------------------------------------------| -| dbt cloud | orchestrator
job_id: `DBT_CLOUD_JOB_ID`
job_run_id: `DBT_CLOUD_RUN_ID` | +| dbt cloud | orchestrator
job_id: `DBT_CLOUD_JOB_ID`
job_run_id: `DBT_CLOUD_RUN_ID`
job_url: generated from `ACCOUNT_ID`, `DBT_CLOUD_PROJECT_ID`, `DBT_CLOUD_JOB_ID`
job_run_url: generated from `ACCOUNT_ID`, `DBT_CLOUD_PROJECT_ID`, `DBT_CLOUD_RUN_ID` | | Github actions | orchestrator
job_run_id: `GITHUB_RUN_ID`
job_url: generated from `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, `GITHUB_RUN_ID` | | Airflow | orchestrator | -## What if I use dbt cloud + orchestrator? +## What if I use dbt cloud + orchestrator? -By default, Elementary will collect the dbt cloud jobs info. -If you wish to override that, change your dbt cloud invocations to pass the orchestrator job info using `--vars`: +By default, Elementary will collect the dbt cloud jobs info. +If you wish to override that, change your dbt cloud invocations to pass the orchestrator job info using `--vars`: ```shell dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' ``` ## Where can I see my job info? -- In your Elementary schema, the raw fields are stored in the table `dbt_invocations`. You could also use the view `job_run_results` which groups invocation by job. -- In the Elementary report, if the info was collected successfully, you can filter the lineage by job and see the details in the node info. +- In your Elementary schema, the raw fields are stored in the table `dbt_invocations`. You could also use the view `job_run_results` which groups invocation by job. +- In the Elementary report, if the info was collected successfully, you can filter the lineage by job and see the details in the node info. ## Can't find your orchestrator? Missing info? -We would love to support more orchestrators and collect more useful info! +We would love to support more orchestrators and collect more useful info! Please [open an issue](https://github.com/elementary-data/elementary/issues/new/choose) and tell us what we should add.