diff --git a/02_activities/assignments/Microcredential_Cohort/Assignment 2 ERD 1 - Bookstore Schema.pdf b/02_activities/assignments/Microcredential_Cohort/Assignment 2 ERD 1 - Bookstore Schema.pdf
new file mode 100644
index 000000000..9bbc2acb1
Binary files /dev/null and b/02_activities/assignments/Microcredential_Cohort/Assignment 2 ERD 1 - Bookstore Schema.pdf differ
diff --git a/02_activities/assignments/Microcredential_Cohort/Assignment 2 ERD 2 - Customer Address SCD.pdf b/02_activities/assignments/Microcredential_Cohort/Assignment 2 ERD 2 - Customer Address SCD.pdf
new file mode 100644
index 000000000..f7767c70d
Binary files /dev/null and b/02_activities/assignments/Microcredential_Cohort/Assignment 2 ERD 2 - Customer Address SCD.pdf differ
diff --git a/02_activities/assignments/Microcredential_Cohort/Assignment2.md b/02_activities/assignments/Microcredential_Cohort/Assignment2.md
index d91d3c9d3..4b60ba4e3 100644
--- a/02_activities/assignments/Microcredential_Cohort/Assignment2.md
+++ b/02_activities/assignments/Microcredential_Cohort/Assignment2.md
@@ -56,7 +56,50 @@ The store wants to keep customer addresses. Propose two architectures for the CU
 **HINT:** search type 1 vs type 2 slowly changing dimensions.
 ```
-Your answer...
+#### Prompt 3
+The store wants to keep customer addresses. Propose two architectures for the CUSTOMER_ADDRESS table, one that will retain changes, and another that will overwrite. Which is type 1, which is type 2?
+
+**Answer:**
+
+### Architecture 1: Overwrite Existing Address (Type 1 Slowly Changing Dimension)
+In this model, the CUSTOMER_ADDRESS table stores only the customer’s current address.
+
+Suggested columns:
+- customer_address_id
+- customer_id
+- street_address
+- city
+- province
+- postal_code
+- country
+
+When a customer updates their address, the existing record is updated using an UPDATE statement. The previous address is overwritten and not retained.
+
+This is a **Type 1 Slowly Changing Dimension** because historical address data is not preserved.
+
+
+### Architecture 2: Retain Address History (Type 2 Slowly Changing Dimension)
+In this model, the CUSTOMER_ADDRESS table stores both current and historical addresses.
+
+Suggested columns:
+- customer_address_id
+- customer_id
+- street_address
+- city
+- province
+- postal_code
+- country
+- effective_start_date
+- effective_end_date
+- is_current
+
+When a customer changes address:
+1. The previous record is marked with an effective_end_date.
+2. A new row is inserted with the updated address and marked as current.
+
+This preserves address history over time.
+
+This is a **Type 2 Slowly Changing Dimension** because historical changes are retained.
 ```
 ***
diff --git a/02_activities/assignments/Microcredential_Cohort/assignment2.sql b/02_activities/assignments/Microcredential_Cohort/assignment2.sql
index 4079c18ae..76d629af4 100644
--- a/02_activities/assignments/Microcredential_Cohort/assignment2.sql
+++ b/02_activities/assignments/Microcredential_Cohort/assignment2.sql
@@ -23,8 +23,11 @@ Edit the appropriate columns -- you're making two edits -- and the NULL rows wil
 All the other rows will remain the same. */
 --QUERY 1
-
-
+SELECT
+product_name || ', ' ||
+coalesce(product_size, '') || ' ('||
+coalesce (product_qty_type, 'unit') || ')'
+FROM product;
 --END QUERY
@@ -41,8 +44,15 @@ HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK().
 Filter the visits to dates before April 29, 2022. */
 --QUERY 2
-
-
+SELECT
+customer_id,
+market_date,
+dense_rank() OVER (
+    PARTITION by customer_id
+    ORDER by market_date
+) as visit_number
+FROM customer_purchases
+WHERE market_date < '2022-04-29';
 --END QUERY
@@ -53,8 +63,21 @@ only the customer’s most recent visit.
 HINT: Do not use the previous visit dates filter.
*/
 --QUERY 3
-
-
+WITH ranked_visits AS (
+    SELECT
+        customer_id,
+        market_date,
+        ROW_NUMBER() OVER (
+            PARTITION BY customer_id
+            ORDER BY market_date DESC
+        ) AS recent_visit_rank
+    FROM customer_purchases
+)
+
+SELECT *
+FROM ranked_visits
+WHERE recent_visit_rank = 1;
+
 --END QUERY
@@ -66,9 +89,16 @@ You can make this a running count by including an ORDER BY within the PARTITION
 Filter the visits to dates before April 29, 2022. */
 --QUERY 4
-
-
-
+SELECT
+customer_id,
+product_id,
+market_date,
+count(*) OVER (
+    PARTITION by customer_id, product_id
+    ) as purchase_count
+    FROM customer_purchases
+    WHERE market_date < '2022-04-29';
+
 --END QUERY
@@ -85,7 +115,14 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for
 Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */
 --QUERY 5
-
+SELECT
+product_name,
+CASE
+    WHEN INSTR(product_name, '-') > 0 THEN
+        TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
+    ELSE NULL
+END AS description
+FROM product;
 --END QUERY
@@ -94,23 +131,50 @@ Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR w
 /* 2. Filter the query to show any product_size value that contain a number with REGEXP. */
 --QUERY 6
-
-
+SELECT *
+FROM product
+WHERE product_size REGEXP '[0-9]';
 --END QUERY
 -- UNION
-/* 1. Using a UNION, write a query that displays the market dates with the highest and lowest total sales.
-
-HINT: There are a possibly a few ways to do this query, but if you're struggling, try the following:
-1) Create a CTE/Temp Table to find sales values grouped dates;
-2) Create another CTE/Temp table with a rank windowed function on the previous query to create
-"best day" and "worst day";
-3) Query the second temp table twice, once for the best day, once for the worst day,
-with a UNION binding them. */
+/* 1. Using a UNION, write a query that displays the market dates with the highest
+and lowest total sales.
+
+HINT:
+1) Create a CTE/Temp Table to find sales values grouped by dates
+2) Create another CTE/Temp table with a rank windowed function
+3) Query the second temp table twice, once for the best day and once for the worst day,
+then bind them with UNION.
+*/
 --QUERY 7
+WITH sales_by_date AS (
+    SELECT
+        market_date,
+        SUM(quantity * cost_to_customer_per_qty) AS total_sales
+    FROM customer_purchases
+    GROUP BY market_date
+),
+ranked_sales AS (
+    SELECT
+        market_date,
+        total_sales,
+        RANK() OVER (ORDER BY total_sales DESC) AS best_rank,
+        RANK() OVER (ORDER BY total_sales ASC) AS worst_rank
+    FROM sales_by_date
+)
+
+SELECT market_date, total_sales
+FROM ranked_sales
+WHERE best_rank = 1
+
+UNION
+
+SELECT market_date, total_sales
+FROM ranked_sales
+WHERE worst_rank = 1;
@@ -132,9 +196,29 @@ How many customers are there (y).
 Before your final group by you should have the product of those two queries (x*y). */
 --QUERY 8
-
-
-
+SELECT
+    vendor_name,
+    product_name,
+    COUNT(customer_id) * 5 * cost_to_customer_per_qty AS total_revenue
+FROM (
+    SELECT
+        vi.vendor_id,
+        v.vendor_name,
+        p.product_name,
+        vi.cost_to_customer_per_qty,
+        c.customer_id
+    FROM vendor_inventory vi
+    JOIN vendor v
+        ON vi.vendor_id = v.vendor_id
+    JOIN product p
+        ON vi.product_id = p.product_id
+    CROSS JOIN customer c
+)
+GROUP BY
+    vendor_name,
+    product_name,
+    cost_to_customer_per_qty;
+
 --END QUERY
@@ -145,7 +229,12 @@ It should use all of the columns from the product table, as well as a new column
 Name the timestamp column `snapshot_timestamp`. */
 --QUERY 9
-
+CREATE TABLE product_units AS
+SELECT
+    *,
+    CURRENT_TIMESTAMP AS snapshot_timestamp
+FROM product
+WHERE product_qty_type = 'unit';
 --END QUERY
@@ -155,7 +244,13 @@ Name the timestamp column `snapshot_timestamp`. */
 This can be any product you desire (e.g. add another record for Apple Pie).
*/
 --QUERY 10
-
+INSERT INTO product_units
+SELECT
+    *,
+    CURRENT_TIMESTAMP
+FROM product
+WHERE product_name = 'Apple Pie'
+LIMIT 1;
 --END QUERY
@@ -167,6 +262,13 @@ This can be any product you desire (e.g. add another record for Apple Pie). */
 HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
 --QUERY 11
+DELETE FROM product_units
+WHERE product_name = 'Apple Pie'
+AND snapshot_timestamp = (
+    SELECT MIN(snapshot_timestamp)
+    FROM product_units
+    WHERE product_name = 'Apple Pie'
+);
@@ -174,27 +276,38 @@ HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
 -- UPDATE
-/* 1.We want to add the current_quantity to the product_units table.
-First, add a new column, current_quantity to the table using the following syntax.
+/* 1. We want to add the current_quantity to the product_units table.
+First, add a new column:
 ALTER TABLE product_units
-ADD current_quantity INT;
+ADD COLUMN current_quantity INT;
-Then, using UPDATE, change the current_quantity equal to the last quantity value from the vendor_inventory details.
-
-HINT: This one is pretty hard.
-First, determine how to get the "last" quantity per product.
-Second, coalesce null values to 0 (if you don't have null values, figure out how to rearrange your query so you do.)
-Third, SET current_quantity = (...your select statement...), remembering that WHERE can only accommodate one column.
-Finally, make sure you have a WHERE statement to update the right row,
-    you'll need to use product_units.product_id to refer to the correct row within the product_units table.
-When you have all of these components, you can run the update statement. */
---QUERY 12
+Then using UPDATE, change current_quantity equal to the last quantity
+value from vendor_inventory.
+HINT:
+- determine last quantity per product
+- coalesce null values to 0
+- use product_units.product_id in WHERE
+*/
+--QUERY 12
+ALTER TABLE product_units
+ADD COLUMN current_quantity INT;
+
+UPDATE product_units
+SET current_quantity = COALESCE(
+    (
+        SELECT quantity
+        FROM vendor_inventory vi
+        WHERE vi.product_id = product_units.product_id
+        ORDER BY market_date DESC
+        LIMIT 1
+    ),
+    0
+);
 --END QUERY
-