A production-grade SQL Server Data Warehouse built using the Medallion Architecture (Bronze β Silver β Gold), integrating multi-source enterprise data into analytics-ready dimensional models.
- Project Overview
- Architecture
- Data Sources
- Data Layers
- Data Flow & Lineage
- ETL Workflow
- Gold Layer Outputs
- Tech Stack
- Author
BusinessDW is a comprehensive Business Data Warehouse built on SQL Server that integrates data from two enterprise source systems β CRM and ERP β and transforms raw operational data into structured, analytics-ready information.
The project follows the Medallion Architecture β a layered data design pattern that ensures traceability, data quality, and business-readiness at each stage of the pipeline.
Key Goals:
- Integrate multi-source enterprise data (CRM + ERP)
- Build a reliable, auditable ingestion pipeline
- Apply data cleansing, standardization, and business rules
- Deliver dimensional models ready for BI, reporting, and analytics
ββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ βββββββββββββββββββ
β Sources ββββββΆβ BRONZE ββββββΆβ SILVER ββββββΆβ GOLD ββββββΆβ Consumers β
β CRM / ERPβ β Raw Data β β Clean & Std β βBusiness-Readyβ β BI / Analytics β
ββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ βββββββββββββββββββ
The data warehouse is structured as a SQL Server database with three schema layers, each serving a distinct purpose in the data pipeline.
Data is ingested from two enterprise systems via CSV file extracts (batch ingestion):
| File | Description |
|---|---|
cust_info.csv |
Customer master information |
prd_info.csv |
Product details |
sales_details.csv |
Transactional sales / order records |
| File | Description |
|---|---|
cust_az12.csv |
Customer demographics β birth date, gender |
loc_a101.csv |
Customer geography β country mapping via customer ID |
px_cst_g1v2.csv |
Product classification β category, subcategory, maintenance indicator, product line |
| Property | Detail |
|---|---|
| Definition | Raw, unprocessed data β as-is from source systems |
| Objective | Traceability & Debugging |
| Object Type | Tables |
| Load Method | Full Load (Truncate & Insert) |
| Transformations | None (as-is) |
| Data Modeling | None (as-is) |
| Target Audience | Data Engineers |
The Bronze layer preserves the exact source data to enable full auditability and root-cause analysis for any downstream data issues.
| Property | Detail |
|---|---|
| Definition | Cleaned & standardized data |
| Objective | Intermediate layer β prepare data for analysis |
| Object Type | Tables |
| Load Method | Full Load (Truncate & Insert) |
| Target Audience | Data Analysts, Data Engineers |
Transformations applied:
- Data Cleansing
- Data Standardization
- Data Normalization
- Derived Columns
- Data Enrichment
Key business rules:
- Deduplicate customers
- Standardize product IDs and names
- Conform customer IDs across ERP and CRM
- Map country codes
- Enrich products with classification attributes
- Validate date and gender values
- Cleanse sales transactions
| Property | Detail |
|---|---|
| Definition | Business-Ready data |
| Objective | Provide data for reporting & analytics consumption |
| Object Type | Views |
| Load Method | None (derived from Silver) |
| Target Audience | Data Analysts, Business Users |
Transformations applied:
- Data Integration (joining across domains)
- Data Aggregation
- Business Logic & Rules
Data Modeling patterns:
- Star Schema
- Aggregated Objects
- Flat Tables
Each Bronze table maps 1:1 to a Silver table. Silver tables are then integrated and aggregated into Gold views using business logic and dimensional modeling.
Each layer follows a consistent 4-phase development workflow:
Analyse Code Validate Document
βββββββββββββββββ βββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ
Interview source βββΆ Data Ingestion βββΆ Completeness & βββΆ Documenting
system experts (Stored Proc) Schema Checks Versioning in GIT
Analyse Code Validate Document
βββββββββββββββββ βββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ
Explore & βββΆ Data Cleansing βββΆ Data Correctness βββΆ Documenting
Understand Data (Stored Proc) Checks Versioning in GIT
+ Data Flow
+ Data Integration
Analyse Code Validate Document
βββββββββββββββββ βββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ
Explore & βββΆ Data βββΆ Data Integration βββΆ Documenting
Understand Integration Checks Versioning in GIT
Business Objects (Stored Proc) + Data Model
+ Data Catalog
+ Data Flow
The following analytics-ready objects are produced in the Gold layer:
| Object | Type | Description |
|---|---|---|
dim_customers |
View | Unified customer dimension β integrates CRM + ERP demographics and geography |
dim_products |
View | Enriched product dimension β integrates product details with classification data |
dim_location |
View | Geographic dimension for location-based analytics |
fact_sales |
View | Central sales fact table β transactional order records |
These objects form a Star Schema optimized for BI tools, ad-hoc SQL queries, and machine learning pipelines.
| Component | Technology |
|---|---|
| Database | SQL Server |
| ETL / Transformation | T-SQL Stored Procedures |
| Load Strategy | Batch Processing β Full Load (Truncate & Insert) |
| Source Interface | CSV files (folder-based ingestion) |
| Version Control | Git |
| Consumers | BI & Reporting, Ad-Hoc SQL, Machine Learning |
Ritik Aspiring Data Engineer | Focused on building production-grade data systems
This project is a complete end-to-end Business Data Warehouse implementation β from raw data ingestion to business-ready analytical datasets. It demonstrates real-world data engineering practices including multi-source integration, layered ETL design, data quality management, and dimensional modeling using industry-standard Medallion Architecture principles.
Built with SQL Server Β· Medallion Architecture Β· CRM & ERP Integration


