dbt + Metabase integration
Updated Nov 24, 2025 - Python
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
📈 🐍 Multidimensional synthetic data generation with Copula and fPCA models in Python
Hypergol is a Data Science/Machine Learning productivity toolkit that accelerates moving projects into production with autogenerated code, a standardised structure for data and ML, and parallel processing out of the box.
This repository is a working ETL framework that uses user data from the Spotify API, with Python for extraction and transformation, SQL for data loading and staging, Airflow for data orchestration and monitoring, and Power BI for reporting.
⚙️ ETL pipeline on AWS using S3 and Redshift
Developed a 3-page Power BI dashboard (global and Asian overview) using Python scripts to load and clean World Bank data (1960–2020), reducing data processing time by 25%. Containerized the database in Docker, enabling scalable access, and visualized trends (e.g., 3% annual GDP growth in Asia), enhancing stakeholder insights.
Formula 1 race data engineering project which utilises Azure services and Databricks to ingest and analyse the data.
Data model for the Participatory Knowledge Practices in Analogue and Digital Image Archives (PIA) project
This project showcases an end-to-end ELT (Extract, Load, Transform) pipeline leveraging the TPCH orders table from Snowflake's sample database. The primary goal is to demonstrate modern data engineering practices using Snowflake, dbt (Data Build Tool), and Apache Airflow.
ETL pipeline on PostgreSQL using Apache Airflow and dbt Cloud
This project was carried out as the final capstone project of the Udacity Data Engineering Nanodegree program. It involves extracting, loading, and transforming datasets of different file formats from the web (downloadable) to the lake (S3), and then to the warehouse (Redshift).
An end-to-end data engineering project that builds an ELT data pipeline to generate insights into ad campaigns.
A prediction model for cancer patients. This project contains informative analysis and model prediction. Unfortunately, the code doesn't work past the analysis. It would be great if someone could reach out to me to solve the problem. After clicking "Train model", any further action returns you to the "Train model" button.
Second task of the CodSoft internship: Transaction Fraud Detection. During my CodSoft internship, I worked on a challenging project focused on detecting fraudulent credit card transactions.
Real-time AML transaction monitoring system processing 5M transactions/day with <3s latency. Built with Kafka, Spark Streaming, Delta Lake, AWS Glue, and Airflow. Features: rule-based detection, SCD Type 2 customer profiling, automated compliance reporting, and comprehensive data quality framework.
Automated data pipeline that scrapes fuel prices daily, applies transformations, and loads the data into a PostgreSQL database.
End-to-end Reddit data pipeline using Airflow, PostgreSQL, and Dimensional Data Modeling
End-to-end ETL pipeline in Python leveraging Pydantic data models and GCP-native orchestration (Cloud Build, Cloud Run, Workflows) to extract, transform, and load Splash API data into BigQuery for scalable analytics