amansarohadev/README.md

⚡ Hello, I'm Aman Saroha

LinkedIn | Portfolio | LeetCode | Email


🚀 Overview

I am a Data Engineer specializing in the design and optimization of high-throughput distributed data pipelines and enterprise cloud lakehouses. My engineering philosophy centers on idempotency, schema evolution, and performance tuning at scale.

With a foundation in managing mission-critical infrastructure under strict SLAs for Microsoft enterprise clients (via HCLTech), I bridge database internals with modern big data compute platforms.


🛠️ Core Tech Stack

Compute & Streaming: [ Apache Spark, PySpark, Azure Databricks, Apache Kafka, Apache Flink ]
Storage & Architecture: [ Delta Lake, Apache Iceberg, Medallion Architecture (Bronze/Silver/Gold) ]
Orchestration & Warehousing: [ Azure Data Factory (ADF), Azure Synapse Analytics, Azure SQL DB ]
Languages & Engineering: [ Python, Advanced SQL (CTEs, Window Functions), T-SQL, PowerShell ]
DataOps & Infrastructure: [ Infrastructure as Code (Bicep), CI/CD, Azure Managed Identities ]

🏗️ Production Architecture Blueprint

This is the end-to-end framework I deploy for reliable, automated cloud data products:

                  [ Source APIs / DBs ]
                              
                              ▼ (Azure Data Factory Orchestration)
┌─────────────────────────────┬─────────────────────────────┐
│            Azure Data Lake Storage (ADLS Gen2)            │
│                                                           │
│         📥 Bronze Layer (Raw Staging · Parquet)           │
│                             │                             │
│                             ▼ (PySpark Data Cleanse)      │
│      ⚙️ Silver Layer (Enriched · Schema Enforcement)       │
│                             │                             │
│                             ▼ (Spark Performance Tuning)  │
│      🏆 Gold Layer (Delta Tables · ACID Transactions)      │
└─────────────────────────────┴─────────────────────────────┘
                              │
                              ▼
       [ Analytics Engine: Azure Synapse / Power BI ]
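
The Bronze → Silver → Gold flow above can be sketched in a few lines of plain Python. This is a minimal illustration, not the production PySpark code: the `SILVER_SCHEMA` columns, the sample rows, and the helper names `to_silver` and `merge_into_gold` are all assumptions made for the example. It shows schema enforcement on the way into Silver and an idempotent keyed upsert into Gold.

```python
# Hypothetical Silver-layer schema: column name -> required Python type.
SILVER_SCHEMA = {"order_id": int, "customer": str, "amount": float}


def to_silver(bronze_rows):
    """Clean raw Bronze records and enforce the Silver schema.

    Rows that cannot be coerced are returned separately instead of
    being silently dropped.
    """
    silver, rejected = [], []
    for row in bronze_rows:
        try:
            clean = {col: typ(row[col]) for col, typ in SILVER_SCHEMA.items()}
        except (KeyError, TypeError, ValueError):
            rejected.append(row)
            continue
        clean["customer"] = clean["customer"].strip().lower()
        silver.append(clean)
    return silver, rejected


def merge_into_gold(gold, silver_rows):
    """Upsert Silver rows into Gold, keyed by order_id.

    Replaying the same batch leaves Gold unchanged (idempotent).
    """
    for row in silver_rows:
        gold[row["order_id"]] = row
    return gold


# Illustrative raw records, as they might land in Bronze.
bronze = [
    {"order_id": "1", "customer": " Alice ", "amount": "10.5"},
    {"order_id": "2", "customer": "Bob", "amount": "oops"},  # bad amount
    {"order_id": "3", "customer": "Carol"},                  # missing column
]

silver, rejected = to_silver(bronze)
gold = merge_into_gold({}, silver)
gold = merge_into_gold(gold, silver)  # replaying the batch changes nothing
```

Keying the Gold write on a business identifier is what makes a retried pipeline run safe. On Delta tables the same idea is usually expressed as a merge on the key rather than a blind append.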

📂 Featured Data Products

🌊 Enterprise Cloud Lakehouse Pipeline

A production-grade big data pipeline designed for zero data loss and end-to-end data reliability.

  • Distributed Processing: Wrote PySpark DataFrame transformations in Azure Databricks to clean and parse large log datasets.
  • Optimization: Improved Spark driver/executor efficiency through broadcast joins, data caching strategies, and reduced shuffle partitioning.
  • Storage Engine: Used Delta Lake and Apache Iceberg table formats to implement schema enforcement, partition evolution, and time travel.
  • Orchestration: Built parameter-driven Azure Data Factory (ADF) pipelines that use control loops to ingest multi-tenant staging data dynamically.
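
To make the broadcast-join optimization above concrete, here is a minimal pure-Python sketch of the idea. The table contents and the function name `broadcast_hash_join` are hypothetical. The small table is turned into an in-memory lookup once, and every partition of the large table probes that lookup locally, so the large side never needs to be shuffled.

```python
def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each partition of the large side against a lookup
    built once from the small side (the "broadcast" copy)."""
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    joined_partitions = []
    for partition in large_partitions:
        out = []
        for row in partition:
            # Probe the local copy of the small table; no data movement.
            for match in lookup.get(row[key], []):
                out.append({**match, **row})
        joined_partitions.append(out)
    return joined_partitions


# Hypothetical small dimension table and partitioned log data.
tenants = [
    {"tenant_id": 1, "tenant": "contoso"},
    {"tenant_id": 2, "tenant": "fabrikam"},
]
log_partitions = [
    [{"tenant_id": 1, "status": 200}, {"tenant_id": 3, "status": 500}],
    [{"tenant_id": 2, "status": 404}],
]

result = broadcast_hash_join(log_partitions, tenants, "tenant_id")
```

In PySpark the same intent is usually stated with the `broadcast` function from `pyspark.sql.functions`, for example `logs.join(broadcast(tenants), "tenant_id")`, where `logs` and `tenants` stand in for real DataFrames.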

🔍 Azure SQL DB Mission-Critical Toolkit

Advanced performance optimization scripts forged from live tier-3 enterprise support environments.

  • Audited query execution plans in real time, ran DMV diagnostics, and extracted Query Store wait statistics.
  • Implemented production-grade scripts to detect live database deadlocks and resource contention under heavy application load.
  • Automated infrastructure deployment with Bicep templates to secure private endpoints, VNets, and failover topologies.
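
The deadlock-detection idea above comes down to finding a cycle in a wait-for graph. The sketch below is a simplified illustration rather than the toolkit's actual scripts. It assumes each waiting session has exactly one blocker, as in the `session_id` / `blocking_session_id` pairs exposed by `sys.dm_exec_requests`. The session ids are made up.

```python
def find_wait_cycle(blocked_by):
    """Return one cycle of session ids in a wait-for graph, or None.

    blocked_by maps a waiting session id to the session id blocking it.
    """
    for start in blocked_by:
        path, seen = [], set()
        current = start
        # Follow the chain of blockers until it ends or repeats.
        while current in blocked_by and current not in seen:
            seen.add(current)
            path.append(current)
            current = blocked_by[current]
        if current in seen:
            # The chain came back to a session already on the path.
            return path[path.index(current):]
    return None


# Hypothetical snapshot: 51 waits on 52, 52 on 53, 53 on 51 (a cycle);
# 60 is merely blocked behind the cycle.
deadlocked = find_wait_cycle({60: 51, 51: 52, 52: 53, 53: 51})

# A plain blocking chain with no cycle.
chain_only = find_wait_cycle({70: 71, 71: 72})
```

A blocking chain that ends at a running session is ordinary contention. Only a chain that loops back on itself is a deadlock, which is why the two cases are reported differently.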

Pinned

  1. adf (Public)

    Azure Data Factory pipeline implementations | Incremental loads, parameterized pipelines, ForEach + Get Metadata patterns | Production-ready ADF

  2. Vendor-Performance-Analysis (Public)

    End-to-end retail analytics pipeline | Python, SQLAlchemy, Power BI | 2GB+ dataset | Statistical testing on vendor KPIs | Business-ready insights

    Jupyter Notebook

  3. azure-de-learning-journal (Public)

    Azure Data Engineering build log | ADF, Databricks, PySpark, Synapse | Medallion Architecture implementation | Documented end-to-end

    Jupyter Notebook 1

  4. hr-analytics-mysql (Public)

    HR analytics using advanced SQL | Window functions, CTEs, views, indexing | Optimized for dashboard consumption | MySQL

    1
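
The incremental-load pattern mentioned for the adf repository is commonly built around a high-watermark. The sketch below shows that general idea in plain Python under stated assumptions: the `modified_at` column, the sample rows, and the function name `incremental_load` are hypothetical, and this is not code taken from the repository.

```python
from datetime import datetime


def incremental_load(source_rows, last_watermark):
    """Return rows modified after last_watermark plus the new watermark."""
    new_rows = [r for r in source_rows if r["modified_at"] > last_watermark]
    # If nothing changed, keep the previous watermark.
    new_watermark = max(
        (r["modified_at"] for r in new_rows), default=last_watermark
    )
    return new_rows, new_watermark


# Illustrative source table with a last-modified timestamp per row.
source = [
    {"id": 1, "modified_at": datetime(2024, 1, 1)},
    {"id": 2, "modified_at": datetime(2024, 1, 5)},
    {"id": 3, "modified_at": datetime(2024, 1, 9)},
]

batch, watermark = incremental_load(source, datetime(2024, 1, 2))
rerun, same_watermark = incremental_load(source, watermark)
```

Persisting the watermark only after a successful load means a failed run can simply be retried from the previous value.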