Aman amansarohadev

⚡ Hello, I'm Aman Saroha

🚀 Overview

I am a Data Engineer specializing in architecting and optimizing high-throughput distributed data pipelines and enterprise cloud lakehouses. My engineering philosophy centers on idempotency, schema evolution, and performance tuning at scale.

With a strong foundation in managing mission-critical infrastructure under strict SLAs for Microsoft enterprise clients (via HCLTech), I bridge core database internals and modern big data compute frameworks.

🛠️ Core Tech Stack

Compute & Streaming: [ Apache Spark, PySpark, Azure Databricks, Apache Kafka, Apache Flink ]
Storage & Architecture: [ Delta Lake, Apache Iceberg, Medallion Architecture (Bronze/Silver/Gold) ]
Orchestration & Warehousing: [ Azure Data Factory (ADF), Azure Synapse Analytics, Azure SQL DB ]
Languages & Engineering: [ Python, Advanced SQL (CTEs, Window Functions), T-SQL, PowerShell ]
DataOps & Infrastructure: [ Infrastructure as Code (Bicep), CI/CD, Azure Managed Identities ]

🏗️ Production Architecture Blueprint

This is the end-to-end framework I deploy for reliable, automated cloud data products:

                  [ Source APIs / DBs ]
                              │
                              ▼ (Azure Data Factory Orchestration)
┌─────────────────────────────┬─────────────────────────────┐
│            Azure Data Lake Storage (ADLS Gen2)            │
│                                                           │
│         📥 Bronze Layer (Raw Staging · Parquet)           │
│                             │                             │
│                             ▼ (PySpark Data Cleanse)      │
│      ⚙️ Silver Layer (Enriched · Schema Enforcement)       │
│                             │                             │
│                             ▼ (Spark Performance Tuning)  │
│      🏆 Gold Layer (Delta Tables · ACID Transactions)      │
└─────────────────────────────┴─────────────────────────────┘
                              │
                              ▼
       [ Analytics Engine: Azure Synapse / Power BI ]
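To make the layering above concrete, here is a minimal sketch in plain Python rather than PySpark. It assumes in-memory lists of dicts stand in for the Parquet and Delta tables. The function names (`to_bronze`, `to_silver`, `to_gold`) and fields (`event_id`, `tenant`, `amount`) are hypothetical. The point is what each layer owns: Bronze keeps raw input replayable, Silver enforces a schema and deduplicates, and Gold aggregates.

```python
"""Plain-Python sketch of Bronze -> Silver -> Gold layering.

Assumption: lists of dicts stand in for Parquet/Delta tables, and the
field names (event_id, tenant, amount) are hypothetical.
"""
from collections import defaultdict


def to_bronze(raw_records):
    # Bronze: land source records unchanged so raw data can be replayed.
    return [dict(r) for r in raw_records]


def to_silver(bronze):
    # Silver: enforce a simple schema (required fields, numeric amount)
    # and deduplicate on event_id so duplicate deliveries collapse.
    silver = {}
    for rec in bronze:
        try:
            row = {
                "event_id": str(rec["event_id"]),
                "tenant": str(rec["tenant"]).strip().lower(),
                "amount": float(rec["amount"]),
            }
        except (KeyError, TypeError, ValueError):
            continue  # a real pipeline would route this row to quarantine
        silver[row["event_id"]] = row  # last write wins per event_id
    return list(silver.values())


def to_gold(silver):
    # Gold: business-level aggregate (total amount per tenant).
    totals = defaultdict(float)
    for row in silver:
        totals[row["tenant"]] += row["amount"]
    return dict(totals)


raw = [
    {"event_id": 1, "tenant": "Contoso ", "amount": "10.5"},
    {"event_id": 1, "tenant": "Contoso ", "amount": "10.5"},  # duplicate delivery
    {"event_id": 2, "tenant": "fabrikam", "amount": "4"},
    {"event_id": 3, "tenant": "fabrikam"},  # missing amount, fails schema
]
gold = to_gold(to_silver(to_bronze(raw)))
print(gold)  # {'contoso': 10.5, 'fabrikam': 4.0}
```

Because Silver deduplicates on `event_id`, a duplicate delivery does not double-count in Gold. Running `to_silver` on its own output also returns the same rows, which is one simple form of idempotency.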

📂 Featured Data Products

🌊 Enterprise Cloud Lakehouse Pipeline

A production-grade big data engine designed for zero data loss and end-to-end data reliability.

Distributed Processing: Wrote complex PySpark DataFrame transformations in Azure Databricks to clean and parse large-scale log data.
Optimization: Improved Spark driver/executor efficiency through broadcast joins, targeted caching strategies, and shuffle reduction.
Storage Engine: Used Delta Lake and Apache Iceberg table formats to implement schema enforcement, partition evolution (an Iceberg feature), and time travel.
Orchestration: Built parameter-driven Azure Data Factory (ADF) pipelines with dynamic control loops to ingest multi-tenant staging data.
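The broadcast-join optimization listed above can be illustrated without a cluster. The following is a plain-Python simulation, assuming lists of tuples stand in for Spark partitions and that the tenant data and the `broadcast_hash_join` name are hypothetical. The small dimension table is built once into a hash map (the copy Spark would ship to every executor), and each fact partition probes it locally, so the large table is never shuffled by join key.

```python
"""Plain-Python simulation of a broadcast hash join.

Assumption: lists of tuples stand in for Spark partitions; the data
and the function name are hypothetical.
"""


def broadcast_hash_join(fact_partitions, dim_rows):
    # Build phase: the small dimension table becomes one hash map that
    # would be broadcast to every executor.
    lookup = {key: value for key, value in dim_rows}

    # Probe phase: each fact partition joins locally against the map,
    # so rows of the large table never move between partitions.
    joined_partitions = []
    for partition in fact_partitions:
        joined = [
            (key, measure, lookup[key])
            for key, measure in partition
            if key in lookup  # inner-join semantics: drop unmatched keys
        ]
        joined_partitions.append(joined)
    return joined_partitions


facts = [
    [("t1", 100), ("t2", 50)],  # partition 0
    [("t1", 25), ("t9", 70)],   # partition 1 ("t9" has no dimension row)
]
tenants = [("t1", "Contoso"), ("t2", "Fabrikam")]

result = broadcast_hash_join(facts, tenants)
print(result)
# [[('t1', 100, 'Contoso'), ('t2', 50, 'Fabrikam')], [('t1', 25, 'Contoso')]]
```

In PySpark the equivalent hint is `large_df.join(broadcast(small_df), "key")`, with `broadcast` imported from `pyspark.sql.functions`. Spark also picks a broadcast join on its own when the smaller side's estimated size is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default).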

🔍 Azure SQL DB Mission-Critical Toolkit

Advanced performance optimization scripts developed in live tier-3 enterprise support environments.

Audited query execution plans in real time, ran DMV diagnostics, and extracted wait statistics from Query Store.
Built production-grade scripts to detect live deadlocks and resource contention under heavy application load.
Automated secure infrastructure deployment with Bicep templates covering private endpoints, VNets, and failover topologies.
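Deadlock and contention scripts like these typically start by walking the blocking chain. Below is a minimal Python sketch of that analysis. It assumes the `(session_id, blocking_session_id)` pairs have already been read from `sys.dm_exec_requests` (for example with pyodbc). The `analyze_blocking` function and the sample sessions are hypothetical. The sketch reports head blockers and any wait-for cycles, which correspond to deadlocks.

```python
"""Blocking-chain analysis over rows shaped like sys.dm_exec_requests.

Assumption: rows are (session_id, blocking_session_id) pairs already
fetched from the DMV; the sample data is hypothetical. Values that are
None or <= 0 are treated as "not blocked by another user session".
"""


def analyze_blocking(rows):
    blocked_by = {
        sid: blocker
        for sid, blocker in rows
        if blocker is not None and blocker > 0
    }

    # Head blockers block someone but are not waiting themselves. This
    # works even when an idle head blocker has no row of its own.
    head_blockers = sorted(
        {b for b in blocked_by.values() if b not in blocked_by}
    )

    # Follow each wait-for chain; revisiting a node on the current path
    # means a cycle, i.e. a deadlock.
    cycles, seen = [], set()
    for start in blocked_by:
        path, node = [], start
        while node in blocked_by and node not in seen and node not in path:
            path.append(node)
            node = blocked_by[node]
        if node in path:
            cycles.append(sorted(path[path.index(node):]))
        seen.update(path)
    return head_blockers, cycles


rows = [
    (51, 0),     # running, not blocked
    (52, 51),    # waits on 51
    (53, 52),    # waits on 52
    (60, 61),    # 60 and 61 wait on each other: deadlock
    (61, 60),
    (70, None),
]
print(analyze_blocking(rows))  # ([51], [[60, 61]])
```

SQL Server's own deadlock monitor normally detects cycles and rolls back a victim by itself, so during live triage the head-blocker list is usually the more actionable result.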
