I am a Data Engineer specializing in architecting and optimizing high-throughput distributed data pipelines and enterprise cloud lakehouses. My engineering philosophy centers on idempotency, schema evolution, and performance tuning at scale.
With a strong foundation in managing mission-critical infrastructure under strict SLAs for Microsoft enterprise clients (via HCLTech), I bridge core database internals with modern big data compute platforms.
Compute & Streaming: [ Apache Spark, PySpark, Azure Databricks, Apache Kafka, Apache Flink ]
Storage & Architecture: [ Delta Lake, Apache Iceberg, Medallion Architecture (Bronze/Silver/Gold) ]
Orchestration & Warehousing: [ Azure Data Factory (ADF), Azure Synapse Analytics, Azure SQL DB ]
Languages & Engineering: [ Python, Advanced SQL (CTEs, Window Functions), T-SQL, PowerShell ]
DataOps & Infrastructure: [ Infrastructure as Code (Bicep), CI/CD, Azure Managed Identities ]
This is the end-to-end framework I deploy for reliable, automated cloud data products:
[ Source APIs / DBs ]
  │
  ▼ (Azure Data Factory Orchestration)
┌──────────────────────────────────────────────────────┐
│ Azure Data Lake Storage (ADLS Gen2)                  │
│                                                      │
│ 📥 Bronze Layer (Raw Staging · Parquet)              │
│   │                                                  │
│   ▼ (PySpark Data Cleanse)                           │
│ ⚙️ Silver Layer (Enriched · Schema Enforcement)      │
│   │                                                  │
│   ▼ (Spark Performance Tuning)                       │
│ 🏆 Gold Layer (Delta Tables · ACID Transactions)     │
└──────────────────────────────────────────────────────┘
  │
  ▼
[ Analytics Engine: Azure Synapse / Power BI ]
A production-grade big data engine designed for zero data loss and full data reliability.
- Distributed Processing: Wrote complex PySpark DataFrame transformations in Azure Databricks to clean and parse large-scale log data.
- Optimization: Improved Spark driver and executor efficiency through broadcast joins, data caching strategies, and reduced shuffle partitioning overhead.
- Storage Engine: Used the Delta Lake and Apache Iceberg table formats to implement schema enforcement, partition evolution, and time travel.
- Orchestration: Built parameter-driven Azure Data Factory (ADF) pipelines that use control-flow loops to ingest multi-tenant staging data dynamically.
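The broadcast-join optimization listed above can be illustrated without a cluster. The following is a minimal plain-Python sketch of the idea, not Spark code and not the project's actual implementation: the small table is turned into a hash lookup once and every partition of the large table is joined against it locally, so the large side never has to be shuffled by key. The names `tenants`, `log_partitions`, and `broadcast_hash_join`, and the sample rows, are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample data: a small dimension table and a large fact table
# that is already split into partitions, as it would be across executors.
tenants: List[Tuple[int, str]] = [(1, "contoso"), (2, "fabrikam")]
log_partitions: List[List[Tuple[int, str]]] = [
    [(1, "login"), (2, "upload")],
    [(1, "logout"), (3, "login")],  # tenant 3 has no match in `tenants`
]


def broadcast_hash_join(
    small: Iterable[Tuple[int, str]],
    partitions: Iterable[List[Tuple[int, str]]],
) -> Iterator[Tuple[int, str, str]]:
    """Inner-join each partition of the large side against the small side."""
    # Build the lookup once; conceptually this is the copy sent to every worker.
    lookup: Dict[int, str] = dict(small)
    for partition in partitions:
        # Each partition is joined locally, with no repartitioning by key.
        for tenant_id, event in partition:
            if tenant_id in lookup:
                yield tenant_id, lookup[tenant_id], event


joined = list(broadcast_hash_join(tenants, log_partitions))
```

In PySpark the same intent is usually expressed as a hint, for example `large_df.join(broadcast(small_df), "tenant_id")` with `broadcast` imported from `pyspark.sql.functions`; the DataFrame and column names there are likewise placeholders.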
Advanced performance optimization scripts developed in live tier-3 enterprise support environments.
- Real-time query execution plan auditing, DMV diagnostics, and Query Store wait-statistics extraction.
- Implemented production-grade scripts to detect live database deadlocks and resource contention under heavy application loads.
- Automated infrastructure deployment with Bicep templates that secure private endpoints, VNets, and failover topologies.
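As a rough illustration of the deadlock-detection idea above: a deadlock is a cycle in the wait-for graph, where each blocked session points at the session blocking it. The sketch below is a simplified plain-Python model under stated assumptions, not the production scripts themselves. It assumes each session waits on at most one blocker, which is the shape of the `session_id` and `blocking_session_id` columns in `sys.dm_exec_requests`. The function name `find_deadlock_cycle` and the session IDs are hypothetical.

```python
from typing import Dict, List, Optional


def find_deadlock_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return one cycle of session IDs, or None if the graph has no cycle."""
    finished = set()  # sessions already proven not to lead into a cycle
    for start in waits_for:
        path: List[int] = []
        position: Dict[int, int] = {}  # session ID -> index in `path`
        node = start
        # Follow the blocking chain until it ends or reaches known-safe ground.
        while node in waits_for and node not in finished:
            if node in position:
                # Revisited a session on the current chain: that tail is a cycle.
                return path[position[node]:]
            position[node] = len(path)
            path.append(node)
            node = waits_for[node]
        finished.update(path)
    return None


# Hypothetical snapshot: 60 waits on 51, and 51 -> 52 -> 53 -> 51 form a cycle.
blocked = {60: 51, 51: 52, 52: 53, 53: 51}
cycle = find_deadlock_cycle(blocked)

# A plain blocking chain with no cycle is contention, not a deadlock.
chain_only = find_deadlock_cycle({51: 52, 52: 53})
```

Separating cycles from plain blocking chains matters in practice, since a long chain points to contention that will eventually clear, while a cycle cannot resolve without one session being terminated.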
