Shay Iqbal
Data Architecture & Engineering Leader: Azure, AWS, Big Data, AI/ML, Medallion Architecture
****.********@*****.*** 510-***-**** San Francisco, CA 94105 linkedin.com/in/shayiqbal/
SUMMARY
Principal-level Senior Data Engineer with 10+ years of experience architecting enterprise-scale data ecosystems across healthcare, consulting, and global organizations. Expert in modern lakehouse architectures, distributed systems, and large-scale ETL/ELT frameworks using Spark, Databricks, Kafka, Snowflake, and Delta Lake. Skilled in developing real-time and batch ingestion pipelines and driving cloud modernization initiatives across AWS, Azure, and GCP to improve scalability, reliability, and performance. Proficient in Airflow, dbt, Python, SQL, Azure Data Factory, Azure Databricks, and AKS, with strong experience in data quality, governance, lineage, and observability using Great Expectations, DataHub, and Monte Carlo. Adept at enabling analytics through optimized data models, semantic layers, and BI-ready datasets for tools like Power BI, Tableau, and Looker. Recognized for influencing architectural direction, mentoring engineering teams, reducing platform costs, and delivering secure, compliant, high-impact data solutions that accelerate decision-making.
PROFESSIONAL EXPERIENCE
Principal Data Engineer
Clinithink
03/2022 – Present
Architect enterprise data platforms using Azure, Databricks, Delta Lake, Synapse, and Kafka.
Lead design of scalable pipelines for batch and real-time processing using PySpark, SQL, and Python.
Implement cloud modernization frameworks across Azure and AWS.
Drive governance initiatives using Purview, data lineage modeling, metadata automation, and quality frameworks.
Oversee streaming architectures using Kafka, Spark Structured Streaming, and event-driven patterns.
Lead BI enablement using Power BI semantic models, data marts, and metrics layers.
Develop architecture roadmaps, platform standards, and engineering best practices.
Mentor engineering teams and provide technical direction for large-scale delivery programs.
Partner with product, clinical, and analytics teams to build compliant healthcare data solutions (HIPAA, GDPR).
Lead Data Engineer
Slalom
09/2018 – 02/2022
Delivered cloud data engineering solutions across Azure, AWS, and GCP.
Built high-volume ETL and ELT pipelines using Python, SQL, dbt, and Spark.
Designed scalable data models using dimensional modeling, Data Vault, and canonical models.
Implemented orchestration using Airflow, Prefect, and CI/CD workflows.
Developed analytics layers and dashboards using Power BI and Tableau.
Supported migrations from legacy systems to cloud platforms and modern architectures.
Collaborated with cross-functional teams to deliver client-facing solutions.
Data Engineer
MSH
10/2015 – 08/2018
Built data pipelines for ingestion, transformation, and reporting using SQL and Python.
Worked with SQL Server, PostgreSQL, and other relational databases.
Developed analytics datasets and optimized SQL workloads for performance.
Supported BI reporting using Power BI and Tableau.
Maintained data quality, metadata, and governance processes.
Coordinated with analysts and engineering teams to support operational workflows.
SKILLS
Data Architecture & Modeling
Star Schema, Snowflake Schema, Data Vault, Canonical Models, Delta Lake, Lakehouse, Medallion Architecture, Dimensional Modeling, Logical Models, Physical Models, Graph Modeling
Cloud Platforms
Azure, Microsoft Fabric, Synapse, ADLS, Databricks, Purview, Event Hub, AWS, Redshift, Glue, EMR, S3, Lambda, GCP, BigQuery, Dataflow, Dataproc, Pub/Sub
Data Engineering
PySpark, Python, SQL, Scala, Delta Live Tables, Databricks Asset Bundles, ETL, ELT, CDC, Batch Processing, Real-Time Processing, API Ingestion
Big Data & Streaming
Kafka, Flink, Hadoop, Spark, Event-Driven Architecture, Streaming Pipelines
BI & Analytics
Power BI, Tableau, Looker, Semantic Layer, KPIs, Data Marts
Governance & Quality
Purview, Data Lineage, Metadata Management, Great Expectations, Soda, Monte Carlo, Data Contracts, PII Governance, HIPAA, GDPR, SOC2
DevOps & Orchestration
Airflow, Prefect, Dagster, CI/CD, Git, GitHub Actions, GitLab, Azure DevOps, Docker, Kubernetes, Terraform, IaC
Databases
SQL Server, PostgreSQL, MySQL, Oracle, NoSQL, Cosmos DB
ML & GenAI Enablement
Feature Stores, ML Pipelines, TensorFlow, PyTorch, LLMs, RAG, Agentic Systems, Model Serving
Security & Performance
IAM, RBAC, Encryption, Optimization, High Availability, Auto-Scaling, Cost Optimization, FinOps
Tools & Ecosystem
dbt, Snowflake, Palantir, DataHub, Alation, Amundsen, KQL, Apache Spark, Google Dataflow
Management & Leadership
Team Leadership, Technical Direction, Delivery Management, Project Ownership, Agile, Scrum, Stakeholder Management, Cross-Functional Collaboration, Roadmap Planning, Mentoring, Coaching, Client Leadership, Vendor Management, Sprint Planning, Prioritization, Decision-Making
Enterprise Solution Architecture
End-to-End System Design, Architecture Blueprints, Platform Scalability, Technology Selection, Integration Patterns
Data Product Management
Data Product Lifecycle, Domain Ownership, Data Contracts, KPI Alignment, Business Value Delivery
AI/Data Strategy & Modernization
Cloud Modernization, AI Readiness, Data Maturity Assessments, Strategic Roadmaps, Organizational Enablement
CERTIFICATIONS
Microsoft Azure Data Engineer Associate
AWS Certified Data Analytics – Specialty
AWS Certified Solutions Architect – Associate
Microsoft Azure Solutions Architect Expert
EDUCATION
Bachelor of Science in Computer Science
University of California
Graduated: 2015