Muhammad Iqbal
Lead Data Engineer | Cloud Data Engineer | Data Architect
******.****@*****.*** | 415-***-**** | Oakland, CA, 94603
Summary
Data Engineering leader with more than a decade of experience architecting and optimizing large-scale data platforms across AWS, GCP, and Azure. Expert in designing robust ETL and ELT pipelines, advanced data processing with Spark and Databricks, and real-time analytics using Kafka and Pub/Sub. Adept at transforming legacy systems into high-performance, cost-efficient architectures that improve reliability and time-to-insight. Deep understanding of data governance, security, and compliance frameworks, including GDPR, SOC 2, and HIPAA. Recognized for building and mentoring high-performing teams, fostering a data-driven culture, and delivering scalable solutions that empower decision-making across healthcare, SaaS, and fintech domains.
Skills
Programming & Scripting
Python: Pandas, PySpark, NumPy
SQL: Advanced optimization, window functions, query tuning
Scala or Java
Bash, Shell scripting: Automation, Orchestration
Data Storage & Warehousing
BigQuery, Redshift, Synapse
Delta Lake, Iceberg, Hudi for modern lakehouse design
Parquet, ORC, Avro for efficient data storage formats
Snowflake
Data Quality, Observability & Governance
Great Expectations, Soda, Monte Carlo for data testing & quality
Data Catalogs: Alation, Collibra, AWS Glue Data Catalog
Lineage & Metadata: OpenLineage, DataHub, Amundsen
Security & Compliance: IAM, KMS, Encryption, GDPR, SOC 2
Monitoring & Alerting: Prometheus, Grafana, ELK, Datadog
Cost & Performance Optimization
FinOps tools: CloudZero, Kubecost, AWS Cost Explorer
Query & compute tuning: Partitioning, Caching, Indexing
Resource autoscaling & storage tiering for multi-cloud efficiency
Healthcare & Compliance
FHIR, HL7, DICOM standards
HIPAA compliance and PHI data handling
EHR data modeling: Epic, Cerner
Data Processing & Frameworks
Apache Spark: Batch + Structured streaming
Databricks: Delta Lake, Unity Catalog, MLflow
Airflow and Astronomer
dbt: Data transformations, Testing, Documentation
Kafka, Kinesis, Pub/Sub for real-time streaming
Cloud Ecosystem
AWS: S3, Glue, EMR, Lambda, Redshift, Athena
Azure: Data Factory, Synapse, Databricks, Key Vault
GCP: BigQuery, Dataflow, Pub/Sub, Cloud Composer, GKE
Terraform, CloudFormation for Infrastructure-as-Code
Docker, Kubernetes: EKS, GKE, AKS
CI/CD & Automation
CI/CD Tools: Jenkins, GitHub Actions, GitLab CI, Argo CD
Version Control: Git, GitFlow, Trunk-based development
Automation: Ansible, Terraform, Bash scripting
Testing: PyTest, Unit/integration testing for pipelines
Analytics, BI & Visualization
Tableau, Power BI, Looker, Mode Analytics
SQL-based transformations feeding BI dashboards
Integration with ML/AI pipelines: MLflow, SageMaker, Vertex AI
Soft Skills & Leadership
Data architecture design & solution ownership
Cross-functional collaboration
Mentorship, documentation, and Agile delivery
Stakeholder communication: translating data into business outcomes
Experience
Lead Data Engineer
Validic Sep 2022 – Present
Designed and led the development of scalable, compliant data platforms on AWS and Azure that power patient analytics and clinical intelligence.
Directed the creation of robust ETL and ELT pipelines with Spark, Airflow, and dbt, enabling near real-time delivery of critical healthcare insights.
Automated infrastructure provisioning through Terraform and Ansible, cutting deployment time and ensuring consistency across multiple environments.
Built unified monitoring and observability using Prometheus, Grafana, ELK, and Datadog, improving reliability and accelerating incident resolution.
Drove data quality and governance programs to maintain accuracy, lineage, and regulatory compliance under GDPR and healthcare standards.
Collaborated with product, data science, and clinical teams to translate business and patient-care objectives into scalable data solutions.
Mentored a team of data engineers, promoting best practices in pipeline architecture, CI/CD, and cloud cost optimization.
Tuned large-scale data pipelines and infrastructure to boost throughput, reduce latency, and lower compute costs across high-volume workloads.
Senior Data Engineer
Ascend.io Feb 2019 – Jun 2022
Designed and delivered scalable, high-performance data pipelines using Spark, Databricks, Airflow, and dbt across AWS, GCP, and Azure environments.
Architected a modern data lakehouse integrating Snowflake, BigQuery, and Delta Lake, improving query speed, scalability, and cost efficiency.
Automated infrastructure with Terraform and CI/CD pipelines, ensuring consistent deployments and reducing delivery time by more than 40%.
Introduced observability frameworks with SLOs, DAG SLAs, and data quality metrics, improving reliability and cutting pipeline failures significantly.
Partnered with product, analytics, and ML teams to design data models that powered AI-driven insights and customer-facing dashboards.
Implemented data governance and access controls aligned with GDPR and HIPAA, reinforcing security and compliance across data workflows.
Mentored data engineers on Spark performance tuning, modular dbt development, and scalable infrastructure practices.
Data Engineer
XpertDox Nov 2015 – Dec 2018
Built and maintained ETL pipelines to integrate diverse healthcare datasets into AWS using S3, RDS, and Lambda for downstream analytics.
Migrated legacy systems to a cloud-first data warehouse, improving reporting speed and reliability for clinical and operational users.
Implemented Docker and Kubernetes for workload containerization, improving deployment consistency and system scalability.
Created monitoring dashboards with Prometheus and ELK to track data pipeline performance and proactively detect anomalies.
Automated infrastructure provisioning with Terraform and CloudFormation, reducing manual setup and accelerating delivery cycles.
Collaborated with clinical teams to develop HIPAA-compliant analytics pipelines supporting patient engagement and care outcomes.
Projects
Healthcare Cloud Migration
Migrated on-premise healthcare systems to AWS using EC2, S3, RDS, and EKS. Automated infrastructure with Terraform/Ansible and implemented observability with Prometheus, Grafana, and ELK. Delivered a secure, HIPAA-compliant cloud platform supporting patient analytics.
FinTech Lakehouse Modernization
Modernized legacy financial systems by implementing a Lakehouse with Databricks, Delta Lake, and Snowflake. Integrated multi-source data via Azure Data Factory and AWS Glue, embedding automated quality and compliance checks. Accelerated analytics and reduced manual reconciliation.
Enterprise Observability Framework
Implemented observability across AWS, GCP, and Azure using Prometheus, Grafana, Datadog, and ELK. Designed alerting, incident response, and CI/CD pipelines with Jenkins, GitLab CI/CD, and Argo CD. Improved platform reliability and confidence in production data systems.
Certificates
AWS Certified Data Analytics
Cloud Professional Data Engineer
Certified Health Data Analyst
Education
Bachelor of Science