Moh Farooq
Principal Data Engineer | Staff Data Engineer | Senior Data Engineer
***********@*****.*** | 904-***-**** | Pittsburgh, PA 15215
Summary
Data Engineer with hands-on experience designing, building, and optimizing scalable data pipelines and ETL workflows using Python, SQL, Apache Spark, and cloud-based platforms. Proven ability to work across the full data lifecycle, from ingestion and transformation to monitoring and optimization, while collaborating with cross-functional teams to deliver high-quality, reliable data solutions. Strong problem-solving skills, attention to detail, and a continuous learning mindset.
Skills
Data Engineering & Pipelines
Large-scale batch processing, near real-time processing, event-driven pipelines, stream joins, stream aggregations, windowing, watermarking, offset management, distributed processing.
Databases & Storage
PostgreSQL, MySQL, SQL Server, Oracle, MongoDB, Parquet, ORC, Avro, indexing strategies, partitioning strategies, backup and recovery, high-volume data ingestion.
Cloud Platforms
AWS S3, AWS EMR, Azure Databricks, Azure Key Vault, Google Cloud Dataflow, Cloud Composer, Pub/Sub, cloud monitoring, cloud security, cost optimization.
APIs & Integration
RESTful APIs, API development, API consumption, OAuth2 authentication, third-party API integration, pagination handling, incremental API loads, rate limiting, webhooks, microservices integration, API versioning.
Batch & Streaming Processing
Message replay, high-throughput ingestion, data consistency guarantees across batch and streaming workloads.
Version Control & CI/CD
Git, GitHub, GitLab, Bitbucket, branching strategies, pull requests, code reviews, Terraform, release management, rollback strategies, DevOps practices.
Big Data Technologies
Apache Spark, Spark SQL, Spark Streaming, Delta Lake, ACID transactions, time travel, distributed computing, cluster computing, partitioning strategies, bucketing, shuffle optimization, broadcast joins.
Workflow Orchestration & Scheduling
Apache Airflow, DAG design, sensors, operators, Azure Data Factory pipelines, AWS Step Functions, workflow orchestration, dependency management, pipeline monitoring, workflow optimization.
Data Security & Governance
Data encryption at rest, credential management, data masking, tokenization, PII protection, GDPR compliance, audit logging, governance frameworks, secure data handling, secure API access.
Monitoring & Data Quality
Centralized logging, anomaly detection, data freshness checks, reconciliation reports, root cause analysis, failure recovery, performance monitoring, cost monitoring, production support.
ETL/ELT
Data extraction, RDBMS ingestion, NoSQL ingestion, API ingestion, ELT pipelines, cloud data warehouse transformations, data transformation logic, data cleansing, data deduplication, data normalization, performance tuning.
Programming
Python, advanced Python, SQL optimization, Java, object-oriented programming, functional programming, shell scripting, reusable code frameworks, performance optimization, unit testing, integration testing.
Professional Experience
Principal Data Engineer
Stanford Health Care | 04/2021 – Present
•Architected and led enterprise-scale healthcare data platforms leveraging Databricks, Apache Spark, Python, and SQL to process high-volume clinical, claims, and operational datasets.
•Designed end-to-end data integration architectures enabling secure data movement across cloud and hybrid healthcare environments.
•Established best practices for ETL design, healthcare data modeling, performance optimization, and data quality governance.
•Automated data ingestion and transformation workflows using CI/CD pipelines, improving deployment reliability and release velocity.
•Optimized large-scale data pipelines supporting millions of patient and transactional records, significantly improving scalability and processing efficiency.
•Implemented monitoring, logging, and alerting frameworks to ensure pipeline health, compliance, and rapid incident resolution.
•Collaborated with clinical, compliance, analytics, and business stakeholders to align data solutions with healthcare regulations and organizational objectives.
•Mentored and guided data engineering teams, driving technical excellence across multiple healthcare data initiatives.
Staff Data Engineer
Mac Repair SF | 03/2018 – 03/2021
•Designed, developed, and maintained scalable enterprise data pipelines using Python, SQL, Apache Spark, and Databricks.
•Built and optimized ETL workflows integrating data from relational databases, NoSQL systems, REST APIs, and cloud-based storage platforms.
•Partnered with data scientists, analysts, and software engineers to translate business and technical requirements into robust data solutions.
•Deployed and optimized pipelines within cloud environments, improving system performance, reliability, and cost efficiency.
•Applied advanced data modeling and query optimization techniques to enhance data accessibility and analytics performance.
•Managed source code using Git and supported automated CI/CD deployments for data applications.
•Proactively monitored production pipelines and resolved data quality, latency, and performance issues.
Senior Data Engineer
Wiztec | 11/2016 – 02/2018
•Developed and maintained data pipelines using Python, SQL, and ETL tools such as Databricks and Apache Spark.
•Implemented ETL processes to extract, transform, and load data from databases, APIs, and flat files.
•Worked closely with cross-functional teams to gather requirements and deliver accurate datasets.
•Supported data integration and pipeline configuration in cloud platforms.
•Applied basic data modeling and query optimization techniques to improve efficiency.
•Used Git for version control and collaborated within CI/CD workflows.
•Assisted with pipeline monitoring, logging, and troubleshooting.
Data Engineer
Software KC | 01/2014 – 10/2016
•Assisted in building and maintaining data pipelines using Python and SQL.
•Supported ETL activities including data extraction, transformation, validation, and loading.
•Performed data cleansing, reconciliation, and quality checks to ensure accuracy and consistency.
•Documented data workflows, pipeline configurations, and operational procedures.
•Monitored scheduled batch jobs and escalated data issues to senior engineering teams.
•Gained hands-on experience with cloud-based data systems, storage, and processing tools.
Projects
Enterprise Data Pipeline Platform
•Designed and implemented a scalable data pipeline using Python, Apache Spark, and Databricks.
•Integrated multiple data sources including relational databases and APIs.
•Improved processing efficiency and reduced pipeline failures through monitoring and optimization.
Cloud-Based ETL System
•Built ETL workflows in a cloud environment to ingest, transform, and load data into a centralized data lake.
•Applied data modeling and query optimization to improve analytics performance.
•Automated pipeline deployments using CI/CD practices.
Data Quality & Monitoring Framework
•Developed logging and alerting mechanisms to track pipeline health and detect anomalies.
•Implemented validation checks to improve data accuracy and reliability.
Certificates
Databricks Fundamentals Professional Certificate
Python for Data Engineering Certification
AWS Cloud Practitioner
Education
Bachelor of Science in Computer Science