
Senior Data Engineer

Location:
Chicago, IL
Posted:
March 22, 2026


Resume:

Senior Data Engineer

Senior Data Engineer with **+ years of experience designing and scaling modern data platforms and ETL/ELT pipelines across enterprise environments, with deep expertise in Databricks, Apache Spark (PySpark), Snowflake, and dbt. Experienced in building robust lakehouse architectures and high-performance data systems that support analytics, business intelligence, and machine learning workloads at scale. Skilled in developing batch and streaming pipelines, including real-time ingestion, incremental processing, and CDC-based pipelines for large datasets. Proficient in dimensional modeling and in tuning data platforms for performance, scalability, and efficiency across distributed systems. Hands-on experience across multi-cloud environments, including AWS and GCP, enabling flexible and scalable data infrastructure. Proven track record of improving system performance, reducing costs, and ensuring data reliability, quality, and consistency in production environments.

CORE EXPERTISE

Databricks, Snowflake, PySpark, Lakehouse Architecture, Scalable Data Pipelines

TECHNICAL SKILLS

• Core Data Platforms: Databricks, Snowflake, Delta Lake, BigQuery

• Programming & Data Processing: Python, SQL, PySpark, Apache Spark

• Data Engineering: ETL/ELT Pipelines, Batch & Streaming Pipelines, CDC, Data Transformation

• Data Architecture & Modeling: Lakehouse Architecture, Dimensional Modeling (Star Schema, Fact & Dimension Tables)

• Databases: PostgreSQL, MySQL, Microsoft SQL Server, MongoDB

• Orchestration & Transformation: Apache Airflow, dbt

• Streaming & Messaging: Apache Kafka, Google Pub/Sub

• Cloud Platforms (Multi-Cloud): AWS (S3, EC2, IAM, Kinesis), GCP (BigQuery, Cloud Storage, Pub/Sub), Azure (Data Factory)

• Data Quality & Monitoring: Data Validation, Schema Enforcement, Data Freshness, Monitoring & Alerting

• DevOps & Tools: Terraform, Docker, CI/CD (GitHub Actions, GitLab CI)

PROFESSIONAL EXPERIENCE

Senior Data Engineer

Molina Healthcare – Long Beach, CA, USA | April 2021 – Present

• Designed and maintained scalable ETL/ELT pipelines using Databricks, PySpark, Apache Spark, dbt, and Airflow, processing large-scale operational and behavioral datasets (TB-scale) for analytics and machine learning use cases across multiple business units.

• Architected a lakehouse data platform using Databricks and Delta Lake, enabling unified batch and streaming data processing with ACID transactions, schema evolution, and time-travel capabilities, improving data reliability and governance across 150+ datasets.

• Built reusable data ingestion frameworks integrating data from REST APIs, relational databases, and event-driven systems, standardizing structured and semi-structured data into curated datasets for downstream analytics and ML workflows.
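The core idea of a reusable ingestion framework like the one described above is a single normalizer that wraps records from any source in a consistent envelope. A minimal sketch in plain Python (all function and field names here are illustrative, not from any specific system):

```python
from datetime import datetime, timezone

def normalize_record(raw, source):
    """Standardize a raw record from any source into a curated envelope.

    Field names are hypothetical; real frameworks add schema registration,
    type coercion, and dead-letter handling on top of this shape."""
    return {
        "source": source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        # Lower-case keys and drop nulls so downstream schemas stay consistent.
        "payload": {k.lower(): v for k, v in raw.items() if v is not None},
    }

# The same normalizer applied to an API payload and a database row
# yields the same curated shape, regardless of source casing or nulls.
api_row = normalize_record({"UserId": 7, "Plan": "gold", "Notes": None}, "rest_api")
db_row = normalize_record({"userid": 7, "plan": "gold"}, "postgres")
assert api_row["payload"] == db_row["payload"]
```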

• Implemented real-time and near real-time pipelines using Apache Kafka and Google Pub/Sub, enabling event-driven data processing for operational monitoring, analytics, and feature generation.
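A key property of event-driven pipelines of this kind is idempotent processing, since Kafka and Pub/Sub both deliver at least once. A minimal sketch of the processing core, with the broker wiring omitted (it is environment-specific) and illustrative field names:

```python
# Sketch of the dedup-and-flatten step behind a Kafka/Pub/Sub consumer.
# In production the seen-id set would live in a state store, not memory.
seen_ids = set()

def process_event(event):
    """Drop replayed events by id, then emit a flattened record.

    Returning None for duplicates makes at-least-once delivery safe."""
    if event["id"] in seen_ids:
        return None
    seen_ids.add(event["id"])
    return {"id": event["id"], "metric": event["payload"]["value"]}

first = process_event({"id": "e1", "payload": {"value": 42}})
replay = process_event({"id": "e1", "payload": {"value": 42}})
assert first == {"id": "e1", "metric": 42}
assert replay is None  # duplicate delivery is ignored
```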

• Developed modular transformation pipelines using dbt, leveraging advanced SQL techniques (window functions, cohort analysis, time-series aggregations) to support experimentation, KPI reporting, and business intelligence use cases.
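The window-function style of dbt transformation mentioned above can be demonstrated with stdlib SQLite (window functions require SQLite 3.25+, bundled with modern Python). Table and column names are illustrative:

```python
import sqlite3

# A dbt-style running-total transformation using a window function.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE events (user_id INT, day TEXT, amount REAL);
    INSERT INTO events VALUES
        (1, '2024-01-01', 10), (1, '2024-01-02', 20), (2, '2024-01-01', 5);
""")
rows = con.execute("""
    SELECT user_id, day,
           SUM(amount) OVER (PARTITION BY user_id ORDER BY day) AS running_total
    FROM events
    ORDER BY user_id, day
""").fetchall()
# Each user's total accumulates independently across days.
assert rows == [(1, '2024-01-01', 10.0), (1, '2024-01-02', 30.0),
                (2, '2024-01-01', 5.0)]
```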

• Designed and maintained bronze, silver, and gold data layers, standardizing transformation logic and enabling scalable, consistent data consumption across engineering, analytics, and product teams.

Thomas Westhoefer
Chicago, IL | +1-209-***-**** | *****************@*****.*** | www.linkedin.com/in/thomasw

• Optimized distributed Spark workloads and analytical queries across Databricks and Snowflake, improving performance by 35% and reducing compute costs by 20% through partitioning strategies, clustering, and query plan optimization.

• Implemented incremental and CDC-based data pipelines, reducing batch processing runtimes by 30% while significantly improving data freshness and reliability for production datasets.
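CDC pipelines like these boil down to replaying a stream of keyed changes into a target table, typically via a warehouse MERGE. A simplified in-memory stand-in (op names mirror typical CDC payloads but are illustrative):

```python
def apply_cdc(target, changes):
    """Apply a CDC change stream (insert/update/delete) to a keyed target.

    A toy stand-in for a MERGE into a warehouse table: inserts and
    updates are both upserts, deletes remove the key if present."""
    for change in changes:
        key = change["key"]
        if change["op"] == "delete":
            target.pop(key, None)
        else:
            target[key] = change["row"]
    return target

table = {1: {"status": "new"}}
table = apply_cdc(table, [
    {"op": "update", "key": 1, "row": {"status": "active"}},
    {"op": "insert", "key": 2, "row": {"status": "new"}},
    {"op": "delete", "key": 1},
])
assert table == {2: {"status": "new"}}  # key 1 updated then deleted
```

Because only the changed keys are touched, a run's cost scales with the change volume rather than the table size, which is where the batch-runtime reduction comes from.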

• Developed pipelines for healthcare interoperability data (FHIR, HL7v2), including parsing, normalization, and schema mapping into structured datasets supporting analytics and ML applications.
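Flattening a FHIR resource into a tabular row is the heart of such an interoperability pipeline. A minimal sketch for a FHIR Patient resource, showing only a few standard fields (real resources carry many more, and production parsers handle repeats and extensions):

```python
import json

# A tiny FHIR R4 Patient resource; `name` is a list of HumanName
# structures with a `family` string and a `given` list.
patient_json = json.dumps({
    "resourceType": "Patient",
    "id": "p-001",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "birthDate": "1980-04-02",
})

def flatten_patient(raw):
    """Map a FHIR Patient JSON document to a flat analytics row."""
    r = json.loads(raw)
    assert r["resourceType"] == "Patient"
    name = r["name"][0]  # production code would pick by `use`, not [0]
    return {
        "patient_id": r["id"],
        "full_name": " ".join(name["given"]) + " " + name["family"],
        "birth_date": r["birthDate"],
    }

assert flatten_patient(patient_json)["full_name"] == "Jane Doe"
```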

• Partnered with data scientists to build feature engineering pipelines supporting TensorFlow and PyTorch models, ensuring consistent feature computation across training, validation, and inference environments.

• Established comprehensive data quality frameworks using dbt tests, schema validation, referential integrity checks, and automated pipeline validation integrated into CI/CD workflows.
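The checks listed above (schema validation, referential integrity) reduce to small predicates run against each batch before it is promoted. A minimal sketch in plain Python, with hypothetical table and column names; dbt expresses the same checks declaratively as tests:

```python
def check_schema(rows, required):
    """Return rows missing a required column or carrying a null in one."""
    return [r for r in rows
            if not required.issubset(r) or any(r.get(c) is None for c in required)]

def check_referential(rows, fk, dim_keys):
    """Return rows whose foreign key has no match in the dimension."""
    return [r for r in rows if r[fk] not in dim_keys]

orders = [{"order_id": 1, "cust_id": 10}, {"order_id": 2, "cust_id": 99}]
assert check_schema(orders, {"order_id", "cust_id"}) == []        # schema OK
assert check_referential(orders, "cust_id", {10, 20}) == [
    {"order_id": 2, "cust_id": 99}                                # orphan row
]
```

Wired into CI/CD, a non-empty result from either check fails the pipeline run before bad data reaches the gold layer.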

• Implemented monitoring, logging, and alerting across Airflow workflows and Spark jobs, improving pipeline observability and enabling faster incident detection and resolution.

• Mentored junior engineers and contributed to architectural decisions, promoting best practices in Spark optimization, data modeling, and scalable pipeline design.

Senior Data Engineer

WorkFusion – New York, NY, USA | January 2016 – March 2021

• Designed and maintained scalable data pipelines using Python and SQL, integrating financial and operational data from multiple systems to support analytics, reporting, and automation workflows.

• Built reusable dimensional data models (star schema, fact and dimension tables, semantic layers), enabling consistent KPI definitions across business units and reducing metric inconsistencies by 35%.
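The star-schema pattern described above pairs a fact table of measures with dimension tables of attributes, so every KPI aggregates through the same join. A minimal sketch on stdlib SQLite, with illustrative table names:

```python
import sqlite3

# A tiny star schema: one fact table keyed to one dimension table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (product_key INT REFERENCES dim_product, amount REAL);
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES (1, 12.0), (1, 8.0), (2, 30.0);
""")
# Every report aggregates through the same fact-to-dimension join,
# which is what keeps KPI definitions consistent across teams.
rows = con.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d USING (product_key)
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
assert rows == [('books', 20.0), ('games', 30.0)]
```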

• Developed complex SQL transformations using window functions, aggregations, cohort analysis, and performance tuning techniques to support large-scale reporting and analytical workloads.

• Optimized reporting queries and database performance through indexing strategies and query tuning, improving dashboard performance by 40% and reducing latency for enterprise reporting systems.

• Designed structured datasets and transformation logic to support business intelligence tools (Tableau, Power BI), enabling scalable and reliable reporting across multiple teams.

• Implemented data validation, reconciliation processes, and consistency checks across pipelines to ensure accuracy and reliability of business-critical metrics.

• Contributed to testing and release processes by validating data transformations and supporting regression testing for reporting logic updates.

• Provided technical mentorship on SQL optimization, data modeling fundamentals, and debugging production pipeline issues, improving engineering standards across the team.

Data Engineer

Webhead Technologies – San Antonio, TX, USA | May 2011 – November 2015

• Developed and maintained ETL pipelines using Python and SQL to process structured and semi-structured data for internal analytics and reporting systems.

• Designed SQL-based reporting datasets and aggregation layers supporting operational dashboards and business intelligence workflows.

• Applied advanced SQL techniques, including joins, indexing, and query optimization, to improve reporting performance and system stability.

• Supported data validation and quality assurance processes to maintain data accuracy across business units.

EDUCATION

• Bachelor of Science in Computer Science

University of Central Florida – Orlando, FL, USA | August 2007 – May 2011


