Jim Marczyk
Irvine, CA
***.*******@*****.*** 949-***-****
Summary
Results-driven Data Engineer with 7+ years of experience architecting scalable data solutions, reducing cloud costs by up to 65%, and integrating machine learning models into business workflows. Expertise in AWS, Azure Databricks, PySpark, and FinOps strategies for improving performance, cost efficiency, and data reliability. Proven ability to build high-performance ETL pipelines, automate cloud infrastructure, and strengthen data governance, collaborating with cross-functional teams to deliver cost-effective, scalable solutions.
Professional Experience
AWS FinOps Engineer (Remote Contract)
Boston Scientific
Mar 2024 – Present
● Reduced cloud costs by 45% across four federated projects through FinOps strategies and resource rightsizing.
● Automated environment management using AWS Lambda and EventBridge, achieving 64% cost savings in Dev/Test.
● Enhanced PostgreSQL materialized view performance by 40% via advanced query optimization.
Data Engineer / ML Ops (Remote Contract)
Ontada
Oct 2023 – Dec 2023
● Built scalable PySpark pipelines in Azure Databricks, reducing processing time by 20%.
● Improved data quality checks, increasing accuracy by 15%.
● Automated CI/CD pipelines with GitLab CI/CD, reducing release cycles by 25%.
Data Engineer (Remote Contract)
Zillow Group
Dec 2022 – Jun 2023
● Designed efficient SparkSQL business logic, improving workflow efficiency by 20%.
● Reduced error rates by 25% through improved data validation processes.
● Increased job reliability in Airflow by 10% via proactive monitoring.
Data Engineer (Remote Contract)
Signify Health
Feb 2022 – Jun 2022
● Diagnosed and resolved pipeline issues, enhancing reporting accuracy by 30%.
● Optimized AWS Glue jobs, improving processing speed by 40%.
● Automated data refresh tasks in Airflow, reducing manual intervention by 50%.
Data Engineer (Remote Contract)
UnitedHealth Group / Optum
Feb 2020 – Jan 2022
● Collaborated with Data Scientists to optimize ML models, increasing prediction accuracy by 20%.
● Streamlined data workflows, reducing processing times by 30%.
● Enhanced ML system performance by 25% through PySpark validation techniques.
● Integrated ML data with field systems using PySpark, Kafka, and Avro encoding.
Data Engineer (Onsite Contract)
AT&T / DirecTV Technology & Operations
Aug 2019 – Jan 2020
● Built on-premises Hadoop clusters with YARN and PySpark, demonstrating the case for migrating to cloud platforms.
● Migrated workflows to Hadoop, improving scalability by 40%.
● Replaced static cron jobs with Airflow workflows, reducing scheduling errors by 30%.
Data Engineer (Remote Contract)
Episource
Aug 2018 – May 2019
● Developed PySpark pipelines to evaluate physician care quality, cutting analysis time by 25%.
● Improved log analysis with S3 Select, reducing job runtime by 20%.
Data Engineer / SDET (Onsite Startup)
Hart
Dec 2015 – Apr 2017
● Built a compliance reporting pipeline using SparkSQL on AWS EMR, increasing efficiency by 60%.
● Automated testing with Selenium/Python, reducing deployment errors by 42%.
Education
● Master of Business Administration (MBA) – National University
● Bachelor of Electrical Engineering (BSEE) – Illinois Institute of Technology
Technical Skills
Cloud Platforms: AWS (Glue, EMR, Redshift, Lambda, S3, Athena), Azure Databricks
Big Data & Analytics: PySpark, SparkSQL, Hive, Kafka, Airflow
Databases: PostgreSQL, SQL Server, Oracle
Programming & Automation: Python, Terraform, Selenium, CI/CD (GitHub Actions, GitLab CI/CD)
Optimization & Cost Management: FinOps, AWS Cost Explorer, Reserved Instances, ETL Performance Tuning