Lead Data Engineer

Location: New York City, NY

Posted: April 29, 2026

Resume:

Rick Benson

Data Engineering Strategy Chief Architect, Data Engineering

rickbenson.code gmail.com New Jersey, US GitHub LinkedIn

Summary

Data Engineer with 9 years of experience building scalable data platforms, cloud-native analytics pipelines, and

real-time streaming systems across fintech, healthcare, retail, and logistics. Skilled in leading global teams and

delivering cost-efficient, high-performance, and compliant data ecosystems. Expertise in AWS, Azure, and GCP

with hands-on work in Apache Spark, Flink, Kafka, dbt, Delta Lake, and Iceberg. Strong in data warehousing, data

mesh, and automation. Focused on secure, reliable, high-quality solutions that drive business impact.

Professional Experience

Lead Data Engineer, Self-Employed 04/2022 - Present

Built and scaled real-time streaming pipelines on AWS and GCP with Kafka, Pub/Sub,

Kinesis, Flink, Beam, Kafka Streams, and Spark Structured Streaming to support

fraud detection and personalization with 99.99% uptime.

Designed and deployed a data mesh spanning six domains with Avro, Protobuf,

Schema Registry, and Confluent Kafka, integrating observability with DataDog,

OpenTelemetry, Great Expectations, Soda Core, and Monte Carlo.

Created modular dbt frameworks in Python and SQL, integrated Polars for

high-performance transformations, and automated delivery with GitLab CI/CD, GitHub

Actions, and Jenkins.

Orchestrated Spark, PySpark, and Flink workloads on Kubernetes using Karpenter,

Helm, and Argo Workflows, with autoscaling and cost-aware scheduling that

reduced compute costs by 38 percent.

Championed Iceberg, Delta Lake, and Hudi for lakehouse governance, schema

evolution, and ACID analytics on S3, GCS, and BigQuery.

Standardized IaC with Terraform, Pulumi, AWS CDK, Ansible, and Docker for Airflow,

Prefect, and dbt, adopted by 12+ teams.

Deployed advanced monitoring and ML-driven anomaly detection with Prometheus,

Grafana, and Datafold, ensuring 100 percent SLA compliance.

Mentored 25 engineers across four pods, leading code reviews, performance tuning,

and architectural planning for large-scale systems in Python, Scala, Java, and Go.

Senior Data Engineer, High Tech Labs 04/2020 - 04/2022

Built multi-cloud pipelines on AWS Glue, Lambda, Redshift, and S3, plus GCP

Dataflow and BigQuery, processing 1.2 TB daily.

Architected secure Delta Lake frameworks with schema enforcement, GDPR/CCPA

data masking, tokenization, and audit-ready datasets.

Automated ETL pipelines with Glue and Python, integrating lineage tracking and

encryption at rest and in transit.

Provisioned infrastructure with AWS CDK and Terraform, speeding up

deployments across environments.

Built and maintained 50+ validation suites in Great Expectations, Soda Core, and

Monte Carlo, reducing data quality incidents by 80 percent.

Introduced Unity Catalog and Lake Formation for fine-grained access control, IAM,

and governance.

Mentored junior engineers, led design sessions, and collaborated in Agile teams to

deliver secure and compliant analytics solutions.

Data Engineer, Expedia 08/2018 - 03/2020

Refactored and scaled ETL pipelines with Python, SQL, and Airflow to ingest

transactional and event-driven data into PostgreSQL and data warehouse systems.

Migrated workflows to cloud environments with EMR, Redshift, and Snowflake,

improving scalability and cost efficiency.

Integrated APIs, CSV, JSON, and XML data sources into centralized datasets with

Airflow, cron, and custom Python logic.

Delivered Tableau and Looker dashboards for KPIs like GMROI and sell-through, and

implemented role-based access control, PCI DSS compliance, and encryption

standards.

Optimized SQL and Polars-based transformations to accelerate visualization by 60

percent, while strengthening governance with audit logging and Ranger policies.

Junior Data Engineer, Confluent 02/2016 - 07/2018

Designed and implemented custom data integration solutions using Python and

SQL, reducing processing time and significantly improving data pipeline efficiency.

Supported daily data pipeline operations with cron jobs and early Airflow

deployments, moving toward cloud adoption.

Developed secure ingestion from APIs and FTP sources into central reporting layers,

using MongoDB and Elasticsearch for fast lookups.

Tuned SQL queries with indexes and joins, cutting dashboard latency by 60 percent,

and introduced access control, encryption, and audit logging.

Collaborated with senior engineers to introduce containerization with Docker and

automate delivery pipelines with Jenkins and GitHub Actions.

Education

Bachelor's Degree, Stockton University
