Andy Anwin
Senior Data Engineer
+1-860-***-**** | ************@*****.*** | Hartford, CT, USA
Summary
Results-oriented Data Engineer with over 14 years of experience designing,
implementing, and optimizing scalable data infrastructure and ETL processes. Proven
track record in enhancing data flow efficiency and enabling real-time analytics across
distributed systems. Expertise in leveraging cloud data platforms and big data technologies,
including Apache Spark, Kafka, and Snowflake, to drive strategic data initiatives. Adept at
collaborating with cross-functional teams to deliver high-quality data solutions that meet
business objectives. Strong background in data governance, data quality assurance, and
performance tuning, ensuring reliable and accessible data for analytics. Committed to
continuous improvement and innovation in data engineering practices to support
organizational growth and decision-making.
Skills
Cloud & Infrastructure:
AWS (EC2, S3, Lambda, IAM, EMR, Redshift), Azure, GCP (BigQuery,
Dataflow), Cloudera, Kubernetes, Docker, Terraform (IaC)
Data Pipelines & Orchestration:
Apache Airflow, Azure Data Factory, Google Dataflow, Apache Beam,
ETL/ELT, dbt, REST APIs
Big Data & Distributed Processing:
Apache Spark (PySpark, Scala), Apache Kafka, Hadoop, Hive, EMR, Data Lakes,
Petabyte-scale data handling
Programming & Scripting:
Python, Scala, Java, SQL, HiveQL, Bash, Jupyter, NumPy, Pandas
Streaming & Real-time Systems:
Kafka, Spark Streaming, Event-driven architectures, Real-time ingestion and
processing
DevOps & CI/CD:
Git, GitHub, JIRA, Docker, Kubernetes, Terraform, CI/CD pipelines
Monitoring & Visualization:
Grafana, Prometheus, Custom dashboards, System & pipeline metrics
Security & Compliance:
HIPAA-compliant data handling, Access control (IAM), Secure data pipelines
Analytics & Modeling:
Google BigQuery, dbt, Redshift, Data modeling, Query optimization
Project Methodologies:
Agile, Scrum, Kanban
Experience
Senior Data Engineer
DataNova Solutions – Austin, TX
May 2024 – Present
Designed scalable data pipelines using Apache Airflow, dbt, and Snowflake for
client-facing analytics applications
Migrated legacy batch ETL systems to real-time processing using Kafka and
AWS Kinesis
Built reusable data quality monitoring components using Great Expectations and
custom Python scripts
Collaborated with analytics teams to optimize data models in Redshift and
improve query performance
Containerized data services with Docker and automated their deployment
through GitLab CI/CD
Lead Data Engineer
ClearMetrics Inc. – Denver, CO
June 2020 – April 2024
Developed and maintained ETL workflows using Python, SQL, and Amazon
Redshift
Built parameterized data pipelines in Apache Airflow for ingestion and
transformation tasks
Implemented star and snowflake schemas to support marketing and customer
intelligence analytics
Created data marts and curated datasets for BI teams using dbt and Looker
Automated data loads from REST APIs, S3 buckets, and on-prem databases
Monitored pipeline performance and managed failure recovery using AWS
CloudWatch
Data Engineer
Google – Mountain View, CA
March 2019 – May 2020
Led end-to-end development of high-throughput data ingestion pipelines using
Dataflow and BigQuery
Partnered with product teams to create scalable event tracking strategies for Ads
performance analytics
Integrated machine learning workflows with Vertex AI and orchestrated model
training pipelines
Designed and implemented secure data governance policies in compliance with
global privacy standards
Automated complex data workflows using Cloud Functions and Terraform for
GCP infrastructure
Delivered unified data views by building robust data marts for cross-functional
reporting
Big Data Engineer
ClearMetrics Inc. – Denver, CO
June 2015 – February 2019
Built ELT pipelines for customer usage data using Python, SQL, and Amazon
Redshift
Implemented SCD Type 2 data models to support historical reporting needs in
marketing dashboards
Developed and maintained Airflow DAGs for orchestration of multi-stage data
workflows
Collaborated with data scientists to engineer features for predictive churn models
Introduced version control and modular design patterns across data transformation
scripts
Integrated REST API data sources and scheduled ingestion jobs using Lambda
and CloudWatch
Junior Data Engineer
TechAxis Analytics – Chicago, IL
July 2011 – May 2015
Assisted in data warehouse development using SQL Server and SSIS for business
intelligence reporting
Supported daily ETL operations and performed root cause analysis on data load
failures
Designed initial schema models for transactional and aggregated data layers
Built stored procedures and automated data validations to ensure data accuracy
Participated in the migration from on-prem SQL Server to AWS-based infrastructure
using EC2 and S3
Education
Bachelor of Science in Computer Science
University of Engineering and Technology