Hasso Butt
Principal Data Engineer | Data Architect | Big Data & Cloud Specialist
*********@*****.***
Cleveland, Ohio 44102, United States
Innovative, results-driven Data Architect and Senior Data Engineer with 12+ years of experience designing, building, and optimizing scalable, high-performance data platforms. Skilled in applying Big Data technologies, cloud solutions, and advanced ETL/ELT pipelines to drive business growth and operational efficiency. Expert in cloud-based data lakes, warehouses, real-time data integration, distributed systems, and automation, with a strong focus on cost optimization and performance tuning. Proven leader in delivering AI-powered analytics solutions that enable advanced business intelligence and decision-making. Experienced in data governance, security, and compliance (GDPR, HIPAA), with a commitment to building secure, scalable, and future-ready data architectures. Recognized for mentoring engineering teams and fostering a collaborative, high-performance work culture.
Skills
Programming & Scripting
Python, SQL, Java, Scala, Shell Scripting
ETL/ELT & Orchestration
Apache Airflow, dbt, AWS Glue, Informatica, NiFi,
Talend, Azure Data Factory, Dagster
Data Warehousing & Modeling
Snowflake, Redshift, BigQuery, Delta Lake
DevOps & Automation
Terraform, Kubernetes, Docker, CloudFormation,
CI/CD (Jenkins, GitHub Actions)
AI/ML Engineering
MLOps, TensorFlow, Scikit-learn, Feature Engineering, Model Deployment
Data Observability & Monitoring
Monte Carlo, OpenLineage, Great Expectations, Databand, Prometheus, Grafana (data reliability, pipeline health, and operational insights)
Streaming & Real-Time Processing
Apache Pulsar, Apache Flink, Amazon Kinesis, Confluent Kafka
Modern Metadata & Lineage
Amundsen, DataHub, Marquez, Trino/Presto
Cloud Platforms
AWS (S3, Redshift, Glue, Lambda, EMR), Google Cloud (BigQuery, Dataflow), Azure (Synapse Analytics)
Big Data Technologies
Apache Spark, Hadoop, Kafka, Flink, Druid, Presto
Business Intelligence
Tableau, Power BI, Looker, Apache Superset, Mode Analytics
Data Governance & Security
GDPR, HIPAA, RBAC, Data Lineage, Data Masking,
Data Cataloging
FinOps & Cloud Cost Optimization
CloudZero, Finout, Kubecost, AWS Budgets
API Development & Data Integration
For building and integrating data services and
connecting systems efficiently
Advanced Data Pipelines & Workflow
Automation
Argo Workflows, Apache Beam, Luigi
Data Tools Ecosystem
dbt Cloud, Metabase, Lightdash, PostHog, ClickHouse
Professional Experience
Census, Senior Data Architect, 2023 – Present
•Designed real-time, scalable data pipelines using Apache Spark, Kafka, and AWS Glue to power AI/ML-driven analytics platforms across healthcare and retail clients.
•Architected cost-efficient data lake and lakehouse solutions using Amazon S3, Redshift, and Delta Lake, improving performance and reducing storage costs by 30%.
•Integrated Census for reverse ETL, enabling real-time sync of curated data from Snowflake to Salesforce and HubSpot, improving marketing and sales operations efficiency.
•Led cross-functional, cloud-first data engineering teams, implementing secure, modular infrastructure-as-code patterns on AWS, Azure, and GCP.
•Established company-wide data governance and compliance frameworks aligned with HIPAA, GDPR, and SOC2, including RBAC and PII masking policies.
•Built metadata-driven architecture using dbt, OpenLineage, and Amundsen, increasing traceability and auditability of production pipelines.
•Drove FinOps best practices using Kubecost, AWS Cost Explorer, and usage forecasting models, reducing data platform spend by over 20% while scaling capacity.
•Collaborated with DevOps to deploy secure secrets management using AWS KMS and Vault, automating key rotation and access policies.
Red Hawk Tech, Senior Data Engineer, 2018 – 2023
•Built fault-tolerant pipelines for batch and real-time data using Spark, Kafka, and Kinesis.
•Optimized complex transformations in distributed data flows using PySpark and SQL.
•Refactored legacy ETL systems into Docker-based microservices.
•Automated data testing using Great Expectations and custom Python rules.
•Built Airflow DAGs with integrated notifications for SLA and anomaly alerts.
•Developed secure data exchange systems using SFTP, APIs, and change data capture.
•Implemented encryption, masking, and compliance workflows for regulated datasets.
•Engineered ML pipelines on SageMaker for batch inference and monitoring.
•Designed event-driven solutions using Lambda, SNS, and SQS for real-time needs.
•Migrated workloads from Hadoop to AWS/Snowflake, optimizing performance.
•Created metadata-aware orchestrators for schema evolution handling.
•Developed lineage visualizations and impact analysis for business users.
•Led cross-functional initiatives between engineering, product, and data science.
Metaplane, Data Reliability Engineer, 2015 – 2017
•Designed and implemented automated data quality checks using Great Expectations, improving anomaly detection and reducing data incidents by 40%.
•Deployed Metaplane’s own observability tools (internal R&D) to test lineage mapping, schema drift alerts, and freshness checks across Snowflake and dbt models.
•Integrated dbt Cloud, Monte Carlo, and OpenLineage into production pipelines for end-to-end lineage visibility and SLA enforcement.
•Developed custom metrics for data freshness, completeness, and accuracy using Python, SQL, and Prometheus, alerting through PagerDuty.
•Collaborated with engineering teams to enforce data contracts and versioning strategies across ELT jobs.
Pandata LLC, Data Engineer & Data Analyst, 2013 – 2015
•Automated ETL workflows using Python, Airflow, and AWS Glue.
•Designed scalable data models for cloud data warehouses.
•Built real-time validation systems to ensure accuracy and consistency.
•Developed dashboards for operational reporting using SQL and BI tools.
•Integrated third-party APIs and systems for unified data pipelines.
•Created ad-hoc analytics workflows to support business operations.
•Conducted root-cause analysis for data anomalies and production issues.
•Built data marts for self-service reporting and team-wide access.
•Collaborated with stakeholders to define KPIs and reporting metrics.
•Documented data pipelines and definitions to improve transparency.
Certifications
Microsoft Azure Data Fundamentals (DP-900)
Databricks Certified Data Engineer Associate
Databricks Lakehouse Fundamentals
Education
Bachelor of Science in Computer Science, 2009 – 2012