Hasso Butt
Principal Data Engineer | Data Architect | Big Data & Cloud Specialist
*********@*****.***
Cleveland, Ohio 44102, United States
Innovative, results-driven Data Architect and Senior Data Engineer with 12+ years of experience designing, building, and optimizing scalable, high-performance data platforms. Skilled in applying Big Data technologies, cloud solutions, and advanced ETL/ELT pipelines to drive business growth and operational efficiency. Expert in cloud-based data lakes, warehouses, real-time data integration, distributed systems, and automation, with a strong focus on cost optimization and performance tuning. Proven leader in delivering AI-powered analytics solutions that enable advanced business intelligence and decision-making. Experienced in data governance, security, and compliance (GDPR, HIPAA), with a commitment to building secure, scalable, and future-ready data architectures. Recognized for mentoring engineering teams and fostering a collaborative, high-performance work culture.
Skills
Programming & Scripting
Python, SQL, Java, Scala, Shell Scripting
ETL/ELT & Orchestration
Apache Airflow, dbt, AWS Glue, Informatica, NiFi,
Talend, Azure Data Factory, Dagster
Data Warehousing & Modeling
Snowflake, Redshift, BigQuery, Delta Lake
DevOps & Automation
Terraform, Kubernetes, Docker, CloudFormation,
CI/CD (Jenkins, GitHub Actions)
AI/ML Engineering
MLOps, TensorFlow, Scikit-learn, Feature Engineering, Model Deployment
Data Observability & Monitoring
Monte Carlo, OpenLineage, Great Expectations, Databand, Prometheus, Grafana (data reliability, pipeline health, and operational insights)
Streaming & Real-Time Processing
Apache Pulsar, Apache Flink, Amazon Kinesis, Confluent Kafka
Modern Metadata & Lineage
Amundsen, DataHub, Marquez, Trino/Presto
Cloud Platforms
AWS (S3, Redshift, Glue, Lambda, EMR), Google Cloud (BigQuery, Dataflow), Azure (Synapse Analytics)
Big Data Technologies
Apache Spark, Hadoop, Kafka, Flink, Druid, Presto
Business Intelligence
Tableau, Power BI, Looker, Apache Superset, Mode Analytics
Data Governance & Security
GDPR, HIPAA, RBAC, Data Lineage, Data Masking,
Data Cataloging
FinOps & Cloud Cost Optimization
CloudZero, Finout, Kubecost, AWS Budgets
API Development & Data Integration
For building and integrating data services and
connecting systems efficiently
Advanced Data Pipelines & Workflow
Automation
Argo Workflows, Apache Beam, Luigi
Data Tools Ecosystem
dbt Cloud, Metabase, Lightdash, PostHog, ClickHouse
Professional Experience
Census, Senior Data Architect, 2023 – Present
•Designed real-time, scalable data pipelines using Apache Spark, Kafka, and AWS Glue to power AI/ML-driven analytics platforms across healthcare and retail clients.
•Architected cost-efficient data lake and lakehouse solutions using Amazon S3, Redshift, and Delta Lake, improving performance and reducing storage costs by 30%.
•Integrated Census for reverse ETL, enabling real-time sync of curated data from Snowflake to Salesforce and HubSpot, improving marketing and sales operations efficiency.
•Led cross-functional, cloud-first data engineering teams, implementing secure, modular infrastructure-as-code patterns on AWS, Azure, and GCP.
•Established company-wide data governance and compliance frameworks aligned with HIPAA, GDPR, and SOC2, including RBAC and PII masking policies.
•Built metadata-driven architecture using dbt, OpenLineage, and Amundsen, increasing traceability and auditability of production pipelines.
•Drove FinOps best practices using Kubecost, AWS Cost Explorer, and usage forecasting models, reducing data platform spend by over 20% while scaling capacity.
•Collaborated with DevOps to deploy secure secrets management using AWS KMS and Vault, automating key rotation and access policies.
Red Hawk Tech, Senior Data Engineer, 2018 – 2023
•Built fault-tolerant pipelines for batch and real-time data using Spark, Kafka, and Kinesis.
•Optimized complex transformations in distributed data flows using PySpark and SQL.
•Refactored legacy ETL systems into Docker-based microservices.
•Automated data testing using Great Expectations and custom Python rules.
•Built Airflow DAGs with integrated notifications for SLA and anomaly alerts.
•Developed secure data exchange systems using SFTP, APIs, and change data capture.
•Implemented encryption, masking, and compliance workflows for regulated datasets.
•Engineered ML pipelines on SageMaker for batch inference and monitoring.
•Designed event-driven solutions using Lambda, SNS, and SQS for real-time needs.
•Migrated workloads from Hadoop to AWS/Snowflake, optimizing performance.
•Created metadata-aware orchestrators for schema evolution handling.
•Developed lineage visualizations and impact analysis for business users.
•Led cross-functional initiatives between engineering, product, and data science.
Metaplane, Data Reliability Engineer, 2015 – 2017
•Designed and implemented automated data quality checks using Great Expectations, improving anomaly detection and reducing data incidents by 40%.
•Deployed Metaplane’s own observability tools (internal R&D) to test lineage mapping, schema drift alerts, and freshness checks across Snowflake and dbt models.
•Integrated dbt Cloud, Monte Carlo, and OpenLineage into production pipelines for end-to-end lineage visibility and SLA enforcement.
•Developed custom metrics for data freshness, completeness, and accuracy using Python, SQL, and Prometheus, alerting through PagerDuty.
•Collaborated with engineering teams to enforce data contracts and versioning strategies across ELT jobs.
Pandata LLC, Data Engineer & Data Analyst, 2013 – 2015
•Automated ETL workflows using Python, Airflow, and AWS Glue.
•Designed scalable data models for cloud data warehouses.
•Built real-time validation systems to ensure accuracy and consistency.
•Developed dashboards for operational reporting using SQL and BI tools.
•Integrated third-party APIs and systems for unified data pipelines.
•Created ad-hoc analytics workflows to support business operations.
•Conducted root-cause analysis for data anomalies and production issues.
•Built data marts for self-service reporting and team-wide access.
•Collaborated with stakeholders to define KPIs and reporting metrics.
•Documented data pipelines and definitions to improve transparency.
Certifications
Microsoft Azure Data Fundamentals (DP-900)
Databricks Certified Data Engineer Associate
Databricks Lakehouse Fundamentals
Education
Bachelor of Science in Computer Science, 2009 – 2012