Senior Data Engineer ETL/ELT, Cloud, Snowflake, Python

Location:
Boston, MA
Salary:
$80,000
Posted:
March 25, 2026

Resume:

VAISHNAVI KESHETTY

978-***-**** · *****************@*****.*** · Open to Remote / Relocation

PROFESSIONAL SUMMARY

Data Engineer with 5+ years architecting enterprise data platforms — designing ELT/ETL pipelines, Snowflake and PostgreSQL database systems, and cloud-native architectures on AWS, Azure, and GCP processing 10M+ events/day at 99.9% uptime. Expert in Python, dbt, Spark, and Kafka; delivered 40% performance gains, 60% anomaly reduction, and $2M+ revenue impact through rigorous data governance, validation practices, and automation workflows. Proven collaborator in Agile, cross-functional teams — translating complex business requirements into scalable, secure, and efficient enterprise data solutions. Strong track record in database performance tuning, root-cause analysis, mentoring junior engineers, and maintaining comprehensive documentation across all data systems. M.S. Computer Science, Rivier University (2024).

TECHNICAL SKILLS

Languages & Scripting

Python, SQL, PySpark, Scala, Java, JavaScript, Bash/Shell Scripting, R

Databases & Warehousing

Snowflake, PostgreSQL, Oracle PL/SQL, MySQL, MongoDB, DynamoDB, Redis, Amazon Redshift, Apache Cassandra

ELT/ETL & Orchestration

Apache Airflow, dbt, Apache Spark, Kafka, Flink, Spark Structured Streaming, Hive, Delta Lake, Hadoop, HDFS

Cloud Architecture

AWS (S3, Redshift, Glue, EMR, Lambda, Kinesis, Athena, EC2, SQS, SNS) · Azure (Synapse, ADF, Databricks, Event Hubs) · GCP (BigQuery, Dataflow, Pub/Sub, Dataproc)

Data Governance & Quality

Great Expectations, Apache Griffin, dbt tests, schema evolution, data contracts, validation practices, data quality monitoring, CloudWatch

DevOps & Automation

Jenkins, GitLab CI/CD, Docker, Kubernetes, Terraform, Git, SonarQube, automation workflows, blue-green deployments

Visualization & BI

Power BI, Tableau, Grafana, AWS QuickSight, Looker, Jupyter Notebooks

ML / AI Enablement

MLflow, feature stores, feature engineering, model serving pipelines, scikit-learn, pandas, NumPy

Compliance & Methodology

GDPR, SOC2, HIPAA, AES-256, TLS 1.2, IAM/KMS, access control management, audit logging · Agile, root-cause analysis

PROFESSIONAL EXPERIENCE

Data Engineer

Aug 2024 – Present

Fifth Third Bank

United States

•Architected Snowflake and PostgreSQL data platforms with Python, PySpark, and dbt ELT/ETL pipelines processing 10 TB+ of financial data daily for real-time fraud detection.

•Led organization-wide data governance and validation practices across 50+ critical datasets — implemented dbt + Great Expectations framework with 500+ automated validation rules, cutting data anomalies 60% and establishing enforceable data contracts that improved data reliability across all downstream consumers.

•Designed and orchestrated 200+ Airflow DAGs with complex dependency graphs and automation workflows for reliable, repeatable deployments — reduced manual interventions by 80% and achieved sub-hour data latency for mission-critical banking operations.

•Performed database performance tuning, access control management, and provisioning of AWS infrastructure (Glue, Lambda, EMR, IAM/KMS), ensuring GDPR/SOC2 compliance for sensitive financial data across $100M+ in operations.

•Mentored junior data engineers through Agile code reviews, technical guidance, and sprint planning sessions; collaborated with cross-functional teams to gather requirements, estimate effort, and deliver data solutions aligned with business objectives.

•Delivered executive Power BI dashboards with drill-down analytics and predictive insights, enabling data-driven decisions for C-suite stakeholders; maintained comprehensive documentation of all data systems and processes.

Data Engineer

Aug 2021 – Feb 2023

Capgemini

India

•Designed and implemented scalable ELT/ETL pipelines and data models (Spark, Python, Scala) on enterprise cloud architecture processing 5M+ records/hour — achieved 40% performance gain through root-cause analysis of bottlenecks, partition optimization, caching strategies, and adaptive query execution.

•Built a real-time data ingestion architecture (Kafka, Spark Structured Streaming, AWS Kinesis) with exactly-once semantics, enabling sub-minute data availability for operational anomaly detection dashboards and analytics.

•Implemented end-to-end automation workflows and CI/CD pipelines (Jenkins, GitLab, SonarQube) with automated testing, code quality checks, and blue-green deployments — reduced release cycles from 3 weeks to 2 days.

•Deployed HIPAA-compliant multi-cloud (AWS/Azure) solutions with AES-256 encryption at rest, TLS 1.2 in transit, access control management, and comprehensive audit logging; created Grafana observability dashboards tracking pipeline health, data quality metrics, and SLA alerting.

Data Engineer

May 2019 – Jul 2021

InMobi

India

•Built enterprise-scale ELT/ETL pipelines processing 50M+ ad-tech events daily using Spark, Hive, Hadoop, and Kafka — supported real-time bidding (RTB) systems and campaign attribution models at 99.95% accuracy; reduced cloud spend 25% via spot-instance autoscaling and data lifecycle policies on AWS/GCP.

•Conducted root-cause analysis on SQL and Spark performance issues (predicate pushdown, broadcast joins, adaptive query execution), delivering a 50% compute-cost reduction and a 3× query speedup; recommended and implemented improvements with full documentation for knowledge sharing.

•Automated orchestration workflows using Apache Airflow with dynamic DAG generation and backfill capabilities, eliminating 15+ hrs/week of manual operations; collaborated in Agile, cross-functional teams with data science and product analytics to drive $2M+ revenue impact through improved ad-targeting.

PROJECTS

Enterprise Fraud Detection Platform & Feature Store · Fifth Third Bank

•Architected an end-to-end enterprise data platform (Python, dbt, Snowflake, PostgreSQL, MLflow) with a comprehensive data governance framework including data contracts, schema evolution standards, and 500+ automated validation rules — achieved 60% anomaly reduction and sub-second model serving latency across $100M+ daily transactions.

•Established access control policies, data quality monitoring, and documentation standards across 50+ critical datasets; led cross-functional adoption of governance practices in an Agile delivery model.

Real-Time Ad-Tech ELT Pipeline & Automation Workflows · InMobi

•Designed and deployed scalable ELT pipelines (Python, SQL, Spark, Kafka) on cloud architecture processing 50M+ daily events with 99.95% attribution accuracy; performed root-cause analysis to achieve 50% compute reduction and a 3× query speedup through targeted SQL and Spark optimizations.

•Built automation workflows using Apache Airflow with dynamic DAG orchestration — eliminated 15+ hrs/week of manual operations and drove $2M+ revenue impact through improved ad-targeting in collaboration with Agile, cross-functional teams.

Multi-Cloud Streaming Ingestion & Data Governance Architecture · Capgemini

•Architected a real-time ingestion system (Kafka, Spark Structured Streaming, AWS Kinesis) with exactly-once semantics and HIPAA-compliant data governance, delivering sub-minute data availability for anomaly detection; implemented access control management, audit logging, and encryption standards.

•Built CI/CD automation workflows (Jenkins, GitLab, SonarQube) with blue-green deployments, cutting release cycles from 3 weeks to 2 days; mentored junior engineers through structured code reviews and technical guidance; maintained full documentation of all data systems.

EDUCATION & CERTIFICATIONS

Rivier University

Nashua, NH

Master of Science in Computer Science

May 2024

CMR Institute of Technology

Hyderabad, India

Bachelor of Technology in Electronics & Communication Engineering (ECE)

2021

Certifications

•Microsoft Certified: Fabric Data Engineer Associate

•AWS Certified Cloud Practitioner Essentials


