VAISHNAVI KESHETTY
978-***-**** *****************@*****.*** Open to Remote or Relocation
PROFESSIONAL SUMMARY
Data Engineer with 5+ years architecting enterprise data platforms — designing ELT/ETL pipelines, Snowflake and PostgreSQL database systems, and cloud-native architectures on AWS, Azure, and GCP processing 10M+ events/day at 99.9% uptime. Expert in Python, dbt, Spark, and Kafka; delivered 40% performance gains, 60% anomaly reduction, and $2M+ revenue impact through rigorous data governance, validation practices, and automation workflows. Proven collaborator in Agile, cross-functional teams — translating complex business requirements into scalable, secure, and efficient enterprise data solutions. Strong track record in database performance tuning, root-cause analysis, mentoring junior engineers, and maintaining comprehensive documentation across all data systems. M.S. Computer Science, Rivier University (2024).
TECHNICAL SKILLS
Languages & Scripting
Python, SQL, PySpark, Scala, Java, JavaScript, Bash/Shell Scripting, R
Databases & Warehousing
Snowflake, PostgreSQL, Oracle PL/SQL, MySQL, MongoDB, DynamoDB, Redis, Amazon Redshift, Apache Cassandra
ELT/ETL & Orchestration
Apache Airflow, dbt, Apache Spark, Kafka, Flink, Spark Structured Streaming, Hive, Delta Lake, Hadoop, HDFS
Cloud Architecture
AWS (S3, Redshift, Glue, EMR, Lambda, Kinesis, Athena, EC2, SQS, SNS) · Azure (Synapse, ADF, Databricks, Event Hubs) · GCP (BigQuery, Dataflow, Pub/Sub, Dataproc)
Data Governance & Quality
Great Expectations, Apache Griffin, dbt tests, schema evolution, data contracts, validation practices, data quality monitoring, CloudWatch
DevOps & Automation
Jenkins, GitLab CI/CD, Docker, Kubernetes, Terraform, Git, SonarQube, automation workflows, blue-green deployments
Visualization & BI
Power BI, Tableau, Grafana, AWS QuickSight, Looker, Jupyter Notebooks
ML / AI Enablement
MLflow, feature stores, feature engineering, model serving pipelines, scikit-learn, pandas, NumPy
Compliance & Methodology
GDPR, SOC2, HIPAA, AES-256, TLS 1.2, IAM/KMS, access control management, audit logging · Agile, root-cause analysis
PROFESSIONAL EXPERIENCE
Data Engineer Aug 2024 – Present
Fifth Third Bank United States
•Architected Snowflake and PostgreSQL data platforms with Python, PySpark, and dbt ELT/ETL pipelines, processing 10 TB+ daily financial data for real-time fraud detection.
•Led organization-wide data governance and validation practices across 50+ critical datasets — implemented dbt + Great Expectations framework with 500+ automated validation rules, cutting data anomalies 60% and establishing enforceable data contracts that improved data reliability across all downstream consumers.
•Designed and orchestrated 200+ Airflow DAGs with complex dependency graphs and automation workflows for reliable, repeatable deployments — reduced manual interventions by 80% and achieved sub-hour data latency for mission-critical banking operations.
•Performed database performance tuning, access control management, and provisioning of AWS infrastructure (Glue, Lambda, EMR, IAM/KMS), ensuring GDPR/SOC2 compliance for sensitive financial data across $100M+ operations.
•Mentored junior data engineers through Agile code reviews, technical guidance, and sprint planning sessions; collaborated with cross-functional teams to gather requirements, estimate effort, and deliver data solutions aligned with business objectives.
•Delivered executive Power BI dashboards with drill-down analytics and predictive insights, enabling data-driven decisions for C-suite stakeholders; maintained comprehensive documentation of all data systems and processes.
Data Engineer Aug 2021 – Feb 2023
Capgemini India
•Designed and implemented scalable ELT/ETL pipelines and data models (Spark, Python, Scala) on enterprise cloud architecture processing 5M+ records/hour — achieved 40% performance gain through root-cause analysis of bottlenecks, partition optimization, caching strategies, and adaptive query execution.
•Built real-time data ingestion architecture (Kafka, Spark Structured Streaming, AWS Kinesis) with exactly-once semantics, enabling sub-minute data availability for operational anomaly detection dashboards and analytics.
•Implemented end-to-end automation workflows and CI/CD pipelines (Jenkins, GitLab, SonarQube) with automated testing, code quality checks, and blue-green deployments — reduced release cycles from 3 weeks to 2 days.
•Deployed HIPAA-compliant multi-cloud (AWS/Azure) solutions with AES-256 encryption at rest, TLS 1.2 in transit, access control management, and comprehensive audit logging; created Grafana observability dashboards tracking pipeline health, data quality metrics, and SLA alerting.
Data Engineer May 2019 – Jul 2021
InMobi India
•Built enterprise-scale ELT/ETL pipelines processing 50M+ ad-tech events daily using Spark, Hive, Hadoop, and Kafka — supported real-time bidding (RTB) systems and campaign attribution models at 99.95% accuracy; reduced cloud spend 25% via spot-instance autoscaling and data lifecycle policies on AWS/GCP.
•Conducted root-cause analysis on SQL and Spark performance issues (predicate pushdown, broadcast joins, adaptive query execution), delivering a 50% compute-cost reduction and a 3x query speedup; recommended and implemented improvements with full documentation for knowledge sharing.
•Automated orchestration workflows using Apache Airflow with dynamic DAG generation and backfill capabilities, eliminating 15+ hrs/week of manual operations; collaborated in Agile, cross-functional teams with data science and product analytics to drive $2M+ revenue impact through improved ad-targeting.
PROJECTS
Enterprise Fraud Detection Platform & Feature Store Fifth Third Bank
•Architected an end-to-end enterprise data platform (Python, dbt, Snowflake, PostgreSQL, MLflow) with a comprehensive data governance framework including data contracts, schema evolution standards, and 500+ automated validation rules — achieved 60% anomaly reduction and sub-second model serving latency across $100M+ daily transactions.
•Established access control policies, data quality monitoring, and documentation standards across 50+ critical datasets; led cross-functional adoption of governance practices in an Agile delivery model.
Real-Time Ad-Tech ELT Pipeline & Automation Workflows InMobi
•Designed and deployed scalable ELT pipelines (Python, SQL, Spark, Kafka) on cloud architecture processing 50M+ daily events with 99.95% attribution accuracy; performed root-cause analysis to achieve a 50% compute reduction and a 3x query speedup through targeted SQL and Spark optimizations.
•Built automation workflows using Apache Airflow with dynamic DAG orchestration — eliminated 15+ hrs/week of manual operations and drove $2M+ revenue impact through improved ad-targeting in collaboration with Agile, cross-functional teams.
Multi-Cloud Streaming Ingestion & Data Governance Architecture Capgemini
•Architected a real-time ingestion system (Kafka, Spark Structured Streaming, AWS Kinesis) with exactly-once semantics and HIPAA-compliant data governance — sub-minute data availability for anomaly detection; implemented access control management, audit logging, and encryption standards.
•Built CI/CD automation workflows (Jenkins, GitLab, SonarQube) with blue-green deployments; mentored junior engineers through structured code reviews and technical guidance — cut release cycles from 3 weeks to 2 days; maintained full documentation of all data systems.
EDUCATION & CERTIFICATIONS
Rivier University
Nashua, NH
Master of Science in Computer Science
May 2024
CMR Institute of Technology
Hyderabad, India
Bachelor of Technology in Electronics & Communication Engineering (ECE)
2021
Certifications
•Microsoft Certified: Fabric Data Engineer Associate
•AWS Certified Cloud Practitioner Essentials