SAI KEERTHI VADNALA
Cincinnati, Ohio ***********@*****.*** 513-***-**** LinkedIn
PROFESSIONAL SUMMARY
Results-oriented Data Engineer with 4+ years of experience in architecting and maintaining cloud-native, scalable data platforms across AWS and Azure. Expertise in end-to-end pipeline design (ingestion, transformation, orchestration, monitoring), integrating ML workflows, and ensuring data reliability at scale. Proficient in tools such as Snowflake, Databricks, Airflow, dbt, Jenkins, CI/CD, and data modeling techniques. Delivered high-impact solutions in dynamic environments (Comcast, JPMorgan Chase, Accenture, Cognizant). Strong academic grounding in Big Data and AI/ML with a 4.0 GPA.
TECHNICAL SKILLS
Languages: Python, SQL, Scala, Java, Bash
Cloud Platforms: AWS (S3, Glue, Lambda, Redshift, EMR), Azure (Data Factory, Synapse, Blob, Azure ML), GCP (BigQuery, Dataflow)
Data Engineering: ETL, ELT, Data Lakes, Data Warehousing, Stream Processing, Kafka, dbt, Feature Stores
Orchestration & DevOps: Airflow, Jenkins, Git, Docker, Terraform, Kubernetes
Databases: Snowflake, SQL Server, PostgreSQL, MySQL, MongoDB, NoSQL
AI/ML: PyTorch, TensorFlow, Scikit-learn, XGBoost, MLflow, ONNX, SageMaker, Azure ML Studio
BI & Reporting: Power BI, Tableau, Excel (Macros, VBA, Power Query, VLOOKUP)
Other Tools: VS Code, Jupyter, Postman, Swagger, REST APIs, FastAPI, Flask, PySpark
PROFESSIONAL EXPERIENCE
Senior Data Engineer
Comcast, Remote Jan 2025 – Present
• Architected and deployed AWS-based ETL pipelines using S3, Glue, and Lambda to ingest 15M+ customer interaction records daily with latency <200ms.
• Developed distributed Kafka + Spark Streaming pipelines for real-time behavioral segmentation of 250K+ users/day.
• Integrated production ML scoring via SageMaker endpoints, accelerating fraud detection inference by 28%.
• Built 50+ resilient automated Airflow DAGs with SLAs, retries, and alerting using CloudWatch, achieving 99.7% uptime.
• Designed data quality checks using Great Expectations for schema, null, and duplication validation with 95% accuracy.
• Engineered Redshift and Snowflake schemas to support marketing and sales dashboards, improving load speed 40%.
• Automated feature versioning with Delta Lake + MLflow, supporting batch/stream pipelines across 4 models.
Data Engineer Intern
JPMorgan Chase & Co., Columbus, OH May 2024 – Aug 2024
• Orchestrated 20+ Azure Data Factory (ADF) pipelines to process 10M+ financial transactions daily, consistently meeting SLA requirements with 99% on-time delivery.
• Designed and deployed Synapse semantic models for AML dashboards integrating 50+ data sources, reducing report generation time by 35%.
• Streamlined reusable ADF templates and parameterized components, accelerating ETL development for fraud-risk pipelines by 40%.
• Integrated Azure ML models into batch ingestion workflows via SDKs, enabling real-time scoring on 8M+ transactions monthly and improving model inference time by 25%.
• Collaborated with data science to standardize features for fraud, churn, and credit models, eliminating 30% pipeline duplication.
Data Engineer
Accenture, Hyderabad, India Jan 2021 – Jul 2023
• Migrated 80+ legacy ETL jobs to Azure Data Factory and Snowflake, enabling processing of 500M+ records/month with 35% fewer failures.
• Developed Python utilities for data cleansing (nulls, duplicates, schema fixes), reducing ingestion failures by 45%.
• Orchestrated and scheduled ML scoring jobs using Airflow and Jenkins, reducing model deployment time from 4 hours to under 45 minutes.
• Developed validation scripts for 200GB+ datasets to detect integrity issues, preventing downstream data failures and saving 20+ hours/week in manual debugging, for annual cost savings of approximately $85,000.
• Built and maintained Power BI dashboards to visualize marketing KPIs, campaign performance, and channel metrics, used by 100+ stakeholders for daily decision-making.
Data Engineer
Cognizant, Hyderabad, India Apr 2019 – Dec 2020
• Designed and maintained scalable ETL pipelines using Python and SQL to ingest, clean, and transform 1.2M+ rows of sales and customer data monthly across distributed systems.
• Engineered SQL-based data marts and reporting layers to support Power BI dashboards, enhancing data accessibility and improving business reporting accuracy by 30%.
• Automated Excel-based reporting workflows using VBA and Power Query, reducing manual processing time by 40% and increasing delivery consistency.
• Developed Python scripts for data validation and anomaly detection (schema mismatches, nulls, outliers), achieving 92% detection accuracy and reducing QA turnaround by 38%.
• Collaborated with cross-functional teams to design data flow diagrams and define data contracts, ensuring alignment with analytics and downstream reporting needs.
• Optimized legacy data pipelines by implementing indexing strategies, reducing query execution times by 45% and improving end-user performance.
• Implemented version control, documentation, and reusable pipeline templates in Git, reducing code duplication and cutting team onboarding time by 50%.
EDUCATION
University of Cincinnati, CECH
Master of Science in Information Technology
Aug 2023 – Dec 2024 • GPA: 4.0/4.0
Relevant Coursework: Big Data, Cloud Computing, Machine Learning, Statistical Data Analysis
CERTIFICATIONS
• Microsoft Certified: Azure Data Engineer Associate (DP-203)
• Power BI Data Analyst Associate (PL-300)
• Microsoft Azure Fundamentals (AZ-900)
• AWS Certified Solutions Architect – Associate (In Progress)
PROJECTS (Industry-Aligned)
End-to-End Customer Analytics Pipeline (Cloud, Airflow, Snowflake, dbt)
• Designed and implemented full-stack data pipeline spanning ingestion (S3, Kafka), orchestration (Airflow, dbt), warehousing (Snowflake), and visualization (Power BI).
• Used Airflow to orchestrate 40+ DAGs with built-in retries, SLAs, and Slack alerts; executed 5K+ jobs monthly.
• Developed 12+ dbt models with incremental strategies and optimized partitioning; reduced computing costs by 30%.
• Integrated ML scoring pipeline using SageMaker + Lambda, improving campaign ROI tracking by 18%.
Real-Time Feature Store (Databricks, Delta Lake, MLflow)
• Implemented a scalable real-time feature pipeline using Kafka, Delta Lake, and Databricks Jobs for fraud models.
• Enabled seamless automated feature versioning via MLflow and CI/CD pipelines (GitHub Actions + Terraform).
• Reduced feature computation latency from 4 hours to under 15 minutes and achieved 22% faster model retraining cycles.