Data Engineer Engineering

Location:

Plano, TX

Salary:

80000

Posted:

October 15, 2025


Resume:

SAI KEERTHI VADNALA

Cincinnati, Ohio | ***********@*****.*** | 513-***-**** | LinkedIn

PROFESSIONAL SUMMARY

Results-oriented Data Engineer with 4+ years of experience architecting and maintaining cloud-native, scalable data platforms across AWS and Azure. Expertise in end-to-end pipeline design (ingestion, transformation, orchestration, monitoring), integrating ML workflows, and ensuring data reliability at scale. Proficient in Snowflake, Databricks, Airflow, dbt, and Jenkins, as well as CI/CD practices and data modeling techniques. Delivered high-impact solutions in dynamic environments (Comcast, JPMorgan Chase, Accenture, Cognizant). Strong academic grounding in Big Data and AI/ML with a 4.0 GPA.

TECHNICAL SKILLS

Languages: Python, SQL, Scala, Java, Bash

Cloud Platforms: AWS (S3, Glue, Lambda, Redshift, EMR), Azure (Data Factory, Synapse, Blob, Azure ML), GCP (BigQuery, Dataflow)

Data Engineering: ETL, ELT, Data Lakes, Data Warehousing, Stream Processing, Kafka, dbt, Feature Stores

Orchestration & DevOps: Airflow, Jenkins, Git, Docker, Terraform, Kubernetes

Databases: Snowflake, SQL Server, PostgreSQL, MySQL, MongoDB, NoSQL

AI/ML: PyTorch, TensorFlow, Scikit-learn, XGBoost, MLflow, ONNX, SageMaker, Azure ML Studio

BI & Reporting: Power BI, Tableau, Excel (Macros, VBA, Power Query, VLOOKUP)

Other Tools: VS Code, Jupyter, Postman, Swagger, REST APIs, FastAPI, Flask, PySpark

PROFESSIONAL EXPERIENCE

Senior Data Engineer

Comcast, Remote Jan 2025 – Present

• Architected and deployed AWS-based ETL pipelines using S3, Glue, and Lambda to ingest 15M+ customer interaction records daily with latency <200ms.

• Developed distributed Kafka + Spark Streaming pipelines for real-time behavioral segmentation of 250K+ users/day.

• Integrated production ML scoring via SageMaker endpoints, accelerating fraud detection inference by 28%.

• Built 50+ resilient, automated Airflow DAGs with SLAs, retries, and CloudWatch alerting, achieving 99.7% uptime.

• Designed data quality checks using Great Expectations for schema, null, and duplication validation with 95% accuracy.

• Engineered Redshift and Snowflake schemas to support marketing and sales dashboards, improving dashboard load speed by 40%.

• Automated feature versioning with Delta Lake + MLflow, supporting batch/stream pipelines across 4 models.

Data Engineer Intern

JPMorgan Chase & Co., Columbus, OH May 2024 – Aug 2024

• Orchestrated 20+ Azure Data Factory (ADF) pipelines to process 10M+ financial transactions daily, consistently meeting SLA requirements with 99% on-time delivery.

• Designed and deployed Synapse semantic models for AML dashboards integrating 50+ data sources, reducing report generation time by 35%.

• Streamlined reusable ADF templates and parameterized components, accelerating ETL development for fraud-risk pipelines by 40%.

• Integrated Azure ML models into batch ingestion workflows via SDKs, enabling real-time scoring on 8M+ transactions monthly and improving model inference time by 25%.

• Collaborated with the data science team to standardize features for fraud, churn, and credit models, eliminating 30% of pipeline duplication.

Data Engineer

Accenture, Hyderabad, India Jan 2021 – Jul 2023

• Migrated 80+ legacy ETL jobs to Azure Data Factory and Snowflake, enabling processing of 500M+ records/month with 35% fewer failures.

• Developed Python utilities for data cleansing (nulls, duplicates, schema fixes), reducing ingestion failures by 45%.

• Orchestrated and scheduled ML scoring jobs using Airflow and Jenkins, reducing model deployment time from 4 hours to under 45 minutes.

• Developed validation scripts for 200GB+ datasets to detect integrity issues, preventing downstream data failures and saving 20+ hours/week of manual debugging (approximately $85,000 in annual cost savings).

• Built and maintained Power BI dashboards to visualize marketing KPIs, campaign performance, and channel metrics, used by 100+ stakeholders for daily decision-making.

Data Engineer

Cognizant, Hyderabad, India Apr 2019 – Dec 2020

• Designed and maintained scalable ETL pipelines using Python and SQL to ingest, clean, and transform over 1.2M rows of sales and customer data monthly across distributed systems.

• Engineered SQL-based data marts and reporting layers to support Power BI dashboards, enhancing data accessibility and improving business reporting accuracy by 30%.

• Automated Excel-based reporting workflows using VBA and Power Query, reducing manual processing time by 40% and increasing delivery consistency.

• Developed Python scripts for data validation and anomaly detection (schema mismatches, nulls, outliers), achieving 92% detection accuracy and reducing QA turnaround by 38%.

• Collaborated with cross-functional teams to design data flow diagrams and define data contracts, ensuring alignment with analytics and downstream reporting needs.

• Optimized legacy data pipelines by implementing indexing strategies, reducing query execution times by 45% and improving end-user performance.

• Implemented version control, documentation, and reusable pipeline templates in Git, reducing code duplication and cutting team onboarding time by 50%.

EDUCATION

University of Cincinnati, CECH

Master of Science in Information Technology

Aug 2023 – Dec 2024 • GPA: 4.0/4.0

Relevant Coursework: Big Data, Cloud Computing, Machine Learning, Statistical Data Analysis

CERTIFICATIONS

• Microsoft Certified: Azure Data Engineer Associate (DP-203)

• Microsoft Certified: Power BI Data Analyst Associate (PL-300)

• Microsoft Certified: Azure Fundamentals (AZ-900)

• AWS Certified Solutions Architect – Associate (In Progress)

PROJECTS (Industry-Aligned)

End-to-End Customer Analytics Pipeline (Cloud, Airflow, Snowflake, dbt)

• Designed and implemented a full-stack data pipeline spanning ingestion (S3, Kafka), orchestration (Airflow, dbt), warehousing (Snowflake), and visualization (Power BI).

• Used Airflow to orchestrate 40+ DAGs with built-in retries, SLAs, and Slack alerts; executed 5K+ jobs monthly.

• Developed 12+ dbt models with incremental strategies and optimized partitioning, reducing compute costs by 30%.

• Integrated ML scoring pipeline using SageMaker + Lambda, improving campaign ROI tracking by 18%.

Real-Time Feature Store (Databricks, Delta Lake, MLflow)

• Implemented a scalable real-time feature pipeline using Kafka, Delta Lake, and Databricks Jobs for fraud models.

• Enabled seamless automated feature versioning via MLflow and CI/CD pipelines (GitHub Actions + Terraform).

• Reduced feature computation latency from 4 hours to under 15 minutes and achieved 22% faster model retraining cycles.
