VISHNUVARDHAN NAGUNURI
Data Engineer
+1-984-***-**** ************************@*****.*** LinkedIn GitHub
SUMMARY
Data Engineer with 4+ years of experience designing, building, and optimizing large-scale data pipelines across cloud (Azure, AWS) and on-prem environments. Skilled in integrating diverse data sources and developing batch and real-time streaming solutions using Databricks, Spark, Kafka, and Snowflake. Experienced in orchestrating Python- and SQL-based ETL workflows with Azure Data Factory and AWS Glue/Lambda, ensuring data quality, governance, and HIPAA/PII compliance. Adept at delivering scalable, high-performance solutions and collaborating in Agile teams to translate complex technical challenges into actionable business insights and measurable efficiency gains.
TECHNICAL SKILLS
Programming & Scripting — Python (PySpark, Pandas, NumPy), SQL, PowerShell, Feature Engineering, Regression, Classification, Model Evaluation
Data Engineering & ETL Tools — Apache Spark, Databricks (Delta Lake), Kafka, Apache Airflow, Snowflake, DBT, Parquet, Avro, REST API Integration
Cloud Platforms — Azure (Data Factory, ADLS, Synapse, Monitor, Event Hubs), AWS (EC2, Glue, Lambda, S3, Redshift, Step Functions)
Data Modeling & Warehousing — Star Schema, Snowflake Schema, PostgreSQL, MySQL, SQL Server, NoSQL (MongoDB)
DevOps & CI/CD — Azure DevOps, GitHub Actions, Jenkins, Docker, Kubernetes, Terraform (Infrastructure as Code)
Data Engineering Competencies — Scalable Pipeline Design, Batch Processing, Performance Optimization, Data Quality Management & Monitoring, Data Governance (RBAC, Data Masking, HIPAA Compliance)
Emerging & Advanced Skills — Real-Time Data Processing (Kafka Streams), Serverless Data Processing
Other Tools — Excel (for validation/debugging)
PROFESSIONAL EXPERIENCE
Data Engineer, Humana Inc. Jul 2024 – Present USA
•Engineered ETL pipelines in Azure Data Factory and Databricks (PySpark, SQL, Delta Lake) to integrate 15+ healthcare sources (EHRs, claims, pharmacy feeds) into Azure Data Lake, processing 10M+ patient and claims records monthly and enabling faster reporting and predictive analytics for providers.
•Developed real-time streaming pipelines with Kafka and Azure Event Hubs to capture IoT vitals and live claims feeds, achieving <5s latency into Azure Synapse for timely clinical monitoring and proactive interventions.
•Optimized Spark SQL workloads using partition pruning, broadcast joins, and caching, improving multi-terabyte batch processing performance by 40% and enhancing data availability for analytics teams.
•Orchestrated Python-based ETL transformations and CI/CD deployments via Azure DevOps and GitHub Actions, reducing release cycles by 30% while maintaining consistent, reliable pipelines.
•Ensured HIPAA compliance by implementing schema enforcement, automated data validations, access control, and PII masking across structured and semi-structured healthcare datasets.
•Partnered with clinicians, compliance officers, and product managers in Agile sprints to deliver solutions aligned with evolving healthcare requirements and key reporting metrics.
Data Engineer, Paytm Aug 2020 – Dec 2022 India
•Architected ETL pipelines in Databricks and AWS (S3, Glue, Redshift) to process 500K+ daily financial transactions from banking APIs and third-party gateways, enabling enterprise-wide payment analytics.
•Supported modernization of legacy batch workflows by migrating to real-time and incremental pipelines using Kafka, Spark, AWS Glue, and AWS Lambda, reducing fraud detection reporting latency from 24 hours to <15 minutes.
•Designed and implemented star and snowflake schema models to support payment analytics, customer behavior analysis, and merchant performance dashboards.
•Built curated feature datasets in Databricks to support downstream fraud detection and risk scoring models, enabling analytics teams to improve the accuracy of fraud alerts.
•Automated data refresh workflows in AWS Glue and Spark, orchestrated with AWS Step Functions, cutting reporting lag from 6 hours to <30 minutes and ensuring dashboards and risk models reflected near-real-time activity.
•Strengthened pipeline reliability through automated quality checks, robust exception logging, and continuous monitoring, proactively identifying and resolving data issues to cut recurring errors by 35% and improve overall system stability.
•Contributed to event-driven integrations and CI/CD workflows with Jenkins and GitHub Actions, collaborating with fraud analytics teams, API developers, and DevOps engineers.
•Recognized with a performance award for driving 20% faster fraud detection reporting and improving data platform scalability, directly supporting Paytm’s customer trust and compliance initiatives.
Associate Data Engineer, Reliance Jio Mar 2019 – Aug 2020 India
•Assisted in building and maintaining telecom-grade data pipelines using Azure Data Factory (ADF) and Azure Data Lake Storage (ADLS) to ingest large volumes of CDRs and subscriber data, ensuring consistent availability for downstream reporting and analytics.
•Designed and configured data transformations, aggregations, and error-handling in ADF pipelines, delivering accurate and compliant reporting across 50+ telecom KPIs.
•Integrated multiple telecom data sources into a centralized analytics environment in Azure Synapse, enabling churn prediction and customer segmentation use cases for marketing teams.
•Delivered incremental refresh strategies in ADF that kept KPIs updated within 30 minutes of activity, supporting near real-time decision-making across 50+ network metrics.
•Improved pipeline efficiency by optimizing ADF mappings, partitioning strategies, and scheduling, reducing daily runtimes by 20% and improving data availability.
•Supported data governance initiatives by implementing RBAC policies and data masking techniques to safeguard sensitive subscriber and billing information.
EDUCATION
MS Computer Science, University of Massachusetts Jan 2023 – Dec 2024 MA, USA
BTech, Electronics and Communication Engineering, SNIST Jun 2016 – May 2020 Hyderabad, India
CERTIFICATIONS
Azure Data Fundamentals (DP-900)
Databricks Certified Data Engineer Associate
HackerRank SQL Intermediate Certification