Likhith Gunda
• Overland Park, KS • ***********@*****.*** • +1-913-***-**** • LinkedIn • Open to Relocate
SUMMARY
Data Engineer with 4+ years of experience designing, building, and optimizing scalable data platforms across AWS, Azure, and Snowflake. Skilled in streaming (Kafka, Kinesis), big data (Spark, PySpark, MapReduce), and cloud-native ETL (Glue, ADF, dbt). Experienced in data warehousing (Redshift, Snowflake), real-time analytics, and machine learning pipelines (NLP, predictive modeling). Adept at building Medallion architectures, enforcing data governance, and delivering BI/ML-ready datasets that drive enterprise decision-making.
SKILLS
●Programming & Scripting: Python, Scala, SQL, R, Java, C#
●Big Data & Streaming: Kafka, Kinesis, Spark, MapReduce, PySpark
●Cloud Platforms: AWS (Glue, Redshift, S3, Lambda, DynamoDB, RDS, CloudWatch, Step Functions, QuickSight), Azure (ADF, Data Lake, Synapse, Databricks), GCP (BigQuery, Pub/Sub, Dataflow)
●Data Warehousing: Snowflake, Redshift, SQL Server, Teradata
●ETL & Data Integration: AWS Glue, dbt, DataBrew, Azure Data Factory, Informatica, Talend
●Infrastructure & DevOps: Terraform, Vault, GitOps, CI/CD, Docker, Kubernetes
●Data Governance & Security: HIPAA, HITRUST, SOX, Data Lineage, Metadata Management, IAM
●Machine Learning & Analytics: SageMaker, NLP, Computer Vision, Predictive Modeling
WORK EXPERIENCE
Data Engineer Nov 2024 – Present
NantHealth, USA
●Designed and executed an enterprise-wide data strategy for ingesting and managing clinical, genomic, and claims data across 10+ business units using AWS Glue, Redshift, and S3, improving data accessibility and compliance (HIPAA, HITRUST).
●Engineered reusable transformation pipelines with dbt Cloud on Redshift, enabling standardized reporting and accelerating delivery of audit-ready datasets.
●Implemented automated IaC deployments with Terraform and Vault, reducing environment setup time by 40% and ensuring secure, reusable pipelines.
●Optimized Spark-based workflows on EMR to process multi-terabyte datasets, cutting job execution time by 25% and improving SLA adherence.
●Developed NLP-based machine learning models for automated claim categorization, increasing classification accuracy by 17% and reducing manual review workload.
●Designed Redshift + Glue pipelines with advanced SQL tuning, improving query response times by 30% and enabling faster insights for 10+ business units.
●Integrated AWS S3, Glue, and Redshift pipelines to support analytics and ML workflows, ensuring scalability and reliability across diverse workloads.
●Delivered self-service analytics dashboards in Amazon QuickSight, providing real-time insights for clinicians and executives, improving decision-making speed.
●Implemented end-to-end data lineage and monitoring with AWS Glue Data Catalog, CloudTrail, and CloudWatch, increasing auditability and pipeline reliability for regulated healthcare datasets.
●Partnered with data scientists to integrate computer vision models into analytics pipelines, enabling radiology image metadata analysis and improving diagnostic support insights.
Data Engineer Dec 2023 – Aug 2024
Verizon, USA
●Designed and managed AWS-based data lake architectures (S3, Glue, Redshift), streamlining ingestion, transformation, and governance of telecom subscriber, billing, and network data, reducing SLA breaches by 20%.
●Built CI/CD pipelines with Jenkins and Bitbucket for automated ETL testing and deployment, improving release reliability and reducing deployment errors by 35%.
●Managed Terraform-based multi-region AWS deployments with Vault secrets management, enhancing disaster recovery for critical billing systems and improving RPO by 50%.
●Wrote and optimized complex SQL in Redshift and Glue, reducing ETL runtimes by 25% and ensuring SOX-compliant data validation across finance and customer billing pipelines.
●Delivered QuickSight dashboards with automated reporting pipelines, increasing KPI visibility for telecom product launches, ARPU tracking, and churn analysis, reducing reporting cycles from days to hours.
●Transformed and validated financial data using AWS Glue and Redshift, embedding audit-compliant logic that improved reconciliation accuracy across telecom revenue streams and billing systems.
●Built SLA-compliance and pipeline monitoring dashboards in React integrated with REST APIs, enabling proactive issue detection for network data feeds and reducing downtime by 30%.
●Developed real-time operational KPI dashboards powered by Databricks pipelines, surfacing insights on customer usage, network performance, and service availability for faster executive decision-making.
Data Analyst Jun 2021 – Aug 2023
HighRadius, Hyderabad, TG
●Built a data warehouse for an e-fitness client with 100+ tables and 25+ ETL pipelines in Azure Data Factory, loading data from MongoDB, MySQL, the App Store Connect API, and WooCommerce into MS SQL Server.
●Helped build scalable, fault-tolerant pipelines using Kafka and Amazon Kinesis for real-time data ingestion into Redshift, reducing latency by 25% through parallel processing.
●Developed 25+ ETL pipelines with Azure Data Factory to centralize operational data into Snowflake.
●Optimized SQL models for reporting, improving marketing & sales analytics accuracy by 22%.
●Implemented monitoring and alerting for AWS pipelines using CloudWatch and custom dashboards, enabling proactive troubleshooting and SLA compliance.
EDUCATION
University of Central Missouri Master of Computer Science Aug 2023 – May 2025
GITAM University Bachelor of Electronics and Communication Engineering Jul 2019 – Apr 2023