Data Engineer Machine Learning

Location:

Tampa, FL

Posted:

September 10, 2025


Resume:

Likhith Gunda

• Overland Park, KS • ***********@*****.*** • +1-913-***-**** • LinkedIn • Open to Relocate

SUMMARY

Data Engineer with 4+ years of experience designing, building, and optimizing scalable data platforms across AWS, Azure, and Snowflake. Skilled in streaming (Kafka, Kinesis), big data (Spark, PySpark, MapReduce), and cloud-native ETL (Glue, ADF, dbt). Experienced in data warehousing (Redshift, Snowflake), real-time analytics, and machine learning pipelines (NLP, predictive modeling). Adept at building Medallion architectures, enforcing data governance, and delivering BI/ML-ready datasets that drive enterprise decision-making.

SKILLS

●Programming & Scripting: Python, Scala, SQL, R, Java, C#

●Big Data & Streaming: Kafka, Kinesis, Spark, MapReduce, PySpark

●Cloud Platforms: AWS (Glue, Redshift, S3, Lambda, DynamoDB, RDS, CloudWatch, Step Functions, QuickSight), Azure (ADF, Data Lake, Synapse, Databricks), GCP (BigQuery, Pub/Sub, Dataflow)

●Data Warehousing: Snowflake, Redshift, SQL Server, Teradata

●ETL & Data Integration: AWS Glue, dbt, DataBrew, Azure Data Factory, Informatica, Talend

●Infrastructure & DevOps: Terraform, Vault, GitOps, CI/CD, Docker, Kubernetes

●Data Governance & Security: HIPAA, HITRUST, SOX, Data Lineage, Metadata Management, IAM

●Machine Learning & Analytics: SageMaker, NLP, Computer Vision, Predictive Modeling

WORK EXPERIENCE

Data Engineer Nov 2024 – Present

NantHealth, USA

●Designed and executed an enterprise-wide data strategy for ingesting and managing clinical, genomic, and claims data across 10+ business units using AWS Glue, Redshift, and S3, improving data accessibility and compliance (HIPAA, HITRUST).

●Engineered reusable transformation pipelines with dbt Cloud on Redshift, enabling standardized reporting and accelerating delivery of audit-ready datasets.

●Implemented automated IaC deployments with Terraform and Vault, reducing environment setup time by 40% and ensuring secure, reusable pipelines.

●Optimized Spark-based workflows on EMR to process multi-terabyte datasets, cutting job execution time by 25% and improving SLA adherence.

●Developed NLP-based machine learning models for automated claim categorization, increasing classification accuracy by 17% and reducing manual review workload.

●Designed Redshift + Glue pipelines with advanced SQL tuning, improving query response times by 30% and enabling faster insights for 10+ business units.

●Integrated AWS S3, Glue, and Redshift pipelines to support analytics and ML workflows, ensuring scalability and reliability across diverse workloads.

●Delivered self-service analytics dashboards in Amazon QuickSight, providing real-time insights for clinicians and executives, improving decision-making speed.

●Implemented end-to-end data lineage and monitoring with AWS Glue Data Catalog, CloudTrail, and CloudWatch, increasing auditability and pipeline reliability for regulated healthcare datasets.

●Partnered with data scientists to integrate computer vision models into analytics pipelines, enabling radiology image metadata analysis and improving diagnostic support insights.

Data Engineer Dec 2023 – Aug 2024

Verizon, USA

●Designed and managed AWS-based data lake architectures (S3, Glue, Redshift), streamlining ingestion, transformation, and governance of telecom subscriber, billing, and network data, reducing SLA breaches by 20%.

●Built CI/CD pipelines with Jenkins and Bitbucket for automated ETL testing and deployment, improving release reliability and reducing deployment errors by 35%.

●Managed Terraform-based multi-region AWS deployments with Vault secrets management, enhancing disaster recovery for critical billing systems and improving RPO by 50%.

●Wrote and optimized complex SQL in Redshift and Glue, reducing ETL runtimes by 25% and ensuring SOX-compliant data validation across finance and customer billing pipelines.

●Delivered QuickSight dashboards with automated reporting pipelines, increasing KPI visibility for telecom product launches, ARPU tracking, and churn analysis, reducing reporting cycles from days to hours.

●Transformed and validated financial data using AWS Glue and Redshift, embedding audit-compliant logic that improved reconciliation accuracy across telecom revenue streams and billing systems.

●Built SLA-compliance and pipeline monitoring dashboards in React integrated with REST APIs, enabling proactive issue detection for network data feeds and reducing downtime by 30%.

●Developed real-time operational KPI dashboards powered by Databricks pipelines, surfacing insights on customer usage, network performance, and service availability for faster executive decision-making.

Data Analyst Jun 2021 - Aug 2023

HighRadius, Hyderabad, TG

●Built a data warehouse for an e-fitness client with 100+ tables and composed 25+ ETL pipelines using Azure Data Factory, loading data from MongoDB, MySQL, the App Store Connect API, and WooCommerce into SQL Server.

●Contributed to scalable, fault-tolerant data pipelines using Kafka and Amazon Kinesis for real-time data ingestion into Redshift, achieving a 25% reduction in latency through parallel processing techniques.

●Developed 25+ ETL pipelines with Azure Data Factory to centralize operational data into Snowflake.

●Optimized SQL models for reporting, improving marketing & sales analytics accuracy by 22%.

●Implemented monitoring and alerting for AWS pipelines using CloudWatch and custom dashboards, enabling proactive troubleshooting and SLA compliance.

EDUCATION

University of Central Missouri Master of Computer Science Aug 2023 - May 2025

GITAM University Bachelor of Electronics and Communication Engineering Jul 2019 - Apr 2023
