AKHIL NAGULAPALLY
Southgate, MI (Open to Relocation) +1-313-***-**** **********@*****.***
PROFESSIONAL SUMMARY
Data Engineer / Data Pipeline Engineer with hands-on experience building ETL data pipelines and cloud-based data platforms in a U.S. healthcare environment. Skilled in SQL, Python, and PySpark for processing structured and streaming data using Apache Spark and Kafka, with experience in AWS (S3, Glue, EMR, Redshift). Developed data pipelines and data models, and optimized queries, to improve data availability, reduce latency, and support business intelligence reporting. Familiar with Airflow, dbt, and Git for workflow orchestration and DataOps, with a strong foundation in data pipeline engineering and database systems. Supports analytics engineering and database engineering use cases on cloud data platforms.
TECHNICAL SKILLS
Programming & Data Processing: Python, SQL, PySpark (Apache Spark)
Data Engineering & Big Data: ETL/ELT Pipelines, Data Pipeline Development, Distributed Data Processing
Streaming & Real-Time Data: Apache Kafka, Spark Structured Streaming
Cloud & Data Platforms: AWS (S3, EMR, Glue, Redshift, Lambda, Kinesis)
Databases & Data Warehousing: Amazon Redshift, PostgreSQL, Snowflake, Dimensional Modeling (Star Schema)
Orchestration, Transformation & DataOps: Apache Airflow, dbt, Data Validation, Pipeline Monitoring, CI/CD (Git)
Analytics & Business Intelligence: Analytical SQL, Data Analysis, Business Intelligence
PROFESSIONAL EXPERIENCE
Data Engineering Intern Jun 2025 - Present
GE Healthcare Chicago, IL
• Built ETL data pipelines using Python, PySpark, and AWS Glue, leveraging distributed data processing to ingest healthcare datasets into S3 and Redshift and improving data availability for business intelligence reporting by 25%.
• Developed batch and incremental data pipeline workflows using Apache Spark and Airflow, reducing manual data preparation effort by 28% for analytics teams.
• Designed streaming pipelines with Kafka and Spark Structured Streaming to process prescription data in near real time, improving alert accuracy by 20%.
• Optimized SQL queries and tuned database performance in Amazon Redshift and PostgreSQL, improving query efficiency by 35% for analytical workloads.
• Implemented dimensional data models using SQL and dbt in the data warehouse, improving reporting usability and reducing query latency by 22%.
• Collaborated with analytics engineers and business intelligence teams using Git-based version control and CI/CD workflows to validate datasets and resolve data quality issues, reducing reporting discrepancies by 18%.
PROJECTS
E-Commerce Data Warehouse Optimization
• Redesigned ETL workflows using SQL and dbt on Snowflake to structure transactional data into dimensional models, reducing dashboard query latency by 28% for business intelligence reporting.
• Transformed large-scale order datasets using PySpark on AWS EMR to evaluate pipeline performance and improve data processing efficiency by 24% for analytical workloads.
• Applied data validation and transformation checks using Airflow and SQL to ensure dataset consistency, improving data reliability to 95% and reducing inconsistencies in reporting datasets.
IoT Streaming Data Pipeline Analysis
• Engineered streaming pipelines using Kafka and Spark Structured Streaming to analyze high-frequency device telemetry, reducing end-to-end data latency by 26% for real-time monitoring.
• Modeled and stored processed data in PostgreSQL and Amazon Redshift using optimized schema design, improving query performance by 22% for analytics queries.
• Leveraged AWS services (S3, Lambda, Kinesis) to simulate scalable data ingestion workflows, improving pipeline stability and supporting continuous data processing.
EDUCATION
PhD in Information Science Jan 2026 - Dec 2030
Trine University Angola, IN
Master of Science in Information Science Jan 2022 - Dec 2022
Trine University Angola, IN
Bachelor of Technology in Electronics and Communications Engineering Aug 2016 - Nov 2020
Jawaharlal Nehru Technological University Hyderabad, India
CERTIFICATIONS
• IBM Data Engineering Professional Certificate - Coursera
• Google Cloud Professional Data Engineer - Coursera
• Data Engineering, Big Data, and Machine Learning on GCP - Coursera
• Modern Data Engineering with Databricks - Coursera
• Data Warehousing for Business Intelligence - Coursera
• AWS Cloud Technical Essentials - Coursera