AKHIL GADDAM
Detroit, MI +1-651-***-**** *****.**********@*****.***
PROFESSIONAL SUMMARY
Highly skilled Data Engineer with 5+ years of experience designing, building, and optimizing large-scale, cloud-native data pipelines and ETL workflows across healthcare and e-commerce sectors. Proven expertise in Python, SQL, Apache Spark, Airflow, Snowflake, Redshift, and Docker within Azure and AWS environments. Demonstrated success in driving cost reduction, real-time analytics, and regulatory compliance (HIPAA) through scalable data architecture and automation. Collaborative team player with a strong focus on data quality, performance tuning, and enabling data-driven business decisions through reliable, secure, and efficient engineering solutions.
TECHNICAL SKILLS
Programming & Scripting: Python, SQL, PySpark, Bash
Big Data & Data Processing: Apache Spark, Apache Hive, Apache Kafka, Apache Sqoop, HDFS, Hadoop MapReduce
Data Warehousing & Modeling: Snowflake, Amazon Redshift, AWS Glue, dbt, Dimensional Modeling (Star/Snowflake Schema), Data Vault
Python Packages: Pandas, NumPy, Matplotlib, SciPy, scikit-learn, Seaborn, PyTorch
Data Analytics Skills: Data Manipulation, Predictive Analysis, Data Cleaning, Data Mining, Data Visualization, Statistical Modeling, EDA, Statistics, Data Analysis, Hypothesis Testing, Data Extraction
Cloud Platforms & Services: Amazon Web Services (AWS) – S3, EC2, EMR, Lambda, IAM, Lake Formation, Glue Data Catalog, Athena, SageMaker, Redshift; Azure (Databricks)
ETL & Orchestration: Apache Airflow, AWS Glue, Apache NiFi, Informatica, Custom Python ETL Frameworks
Data Governance & Security: HIPAA Compliance, Data Masking, Encryption, Data Lineage, Audit Logging
DevOps & CI/CD Tools: Linux/Unix, Git, GitHub Actions, Docker, Terraform, Jenkins, Kubernetes
Tools & Methodologies: Jupyter Notebooks, Agile/Scrum, Waterfall, Jira, Confluence, VS Code, Visual Studio Data Tools, Microsoft Excel (Pivot Tables, VLOOKUP), Microsoft PowerPoint, Microsoft Office Suite, Azure SQL Database, ArcGIS, Alteryx, Talend
Databases: MySQL, MSSQL Server, PostgreSQL, Oracle, NoSQL, AWS RDS
Business Intelligence & Visualization: Tableau, Power BI, Looker, AWS QuickSight
PROFESSIONAL EXPERIENCE
Data Engineer CVS Health, USA Feb 2024 – Present
• Developed and optimized scalable data pipelines handling over 500GB of clinical data daily, utilizing Apache Airflow, Spark, and Hadoop, resulting in a 40% reduction in data processing time and enhanced system reliability for critical hospital operations.
• Improved data storage solutions in SQL Server, Azure Data Lake, and Hadoop ecosystems, achieving a 25% reduction in storage costs and a 40% boost in data retrieval speeds for analytics.
• Led the migration of processes to cloud platforms such as Azure Data Factory and Azure Functions, increasing data processing capabilities by 50% and supporting real-time analytics dashboards for over 3,000 healthcare professionals.
• Automated compliance reporting for federal healthcare regulations by developing SQL scripts and Python applications, reducing manual effort by 70% and ensuring 100% reporting accuracy.
• Optimized Snowflake data warehouses, configured Docker containers with Azure Container Instances, and integrated data pipelines across Azure, GCP Storage, and other cloud platforms, ensuring efficient ETL processes and scalable data management.
Data Engineer Humana, USA July 2022 – Jan 2024
• Designed and implemented automated ETL pipelines using AWS (S3, Glue, Lambda) and Python (Pandas, PySpark) to process structured and unstructured healthcare data, improving operational efficiency by 40%.
• Built scalable data ingestion workflows to consolidate data from EHR systems, Facets, Medicare claims, and eligibility databases.
• Engineered data lakes and Redshift data warehouses to support downstream analytics and regulatory reporting.
• Collaborated with data science teams to deploy production-grade readmission risk models, ensuring robust data preprocessing and model integration using scikit-learn and SQL-based feature engineering.
• Optimized processing of 100K+ monthly claims transactions to maintain compliance with HIPAA, Medicare, and Medicaid standards.
• Created and maintained CI/CD pipelines for data infrastructure deployment using Terraform, Airflow, and GitLab.
• Led initiatives to improve data reliability and monitoring by implementing alerting systems and automated data quality checks.
• Supported dashboarding infrastructure by building OLAP-ready data marts powering 15+ Power BI dashboards for business users.
Data Engineer Sigma Info Tech, India June 2018 – June 2021
• Designed and maintained distributed data pipelines using Apache Spark, Hive, and HDFS to process clickstream and order data from multiple client platforms, improving query performance by 50%.
• Built reusable ETL frameworks in Python and SQL to extract, transform, and load data into a centralized Redshift warehouse, enabling consistent reporting and dashboarding.
• Created incremental and full-load ingestion jobs using Apache Sqoop and Spark SQL to move data from MySQL to Hadoop and then to client-facing BI tools, reducing manual reporting tasks by 80%.
• Developed custom UDFs and transformations in PySpark to clean and enrich customer behavior data for downstream ML models predicting customer churn and cross-sell likelihood.
• Implemented data partitioning, bucketing, and compression strategies to optimize Hive tables, reducing query latency and storage costs by over 30%.
• Collaborated with data analysts and product managers to define data quality metrics, building validation scripts that decreased broken pipelines by 25%.
• Assisted in containerizing batch ETL jobs using Docker for smoother deployment across dev and staging environments.
• Supported transition from on-prem Hadoop clusters to early-stage AWS S3 + EMR, setting up secure ingestion flows and IAM roles for staging client data.
• Documented all pipeline logic and metadata lineage to ensure audit readiness and facilitate knowledge transfer across teams.
EDUCATION
Master of Science in Information Studies Sep 2021 – Dec 2022 Trine University, MI, USA
Bachelor’s in Electronics and Communication Engineering Aug 2015 – May 2019 Teegala Krishna Reddy Engineering College, India
CERTIFICATIONS
• AWS Certified Data Analytics
• Microsoft Azure Data Engineer Associate