Cloud Data Engineer - PySpark, AWS Glue, CI/CD expert

Location: Chicago, IL
Posted: April 30, 2026


Resume:

SHANTHI KUMAR

Bloomington, IL | 309-***-**** | ******************@*****.*** | LinkedIn

PROFILE SUMMARY

Python Engineer with 6 years of experience designing and implementing cloud-based data migration and processing solutions. Expert in Python and PySpark on AWS Glue, building scalable ETL pipelines and layered data architectures. Proven ability to automate deployments with CI/CD and to safeguard data quality for downstream analytics.

PROFESSIONAL EXPERIENCE

UNITEDHEALTH GROUP | AWS Data Engineer | Dec 2025 - Present

• Built distributed PySpark pipelines in Python on AWS Glue for content migration of healthcare claims data, handling ingestion, transformation, and structured data delivery for downstream analytics

• Managed large-scale data processing workloads across multiple datasets in AWS, ensuring reliable ingestion and transformation for downstream analytics systems

• Implemented audit and logging mechanisms to track pipeline execution, data loads from S3, and processing status, improving traceability and debugging

• Designed layered data architecture (raw, processed, curated) to standardize data flow and support reliable downstream analytics

• Developed reusable ingestion frameworks to standardize data intake, transformation, and loading across multiple source systems

• Built and maintained CI/CD pipelines using Jenkins and Python scripts to automate AWS Glue data pipeline deployments, reducing manual release effort

• Implemented data validation and quality checks within pipelines to detect schema issues, missing data, and inconsistencies before downstream processing

• Performed data reconciliation and validation across pipelines to ensure consistency and accuracy of processed datasets

• Designed batch and near real-time data processing workflows to support downstream reporting and operational analytics

• Stabilized production pipelines by implementing retry handling, failure recovery, and execution tracking to minimize data processing disruptions

• Optimized AWS Glue Spark jobs by tuning partitions, memory usage, and execution strategies to improve processing performance

• Processed large-scale datasets across multiple pipelines to support high-volume analytics workloads

• Collaborated with analytics and business teams to deliver clean, structured datasets for reporting and operational use cases (see the pipeline sketch below)
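
A minimal sketch of the Glue pipeline pattern these bullets describe. The bucket names (claims-raw, claims-processed), the run_date job argument, and the simplified claims schema (claim_id, member_id, service_date) are all illustrative assumptions, not details taken from the actual project:

# Hypothetical AWS Glue job: validate raw claims, standardize them, and land
# them in a processed layer with basic audit logging. All names are assumed.
import sys
import logging

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME", "run_date"])

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(args["JOB_NAME"])

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

raw_path = f"s3://claims-raw/ingest_date={args['run_date']}/"
processed_path = f"s3://claims-processed/ingest_date={args['run_date']}/"

df = spark.read.parquet(raw_path)
raw_count = df.count()
log.info("Loaded %d raw claim records from %s", raw_count, raw_path)

# Validation: drop rows missing required keys before downstream processing.
required = ["claim_id", "member_id", "service_date"]
valid = df.dropna(subset=required)
log.info("Validation: %d rows rejected for missing keys", raw_count - valid.count())

# Light standardization for the processed layer.
processed = (
    valid.withColumn("service_date", F.to_date("service_date"))
         .dropDuplicates(["claim_id"])
)

processed.write.mode("overwrite").partitionBy("service_date").parquet(processed_path)
log.info("Wrote processed claims to %s", processed_path)

In a production version of this pattern, the reject counts and run status would typically be persisted to an audit table rather than only logged.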

ALBERTSONS COMPANIES | GCP Data Engineer | Aug 2021 - Apr 2023

• Built and maintained large-scale data pipelines using PySpark to process retail and supply chain data across multiple stores

• Designed data transformation pipelines to convert raw transactional data into structured datasets for downstream analytics and reporting

• Developed batch and near real-time data processing workflows to support inventory planning and reporting systems

• Optimized data pipelines for performance and scalability by improving Spark execution plans and reducing processing latency

• Handled high-volume retail data processing workflows, ensuring timely data availability for inventory planning and business reporting

• Worked with business teams to understand data requirements and deliver curated datasets for reporting and analysis

• Refactored legacy scripts into modular Python-based data processing components to improve maintainability

• Implemented basic data validation checks and logging mechanisms to ensure data consistency across pipelines (see the sketch below)
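
A minimal PySpark sketch of the batch aggregation pattern described in these bullets. The GCS paths and the toy transaction schema (store_id, sku, transaction_ts, quantity, line_amount) are assumed for illustration:

# Hypothetical batch job: aggregate raw store transactions into a curated
# daily-sales dataset for inventory planning. All identifiers are assumed.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_store_sales").getOrCreate()

# Raw transactional data landed by upstream ingestion (path is illustrative).
tx = spark.read.parquet("gs://retail-raw/transactions/")

daily_sales = (
    tx.withColumn("sale_date", F.to_date("transaction_ts"))
      .groupBy("store_id", "sale_date", "sku")
      .agg(
          F.sum("quantity").alias("units_sold"),
          F.sum("line_amount").alias("gross_sales"),
      )
)

# Partitioning output by sale_date keeps downstream reporting reads cheap;
# coalesce limits small-file output (the number is workload-specific tuning).
(
    daily_sales.coalesce(32)
               .write.mode("overwrite")
               .partitionBy("sale_date")
               .parquet("gs://retail-curated/daily_store_sales/")
)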

COGNIZANT TECHNOLOGY SOLUTIONS | Data Engineer | May 2018 - Aug 2021

• Built and maintained ETL pipelines using Python and SQL to process large volumes of structured and semi-structured data

• Ensured data accuracy and consistency by implementing basic validation and transformation checks during data processing

• Performed data profiling and transformation to standardize datasets for downstream reporting systems

• Designed and developed Tableau dashboards by integrating processed data from multiple sources

• Documented data sources, transformations, and validation rules to improve data traceability

• Automated data processing workflows to replace manual Excel-based reporting processes (see the sketch below)
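
A minimal Python-plus-SQL ETL sketch in the spirit of these bullets, using sqlite3 as a stand-in for the actual database driver; the staging_orders and orders_clean tables and their columns are hypothetical:

# Hypothetical extract-validate-load script replacing a manual Excel workflow.
import sqlite3  # stand-in for the actual warehouse driver used on site

import pandas as pd

conn = sqlite3.connect("warehouse.db")

# Extract: pull order records from a staging table with SQL.
orders = pd.read_sql_query(
    "SELECT order_id, customer_id, order_ts, amount FROM staging_orders", conn
)

# Validate: drop rows missing keys and filter out negative amounts
# (production code would also record the rejected rows somewhere durable).
orders = orders.dropna(subset=["order_id", "customer_id"])
orders = orders[orders["amount"] >= 0]

# Transform: standardize types for downstream reporting and dashboards.
orders["order_ts"] = pd.to_datetime(orders["order_ts"])

# Load: write the cleaned, standardized dataset back for reporting use.
orders.to_sql("orders_clean", conn, if_exists="replace", index=False)
conn.close()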

EDUCATION

Trine University | Master of Science, Business Analytics | Detroit, MI | May 2023 - Aug 2025

Kakatiya Institute of Technology & Science | Bachelor of Engineering, Computer Science | Warangal, India | Jun 2014 - Apr 2018

CORE COMPETENCIES

• Cloud & Data Platforms: GCP (GCS, Dataflow, BigQuery), AWS (Glue, SageMaker, S3, EC2, EKS)

• Programming: Python, SQL

• Data Engineering: ETL/ELT, Batch & Streaming Processing, Data Modeling, Content Migration

• Big Data & Processing: PySpark, Spark SQL, Hadoop, Spark

• Tools: Jenkins, Git, Jira, Tableau


