Harshitha Parupalli
NJ | *************@*****.*** | (*45) 208 - 5178
Professional Summary
Results-driven Data Engineer with hands-on experience designing and implementing scalable data solutions across healthcare, retail, and fintech sectors. Skilled in building high-performance ETL pipelines, real-time data streaming, and deploying machine learning models using AWS, Azure, GCP, PySpark, Kafka, and Airflow. Proven track record of delivering secure, compliant, and analytics-ready data systems supporting critical business decisions and regulatory needs (HIPAA, SOX, PCI DSS). Passionate about transforming complex data into actionable insights and enabling data-driven innovation.
Education
Master of Science in Computer Science, NJIT, NJ, USA, Dec 2024
Technical Skills
PROGRAMMING: Python, C/C++, C#, R, Shell
CLOUD TECHNOLOGIES: AWS (Glue, EC2, S3, Lambda, SageMaker, DynamoDB, Aurora, CloudFormation, Redshift); Azure (Synapse Analytics, Data Factory, Azure MySQL, Azure Data Lake, Event Hub, Databricks); Google Cloud Platform
BIG DATA, DATABASES, ETL: Oracle, PySpark, MapReduce, Kafka, SSIS, Talend, Airflow, dbt, Informatica, Apache Flink, Splunk
MACHINE LEARNING & DEEP LEARNING: Supervised & Unsupervised Learning, Neural Networks, NLP, Time-Series Analysis
CI/CD, CONTAINERIZATION: Git, Terraform, Ansible, Jenkins, Docker, Kubernetes
VISUALIZATION TOOLS: Tableau, SAS, Google, Power BI, MS Excel
PROJECT MANAGEMENT: Agile, Scrum, Jira
ENVIRONMENT: SDLC, Agile, Scrum, Waterfall, Windows, Mac OS, Linux
Experience
ELEVANCE HEALTH, United States Data Engineer Feb 2025 - Present
Developed and optimized scalable ETL pipelines using PySpark, Airflow, and AWS Glue to process and ingest millions of patient records from EHR systems into AWS Redshift, ensuring HIPAA compliance and data integrity.
Enabled real-time patient monitoring by implementing streaming data solutions using Apache Kafka and AWS Kinesis to process telemetry data from IoT-enabled medical devices.
Collaborated with data scientists to deploy predictive models for patient readmission and disease progression using SageMaker and TensorFlow, integrating outputs into Power BI dashboards for clinical decision support.
Built secure data lakes and warehousing solutions on Azure Synapse and Azure Data Lake for clinical trials data, improving query performance by 40% and enabling self-service analytics for research teams.
Automated CI/CD pipelines with Jenkins and Terraform to manage infrastructure-as-code for healthcare data platforms across dev, test, and prod environments, accelerating release cycles while maintaining compliance.
Integrated disparate healthcare data sources including HL7, FHIR, and claims data into a unified data model using Talend and Informatica, enhancing interoperability and supporting population health analytics initiatives.
KAISER PERMANENTE, United States Data Engineer Intern Jul 2024 – Dec 2024
Designed and deployed data ingestion workflows using Azure Data Factory and Databricks to consolidate EMR, lab, and imaging data, reducing manual data entry errors and improving analytics accuracy.
Implemented role-based access control (RBAC) and encryption at rest/in-transit in AWS S3 and Azure Data Lake to meet HIPAA and HITRUST security standards for sensitive patient data.
Engineered real-time dashboards in Power BI and Tableau that monitored hospital resource utilization (ICU beds, ventilators, staff) by connecting to streaming data pipelines via Kafka and Flink.
Optimized healthcare claims ETL pipelines using SQL and Spark to cut data processing time by 50%, enabling faster reimbursement cycle tracking and fraud detection analytics.
Worked in Agile/Scrum teams alongside clinicians and analysts to deliver data products for clinical trial recruitment, aligning deliverables with FDA and IRB regulatory requirements.
KRANION TECHNOLOGIES, India Jr. Data Engineer Apr 2020 – Jul 2023
Developed robust ETL pipelines using PySpark and AWS Glue to process multi-source retail data including POS transactions, customer behavior, and inventory metrics, driving real-time insights for dynamic pricing and stock optimization.
Built a customer 360 data platform using Azure Synapse and Data Lake, integrating CRM, e-commerce, and loyalty data to power personalized marketing and segmentation models.
Implemented real-time analytics solutions using Kafka, Kinesis, and Flink to track online customer journeys and trigger location-based promotions, boosting customer engagement by 25%.
Deployed interactive dashboards in Power BI and Tableau for merchandising and sales teams, enabling SKU-level performance tracking and improving category planning across regions.
Engineered highly secure, scalable data pipelines using AWS Lambda, Redshift, and DynamoDB to ingest financial transactions, supporting fraud detection and compliance workflows (AML, KYC).
Collaborated with data scientists to operationalize credit risk and churn prediction models in production using Airflow and SageMaker, improving decision-making for loan underwriting.
Automated data quality and reconciliation processes for financial reporting using DBT and Informatica, ensuring SOX and PCI DSS compliance across data assets.
Containerized and deployed microservices architecture using Docker and Kubernetes for real-time payments processing data flows, reducing latency and increasing system resilience in high-volume trading environments.