Data Engineer Machine Learning

Location: Jersey City, NJ
Posted: October 15, 2025

Harshitha Parupalli

NJ *************@*****.*** (*45) 208 - 5178

Professional Summary

Results-driven Data Engineer with hands-on experience designing and implementing scalable data solutions across the healthcare, retail, and fintech sectors. Skilled in building high-performance ETL pipelines and real-time data streaming solutions, and in deploying machine learning models using AWS, Azure, GCP, PySpark, Kafka, and Airflow. Proven track record of delivering secure, compliant, and analytics-ready data systems that support critical business decisions and regulatory needs (HIPAA, SOX, PCI DSS). Passionate about transforming complex data into actionable insights and enabling data-driven innovation.

Education

Master's in Computer Science; NJIT, NJ, USA; Dec 2024

Technical Skills

PROGRAMMING: Python, C/C++, C#, R, Shell

CLOUD TECHNOLOGIES: AWS - Glue, EC2, S3, Lambda, SageMaker, DynamoDB, Aurora, CloudFormation, Redshift; Azure - Synapse Analytics, Data Factory, Azure MySQL, Azure Data Lake, Event Hubs, Databricks; Google Cloud Platform

BIG DATA, DATABASES, ETL: Oracle, PySpark, MapReduce, Kafka, SSIS, Talend, Airflow, DBT, Informatica, Apache Flink, Splunk

MACHINE LEARNING & DEEP LEARNING: Supervised & Unsupervised Learning, Neural Networks, NLP, Time-Series Analysis

CI/CD, CONTAINERIZATION: Git, Terraform, Ansible, Jenkins, Docker, Kubernetes

VISUALIZATION TOOLS: Tableau, SAS, Google, PowerBI, MS Excel

PROJECT MANAGEMENT: Agile, Scrum, Jira

ENVIRONMENT: SDLC, Agile, Scrum, Waterfall, Windows, Mac OS, Linux

Experience

ELEVANCE HEALTH, United States Data Engineer Feb 2025 - Present

Developed and optimized scalable ETL pipelines using PySpark, Airflow, and AWS Glue to process and ingest millions of patient records from EHR systems into AWS Redshift, ensuring HIPAA compliance and data integrity.
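
For illustration, a minimal PySpark sketch of this kind of batch load; the S3 paths, table names, and credentials are placeholder assumptions, not the actual EHR schema:

```python
# Minimal sketch of an EHR-to-Redshift batch load (paths and names assumed).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ehr_to_redshift").getOrCreate()

# Read raw EHR extracts landed in S3 (assumed Parquet layout).
raw = spark.read.parquet("s3://example-ehr-landing/patients/")

# Basic integrity checks: drop duplicates and records missing a patient key.
clean = (
    raw.dropDuplicates(["patient_id", "encounter_id"])
       .filter(F.col("patient_id").isNotNull())
)

# Write to Redshift over JDBC (a spark-redshift connector is another common option).
(clean.write
      .format("jdbc")
      .option("url", "jdbc:redshift://example-cluster:5439/ehr")
      .option("dbtable", "staging.patient_encounters")
      .option("user", "etl_user")
      .option("password", "***")
      .mode("append")
      .save())
```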

Enabled real-time patient monitoring by implementing streaming data solutions using Apache Kafka and AWS Kinesis to process telemetry data from IoT-enabled medical devices.
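
A hedged sketch of a telemetry consumer of the kind described above; the topic name, field names, and alert threshold are illustrative assumptions, not the actual clinical configuration:

```python
# Sketch of a Kafka consumer for device telemetry (kafka-python; names assumed).
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "device-telemetry",                      # assumed topic name
    bootstrap_servers=["broker:9092"],
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    group_id="patient-monitoring",
)

for message in consumer:
    reading = message.value
    # Flag out-of-range vitals for downstream alerting (illustrative rule only).
    if reading.get("heart_rate", 0) > 120:
        print(f"alert: patient {reading.get('patient_id')} heart_rate={reading['heart_rate']}")
```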

Collaborated with data scientists to deploy predictive models for patient readmission and disease progression using SageMaker and TensorFlow, integrating outputs into PowerBI dashboards for clinical decision support.

Built secure data lakes and warehousing solutions on Azure Synapse and Azure Data Lake for clinical trials data, improving query performance by 40% and enabling self-service analytics for research teams.

Automated CI/CD pipelines with Jenkins and Terraform to manage infrastructure-as-code for healthcare data platforms across dev, test, and prod environments, accelerating release cycles while maintaining compliance.

Integrated disparate healthcare data sources including HL7, FHIR, and claims data into a unified data model using Talend and Informatica, enhancing interoperability and supporting population health analytics initiatives.

KAISER PERMANENTE, United States Data Engineer Intern Jul 2024 – Dec 2024

Designed and deployed data ingestion workflows using Azure Data Factory and Databricks to consolidate EMR, lab, and imaging data, reducing manual data entry errors and improving analytics accuracy.
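
An illustrative Databricks (PySpark) snippet for the consolidation step described above; the storage account, container, and column names are assumptions:

```python
# Sketch: read EMR and lab extracts from ADLS and merge on a patient identifier.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("emr_lab_consolidation").getOrCreate()

emr = spark.read.format("delta").load(
    "abfss://clinical@exampleadls.dfs.core.windows.net/emr/")
labs = spark.read.format("delta").load(
    "abfss://clinical@exampleadls.dfs.core.windows.net/labs/")

# Join lab results onto EMR encounters so analysts query a single curated table.
consolidated = emr.join(labs, on="patient_id", how="left")

consolidated.write.format("delta").mode("overwrite").save(
    "abfss://clinical@exampleadls.dfs.core.windows.net/curated/emr_labs/")
```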

Implemented role-based access control (RBAC) and encryption at rest/in-transit in AWS S3 and Azure Data Lake to meet HIPAA and HITRUST security standards for sensitive patient data.
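
Illustrative boto3 calls for encryption at rest and restricted access on an S3 bucket; the bucket and KMS alias are placeholders, and the RBAC policies themselves would be managed separately (e.g., via IAM or Terraform):

```python
# Sketch: enforce default KMS encryption and block public access on a bucket.
import boto3

s3 = boto3.client("s3")
bucket = "example-phi-data-lake"  # placeholder bucket name

# Enforce server-side encryption with a KMS key by default.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "alias/phi-data-key"}}  # assumed key alias
        ]
    },
)

# Block all public access at the bucket level.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```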

Engineered real-time dashboards in Power BI and Tableau that monitored hospital resource utilization (ICU beds, ventilators, staff) by connecting to streaming data pipelines via Kafka and Flink.

Optimized healthcare claims ETL pipelines using SQL and Spark to cut data processing time by 50%, enabling faster reimbursement cycle tracking and fraud detection analytics.
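
A sketch of the kind of Spark tuning such an optimization typically involves, broadcasting a small dimension table and partitioning output by service date; paths and column names are assumptions:

```python
# Sketch: broadcast join plus partitioned output for a claims ETL job.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("claims_etl").getOrCreate()

claims = spark.read.parquet("s3://example-claims/raw/")
providers = spark.read.parquet("s3://example-claims/dim_provider/")

# Broadcast the small provider dimension to avoid a shuffle-heavy join.
enriched = claims.join(broadcast(providers), on="provider_id", how="left")

# Partition output by service date so downstream queries prune partitions.
(enriched
    .withColumn("service_date", F.to_date("service_timestamp"))
    .write.partitionBy("service_date")
    .mode("overwrite")
    .parquet("s3://example-claims/curated/"))
```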

Worked in Agile/Scrum teams alongside clinicians and analysts to deliver data products for clinical trial recruitment, aligning deliverables with FDA and IRB regulatory requirements.

KRANION TECHNOLOGIES, India Jr. Data Engineer Apr 2020 – Jul 2023

Developed robust ETL pipelines using PySpark and AWS Glue to process multi-source retail data including POS transactions, customer behavior, and inventory metrics, driving real-time insights for dynamic pricing and stock optimization.

Built a customer 360 data platform using Azure Synapse and Data Lake, integrating CRM, e-commerce, and loyalty data to power personalized marketing and segmentation models.

Implemented real-time analytics solutions using Kafka, Kinesis, and Flink to track online customer journeys and trigger location-based promotions, boosting customer engagement by 25%.

Deployed interactive dashboards in Power BI and Tableau for merchandising and sales teams, enabling SKU-level performance tracking and improving category planning across regions.

Engineered highly secure, scalable data pipelines using AWS Lambda, Redshift, and DynamoDB to ingest financial transactions, supporting fraud detection and compliance workflows (AML, KYC).
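
A minimal sketch of an ingestion Lambda of the sort described above, parsing a transaction event and persisting it to DynamoDB; the table name, event shape, and fields are hypothetical:

```python
# Sketch: Lambda handler that writes incoming transactions to DynamoDB.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("transactions")  # assumed table name

def handler(event, context):
    for record in event.get("Records", []):
        txn = json.loads(record["body"])  # assumes an SQS-style event payload
        table.put_item(Item={
            "transaction_id": txn["id"],
            "account_id": txn["account"],
            "amount": str(txn["amount"]),  # stored as string to avoid float issues
            "status": "ingested",
        })
    return {"statusCode": 200}
```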

Collaborated with data scientists to operationalize credit risk and churn prediction models in production using Airflow and SageMaker, improving decision-making for loan underwriting.
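
A hedged sketch of an Airflow task that scores records against a deployed SageMaker endpoint; the DAG id, schedule, endpoint name, and payload are placeholders, not the production configuration:

```python
# Sketch: daily Airflow DAG invoking a SageMaker endpoint for batch scoring.
from datetime import datetime
import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator

def score_batch():
    runtime = boto3.client("sagemaker-runtime")
    # One illustrative record; a real task would iterate over a nightly batch.
    response = runtime.invoke_endpoint(
        EndpointName="churn-model-prod",   # assumed endpoint name
        ContentType="text/csv",
        Body="42,0.73,12000,3\n",
    )
    print(response["Body"].read())

with DAG(
    dag_id="churn_scoring",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="score_batch", python_callable=score_batch)
```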

Automated data quality and reconciliation processes for financial reporting using DBT and Informatica, ensuring SOX and PCI DSS compliance across data assets.

Containerized and deployed a microservices architecture using Docker and Kubernetes for real-time payments-processing data flows, reducing latency and increasing system resilience in high-volume trading environments.


