• Built and maintained end-to-end ETL pipelines using Azure Data Factory, SSIS, and AWS Lambda, automating ingestion of structured and semi-structured healthcare datasets and boosting throughput by 30%.
• Architected an Azure Synapse Analytics data warehouse integrating EMR/EHR sources (HL7, FHIR), applying compression and partitioning strategies that cut storage costs by 45% while supporting scalable patient data analytics.
• Developed and optimized data models (star and snowflake schemas) and created data marts in Snowflake and SQL Server, reducing query runtimes and improving KPI reporting for clinical and operations teams.
• Orchestrated real-time streaming pipelines with Apache Kafka and Spark Streaming, reducing data latency by 80% and enabling timely patient monitoring and anomaly detection.
• Ensured data quality, integrity, and governance by implementing automated validation tests in Python and PySpark, achieving 95% test coverage and eliminating 50+ hours of manual QA effort per month.
• Monitored and troubleshot pipelines with proactive logging/alerting, ensuring reliability and reducing downtime by 25%.
• Implemented HIPAA-compliant security controls (encryption, masking, role-based access) to protect PHI, reducing privacy breach risk by 90%.
• Designed and delivered Power BI dashboards for executives and clinical teams, translating data into actionable insights for patient care and financial operations.
• Pioneered CI/CD automation using Git, Jenkins, and Docker, increasing deployment frequency by 40% and reducing environment-related incidents.
• Documented data flows, pipeline processes, and governance practices, supporting transparency, reproducibility, and compliance audits.
• Partnered with Data Science teams to operationalize ML models (readmission risk, anomaly detection) by building scalable data pipelines in Databricks and Snowflake.
• Automated feature engineering workflows in PySpark, reducing model training data prep time by 60% and improving experiment consistency.
• Integrated predictive model outputs into Power BI dashboards, enabling clinicians to act on real-time risk insights.