Data Engineer Machine Learning

Location: Texas City, TX

Salary: 75000

Posted: October 15, 2025


Resume:

• Built and maintained end-to-end ETL pipelines using Azure Data Factory, SSIS, and AWS Lambda, automating ingestion of structured and semi-structured healthcare datasets and boosting throughput by 30%.

• Architected an Azure Synapse Analytics data warehouse integrating EMR/EHR sources (HL7, FHIR), applying compression and partitioning strategies that cut storage costs by 45% while supporting scalable patient data analytics.

• Developed and optimized data models (star and snowflake schemas) and created data marts in Snowflake and SQL Server, reducing query runtimes and improving KPI reporting for clinical and operations teams.

• Orchestrated real-time streaming pipelines with Apache Kafka and Spark Streaming, reducing data latency by 80% and enabling timely patient monitoring and anomaly detection.

• Ensured data quality, integrity, and governance by implementing automated validation tests in Python and PySpark, achieving 95% test coverage and eliminating 50+ hours of manual QA effort per month.

• Monitored and troubleshot pipelines with proactive logging/alerting, ensuring reliability and reducing downtime by 25%.

• Implemented HIPAA-compliant security controls (encryption, masking, role-based access), reducing privacy breach risk by 90% and ensuring compliance with PHI regulations.

• Designed and delivered Power BI dashboards for executives and clinical teams, translating data into actionable insights for patient care and financial operations.

• Pioneered CI/CD automation using Git, Jenkins, and Docker, improving deployment frequency by 40% and reducing environment-related incidents.

• Documented data flows, pipeline processes, and governance practices, enabling transparency, reproducibility, and compliance audits.

• Partnered with Data Science teams to operationalize ML models (readmission risk, anomaly detection) by building scalable data pipelines in Databricks and Snowflake.

• Automated feature engineering workflows in PySpark, reducing model training data prep time by 60% and improving experiment consistency.

• Integrated predictive model outputs into Power BI dashboards, enabling clinicians to act on real-time risk insights.
