
Machine Learning Engineer

Location:
United States
Posted:
July 18, 2025


Resume:

PROFESSIONAL SUMMARY

Over *+ years of experience as a Data and Machine Learning Engineer, with deep expertise in building scalable, AI-powered data platforms, fraud detection systems, and large-scale ML infrastructure across healthcare, finance, and e-commerce industries.

Expert in the full ML lifecycle, from data ingestion and processing through model development and MLOps to real-time deployment, leveraging tools such as TensorFlow, PyTorch, Apache Spark, Azure Synapse, and Databricks.

Extensive hands-on experience designing LLM-driven solutions, document parsing pipelines, and time-series forecasting systems. Adept at integrating ML pipelines into cloud-native environments using AWS, Azure, and Google Cloud.

Strong skills in NLP, generative AI, RAG, computer vision, and deep learning. Adept at stakeholder collaboration, mentoring teams, and driving business transformation through data-driven insights and automation.

Strong hands-on expertise in Large Language Models (LLMs) including designing and deploying LLM-driven systems for document parsing, recommendation engines, and generative AI applications.

Deep knowledge of machine learning algorithms such as classification, clustering, time-series forecasting, and ensemble models for real-world predictive analytics and risk modeling.

Built and optimized RAG (Retrieval-Augmented Generation) pipelines using hybrid retrieval methods to enhance contextual recommendations and content generation.

Proficient in deep learning frameworks such as TensorFlow and PyTorch, with successful use cases in OCR, Graph Convolutional Networks (GCNs), and image-based document intelligence.

Implemented MLOps pipelines using MLflow, Docker, Kubernetes, and CI/CD tools (GitHub Actions, Bitbucket, Jenkins) to automate training, deployment, monitoring, and versioning of ML models.

Developed and maintained high-throughput data preprocessing pipelines using Apache Spark, PySpark, Azure Data Factory, and Flink to transform terabytes of raw structured and unstructured data.

Designed real-time fraud detection systems and medical alert systems using Kafka, Google Pub/Sub, and anomaly detection models, reducing fraud and improving patient response times.

Architected cloud-native ML solutions on both Microsoft Azure and AWS, including Azure Synapse, Cosmos DB, S3, Glue, and Lambda, ensuring cost-efficiency and performance at scale.

Built robust document parsing systems combining OCR, LLMs, NER, and layout analysis that reduced manual review time and improved data quality for healthcare providers.

Developed enterprise-wide data lakes on Azure and AWS to centralize structured, semi-structured, and unstructured datasets for seamless downstream analytics and model integration.

Applied model monitoring, drift detection, and retraining workflows using MLflow and Azure Monitor, ensuring models remain accurate and relevant over time.

Enhanced governance and data quality through Great Expectations and version-controlled feature stores to meet compliance and business audit requirements.

Integrated ML pipelines with interactive BI tools like Power BI and Looker for real-time insights into patient health metrics, fraud alerts, and financial trends.

SKILLS

Programming Languages: Python, Java, SQL, Bash, Scala

Data Engineering & Processing: Apache Spark, Hadoop, Kafka, Flink, Apache NiFi, dbt (Data Build Tool), Apache Hive, Presto/Trino

Cloud Platforms: AWS (S3, EMR, RDS, Redshift, Glue, Lambda), Microsoft Azure (Azure Data Lake, Synapse, Cosmos DB, Azure Data Factory), Google Cloud Platform (BigQuery, Cloud SQL, Dataproc, Dataflow)

Databases & Warehousing: PostgreSQL, MySQL, Microsoft SQL Server, Snowflake, MongoDB, Cassandra

Machine Learning Frameworks: TensorFlow, PyTorch, Scikit-learn

Machine Learning & AI: Deep Learning, Generative AI, Natural Language Processing (NLP), Computer Vision (Object Detection, OCR, Document Parsing), Large Language Models (LLMs), Time-Series Forecasting, Recommendation Systems, RAG, Random Forest, Decision Trees, Classification, Clustering, K-Nearest Neighbors (K-NN), Kernel SVM

Deployment & MLOps: CI/CD Pipelines, Model Monitoring & Versioning, Docker & Kubernetes for Containerization, Apache Airflow, MLflow for Experiment Tracking, Real-Time Data Streaming, Terraform

Testing & CI/CD: Pytest, Unit Testing, Continuous Integration & Deployment (CI/CD), Model Monitoring & Versioning

Data Orchestration: Apache Airflow, Talend

Data Governance & Quality: Great Expectations, Alation

Business Intelligence & Visualization: Power BI, Looker

WORK HISTORY

Senior ML Engineer

CitiusTech – Irving, TX

May 2021 – Current

Developed and deployed LLM-powered document parsing pipelines using OCR, deep learning, and Graph Convolutional Networks (GCNs), improving healthcare data extraction accuracy by over 40%.

Built end-to-end ML pipelines for fraud detection and patient monitoring using TensorFlow, PyTorch, and Azure Synapse, enabling real-time analysis of millions of medical claims and patient signals.

Engineered real-time data streaming systems using Apache Kafka and Google Cloud Pub/Sub, significantly reducing emergency alert response times and improving critical care outcomes.

Designed and deployed LLM-based systems for document parsing, leveraging OCR, Named Entity Recognition, and Graph Convolutional Networks to extract structured data from unstructured formats.

Integrated LLMs with recommendation engines using Retrieval-Augmented Generation (RAG), enhancing content relevance and personalization.

Orchestrated scalable machine learning solutions on Azure ML integrated with Azure DevOps for deployment automation.

Designed and maintained cloud-native data lakes and ML workflows across Azure and AWS, enabling scalable ingestion, transformation, and unified processing of structured and unstructured healthcare datasets.

Integrated MLflow, Great Expectations, and CI/CD pipelines (Bitbucket, Azure Repos) to establish a robust MLOps framework supporting continuous deployment, monitoring, and governance.

Collaborated on RAG-based solutions to enhance retrieval and contextual understanding within clinical documentation systems, improving query relevance and response generation.

Provided technical mentorship on deep learning model integration, ML architecture design, and productionization of AI applications, promoting team-wide adoption of best practices in MLOps and cloud engineering.

Integrated OpenCV and TensorFlow to develop computer vision components that enhanced fraud detection and document classification accuracy.

Implemented NLP pipelines using transformer models for tasks such as intent detection, named entity recognition, and document summarization.

Machine Learning Engineer

Sift – San Francisco, CA

May 2019 – Apr 2021

Engineered real-time fraud detection pipelines using Azure Synapse, Apache Spark, and Kafka, reducing fraudulent activity by 30% across millions of daily transactions.

Developed high-performance ETL workflows in PySpark and SQL, improving e-commerce data processing speeds by over 50%.

Integrated LLM-powered recommendation systems that increased user retention by 25% through context-aware product suggestions.

Migrated legacy on-prem SQL Server databases to Azure PaaS solutions, improving scalability and reducing infrastructure costs.

Built real-time Power BI dashboards that enabled business teams to monitor fraud trends and customer behavior.

Implemented CI/CD workflows via GitHub Actions and Bitbucket for automated deployment and model versioning, enhancing MLOps efficiency.

Conducted unit testing and implemented data validation checks using Great Expectations to ensure end-to-end data integrity.

Data Engineer

Rivery – Avanue, NY

Nov 2017 – Apr 2019

Developed robust ETL pipelines using Azure Data Factory, Python, and SQL to aggregate and cleanse transactional data from multiple banking systems for regulatory reporting and analytics.

Designed and deployed fraud detection models using Isolation Forests and One-Class SVM, reducing financial fraud risks across millions of transactions.

Engineered an automated regulatory reporting system that minimized manual compliance work and cut audit preparation time by 40%.

Processed high-volume financial data using Apache Spark and Hadoop, enabling real-time anomaly detection and predictive analytics at scale.

Implemented end-to-end encryption and data masking workflows to meet strict compliance standards such as GDPR and SOX.

Built and maintained Power BI dashboards to provide executive leadership with near real-time insights on financial performance, fraud trends, and risk exposure.

Collaborated with finance, risk, and IT stakeholders to integrate predictive models and improve decision-making in fraud risk management and customer segmentation.

Junior Data Engineer

Fivetran – San Francisco, CA

Nov 2015 – Oct 2017

Assisted in developing scalable ETL pipelines using Python and SQL to support foundational ML infrastructure across multiple cloud platforms.

Built automated data ingestion workflows from on-prem and cloud sources into Azure Data Lake, improving data availability for downstream analytics.

Applied data normalization and cleansing techniques to enhance data quality, consistency, and compatibility with machine learning models.

Developed preprocessing scripts for structured and semi-structured datasets, optimizing storage efficiency and query response time.

Created technical documentation and workflow diagrams for ETL logic, improving onboarding speed and codebase understanding across engineering teams.

Monitored real-time data pipelines for failures and implemented logging and alerting mechanisms to support reliable 24/7 operations.

Supported the transition from legacy systems to cloud-native architecture, contributing to early-stage cloud adoption and modernization initiatives.

EDUCATION

Master of Science in Information Science

University of Texas at Austin, TX

Bachelor of Science in Information Science

University of Texas at Austin, TX

Jeroen S

Machine Learning Engineer

Email: *********@*******.***

Contact: 470-***-****

LinkedIn: https://www.linkedin.com/in/jeroen-w-s-473441265/


