Data Scientist - Azure, Snowflake, MLOps

Location:

Fort Lauderdale, FL

Posted:

April 19, 2026

Resume:

Rushikesh Dhumal

*.******@*******.*** 732-***-****

linkedin.com/in/rushikeshdhumal/

rushikeshdhumal.github.io

Education

Master of Science in Statistics - Data Science, Rutgers University, New Brunswick, August 2024 – May 2026

Microsoft Certified: Azure Data Scientist Associate (DP-100), July 2025

Skills

Programming & Software: Python, R, SQL, SAS, SPSS, Pandas, NumPy, PySpark, Git, Docker

Machine Learning & Gen AI: Scikit-learn, TensorFlow, PyTorch, SHAP, spaCy, Gensim, LangChain, RAG

Cloud & Databases: Azure, AWS, Snowflake, dbt, PostgreSQL, Spark, Hive, Databricks

Reporting & Visualization: Power BI, Tableau, Looker Studio, Superset, Streamlit, Alteryx, Microsoft Office

Experience

AI/ML Software Engineering Intern, Micronotes, June 2025 – December 2025

• Doubled analytics delivery speed for 50+ financial services clients by architecting a PySpark/Databricks ETL pipeline to ingest and wrangle 230M+ Experian credit bureau records

• Reduced data integrity errors by 30% across Databricks Bronze-to-Gold dimensional layers by deploying statistical anomaly detection with Z-score and IQR-based outlier flagging

• Cut credit-offer cycle time by 40% across 5 loan channels by automating consumer prescreen campaign reporting with scheduled PySpark jobs on Databricks

Data Scientist, Fields Data, July 2022 – July 2024

• Improved partner matching accuracy by 70% for 600+ organizations by building a spaCy and Gensim NLP engine that pairs organizations with suitable project partners based on project and organization descriptions

• Prioritized resource allocation across 120+ at-risk regions by performing population segmentation using K-Means clustering and silhouette analysis on large disaster-preparedness datasets

• Attained an 80% efficiency gain in cross-functional reporting by formalizing analytical workflows into repeatable processes in Power Automate & Power BI

Data Scientist Intern, Nestle, March 2022 – May 2022

• Quantified commercial distribution effectiveness at the 95% confidence level by designing A/B tests with hypothesis testing across 1,600+ retail outlets

• Drove 15% improvement in budget allocation decisions by building Tableau KPI dashboards and delivering compelling analytic presentations to business partners across multiple organizational levels

Data Scientist Intern, Fields Data, April 2021 – May 2022

• Expanded data processing capacity by 25% by refining ETL pipelines ingesting from 5 sources, including Oracle NetSuite

• Enabled 3 downstream predictive modeling pipelines by collaborating with cross-functional teams to automate feature engineering using Pandas and SQL transformations across 5 data sources

• Identified top 8 key predictors for disaster response prioritization models by performing EDA with correlation analysis, variance inflation factor testing, and feature importance ranking on disaster-impact variables

• Reduced data quality errors by 25% by building validation checks that enforce data integrity across ingested datasets

Projects

End-to-End MLOps Pipeline for Urban Mobility Analytics

• Built an ELT pipeline on Snowflake + dbt + Airflow ingesting 94M+ NYC TLC records from Azure Blob Storage into a Medallion architecture, with automated monthly orchestration, row-count reconciliation, and incremental dbt models

• Engineered a LightGBM demand forecasting model with time-based splits, 12 lag and rolling features, and MLflow experiment tracking, achieving lower MAPE than the naive lag-168h baseline

• Designed a fully automated MLOps retraining loop using Astronomer Cosmos + Airflow TaskFlow API: a monthly DAG retrains the model, logs to MLflow, and writes predictions back to Snowflake; a downstream Superset dashboard surfaces KPIs, revenue trends, and demand heatmaps against Gold layer aggregates

Calibrated Loan Default Risk Scoring System (LightGBM + SHAP)

• Built an isotonically calibrated LightGBM credit risk scoring model with SHAP interpretability for financial services risk management, outperforming the regularized logistic regression baseline by 12% in PR-AUC

• Projected a 24% revenue increase through prescriptive decision threshold optimization using false-positive vs. false-negative cost tradeoff analysis

ATHENA (Adaptive Training & Hyperparameter Exploration Natural Assistant) MLOps Platform

• Reduced model degradation response time by 60% by architecting a LangChain/LangGraph MLOps platform with data drift detection, concept drift monitoring, and automated model retraining

• Enhanced pipeline reliability by 40% by implementing Chain-of-Thought prompt engineering with RAG & automated testing guardrails
