Data Scientist

Location:

Boston, MA

Salary:

130000

Posted:

June 22, 2026


Resume:

Mohan Bhosale

Boston, MA 415-***-**** *******.**@************.*** LinkedIn GitHub Portfolio

Architecting Solutions for Real-World Data Challenges

EDUCATION

Northeastern University (NEU) Boston, MA

Khoury College of Computer Sciences: Master's in Data Science; GPA: 3.96/4.0 April 2026

Vellore Institute of Technology (VIT) Vellore, TN

Bachelor of Technology, Computer Science Jan 2021

PROFESSIONAL EXPERIENCE - 4+ YEARS

Data Scientist Co-op Boston, MA

Cohere Health (Healthcare Prior Auth SaaS Platform) Jan 2025 – Aug 2025

• Built production LLM pipelines on AWS SageMaker using LangChain, RAG, FAISS vector embeddings, and prompt engineering to extract structured clinical signals from 10M+ unstructured EHR and claims records; deployed via MLflow and CI/CD with automated monitoring, recovering $8.5M in operational costs

• Developed end-to-end ML scoring and risk stratification models (XGBoost, logistic regression) through full lifecycle: problem framing, feature engineering, training, evaluation, validation, and production deployment with automated retraining pipelines on S3/Glue ETL infrastructure

• Designed causal inference frameworks (DiD, propensity matching) to isolate incremental model impact from confounders, producing credible ROI measurement adopted by senior leadership, and built Tableau dashboards tracking 15+ KPIs, cutting report time by 40%

Data Scientist Hyderabad, IN

Mediamint (Digital Marketing & AdTech Analytics) Feb 2022 – Jul 2023

• Designed and deployed NLP and embedding pipelines using transformer-based models (BERT), TF-IDF features, sentiment analysis, and RAG with vector DB retrieval on large-scale unstructured customer interaction data, feeding structured signals into downstream classification and recommendation models

• Built Marketing Mix and forecasting models (ARIMA, Prophet, SARIMAX, LSTM) with diminishing returns and saturation curve estimation, enabling $2M+ budget reallocation toward higher-yield channels; formulated multi-channel spend as a MIP using CPLEX, boosting ROI by 15%

• Designed sequential A/B testing and experimentation frameworks with power analysis, FDR controls, and holdout group design, delivering statistically defensible measurement of algorithmic changes at 95% confidence for product and business leadership

• Engineered distributed PySpark/Kafka/Airflow data pipelines processing 500K+ daily events with automated schema validation, quality gates, and CI/CD, reducing incidents by 50% and ensuring reliable feature stores for downstream ML and optimization models

• Built customer segmentation and behavioral scoring models (K-Means, DBSCAN, gradient boosting) partitioning large-scale interaction data by engagement signals, lifting targeted conversion by 25%; mentored junior analysts on modeling best practices and code quality

Data Analyst Bangalore, IN

Allround Club (EdTech E-Commerce Platform) Jan 2021 – Feb 2022

• Built hybrid recommendation engine (collaborative filtering + neural embedding retrieval) in Python, TensorFlow, and Spark with automated feature pipelines, lifting purchase conversion by 25% and retention by 20% through scalable production ML deployment

• Developed churn prediction models (logistic regression, survival analysis) on longitudinal behavioral data, conducting cohort and funnel analysis to inform personalized retention strategies that reduced 30-day attrition by 12%

• Designed Tableau and Power BI dashboards tracking LTV/CAC, funnel drop-off, and MoM retention curves, translating complex ML outputs into clear strategic narratives for product and business leadership

PROJECTS & RESEARCH

§ ClassifyAI: Multi-Agent LLM System LangChain, RAG, OpenAI, Docker, MLflow, CI/CD

• Architected production-grade multi-agent LLM system with 8 specialized agents via LangChain and RAG with FAISS vector search, prompt engineering, JSON-schema outputs, and human-in-the-loop governance; deployed in Docker with CI/CD and MLflow model registry, achieving 83.6% accuracy on clinical datasets

§ HIMAS: Federated Learning Healthcare AI Flower, Google ADK, Airflow, GCP, Multi-Agent AI 1st Place Google Hackathon

• Built production MLOps platform on GCP with full CI/CD, Docker, Airflow orchestration, and automated model monitoring enabling privacy-preserving ICU mortality prediction (ROC-AUC 0.85) across 3 hospital systems with HIPAA-compliant 6-agent clinical AI system

§ Hallucination Evaluation in LLMs LLM Benchmarking, Responsible AI, Factuality Scoring, NLP Research

• Authored research paper developing multi-dimensional LLM evaluation framework benchmarking GPT-4, Claude, and Llama-2 with novel factuality scoring and hallucination detection methodology, improving precision by 23% over baselines

TECHNICAL SKILLS

Languages: Python (Expert, OOP), SQL (Expert), R, Scala, Shell Scripting

Data: Feature Engineering, EDA, Data Wrangling, Validation, Bayesian Methods, Version Control, Production Support, Git, Model Evaluation

GenAI & LLMs: LangChain, RAG, FAISS/Pinecone, Prompt Engineering, Fine-tuning (LoRA/QLoRA), Multi-Agent Systems

ML & Deep Learning: PyTorch, TensorFlow, Scikit-learn, XGBoost, Hugging Face, CNNs, LSTMs, Transformers, BERT, NLP

MLOps & Deployment: Docker, Kubernetes, MLflow, CI/CD, Airflow, Model Monitoring, Feature Stores, API Serving

Cloud: AWS (SageMaker, S3, Glue, EMR, Bedrock), GCP (Vertex AI, BigQuery), Azure (Azure ML), Snowflake, dbt

Data Engineering: PySpark, Spark, Kafka, ETL/ELT, PostgreSQL, MongoDB, Redis, Automated Pipelines, Data Quality

Science & Analytics: A/B Testing, Causal Inference, DiD, Forecasting (ARIMA/Prophet/SARIMAX), Optimization (Gurobi/CPLEX)

Certification: Google Data Analytics, 1st Place Google Cambridge MLOps Hackathon, AI Agents from Hugging Face
