ANISHA THAKRAR
New York, USA • +1-617-***-**** • **************@*****.*** • https://www.linkedin.com/in/anisha-thakrar/

Results-oriented data scientist with expertise in fraud detection, consortium analytics, and applied machine learning. Proven success in building and deploying scalable models across the financial, telecom, and e-commerce sectors. Skilled in aligning technical solutions with business strategy by partnering cross-functionally with product, engineering, and risk teams. Experienced in managing full project lifecycles - from data ingestion and feature engineering to model deployment and governance - with a focus on explainability, regulatory compliance, and measurable business outcomes.

Education
Master of Science, Data Science - Northeastern University, Boston, USA Dec 2022
Bachelor of Technology, Information Technology - Indus University, India May 2020

Technical Skills
Python, SQL, Spark, LLMs, AI, Tableau, C, C++, Databricks, R, Neo4j, A/B Testing, GitHub, Airflow, AWS SageMaker

Professional Experience
Senior Data Scientist – FPF Consortium Feb 2025 – Present
Socure, New York, USA
Collaborated with clients to gather requirements and conducted rigorous data quality checks across ~20B records of personally identifiable information (PII), account, and transaction data from multiple financial institutions using PySpark and Python.
Partnered with clients and internal stakeholders across risk, product, and engineering to define fraud detection objectives, align data strategies, and translate business needs into actionable machine learning solutions.
Engineered advanced fraud features for a first-party fraud consortium model, integrating anonymized data from diverse bank, fintech, and neobank partners to uncover cross-institutional fraud patterns. Performed Information Value (IV) analysis to rank and select the most predictive features, improving model explainability and targeting capability.
Trained scalable machine learning models using H2O’s XGBoost classifier to detect first-party fraud; optimized model performance through iterative feature selection and hyperparameter tuning. Evaluated models using a suite of performance metrics including KS, AUC, AUCPR, F1, and fraud capture at varying risk depths, ensuring balanced precision-recall tradeoffs in high-risk segments.
Built an AI-powered Fraud Analyst leveraging LLMs to relabel noisy training data by analyzing top SHAP-ranked features impacting fraud and non-fraud predictions. Engineered dynamic prompts using the top 10 influential features and their definitions to guide contextual fraud interpretation. Enabled parallelized processing to accelerate relabeling at scale, improving data quality and business context. Retraining with corrected labels led to a 5% lift in KS and a 3% increase in fraud detection in the riskiest 2% of transactions.
Designed and implemented real-time monitoring dashboards in Databricks and AWS SageMaker to track ingestion pipeline health, daily API volumes, and model drift across deployments.

Data Scientist Feb 2023 – Feb 2025
Vesta Corporation, Atlanta, USA
Designed and developed telecom consortium models, including fraud models, bank authorization models, and stacking models. Leveraged predictive insights from bank authorization models to proactively identify transactions likely to be declined, enhancing fraud prevention strategies. Applied expertise in customer behavior analytics, real-time fraud detection, and consortium analytics to deliver scalable, privacy-compliant solutions for telecom and financial industries.
Led the sampling team, overseeing development of the pipeline and aggregation of event data for feature generation and synthetic data integration to address class imbalance. Managed strategic data splitting and implemented sample re-weighting to account for downsampling and row importance. Developed a dynamic consortium data framework enabling customizable tuning of partner/channel data composition and fraud-to-non-fraud ratios within datasets.
Spearheaded the development and implementation of tree-based machine learning models in Python, including XGBoost, Random Forest, Gradient Boosting, and LightGBM, for a multiplatform e-commerce and telecom consortium. Formulated rules to prevent fraudulent patterns in real time, leading to a 5% increase in approval rates and a 10% decrease in chargebacks.
Performed ad-hoc analysis on model performance and chargebacks, presenting findings to key stakeholders, including VP, SVP, COO, and CEO. Also initiated client calls to diagnose issues and provided feedback to enhance the model.
Engineered automated pipelines for sample creation and feature generation using PySpark and SQL, and created graph features in Neo4j to enhance data insights. Managed end-to-end processes, from model building and evaluation to deployment, reducing the time from inception to deployment by 75%.
Contributed to the IT pipeline by conducting stress testing to identify potential vulnerabilities and weaknesses in the data. Collaborated closely with product managers, engineers, and cross-functional teams in LATAM, APAC, and EU to identify business opportunities and drive data-driven product initiatives.
Evaluated model performance using metrics such as the confusion matrix, KS, AUC, F1 score, precision, recall, approval rate, and chargeback rate, as well as custom metrics such as Normalized Area Above the Curve (NAAC) and the maximum ratio of AUC to AAC (RACA).
Played a pivotal role in architecting an on-premises decisioning system, establishing scalable frameworks for feature engineering, rule creation, and model integration to support real-time decision-making.

Data Science Co-op Jan 2022 – Aug 2022
Rue Gilt Groupe, Boston, USA
Developed and presented a Customer Lifetime Value model utilizing ML models (BetaGeoFitter, GammaGamma) to identify potential and lapsed members and predict future purchases. Extracted key per-customer metrics from raw transactions in Databricks, including frequency, age, recency, and monetary value.
Used MAE to measure the percentage error between training and holdout samples, achieving an average error of 0.24 orders and $12.17 per order. Demonstrated high precision and accuracy, with 94% of members falling within the 0-1 order error bucket and 98% within a $20 error range.
Conducted over 20 test setups and created 25 custom metrics in Snowflake for more than 15 A/B tests. Designed dashboards in Tableau to monitor customer behavior and test performance. Generated insights from post-test analysis of A/B experiments that drove a projected $2.1M in annual gross demand.

Machine Learning Intern Dec 2019 – Apr 2020
Radix Analytics Pvt. Ltd, India
Performed time series analysis for advertisement (commercial airtime) demand forecasting for ZEE (an Indian TV network) using statistical techniques such as ARIMA, exponential smoothing, regression, and perceptron models in Python.
Implemented demand forecasting to predict the demand of flights for Kenyan Airlines in R.
Built an interactive web application using R Shiny.