Data Science Scientist

Location:
Redmond, WA
Posted:
July 21, 2024


Resume:

DIVYE ANAND GUPTA

Redmond, Washington 206-***-**** *********@*****.*** LinkedIn GitHub

PROFESSIONAL EXPERIENCE

Amazon Data Scientist Bellevue, Washington 06-07-2021 – present

● Formulated psychometric analysis approach for improving classification accuracy of Claude LLMs. Applied the tech stack to categorize more than 500 insurance claims into 10 categories using agent-written incident notes, achieving 98% accuracy.

● Projected route capacity for delivery partners (DSPs) for long-range capacity planning with an XGB-Linear model built on 500 features covering DSP performance, station volume, network health, reliability, vehicles, and drivers. Improved predictions for less-tenured DSPs, achieving a 500 bps improvement over the baseline model and saving $70M.

● Optimized long-range capacity planning decisions with mixed integer linear programming on Gurobi and Xpress solvers to generate target deployment plans for stations. Using conditional indicator constraints, calculated temporal aspects of capacity utilization and generated eligible DSP, role, and station mappings to make workforce deployment decisions across the North America region for peak planning. Using chance constraints, incorporated fluctuations in volume signals, allowing planners to run scenario analysis at station level and save $200M in planning costs.

● Launched logistic regression control design to calculate weights of telematics driving behaviors and modified safety evaluation scorecards for last-mile drivers. Summarized the change in experience using volatility analysis and descriptive statistics to show improvements in driving behaviors, alongside incentive cost savings of $10M.

● Built XGBoost classification models for internal safety risk evaluation using telematics data, weather data, and workplace conditions to build decision mechanisms for targeted coaching aimed at reducing Accidents per Million Miles. Using a minimum detectable effect of 2% and statistical power of 90%, implemented an experiment that observed cost savings of $16M.

● Estimated effectiveness of existing safety programs using a generalized synthetic control causal inference framework, calculating counterfactual outcomes and performing gap analysis to estimate impact on safety metrics through the training period. Using Average Treatment Effect for Treated Units (ATT) vs time of treatment, estimated training efficiency to be 90 days.

● Sketched global time series forecasting design using CatBoost regression to forecast point estimates for labor attendance 28 days into the future from data distribution-based features, time-related features, holiday features, and categorical features, achieving an average improvement of 159 bps over the baseline model and reducing labor hiring costs by $30M.

● Established causal relationships between sort center operations and package defects to highlight controllable vs uncontrollable factors. Used A/B testing to highlight actionable insights for improving controllable metrics, such as increasing throughput per hour and reducing delivery promise misses, by an average of 10%.

Amazon Business Intelligence Engineer Seattle, Washington 01-06-2020 – 06-04-2021

● Devised customer service associate personalized coaching models using probabilistic prediction of successful delivery on customer contacts with tree-based classifiers. Pioneered performance vs mix contribution index to bridge KPI variance, summarizing results in Tableau dashboards and reducing manual effort by 50% for 726 users, saving $7M in improved efficiency.

Deloitte Data Science Specialist McLean, Virginia 08-05-2019 – 01-03-2020

● Time Series Forecasting of Call Volume Data (Insight Studio): Developed multivariate time series forecasting model of call volume data for a pharmaceutical company using SARIMA-X and XGBoost models with data distribution-based features, time-based features, holidays, and call center capacity constraints. Visualized feature importance using partial dependence plots, scatter plots, and SHAP values to improve model explainability, achieving an average improvement of 210 bps over the baseline model.

Tagbin Business Partner (Entrepreneurship) Gurugram, Delhi 05-01-2016 – 03-31-2018

● Designed communication and collaboration tool “Eazespot” showing most relevant tasks, chats, and emails on a single platform for 800 live beta users. Used bi-grams to label emails as tasks, meetings, and to-dos, linking them with a calendar to show relevant information in a single UI called Ease-Today.

Snapdeal Business Analyst - Product Management Gurugram, Delhi 07-01-2015 – 04-30-2016

● Customer Promise Engine: Built random forest classification algorithm in R, taking delivery intervals as 0-2, 3-5, 4-7, 8 or more. Input features included range of delivery time between origin and destination zip codes, holiday features, and time-related features, achieving average accuracy of 94%, an average improvement of 2600 bps over the baseline model.

TECHNICAL SKILLS

Master's degree and 8+ years of experience in Data Science modeling and Analytics. Delivered projects with cost entitlements of $300M+. Scaled startup from $60,000 to $1M+ and managed team of 16 engineers, designers, and scientists to develop Eazespot.

Languages: Python, SQL, R

Tools: Tableau, Streamlit, PySpark, SKLearn, XGBoost, CatBoost, AWS S3, Batch, Lambda, CloudWatch, SageMaker, Glue, Athena

Science and Business: Machine Learning, Statistics, Regression, Decision Science and Analytics, Optimization, Mixed Integer Linear Programming, Causal Inference Analysis, Experiment Design, Generalized Synthetic Control, A/B testing, Neural Networks, LLM applications, Entrepreneurship, Business Psychology

EDUCATION

Purdue University Masters Business Analytics and Information Management West Lafayette, IN 06-13-2018 – 05-08-2019

IIT Roorkee Bachelors Production and Industrial Engineering Roorkee, Uttarakhand, India 07-15-2011 – 05-15-2015


