Machine Learning Data Scientist

Location:
San Jose, CA
Salary:
120000 USD per annum
Posted:
February 04, 2025

Resume:

ARPAN HAZRA

San Jose, CA USA ***** • *****.*****.****@*****.*** • +1-678-***-**** • linkedin.com/in/arpan-hazra/

SUMMARY

• Data scientist with 5+ years of experience in developing, deploying, and optimizing machine learning models across aviation, healthcare, and energy sectors.

• Proficient in statistical modeling and machine learning techniques (linear and logistic regression, decision trees, random forest, SVM, KNN, Bayesian methods, XGBoost, clustering) for forecasting and predictive analytics.

• Expert in AI/ML deployment, NLP with LLMs, and AI orchestration, with hands-on MLOps and LLMOps experience.

• Expertise in Python, SQL, and AI/ML frameworks like Keras, TensorFlow, LangChain and Hugging Face.

• Hands-on experience in productionizing ML pipelines that perform data extraction, data cleaning, feature engineering, model training and validation, model deployment, and model performance monitoring.

• Skilled in cloud-based AI deployments using AWS (SageMaker, Glue) and containerization tools (Docker).

• Recognized for delivering innovative, high-impact solutions that drive business results.

• Authorized to work in the USA under an L2S visa.

PROFESSIONAL EXPERIENCE

GE VERNOVA ADVANCED RESEARCH CENTRE – CONTROLS & DIGITAL IN

Data Scientist – AI & Machine Learning AUG 2020 - Present

● Led the development and deployment of scalable AI/ML models for wind turbine gearbox failure prediction, integrating real-time data pipelines for predictive analytics and anomaly detection, improving failure detection rates by 45%.

● Structured a robust model quality monitoring tool using statistical and machine learning techniques (clustering), enabling automated early detection and achieving a 60% improvement in process efficiency.

● Developed a GenAI-driven Text-to-SQL automation tool leveraging LLMs, improving database query efficiency by 40%.

● Built and deployed AI-powered Q&A chatbot solutions using a RAG pipeline and vector databases, enabling seamless natural language query handling for power plant component specifications.

● Created a scalable AWS Glue platform in PySpark, employing clustering and anomaly detection to enable real-time failure predictions, improving proactive identification of issues by 45%.

● Implemented an AI-based risk prediction tool using BERT and NLP techniques (Text classification and NER), enhancing decision-making efficiency in power and energy sectors by 30%.

● Deployed scalable AI solutions on AWS SageMaker and containerized ML models using Docker for seamless production integration.

● Engineered an AI-based disposition recommendation tool utilizing BERT for root cause analysis and Streamlit for visualization, reducing gas power component turnaround time by 20% and optimizing the operational workflow.

● Designed and implemented an AI orchestration framework to manage the end-to-end lifecycle of AI models, including model deployment, scaling, and automated feedback loops for continuous improvement.

● Forecasted failure modes of MR coils by exploring LSTM, ARIMA, and time series clustering techniques, providing critical insights with a minimum 3-day lead time and achieving an F1-score of 87%.

● Improved an existing distress ranking prediction model using XGBoost, optimizing hyperparameters via Bayesian tuning and improving RMSE by 80%.

GE RESEARCH – INDUSTRIAL AI & SOFTWARE IN

Associate Data Scientist – Data Analytics & Machine Learning JAN 2020 - JUL 2020

● Designed a predictive model using statistical and clustering strategies to monitor the deterioration of classification system performance, enabling early detection of concept drift with a 90-day lead time and achieving an F1-score of 91%.

● Assisted in integrating ML model evaluation metrics and feedback loops into existing AI pipelines to enhance model reliability and accuracy.

GE RESEARCH – PHYSICAL DIGITAL ANALYTICS IN

Data Science (Intern) – Data Analytics & Machine Learning MAY 2019 - JUL 2019

● Built a dynamic ensemble model for multi-class classification, enhancing accuracy by 10% through clustering and nearest neighbor methods, achieving an overall accuracy of 85%.

CALCUTTA ELECTRIC SUPPLY CORP. (CESC) LTD. IN

Power Distribution Engineer JUL 2016 - SEP 2017

● Initiated a predictive model to reduce electricity theft, resulting in a 5% increase in annual revenue.

EDUCATION

INDIAN STATISTICAL INSTITUTE IN

Master of Technology in Quality, Reliability and Operations Research; Major in Statistics & ML, Rank: 1ST (88.25%) 2018 - 2020

INDIAN INSTITUTE OF ENGINEERING SCIENCE AND TECHNOLOGY IN

Bachelor of Technology in Electrical Engineering; Major in Electrical Engg., Rank: 5TH (CGPA: 8.6) 2012 - 2016

TECHNICAL SKILLS

Programming: Python, SQL (Proficient), R (Experienced)

Machine Learning & AI: Regression, Classification, Clustering, XGBoost, LSTM, ANN, BERT

Statistical Analysis: A/B Test, Hypothesis Test, Nonparametric Test, PCA, Feature Engineering, ARIMA

NLP & LLMs: BERT, Transformer Models, PEFT, Prompt Engineering, RAG, Text-to-SQL Automation

Big Data & AI Orchestration: PySpark, Vector Databases (VectorDB)

AI/ML Frameworks: TensorFlow, PyTorch, Keras, LangChain, Hugging Face

Cloud & DevOps: AWS (SageMaker, Glue), Docker, MLOps

Visualization & Analytics: Tableau, Matplotlib, Seaborn, Plotly

Certifications: Generative AI with LLM, Machine Learning in Production (MLOps), Data Science Foundations (PadhAI-IIT Madras)

AWARDS & RECOGNITIONS

• SPOT LIGHT & IMPACT AWARD (GE Research) – For excellence in delivering AI-powered solutions.

• 1st Position at FORSIT (Tech Event by IEEE, India).


