ARPAN HAZRA
San Jose, CA USA ***** • *****.*****.****@*****.*** • +1-678-***-**** • linkedin.com/in/arpan-hazra/
SUMMARY
• Data scientist with 5+ years of experience in developing, deploying, and optimizing machine learning models across aviation, healthcare, and energy sectors.
• Proficient in statistical modeling and machine learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, KNN, Bayesian methods, XGBoost, Clustering) for forecasting and predictive analytics.
• Expert in AI/ML deployment, NLP with LLMs, and AI orchestration, with hands-on MLOps and LLMOps experience.
• Expertise in Python, SQL, and AI/ML frameworks like Keras, TensorFlow, LangChain and Hugging Face.
• Hands-on experience in productionizing ML pipelines covering data extraction, data cleaning, feature engineering, model training and validation, model deployment, and model performance monitoring.
• Skilled in cloud-based AI deployments using AWS (SageMaker, Glue) and containerization tools (Docker).
• Recognized for delivering innovative, high-impact solutions that drive business results.
• Authorized to work in the USA on an L-2S visa.
PROFESSIONAL EXPERIENCE
GE VERNOVA ADVANCED RESEARCH CENTRE – CONTROLS & DIGITAL IN
Data Scientist – AI & Machine Learning AUG 2020 - Present
● Led the development and deployment of scalable AI/ML models for wind turbine gearbox failure prediction, integrating real-time data pipelines for predictive analytics and anomaly detection, improving failure detection rates by 45%.
● Developed a robust model quality monitoring tool using statistical and machine learning techniques (clustering), enabling automated early detection and achieving a 60% improvement in process efficiency.
● Developed a GenAI-driven Text-to-SQL automation tool leveraging LLMs, improving database query efficiency by 40%.
● Built and deployed AI-powered Q&A chatbot solutions using a RAG pipeline and vector databases, enabling seamless natural language query handling for power plant component specifications.
● Created a scalable AWS Glue platform in PySpark, employing clustering and anomaly detection to enable real-time failure predictions, improving proactive identification of issues by 45%.
● Implemented an AI-based risk prediction tool using BERT and NLP techniques (Text classification and NER), enhancing decision-making efficiency in power and energy sectors by 30%.
● Deployed scalable AI solutions on AWS SageMaker and containerized ML models using Docker for seamless production integration.
● Engineered an AI-based disposition recommendation tool utilizing BERT for root cause analysis and Streamlit for visualization, reducing gas power component turnaround time by 20% and optimizing the operational workflow.
● Designed and implemented an AI orchestration framework to manage the end-to-end lifecycle of AI models, including model deployment, scaling, and automated feedback loops for continuous improvement.
● Forecasted failure modes of MR coils by exploring LSTM, ARIMA, and time series clustering techniques, providing critical insights with a minimum 3-day lead time and achieving an F1-score of 87%.
● Improved an existing distress ranking prediction model using XGBoost, optimizing hyperparameters via Bayesian tuning and reducing RMSE by 80%.
GE RESEARCH – INDUSTRIAL AI & SOFTWARE IN
Associate Data Scientist – Data Analytics & Machine Learning JAN 2020 - JUL 2020
● Designed a predictive model using statistical and clustering techniques to monitor the deterioration of classification system performance, enabling early detection of concept drift with a 90-day lead time and achieving an F1-score of 91%.
● Assisted in integrating ML model evaluation metrics and feedback loops into existing AI pipelines to enhance model reliability and accuracy.
GE RESEARCH – PHYSICAL DIGITAL ANALYTICS IN
Data Science (Intern) – Data Analytics & Machine Learning MAY 2019 - JUL 2019
● Built a dynamic ensemble model for multi-class classification, enhancing accuracy by 10% through clustering and nearest neighbor methods, achieving an overall accuracy of 85%.
CALCUTTA ELECTRIC SUPPLY CORP. (CESC) LTD. IN
Power Distribution Engineer JUL 2016 - SEP 2017
● Initiated a predictive model to reduce electricity theft, resulting in a 5% increase in annual revenue.
EDUCATION
INDIAN STATISTICAL INSTITUTE IN
Master of Technology in Quality, Reliability and Operations Research; Major in Statistics & ML, Rank: 1ST (88.25%) 2018 - 2020
INDIAN INSTITUTE OF ENGINEERING SCIENCE AND TECHNOLOGY IN
Bachelor of Technology in Electrical Engineering; Major in Electrical Engg., Rank: 5TH (CGPA: 8.6) 2012 - 2016
TECHNICAL SKILLS
Programming: Python, SQL (Proficient), R (Experienced)
Machine Learning & AI: Regression, Classification, Clustering, XGBoost, LSTM, ANN, BERT
Statistical Analysis: A/B Test, Hypothesis Test, Nonparametric Test, PCA, Feature Engineering, ARIMA
NLP & LLMs: BERT, Transformer Models, PEFT, Prompt Engineering, RAG, Text-to-SQL Automation
Big Data & AI Orchestration: PySpark, Vector Databases (VectorDB)
AI/ML Frameworks: TensorFlow, PyTorch, Keras, LangChain, Hugging Face
Cloud & DevOps: AWS (SageMaker, Glue), Docker, MLOps
Visualization & Analytics: Tableau, Matplotlib, Seaborn, Plotly
Certifications: Generative AI with LLM, Machine Learning in Production (MLOps), Data Science Foundations (PadhAI-IIT Madras)
AWARDS & RECOGNITIONS
• SPOT LIGHT & IMPACT AWARD (GE Research) – For excellence in delivering AI-powered solutions.
• 1st Position at FORSIT (Tech Event by IEEE, India).