Data Scientist

Boston, MA
April 16, 2024

Boston, MA +1-857-***-****

Data Scientist with 2.5 years of experience seeking immediate full-time DS/ML roles

EXPERIENCE

Moderna Cambridge, USA

Data Science Co-op July 2023 - December 2023

● Spearheaded the FTO Analysis with Automatic Keyword Discovery on 2.5M patents that mitigated the risk of keyword oversight and decreased the time spent by lawyers on manual patent document infringement analysis.

● Built keyword discovery on SME expertise, medical ontologies, definition extraction, OpenAI embeddings, and density clustering, resulting in a robust mechanism for keeping pace with evolving patent terminology.

● Optimized HR functions by using benefit selection data to generate employee persona clusters that served as metadata for an LLM chatbot prompt, enabling personalized benefit recommendations.

● Devised an automatic risk score calculation system covering all 265M domains, reducing the time needed to detect potential domain cybersquatters by 93%, which decreased instances of brand infringement.

Lowe's Companies, Inc. Charlotte, USA

Graduate Data Science Intern May 2023 - July 2023

● Enhanced product information accuracy by identifying inconsistencies and implementing LLM-based measures to improve data quality, resulting in increased sales and reduced product returns.

● Designed a super prompt through an iterative prompt generation approach, incorporating feedback from a prompt engineer, which resulted in a 13% increase in recall compared to the current LLM.

● Assessed the performance of LLM algorithms in production, analyzing API responses and comparing predictions against human labels, contributing to the validation and optimization of algorithm error categories for better results.

Fluid AI Mumbai, India

Data Scientist September 2021 - August 2022

● Implemented an ensemble of random forest and ARIMA models on an insurance agent's sales history to forecast next-quarter performance with 0.76 MAPE in Watson and Frontier, with QoQ and YoY sales reports.

● Deployed a default-prediction system for a national bank, with ETL from multiple data warehouses, SMOTE, and an XGBoost model that runs daily inference on real-time data to predict whether a customer will default in the coming days.

● Engineered and launched the Sanatio Python library, boosting Data Scientist productivity by 200% through automating repetitive tasks and enhancements such as model validation reports, code generation, drift reports, and a fillna module.

● Designed the Sanatio library release process with CI/CD that runs unit tests (81% net coverage), compiles binaries, creates builds, and pushes them to PyPI, along with server-side authentication using Airtable.

● Maintained a lead creation service that generates 4M USD in value yearly from chatbot conversation data, with pipelines that filter potential customers from conversations and produce unique leads assigned to agents in real time.

CapeStart Inc Nagercoil, India

Machine Learning Trainee February 2021 - August 2021

● Developed and deployed a recommendation system on AWS using PubMed SLR data, a BioBERT model, and PICO annotation, improving lookup efficiency for clinicians identifying relevant clinical trials.

● Boosted forecasting precision by 10% within the FullIntel social media ecosystem by deploying a voting classifier (XGBoost, random forest) leveraging Alexa metadata, which enabled strategic decision-making for marketing.

EDUCATION

Northeastern University - Khoury College of Computer Sciences, Boston, USA May 2024

Master of Science in Artificial Intelligence, GPA: 3.8

Alagappa Chettiar Government College of Engineering and Technology, Karaikudi, India March 2021

Bachelor of Engineering in Computer Science, GPA: 3.46

TECHNICAL SKILLS

Programming Languages: Python, SQL, R, C++

Tools/Methodology: Prompt Engineering, LLM Fine-tuning, Git, Agile, CI/CD, Jira, Linux, AWS, Docker

Libraries: PyTorch, Hugging Face, LangChain, spaCy, NLTK, scikit-learn, TensorFlow, Keras, NumPy, Pandas, Matplotlib

Soft skills: Data Analysis, Data Visualization, Model Explainability, Storytelling, Presentation, Leadership, Problem Solving

Domains: Machine Learning, Deep Learning, Artificial Intelligence, Data Science, Natural Language Processing, LLM

