Data Science Lead - NLP, ML Pipelines, LLMs

Location:

Columbus, OH

Posted:

February 26, 2026

Resume:

Confidential-Internal

Sakthikala Palanisamy

Email: *************@*****.*** Contact: +1-380-***-**** LinkedIn Columbus, Ohio

PROFESSIONAL SUMMARY

Data Science Professional with 6 years of experience in Machine Learning and Predictive Modeling. Expert in building and deploying end-to-end ML pipelines that solve critical business challenges. Specialist in NLP and advanced analytics with extensive experience taking models from local to production-grade environments using Docker and Kubernetes. Experienced across the full model lifecycle, from classical statistical modeling to fine-tuning Large Language Models (LLMs). Proven track record in developing RAG (Retrieval-Augmented Generation) pipelines and Vector DBs that reduced operational costs by 30%. Authorized to work in the U.S. (No Sponsorship Required).

CORE COMPETENCIES

Generative AI: RAG, Vector DB, LLM, LangChain, LangGraph, Fine Tuning, Chroma DB

Machine Learning: Scikit-learn, NLTK, Pandas, NumPy, TensorFlow, SciPy, spaCy, ML Models

Cloud Technologies: Azure Databricks, Azure Data Factory, Azure Kubernetes Service (AKS), GCP

Deployment Tools: Docker, Kubernetes, FastAPI, Flask, Git, CI/CD Pipelines

Programming Languages: Python, SQL

Data Visualization Tools: Tableau, Power BI

PROFESSIONAL EXPERIENCE

Publicis Sapient, Senior Associate - Data Science, Mar 2020 – Oct 2023

Search Recommendation

• Built a scalable product search recommendation system using Elasticsearch to deliver highly relevant search results and personalized product suggestions in an e-commerce environment. Conducted data wrangling and preprocessing using Python and NLP techniques, including tokenization, stop word removal, lemmatization, and TF-IDF vectorization, to ensure clean and meaningful textual data.

• Created a custom synonym dictionary to enhance Elasticsearch relevance and user query expansion for improved audience targeting. Applied string similarity techniques including lexical similarity and cosine similarity to match and recommend relevant user segments.

• Indexed the cleaned and enriched data into Elasticsearch, and designed optimized queries using filters, analyzers, and boosting techniques to return highly accurate search recommendations. Built scalable REST APIs with the FastAPI framework to serve recommendations, integrated with Redis to cache results and ensure low-latency responses for high-throughput requests.

• Containerized the entire application using Docker, creating lightweight and reproducible images for deployment with Uvicorn as the ASGI server. The recommendation model increased average order value by 30% through cross-selling and upselling.

Risk Score Prediction

• Designed and implemented an end-to-end ML pipeline to identify customers falling into the “Porting hell” category: those highly likely to port out and create churn-related challenges. Leveraged daily refreshed data in BigQuery to generate accurate risk scores and actionable insights for the retention team.

• Developed automated ETL and feature engineering workflows in Google BigQuery using SQL to refresh pipelines daily. Trained and tuned XGBoost, LightGBM, and Random Forest models, achieving ROC-AUC of 86% and precision of 78% for high-risk classification.

• Deployed model outputs to BigQuery for integration into Power BI dashboards used by retention and operations teams. Automated the full pipeline using Python, scheduled batch scoring, and built monitoring to track feature drift and model performance.

• Monitored pipeline performance and data quality, ensuring robustness and scalability. Streamlined daily operational workflows with automated data refresh and scoring, reducing manual effort by 40%. Improved customer retention by 23% through targeted retention campaigns.

Tata Consultancy Services Ltd, Senior Associate, Jan 2017 – Dec 2019

Sentiment Analysis

• Delivered sentiment analysis of customer feedback for the date ranges requested by the client. This allowed clients to track how end users perceive the product over time and take corrective steps when negative feedback increases.

• Collected and pre-processed raw review text using advanced NLP techniques including noise removal, tokenization, stop word removal, lemmatization, and handling negations to enhance data quality. Developed feature extraction techniques using Bag-of-Words and TF-IDF vectorization, resulting in richer text representations for model training.

• Created separate Word Cloud visualizations for positive and negative sentiments to identify dominant keywords and key themes, facilitating business strategy decisions. Applied topic modeling (LDA) to extract key themes from customer reviews and visualized topics using pyLDAvis.

• Implemented and compared machine learning algorithms, including Naive Bayes and Logistic Regression, for multi-class sentiment classification. Conducted hyperparameter tuning and cross-validation using GridSearchCV to optimize model performance. Evaluated and refined models using accuracy, precision, recall, F1-score, and confusion matrices.

• Presented insights and recommendations to stakeholders through comprehensive visual reports. Collaborated with project stakeholders and support partners to identify needs, goals and business requirements.

• Improved customer satisfaction by 15% through early detection of negative feedback, which enabled proactive customer support. Increased sales conversion rates by 10% by providing sentiment-driven insights to optimize marketing messaging and product descriptions.

Price Prediction

• Built a predictive modeling system using Python to estimate vehicle resale prices. Performed data cleaning, feature engineering, and EDA to analyze key factors affecting resale value.

• Trained and compared multiple regression models (Linear Regression, Random Forest, XGBoost). Used grid search and cross-validation to optimize the hyperparameters.

• Created visualizations to provide stakeholders with actionable insights into pricing trends. Collaborated with DevOps team to integrate CI/CD pipelines for model retraining and deployment using GitHub Actions.

• The final model recommended prices based on vehicle condition with 89% accuracy and reduced manual pricing time by 85%.

EDUCATION:

Sri Krishna College of Technology, Bachelor of Technology (B.Tech - IT), Aug 2010 – May 2014, CGPA - 82%

CERTIFICATIONS:

• Earned the “Microsoft Certified: Azure Data Scientist Associate (DP-100)” certification, demonstrating the use of advanced AI techniques within Azure cloud environments to solve complex data problems.

• Certified in “Azure Data Fundamentals (DP-900)”, showcasing proficiency in foundational cloud and data concepts to support modern digital solutions.

• Completed the "Applied Data Science with Python" certification from the University of Michigan through Coursera, gaining advanced data analysis and machine learning skills.

AWARDS AND RECOGNITIONS:

• Recognized as “Best Team Player” in the organization for devising machine learning solutions that enhanced operational efficiency and business impact.

• Received “Client Excellence Award” for outstanding performance.
