BALANJANI KAMASANI
*.*********@*****.***
https://www.linkedin.com/in/balanjani-kamasani-937058104/
Professional Summary
Around 8 years of experience in data science, AI/ML engineering, and data analytics across diverse industries, including e-commerce, healthcare, financial services, and technology.
Led the development and deployment of Generative AI models and LLMs for demand forecasting, fraud detection, and inventory optimization, driving 20% improvements in business operations and reducing costs.
Expert in designing scalable ETL processes and data pipelines using Apache Nifi, Talend, Apache Kafka, and Airflow, improving data accuracy, reducing manual effort by 40%, and streamlining operations (see the Airflow sketch after this summary).
Skilled in leveraging cloud services such as AWS (S3, SageMaker, RDS) and Azure ML for scalable data processing, model deployment, and cost-efficient solutions.
Developed and deployed predictive models and machine learning algorithms for fraud detection, risk management, supply chain optimization, and customer segmentation, improving business performance by 25%.
Utilized NLP techniques including Hugging Face Transformers, LSTM, and NLTK for text classification, sentiment analysis, and sequence-to-sequence modeling, driving insights from unstructured data.
Proficient in building real-time dashboards with Tableau and Power BI, providing stakeholders with actionable insights and improving decision-making by 20%.
Applied advanced statistical methods and machine learning techniques, including Random Forest, Gradient Boosting, KNN, and ARIMA, to address complex business problems and enhance model accuracy.
Developed APIs using Flask and Django, enabling seamless communication between front-end and back-end systems and improving system integration and user experience.
Expertise in database management with SQL, PostgreSQL, and MySQL, improving query performance, optimizing data warehousing solutions, and supporting analytics at scale.
Demonstrated leadership skills by mentoring junior team members, leading cross-functional teams, and aligning technical solutions with business objectives.
Proven experience in optimizing algorithm performance using hyperparameter tuning, reducing model computation time by 25%, and improving model accuracy.
Extensive experience with Hadoop and Apache Spark for large-scale data processing, improving data pipeline performance, and ensuring real-time insights for business operations.
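Illustrative sketch (Python): a minimal Airflow DAG wiring extract, transform, and load steps, of the kind referenced in the ETL bullet above. The DAG id, schedule, and task bodies are hypothetical placeholders, not a production pipeline.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull raw records from source systems")      # placeholder task body

    def transform():
        print("clean and reshape the extracted records")   # placeholder task body

    def load():
        print("write transformed records to the warehouse")  # placeholder task body

    with DAG(
        dag_id="daily_etl",                  # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t3 = PythonOperator(task_id="load", python_callable=load)
        t1 >> t2 >> t3                       # linear extract -> transform -> load dependency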
Professional Experience
Humana, Atlanta, Georgia (Remote) Sept 2023 – Present
Role: Data Scientist (Gen AI)
Responsibilities:
Designed, developed, and fine-tuned LLM-powered applications utilizing platforms including OpenAI GPT-4, Anthropic Claude, and Hugging Face models, demonstrating expertise in transformer architectures and LLM APIs.
Constructed and implemented complex agentic workflows, prompt chains, and Retrieval-Augmented Generation (RAG) systems to enhance AI application performance and relevance.
Integrated Generative AI capabilities into existing enterprise-grade software and microservices using Python and JavaScript/TypeScript, ensuring seamless functionality and scalability.
Collaborated cross-functionally with product, data, and engineering teams to translate business requirements into technical specifications for Gen AI solutions, driving successful project delivery.
Developed and maintained robust APIs, scalable data pipelines, and backend services leveraging Gen AI models deployed on AWS, Azure, and GCP cloud platforms.
Optimized model inference performance and system latency for Gen AI applications, achieving significant improvements in efficiency and user experience.
Ensured the development and deployment of safe, ethical, and compliant AI technologies, incorporating best practices for data privacy and bias mitigation within Generative AI solutions.
Utilized vector databases (e.g., Pinecone, FAISS, Weaviate) and embeddings for efficient data retrieval within RAG systems (see the retrieval sketch below), and implemented DevOps practices for CI/CD pipelines.
Environment: Python, JavaScript/TypeScript, OpenAI GPT-4/API, Anthropic Claude, Hugging Face, AWS, Azure, GCP, Pinecone, FAISS, Weaviate, LLM APIs, Transformer Architectures, Prompt Engineering, RAG Systems, DevOps, Microservices.
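Illustrative sketch (Python): the retrieval step behind a RAG system like those described above, assuming sentence-transformers for embeddings and FAISS for the index. The encoder name, toy corpus, and query are illustrative placeholders, not Humana data.

    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    docs = [
        "Claims over $10,000 require a second-level review.",   # toy corpus, not real policy text
        "Members can appeal a denied claim within 60 days.",
        "Prior authorization is needed for elective procedures.",
    ]

    encoder = SentenceTransformer("all-MiniLM-L6-v2")        # any sentence encoder works here
    doc_vecs = encoder.encode(docs, normalize_embeddings=True)

    index = faiss.IndexFlatIP(doc_vecs.shape[1])             # inner product = cosine on normalized vectors
    index.add(np.asarray(doc_vecs, dtype="float32"))

    query = "How long do members have to appeal a denial?"
    q_vec = encoder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q_vec, dtype="float32"), 2)

    context = "\n".join(docs[i] for i in ids[0])             # top matches become grounding context
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    # `prompt` is then sent to the chosen LLM API (e.g., GPT-4 or Claude).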
Bank of America, Charlotte, NC (Remote) Feb 2022 – Aug 2023
Role: Data Scientist (AI/ML)
Responsibilities:
Led the development of a machine learning-based fraud detection and claims processing automation system for Bank of America, improving fraud detection accuracy by 25% and reducing manual claims processing time by 30%.
Collaborated with the bank’s claims and legal teams to identify key fraud indicators and define business requirements, ensuring the fraud detection model aligned with business goals and legal compliance.
Analyzed historical claims data, customer interactions, and fraud patterns using SQL and Python, creating detailed specifications for the predictive model and claims classification systems.
Designed the architecture for the fraud detection and claims classification system using machine learning models, integrating TensorFlow, Keras, and LSTM/GRU networks to process time-series claims data and capture fraud patterns.
Architected scalable, real-time ETL pipelines with Apache Airflow on AWS to extract, transform, and load data from claims databases, third-party systems, and customer interactions.
Developed a predictive fraud detection model using Python, scikit-learn, and advanced deep learning techniques, improving fraud detection accuracy by 25% and reducing false positives by 15%.
Implemented feature engineering techniques like recursive feature elimination (RFE), one-hot encoding, and feature scaling, reducing feature dimensionality by 20% and improving model interpretability for claims adjusters.
Built and deployed real-time NLP models using Hugging Face Transformers and NLTK to process unstructured textual claims data, improving text classification accuracy and fraud detection by 20%.
Deployed fraud detection models using Docker and Kubernetes on AWS, ensuring high availability and scalability with 99.9% uptime across the bank’s cloud infrastructure.
Applied machine learning algorithms, including Linear Regression, SVM, KNN, Naive Bayes, Logistic Regression, Random Forest, and Boosting, comparing their performance and selecting the optimal model based on evaluation metrics.
Utilized clustering techniques like K-means and Hierarchical Clustering to segment claims data, identifying fraud patterns and reducing claims processing time by 15%.
Conducted A/B testing and cross-validation using stratified sampling and k-fold techniques, optimizing the fraud detection model based on F1-score, ROC-AUC, and precision-recall metrics.
Optimized model performance through hyperparameter tuning using grid search and Bayesian optimization, improving prediction accuracy by 20% and reducing model latency by 25% (see the model-selection sketch below).
Automated these Airflow ETL pipelines end to end on AWS, scheduling loads of claims data from internal systems and external sources and reducing manual data processing time by 40%.
Integrated model monitoring systems using Jenkins and Prometheus on AWS, establishing alerts for model drift and ensuring timely retraining, maintaining fraud detection accuracy.
Performed load and stress testing on Elasticsearch and PyTorch-based recommendation systems, ensuring they handled high transaction volumes during peak claims processing times.
Created interactive dashboards using Tableau and Power BI, providing real-time insights into fraud detection, claims processing, and model performance for claims adjusters and stakeholders.
Leveraged collaborative filtering and neural networks for personalized communication with customers during the claims process, increasing customer satisfaction by 12% and reducing complaint resolution time by 10%.
Documented the architecture, data flows, and models, ensuring clarity for the bank’s IT and claims processing teams.
Led training sessions for claims adjusters and IT teams, explaining the use of Tableau and Power BI dashboards, empowering stakeholders to monitor fraud detection and claims processing in real-time.
Environment: Python (scikit-learn, TensorFlow, Keras, Pandas, NumPy, Hugging Face Transformers, NLTK), Apache Airflow, Docker, Kubernetes, SQL, Jenkins, Prometheus, AWS (S3, SageMaker), Apache Spark, Tableau, Microsoft Power BI, R, C, Linear Regression, SVM, KNN, Naive Bayes, Logistic Regression, Random Forest, Boosting, K-means Clustering, Hierarchical Clustering, Collaborative Filtering, Neural Networks, NLP.
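Illustrative sketch (Python): the model-selection loop described above, combining stratified k-fold cross-validation with a grid search over a gradient-boosting classifier scored on ROC-AUC. The synthetic data and grid values are illustrative assumptions, not the production setup.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV, StratifiedKFold

    # Synthetic stand-in for claims features; the fraud class is deliberately rare.
    X, y = make_classification(n_samples=5000, n_features=20,
                               weights=[0.97, 0.03], random_state=42)

    param_grid = {                                   # illustrative grid, not tuned values
        "n_estimators": [100, 300],
        "max_depth": [2, 3],
        "learning_rate": [0.05, 0.1],
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    search = GridSearchCV(GradientBoostingClassifier(), param_grid,
                          scoring="roc_auc", cv=cv, n_jobs=-1)
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))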
Swiggy, Hyderabad, India May 2019 – Feb 2022
Role: Data Scientist (AI/ML)
Responsibilities:
Developed predictive models using Python (scikit-learn) to forecast demand for medical-surgical equipment, reducing stock shortages by 25% and improving production-scheduling efficiency by 20%.
Executed data cleansing, mining, and transformation using T-SQL and PostgreSQL to integrate data from manufacturing plants and healthcare systems, improving data accuracy by 30% and enabling more precise supply chain forecasting.
Leveraged LSTM Recurrent Neural Networks and ARIMA to analyze time-series data for demand forecasting, achieving 90% accuracy in predicting surgical equipment demand and reducing production delays by 18% (see the forecasting sketch below).
Applied Char CNNs to analyze unstructured data from clinical notes and operational logs, resulting in a 15% improvement in identifying potential equipment failures and reducing downtime in manufacturing lines.
Built ensemble models such as Random Forest and Gradient Boosting to optimize inventory management for medical supplies, reducing overstock by 12% and improving supply chain responsiveness.
Deployed machine learning models using Azure ML, validating them with KNN, Logistic Regression, and SVM, achieving a 93% accuracy rate in production and inventory predictions.
Orchestrated data pipelines with Apache Airflow, automating extraction, transformation, and loading (ETL) of hospital and manufacturing data and reducing manual data processing time by 35%.
Utilized Hadoop (HDFS, Hive, MapReduce) and Apache Spark for large-scale processing of manufacturing and healthcare data, improving data processing speed by 25% and ensuring real-time decision-making in supply chain management.
Applied text analytics and sentiment analysis using NLTK to analyze customer feedback and internal reports, extracting insights that informed product improvements and led to a 10% increase in customer satisfaction.
Implemented K-means Clustering and Hierarchical Clustering to segment demand patterns and production processes, optimizing manufacturing workflows and reducing idle time by 15%.
Integrated collaborative filtering and neural networks to recommend optimal scheduling for production and personalized inventory management, improving resource allocation by 12% and reducing understock situations.
Enhanced model performance through hyperparameter tuning using grid search and Bayesian optimization, increasing model efficiency by 30% and reducing computation time by 25%.
Created real-time dashboards using Power BI and Tableau to visualize demand forecasts, production schedules, and supply chain bottlenecks, enabling stakeholders to make data-driven decisions with real-time insights.
Conducted A/B testing and cross-validation to evaluate the robustness of machine learning models, ensuring high accuracy in predicting demand and equipment usage.
Monitored and tracked machine learning models using MLflow and scheduled updates with Apache Airflow, ensuring continuous model improvement and 99.9% system uptime.
Regularly updated predictive models to reflect new trends in demand and manufacturing capabilities, maintaining 93% accuracy in forecasts and enhancing production scalability.
Documented all system architectures, data workflows, and model usage, ensuring clear communication with manufacturing teams, IT departments, and stakeholders.
Led training sessions for production and supply chain teams to ensure proper understanding and utilization of the predictive analytics system, increasing overall operational efficiency by 20%.
Environment: T-SQL, PostgreSQL, Python (scikit-learn, TensorFlow, Keras, Pandas, NumPy, NLTK), Azure ML, Power BI, Tableau, Apache Spark, Hadoop (HDFS, Hive, MapReduce), Random Forest, Gradient Boosting, LSTM Recurrent Neural Networks, Char CNNs, KNN, ARIMA, Logistic Regression, SVM, Text Analytics, Sentiment Analysis, K-means Clustering, Hierarchical Clustering, Collaborative Filtering, Neural Networks.
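Illustrative sketch (Python): the core of an LSTM demand-forecasting setup like the one above, where a sliding window turns a univariate demand series into supervised samples for a small Keras model. The synthetic series and 14-day window are illustrative assumptions.

    import numpy as np
    from tensorflow import keras

    # Synthetic stand-in for a daily demand series.
    series = np.sin(np.linspace(0, 40, 400)) + np.random.normal(0, 0.1, 400)

    window = 14                                    # look back two weeks per sample
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    X = X[..., np.newaxis]                         # (samples, timesteps, features)

    model = keras.Sequential([
        keras.layers.LSTM(32, input_shape=(window, 1)),
        keras.layers.Dense(1),                     # one-step-ahead demand estimate
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=10, batch_size=32, verbose=0)

    next_step = model.predict(X[-1:], verbose=0)   # forecast the next period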
Adobe, Hyderabad, India May 2017 – May 2019
Role: Data Analyst (Python)
Responsibilities:
Designed and implemented ETL processes using Apache Nifi and Talend to integrate financial transaction data, operational metrics, and sales performance, ensuring high-quality data for risk analysis and fraud detection.
Optimized data pipelines with Apache Spark, reducing processing time by 40% and enabling real-time financial risk insights for the company’s payment systems.
Developed predictive models using Python (Pandas, NumPy, SciPy) and machine learning techniques to detect financial fraud, improving fraud detection accuracy by 25% and supporting proactive risk management.
Deployed machine learning models on Azure ML, validated with KNN, Logistic Regression, and SVM, achieving a 93% accuracy rate in financial risk predictions and improving fraud prevention capabilities.
Leveraged AWS services (EC2, S3, RDS) for scalable data storage and processing, reducing storage costs by 15% while maintaining high availability for global transaction data.
Automated data cleaning and transformation processes using Python and Bash scripting, reducing data preparation time by 35% and improving data quality for accurate risk modeling (see the cleaning sketch below).
Developed data warehousing solutions using MySQL and PostgreSQL to store and manage large volumes of financial transaction data, supporting risk analysis and compliance reporting.
Conducted root cause analysis on financial discrepancies and anomalies using Jupyter Notebook, reducing data anomalies by 30% and ensuring the accuracy of financial risk assessments.
Created interactive dashboards using Tableau and Power BI, enhancing stakeholder engagement by 20% by providing real-time insights into key financial and risk metrics.
Implemented continuous integration and deployment workflows using Git and Jenkins, automating the deployment of data pipelines and risk models for seamless updates and real-time insights.
Enhanced database performance by optimizing SQL queries, enabling faster data retrieval for real-time financial risk monitoring and reporting.
Leveraged Hadoop and Apache Hive for distributed processing of large-scale financial datasets, improving the efficiency of data storage and processing across the company’s global network.
Environment: Apache Spark, Apache Nifi, Talend, AWS (EC2, S3, RDS), MySQL, PostgreSQL, Python (Pandas, NumPy, SciPy), Bash scripting, Tableau, Microsoft Power BI, Jupyter Notebook, Git, Jenkins, Hadoop, SAS, MATLAB, Apache Hive.
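Illustrative sketch (Python): the kind of automated cleaning step described above, using pandas to de-duplicate, coerce types, and flag out-of-range amounts. Column names, values, and the threshold are hypothetical, not an actual schema.

    import pandas as pd

    raw = pd.DataFrame({                            # toy transactions with typical defects
        "txn_id":   ["T1", "T1", "T2", "T3"],
        "amount":   ["19.99", "19.99", "bad", "250.00"],
        "txn_date": ["2018-07-01", "2018-07-01", "2018-07-02", None],
    })

    clean = (
        raw.drop_duplicates(subset="txn_id")                                  # drop repeated records
           .assign(amount=lambda d: pd.to_numeric(d["amount"], errors="coerce"),
                   txn_date=lambda d: pd.to_datetime(d["txn_date"], errors="coerce"))
           .dropna(subset=["amount", "txn_date"])                             # remove unparseable rows
    )
    clean["high_value"] = clean["amount"] > 200     # flag for downstream risk review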
Areas of Expertise
Languages & Tools
Python (pandas, NumPy, matplotlib, seaborn, scikit-learn, TensorFlow, Keras, Hugging Face Transformers, NLTK, Flask, Django), SQL (T-SQL, MySQL, PostgreSQL, Hive), Bash scripting.
Data Engineering
ETL: Apache Nifi, Talend, Apache Airflow; Data Pipelines: Apache Spark; Data Warehousing: MySQL, PostgreSQL, AWS Redshift; Database Management: T-SQL, Netezza.
Cloud Services
AWS (S3, Redshift, SageMaker, EC2, RDS), Azure ML.
Data Visualization
Tableau, Microsoft Power BI, Jupyter Notebook, Excel Pivot Tables, matplotlib, seaborn.
Machine Learning & AI
Machine Learning: scikit-learn, TensorFlow, Keras, Random Forest, Gradient Boosting, KNN, ARIMA, Logistic Regression, SVM; NLP: Hugging Face Transformers, NLTK, Text Analytics, Sentiment Analysis; Deep Learning: LSTM recurrent neural networks, Char CNNs.
DevOps & Platform Tools
DevOps: Git, Docker, Kubernetes, Jenkins, Linux; Big Data: Hadoop, Apache Hive; Statistical Computing: SAS, MATLAB.
Project Management & Collaboration
Cross-functional team collaboration, defining project requirements and objectives, mentoring junior team members, conducting code reviews, and knowledge-sharing sessions.
Education
MS in Applied Computer Science, Northwest Missouri State University, MO; GPA: 3.5