
Machine Learning Data Scientist

Location:
Atlanta, GA
Posted:
February 19, 2025


Resume:

Akhil Devang

Senior Machine Learning Engineer (Gen AI)

Phone: +1-404-***-****

Email: ************@*****.***

PROFESSIONAL SUMMARY:

9+ years of professional experience as a Data Scientist, specializing in predictive analytics, time series forecasting, natural language processing (NLP), Generative AI (GenAI), large language models (LLMs), and advanced machine learning models, delivering data-driven solutions across industries including healthcare, finance, and fraud analytics.

Expertise in end-to-end machine learning model lifecycle, including data acquisition, preprocessing, feature engineering, model building, evaluation, and deployment into scalable production systems using cloud platforms (AWS, Azure).

Skilled in time series analysis and forecasting techniques (SARIMA, Prophet, LSTM), with proven success in optimizing operational efficiency and resource allocation, delivering an 80% improvement in scheduling accuracy.

Proficient in NLP and deep learning frameworks, including BERT, GPT, LangChain, Transformers, and text summarization, with applications in clinical note processing, fraud detection, and customer engagement analytics.

Proven ability to apply Generative AI (LLMs) for document summarization, predictive analytics, and chatbot development. Leveraged LangChain and fine-tuned OpenAI models to build Retrieval-Augmented Generation (RAG) solutions, integrating vector databases (FAISS, Pinecone, Weaviate, Chroma) for efficient document retrieval and question-answering systems, enhancing knowledge discovery and automation in real-world business scenarios.

Developed and deployed cutting-edge machine learning models (Regression, Random Forest, XGBoost, Neural Networks) to solve real-world problems, achieving high accuracy (90%+) and generating business value exceeding $10M in savings.

Advanced data visualization skills, using Tableau, Power BI, DOMO, and Python libraries like Matplotlib and Seaborn to build interactive dashboards and communicate insights effectively to stakeholders.

Hands-on experience with cloud platforms (AWS, Azure) for scalable model deployment and big data processing, leveraging tools like SageMaker, Lambda, Snowflake, Azure ML Studio, and Databricks for efficient workflows.

Strong foundation in statistical methodologies (ANOVA, hypothesis testing, clustering, PCA, descriptive analytics) and expertise in handling structured and unstructured datasets for comprehensive data analysis.

Demonstrated leadership in collaborating with cross-functional teams to deliver actionable insights, streamline operations, and enhance decision-making efficiency by over 40% across multiple projects.

Recognized for automation and optimization efforts, including building Python-based solutions for OCR, ETL, and NLP workflows, improving efficiency by up to 75%.

Proficient in Python, SQL, R, and cloud-based data engineering tools, with certifications in Power BI, SQL, and NLP, ensuring up-to-date knowledge and expertise in modern data science practices.

In-depth knowledge of Snowflake database, schema, and table structures. Experienced in writing Spark programs in Scala and Python to process large datasets.

Expert knowledge of machine learning algorithms, including ensemble methods, linear, polynomial, and logistic regression, regularized linear regression, SVMs, neural networks, extreme gradient boosting, decision trees, K-means, Gaussian mixture models, hierarchical models, and Naïve Bayes.

TECHNICAL SKILLS:

Programming Languages

Python, SQL, R, PySpark.

Python Libraries

Pandas, NumPy, Scikit-learn, TensorFlow, Keras, PyTorch, NLTK, Gensim, Matplotlib, Seaborn.

Machine Learning

Linear/Logistic Regression, Decision Trees, Support Vector Machines, KNN, Random Forest, Clustering (K-means), Ensemble Modelling Techniques (Boosting, Stacking, Bagging), XGBoost, LightGBM, AdaBoost, Deep Learning, Neural Networks, LSTM, CNN, RNN, NLP, SARIMA, ARIMA, Prophet, OCR.

Generative AI

LLMs (GPT, BERT, BART), OpenAI, Llama 2, LangChain, Hugging Face Transformers, Text Summarization, RAG (Retrieval-Augmented Generation), Agentic AI.

Data Visualization

Microsoft Power BI, Tableau, DOMO.

Databases

Relational Databases (SQL Server, MySQL, PostgreSQL), NoSQL Databases (MongoDB, DynamoDB), Snowflake, Redshift

Cloud Services

AWS (SageMaker, Lambda, Glue, EC2, S3), Azure ML Studio, Databricks

Developer Tools

Jupyter Notebook, Alteryx, Git, VS Code, Anaconda, Eclipse, Docker, Azure Data Studio, Salesforce.

PROFESSIONAL EXPERIENCE:

Elevance Health - Atlanta, Georgia Sept 2023 - Present

Generative AI Engineer

Responsibilities:

Designed and deployed a Retrieval-Augmented Generation (RAG)-based Q&A chatbot leveraging LangChain and OpenAI's GPT-3.5 Turbo (LLM) to assist internal users in accessing claim-processing status and domain-specific knowledge, reducing query resolution times by 50% and improving employee productivity.
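
A Q&A flow of this shape pairs a retriever with a generator. Below is a minimal, dependency-free sketch of the retrieval step only; the toy corpus, the `retrieve` helper, and the stubbed prompt assembly are illustrative assumptions, not the production LangChain/GPT-3.5 pipeline:

```python
from collections import Counter
import math

# Toy knowledge base standing in for indexed claim-processing documents.
DOCS = [
    "Claim 1042 is pending review by the adjudication team.",
    "Submitted claims are processed within 10 business days.",
    "Denied claims may be appealed within 60 days of notice.",
]

def tokenize(text):
    return [w.strip(".,").lower() for w in text.split()]

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    common = set(a) & set(b)
    num = sum(a[w] * b[w] for w in common)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, docs=DOCS):
    # Return the document most similar to the query.
    q = Counter(tokenize(query))
    return max(docs, key=lambda d: cosine(q, Counter(tokenize(d))))

def build_prompt(query):
    # In the real system this prompt is sent to the LLM; stubbed here.
    context = retrieve(query)
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

print(retrieve("How long are claims processed?"))
```

A production RAG stack replaces the bag-of-words scorer with dense embeddings in a vector store and forwards `build_prompt`'s output to the model.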

Developed a clinical note summarization pipeline using transformer-based language models (e.g., BERT) to automate the extraction of key insights from unstructured medical documents, reducing manual review efforts by 40% and enhancing claim validation processes.

Built anomaly detection frameworks using Isolation Forest and DBSCAN in Azure ML Studio, identifying irregular claim patterns to support fraud detection and risk mitigation efforts.
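
The Isolation Forest half of such a framework can be sketched with scikit-learn on synthetic data; the two-feature layout and the numbers here are assumptions, not the actual claims schema:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic stand-in for claim features (e.g., billed amount, line count).
normal_claims = rng.normal(loc=[100.0, 5.0], scale=[10.0, 1.0], size=(500, 2))

# Fit on historical claims; the forest isolates points that are
# easy to separate, which is the hallmark of an anomaly.
model = IsolationForest(random_state=0).fit(normal_claims)

# An implausibly large claim should be flagged as an anomaly (-1).
suspicious = [[1000.0, 50.0]]
print(model.predict(suspicious))  # -1 means anomaly, 1 means inlier
```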

Engineered a claim price prediction model using Random Forest and Decision Trees with hyperparameter tuning, achieving an R² of 0.92, which reduced claim discrepancies and saved the organization millions annually in overpayments.

Employed SHAP and LIME for model interpretability, creating visually intuitive reports that bridged the gap between technical outputs and strategic business decisions, empowering leadership with actionable insights.

Created interactive Tableau dashboards to visualize claim trends, highlight overpayment risks, and uncover inefficiencies across reimbursement workflows, driving data-informed decisions for cost optimization.

Designed ETL pipelines using Databricks Data Studio workflows, integrating claim data from Snowflake and SQL Server to streamline data ingestion, transformation, and storage, ensuring a seamless real-time analytics pipeline.

Collaborated with cross-functional teams to develop Gen AI Based Solutions and advanced analytics frameworks, improving decision-making processes for claims management by 30%.

Deployed and monitored ML models in Azure ML Studio, ensuring scalability, reliability, and consistent model performance in production.

Applied advanced NLP techniques (TF-IDF, embeddings, and topic modeling) to analyze claims and reimbursement data, delivering insights that improved operational efficiency across departments.

Enhanced data storytelling through custom visualizations in Tableau and Python (using Matplotlib, Seaborn, Plotly), tailored to claims data, overpayment risks, and provider trends.

Utilized Databricks Data Studio for big data processing and ETL pipeline automation, enabling real-time insights and efficient model deployment.

Streamlined claims data processing using Data Studio workflows in Databricks, leveraging notebooks, workflow scheduling, and query optimization to improve data accessibility for stakeholders.

Mentored junior data scientists and analysts in advanced machine learning techniques, LLMs, Gen AI, data pipeline optimization, and visualization best practices, fostering a culture of innovation and knowledge sharing.

SoFi - Charlotte, North Carolina June 2020 - Aug 2023

AI/ML Engineer

Responsibilities:

Built credit risk models (XGBoost, LightGBM, Random Forest) to predict loan default, improving risk assessment accuracy by 15%.

Deployed a Generative AI-powered topic modeling system for analyzing customer transcripts, augmented with Retrieval-Augmented Generation (RAG), achieving 80%+ accuracy in generated tags categorizing customer concerns and intent.

Automated document generation pipelines, reducing processing time by 50% and enhancing loan documentation formatting, content generation, and regulatory compliance, using LoRA and QLoRA techniques for parameter-efficient fine-tuning and model quantization.

Developed an internal STAT tool with a Gradio-powered UI, enabling stakeholders to perform text summarization, Bag of Words analysis, and interact with a ChatBot for quick access to lending policies and risk assessment insights.

Built an LLM-driven financial assistant using GPT-4 to assist customers with real-time loan eligibility checks, personalized financial recommendations, and regulatory inquiries.

Implemented an entity recognition model using spaCy to extract key financial details from customer-submitted documents, automating income verification and reducing manual processing time by 40%.

Optimized GenAI model deployment time, reducing runtime from 5 hours to 45 minutes (an 85% reduction) by implementing multiprocessing in credit risk modeling and document automation pipelines.
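
The multiprocessing pattern behind such a speedup can be sketched with the standard library; `score_batch` is a hypothetical stand-in for the real CPU-bound scoring step, not the actual pipeline code:

```python
import multiprocessing as mp

def score_batch(batch):
    # Stand-in for a CPU-bound step (feature prep + model inference).
    return [x * 2 for x in batch]

def score_all(batches, workers=2):
    # Fan batches out across worker processes instead of looping serially;
    # with N workers the wall-clock time approaches 1/N of the serial run.
    with mp.Pool(processes=workers) as pool:
        results = pool.map(score_batch, batches)
    return [x for chunk in results for x in chunk]

if __name__ == "__main__":
    print(score_all([[1, 2], [3, 4]]))  # [2, 4, 6, 8]
```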

Developed AI-driven anomaly detection models using AutoEncoders and Isolation Forest, detecting fraudulent applications and reducing financial risk exposure by 20%.

Designed a scalable AI model deployment pipeline using AWS SageMaker and Docker, ensuring real-time inference for underwriting, risk modeling, and financial fraud detection.

Ensured AI compliance with financial regulations, fine-tuning NLP models for Fair Lending and Model Risk Management (MRM) to minimize bias and align AI decisions with SoFi’s responsible lending policies.

MetLife - New York City, NY Feb 2018 - May 2020

Data Scientist

Responsibilities:

Developed and deployed machine learning models, including Logistic Regression and Random Forest, to predict policy lapse risks and identify high-risk claims, improving customer retention by 15%.

Conducted in-depth analysis on claims and premium datasets, leveraging SQL to extract and preprocess data, ensuring clean and structured datasets for modeling.

Built predictive models to forecast claims processing times, enabling faster resource allocation and reducing operational delays by 20%.

Applied advanced feature engineering techniques (e.g., handling missing values, encoding categorical features) to prepare datasets for supervised learning models.
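
These imputation and encoding steps can be sketched in pandas; the toy records and column names are assumptions, not the actual policy schema:

```python
import pandas as pd

# Illustrative policy records; real features came from claims data.
df = pd.DataFrame({
    "age": [34, None, 51],
    "plan": ["basic", "premium", "basic"],
})

# Impute missing numerics with the median, then one-hot encode categoricals
# so the frame is ready for a supervised learner.
df["age"] = df["age"].fillna(df["age"].median())
encoded = pd.get_dummies(df, columns=["plan"])
print(encoded)
```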

Designed dynamic Power BI dashboards to visualize key performance metrics, including claims settlement rates, customer retention trends, and fraud detection insights, empowering leadership with actionable insights.

Implemented classification models to detect fraudulent claims, achieving a precision score of 90%, which helped reduce financial losses in the claims processing workflow.

Created and validated SQL queries to aggregate and analyze customer policy data, ensuring robust data pipelines to support machine learning workflows.

Conducted exploratory data analysis (EDA) to uncover insights in structured and unstructured data, driving feature selection and model optimization.

Collaborated with cross-functional teams, including actuarial and underwriting departments, to tailor machine learning models to MetLife’s business goals and insurance datasets.

Performed hyperparameter tuning using grid search and cross-validation to optimize model performance for predicting customer churn and policy renewals.
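
Grid search with cross-validation of this kind might be sketched in scikit-learn on synthetic data; the estimator, grid, and dataset are illustrative, not the production churn model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for churn features and labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Exhaustively score every grid combination with 5-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```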

Monitored model performance post-deployment and retrained models as necessary to maintain prediction accuracy over time.

Conducted root cause analysis of data inconsistencies in claims datasets, ensuring clean and reliable inputs for downstream machine learning applications.

Developed data pipelines using SQL for seamless integration of claims and customer policy data into machine learning workflows.

Delivered impactful presentations and reports to stakeholders, translating machine learning outcomes into actionable business strategies to enhance claims processing and customer engagement.

Innova Solutions - Hyderabad, India June 2015 - Oct 2017

Data Analyst

Responsibilities:

Conducted data preprocessing and exploratory data analysis (EDA) on structured and unstructured datasets, identifying patterns, trends, and anomalies to support decision-making.

Built and validated SQL queries to extract and aggregate data, supporting the development of predictive and analytical models.
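
An aggregation query of this sort can be sketched against an in-memory SQLite database; the `claims` table, columns, and rows are hypothetical stand-ins for the real policy data:

```python
import sqlite3

# In-memory stand-in for the analytics database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (policy_id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [(1, 500.0, "paid"), (1, 200.0, "denied"), (2, 300.0, "paid")],
)

# Aggregate paid amounts per policy, the kind of rollup fed into models.
rows = conn.execute(
    """
    SELECT policy_id, SUM(amount) AS total_paid
    FROM claims
    WHERE status = 'paid'
    GROUP BY policy_id
    ORDER BY policy_id
    """
).fetchall()
print(rows)  # [(1, 500.0), (2, 300.0)]
```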

Developed basic classification and regression models to predict customer behaviors and identify trends in policy and claims datasets, achieving measurable business insights.

Created visually impactful dashboards in Excel and Tableau to present key metrics such as customer retention rates, policy performance, and claims trends, enabling data-driven decisions.

Supported the development of a data warehouse using Star Schema and Snowflake Schema, enabling efficient querying and reporting for business intelligence applications.

Applied statistical techniques, such as descriptive statistics and hypothesis testing, to analyze customer data and identify actionable insights for business improvements.
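
A two-sample comparison of this kind reduces to a t-statistic: the difference in group means scaled by the combined standard error. A standard-library sketch on illustrative retention scores (the data is made up):

```python
import math
import statistics

# Illustrative retention scores for two customer segments.
group_a = [72, 75, 78, 74, 76]
group_b = [68, 70, 69, 71, 67]

# Welch's two-sample t-statistic.
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
t_stat = (mean_a - mean_b) / se
print(round(t_stat, 2))  # 4.9 -- well above typical critical values
```

A statistic this large would reject the null hypothesis of equal means at conventional significance levels; in practice the p-value would come from the t-distribution with the appropriate degrees of freedom.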

Conducted data cleaning tasks, including handling missing values and outliers, ensuring data quality for downstream analytics and machine learning workflows.

Collaborated with cross-functional teams to understand business requirements and deliver analytics solutions tailored to customer and policy data.

Created and optimized stored procedures, functions, and views to support data analysis workflows and model deployment.

Automated repetitive data preparation tasks using SQL scripts, improving efficiency and reducing manual intervention.

Developed predictive insights into customer policies and claims trends, providing actionable intelligence to improve business performance.

Documented project workflows, including data preparation, analysis, and model development, ensuring transparency and reproducibility of results.

EDUCATION:

University of Hyderabad, India June 2011 - May 2015

Bachelor of Engineering, Electronics


