SAI SUPRAJA MOTAMARRI
Gen AI / AI Engineer
Mail: ************@*****.***
Phone: 940-***-****
LinkedIn: https://www.linkedin.com/in/sai-supraja-m-775a62272/
SUMMARY
Over 10 years of experience in Machine Learning, Statistical Modelling, Predictive Modelling, Data Analytics, Data Modelling, Data Architecture, Data Analysis, Data Mining, Text Mining, and Natural Language Processing (NLP).
Proficient in gathering and analyzing Business Requirements with experience in documenting System Requirement Specifications (SRS) and Functional Requirement Specifications (FRS).
Extensive experience in Text Analytics, developing different Statistical Machine Learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
Experience in data mining using both R and SAS, performing further statistical tests on the mined data.
Experience in multiple linear regression analysis using R, including assessing multicollinearity among the independent variables.
Experienced in utilizing Neo4j for designing and implementing graph-based databases and relationship management systems.
Experienced in data modelling to design and optimize database structures for efficient data management and analysis.
Leveraged demand planning and supply chain expertise to drive data-driven insights and optimize inventory management, enhancing operational efficiency and forecasting accuracy.
Experience in agile and waterfall project management methodologies to produce high-quality deliverables that meet or exceed timeline and budgetary targets.
Experienced in implementing ML Ops/AI Ops pipelines to automate and streamline machine learning workflows and model deployments.
Proficient in managing Large Language Models (LLMs) from training to deployment, ensuring optimal performance and scalability.
Developed LLM-powered tools to generate and validate structured XML/HTML content from unstructured inputs using schema-constrained prompting and LangChain agents (see the sketch at the end of this summary).
Skilled in developing and fine-tuning Gen AI models for various applications, including natural language processing and image generation.
Proficient in leveraging Elasticsearch for building scalable search engines and optimizing data indexing and querying.
Designed and implemented natural language processing (NLP) models to improve language understanding and generation tasks, enhancing AI-driven communication systems.
Performed market research and segmentation analysis using R, SAS, and Excel to get the potential target customers and identify the market trends and opportunities.
Utilized RStudio to apply regression and classification methods (such as Random Forest) to predict popularity; evaluated the performance of different models and explored the influence ranking of factors for customer surveys.
Knowledge of calculating and understanding the time complexities of the different algorithms used in machine learning.
Experience with machine learning algorithms such as k-nearest neighbours (KNN), Random Forest, Decision Trees, Factor Analysis, etc.
Hands-on experience in data visualization and dashboard-building tools like Tableau.
Expertise in using various SAS report-generating procedures like PROC REPORT, PROC SQL, PROC FREQ, PROC MEANS, PROC TABULATE, PROC TRANSPOSE and PROC PRINT.
Experience in handling client meetings for updating current projects and discussing further projects.
Strong experience working with healthcare data, including claims and membership data.
Experience in Base SAS, SAS Enterprise Guide, Enterprise Miner, SAS/AML, SAS/STAT, SAS Macros, SAS/GRAPH, SAS/CONNECT, and SAS/ACCESS on Windows and Unix platforms, and competence in carrying out statistical analysis of large-scale data.
Modified existing SAS programs and created new SAS programs using SAS Macros to improve ease and speed of modification as well as consistency of results.
Experienced in fine-tuning and optimizing LLMs (LoRA/QLoRA) for banking, healthcare, and enterprise use cases.
Strong background in Python, PyTorch, TensorFlow, SQL, and cloud-native AI architectures.
Experienced in XML/JSON schema design, semantic markup integration, and metadata tagging for document intelligence.
Experience with exporting SAS results to different formats such as XML, PDF, and Excel using SAS export facilities and ODS for reporting and presentation.
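
Illustrative sketch (schema-constrained XML generation, referenced above): a minimal, hypothetical example of prompting an LLM for XML and validating the output before accepting it. The call_llm callable, prompt wording, and <claim> tag structure are illustrative stand-ins, not a specific client library or production schema.

# Minimal sketch: ask an LLM for schema-constrained XML and validate it.
# `call_llm` is a hypothetical stand-in for any chat-completion client;
# the <claim> structure is illustrative, not a real production schema.
import xml.etree.ElementTree as ET
from typing import Callable

PROMPT_TEMPLATE = """Convert the following note into XML.
Use exactly this structure and no other tags:
<claim><member_id>...</member_id><summary>...</summary></claim>

Note: {note}
Return only the XML, with no commentary."""

def generate_xml(note: str, call_llm: Callable[[str], str],
                 max_retries: int = 3) -> ET.Element:
    """Prompt the LLM for XML and re-prompt until the output parses."""
    prompt = PROMPT_TEMPLATE.format(note=note)
    for _ in range(max_retries):
        raw = call_llm(prompt).strip()
        try:
            root = ET.fromstring(raw)      # well-formedness check
            if root.tag == "claim":        # cheap structural check
                return root
        except ET.ParseError:
            pass
        prompt += "\nYour previous answer was not valid XML. Try again."
    raise ValueError("LLM did not return valid XML")

# Usage with a canned fake "LLM" so the sketch runs end to end:
fake_llm = lambda _: "<claim><member_id>42</member_id><summary>ok</summary></claim>"
print(generate_xml("Member 42 called about a claim.", fake_llm).tag)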
EDUCATION
Bachelor's in Computer Science
TECHNICAL SKILLS
Programming Languages: Python, SQL, PySpark
Querying Languages: Spark SQL, MySQL, Microsoft SQL Server
GenAI & Machine Learning: LLMs, LangChain, RAG, Scikit-learn, Keras, TensorFlow, NumPy, Pandas, ggplot2, Scrapy, statsmodels, Logistic Regression, Naive Bayes, Decision Trees, etc.
Visualization Tools: Tableau, Power BI, Python - Matplotlib, Seaborn, Plotly
Databases: Azure Data Lake Gen2, MySQL, Oracle
IDE Tools: Databricks, Visual Studio, Jupyter Notebook
Deployment Tools: Azure DevOps, Anaconda Enterprise, GitHub
SDLC Methodologies: Agile, Scrum, Waterfall
Project Management: JIRA, Azure DevOps Boards
Cloud: Azure, AWS
NoSQL Databases: HBase, Cassandra, MongoDB.
Version Control & CI/CD: GitHub, Jenkins.
Methodologies: Software Development Life Cycle (SDLC), Waterfall, Agile, STLC (Software Testing Life Cycle), UML, Design Patterns.
PROFESSIONAL EXPERIENCE
Client: US Bank, Chicago, IL / Role: GenAI Engineer / February 2023 – Present
Project Description: Designed and deployed LLM-based automation for document summarization, semantic search, and intelligent chatbots using LangChain, RAG, and Hugging Face. Built scalable ML pipelines on Databricks with PySpark and Python, integrating Neo4j for graph-based insights and Elasticsearch for real-time analytics. Optimized LLMs using LoRA/QLoRA and prompt engineering to enhance accuracy, reduce hallucinations, and support banking-specific use cases.
Responsibilities:
Worked with several R packages, including dplyr, Causal Infer, and Space-Time.
Coded R functions to interface with deep learning frameworks.
Adept at integrating ML Ops/AI Ops practices to monitor and maintain AI systems in production environments.
Expertise in optimizing LLM models for specific tasks, enhancing their accuracy and efficiency.
Knowledgeable in applying Gen AI techniques to create innovative solutions, leveraging advanced machine learning algorithms.
Leveraged GPT-3 APIs (OpenAI beta access) to build enterprise knowledge assistants for customer service and internal query resolution.
Integrated GPT-3 services with business applications (CRM, ticketing tools) via secure API connectors.
Developed scalable GPT-3 applications via Azure Functions, AWS Lambda, and Kubernetes microservices.
Experienced in developing and deploying Large Language Models (LLMs) for a range of NLP tasks, including text generation and understanding.
Proficient in prompt engineering to optimize LLM performance and achieve desired outcomes for specific applications.
Expert in data modelling techniques, including conceptual, logical, and physical models, to support scalable and efficient data architectures.
Skilled in fine-tuning and benchmarking Large Language Models to enhance their accuracy, relevance, and efficiency.
Skilled in using Neo4j to model complex data relationships and perform efficient graph queries.
Adept at using LangChain and LlamaIndex frameworks to build, manage, and integrate LLMs into scalable solutions.
Utilized strong data modelling skills to develop and refine data structures that enable seamless data integration, analysis, and reporting.
Knowledgeable in leveraging prompt engineering techniques to tailor LLM responses and improve interaction quality.
Developed and deployed scalable machine learning models on the Databricks platform using PySpark and Python.
Implemented Retrieval-Augmented Generation (RAG) techniques to enhance model accuracy in natural language processing tasks (see the retrieval sketch following this section).
Skilled in data modelling for designing normalized and denormalized schemas to meet diverse business requirements and performance objectives.
Automated data processing pipelines and model training scripts using Python for deep learning applications.
Designed and optimized deep learning models leveraging Python, TensorFlow, and Databricks for large-scale data analysis.
Designed and fine-tuned Large Language Models (LLMs) using techniques like LoRA and QLoRA for task-specific performance enhancements (see the fine-tuning sketch following this section).
Experienced in evaluating and comparing LLM performance using comprehensive benchmarking methods to ensure optimal model effectiveness.
Proficient in developing and deploying machine learning models using PyTorch for various AI applications.
Experienced in leveraging TensorFlow for creating scalable deep learning models and neural networks.
Skilled in utilizing Hugging Face Transformers for implementing state-of-the-art NLP models and fine-tuning tasks.
Hands-on experience with Elasticsearch for real-time search and analytics across large datasets.
Performed data manipulation and aggregation from different sources using Nexus, Business Objects, Toad, Power BI, and SmartView.
Developed and deployed scalable machine learning models using Databricks on Azure, optimizing data pipelines and workflows.
Built AI-powered chatbots and virtual assistants using LLMs with LangChain and OpenAI APIs for intelligent automation.
Engineered prompt optimization strategies to enhance LLM accuracy and reduce hallucination risks in generative AI applications.
Leveraged vector databases for semantic search and efficient retrieval in generative AI applications.
Implemented fine-tuned LLM-based document summarization and automated content generation solutions.
Utilized Hugging Face models and transformers for pre-training and fine-tuning domain-specific LLMs.
Implemented Retrieval-Augmented Generation (RAG) technology for enhancing natural language understanding and generation models.
Wrote efficient Python scripts to automate data processing tasks, model training, and deployment workflows.
Used Machine Learning, Time Series, and Natural Language Processing (NLP) to build an intelligent supply chain management system in Python.
Implemented Agile methodology for building an internal application.
Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
Experienced in data modelling for both OLTP and OLAP systems, ensuring effective data storage, retrieval, and analytical capabilities.
Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
Implemented end-to-end systems for Data Analytics and Data Automation and integrated with custom visualization tools using R, Hadoop, and MongoDB.
Experienced in implementing cutting-edge AI models with PyTorch, ensuring high performance and scalability.
Architected and optimized end-to-end data pipelines on Databricks for real-time analytics and machine learning.
Leveraged RAG frameworks to enhance the accuracy of AI models by integrating external knowledge sources.
Developed AI models combining text, image, and audio processing using transformer architectures.
Designed custom embeddings for semantic similarity analysis in GenAI-powered recommendation systems.
Developed AI-powered code-generation tools using GenAI models like Codex and StarCoder for software development acceleration.
Utilized Python scripting for automating ETL processes and improving data pipeline efficiency.
Skilled in deploying TensorFlow models in production environments, optimizing for speed and accuracy.
Used Spark DataFrames, Spark SQL, and Spark MLlib extensively; developed and designed POCs using Scala, Spark SQL, and MLlib libraries.
Contributed to open-source GenAI tooling, including LangChain agents, prompt templates, and benchmarking utilities for LLM evaluation.
Authored internal whitepapers on LLM benchmarking, hallucination mitigation, and prompt engineering strategies for enterprise GenAI adoption.
Shared technical insights on GenAI orchestration and model evaluation through blog posts and internal knowledge-sharing sessions.
Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
Experienced in configuring Elasticsearch clusters for high availability and fault tolerance.
Worked extensively with the Erwin Data Modeler tool to design data models.
Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
Integrated Databricks ML workflows with Azure for end-to-end machine learning lifecycle management.
Utilized RAG techniques to improve contextual understanding and response generation in AI systems.
Created Python scripts for automating data ingestion, transformation, and model training in cloud environments.
Applied deep learning techniques to process unstructured data, achieving high accuracy in predictive analytics models.
Environment: AWS, R, Informatica, Python, PyTorch, Hugging Face, LangChain, LLMs, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, MS Visio, MapReduce, SQL, t-tests, Databricks, RAG, Python scripting, deep learning, ANOVA, and MongoDB.
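
Retrieval sketch (RAG, referenced above): a minimal illustration of the retrieve-then-generate pattern. The hashing embed function and the sample documents are toy stand-ins so the snippet runs on its own; a production pipeline would use a real embedding model and pass the assembled prompt to an actual LLM.

# Minimal RAG sketch: embed documents, retrieve nearest chunks by cosine
# similarity, and prepend them to the prompt. `embed` is a toy hashing
# embedder used only so the example runs without external services.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

DOCS = [  # invented example snippets standing in for a document store
    "Wire transfers over $10,000 require a compliance review.",
    "Mortgage pre-approval is valid for 90 days.",
    "Overdraft fees are waived for premier accounts.",
]
DOC_VECS = np.stack([embed(d) for d in DOCS])

def retrieve(query: str, k: int = 2) -> list:
    scores = DOC_VECS @ embed(query)            # cosine similarity (unit vectors)
    return [DOCS[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long is a mortgage pre-approval valid?"))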
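
Fine-tuning sketch (LoRA, referenced above): a minimal setup of LoRA adapters with the Hugging Face PEFT library. The gpt2 base model and the c_attn target module are illustrative choices; target modules vary by architecture, and a real run would follow this with a Trainer loop on task-specific data.

# Sketch of LoRA adapter setup with Hugging Face PEFT (requires
# transformers, peft, torch). Model and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in base model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()   # only the small adapter weights train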
Client: Cigna, Atlanta, GA / Role: AI Data Scientist / December 2020 – February 2023
Project Description: Developed LLM-powered chatbots and RAG-driven retrieval systems to automate healthcare member interactions. Integrated Hugging Face Transformers and LangChain to handle claims, benefits, and policy queries. Improved response accuracy and reduced manual support overhead, enhancing member satisfaction.
Responsibilities:
Responsible for performing machine learning techniques (regression/classification) to predict outcomes.
Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modelling.
Identified and executed process improvements, working hands-on with various technologies such as Oracle, Informatica, and Business Objects.
Specialized in data modelling to create robust data architectures, supporting business intelligence, predictive analytics, and operational reporting.
Proficient in utilizing ML Ops/AI Ops frameworks to ensure continuous integration, delivery, and monitoring of AI models.
Designed and fine-tuned GPT-2 models for text summarization, intent classification, and FAQ generation.
Built conversational chatbots for customer support using GPT-2 with rule-based fallback logic.
Developed document summarization pipelines (news, policy documents) using Hugging Face Transformers and GPT-2 (see the summarization sketch following this section).
Developed AI models that integrate text, image, and audio data for enhanced content generation and analytics.
Implemented federated learning techniques for decentralized AI model training, ensuring data privacy and security compliance.
Optimized Databricks clusters for large-scale deep learning tasks, improving computational efficiency.
Leveraged Python for fine-tuning deep learning models and conducting hyperparameter optimization.
Developed RAG-based solutions to enhance document retrieval and question-answering capabilities.
Used Databricks to implement distributed deep learning models, reducing training time across large datasets.
Utilized Neo4j to optimize recommendation systems and social network analysis.
Proficient in deploying Large Language Models (LLMs) to solve complex natural language processing challenges and deliver scalable AI solutions.
Skilled in applying advanced prompt engineering strategies to enhance the responsiveness and accuracy of LLMs in various contexts.
Implemented advanced machine learning algorithms on Databricks for large-scale data processing and model deployment.
Developed Python-based solutions for data preprocessing, feature engineering, and model optimization in Databricks.
Integrated RAG workflows into existing AI systems to improve content retrieval and response generation.
Experienced in fine-tuning Large Language Models to adapt to specific domains and improve their performance through iterative testing and refinement.
Knowledgeable in performing rigorous benchmarking of LLMs to assess their capabilities, reliability, and alignment with business objectives.
Designed AI/ML warehousing solutions using vector databases, semantic indexing, and metadata-driven retrieval for GenAI applications.
Adept in data modelling for designing relational and dimensional data structures that streamline data flow and enhance query performance.
Expertise in optimizing LLM performance through targeted fine-tuning and benchmarking, leveraging LangChain and LlamaIndex for advanced model management.
Experienced in deploying LLM models on scalable infrastructure, optimizing them for diverse real-world applications.
Adept at leveraging Gen AI for developing creative and adaptive AI solutions across various domains.
Adept at building and training custom neural networks using PyTorch for complex data-driven projects.
Expertise in using TensorFlow for designing and optimizing deep learning pipelines and architectures.
Proficient in integrating Hugging Face Transformers for advanced NLP solutions, including text classification and generation tasks.
Deployed deep learning models on Databricks to accelerate data analysis and model inference.
Applied RAG techniques to bridge structured and unstructured data for enhanced AI-driven insights.
Developed and optimized Python-based machine learning pipelines on Databricks for real-time analytics.
Built deep learning models using Python libraries like TensorFlow and PyTorch for image and text classification.
Expert in Elasticsearch tuning for performance improvement in distributed search applications.
Knowledgeable in applying Hugging Face Transformers for transfer learning and model fine-tuning on diverse datasets.
Proficient in using data modelling tools and techniques to define data relationships, constraints, and hierarchies for optimized database design.
Used NLP to improve the flexibility of the supply chain management system by leveraging user search historical data and adding voice search to the search interface.
Expertise in fine-tuning LLM models to meet specific business requirements, ensuring they deliver high-quality outputs.
Extensive experience in data modelling for developing and maintaining enterprise data warehouses, ensuring data accuracy and accessibility.
Knowledgeable in integrating Gen AI with existing systems, driving innovation, and improving user experiences.
Implemented self-supervised learning techniques to enhance unsupervised learning performance for GenAI models.
Proficient in creating custom Elasticsearch queries to support fast, full-text searches across multiple data sources.
Implemented Neo4j for advanced graph analytics, enhancing insights into connected data.
Designed the prototype of the Data mart and documented possible outcomes from it for end-users.
Environment: Python, MDM, MLlib, PL/SQL, Tableau, PyTorch, TensorFlow, Hugging Face Transformers, SQL Server, Scala, NLP, SSMS, ERP, CRM, Netezza, Cassandra, SQL, SSRS, Informatica, Spark, Azure, R Studio, MongoDB, Java, and Hive.
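
Summarization sketch (GPT-2, referenced above): a minimal illustration of the "TL;DR:" prompting trick with the Hugging Face text-generation pipeline. Because GPT-2 is decoder-only, summarization is induced by the prompt rather than a dedicated seq2seq head; the sample document is invented, a recent transformers version is assumed, and output quality is illustrative only.

# Sketch of prompt-induced summarization with GPT-2 (requires transformers).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def summarize(document: str, max_new_tokens: int = 60) -> str:
    prompt = document.strip() + "\n\nTL;DR:"
    out = generator(prompt, max_new_tokens=max_new_tokens,
                    do_sample=False, pad_token_id=50256)  # 50256 = GPT-2 EOS
    return out[0]["generated_text"][len(prompt):].strip()

print(summarize("The policy covers outpatient visits, generic prescriptions, "
                "and annual wellness exams, but excludes elective procedures."))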
Client: State of Utah, Salt Lake City, Utah / Role: Data Science Analyst / September 2017 – December 2020
Project Description: Developed predictive and forecasting models using Python, R, SQL, and QlikView to support statewide analytics and reporting. Built data models and dashboards to improve data accuracy, governance, and decision-making. Delivered actionable insights that enhanced policy planning and operational efficiency across departments.
Responsibilities:
Participated in meetings to coordinate with the Marketing and CRM team to gather data and discuss possible marketing strategies.
Designed data models and prepared data samples for machine learning applications.
Worked with Data extraction and Data engineers on normalization and analytical processes.
Worked in an Agile manner with the data engineers to understand and discover the potential business value of new and existing data sets.
Collaborated with Data Offices and cross-functional teams to define, validate, and improve data governance practices.
Worked in Agile environments, providing continuous feedback to development teams and surfacing defects early.
Supported the sales forecasting and planning team by improving time series and principal component analysis (see the forecasting sketch following this section).
Utilized machine learning techniques for predictions & forecasting based on the Sales training data.
Executed overall data aggregation/alignment and process-improvement reporting within the sales department.
Managed data quality and integrity using skills in data warehousing, databases, and ETL.
Monitored and maintained high levels of data analytic quality, accuracy, and process consistency.
Assisted sales management in data modelling.
Ensured on-time execution and implementation of sales planning analysis and reporting objectives.
Worked with sales management team to refine predictive methods & sales planning analytical process.
Monitored the accuracy and efficiency of sales forecasts and reporting.
Prepared dashboards in QlikView using calculations and parameters.
Supported consistent implementation of company reporting and sales process initiatives.
Used Python to build customer classification, tree-map, and regression models.
Performed forecasting and time series analysis of customer likes and dislikes.
Environment: R, QlikView, Python, Machine Learning, SQL
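
Forecasting sketch (referenced above): a minimal Holt-Winters exponential smoothing forecast with statsmodels. The monthly sales series is synthetic so the snippet runs on its own; a real pipeline would load actuals from the warehouse via SQL.

# Sketch of a monthly sales forecast with Holt-Winters exponential smoothing.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(0)
idx = pd.date_range("2018-01-01", periods=36, freq="MS")   # month-start index
seasonal = 10 * np.sin(2 * np.pi * np.arange(36) / 12)     # yearly seasonality
sales = pd.Series(100 + np.arange(36) + seasonal + rng.normal(0, 2, 36),
                  index=idx)

model = ExponentialSmoothing(sales, trend="add", seasonal="add",
                             seasonal_periods=12).fit()
print(model.forecast(6))   # six-month-ahead forecast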
Client: Syntel Pvt. Ltd., India / Role: Data Analyst / August 2013 – September 2016
Project Description: Developed data processing and analytics solutions using Hadoop, Hive, and Python to support health insurance and claims analysis. Built dashboards and reports in Tableau and Excel to improve fraud detection, KPI tracking, and business decision-making.
Responsibilities:
Analysed report design requirements and actively interacted with the Project Lead, Technical Manager, and Lead Business Analyst to understand the specifications of the operations.
Wrote complex queries to extract data from the data warehouse for analysis.
Formulated KPIs to generate weekly/monthly performance reports for internal departments by identifying the dimensions, measures, and measure values.
Characterized false positives and false negatives to improve a model for predicting overpaid claims; performed consumer segmentation and characterization to predict behaviour.
Performed outlier detection using high-dimensional historical data (see the outlier-detection sketch following this section).
Analysed promoters and detractors (defined using Net Promoter Score).
Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Python for data cleaning and preprocessing.
Supported MapReduce programs running on the cluster and loaded data from the UNIX file system into HDFS.
Experienced in managing and reviewing Hadoop log files.
Involved in writing Hive queries to load and process data in Hadoop File System.
Exported data from Impala to the Tableau reporting tool and created dashboards on a live connection.
Gained strong business knowledge of health insurance, claims processing, fraud suspect identification, the appeals process, etc.
Maintained integrity of the database by implementing different validation techniques to the data uploading procedures.
Maintained the connection between the ERP system and the database using SQL Server Reporting Services.
Developed dashboards in Excel with SQL connections to ensure smooth procedural flow and analysis.
Identified and proposed process enhancements.
Quantified multiple stocking proposals and contracts with customers which increased business by approximately 20% per customer.
Developed ad-hoc reports upon requests using MS Access and MS Excel.
Environment: Hadoop, MapReduce, Hive, Impala, Python (pandas, NumPy, scikit-learn), SQL Server, Excel, Tableau
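
Outlier-detection sketch (referenced above): a minimal illustration with scikit-learn's IsolationForest on synthetic claims features. The feature names and values are invented so the snippet runs on its own; real inputs would come from the Hive/Impala extracts described above, and the contamination rate is an assumed tuning choice.

# Sketch of high-dimensional outlier detection on synthetic claims data.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
claims = pd.DataFrame({
    "billed_amount": rng.lognormal(6, 0.5, 1000),
    "num_procedures": rng.poisson(3, 1000),
    "days_to_submit": rng.exponential(10, 1000),
})
claims.loc[::200, "billed_amount"] *= 20    # inject a few extreme claims

iso = IsolationForest(contamination=0.01, random_state=0)
flags = iso.fit_predict(claims)             # -1 marks an outlier
print(claims[flags == -1].head())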