
Machine Learning Engineer

Location:
Irving, TX
Salary:
65/hr
Posted:
April 15, 2025


Resume:

PraveenKumar. S

Sr. Machine Learning Engineer

Dallas, TX

+1-940-***-****

Professional Summary

● A passionate, team-oriented AI/ML Engineer with over 6 years of experience in the dynamic field of artificial intelligence, I specialize in building innovative, data-driven solutions that drive real-world impact. Currently working as a Sr. Machine Learning Engineer, I focus on leveraging large language models (LLMs) to solve complex problems and create intelligent systems that deliver value across various domains. My expertise spans advanced use cases such as Retrieval-Augmented Generation (RAG), code generation and transformation, text summarization, entity extraction, and the development of sophisticated Q&A chatbots.

● Expertise in transforming business resources and tasks into regularized data and analytical models, designing algorithms, and developing data mining and reporting solutions across a massive volume of structured and unstructured data.

● Extensive experience in applying Machine Learning solutions to various business problems and generating data visualizations using Python.

● Experienced in building end-to-end GenAI solutions integrating LLMs with real-world data for applications like fraud detection, loan risk prediction, and customer segmentation.

● Hands-on expertise with the OpenAI API, Hugging Face Transformers, LangChain, LangGraph, and GCP Vertex AI to deliver production-grade conversational AI systems.

● Skilled in architecting Retrieval-Augmented Generation (RAG) pipelines that fuse unstructured documents, tabular data, and knowledge graphs into coherent, explainable LLM responses.

● Proven ability to design and fine-tune LLMs for regulated industries, ensuring output quality, bias mitigation, and compliance with domain-specific guidelines.

● Developed intelligent AI assistants and chatbots that automate analyst workflows, support risk management, and scale decision-making across enterprise teams.

● Strong command over embedding models, vector stores (e.g., Pinecone, FAISS), and prompt engineering techniques for grounded and task-specific GenAI applications.

● Passionate about building human-centric, secure, and interpretable GenAI systems that align with ethical AI practices and business impact.

● Adept at bridging the gap between data science and software engineering to deploy scalable AI pipelines using tools like LangChain, Vertex Pipelines, and Python microservices.

● Outstanding performance in Data Governance, Data Mining, Exploratory Data Analysis, Data Validation, Predictive Modelling, Data Lineage, and Data Visualization in all the phases of the project Life Cycle.

● Profound Knowledge in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Random Forests, Support Vector Machine, K-Nearest-Neighbours, K-means Clustering, Neural Networks, Gradient-Boosting, and Ensemble Methods.

● Meticulously skilled in Python 2.x/3.x programming with various packages including NumPy, SciPy, Pandas, and Scikit Learn.

● Solid knowledge and experience in Deep Learning techniques including Feedforward Neural Networks, Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN).

● Implemented deep learning models and numerical computation with the help of data-flow graphs.

● Excellent proficiency in model validation and optimization with model selection, parameter/hyperparameter tuning, k-fold cross-validation, hypothesis testing, and Principal Component Analysis (PCA).

● Implemented and analysed RNN-based approaches for automatically predicting implicit discourse relations in text. The discourse relation has potential applications in NLP tasks such as text parsing, text analytics, text summarization, and conversational systems.

● Collaborated with cross-functional teams to design and implement data pipelines using Azure Data Factory, improving data accessibility and reliability.

● Experience in designing visualizations using Tableau software and publishing and presenting dashboards, Storylines on web and desktop platforms.

● Wrote SQL queries for various RDBMSs such as Microsoft SQL Server, MySQL, PostgreSQL, and Oracle, as well as NoSQL databases such as MongoDB to handle unstructured data.

● Solid understanding of RDBMS database concepts, including normalization, and mastery in creating database objects such as tables, stored procedures, triggers, row-level audit tables, cursors, indexes, and user-defined data types.

● Chose appropriate machine learning models for AML (anti-money-laundering) tasks, such as classification algorithms.

● Highly skilled in building and publishing customized interactive reports and dashboards with customized parameters and user filters using Tableau and Excel.

● Good knowledge of Hadoop components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, MapReduce concepts, and ecosystems including Hive and Pig.

● Progressive involvement in Agile methodology, and SCRUM process.

● Strong business sense and ability to communicate data insights to technical and non-technical clients.

TECHNICAL SKILLS:

Python Libraries: Scikit-learn, Pandas, NumPy, SciPy, Matplotlib, Seaborn

Version Control: Git, GitHub

Data Visualization: Tableau, Matplotlib, Seaborn, Plotly, Excel, PowerPoint

Programming Languages: Python, SQL, MATLAB, Bash

Operating Systems: Windows, Unix, Linux (Ubuntu)

Backend Frameworks: Django, FastAPI, Flask

Data Processing: Pandas, NumPy, SQLAlchemy

Large Language Models: Hugging Face, OpenAI, LangChain, Llama, GPT-2/3, BERT

Database Management: SQL, MongoDB, Redis

Cloud Platforms: AWS (EC2, S3, RDS), Azure, Google Cloud

Containerization: Docker, Kubernetes

API Integration: Third-party APIs, Streamlit, Flask, FastAPI, REST APIs

NLP Techniques: Sentiment analysis, Keyword extraction, Document classification, Entity recognition

Machine Learning: Linear regression, SVR, KNN, Logistic regression, SVM, Random Forest, K-means, CNN, RNN

CI/CD: Jenkins, GitHub Actions

Education:

Bachelors: SRM Institute of Science and Technology, 2018

Masters: University of North Texas, Artificial Intelligence, 2022

EXPERIENCE

LPL Financial, Austin, TX (April 2023 to Present)

Role: Sr. Machine Learning Engineer

The project involved the development of predictive models for loan issues, risk management for new foreign exchange products, fraud detection, and customer segmentation, using data sourced from multiple data sources.

Responsibilities:

● Built a retrieval-augmented chatbot that supports analysts with real-time insights from structured (databases) and unstructured (PDFs, docs) financial data.

● Developed custom pipelines in Vertex AI to automate fine-tuning, embeddings, and monitoring of LLMs.

● Leveraged LangChain & LangGraph to create dynamic, multi-turn conversation agents for fraud detection and FX product analysis.

● Improved explainability by embedding domain knowledge graphs and leveraging context-aware responses via RAG.

● Ensured compliance with financial regulations by implementing prompt filters, audit trails, and model response constraints.

● Designed a financial assistant chatbot powered by OpenAI and LangChain to handle domain-specific queries like loan approval criteria, credit risk scores, and FX exposure breakdowns.

● Enabled multi-turn conversation capabilities with memory and context retention using LangGraph and LangChain agents for seamless user interactions.

● Integrated the chatbot with RAG pipelines to pull contextual data from internal documents, transaction records, and customer profiles, ensuring accurate, grounded responses.
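The retrieval step behind such a RAG chatbot can be sketched roughly as follows. This is a minimal illustration with toy vectors and plain-Python cosine similarity, not the production pipeline, which would use a real embedding model and a vector store such as FAISS or Pinecone:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, doc_store, top_k=1):
    """Return the top_k document texts closest to the query embedding."""
    ranked = sorted(doc_store,
                    key=lambda d: cosine_similarity(query_vec, d["embedding"]),
                    reverse=True)
    return [d["text"] for d in ranked[:top_k]]

# Toy store: in practice the vectors come from an embedding model and
# live in a vector database; these numbers are made up for illustration.
docs = [
    {"text": "Loan approval requires a credit score above 650.",
     "embedding": [0.9, 0.1, 0.0]},
    {"text": "FX exposure is reported quarterly.",
     "embedding": [0.1, 0.9, 0.2]},
]

context = retrieve([0.8, 0.2, 0.1], docs, top_k=1)
# The retrieved text is then injected into the LLM prompt as grounding context.
```

The grounding idea is simply that the model answers from retrieved passages rather than from its parameters alone.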

● Actively developed predictive models and strategies for effective fraud detection in credit and customer banking activities using K-means clustering.

● Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, & KNN for data analysis.



● Used Python Matplotlib packages to visualize and graphically analyse the data.

● Worked closely with AML experts and financial domain professionals to understand the intricacies of money laundering.

● Performed data pre-processing, splitting the identified data set into a training set and a test set.

● Performed Data Wrangling to clean, transform and reshape the data utilizing the Pandas library.

● Performed data cleaning, wrangling, manipulation, and visualization; extracted data from relational databases and performed complex data manipulations. Also conducted extensive data checks to ensure data quality.
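The train/test split described above can be sketched in plain Python. In practice this would usually be done with Pandas or scikit-learn's `train_test_split`, so treat the helper below as an illustrative stand-in:

```python
import random

def train_test_split(rows, n_test, seed=42):
    """Deterministically shuffle rows and hold out n_test of them."""
    rng = random.Random(seed)   # fixed seed makes the split reproducible
    shuffled = rows[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(10)), n_test=3)
```

Holding out a fixed, shuffled test set is what lets the later model-evaluation steps measure generalization rather than memorization.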

● Collaborated with AML compliance teams to integrate models into existing AML monitoring systems.


● Assisted in migrating on-premises data infrastructure to Azure cloud, ensuring seamless transition and improved scalability.

● Used Python to graphically analyse the data and perform data mining. Also built and analysed datasets using Python and MATLAB.

● Understood transaction data and developed analytics insights using statistical modelling and Artificial Intelligence (AI) with Python.


● Analysed performance of recurrent Neural Networks for data over time.

● Used Python NumPy, SciPy, and Pandas packages to perform dataset manipulation.

● Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel, MATLAB, and Python.

Environment: Python, NumPy, SciPy, Pandas, Matplotlib, Scikit-learn

Verizon, Dallas, TX (Aug 2022 to Mar 2023)

Role: Machine Learning Engineer

Responsibilities:

● Collaborated with ML Engineers and Data Scientists to build data and model pipelines and helped run machine learning tests and experiments.

● Built different use cases and extensively worked in Jupyter Notebook for data cleaning: converted data into structured format, removed outliers, dropped irrelevant columns and missing values, and imputed missing values with median/mode/average/min/max and other statistical methods and ML techniques.

● Worked with libraries such as NumPy, Pandas, Scikit-learn, Matplotlib, and Seaborn.

● Developed AI/ML algorithms with light fine-tuning for accuracy.

● Developed and scaled machine learning and deep learning models such as Logistic Regression, Random Forest, Gradient Boosting Machines, and SVM (Support Vector Machines) for classification.

● Programmed in Python to prototype and deploy machine learning, deep learning, predictive, probabilistic, and statistical modelling approaches with user interface development.

● Experience with statistical modelling, data extraction, data cleaning, data screening, feature engineering, PCA, data exploration, and data visualization of structured and unstructured datasets.

● Implemented large-scale Deep Learning and Machine Learning algorithms to deliver rich insights and inferences.

● Used Advanced SQL queries to perform manipulation on the structured data.

● Visualized the data in python using pandas, matplotlib, and seaborn packages.

● Applied regularized models, random forest, gradient boosting, and neural networks to predict bike rental demand.

● Implemented a recommendation model for customer product use.

● Designed and optimized Graph Database schemas in Neo4j to model complex relationships for AI and data-driven applications.

● Developed Cypher queries to efficiently traverse large-scale graph data, improving query performance and data retrieval speed.

● Integrated Neo4j with Python using the neo4j driver to enable seamless interaction with AI and analytics pipelines.
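A traversal of the kind described might look roughly like this. The graph schema and the `fetch_large_transactions` helper are hypothetical, invented for illustration, but the parameterized `session.run` call is the standard usage pattern of the official neo4j Python driver:

```python
# Hypothetical schema for illustration: (:Customer)-[:MADE]->(:Transaction).
CYPHER_QUERY = """
MATCH (c:Customer {id: $customer_id})-[:MADE]->(t:Transaction)
WHERE t.amount > $min_amount
RETURN t.id AS tx_id, t.amount AS amount
ORDER BY t.amount DESC
"""

def fetch_large_transactions(session, customer_id, min_amount=10000):
    """Run the parameterized traversal through a neo4j driver session.

    Parameterized queries ($customer_id, $min_amount) let Neo4j cache
    the execution plan and avoid string-concatenation injection issues.
    """
    result = session.run(CYPHER_QUERY,
                         customer_id=customer_id,
                         min_amount=min_amount)
    return [record.data() for record in result]
```

Passing the session in from outside keeps the query logic testable and independent of connection management.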

● Imputed missing value by KNN strategy and selected important features by random forest on the customer transaction dataset.
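KNN imputation of the sort mentioned can be sketched in plain Python. A real project would typically use scikit-learn's `KNNImputer`; this toy version fills one missing feature from the mean of the k nearest complete rows:

```python
import math

def knn_impute(rows, target_idx, k=2):
    """Fill a None at column target_idx with the mean of that column
    among the k nearest complete rows (Euclidean distance over the
    other columns)."""
    complete = [r for r in rows if r[target_idx] is not None]
    result = []
    for row in rows:
        if row[target_idx] is not None:
            result.append(row[:])
            continue
        # Distance to a complete row, ignoring the missing column.
        def dist(other, row=row):
            return math.sqrt(sum(
                (a - b) ** 2
                for i, (a, b) in enumerate(zip(row, other))
                if i != target_idx))
        nearest = sorted(complete, key=dist)[:k]
        patched = row[:]
        patched[target_idx] = sum(r[target_idx] for r in nearest) / k
        result.append(patched)
    return result

# Toy data: column 1 has one missing value; its two nearest neighbours
# by column 0 supply the imputed value.
data = [[1.0, 10.0], [1.1, 12.0], [5.0, None], [5.2, 50.0], [4.9, 52.0]]
imputed = knn_impute(data, target_idx=1, k=2)
```

The same neighbour-based idea scales to many features; libraries just do the distance search far more efficiently.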

● Implemented supervised machine learning models including Logistic Regression, SVM, and gradient boosting classifier to predict churn of users and applied regularization with optimal parameters to overcome overfitting.

● Deployed the model as a web service on Azure, enabling real-time predictions and integration with CRM systems.

● Evaluated model performance of classification by confusion matrix, ROC, and AUC curves.
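Confusion-matrix evaluation like this can be sketched as follows. This is a minimal binary-classification version; scikit-learn's metrics module would normally supply these functions:

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) counts for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def roc_point(y_true, y_pred):
    """One point of the ROC curve: (false positive rate, true positive rate)."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fpr, tpr

# Toy churn labels vs. model predictions (invented for illustration).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
```

Sweeping a classifier's decision threshold and plotting these (FPR, TPR) points traces the ROC curve; the area under it is the AUC.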

● Defined cost-benefit matrix for personalized coupons achieving a max profit of $79 per user.

● Performed A/B tests on the app page to increase conversion rate and engagement, measuring impact solely from the change.

● Monitored brand KPIs (registered users, subscription purchase) using Google Analytics and Tableau.

● Worked to improve conversion rates by running weekly split/multivariate tests of each product's funnel.

Environment: Python, Logistic Regression, Support Vector Machine, SQL, Tableau, Google Analytics

Tetra soft, Hyderabad, India (Mar 2019 to July 2021)

Role: Data Scientist

The client is a major financial house that offers financial services such as credit cards, personal loans, mortgages, and business loans. The project was to develop models that help predict the interest rates of credit cards and car loans using regression, and to segment customers using clustering.

Responsibilities:

● Collaborated with technologists and business stakeholders to drive innovation from conception to production.

● Utilized various techniques such as histograms, bar plots, pie charts, scatter plots, and box plots to determine the condition of the data.

● Worked on data processing for very large datasets, handling missing values in the data.

● Performed data pre-processing tasks like merging, sorting, finding outliers, missing value imputation, data normalization, and making it ready for statistical analysis.

● Designed, developed, and implemented performant ETL pipelines using the Python API of Apache Spark (PySpark) and AWS Glue on AWS EMR.

● Utilized Kubernetes, Docker, and CloudFormation for the runtime environment of the CI/CD system to build, test (PyTest), and deploy.

● Implemented AJAX to update only the necessary sections of web pages, avoiding the need to reload the entire page.

● Created different S3 buckets and wrote Lambda functions to move files from S3 to Redshift.

● Implemented various machine learning models, including regression, tree-based, and ensemble models, to predict car loan and credit card interest rates.

● Developed SQL queries and scripts to extract, transform, and load data into Azure SQL Database for reporting and analysis purposes.

● Performed model tuning by adjusting hyperparameters and raised model accuracy. Conducted validation of models with measures such as k-fold cross-validation, AUC, and ROC to identify the best-performing model.

● Performed Segmentation on customers’ data to identify target groups for new loans using Clustering techniques such as K-Means and further processed using Support Vector Regression.
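A K-Means segmentation along these lines can be sketched as a toy one-dimensional version. The real project would use a library implementation (e.g. scikit-learn or Spark MLlib) on multi-dimensional customer features; the numbers below are invented:

```python
def kmeans_1d(points, k=2, iters=20):
    """Plain 1-D k-means (assumes k >= 2): returns sorted final centroids."""
    lo, hi = min(points), max(points)
    # Spread the initial centroids evenly across the data range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep empty ones put).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Toy example: monthly spend for six customers splits into two segments.
spend = [10, 12, 11, 80, 85, 90]
segments = kmeans_1d(spend, k=2)   # low-spend vs. high-spend centroids
```

Each resulting centroid characterizes a customer segment, which is what makes the clusters usable as target groups for new loan offers.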

● Accomplished multiple tasks from collecting data to organizing and interpreting statistical information. The model was documented, and recommendations were forwarded to the company specifying the target customer base for the policy to achieve maximum success.

Environment: Python, MLlib, Regression, Cluster analysis, SVM, Random Forest.

Broadridge, Hyderabad, India (Jan 2018 to Feb 2019)

Role: Data Scientist

The client was in the business of manufacturing and selling clothing. The project was to create statistical machine learning models for fraud detection, implement automated customer scoring systems, perform sentiment analysis, etc.

Responsibilities:

● Involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.

● Created classification models to recognize web requests with product association to classify the orders and score the products for analytics which improved the online sales percentage by 13%.

● Worked with the NLTK library in Python for sentiment analysis on customer product reviews and other third-party websites using web scraping.

● Used Pandas, NumPy, and Scikit-learn in Python for developing various machine learning models such as Decision Trees and Random Forest.

● Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.

● Implemented and developed a fraud detection model by implementing a Feed Forward Multilayer Perceptron, a type of ANN.

● Used pruning algorithms to cut away connections and perceptrons to significantly improve the performance of the back-propagation algorithm.

● Implemented a structured learning method that is based on the search and scoring method.

● Performed customer segmentation based on behaviour or specific characteristics such as age, region, income, and geographical location, applying clustering algorithms to group customers with similar behaviour patterns.

● Created and maintained reports to display the status and performance of deployed models and algorithms with Tableau.

● Worked with numerous data visualization tools in Python such as Matplotlib, Seaborn, Plotly, and Pygal.

Environment: Python, SQL Server, SQL, Tableau, Shell Scripting, Excel, PowerPoint


