AI/Machine Learning Engineer
Name: Satya Gattu
Email: ***********@*****.***
PH: +1-972-***-****
LinkedIn: https://www.linkedin.com/in/satya-gattu-17238a1b8/
PROFESSIONAL SUMMARY
Over 9 years of experience as a Data Scientist, with deep expertise in Artificial Intelligence and Machine Learning algorithms.
Strong hands-on programming experience in Python and RStudio. Highly accurate and experienced in executing data-driven solutions to increase the efficiency, accuracy, and utility of internal data processing; collecting, analyzing, and interpreting large datasets; developing new forecasting models; and performing data management tasks.
Experienced in creating regression models, applying predictive modeling, and analyzing data mining algorithms to deliver insights and implement action-oriented solutions to complex business problems.
Applied supervised, unsupervised, and reinforcement learning techniques, and gathered requirements and designs for AI/ML use cases.
Strong hands-on experience in Python (frameworks/libraries: NumPy, SciPy, Pandas, Matplotlib/Plotly, scikit-learn, statsmodels).
Skilled in applying AI/ML to user behavior analytics, workforce engagement insights, and predictive modeling.
Designed and deployed LLM and AI-based solutions in collaboration with global cross-functional teams to optimize business operations.
Experienced in data preparation, model building, feature engineering, and production deployment.
Hands-on experience with predictive analytics, statistical tools, machine learning algorithms, and big data tools.
Used Pandas for data analysis and Seaborn for statistical plots.
Experienced in building adaptive ML systems that learn from user behavior to improve predictions and usability over time.
Experienced in RStudio (matrices, data frames, lists; data manipulation with dplyr and tidyr; data visualization with ggplot2; data input and output).
Experienced with Neural Networks, Naive Bayes, SVM, Decision Forests
Experienced in delivering products and solutions that utilize Machine Learning, Natural Language Processing, and other forms of AI, such as machine vision.
Strong experience in SQL (PostgreSQL via pgAdmin, Cassandra, and Oracle) and Tableau.
Experienced with deep learning frameworks (TensorFlow).
Knowledge of technology trends, salary trends, the Big Data market, and different job roles in Big Data.
Knowledge of complex Hadoop architectures and their components.
Understand how MapReduce, Hive, and Pig can be used to analyze big data sets.
Experienced running HDFS commands and writing Hive, Pig, and data pipeline queries against sample data sets and scripts.
Understand modern data architectures such as the Data Lake.
Experienced with the major components of GCP (Google Cloud Platform), including running TensorFlow in the cloud, with a solid grasp of machine learning fundamentals.
Experienced with AWS and highly skilled in CNNs, RNNs/LSTMs, scikit-learn, Keras, and TensorFlow.
TECHNICAL SKILLS
Machine Learning Algorithms: Linear regression, SVM, KNN, Naive Bayes, Logistic Regression, Random Forest, Boosting, K-means clustering, Hierarchical clustering, Collaborative Filtering, Neural Networks, NLP
Analytic Tools: R 2.15/3.0 (reshape2, ggplot2, dplyr, car, MASS, lme4), Excel, Data Studio
Programming Languages: R 3.X, Python 2.X & 3.X (NumPy, SciPy, Pandas, Seaborn, BeautifulSoup, scikit-learn, NLTK)
Databases: PostgreSQL, Oracle 11g, MySQL, SQL Server, MongoDB, Neo4j
Big Data Frameworks: Hadoop Ecosystem 2.X (HDFS, MapReduce, Hive 0.11, HBase 0.9), Spark 2.X (Scala 2.X, Spark SQL, PySpark, MLlib)
Data Visualization: Tableau 8.0/9.2/10.0, Plotly, ggplot2 (R), Matplotlib (Python), Logi Analytics
Data Engineering: dbt (Data Build Tool), Apache Airflow, Delta Lake, Great Expectations
Version Control: Git 2.X
Applied Data Science and Business Analytics: Workforce Analytics, Talent Segmentation, Strategic Workforce Planning, Survey Methodology, HRIS Data Pipelines, Scenario Modeling, Engagement Analysis, KPI Forecasting, Intelligent Work Assistants (Rovo AI-style Copilots), Jira/Confluence Integration, Workflow Automation using LLMs
Operating Systems: UNIX, macOS, Windows
AI Infrastructure and Platforms: Red Hat OpenShift, NVIDIA GPU (CUDA, A100, T4), OpenShift GPU Operator, Kubernetes, Docker, HPC Clusters, ML Workload Optimization, TensorRT, CUDA Toolkit.
GenAI & LLMs: OpenAI API (GPT-3.5/4), Anthropic Claude, Vertex AI (PaLM), LangChain, LlamaIndex, RAG pipelines, Prompt Engineering, Fine-tuning transformer-based models (BERT, GPT, T5)
MLOps & Deployment: MLflow, SageMaker Pipelines, Kubeflow, ONNX, FastAPI, CI/CD for ML with GitHub Actions, Jenkins + Docker + Kubernetes, Model versioning, monitoring (e.g., EvidentlyAI), drift detection
Security & Privacy: HIPAA, SOC2, GDPR compliance in AI/ML workflows
PROFESSIONAL EXPERIENCE
Capital One, Plano, TX Feb 2024 – Present
Gen AI/Machine Learning Engineer
Responsibilities:
Worked with PostgreSQL (pgAdmin), Cassandra (DataStax), and SQL Developer tools to query and extract logs from the BPM and MTAS databases.
Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning solutions, utilizing algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-Means, and KNN for data analysis.
Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data lifecycle management in both RDBMS and Big Data environments.
Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS, and implemented a Python-based distributed random forest via Hadoop Streaming. Performed source system analysis, database design, and data modeling for the warehouse layer using MLDM concepts and for the package layer using dimensional modeling.
Designed and implemented LLM-powered features to improve user personalization and content relevance in business applications.
Integrated Amazon SageMaker models with AWS Lambda for real-time, event-driven inference, enabling low-latency predictions in production-grade applications (a minimal handler sketch follows this list).
Created ecosystem models (e.g., conceptual, logical, physical, canonical) required to support services within the enterprise data architecture (a conceptual data model defining the major subject areas used, and an ecosystem logical model defining standard business meanings for entities and fields).
Developed Linux Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database
Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
Built different use cases and worked extensively in Jupyter Notebook for data cleaning: converted data into a structured format, removed outliers, dropped irrelevant columns, and imputed missing values with the median, mode, average, min/max, or other statistical methods.
Analyzed user interaction data across deployed ML solutions to identify improvement opportunities and enhance system intelligence.
Used LangChain and LlamaIndex to orchestrate custom LLM workflows integrated with enterprise document repositories.
Implemented containerized AI workloads using Docker and Kubernetes on OpenShift, enabling scalable and portable deployment of transformer-based models.
Deployed and monitored LLM APIs (OpenAI, Claude, Vertex AI) within containerized microservices on OpenShift.
Configured OpenShift GPU Operator to orchestrate hardware acceleration for deep learning workloads using TensorFlow and PyTorch.
Implemented IAM-based role access control, VPC configurations, and encryption protocols (KMS, S3 bucket policies) to secure end-to-end machine learning workflows in AWS cloud.
Collaborated with infrastructure teams to manage high-performance computing (HPC) clusters and optimize distributed training on GPU-backed OpenShift nodes.
Worked with libraries such as NumPy, Pandas, scikit-learn, Matplotlib, Seaborn, and psycopg2.
Applied supervised, unsupervised, and reinforcement learning techniques and translated requirements into designs for AI/ML use cases.
Optimized machine learning pipeline performance by leveraging NVIDIA A100 GPUs within a Red Hat OpenShift environment, accelerating training and inference workloads.
Enhanced existing AI models using continuous learning and real-time behavior feedback loops to optimize model performance.
Developed and scaled machine learning and deep learning models (Logistic Regression, Random Forest, Gradient Boosting Machines, Support Vector Machines) for classification; analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
Built, trained, and deployed ML models on AWS with Amazon SageMaker.
Built Retrieval-Augmented Generation (RAG) pipelines with OpenAI embeddings and FAISS for internal knowledge management solutions (a minimal retrieval sketch follows this list).
Designed and executed processes for machine learning, predictive modeling, data mining, and research on large-scale, complex data sets using statistical models, graph models, text mining, and other modern techniques.
Collaborated with global product and engineering teams to architect AI-driven solutions that align with enterprise business objectives.
Ensured compliance with enterprise data privacy standards and regulatory frameworks including HIPAA and SOC 2, while designing and deploying AI/ML pipelines involving sensitive customer data.
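A minimal sketch of the Lambda-to-SageMaker real-time inference path referenced above, assuming boto3's sagemaker-runtime client; the endpoint name and payload schema are hypothetical placeholders, not the actual production configuration.

import json
import boto3

# SageMaker runtime client; credentials and region come from the Lambda execution role.
runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # Assumes the event body carries one CSV feature row, e.g. "0.4,1.2,0.0".
    payload = event["body"]
    resp = runtime.invoke_endpoint(
        EndpointName="fraud-score-endpoint",  # hypothetical endpoint name
        ContentType="text/csv",
        Body=payload,
    )
    # The endpoint is assumed to return a JSON-serialized score.
    score = json.loads(resp["Body"].read().decode("utf-8"))
    return {"statusCode": 200, "body": json.dumps({"score": score})}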
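A minimal sketch of the RAG retrieval flow, assuming the OpenAI Python SDK (v1+) and faiss-cpu; the embedding model, chat model, and sample corpus are illustrative assumptions rather than the deployed system.

import numpy as np
import faiss
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts):
    """Embed a batch of texts with OpenAI's embeddings endpoint."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data], dtype="float32")

# Build a flat inner-product index over the document corpus (placeholder docs).
docs = ["Refund policy ...", "Card activation steps ...", "Fraud dispute process ..."]
vecs = embed(docs)
faiss.normalize_L2(vecs)                  # cosine similarity via normalized dot product
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

def retrieve(query, k=2):
    """Return the k most similar documents for a query."""
    q = embed([query])
    faiss.normalize_L2(q)
    _, ids = index.search(q, k)
    return [docs[i] for i in ids[0]]

# Retrieved chunks are stuffed into the LLM prompt as grounding context.
question = "How do I dispute a fraudulent charge?"
context = "\n".join(retrieve(question))
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)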
Environment: Agile, Python 3.x, Airflow, AWS, Azure Cloud, Databricks, Spark, PySpark, PyTorch, SQL, Erwin r9.6, Oracle 12c, Netezza, SQL Server, Informatica, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, Regression, Cluster Analysis, Scala, NLP, Cassandra, MapReduce, Kafka, MongoDB, Logistic Regression, Hadoop, Hive, Teradata, Random Forest, OLAP, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, LLM APIs (OpenAI, Anthropic), Vertex AI
Maxim Healthcare - Columbia, MD May 2022 – Jan 2024
Machine Learning/ Artificial Intelligence Engineer
Responsibilities:
Built classification models based on advisor performance and deduced knowledge rules for high- and medium-performing advisors.
Streamlined treatment plans for advisors by drawing on the interactions of high-performing advisors.
Prioritized leads and created nurturing journeys based on learnings from previous modeling efforts.
Identified data vendors to optimize the performance of existing models and pave the way to build new models.
Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction to fit the analytical requirements (data ingestion).
Performed data analysis by retrieving the data from the Hadoop cluster.
Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
Under the PMO organization, led the company-wide Agile transformation initiative in conjunction with the Agile sponsor team, representing Business, Technology, Marketing, HR, Compliance, and IT.
As the overall Agile transformation lead, developed, implemented, and promoted Agile best practices and standards across the enterprise and Agile teams; drove the organization-wide Agile adoption strategy and rollout plans; and provided solutions for scaling Agile across projects, programs, and portfolios to improve application delivery.
Explored and analyzed customer-specific features using Matplotlib in Python and ggplot2 in R.
Performed data imputation using Scikit-learn package in Python.
Performed feature engineering such as feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing.
Used Python (NumPy, SciPy, pandas, Scikit-learn, seaborn) and R to develop a variety of models and algorithms for analytic purposes.
Worked on Natural Language Processing with Python's NLTK module and developed NLP models for sentiment analysis.
Experimented and built predictive models including ensemble models using machine learning algorithms such as Logistic regression, Random Forests, and KNN to predict customer churn.
Analyzed customer behavior and assessed customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means (a minimal segmentation sketch follows this list).
Used Gaussian Mixture Model and Hierarchical Clustering models; evaluated model performance with F-score, AUC/ROC, confusion matrix, precision, and recall.
Designed and implemented a recommendation system that leveraged Google Analytics data and machine learning models, utilizing collaborative filtering techniques to recommend courses to different customers.
Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
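A minimal sketch of the RFM-based K-Means segmentation referenced above, using scikit-learn; the synthetic data stands in for the real customer extract, and the cluster count is an assumption.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Synthetic stand-in for an RFM extract; in practice this came from SQL/Hadoop.
rng = np.random.default_rng(42)
rfm = pd.DataFrame({
    "recency_days": rng.integers(1, 365, 1000),
    "frequency": rng.poisson(5, 1000),
    "monetary": rng.gamma(2.0, 150.0, 1000),
})

# K-Means is distance-based, so standardize the features first.
X = StandardScaler().fit_transform(rfm)

km = KMeans(n_clusters=4, n_init=10, random_state=42)  # assumed cluster count
rfm["segment"] = km.fit_predict(X)

# Profile each segment by its average recency/frequency/monetary values.
print(rfm.groupby("segment").mean())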
Environment: Hadoop, HDFS, Python, R, Tableau, Machine Learning (Logistic regression/ Random Forests/ KNN/ K-Means Clustering/ Hierarchical Clustering/ Ensemble methods/ Collaborative filtering), JIRA, GitHub, Agile/ SCRUM, GCP
Client: Amazon, Seattle, WA Aug 2020 - Apr 2022
Senior Data Scientist
Responsibilities:
Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
Built machine learning models to identify whether a user is legitimate using real-time data analysis, and prevented fraudulent transactions by applying supervised learning to customer transaction history.
Extracted data from SQL Server Database, copied into HDFS File system and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
Leveraged Databricks platform for scalable data processing and machine learning tasks, including data exploration, feature engineering, and model training on large datasets.
Used AWS SageMaker to quickly build, train, and deploy machine learning models.
Implemented NLP for sentiment analysis and fraud detection.
Performed data cleaning, including transforming variables and dealing with missing values, and ensured data quality, consistency, and integrity using Pandas and NumPy.
Assisted the Product Owner in creating a proof of concept by analyzing months of syslog data on the proposed Splunk platform in terms of reporting, monitoring, and alerting.
Built scalable ML pipelines using SageMaker Pipelines and integrated experiment tracking with MLflow.
Implemented ONNX runtime models for cross-platform deployment and latency optimization.
Used Jenkins pipelines to push all microservice builds to the Docker registry and deploy them to Kubernetes; created and managed pods using Kubernetes.
Built and maintained Docker container clusters managed by Kubernetes on GCP (Google Cloud Platform) using Linux, Bash, Git, and Docker; utilized Kubernetes and Docker as the runtime environment of the CI/CD system to build, test, and deploy.
Tackled a highly imbalanced fraud dataset using sampling techniques such as undersampling and oversampling with SMOTE (Synthetic Minority Over-sampling Technique) in Python (a minimal rebalancing sketch follows this list).
Utilized PCA, t-SNE, and other feature engineering techniques to reduce high-dimensional data; applied feature scaling and handled categorical attributes with scikit-learn's one-hot encoder.
Developed various machine learning models such as Logistic regression, KNN, and Gradient Boosting with Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn in Python.
Responsible for setting up continuous mobile delivery on Ionic Enterprise with Appflow.
Worked on Amazon Web Services (AWS) cloud services to do machine learning on big data.
Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
Implemented a Python-based distributed random forest via PySpark and MLlib.
Used cross-validation to test the model with different batches of data and find the best model parameters, which eventually boosted performance.
Developed and implemented Spark python modules for machine learning & predictive analytics in Databricks.
Architected and implemented very large-scale data intelligence solutions around Snowflake Data Warehouse.
Responsible for migrating key systems from on-premises hosting to Azure Cloud Services; wrote SnowSQL queries against Snowflake.
Participated in as-is/to-be analysis between the existing Security Information and Event Management (SIEM) enterprise tool and the Splunk Enterprise platform to prepare the business case.
Created and maintained reports displaying the status and performance of deployed models and algorithms with Tableau.
Used GitHub and Jenkins for CI/CD (DevOps operations). Also familiar with Tortoise SVN, Bitbucket, JIRA, Confluence.
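A minimal sketch of the SMOTE rebalancing step referenced above, using the imbalanced-learn library alongside scikit-learn; the synthetic data stands in for the real transactions table, and the 1% fraud rate is an assumption.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

# Synthetic stand-in for a highly imbalanced transactions table (~1% fraud).
X, y = make_classification(n_samples=20_000, weights=[0.99], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Oversample only the training split so the test distribution stays realistic.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_tr, y_tr)

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(X_te), digits=3))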
Technology Stack: Machine Learning, AWS, Databricks, Python (scikit-learn, SciPy, NumPy, Pandas, Matplotlib, Seaborn), SQL Server, Hadoop, HDFS, Hive, Pig Latin, Apache Spark/PySpark/MLlib, GitHub, Linux, Tableau.
Western Alliance Bank – Phoenix, AZ Feb 2018 – Jul 2020
Data Scientist / Machine Learning Engineer
Responsibilities:
Collaborated with ML engineers and data scientists to build data and model pipelines and helped run machine learning tests and experiments.
Experienced in Statistical Modeling, Data Extraction, Data cleaning, Data screening, Data Exploration and Data Visualization of structured and unstructured datasets
Implemented large-scale deep learning and machine learning algorithms to deliver resourceful insights and inferences.
Used supervised, unsupervised, and reinforcement learning techniques as applied to the travel, transport, and logistics industries.
Collaborated with business stakeholders and team members to understand company needs; employed predictive modeling, text mining, and forecasting techniques; and kept up to date with analytics technologies and their utility for daily work.
Developed machine learning and NLP frameworks, models, and services that are flexible to extend to new features.
Customized the in-house NLP framework and skills according to product requirements.
Enhanced Natural Language Understanding models with measurable metrics and delivered better user-experience results.
Synthesized quantitative results to determine implications and make actionable recommendations.
Assembled large, complex data sets that meet functional / non-functional business requirements.
Used predictive analytics, including machine learning and data mining techniques, to forecast company sales of new products with a minimum 90% accuracy rate.
Built the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and Tableau technologies.
Used data visualization tools (pie charts, histograms, box plots, bar plots, ggplot2, etc.) to share ongoing insights.
Worked closely with product owners and other team members to meet delivery goals
Designed elegant data visualizations to present complex analysis and insights to customers with Tableau or other related tools
Built and configured SQL and Excel spreadsheets embedded into the business; strong hands-on experience with Microsoft Office, Excel, and PowerPoint.
Applied machine learning methodologies in R and Python, including K-Means, K-Nearest Neighbors, linear regression, logistic regression, support vector machines, and decision tree methods, and solved several real-life problems with them (a minimal model-comparison sketch follows this list).
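A minimal sketch of the kind of cross-validated model comparison described above, using scikit-learn; the synthetic dataset and hyperparameters are illustrative assumptions, not taken from the engagement.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a labeled business dataset.
X, y = make_classification(n_samples=2000, n_informative=8, random_state=42)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "knn": KNeighborsClassifier(n_neighbors=15),
}

# 5-fold cross-validated accuracy for each candidate model.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")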
Environment: Stack: R 3.X, Python 3.X, AWS, GCP, MS SQL server 2008, Git, Oracle 12c, HDFS, MongoDB 3.3, Spark 1.6 / 2.0 (SparkR, MLlib, Spark SQL), Tableau 10.0, Git 2.X
Capgemini – Bangalore, India Jul 2015 – Aug 2017
Data Analyst
Responsibilities:
Acted as a business consultant on the use of data analytics to drive business decisions and strategy.
Used supervised, unsupervised, and reinforcement learning techniques and gathered requirements.
Translated business needs into technical requirements and implementations.
Used predictive analytics such as machine learning and data mining techniques to forecast company sales of new products with a minimum 90% accuracy rate.
Applied data science approaches and methodologies (such as linear and logistic regression and decision trees) to improve business outcomes.
Took the initiative on new ideas and developed related models in R.
Developed data analysis models according to business scenarios.
Evaluated the methods and technical solutions of data analytics projects.
Designed clear data visualizations to present complex analysis and insights to customers with Tableau and other related tools.
Shared data science knowledge with other team members and helped build up production capabilities internally.
Discovered information hidden in vast amounts of data, helping the business make smarter decisions and deliver better products.
Environment: Git, Jira, Slack, Airflow, Python 2.7/3, TensorFlow, Keras, FB Prophet, SQL, PySpark, Spark SQL, GCP, AWS, Lambda, SageMaker, Kubernetes, Docker, Tableau, Logi Analytics, REST API
EDUCATION
Bachelor’s degree in computer science – Vignan Institute of Technology and Science, 2015