KRISHNA BAGVAN KANDIMALLA
Data Scientist / AI / ML Engineer
Email: ********************@*****.***
Phone: 630-***-****
LinkedIn:
PROFESSIONAL SUMMARY
•Accomplished Data Scientist with over 10 years of experience in delivering data-driven products, including 6+ years in a dedicated Data Scientist role. Skilled in applying Machine Learning algorithms, Predictive Analytics, Statistical Modeling, Natural Language Processing (NLP), and Artificial Intelligence to solve complex business problems.
•Previously served 4+ years as a Data Analyst, with strong expertise in extracting actionable insights from large and complex datasets. Proven ability to translate data into strategic recommendations that support informed business decisions.
•Proven expertise in end-to-end data science project execution—from data ingestion, wrangling, and exploration to model deployment and performance monitoring.
•Designed predictive algorithms and analytical models used across multiple industries, supporting data-driven decision-making and business transformation.
•Proficient in applying AI and machine learning techniques, graph analytics, and advanced data visualization to create executive dashboards and deliver actionable insights to senior leadership.
•Familiar with SAP Business Warehouse (BW) and CRM platforms, coupled with exposure to a variety of predictive modeling techniques to improve customer engagement and forecasting.
•Extensive experience in designing supervised and unsupervised learning models, deep learning frameworks (CNN, RNN, LSTM), NLP pipelines, and recommender systems to solve complex business challenges.
•Strong command of Python, R, SQL, and PySpark; hands-on with key libraries such as Pandas, NumPy, Scikit-learn, TensorFlow, Keras, Matplotlib, Seaborn, and NLTK.
•Adept at building scalable big data solutions using Spark, Hadoop, and Kafka, and deploying models via MLOps pipelines on AWS, Azure, and GCP platforms.
•Demonstrated ability to drive actionable insights through advanced statistical analysis (regression, classification, clustering, time-series, A/B testing, ANOVA) and business intelligence tools (Tableau, Power BI).
•Skilled in feature engineering, dimensionality reduction (PCA, t-SNE), hyperparameter tuning, and evaluation techniques for robust and production-ready ML models.
•Solid understanding of deep learning techniques, including Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and model training using TensorFlow and Keras.
•Expert in applying statistical methods such as EDA, regression models, time-series forecasting, ANOVA, A/B testing, Monte Carlo simulation, and cross-validation for rigorous analysis and model validation.
•Skilled in Natural Language Processing (NLP) and fundamental mathematics, including calculus, linear algebra, statistics, and optimization techniques such as gradient descent and regularization.
•Experienced in statistical programming using Python (NumPy, Pandas, Scikit-learn, Seaborn, Matplotlib), R, SAS, MATLAB, and PySpark for advanced analytics and model development.
•Proven ability in real-time data analytics using Apache Kafka and Spark Streaming on high-performance distributed computing platforms like AWS EMR and Hadoop clusters.
•Extensive experience with data visualization tools such as Tableau, Power BI, QlikView, and Python/R-based libraries for presenting complex insights clearly and effectively.
•Well-versed in modern AI/ML platforms including AWS SageMaker, Azure ML Studio, TensorFlow, and Keras for scalable experimentation and deployment of deep learning models.
•Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
•Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure Synapse.
•Experience developing Spark applications using PySpark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal sketch follows this summary).
•Good understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Driver Node, Worker Node, Stages, Executors and Tasks.
•Good understanding of Big Data Hadoop and YARN architecture, along with various Hadoop daemons such as Job Tracker, Task Tracker, NameNode, DataNode, Resource/Cluster Manager, and Kafka (distributed stream processing).
•Strong hands-on experience with both NoSQL (MongoDB, HBase) and traditional RDBMS (SQL Server, MySQL), supporting hybrid data architecture and integration workflows.
•Collaborative team player with experience working in Agile/Scrum environments, familiar with tools such as JIRA, ProjectLibre, and GitHub for project tracking, sprint planning, and version control.
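Illustrative PySpark sketch of the Databricks ETL pattern described above. The file paths, column names, and aggregation are hypothetical stand-ins, not any client's actual pipeline:

# Minimal PySpark ETL sketch: ingest multiple file formats, transform,
# and aggregate usage data. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-etl").getOrCreate()

# Extract: read raw events from Parquet and reference data from CSV
events = spark.read.parquet("/mnt/raw/usage_events")  # hypothetical path
customers = spark.read.option("header", True).csv("/mnt/raw/customers.csv")

# Transform: join, derive an event date, and filter bad records
enriched = (
    events.join(customers, on="customer_id", how="inner")
          .withColumn("event_date", F.to_date("event_ts"))
          .filter(F.col("duration_sec") > 0)
)

# Aggregate: daily usage per customer segment
daily_usage = (
    enriched.groupBy("segment", "event_date")
            .agg(F.countDistinct("customer_id").alias("active_customers"),
                 F.sum("duration_sec").alias("total_duration_sec"))
)

daily_usage.write.mode("overwrite").parquet("/mnt/curated/daily_usage")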
TECHNICAL EXPERTISE
Data Sources
Snowflake (on AWS), PostgreSQL, MS SQL Server, MongoDB, MySQL, HBase, Amazon Redshift, Teradata.
Statistical Methods
Hypothesis Testing, ANOVA, Principal Component Analysis (PCA), Time Series, Correlation (Chi-square test, covariance), Multivariate Analysis, Bayes Law.
Supervised and Unsupervised Learning
XGBoost, LightGBM, Artificial Neural Networks (ANN), Autoencoders, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), LSTM, Bi-directional LSTM, Multi-Layer Perceptrons, Linear Regression, Polynomial Regression, Logistic Regression, SVM, Random Forests, Decision Trees, K-NN, Naive Bayes, K-Means, Hierarchical Clustering, Association Rule Learning, Reinforcement Learning, Self-Organizing Maps
Natural Language Processing (NLP):
Text Analytics, Text Processing (Tokenization, Lemmatization), Text Classification, Text Clustering, Named Entity Recognition (NER), Word Embeddings and Word2Vec, POS Tagging, Speech Analytics, Sentiment Analysis.
Data Visualization
Tableau, Python (Matplotlib, Seaborn), R(ggplot2), Power BI, QlikView
Languages
Python (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), R, SQL
Operating Systems
UNIX Shell Scripting (via PuTTY client), Linux, Windows, Mac OS
Other tools and technologies
TensorFlow, Keras, AWS ML, Azure ML Studio, GCP, NLTK, SpaCy, Gensim, MS Office Suite, Google Analytics, GitHub, AWS (EC2/S3/Redshift/EMR/Lambda)
EDUCATION:
●Master's degree, USA.
PROFESSIONAL EXPERIENCE
Client: Lam Research, Phoenix, AZ. Oct 2024 – Present
Role: Data Scientist
Responsibilities:
•Involved in requirement analysis, application development, application migration, and maintenance using Software Development Lifecycle (SDLC) and Python technologies.
•Participated in all phases of data acquisition, data cleaning, developing models, validation, and visualization to deliver data science solutions.
•Performed data parsing, ingestion, manipulation, modeling, and preparation, including describing data contents and computing descriptive statistics, using methods such as regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
•Built Support Vector Machine models to detect fraudulent and dishonest customer behavior using several Python packages: Scikit-learn, NumPy, SciPy, Matplotlib, and Pandas (see the sketch at the end of this section).
•Used AWS S3, DynamoDB, AWS Lambda, and AWS EC2 for data storage and model deployment. Worked extensively on AWS services such as SageMaker, Lambda, Lex, EMR, S3, and Redshift.
•Used Amazon Transcribe to obtain call transcripts and performed text processing (cleaning, tokenization, and lemmatization).
•Performed feature engineering such as feature interaction generation, feature normalization, and label encoding with Scikit-learn preprocessing.
•Designed the data marts in dimensional data modeling using Snowflake schemas.
•Generated executive summary dashboards to display performance monitoring measures with Power BI.
•Developed and implemented predictive models using Artificial Intelligence/Machine Learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forest, K-Means clustering, KNN, PCA, and regularization for data analysis.
•Leveraged AWS SageMaker to build, train, tune, and deploy state-of-the-art AI/ML and Deep Learning models.
•Built classification models including Logistic Regression, SVM, Decision Tree, and Random Forest.
•Used the Pandas API to put the data in time-series and tabular format for easy timestamp data manipulation and retrieval.
•Created ETL specification documents, flowcharts, process workflows, and data flow diagrams.
•Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake Schemas.
•Worked on snowflaking the dimensions to remove redundancy.
•Created reports utilizing Excel services and Power BI.
•Created new documentation, designed DB schemas, and created new ERDs using Visio.
•Implemented various features of Crystal Reports such as parameters, report filters, formulas, and variables.
•Developed data dictionary for project coordination; created SQL create, alter, and insert scripts.
•Acquired access to data to support data analysis work.
•Performed data analysis and profiling of source data to better understand the source systems.
•Applied Deep Learning (RNN) to find the optimal route for guiding the tree-trim crew.
•Used the XGBoost algorithm to predict storms under different weather conditions and Deep Learning to analyze the severity of post-storm effects on power lines and circuits.
•Worked with Snowflake SaaS for cost-effective data warehouse implementation on the cloud.
•Developed data mapping, transformation, and cleansing rules for the Master Data Management architecture involving OLTP, ODS, and OLAP.
•Produced A/B test readouts to drive launch decisions for search algorithms including query refinement, topic modeling, and signal boosting and machine-learned weights for ranking signals.
•Implemented an image-recognition anomaly detector (CNN + SVM) using convolutional neural networks to detect fraudulent purchases.
•Designed and developed Power BI graphical and visualization solutions with business requirement documents and plans for creating interactive dashboards.
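Below is a minimal sketch of the SVM fraud-detection approach referenced above, assuming a hypothetical transactions.csv with made-up feature names; the production features and data source differ:

# Illustrative SVM fraud-detection sketch with scikit-learn.
# Feature names and the input file are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

df = pd.read_csv("transactions.csv")  # hypothetical input
X = df[["amount", "txn_per_day", "account_age_days"]]  # hypothetical features
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scale features before the RBF-kernel SVM; weight classes for imbalance
model = make_pipeline(StandardScaler(),
                      SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))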
Client: American Express.
Role: Data Scientist / AI Engineer. June 2022 – Sep 2024
Responsibilities:
•Participated in status meetings with the clients, conducting reviews of the deliverables and defect tracking.
•Used Python and Tableau to identify customer trends, driving a 15% increase in revenue for ‘Commercial’ and ‘Government Programs’ for the client (UPMC).
•Collected and analyzed data to solve business problems, clearly communicating the benefits, risks, and trade-offs to the business using Logistic Regression, Decision Tree, and Random Forest models.
•Established and managed a data governance process to define data quality and standards, resulting in a 10% reduction in operational costs.
•Collaborated with business to understand company needs and devise possible solutions using data.
•Analyzed and solved business problems, and found patterns and insights within structured and unstructured data.
•Cleaned and analyzed data and performed feature engineering as part of data pre-processing.
•Implemented new statistical and mathematical methodologies to identify hidden patterns.
•Used algorithms and programming to efficiently go through large datasets and apply treatments, filters, and conditions as needed.
•Created meaningful data visualizations to communicate findings and relate them back to how they create business impact using Tableau.
•Presented proposals and results in a clear manner backed by data and coupled with actionable conclusions to drive business decision.
•Designed data model from the scratch to assist the team in data collection to be used for Tableau visuals.
•Used Joins, correlated and non-correlated sub-queries for complex business queries involving multiple tables from different databases.
•Performed prediction tasks using deep learning models such as RNN, LSTM, etc. on time series data.
•Created E/R Diagrams, Data Flow Diagrams, grouped and created the tables, validated the data, identified lookup tables.
•Built ensemble models in Python; evaluated and validated the models with test data and chose the best option (see the sketch at the end of this section).
•Extensive use of Python machine learning libraries: NumPy, Pandas, Keras, PyTorch, Scikit-learn, Matplotlib, Seaborn, Statsmodels, PySpark, Deeplearning4j, etc.
•Responsible for the design and development of advanced Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
•Performed data blending and built Tableau dashboards to provide an aggregated data view to decision makers on a real-time basis.
•Supported the maintenance of up-to-date procedures for the program's wider governance library, including data dictionaries for sources used by the team.
•Ensuring conformance with Data Governance principles and standards, Data Privacy etc.
•Integrate data from different source systems such as AWS, Hadoop (MapReduce), data warehouse and Big Data platforms.
•Wrote complex SQL queries, procedures, and functions to retrieve data from DBMS.
•Measured and boosted the performance of solution bundles to enhance the value of each sale; conducted market basket analysis to come up with new bundles.
•Actively participated in the team’s cross-training initiative with a focus on automation initiatives to lead, inspire and coach other employees in an inclusive and ever-changing work environment.
•Conducted design sessions with Business Analysts and ETL developers to come up with a design that satisfies the organization’s requirements.
•Worked with Database Administrators (DBAs) to finalize the physical properties of the tables, such as the partition key, based on volumetrics.
•Member of Data model and Design Review group that oversees data model changes and system design across Enterprise Data Warehouse.
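A minimal sketch of the model evaluation/selection step referenced above; synthetic data stands in for the prepared features, and the candidate models are illustrative:

# Sketch of the ensemble/model-selection step: compare candidate
# classifiers with cross-validation and keep the best performer.
# Synthetic data stands in for the prepared features/labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=6),
    "forest": RandomForestClassifier(n_estimators=200),
}
# Soft-voting ensemble over the individual candidates
candidates["ensemble"] = VotingClassifier(
    estimators=list(candidates.items()), voting="soft")

for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")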
Client: Nationwide Mutual Insurance, Phoenix, AZ, USA.
Role: Data Scientist. Sep 2021 to May 2022
Responsibilities:
•Working closely with the marketing team to deliver actionable insights from huge volumes of data coming from different marketing campaigns and customer interaction metrics such as web portal usage, email campaign responses, public site interaction, and other customer-specific parameters.
•Characterizing false positives and false negatives to improve a model for predicting customer churn rate.
•Consumer segmentation and characterization to predict behavior.
•Analyzing promoters and detractors (defined using Net Promoter Score).
•Used advanced analytics techniques like Clustering and Linear Regression to gain valuable insights that enabled clients to track Business performance and optimize pricing for vendors.
•Completed Exploratory data analysis (EDA) in Jupyter and communicated results to Business.
•Built models for highly imbalanced data sets using Bias/variance trade-off.
•Developed and interpreted information by creating dashboards using Tableau that assisted management with decision making to enhance customer service.
•Reviewed and analyzed customer behavior and trends to assist in solving business problems.
•Responsible for generating ideas for product changes that improve key metrics.
•Wrote SQL queries to perform data analysis and data modeling, and prepared data mapping documents to explain the transformation rules from source to target tables.
•Provided data analytics of the web-portal to the team for feedback and improvement.
•Working with a variety of business units throughout the organization to help translate their requirements into specific analytical deliverables.
•Outlier detection using high-dimensional historical data. Acquiring, cleaning and structuring data from multiple sources and maintain databases/data systems. Identifying, analyzing, and interpreting trends or patterns in complex data sets.
•Developing, prototyping, and testing predictive algorithms. Filtering and "cleaning" data and reviewing computer reports, printouts, and performance indicators to locate and correct code problems.
•Developing and implementing data collection systems and other strategies that optimize statistical efficiency and data quality.
•Used different statistical models like regression and classification models to create contact scoring models. Also used clustering to the customer data profiles to do customer segmentation and analysis.
•Interpreting data analysis results using statistical techniques and providing ongoing reports.
•Leveraged Machine learning algorithms using Microsoft Azure Machine Learning Platform.
•Built a recommender system based on clients' past renewal history to upsell and cross-sell related products and services; created a recommendations engine that finds related customers and products (see the sketch at the end of this section).
•Worked with very large databases (VLDB) in Azure to store large volumes of information.
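A minimal sketch of an item-based recommender of the kind referenced above, using cosine similarity over a synthetic customer-by-product purchase matrix; the actual engine used client renewal history:

# Item-based recommender sketch over a customer x product purchase
# matrix, using cosine similarity. Data is synthetic/hypothetical.
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Rows: customers, columns: products, values: purchase counts (synthetic)
matrix = pd.DataFrame(
    np.random.poisson(0.3, size=(100, 8)),
    columns=[f"product_{i}" for i in range(8)])

# Item-item similarity from co-purchase patterns
sim = pd.DataFrame(cosine_similarity(matrix.T),
                   index=matrix.columns, columns=matrix.columns)

def recommend(customer_row, top_n=3):
    """Score unpurchased products by similarity to what the customer owns."""
    owned = customer_row[customer_row > 0].index
    scores = sim[owned].sum(axis=1).drop(owned, errors="ignore")
    return scores.nlargest(top_n)

print(recommend(matrix.iloc[0]))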
Client: HP, Houston, TX.
Role: ML Engineer. Oct 2019 to Aug 2021
Responsibilities:
•As one of the youngest employees in the company to be promoted to the role of team lead, I led teams to gather business requirements, build BI applications that provide insights, and deliver business solutions based on discoveries made.
•Analyzed data quantitatively and qualitatively with different tools like SAS, Tableau, R Programming, Talend, Informatica and Qlik Sense.
•Collaborated with the business community and other BI team members on the definition, construction and deployment of reports, dashboards, metrics and analytic components, as well as data migration design and testing.
•Scheduled data refresh on Tableau server for weekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately and sent those to production after editing.
•Experience in troubleshooting and performance tuning of reports, and resolving issues within Tableau Server and reports.
•Developed many Python applications for extracting data from different sources such as PostgreSQL and exporting it to formats desired by clients (see the sketch at the end of this section).
•Operationalized the extracts generated from the python applications so that the extracts are automatically stored in AWS S3 and emails are sent to the respective clients based on the frequency configured.
•Built complex views to transform data for reporting needs from various external data sources such as LYNX, IHS, and SFDC.
•Strong experience in extracting, transforming, and loading (ETL) data from various sources into Data Warehouses and Data Marts using Informatica PowerCenter (Designer, Workflow Manager, Workflow Monitor, Metadata Manager).
•Extensively worked on Informatica Designer components such as Source Analyzer, Transformation Developer, Mapplet Designer, and Mapping Designer.
•Expertise in Data Cleansing at staging level with utilization Informatica Power Center.
•Designed and developed Informatica Mappings and Sessions based on business user requirements and business rules to load data from source flat files and oracle tables to target tables.
•Worked on the design, development, and testing of Informatica mappings. Created tables and stored procedures in SQL for data manipulation and retrieval, and performed database modifications using SQL, PL/SQL, stored procedures, triggers, and views in Oracle.
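A minimal sketch of the PostgreSQL-extract-to-S3 flow referenced above; the connection string, query, bucket, and key are hypothetical placeholders:

# Pull a dataset from PostgreSQL, write it to a client-requested format,
# and store it in S3. All identifiers here are hypothetical.
import boto3
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:pass@host:5432/reporting")
df = pd.read_sql("SELECT * FROM monthly_sales WHERE region = 'west'", engine)

# Export in the format requested by the client
local_path = "/tmp/monthly_sales_west.csv"
df.to_csv(local_path, index=False)

# Ship the extract to S3 for scheduled delivery
s3 = boto3.client("s3")
s3.upload_file(local_path, "client-extracts-bucket",
               "extracts/monthly_sales_west.csv")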
Client: TD Bank.
Role: Data Analyst. July 2017 to Sep 2019
Responsibilities:
•Working as a Data Analyst managing all Data Analytics, Data Mining, Tableau, ETL, and Data Warehouse related development activity.
•POC for the most critical and performance-related modules in the project.
•Provided daily support to business users, developing new reports using Power BI for the Finance, Marketing, and Sales teams.
•Fluent in DAX and Power Query.
•Worked on ticketing metrics Dashboard and BI reports migration project from Hyperion, OBIEE, SWR (Simple Web Report), SSRS to Power BI.
•Created complex SQL scripts to extract data from Oracle, SQL Server, AWS Athena, Data Lake, Hive to make data available for enterprise reporting.
•Used Power BI and Power Pivot to develop data analysis prototype and used Power View and Power Map to visualize reports.
•Gathered report requirements; collaborated with business users using wireframe mockups.
•Designing, developing, and delivering business intelligence solutions using SQL Server Integration Services (SSIS), Analysis Services (SSAS), and Reporting Services (SSRS).
•Mentoring the junior and cross-cultural colleagues for business and technical knowledge.
•Member of Design and Review Team for all new enhancement requests.
•Delivered documentation related to the delivery and deployment of the production system.
•Occasional support for System Testing / User Acceptance Testing /Simulation Environment Testing/ Production.
•Automated BI reports for a secure virtual server environment.
•Planned, designed, and implemented diverse business intelligence solutions.
•Designed customized data collection models for specific visualization tasks.
•Improved data gathering, Analysis, and visualization procedures with strategic optimizations.
•Generated reports, updated spreadsheets, and presented results (see the sketch at the end of this section).
•Interfaced with a cross-functional team of business analysts, developers, and technical support professionals to determine a comprehensive list of requirement specifications for new applications.
•Worked closely with other business analysts, development teams, and infrastructure specialists to deliver high availability solutions for mission-critical applications.
•Tracked and executed the delivery process under tight deadlines.
•Worked in various phases of SDLC, starting from requirements gathering, Analysis, documentation, UAT and production deployment, support & enhancements.
•Performance tuning guidelines for the production system.
•Prepared detailed reports on updates to project specifications, progress, identify conflicts, and team activities.
•Worked on End to End delivering Data Analytics Solutions.
•Worked on performance improvement of reports.
•Implemented PL/SQL procedures to develop the ETL (Extract, Transform, Load) process that collects the data required for analysis from Oracle databases.
•Created, updated, and deleted master data in the Oracle and Azure databases.
•Working on the data related issues and communicating resolutions with other solution domains.
•Analyzing the raw data, drawing conclusions, and developing the models accordingly.
•Developed the Leadership dashboards to monitor the performance of various analytics.
•Owned all business reviews (WBR, MBR, QBR, and YBR) and implemented automation and improvements.
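A minimal sketch of the automated reporting extract referenced above; the production load used PL/SQL procedures, and the connection details, table, and output path here are hypothetical:

# Run an aggregate query against Oracle and refresh the Excel report
# consumed by the weekly business review. Identifiers are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "oracle+oracledb://report_user:secret@dbhost:1521/?service_name=ORCLPDB1")

query = """
    SELECT team, TRUNC(created_dt, 'IW') AS week, COUNT(*) AS tickets
    FROM support_tickets
    GROUP BY team, TRUNC(created_dt, 'IW')
"""
weekly = pd.read_sql(query, engine)

# Refresh the spreadsheet feeding the WBR dashboard
weekly.to_excel("/reports/weekly_ticket_volume.xlsx", index=False)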
Client: Windstream Communications, Little Rock, AR
Role: Data Analyst. July 2015 to June 2017
Responsibilities:
•Collected data from the sales, marketing, and finance departments for use in establishing organizational structure.
•Managed the planning and development of design and procedures for metrics reports.
•Successfully interpreted data in order to draw conclusions for managerial action and strategy (WBR, QBR, MBR).
•Coordinated statistical data analysis, design and information flow.
•Synthesized current business intelligence data to produce reports and polished presentations, and recommended changes.
•Utilized Google Analytics and Google Tag Manager and implemented new scripts that increased performance by 25%.
•Developed database objects, including tables, views and materialized views using SQL.
•Presented detailed reports on the meaning of gathered data to members of management and helped them identify scenarios utilizing modifications in the data.
•Determined work procedures, prepared work schedules, and expedited workflow.
•Developed, performed, and managed all aspects of financial planning and budgeting.
•Performed business analysis and solutions around data sources using SQL and data profiling tools.
•Worked with the client in writing alerts, dashboards using SPL (search processing language).
•Experience in System and network analysis, Malware analysis, and Intrusion Detection.
•Created event types, tags, and field lookups using regular expressions (a short sketch follows).
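A short illustrative sketch of regex-based field extraction of the kind used for event types and field lookups; the log format is hypothetical:

# Parse key=value fields out of a raw event line with a regular
# expression. The log line below is a made-up example.
import re

LOG_LINE = '2017-03-02 10:15:01 host=web01 status=503 path="/checkout" rt=1.42'

FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')

def extract_fields(line):
    """Return key/value pairs parsed from a raw event line."""
    return {k: v.strip('"') for k, v in FIELD_RE.findall(line)}

print(extract_fields(LOG_LINE))
# {'host': 'web01', 'status': '503', 'path': '/checkout', 'rt': '1.42'}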