Gnaneswar Vipparthi
Data Scientist
OBJECTIVE: Experienced and results-driven Data Scientist with 6+ years of expertise in leveraging advanced analytics, machine learning algorithms, and data-driven strategies to solve complex business problems. Proficient in data mining, statistical analysis, predictive modelling, and big data technologies.
PROFILE SUMMARY:
Experienced Data Scientist with 6+ years of expertise in leveraging machine learning, deep learning, and statistical modelling techniques to derive actionable insights and solve complex business problems across various industries.
Proficient in programming languages including Python, R, and SQL, with extensive experience in data preprocessing, feature engineering, and data wrangling to produce clean, structured data.
Skilled in building and deploying predictive models, utilizing algorithms like Random Forest, Gradient Boosting Machines, SVM, KNN, and Neural Networks to address business challenges and improve decision-making processes.
Extensive experience with big data technologies such as Hadoop, Spark, and Hive for handling and processing large datasets in distributed computing environments. Experienced with cloud platforms including AWS, Azure, and Google Cloud for deploying models, managing data, and implementing scalable machine learning solutions.
Expertise in data visualization using tools like Tableau, Power BI, Matplotlib, Seaborn, and Plotly to communicate findings and insights through interactive dashboards and reports. Strong understanding of statistical analysis and hypothesis testing, with proficiency in A/B testing, regression analysis, and time-series forecasting.
Hands-on experience in working with NoSQL databases like MongoDB and Cassandra, and relational databases such as MySQL, PostgreSQL, and Oracle to manage and query structured and unstructured data.
Familiarity with Docker and Kubernetes for containerization and deployment of data science models and workflows.
Utilized NLP techniques and libraries such as spaCy, NLTK, and Transformers for text mining, sentiment analysis, and language modelling. Built and maintained CI/CD pipelines for machine learning workflows using tools like Jenkins, with Git for version control. Experienced in using TensorFlow, Keras, and PyTorch to develop and train deep learning models for advanced AI applications such as image and speech recognition.
Strong knowledge of data pipelines and workflow automation using Apache Airflow, Luigi, and Kedro to streamline data processing and model deployment. Extensive experience in anomaly detection and outlier analysis using techniques like Isolation Forest, Autoencoders, and Z-scores to detect fraudulent activities or rare events in data.
Proficient in using Graph Databases like Neo4j for modelling and analysing complex relationships and networks in data, especially for social network analysis and recommendation engines.
Familiarity with Time-Series Analysis and forecasting techniques using models like ARIMA, SARIMA, Prophet, and LSTM for applications in financial analysis, sales prediction, and demand forecasting.
Experience with Edge Computing and deploying models on low-power devices using TensorFlow Lite and ONNX to perform real-time inference at the edge. Solid experience with version control systems like Git and GitHub, and collaboration tools like JIRA for project management in data science workflows.
Proven ability to build and manage data lakes using technologies like AWS S3, Azure Blob Storage, and Google Cloud Storage to store vast amounts of structured and unstructured data for analysis.
Strong understanding of data ethics and privacy regulations such as GDPR and HIPAA, ensuring compliance while handling sensitive data in machine learning projects. Proficient in using ETL tools like Talend, Apache NiFi, and Informatica to extract, transform, and load large volumes of data into data lakes or warehouses for analysis.
Experience in developing chatbots and conversational AI using platforms like Dialogflow, Rasa, and Microsoft Bot Framework for automating customer interactions.
Skilled in model performance tuning and hyperparameter optimization using GridSearchCV, RandomizedSearchCV, Optuna, and Hyperopt to enhance the accuracy and efficiency of machine learning models. Proficient in Docker and Kubernetes for containerization, orchestration, and deployment of machine learning models in cloud environments.
TECHNICAL SKILLS:
Programming Languages: Python, R, SQL, Java, C++
Data Analysis & Manipulation: Pandas, NumPy, SciPy, dplyr, data.table
Machine Learning Algorithms: Linear Regression, Logistic Regression, SVM, Random Forest, Gradient Boosting, KNN
Deep Learning: TensorFlow, Keras, PyTorch, Theano, Caffe
Natural Language Processing (NLP): spaCy, NLTK, TextBlob, Gensim, Transformers, BERT
Data Visualization: Matplotlib, Seaborn, Plotly, Tableau, Power BI, ggplot2
Big Data Technologies: Apache Hadoop, Apache Spark, Hive, Pig, Flink
Cloud Computing Platforms: AWS, Google Cloud Platform (GCP), Azure, IBM Cloud
Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake, Teradata
Databases: MySQL, PostgreSQL, MongoDB, Cassandra, SQL Server, Oracle
ETL Tools: Apache NiFi, Talend, Informatica, Pentaho
Serverless Computing: AWS Lambda, Google Cloud Functions
CI/CD for Data Science: Jenkins, GitLab CI, CircleCI
Containerization & Virtualization: Docker, Kubernetes
WORK EXPERIENCE:
Kroger Company
Data Scientist Cincinnati, Ohio, USA Sep 2024 – Present
Description: The Kroger Co. is one of the largest grocery retailers. Developed statistical models, machine learning algorithms, and predictive analytics solutions; analysed large volumes of complex data to extract insights; designed experiments to test hypotheses and measure the effectiveness of solutions; and handled missing values, outliers, and inconsistencies in the data to ensure accuracy and reliability.
Responsibilities:
Analysed high-volume retail data using Python, R, and SQL to extract customer behaviour patterns and drive decisions in Kroger’s supply chain and store operations. Built machine learning models with Random Forest, Gradient Boosting, and XGBoost to forecast product demand, reduce spoilage, and optimize in-store inventory placement.
Developed real-time analytics pipelines with Apache Kafka, AWS Lambda, and Apache Airflow to process point-of-sale and e-commerce transactions across Kroger’s digital platforms.
Utilized Apache Spark, Hadoop, and Hive to process terabytes of transactional and customer loyalty data, enhancing personalization strategies and marketing effectiveness. Designed interactive dashboards and reports using Tableau, Power BI, and Matplotlib to visualize store-level performance, customer churn, and sales trends.
Applied TensorFlow, Keras, and PyTorch to build deep learning models for recommendation systems and visual product search capabilities on Kroger’s online store. Performed Natural Language Processing (NLP) tasks using spaCy and NLTK to analyse customer feedback, product reviews, and chatbot transcripts for sentiment and intent.
Managed scalable deployments on AWS and Google Cloud Platform (GCP) for training and serving machine learning models used in pricing and promotion optimization. Developed and deployed microservices with Flask and Django to integrate analytics and model outputs with Kroger’s enterprise retail systems and mobile applications.
Leveraged Neo4j to map supply chain networks and uncover inefficiencies or bottlenecks in product sourcing and distribution routes. Followed Agile development practices with tools like JIRA for sprint planning and team collaboration in cross-functional data science projects. Tuned model performance through GridSearchCV, Optuna, and RandomizedSearchCV, improving accuracy and reducing overfitting in predictive analytics workflows.
Implemented secure API integration using OAuth 2.0 and JWT to control access to customer data and ensure compliance with data protection policies. Used Snowflake and dbt (Data Build Tool) to transform and model retail data in a modern data warehouse environment, enabling efficient reporting and analytics across merchandising and logistics teams.
Environment: Python, R, SQL, SciPy, Pandas, NumPy, XGBoost, Apache Kafka, AWS IoT, Apache Airflow, AWS Lambda, Apache Spark, Hadoop, Hive, Tableau, Matplotlib, Power BI, TensorFlow, Keras, PyTorch, spaCy, NLTK, Flask, Django, JIRA, GridSearchCV, Optuna.
ProMedica
Data Scientist Toledo, Ohio, USA Oct 2023 – Aug 2024
Description: ProMedica is a mission-based, not-for-profit health and well-being organization. Managed and organized large datasets for efficient analysis; ensured data quality, accuracy, and security, including compliance with privacy regulations; and developed and implemented machine learning algorithms to solve specific healthcare problems.
Responsibilities:
Built and maintained ETL workflows using Apache Airflow and Azure Data Factory to ensure smooth integration of data from EHR systems, wearable devices, and patient management platforms.
Created interactive dashboards and clinical reporting tools with Tableau, Power BI, and Plotly for hospital administrators and healthcare providers to track KPIs and patient outcomes. Developed predictive models using Scikit-learn, XGBoost, and LightGBM to identify high-risk patients and enable early interventions in chronic disease management.
Analysed patient and operational datasets using Python, SQL, and R to improve care delivery efficiency and reduce readmission rates across ProMedica's healthcare facilities.
Utilized Hadoop and Apache Spark to process large-scale healthcare data, including insurance claims, diagnostic imaging, and lab results, for operational analytics. Applied Natural Language Processing (NLP) with spaCy and NLTK to extract critical insights from unstructured clinical notes and improve documentation quality.
Employed Docker and Kubernetes to deploy machine learning models as microservices in clinical decision support systems. Integrated Neo4j to model relationships in patient histories, treatment plans, and healthcare provider networks for personalized care recommendations. Leveraged Azure Synapse Analytics and Snowflake for cloud-based storage, analytics, and compliance with healthcare data governance policies.
Streamlined real-time data ingestion using Apache Kafka to monitor ICU vitals and enable timely alerts to medical staff.
Deployed deep learning models using TensorFlow and PyTorch for early detection of anomalies in radiology images and automated triaging in diagnostic workflows. Utilized Google BigQuery for cost-efficient querying of massive clinical datasets, enabling fast population health analytics and trend identification.
Developed Flask and FastAPI-based microservices to expose predictive analytics models to hospital management systems in real time. Employed Elasticsearch to enable full-text search on electronic medical records, improving accessibility and speed in clinical data retrieval. Built secure, scalable APIs with OAuth 2.0 and JWT authentication to ensure HIPAA-compliant access to sensitive patient data from mobile and web apps.
Integrated FHIR and HL7 data standards for seamless interoperability between analytics systems and ProMedica’s EMR platforms.
Environment: Python, SQL, R, Apache Airflow, Azure Data Factory, Tableau, Power BI, Plotly, Scikit-learn, XGBoost, LightGBM, Hadoop, Apache Spark, NLP, spaCy, NLTK, Docker, Kubernetes, Neo4j, Azure Synapse Analytics, Snowflake, Apache Kafka, TensorFlow, PyTorch, Google BigQuery, Elasticsearch, OAuth 2.0, JWT, FHIR, HL7.
Kotak Mahindra Bank
Data Analyst / Data Scientist Bangalore, India Jan 2021 – June 2023
Description: Kotak Mahindra Bank is a prominent Indian banking and financial services institution. Analysed and interpreted large sets of financial data to help the bank make informed decisions regarding customer needs, market trends, and performance improvements; created reports and visual presentations that summarized key findings, helping management understand patterns in customer behaviour, loan performance, and market opportunities.
Responsibilities:
Extracted and analysed large volumes of financial data using SQL, Python, and R to improve customer targeting, transaction analysis, and portfolio optimization. Built predictive models for credit scoring and fraud detection using scikit-learn, TensorFlow, and XGBoost, enhancing risk assessment and customer segmentation.
Developed dashboards and reports using Tableau, Power BI, and Looker for executive visibility into loan performance, customer churn, and revenue trends. Utilized Hadoop, Apache Spark, and Hive for distributed processing of banking transactions, customer history, and ATM network data. Integrated data warehousing solutions using Snowflake, BigQuery, and Amazon Redshift to support scalable and efficient financial reporting.
Applied NLP with spaCy, NLTK, and Transformers to analyse call centre transcripts and customer complaints for sentiment classification and service improvements.
Processed and stored semi-structured data using MongoDB and Cassandra, and indexed datasets with Elasticsearch for high-performance retrieval. Implemented TextBlob, Gensim, and BERT for financial news sentiment analysis and customer feedback mining to inform strategic decisions. Used Git, GitHub, and Bitbucket for version control and implemented CI/CD with Jenkins and GitLab CI/CD for seamless deployment of analytics pipelines.
Conducted financial time series forecasting and anomaly detection using Prophet, ARIMA, and LSTM models to support investment and treasury operations.
Designed and maintained scalable data pipelines using Apache Airflow, NiFi, and AWS Glue to automate ETL workflows across multiple banking systems. Implemented secure API integrations with internal and external systems using Flask, FastAPI, and OAuth 2.0 for seamless data exchange and service interoperability.
Deployed containerized data science models using Docker, orchestrated with Kubernetes, for real-time fraud detection and credit scoring. Utilized Kafka and AWS Kinesis for real-time streaming of transactions and customer activity logs to enhance fraud monitoring and dynamic risk scoring.
Conducted graph-based analysis using Neo4j and GraphQL to detect fraudulent account networks and complex customer relationships. Developed secure data access layers with JWT, LDAP, and Keycloak to ensure regulatory compliance and role-based access control in sensitive banking systems.
Environment: SQL, Python, R, TensorFlow, PyTorch, scikit-learn, Hadoop, Spark, Hive, Tableau, Power BI, Matplotlib, Snowflake, Redshift, BigQuery, spaCy, NLTK, ArcGIS, MongoDB, Cassandra, Elasticsearch, NoSQL databases, Jenkins, GitLab CI/CD, Prophet, ARIMA, LSTM, Apache Airflow, NiFi, AWS Glue, FastAPI, OAuth 2.0, Docker, Kubernetes, AWS Kinesis, Neo4j, GraphQL, JWT, LDAP, Keycloak.
Max Life Insurance
Data Analyst Bangalore, India June 2018 – Dec 2020
Description: Max Life Insurance is one of the leading private-sector life insurance companies, offering a wide range of life insurance products. Collected and organized large datasets from various sources, such as customer records, policies, and claims, and analysed patterns in customer behaviour, claims, and market trends to provide insights that helped improve the company’s products and services.
Responsibilities:
Extracted, cleaned, and processed large datasets from multiple sources, including customer records, policy details, and claims data, using SQL and Python to ensure data accuracy and integrity for analysis.
Developed and maintained dashboards and reports using Tableau and Power BI to communicate key performance indicators (KPIs) such as policyholder retention, claims ratio, and premium growth to stakeholders.
Implemented automated data pipelines for the efficient collection, transformation, and analysis of large volumes of insurance data, using tools like Apache Airflow and ETL processes.
Utilized Hadoop and Spark to process and analyse large-scale insurance datasets, improving operational efficiency and enabling real-time data analytics for enhanced decision-making.
Implemented natural language processing (NLP) techniques using spaCy and NLTK to extract insights from customer feedback, policy documents, and claims reports, improving service offerings and customer satisfaction.
Managed unstructured data using NoSQL databases like MongoDB and Cassandra, enabling efficient storage and retrieval of customer interactions, policy details, and claim histories.
Leveraged Google BigQuery and Redshift for high-performance data querying and analysis, optimizing the company’s ability to analyse large volumes of data quickly for business insights.
Collaborated with IT teams to implement CI/CD pipelines using GitLab CI and Jenkins, automating the deployment of data models and ensuring consistent integration of new data sources into the analysis pipeline.
Implemented real-time stream processing and alerting systems with Apache Flink and Kafka Streams to detect anomalies in claim submissions and transaction patterns.
Environment: SQL, Python, Tableau, Power BI, Apache Airflow, ETL, Hadoop, Spark, spaCy, NLTK, MongoDB, Cassandra, Google BigQuery, Redshift, GitLab CI, Jenkins, Apache Flink, Kafka Streams.
EDUCATION:
INDIANA WESLEYAN UNIVERSITY, Master of Science in Data Analytics, USA, 2023 – 2025
*************@*****.***