Name: Rupesh Gundeti
Email ID: ****************@*****.*** Phone no. 913-***-****
Professional Summary:
●As a Data Scientist/Analyst with 8+ years of experience, I have a successful history of creating machine learning models using NLP techniques to improve cost efficiency for business use cases. My expertise lies in implementing advanced AI technologies, with a particular focus on large language models (LLMs) and vector databases.
●Applied transfer learning techniques to adapt pre-trained models to new tasks in Machine Learning, NLP, and Generative AI, reducing training time and resource requirements.
●Implemented computer vision algorithms for image analysis tasks, such as object detection and image classification, alongside skills in Machine Learning, NLP, and Generative AI.
●Developed NLP solutions, including sentiment analysis and text generation, using state-of-the-art models like BERT and GPT, showcasing expertise in Machine Learning, NLP, and Generative AI.
●Demonstrated expertise in statistical modeling and analysis, applying advanced techniques such as regression analysis, classification, and clustering to extract meaningful insights and drive informed decision-making.
●Skilled in utilizing ANOVA to evaluate differences in means between groups, Bayesian statistics to update beliefs based on prior knowledge and evidence, and resampling methods like bootstrapping for estimating model performance and generalization.
●Experienced in dimensionality reduction techniques such as PCA and LDA, applying them to effectively reduce the complexity of datasets while preserving important information.
●Proficient in time series analysis, employing methods like ARIMA and exponential smoothing to model and forecast sequential data, enabling proactive decision-making based on future trends (an illustrative forecasting sketch follows this summary).
●Knowledgeable in survival analysis techniques like Kaplan-Meier estimation and Cox proportional hazards model, facilitating the analysis of time-to-event data in various fields.
●Integrated ETL pipelines with cloud-based storage and processing services such as AWS S3 and Redshift, enabling seamless data integration.
●Created comprehensive documentation for ETL pipelines, including data mappings and transformation logic, to facilitate easy maintenance and troubleshooting.
●Conducted performance tuning and troubleshooting of ETL processes to identify and resolve issues, ensuring smooth data flow.
●Utilized change data capture (CDC) techniques in ETL pipelines to process incremental data updates, minimizing processing time and resource usage.
●Worked closely with data architects to design data models supporting ETL processes and aligning with business requirements.
●Managed ETL pipeline deployments and releases, ensuring minimal downtime and smooth transition to new versions.
●Regularly audited and monitored ETL pipelines to maintain data quality and compliance with data governance policies.
●Explored reinforcement learning algorithms for optimizing decision-making processes in dynamic environments, demonstrating proficiency in Machine Learning, NLP, and Generative AI.
●Designed custom model architectures to tackle specific business challenges, optimizing for performance and scalability.
●Directed model training and optimization initiatives, leveraging advanced techniques to enhance overall model performance.
●Implemented hyperparameter tuning strategies to enhance model generalization and performance.
●Ensured model interpretability by utilizing techniques such as SHAP and LIME, aiding stakeholders in understanding model decisions (see the interpretability sketch following this summary).
●Led deployment and scalability efforts for AI models, ensuring smooth integration into production environments.
●Optimized model performance for real-time applications, enhancing response times and resource utilization.
●Advocated for ethical AI practices, ensuring fairness and transparency in AI systems.
●Collaborated with cross-functional teams to effectively communicate AI solutions and their business impact.
●Committed to continuous learning, staying abreast of the latest AI and machine learning advancements.
●Developed and deployed machine learning models using AWS SageMaker, enhancing prediction accuracy and model performance across various business applications.
●Collaborated cross-functionally to implement AI applications with Hugging Face Transformers, enhancing natural language processing capabilities and driving innovation.
●Utilized Databricks on Azure for real-time data processing and analytics, facilitating quicker decision-making and operational improvements.
●Integrated Azure Cognitive Services APIs into existing applications, enhancing functionality and user experience.
●Built and optimized machine learning pipelines on Google Cloud Platform, leveraging its AI and ML services to develop innovative solutions.
●Designed and implemented deep learning models with Hugging Face Transformers, achieving state-of-the-art performance in various natural language understanding tasks.
●Managed and optimized AWS SageMaker instances, ensuring cost-effective and scalable machine learning infrastructure.
●Utilized Databricks notebooks for data exploration and rapid model prototyping, accelerating development cycles.
●Created visualizations and dashboards on Databricks to provide insights into data trends for stakeholders.
●Developed and maintained data pipelines in Databricks to ensure data quality and reliability.
●Implemented ETL processes in Databricks to transform raw data into usable formats, improving data accessibility.
●Conducted performance tuning and optimization of Spark jobs on Databricks, resulting in improved job execution times.
●Integrated Databricks with cloud services such as AWS and Azure for seamless data integration.
●Designed and implemented data lake architectures on Databricks, enabling efficient storage and retrieval of large data volumes.
●Architected data lakes on Azure using Databricks, improving data accessibility and enabling advanced analytics.
●Conducted performance optimizations on Azure Cognitive Services, improving resource utilization and processing speed.
●Leveraged Google Cloud Platform's AI Platform for training and deploying machine learning models, enabling scalable and efficient model management.
●Fine-tuned and deployed custom models with Hugging Face Transformers for specific business use cases, achieving exceptional performance.
●Orchestrated AWS SageMaker pipelines for seamless model training and deployment, ensuring efficient ML workflows.
●Provided training sessions on Databricks, Azure Cognitive Services, GCP, Hugging Face, and AWS SageMaker to internal teams, empowering them to utilize these technologies effectively.
●Collaborated with data engineers to design and implement data pipelines on Azure using Databricks, ensuring reliable data ingestion and processing.
●Integrated Azure Cognitive Services with chatbots for natural language understanding, enhancing customer engagement.
●Implemented AWS SageMaker machine learning models for predictive maintenance, reducing downtime and enhancing operational efficiency.
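The following is a minimal, illustrative sketch of the ARIMA-style time series forecasting described in this summary; the series, model order, and forecast horizon are hypothetical placeholders rather than details of a specific engagement.

```python
# Minimal ARIMA forecasting sketch (hypothetical data and model order).
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical monthly demand series.
demand = pd.Series(
    [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118],
    index=pd.date_range("2023-01-01", periods=12, freq="MS"),
)

# Fit a simple ARIMA(1, 1, 1); in practice the order would be chosen via
# ACF/PACF inspection or information criteria (AIC/BIC).
fitted = ARIMA(demand, order=(1, 1, 1)).fit()

# Forecast the next three months to support proactive planning.
print(fitted.forecast(steps=3))
```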
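Similarly, a minimal sketch of SHAP-based model interpretability as mentioned above; the model and features here are hypothetical stand-ins.

```python
# Minimal SHAP interpretability sketch (hypothetical model and data).
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))  # hypothetical tabular features
y = 2.0 * X[:, 0] + X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer attributes each prediction to per-feature contributions,
# helping stakeholders see which inputs drove a given model decision.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])
print(shap_values.shape)  # (5 samples, 4 features)
```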
Education Details:
Master's - Sept 2015 - May 2017
Company: Jacobs Solutions Inc. April 2021 - PRESENT
Role: Data Scientist
Washington, District of Columbia, United States
Responsibilities:
●Skilled in designing and implementing custom NLP pipelines, including data preprocessing, feature extraction, and model evaluation, using Python and libraries like NLTK and spaCy. Proficient in writing efficient SQL queries for data manipulation, extraction, and integration across relational databases.
●Utilized OpenAI Gym for reinforcement learning tasks in NLP, enhancing algorithm performance in dialogue systems and chatbots. Developed and deployed conversational AI systems using NLP techniques, enabling natural language interaction with applications and services.
●Extensive experience with sequence-to-sequence models for machine translation and text summarization, using frameworks like TensorFlow and PyTorch. Implemented topic modeling techniques such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) for document clustering and topic extraction.
●Proficient in using the Hugging Face Transformers library for fine-tuning pre-trained language models on domain-specific datasets, improving model performance for specialized tasks (an illustrative fine-tuning sketch follows this section). Developed text generation models using recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, generating coherent and context-aware text.
●Employed Topic Modeling and Word Embeddings for insights into customer behavior and conducted A/B testing using SageMaker experiments to identify top-performing models for deployment. Utilized chain-of-thought prompting tactics to enhance interactions and encourage meaningful conversations with customers.
●Leveraged Machine Translation and Text Summarization for multilingual support and concise responses. Utilized Google Vertex AI and LangChain for language model creation and deployment, emphasizing natural language generation, and integrated Question Answering for interactive customer support solutions.
●Enabled real-time predictions by integrating SageMaker with AWS IoT services for efficient streaming data processing. Experienced in MXNet for scalable deep learning computations, especially in distributed environments. Knowledgeable about vector databases for efficient storage and retrieval of high-dimensional data.
●Proficient in prompt engineering and using tools like Llama 2 and LangChain for optimizing model performance and generating high-quality outputs. Skilled in leveraging cloud services like Databricks, Azure, GCP, and AWS for scalable and cost-effective machine learning solutions.
●Understanding of MLOps principles and experience with tools like LLMOps and MLflow for managing and automating machine learning workflows. Integrated GPT for advanced language processing and utilized distributed computing technologies like Apache MXNet, CUDA, and TensorFlow for high-performance deep learning models.
●Utilized GAN libraries like TensorFlow-GAN for image generation tasks and employed chatbot capabilities enhanced by large language models (LLMs) for a user-friendly conversational interface. Developed prototypes and products for Optical Character Recognition (OCR) using statistics, machine learning, programming, and data modeling.
●Utilized the Azure Databricks ecosystem and MLflow for model deployment and monitoring of recommendation systems. Selected features using a greedy heuristic and optimized models using machine learning techniques such as time series methods, regression, random forests, and neural networks.
●Collaborated with cross-functional teams to translate complex challenges into data science projects, integrated vector databases for efficient storage and retrieval of embeddings. Experienced in analyzing and improving customer engagement in digital marketing through data-driven strategies, resulting in a 20% increase in customer engagement.
●Spearheaded the implementation of LLMOps (large language model operations) and MLOps practices, ensuring efficient deployment and management of large language models (LLMs) and machine learning models, respectively, using Databricks. This initiative streamlined the development process and improved model deployment efficiency by 10%.
●Utilized Databricks for data engineering tasks, including ETL processes, data pipelines, and advanced analytics. Created interactive and customizable dashboards, reports, and charts using Tableau. Integrated Grafana for real-time monitoring of AI systems, providing stakeholders with actionable insights and improving operational efficiency.
●Employed Language Modeling and Dependency Parsing for context-aware responses and developed asynchronous streaming processes using Vertex AI for real-time text analysis. Implemented Coreference Resolution to improve conversation coherence and enhance customer interactions.
●Employed various prompt types for prompt responses and detailed replies, improving customer engagement, and used Text Generation for personalized interactions. Leveraged Python for backend applications, integrating PySpark libraries for web scraping and data visualization using R Shiny App.
●Utilized Hugging Face Transformers for text analysis tasks. Proficient in using Keras as a high-level neural networks API, particularly with the TensorFlow backend, for rapid prototyping and deployment of deep learning models.
●Developed and deployed various deep learning models using Keras, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers. Experience using OpenAI Gym for developing and testing reinforcement learning algorithms in environments such as Atari games, robotics simulations, and gridworlds.
●Familiarity with the OpenAI Gym API for creating custom environments and evaluating the performance of reinforcement learning agents. Proficient in using OpenAI's GPT models for natural language processing tasks, such as text generation, text completion, and text classification.
●Experience fine-tuning GPT models on specific datasets for improved performance on domain-specific tasks. Knowledgeable about Variational Autoencoders (VAEs) and their applications in unsupervised learning, generative modeling, and dimensionality reduction.
●Experience implementing VAEs using frameworks like TensorFlow and PyTorch for various machine learning tasks. Experience using GAN libraries such as TensorFlow-GAN and PyTorch-GAN for training and evaluating generative adversarial networks. Developed and experimented with GAN architectures for generating realistic images, videos, and other types of data.
●Proficient in using Hugging Face's Transformers library for working with pre-trained transformer models such as BERT, GPT, and T5. Experience fine-tuning transformer models for tasks like text classification, named entity recognition (NER), and question answering.
●Experience using the Fastai library for deep learning, including its high-level abstractions for training models, data augmentation, and visualization. Developed deep learning models with Fastai for tasks like image classification, tabular data analysis, and natural language processing.
●Employed Fastai for rapid model iteration and deployment alongside the Azure Databricks ecosystem and MLflow for deploying and monitoring recommendation systems.
●Employed MXNet for efficient model training on large datasets. Collaborated with cross-functional teams to translate complex challenges into data science projects, integrated vector databases for efficient storage and retrieval of embeddings. Experienced with orchestration tools like Terraform for GCP cloud migration, implemented prompt engineering for customized model behavior.
●Established databases and storage using AWS RDS and S3 buckets, configured instance backups, and utilized Llama 2 for large-scale data processing. Proficient in SQL for querying and managing large datasets, and experienced in Python for building, deploying, and maintaining data pipelines and machine learning models.
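The following is a minimal, illustrative sketch of the Hugging Face fine-tuning workflow referenced above; the checkpoint name, dataset, and hyperparameters are placeholders, not details of a specific project.

```python
# Minimal fine-tuning sketch with Hugging Face Transformers
# (placeholder checkpoint, dataset, and hyperparameters).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"  # placeholder pre-trained model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Placeholder public dataset; a domain-specific corpus would be used in practice.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="./finetune-out",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    # Small subset only, to keep the sketch quick to run.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()
```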
Company: West Pharmaceutical Services, Inc. May 2019 - April 2021
Role: Sr. Data Engineer
New York City Metropolitan Area
Responsibilities:
●Spearheaded the implementation of MLOps practices, improving the efficiency and reliability of machine learning model deployment. Developed CI/CD pipelines for ML models, integrating Git, GitHub Actions, and MLflow to automate model deployment workflows.
●Led the adoption of MLflow for experiment tracking, model packaging, and deployment, enhancing collaboration and reproducibility (an illustrative tracking sketch follows this section). Implemented version control and model registry using MLflow, ensuring traceability and easy management of model versions. Managed and monitored machine learning models in production using MLOps tools like MLflow, ensuring optimal performance and reliability.
●Collaborated with data scientists to optimize model performance using MLOps best practices and techniques. Integrated MLflow with AWS SageMaker and Azure Cognitive Services for seamless model deployment and management. Utilized MLflow for experiment tracking, model packaging, and deployment, improving the efficiency and reproducibility of ML projects.
●Enhanced model deployment processes by integrating MLflow with cloud-based services like AWS SageMaker and Azure Cognitive Services. Managed model lifecycle using MLflow, from experimentation to deployment, ensuring seamless integration with existing systems. Implemented automated testing and validation processes for ML models using MLflow, improving overall model quality.
●Developed custom MLflow plugins to extend functionality and improve workflow automation. Conducted regular audits and reviews of ML models using MLflow, ensuring compliance with regulatory requirements. Collaborated with cross-functional teams to define and implement MLOps best practices using MLflow. Optimized resource allocation and usage for ML model training and deployment using MLflow.
●Implemented model performance monitoring and alerting using MLflow, ensuring timely detection and resolution of issues. Contributed to the open-source MLflow community by sharing best practices and contributing to the development of new features. Developed standardized processes for model deployment and monitoring, ensuring consistency and reliability across the organization.
●Designed and executed ETL pipelines for consolidating data from diverse sources into a central data warehouse, enhancing data accessibility and accuracy. Automated data ingestion and transformation using Python, Pandas, and Apache Airflow, reducing manual efforts and enhancing efficiency.
●Utilized Python with Pandas and NumPy for data preprocessing, manipulation, and analysis, significantly improving data handling and transformation processes. Developed custom Python scripts to automate repetitive data tasks and integrate with SQL databases for streamlined data workflows.
●Collaborated with cross-functional teams to define ETL requirements and develop solutions meeting business needs and data quality standards. Ensured data integrity and consistency by incorporating data quality checks into ETL pipelines, reducing errors and enhancing data reliability. Improved data processing times by 15% through optimization of ETL pipelines for performance and scalability.
●Developed and maintained data pipelines in Databricks to process extensive datasets, ensuring data reliability and quality. Implemented ETL processes in Databricks to convert raw data into usable formats, enhancing data accessibility and usability. Collaborated with data scientists to deploy machine learning models on Databricks, enabling real-time predictions and insights.
●Managed Databricks clusters for optimal performance and scalability in data processing. Conducted performance tuning and optimization of Spark jobs on Databricks, improving job execution times. Integrated Databricks with cloud services like AWS and Azure for seamless data integration and workflow automation. Designed and implemented data lake architectures on Databricks for efficient storage and retrieval of large data volumes.
●Implemented advanced statistical models, such as linear regression, logistic regression, and time series analysis, to analyze complex datasets and provide actionable insights. Conducted hypothesis testing and A/B testing to evaluate the effectiveness of marketing campaigns, product changes, and operational strategies, leading to a 10% increase in ROI.
●Utilized statistical software (e.g., R, Python, SAS) to clean, transform, and analyze large datasets, improving data quality and reliability. Developed and maintained dashboards and reports to visualize key statistical metrics and trends, enabling data-driven
decision-making by stakeholders.
●Collaborated with cross-functional teams to design experiments and surveys, collect data, and interpret results, ensuring statistical rigor and accuracy. Presented statistical findings and recommendations to non-technical audiences, translating complex concepts into actionable insights. Contributed to the development of statistical models and methodologies, staying current with industry trends and best practices.
●Assisted in the development of data collection strategies and protocols, ensuring data integrity and compliance with regulatory requirements. Applied Convolutional Neural Networks (CNNs) for advanced image and pattern recognition tasks, enhancing the predictive power of models.
●Implemented machine learning algorithms, including Support Vector Machines (SVMs), to classify and analyze complex datasets. Deployed ML models into production environments, ensuring scalability and integration with existing systems.
●Utilized SQL to design and execute complex queries for data extraction, transformation, and aggregation. Created and optimized database views and stored procedures to support business intelligence initiatives and data-driven decision-making. Developed SQL scripts for automating data processing tasks and integrating with Python-based analytics workflows.
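The following is a minimal, illustrative sketch of MLflow experiment tracking and model logging as described above; the experiment name, model, and data are hypothetical.

```python
# Minimal MLflow experiment-tracking sketch (hypothetical model and data).
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("demo-experiment")  # placeholder experiment name

with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 8}
    model = RandomForestClassifier(**params, random_state=0).fit(X_train, y_train)

    # Log parameters, metrics, and the fitted model so runs are reproducible
    # and candidate models can later be promoted through a model registry.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, artifact_path="model")
```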
Company: Simon Property Group, Inc. August 2017 – April 2019
Role: Data Engineer
Lawrence, New Jersey, United States
Responsibilities:
●Developed and deployed machine learning models, including regression, classification, and clustering algorithms, to extract actionable insights from complex datasets.
●Applied statistical techniques, such as hypothesis testing and regression analysis, to identify patterns and trends in data, leading to data-driven decision-making.
●Utilized advanced machine learning algorithms, such as Random Forest, XGBoost, and Neural Networks, to build predictive models with high accuracy and scalability.
●Implemented A/B testing methodologies to evaluate model performance and optimize algorithms for improved results (an illustrative significance-test sketch follows this section).
●Conducted exploratory data analysis (EDA) and feature engineering to preprocess data and improve model performance.
●Collaborated with cross-functional teams to design and implement data collection processes and data quality checks to ensure reliable model outputs.
●Presented findings and insights from machine learning models to stakeholders in a clear and actionable manner, influencing strategic business decisions.
●Stayed updated with the latest developments in machine learning and statistics to incorporate new techniques and best practices into model development processes.
●Integrated ETL pipelines with AWS S3 and Redshift for seamless data integration across platforms.
●Documented ETL pipelines, including data mappings, transformation logic, and job schedules, ensuring transparency and ease of maintenance.
●Identified and resolved bottlenecks in ETL processes through performance tuning and troubleshooting, enhancing overall efficiency.
●Implemented change data capture (CDC) techniques to process incremental data updates, reducing processing time and resource usage.
●Worked with data architects to design and implement data models supporting ETL processes and meeting business requirements.
●Implemented scalable and cost-effective solutions on AWS, leveraging services such as EC2, S3, Lambda, and RDS to support machine learning workflows.
●Designed and implemented CI/CD pipelines on AWS using CodePipeline, CodeBuild, and CodeDeploy to automate model deployment and testing processes.
●Leveraged Azure Machine Learning services to build, train, and deploy machine learning models, achieving high-performance and scalable solutions.
●Implemented Azure DevOps pipelines to automate model training, testing, and deployment processes, improving overall efficiency and reliability.
●Developed machine learning models on Google Cloud Platform (GCP) using services like AI Platform, BigQuery, and TensorFlow, achieving accurate and scalable solutions.
●Implemented CI/CD pipelines on GCP using Cloud Build and Cloud Deployment Manager, automating the deployment of machine learning models.
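The following is a minimal, illustrative sketch of the kind of A/B significance testing referenced above, using a two-proportion z-test; the conversion counts and sample sizes are hypothetical.

```python
# Minimal A/B test sketch using a two-proportion z-test (hypothetical counts).
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: conversions and sample sizes for control vs. variant.
conversions = [420, 480]
samples = [10000, 10000]

# Two-sided test of whether the conversion rates differ between groups.
z_stat, p_value = proportions_ztest(count=conversions, nobs=samples)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
else:
    print("No significant difference detected.")
```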
Company: Moderna, Inc. May 2014 - August 2015
Role: Data Analyst
Norwood, Massachusetts, United States
Responsibilities:
●Applied advanced statistical methods, including hypothesis testing, regression analysis, time series analysis, and survival analysis, to model complex data patterns and make accurate predictions.
●Leveraged Python for statistical modeling and data manipulation, utilizing libraries such as NumPy, SciPy, pandas, and statsmodels to extract meaningful insights and perform complex analyses. Developed custom scripts for data cleaning, transformation, and visualization.
●Utilized SQL to design and execute complex queries for data extraction, transformation, and aggregation, enabling efficient data analysis and reporting. Created and optimized database views and stored procedures to support business intelligence initiatives.
●Conducted statistical experiments, analyzed experimental results, and presented findings to stakeholders, leading to actionable business recommendations. Developed interactive dashboards using Python-based visualization libraries (e.g., Matplotlib, Seaborn) to present insights clearly.
●Utilized statistical software such as R and SAS for advanced statistical modeling and data analysis, integrating results into broader analytical frameworks and reporting systems.
●Collaborated with cross-functional teams to design and implement data collection processes and ensure data quality for statistical analysis. Developed SQL-based data validation checks and Python scripts to automate data quality monitoring (an illustrative validation sketch follows this section).
●Developed and implemented statistical models to solve business problems, including customer segmentation, demand forecasting, and risk assessment. Applied machine learning algorithms in Python to enhance model accuracy and business outcomes.
●Stayed updated with the latest developments in statistical modeling and data analysis techniques, incorporating new methodologies and tools into analytical processes, including advanced Python libraries and SQL techniques.
●Managed ETL pipeline deployments and releases, ensuring smooth deployment with minimal downtime. Utilized Python for ETL script development and SQL for data transformation and loading tasks.
●Conducted regular audits and monitoring of ETL pipelines to ensure data quality and compliance with data governance policies. Developed SQL queries and Python scripts for automated pipeline monitoring and alerting.
●Mentored junior team members on ETL best practices, including SQL and Python techniques, fostering knowledge sharing and collaboration. Provided guidance on effective data handling and analysis strategies.
●Utilized Azure Databricks for big data processing and analytics, integrating machine learning models developed in Python for advanced insights and predictions.
●Managed Azure resources and services, ensuring compliance with security and governance policies, and optimizing costs through resource utilization analysis. Implemented Python scripts for resource monitoring and optimization.
●Stayed abreast of industry trends and best practices, evaluating and implementing new tools and technologies for ETL pipeline development and maintenance, including advances in Python and SQL.
●Provided timely support and troubleshooting for ETL pipeline issues, minimizing impact on business operations. Developed Python-based tools and SQL queries for efficient issue resolution and root cause analysis.
●Utilized Google Cloud Dataflow for data processing and transformation, integrating machine learning models for real-time analytics and predictions. Applied Python for model integration and data pipeline development.
●Integrated AWS SageMaker into existing machine learning pipelines to streamline model development and deployment. Used Python for model training and evaluation.
●Managed AWS resources using infrastructure as code (IaC) tools like AWS CloudFormation and Terraform, ensuring consistent and reproducible deployments. Developed Python scripts for infrastructure management and automation.
●Managed GCP resources and services, optimizing costs through resource monitoring and utilization analysis, and ensuring compliance with security and governance policies. Used SQL and Python for cost analysis and optimization tasks.
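The following is a minimal, illustrative sketch of the automated data-quality checks described above; the table, columns, and rules are hypothetical placeholders.

```python
# Minimal data-quality validation sketch (hypothetical table and rules).
import pandas as pd

def validate(df: pd.DataFrame) -> list:
    """Return a list of human-readable data-quality issues."""
    issues = []
    if df["order_id"].duplicated().any():
        issues.append("duplicate order_id values found")
    if df["amount"].isna().any():
        issues.append("missing values in amount column")
    if (df["amount"] < 0).any():
        issues.append("negative amounts found")
    return issues

# Hypothetical extract from an ETL staging table.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "amount": [19.99, None, 5.00, -3.50],
})

for issue in validate(orders):
    print("DATA QUALITY ALERT:", issue)
```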