Senior Data Scientist

Location:

San Francisco, CA, 94132

Posted:

December 17, 2023

Resume:

CASEY ARTNE GONDER

Contact: 408-***-**** (M); Email: ***********@*****.***

DATA SCIENTIST

SUMMARY

•Data Scientist with 10+ years of experience in processing and analysing data across a variety of industries. Leverages various mathematical, statistical, and machine learning tools to collaboratively synthesize business insights and drive innovative solutions for productivity, efficiency, and revenue.

•Extensive experience in 3rd-party cloud resources: AWS, Google Cloud, and Azure

•Working with and querying large data sets from big data stores using Hadoop Data Lakes, Data Warehouse, Amazon AWS, Cassandra, Redshift, Aurora, and NoSQL

•Ensemble algorithm techniques, including Bagging, Boosting, and Stacking; knowledge of Natural Language Processing (NLP) methods, in particular BERT, ELMo, word2vec, sentiment analysis, Named Entity Recognition, and Topic Modelling; Time Series Analysis with ARIMA, SARIMA, LSTM, RNN, and Prophet.

•Experience in the entire data science project life cycle and actively involved in all the phases, including data extraction, data cleaning, statistical modeling, and data visualization with large data sets of structured and unstructured data.

•Demonstrated excellence in using various packages in Python and R like Pandas, NumPy, SciPy, Matplotlib, Seaborn, TensorFlow, Scikit-Learn, and ggplot2.

•Skilled in statistical analysis programming languages such as R and Python (including Big Data technologies such as Spark, Hadoop, Hive, HDFS, and MapReduce).

•Understanding of applying Naïve Bayes, Regression, and Classification techniques as well as Neural Networks, Deep Neural Networks, Decision Trees, and Random Forests.

•Performing EDA to find patterns in business data and communicating findings to the business using visualization tools such as Matplotlib, Seaborn, and Plotly.

•Experience tracking defects and managing code with bug-tracking and version-control tools such as Jira and Git.

•Adept at applying statistical analysis and machine learning techniques to live data streams from big data sources using PySpark and batch processing techniques.

•Leading teams to productionize statistical or machine learning models and create APIs or data pipelines for the benefit of business leaders and product managers

•Strong experience in interacting with stakeholders/customers, gathering requirements through interviews, workshops, and existing system documentation or procedures, defining business processes, identifying, and analyzing risks using appropriate templates and analysis tools.

•Good knowledge of creating visualizations, interactive dashboards, reports, and data stories using Tableau and Power BI.

•Excellent communication, interpersonal, analytical, and leadership skills; a quick starter with the ability to master and apply new concepts.

•Large Language Model fine-tuning and training. Extensive hands-on experience with PaLM, OpenAI Davinci, GPT-2, GPT-3, GPT-3.5, and GPT-4.

TECHNICAL SKILLS

Analytic Development - Python, R, Spark, SQL

Python Packages - NumPy, Pandas, Scikit-learn, TensorFlow, Keras, PyTorch, Fastai, SciPy, Matplotlib, Seaborn, Numba

Artificial Intelligence - Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, Regression, Naïve Bayes

Natural Language Processing - Text analysis, classification, chatbots.

Deep Learning - Machine Perception, Data Mining, Machine Learning, Neural Networks, TensorFlow, Keras, PyTorch, Transfer Learning

Programming Tools and Skills - Jupyter, RStudio, GitHub, Git, APIs, C++, Eclipse, Java, Linux, C#, Docker, Node.js, React.js, Spring, XML, Kubernetes, Back-End, Databases, Bootstrap, Django, Flask, CSS, Express.js, Front-End, HTML, MS Azure, AWS, GCP, Azure Databricks, AWS SageMaker

Data Modeling - Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling, Time-Series analysis

Machine Learning - Natural Language Processing and Understanding, Machine Intelligence, Machine Learning algorithms

Analysis Methods - Forecasting, Multivariate analysis, Sampling methods, Clustering, Regression Analysis, Linear models; Predictive, Statistical, Sentiment, Exploratory, and Bayesian Analysis

Applied Data Science - Natural Language Processing, Predictive Maintenance, Chatbots, Machine Learning, Social Analytics, Interactive Dashboards.

PROFESSIONAL EXPERIENCE

Sr AI Scientist

Genentech, San Francisco, California

Since Dec 2022

(Genentech, Inc. is an American biotechnology corporation headquartered in South San Francisco, California).

As a dedicated Senior NLP Engineer, I applied my skills to computational biology at Genentech. I led the development of automated literature search and biological sequence analysis pipelines. Working with the Head of ML & CTO, I developed scalable methods for automated literature analysis. I also worked on creating predictive DNA and protein language models, among other sequence-based prediction methods.

Responsibilities:

•Accessed the production SQL database to extract data for validation with third-party data.

•Validated data between SQL servers and third-party systems.

•Worked with large datasets (10M+ observations of text data).

•Cleaned text data using a variety of techniques.

•Integrated with the AWS platform environment.

•Utilized cloud computing resources to optimize models, tune hyperparameters, and cross-validate statistical data science models.

•Used the Python libraries Pandas, NumPy, Seaborn, Matplotlib, and scikit-learn to develop various machine learning models, including logistic regression, random forests, gradient-boosted decision trees, and neural networks.

•Built and analyzed datasets using Python and R.

•Applied linear regression in Python and SAS to examine the relationships, including possible causal ones, between attributes of the dataset.

•Performed exploratory data analysis (EDA) on datasets to summarize their main characteristics, using techniques such as bag-of-words, K-means, and DBSCAN.

•Utilized Git for version control on GitHub to collaborate with team members.

•Compared different embedders, such as Google's Universal Sentence Encoder, Doc2Vec, TF-IDF, BERT, and ELMo, to identify the best one for the task.

•Implemented models to predict previously identified Key Performance Indicators (KPIs) among all attributes.

•Developed several ready-to-use templates of machine learning models based on given specifications and assigned clear descriptions of purpose and variables to be given as input into the model.

•Prepared reports and presentations using Tableau, MS Office, and ggplot2 that accurately conveyed data trends and associated analysis.

•Worked with data warehouse architecture and wrote SQL queries.

Generative AI Engineer

Genentech, San Francisco, California

May 2022 – Dec 2022

Generative AI-Powered Market Insights Generator for Internal Business Teams

In this project, I led the development of a cutting-edge Generative AI-powered application designed exclusively for the internal business teams of the company. The application generates detailed market insights and reports, enabling data-driven decision-making and strategic planning within the organization.

Responsibilities:

•Collaborated closely with internal business stakeholders to identify specific market trends, competitive analysis, and insights they required.

•Utilized state-of-the-art Large Language Models (LLMs), such as GPT-3, to develop a generative AI algorithm capable of producing insightful market reports.

•Designed an intuitive user interface that allows internal teams to input their queries and receive comprehensive market insights in real time.

•Ensured the generated insights were accurate, relevant, and aligned with the specific needs of different business functions.

•Integrated the Generative AI application with the company's internal data sources and APIs to provide up-to-date information.

•Successfully implemented the Generative AI application, providing internal business teams with instant access to customized market insights.

•Empowered business teams to make informed decisions by delivering real-time reports on market trends, competitor analysis, and emerging opportunities.

•Enhanced efficiency by automating the process of generating insightful reports, reducing the time and effort required for manual analysis.

•Fostered a culture of innovation and data-driven decision-making within the internal business teams.

•Received positive feedback from internal stakeholders for the accuracy and relevance of the generated insights.

•Large Language Models: GPT-3, Hugging Face Transformers

•Natural Language Processing: Text generation, Sentiment Analysis

•Web Development: HTML, CSS, JavaScript

•Backend Development: Python, Flask

•API Integration: RESTful APIs

•Cloud Platforms: AWS, Azure

•Data Sources: Internal databases, APIs

ML Ops Engineer

Ascension, St. Louis, Missouri

Feb 2019 – April 2022

(Ascension is the largest nonprofit and Catholic health system in the United States and operates more than 2,600 healthcare sites in 19 states and Washington, D.C., including 142 hospitals and 40 senior living facilities. It employs more than 142,000 people as of 2021.)

As an MLOps Engineer, I was responsible for designing and building cloud solutions to support the activities of machine learning engineers across the company. I built tools and APIs to support ML teams in every stage of their ML workflows and championed automation of the entire ML lifecycle, covering data ingestion, model development, model training, model management, deployment, serving, and monitoring. Daily, I prototyped tools and APIs to allow ML engineers to access cloud-based infrastructure developed by my MLOps colleagues, iterating towards a production-ready service. In developing solutions, I carefully considered the skillset of the end user and designed and documented tools accordingly. I also set up batch inference, model monitoring, and retraining pipelines using AWS SageMaker and the MLflow Model Registry.

Responsibilities:

•Integrated ML models with Kubernetes-based infrastructure, leveraging EKS for efficient deployment and scaling of models in production environments.

•Designed and implemented a comprehensive ML workflow leveraging MLflow, AWS Batch, Docker, and Kubernetes, resulting in streamlined and scalable model training and deployment processes.

•Developed custom Docker containers for ML model packaging, ensuring consistent and reproducible environments across different stages of the ML lifecycle.

•Utilized MLflow to track and manage experiments, enabling easy comparison of different models and hyperparameters, and facilitating collaboration among team members.

•Implemented automated model training pipelines using AWS Batch, enabling efficient parallel processing of large datasets and reducing training

•Conducted comprehensive training sessions for ML team members, equipping them with the necessary skills to proficiently utilize MLOps tools and technologies for efficient ML model management and deployment.

•Developed and maintained documentation and best practices for ML operations, ensuring knowledge sharing and smooth onboarding of new team members.

•Designed and implemented continuous integration and continuous deployment (CI/CD) pipelines using Jenkins and GitLab, enabling automated testing, building, and deployment of ML models. This streamlined the development process and ensured consistent and reliable delivery of models.

•Automated model monitoring using MLflow and Weights & Biases.

•Utilized AWS SageMaker and Lambda to conduct thorough performance tuning and optimization of ML models, resulting in significant improvements in inference speed and cost efficiency.

•Designed and implemented a containerized ML model deployment solution using Docker and Kubernetes, ensuring efficient resource utilization and seamless scalability for handling high-volume inference requests.

•Architected and implemented a robust and scalable infrastructure leveraging AWS Elastic Kubernetes Service (EKS) and Docker containers to efficiently orchestrate and manage the deployment of machine learning models. Incorporated fault-tolerant mechanisms to ensure continuous availability and tuned resource allocation for performance.

•Collaborated with DevOps teams to integrate ML workflows into existing CI/CD pipelines, enabling seamless deployment and version control of ML models.

•Designed and implemented scalable data pipelines using AWS Glue and Athena to facilitate seamless data ingestion, transformation, and storage. Collaborated with cross-functional teams to ensure efficient and reliable data processing and analysis, resulting in improved data-driven decision-making.

Sr Data Scientist

The Cigna Group, Bloomfield, Connecticut

Sept 2016 – Feb 2019

(The Cigna Group is a for-profit American multinational managed healthcare and insurance company based in Bloomfield, Connecticut. Its insurance subsidiaries are major providers of medical, dental, disability, life, and accident insurance and related products and services, the majority of which are offered through employers and other groups.)

As a data scientist at The Cigna Group, I worked with a team of data scientists, data engineers, and MLOps engineers to create deployable machine learning models to detect fraudulent claims and identify anomalies in medicine use. The tools I employed were anomaly detection algorithms such as convolutional autoencoders and isolation forests. The goal of the project was to reduce payouts on fraudulent claims and to identify possible medication abuse early by careful examination of the historical records.

Responsibilities:

•Utilized clustering-based outlier detection algorithms like CBLOF and Angle-Based Outlier Detectors to identify anomalies in medicine use patterns.

•Developed and maintained data pipelines to ensure the timely and accurate ingestion of data for anomaly detection.

•Implemented proactive monitoring and maintenance protocols to ensure optimal performance and effectiveness of deployed machine learning models.

•Designed and implemented machine learning models utilizing advanced anomaly detection algorithms, including Isolation Forest, Local Outlier Factor, and One-class Support Vector Machine, to detect fraudulent claims and identify anomalies in medicine use.

•Conducted comprehensive model evaluation and validation, utilizing performance metrics such as precision, recall, and F1-score, to ensure the robustness and effectiveness of the implemented anomaly detection algorithms.

•Engaged in extensive collaboration with subject matter experts to gain a deep understanding of the business requirements and effectively integrate domain knowledge into the development and implementation of anomaly detection models.

•Conducted extensive feature engineering to extract meaningful features from the medical claims data for improved model performance.

•Leveraged the Histogram-based Outlier Score algorithm to develop a robust system for accurately detecting and flagging fraudulent claims, as well as identifying early signs of medication abuse.

•Leveraged convolutional autoencoders, a deep learning technique, for accurate and efficient anomaly detection in the context of fraudulent claims and medication abuse identification.

Data Scientist

Hanover Research, Arlington, Virginia

Jan 2013 – Aug 2016

(Hanover Research is a global research and analytics firm that provides market research, surveys, competitive intelligence, and business strategy services to clients in the corporate, education, and healthcare sectors. The company has offices in the United States, the United Kingdom, and India. Hanover Research has a team of over 500 consultants and analysts providing data-driven insights to help clients make informed decisions.)

Worked as a model development engineer, building churn analysis, market segmentation, and customer lifetime value estimation models. As a junior Data Analyst, I extracted insights from the existing datasets and prepared data for further analysis as part of a team.

Responsibilities:

•Conducted sentiment analysis on customer feedback data to identify key drivers of customer satisfaction and implemented targeted improvement initiatives.

•Developed and deployed cutting-edge customer segmentation algorithms leveraging advanced data analytics techniques to optimize the allocation of the marketing budget and enhance precision in targeting, resulting in a substantial 20% reduction in marketing expenditures.

•Designed and implemented robust data pipelines and databases, leveraging SQL, Python, and Hadoop technologies, to ensure the integrity and reliability of data for analysis purposes.

•Designed and executed A/B testing experiments to assess the impact of marketing campaigns and optimize conversion rates, leading to a notable 15% improvement in campaign return on investment (ROI).

•Collaborated with cross-functional teams to develop and deploy recommendation systems, improving personalized customer experiences and increasing upsell opportunities.

•Led a cross-functional team in designing and executing customer segmentation analysis, resulting in targeted marketing campaigns and a 25% increase in customer engagement.

•Led cross-functional teams in defining project objectives, gathering data requirements, and developing analytical solutions for market research and customer lifetime value analysis.

•Stayed up-to-date with the latest advancements in data science and machine learning technologies to continuously improve analytical capabilities.

•Implemented data visualization techniques to present complex findings clearly and concisely to stakeholders.

•Applied natural language processing techniques to analyze customer feedback and sentiment for product improvement.

•Designed and executed market segmentation analysis to identify distinct customer segments based on demographic, behavioral, and psychographic characteristics.

•Conducted market segmentation analysis to identify distinct customer groups and tailor marketing strategies accordingly.

•Delivered comprehensive reports and presentations to senior executives, highlighting key findings and actionable recommendations based on data analysis.

•Utilized A/B testing methodologies to assess the impact of marketing campaigns on customer behavior and provided data-driven recommendations for optimizing future initiatives.

•Collaborated with cross-functional teams to define project objectives, gather data requirements, and develop analytical solutions.

•Built customer lifetime value estimation models to predict future revenue potential and inform customer acquisition and retention efforts.

ACADEMIC CREDENTIALS

Pursuing Google Data Analytics Professional Certificate (Online)

Master of Business Administration - Business Management

Wake Forest University

Master of Science - Materials Science & Engineering

Norfolk State University

Bachelor of Science - Optical Engineering

Norfolk State University
