
CHRISTOPHER PARRETT

Contact: 872-***-**** | Email: adxskh@r.postjobfree.com

Attuned to the latest trends and advancements in the field, I consistently deliver strong results, handling multiple functions and activities in high-pressure environments with tight deadlines.

DATA SCIENTIST

EXECUTIVE SNAPSHOT

13+ years of experience in IT, with expertise in Artificial Intelligence, Data Mining, Deep Learning, Predictive Analytics, and Machine Learning on big datasets of structured and unstructured data; proficient in managing the entire data science project life cycle and actively involved in all of its phases

Specialized knowledge for dealing with exponentially growing data; understands the underlying science of data and applies it to a diverse set of problem statements across a variety of fields

Automated recurring reports using SQL and Python and visualized them in Power BI or Tableau

Experience implementing Stored Procedures, Triggers, and Functions using T-SQL

Skilled at building fully automated, highly elastic cloud orchestration frameworks on AWS

Predictive Modeling, Data Mining methods, Factor Analysis, ANOVA, Hypothesis Testing, the normal distribution, and other advanced statistical and econometric techniques

Proficient in the Tableau and R Shiny data visualization tools for analyzing and obtaining insights from large data sets; created visually powerful, actionable interactive reports and dashboards

Experience developing and analyzing data models; wrote simple and complex SQL queries to extract data from databases for data analysis and testing

Well-versed in the statistical programming languages R, Python, and SAS, and in cloud platforms such as Azure ML, AWS ML, and GCP tools like BigQuery and Vertex AI

Experience using CUDA/GPU API for real-time image processing

Strong SQL programming skills working with functions, packages, and triggers

Developed predictive models using Decision Trees, Random Forests, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks

Strong knowledge in all phases of the SDLC (Software Development Life Cycle), Agile & Scrum Methodologies

Knowledgeable in machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, Natural Language Processing (NLP), etc.

TECHNICAL SKILLS

Data Science Specialties: Natural Language Processing, Machine Learning, Internet of Things (IoT), Predictive Analytics, Predictive Maintenance

Analytical Skills: Bayesian Analysis, Inference, Time-Series Analysis, Regression Analysis, Linear Models, Multivariate Analysis, Gradient Descent, Sampling methods, Forecasting, Segmentation, Clustering, Sentiment Analysis, and Predictive Analytics. Experience in stochastic optimization and regression with machine learning algorithms, formulating and solving discrete and continuous optimization problems.

Analytics Tools: Classification and Regression Trees (CART), Support Vector Machine (SVM), Random Forest, Gradient Boosting Machine (GBM), Principal Component Analysis (PCA), Regression, Naïve Bayes

Analytic Languages and Scripts: Python, Spark, Spark SQL, Scala, Java, C++, Ruby, LaTeX, Markdown

Data Integration: SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS)

Languages: C, C++, R, Java, Perl, SQL, Python

Version Control: GitHub, Git, SVN

Libraries: ggplot, NLTK, NumPy, OpenCV, pandas, scikit-learn, SciPy, spaCy, multiprocessing, Dask, Flask

Computing Environments: Jupyter, Spyder IDE, Atom, VS Code, Eclipse, Android Studio

Command Shell: IPython, macOS Terminal, PowerShell, Bash

Data Query: various SQL and NoSQL databases and data warehouses on Azure and Google Cloud; experience with AWS, GCP, and Azure cloud computing, Spark, and Tableau; capable of writing efficient code and working with large data sets

Deep Learning: Machine Perception, Data Mining, Machine Learning algorithms, Neural Networks, TensorFlow, Keras

Soft Skills: Deliver presentations and highly technical reports; collaborate with stakeholders and cross-functional teams; advise on how to leverage analytical insights; develop clear analytical reports that directly address strategic goals

Operating Systems: Windows 8 and 10, Linux Ubuntu

PROFESSIONAL EXPERIENCE

Sr. Data Scientist Mar 2023 – Present

Anthem (Elevance Health), Chicago, IL

Developed models for Home Lab Kits to identify the member populations most in need of kits and those most likely to return them; participated in discussions on designing a program-level recommendation system for Medicare members.

Played a key role in the initial design of models as a Senior Data Scientist, focusing on engagement measures as targets and leveraging KNN similarity modeling, which enhanced the accuracy and effectiveness of the models in predicting member behavior and outcomes
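
A minimal sketch of the KNN-similarity idea above, using scikit-learn; the engagement features and their dimensionality are invented for illustration, not Anthem's actual data.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(1)
    engagement = rng.random((1000, 8))      # stand-in per-member engagement measures

    # Fit a neighbor index; for each member, the nearest neighbors (excluding self)
    # are the most similar members by engagement profile.
    nn = NearestNeighbors(n_neighbors=6).fit(engagement)
    dist, idx = nn.kneighbors(engagement[:1])
    similar_members = idx[0][1:]            # the 5 members closest to member 0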

Demonstrated project management skills by scheduling and monitoring meetings to facilitate the development of models and identify relevant data tables that aligned with specific business requirements

Conducted a thorough review of past models written in Python to understand the logic and insights behind previous model designs and identified opportunities for improvement and optimization in subsequent iterations

Applied advanced SQL techniques and complex queries to determine member eligibility and compliance with Medicare health measures within specific contracts and plans for accurate assessments and improved decision-making processes

Utilized PySpark to distribute calculations and perform queries on large datasets, optimizing performance and enabling efficient data processing.
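
As one illustration of that pattern, the sketch below distributes a per-member aggregation with PySpark; the toy rows, column names, and measure id are placeholders, not the production schema.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("member_measures").getOrCreate()

    # Toy stand-in rows; production code would read warehouse tables via spark.table().
    claims = spark.createDataFrame(
        [("m1", "HLK", "2023-01-05"), ("m1", "HLK", "2023-03-02"),
         ("m2", "HLK", "2023-02-11")],
        ["member_id", "measure_id", "service_date"])

    per_member = (claims.filter(F.col("measure_id") == "HLK")   # placeholder measure id
                        .groupBy("member_id")
                        .agg(F.count("*").alias("events"),
                             F.max("service_date").alias("last_event")))
    per_member.show()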

Coordinated with the team in addressing peripheral ad-hoc requests and helping with Python and PySpark for research studies related to Medicare measure lift

Played a key role in feature selection by leveraging PTC models and utilizing Random Forests/XGBoost to predict the likelihood of members returning a Home-Lab-Kit, contributed to targeted marketing efforts, and improved member engagement

Documented weekly meeting discussions and offered support using Teams, ensuring effective communication and knowledge sharing within the team

Provided consultation and expertise in model versioning and refinement based on monthly feedback and evaluation of predicted member lists

Designed and implemented hold-out groups to test model predictions of program compliance, contributing to the evaluation and improvement of program effectiveness

Developed refined categorizations of historical compliance and engagement, incorporating multiple combinations of factors that influenced Medicare categorizations of plan members for improved accuracy of member segmentation

Leveraged KNN similarity factors and engagement scores to assign Medicare measures to active members without a historical record, enabling targeted interventions and personalized care plans

Analyzed data at various levels of granularity, including member, plan, and contract levels, to derive actionable insights and drive strategic decision-making processes

Utilized Natural Language Processing (NLP) techniques to preprocess textual data and conducted data cleaning and processing to ensure data quality and consistency.

Managed and processed large volumes of data with billions of rows and thousands of high-dimensional features, demonstrating proficiency in handling complex datasets

Led the development of a recommendation system for plan-level members, using PowerShell to SSH into a Hadoop/Hive environment and Impala for complex multi-way joins

Utilized Jupyter Notebooks to run models and employed SHAP (SHapley Additive exPlanations) to evaluate the marginal effects and interpret the predictability of members returning a Home-Lab-Kit
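
A hedged sketch of that SHAP workflow on synthetic data; the model, features, and "kit returned" label below are stand-ins, not the production kit-return model.

    import numpy as np
    import shap
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))                    # synthetic member features
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # synthetic "kit returned" label

    model = xgb.XGBClassifier(n_estimators=50).fit(X, y)
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    shap.summary_plot(shap_values, X)                # global view of marginal effects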

Sr. Data Science Consultant - Gamma Team Jan 2022 – Mar 2023

Boston Consulting Group, Chicago, Illinois

As a Senior Data Science Consultant on the Gamma Team at Boston Consulting Group, I assisted with case analysis for internal projects and for a user-portal data exchange that provides data assets, models, and algorithms. I integrated several new data sources into the existing portal and provided support and consulting for close to 50 existing pipelines and their associated predictive models.

Served as liaison between Lighthouse and data vendors, integrating new pipelines and providing model-preparation consultations

Held weekly meetings with vendors and the Gamma team covering assigned JIRA tickets

Gained exposure to a wide variety of domains of differing granularity and frequency, including satellite imagery, footfall traffic, credit card purchases, device usage on an international scale, social listening, financial, demographic, telecom, GIS, and geospatial data, ESG ratings, global temperatures, vegetation, and air quality

Distinguished the legal permissions governing use of raw data, derived data, and data with model-inferred values

Staged vendor data and machine learning APIs using vendor-specific platforms including GCP, AWS, and Azure

Provided feature updates to the repos to streamline pipelines destined for Snowflake and debugged Airflow event breakdowns

Used APIs/SFTP/S3/GCS to manage source-data staging specific to each vendor, and dynamic SQL to create queries spanning hundreds of Snowflake databases, schemas, tables, and views

Developed source code using Python within 3 large Git repositories to handle data ingestion, transformations, validations, final staging, and markup for client portals or internal case projects.

Used GCP BigQuery to assist vendors and case teams with big data and model-inferred data using Vertex AI

Implemented Python virtual classes and documented data lineage and data pipelines in Egnyte/Confluence/SharePoint

Accountable for pipeline schedules and log reviews of DAGs coded with Python ECS-operator structures

Converted protocols from REST APIs to GraphQL for ingesting geospatial maritime vendor data

Cloned and tested repos in locally run Docker containers before pushing upon successful results

Used regex to pattern-match S3 directories, with queries parameterized dynamically based on current dates or other events and messages received
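
A minimal sketch of that date-parameterized key matching; the bucket layout and key names are invented.

    import re
    from datetime import date

    # Match keys such as vendor-data/2023-09-06/part-000.parquet for today's load.
    today = date.today().isoformat()
    pattern = re.compile(rf"^vendor-data/{today}/.+\.parquet$")

    keys = [f"vendor-data/{today}/part-000.parquet", "vendor-data/archive/old.csv"]
    todays_files = [k for k in keys if pattern.match(k)]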

Developed enumeration modules to detect changes in vendor source-data staging structures

Weekly JIRA meetings with teams discussing current projects and vendor data proposals for new integrations

Sr. Data Scientist Dec 2020 – Jan 2022

Credit Suisse, New York

Serving on the RAD team, I helped implement Isolation Forest machine learning models to detect anomalous trading signatures across currencies, options, futures, stocks/bonds, IPOs, and collateral assets. I streamlined the existing workflow by automating several steps and implementing the team's first data pipeline. With another team, I developed code to extract data from several online-submitted tax forms using Tesseract OCR; LSTMs and CNNs converted handwritten images into text with 99% accuracy, and the data was stored in a NoSQL database.

Contributions made for two independent projects: RAD (Regulatory Anomaly Detection) and Tax Extraction.

Daily scrum meetings using Odyssey/Atlassian JIRA to create tasks/stories and set priorities

Collaborated weekly with European clients and coordinated off-hours with the IT India data center to construct complex SQL queries over error trading data processed with Impala engines

Orchestrated several pipelines with cron jobs to ingest and process error trading data into Hive-connected Tableau, creating aggregated visualizations by regulator, jurisdiction, and data source

Developed anomaly detection models using Isolation Forests over features from stock and derivative error-trading data
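
A minimal Isolation Forest sketch with scikit-learn; the trade features here are synthetic stand-ins for the real stock/derivative features.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(42)
    trades = rng.normal(size=(1000, 4))    # e.g. price delta, volume, latency, spread
    trades[:10] += 6                       # inject a few gross outliers

    clf = IsolationForest(contamination=0.01, random_state=42).fit(trades)
    flags = clf.predict(trades)            # -1 marks suspected anomalous trades
    anomalies = np.where(flags == -1)[0]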

Implemented OCR using Tesseract, CNNs, etc.

Treated the extracted text with NLP; performed word embedding with BERT and used LSTMs for topic modeling

Deep learning frameworks PyTorch and Keras on TensorFlow were used

Python, NumPy, SciPy, Pandas, etc. tools used

Served as lead developer of OO Python classes for document handling of PDF tax forms stored in MongoDB, achieving 99% accuracy in extracting data from the forms with Tesseract

Synthetic training data was used with TensorFlow to model variations of tax form handwritten dates.

JSON schemas with specific coordinate extractions using CV2 created for W9, EXP, ECI, W8BENE, WBEN, and IMY tax forms.

Applied homographic coordinate alignment and morphological transformations to images using CV2
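
An illustrative OpenCV alignment sketch in the same spirit: it registers a synthetic rotated "scan" back onto its "template" with ORB features and a RANSAC homography. The images are generated here, not real forms.

    import cv2
    import numpy as np

    # Synthetic template with texture, plus a slightly rotated "scan" of it.
    template = np.zeros((400, 300), np.uint8)
    cv2.rectangle(template, (40, 40), (260, 120), 255, -1)
    template[200:260, 40:260] = (np.random.default_rng(0)
                                 .random((60, 220)) * 255).astype(np.uint8)
    M = cv2.getRotationMatrix2D((150, 200), 8, 1.0)
    scan = cv2.warpAffine(template, M, (300, 400))

    orb = cv2.ORB_create(500)
    k1, d1 = orb.detectAndCompute(scan, None)
    k2, d2 = orb.detectAndCompute(template, None)
    matches = sorted(cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2),
                     key=lambda m: m.distance)[:50]

    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    aligned = cv2.warpPerspective(scan, H, template.shape[::-1])  # registered scan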

Regex expressions applied to post-processing of OCR-extracted text data to filter noise

Applied Python multi-threading/multiprocessing to multiple pages at once within a single tax form to decrease the time needed to process and extract data
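
A hedged sketch of per-page parallel OCR with Python's multiprocessing and pytesseract; the page image paths are hypothetical.

    from multiprocessing import Pool
    from PIL import Image
    import pytesseract

    def ocr_page(path):
        # Tesseract runs per page; pages of one form are processed concurrently.
        return pytesseract.image_to_string(Image.open(path))

    if __name__ == "__main__":
        pages = ["w9_page1.png", "w9_page2.png"]   # hypothetical page images
        with Pool(processes=4) as pool:
            texts = pool.map(ocr_page, pages)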

Sr. Data Scientist Nov 2019 – Dec 2020

Freeport-McMoRan, Phoenix, Arizona / Remote

As a Senior Data Scientist at Freeport-McMoRan, I assisted in developing Monte Carlo simulations optimized through particle swarm methods. Recommendations were created by simulating mine conditions, aiding managers in choosing scenarios that optimize ore-production KPIs. During testing, experienced mine dispatchers who accepted these recommendations produced millions of dollars of additional ore during a typical 12-hour shift.
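
A toy Monte Carlo sketch in the spirit described; the KPI model, rates, and scales are entirely invented, and a particle swarm optimizer would search the scenario space that the simple grid below stands in for.

    import numpy as np

    rng = np.random.default_rng(3)

    def simulate_shift(trucks, n=10_000):
        # Invented toy KPI: tonnes hauled in a 12-hour shift under random downtime.
        cycle_min = rng.normal(30, 5, size=n).clip(min=10)   # minutes per haul cycle
        downtime = rng.exponential(45, size=n)               # minutes lost to down events
        hauls = np.maximum(720 - downtime, 0) / cycle_min
        return trucks * hauls * 240                          # 240 tonnes/haul (made up)

    scores = {t: simulate_shift(t).mean() for t in (16, 18, 20)}
    best_fleet = max(scores, key=scores.get)                 # recommended scenario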

Ingested IoT data from HVR/SAP/PIBA/API sources.

Built data pipelines using Kafka/Event Hubs/Snowpipe

Persisted data using Blob Storage (Cosmos DB)/Teradata/Snowflake

Performed data analytics using Azure Stream Analytics/Python/Azure Databricks/SQL/Git/Azure

Used Power BI on top of Cosmos DB with data in JSON format

Worked with wide-ranging IoT data types, from GPS and temperature to pressure, altitude, speed, and bump sensors

Databricks cluster analysis and run-time tests to determine latency issues

Developed REST API/WebSocket integrations, with the aid of Postman, to change mine operations remotely

Used the brainstorming tool FreeMind to organize complex data relationships

Daily standup meetings of items assigned in Azure DevOps using the Agile method

Ran models sequentially or in parallel using Pandas or Spark DataFrames

Tested model recommendations remotely with the client in real time at a mine in Peru, South America

Analysis to determine the duration of various down events in the mine

Feature engineering of IoT data to measure or trigger events in the mine

Research and wiki-documentation of past and current items in DevOps

Led discussions with the client in adopting the model used in the mine

Developed a method to predict when shovel material blocks would become depleted

Analysis of precision of estimates made of mine KPIs from the Monte Carlo Simulations

Analyzed the model-producing code, experimenting with Python classes and extracting data structures into classes, to better understand the core code and suggest changes

JDBC connection to data sources using Databricks

Worked with a core team of 4-6 Data Engineers, Data Analysts, and Data Scientists

Data Scientist Jul 2018 – Nov 2019

KeyCorp, Cleveland, OH

KeyCorp's primary subsidiary is KeyBank, whose services span retail, small business, corporate, and investment clients. I developed a logistic regression churn model that identified customers on the verge of leaving. By offering incentives to customers the model flagged as likely to cancel their accounts, the bank achieved cost savings through retention and significantly increased the average customer lifetime.

Worked with the data science team and provided respective data as required on an ad-hoc request basis

Delivered portfolio risk dashboard as a package covering all aspects of the credit life cycle for retail unsecured loans

Created time series forecasts using Prophet for default rates of bank financial instruments
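
A minimal Prophet sketch of that kind of forecast; the series is synthetic, and older installs import the same API from fbprophet rather than prophet.

    import pandas as pd
    from prophet import Prophet   # older releases: from fbprophet import Prophet

    df = pd.DataFrame({
        "ds": pd.date_range("2018-01-01", periods=365),
        "y": 0.02 + 0.001 * pd.Series(range(365)),   # stand-in default-rate series
    })

    m = Prophet().fit(df)
    future = m.make_future_dataframe(periods=90)     # forecast one quarter ahead
    forecast = m.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]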

Worked with huge data sets in a big data environment with Hadoop, HDFS, MapReduce, and Spark

Information used included structured and semi-structured data elements collected from both internal and external sources

Handled the unbalanced data issue using the Synthetic Minority Oversampling Technique (SMOTE); handled missing data using KNN imputation
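
A compact sketch of that imbalance/missing-data handling with scikit-learn and imbalanced-learn; the data here is synthetic.

    import numpy as np
    from sklearn.impute import KNNImputer
    from imblearn.over_sampling import SMOTE

    rng = np.random.default_rng(7)
    X = rng.normal(size=(300, 6))
    X[rng.random(X.shape) < 0.05] = np.nan     # sprinkle in missing values
    y = (rng.random(300) < 0.1).astype(int)    # ~10% minority class (churners)

    X_filled = KNNImputer(n_neighbors=5).fit_transform(X)            # KNN imputation
    X_bal, y_bal = SMOTE(random_state=7).fit_resample(X_filled, y)   # oversample minority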

Assisted both application engineering and data scientist teams in mutual agreements/provisions of data, deployment of production models, etc.

Incorporated Python, MLlib, and a broad variety of machine learning methods, including classification, regression, and dimensionality reduction

Used supervised, unsupervised, and semi-supervised classification and clustering of documents

Strong communication and problem-solving skills incorporated in a team environment

Contributed to security projects involving real-time object tracking and classification using OpenCV libraries

Used Google Cloud Video Intelligence to make videos searchable and discoverable

Interrogated analytical results to assess algorithmic success, robustness, and validity

Assisted in developing Spark/Scala, and Python for regular expression (regex) projects in a Hadoop/Hive environment

Data Scientist Feb 2015 – Jul 2018

Aetna Insurance, Hartford, CT

Aetna is a health and life insurance company. Data from current and past patrons were warehoused, allowing me to create models and predict the medical plans best suited to each potential customer via a recommender system. The recommender, which incorporated Principal Component Analysis and Singular Value Decomposition to make collaborative suggestions, aided insurance agents working with prospective customers. Early A/B testing showed that agents who used the recommender system closed deals 13% more often than those who had not adopted it.
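
A toy SVD collaborative-filtering sketch of the recommender idea; the customer-plan matrix and its scores are invented.

    import numpy as np

    # Rows are customers, columns are plans; 0 means no interaction yet.
    R = np.array([[5, 3, 0, 1],
                  [4, 0, 0, 1],
                  [1, 1, 0, 5],
                  [0, 0, 5, 4]], dtype=float)

    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    k = 2                                             # keep top-k latent factors
    R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # low-rank reconstruction

    # Recommend the unseen plan with the highest reconstructed score per customer.
    recs = np.argmax(np.where(R == 0, R_hat, -np.inf), axis=1)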

Developed personalized product recommendations with machine learning algorithms that used Collaborative filtering to better meet the needs of existing customers and acquire new customers.

Created machine learning models employing logistic regression, random forest, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.

Developed optimization algorithms for use with data-driven models, such as supervised, unsupervised, and reinforcement machine learning.

Researched statistical machine learning methods including forecasting, supervised learning, classification, and Bayesian methods.

Advanced the technical sophistication of solutions using machine learning and other advanced technologies.

Performed exploratory data analysis and data visualization using R and Tableau.

Collaborated with data engineers to implement the ETL process; wrote and optimized SQL queries to perform data extraction and merging from Oracle.

Used R, Python, and Spark to develop a variety of models and algorithms for analytic purposes.

Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.

Data Scientist Oct 2013 – Feb 2015

BlackRock, New York, NY

Created alternative methods to scout companies in anticipation of LBO negotiations. Training data sets were created, and documents were classified into relevant categories in the search for companies with low market visibility. NLP techniques and associated models were used to filter data from several sources on the web. Relevant documents led to the discovery and acquisition of several companies structured as LBOs, yielding estimated average IRRs of 25% with expected exit times of 2-4 years.

Used a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction with NLTK, word2vec, and TF-IDF
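
A minimal TF-IDF sketch of the document-vectorization step; the corpus is illustrative only.

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "family-owned manufacturer exploring succession options",
        "regional distributor with stable cash flows and low visibility",
        "celebrity gossip and entertainment news",
    ]
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs)              # sparse document-term matrix
    terms = vec.get_feature_names_out()      # vocabulary aligned with X's columns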

Cleaned, parsed, and tokenized 1.6 million sentences using NLTK and scikit-learn.

Developed, deployed, and maintained production NLP models with scalability in mind.

Used OLAP cubes in preparation for data mining, behavioral and attitudinal segmentation, predictive modeling, insight extraction, and data visualization

Worked with complex applications such as R and SAS to develop neural networks and implement cluster analysis.

Implemented machine learning algorithms and concepts such as K-means clustering (in several varieties), Gaussian distributions, and decision trees.

Analyzed data using data visualization tools and reported on key features using statistical tools and supervised machine learning techniques to achieve project objectives.

Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models.

Used key indicators in Python for machine learning concepts like regression, bootstrap aggregation, and random forest.

Use of inferential statistics and machine learning data pipelines.

Data Scientist Jan 2010 – Oct 2013

Johnson and Johnson, New Brunswick, New Jersey

Johnson and Johnson develops several medical products that monitor health conditions. ANN autoencoders were used to model ECG readings and detect anomalies from baseline. Data from arrhythmic events were combined with normal ECG data to simulate early warnings of heart disease symptoms based on reconstruction-error metrics. Successful models were packaged and forwarded for deployment.
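
A hedged sketch of reconstruction-error anomaly scoring with a small dense autoencoder in Keras; the "ECG" windows here are synthetic sine waves, not J&J data.

    import numpy as np
    from tensorflow import keras

    rng = np.random.default_rng(0)
    t = np.linspace(0, 2 * np.pi, 128)
    normal = np.sin(t) + rng.normal(0, 0.05, size=(500, 128))   # baseline beats

    ae = keras.Sequential([
        keras.layers.Input(shape=(128,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(8, activation="relu"),    # bottleneck
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(128),
    ])
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(normal, normal, epochs=10, verbose=0)

    def anomaly_score(x):
        # High reconstruction error flags departures from the learned baseline.
        return np.mean((ae.predict(x, verbose=0) - x) ** 2, axis=1)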

Thrived as a contributor, scientist, and developer in an Agile development process.

Use of Python/R or similar scripting languages to manipulate, analyze and visualize large data sets.

Created models rapidly in Python using pandas, NumPy, scikit-learn, and Matplotlib for data visualization.

Implemented models in SAS, interfaced them with MSSQL databases, and scheduled them to update on a regular basis.

Developed anomaly and outlier detection algorithms using CNNs.

Performed machine learning classification of documents using Neural Network and Deep Learning language techniques, K-nearest neighbors, K-means, Random Forest, and Logistic Regression.

Trained data with different classification models such as Decision Trees, SVM, and Random Forest

Managed BI group and associated cross-departmental collaborators in the design, development, and implementation of near-real-time cloud and traditional data systems to capture, cleanse, store, and process data

Programmatic usage of SQL databases and search engines.

Segmentation of medical-related images using OpenCV libraries.

Developed and deployed machine learning as a service on Microsoft Azure cloud service.

Conducted and interpreted analyses using noisy data sets.

Senior Mathematics Lecturer Aug 2007 – Jan 2010

University College of the Cayman Islands, Grand Cayman, Cayman Islands

UCCI is a private institution that serves the academic needs of its local community and prepares students for the thriving business sector in the Cayman Islands. I taught a wide range of courses including College Algebra, Calculus, Linear Algebra, and Discrete Mathematics. Courses covered several fundamental computer science concepts, including classical logic, combinatorics, and graph theory.

Installed online test data banks and provided technical assistance for colleagues.

Executed IT projects using Raspberry Pi, Arduino, and network simulators.

Taught Discrete Mathematics for computer science majors.

Developed an ASPX database-driven website for posting student grades.

Invigilated break-out sessions for yearly science conferences.

Led break-out sessions for yearly STEM conferences.

Conducted yearly chess tournaments for students with personally donated funds.

Video/Audio distance learning for nearby Islands integrated within normal classrooms.

Wrote, reviewed, and served as lead invigilator for several collaboratively designed midterms and finals.

EDUCATION

Bachelor of Arts in Mathematics, Michigan State University

Master of Science in Mathematics, Michigan State University

Master of Science in Statistics, Michigan State University


