
Nayere Hess

Senior Data Scientist

PROFESSIONAL PROFILE

Data Scientist and Machine Learning Engineer with over 8 years of experience applying advanced statistical analytics and machine learning techniques to create actionable solutions to business and scientific problems.

Ph.D. in Computational Biophysics from Clemson University, USA, with an MS in Condensed Matter Physics from Beheshti University, Tehran, and a BS in Physics from International IK University

A versatile technocrat with hands-on experience in optimization techniques, Python libraries, IDEs, development tools, and supervised as well as unsupervised learning

Extensive experience using object-oriented programming for data mining, artificial neural networks, and various optimization algorithms

Hands-on experience with multiple NLP methods for information extraction, topic modeling, parsing, and relationship extraction

Hands-on experience in the Computer Vision domain including object detection, image segmentation, and event detection in video

Well-versed in designing, developing, and deploying custom BI reporting dashboards using Shiny, shinydashboard, and Plotly to provide actionable insights and data-driven solutions

Dexterous in the application of statistical and machine learning methods including Regression Analysis, Forecasting, Decision Trees, Random Forest, Classification, Cluster Analysis, Support Vector Machines, Naive Bayes techniques, Deep Learning, CNN, and RNN

Skilled at applying techniques to live data streams from big data sources using Spark and Scala; possess cloud platform experience using Azure, AWS, and GCP

A leader with excellence in transforming business concepts and needs into mathematical models, designing algorithms, and deploying custom business intelligence software solutions; knowledge of building models with deep learning frameworks such as TensorFlow, PyTorch, and Keras

An assertive team leader with a strong aptitude for developing, leading, hiring, and training highly effective work teams; strong analytical skills with a proven ability to work well in a multi-disciplinary team environment and adept at quickly learning new tools and processes.

TECHNICAL SKILLS

Python Packages: PyTorch, NumPy, Pandas, Scikit-learn, TensorFlow, Keras, SciPy, Matplotlib, Seaborn.

Programming Languages: Python, R, MATLAB, Linux, LaTeX.

Data Systems: SQL, MySQL, NoSQL, AWS (RDS, Redshift, Kinesis, EC2, EMR, S3), MS Azure, Spark, Hive, Hadoop.

IDEs: Spyder, Jupyter, PyCharm, RStudio, Eclipse.

Development Tools: GitHub, Git, Jupyter Notebook, Trello, SVN.

Statistical Methods: Bayesian Statistics, Hypothesis Testing, Factor Analysis, Stochastic Modeling, Factorial Design, ANOVA.

Optimization Techniques: Linear Programming, Dynamic Programming, Convex Optimization, Non-Convex Optimization, Monte Carlo Methods, Network Flows.

Machine Learning Frameworks: TensorFlow, Torch, Keras, Caffe.

Python Libraries: NumPy, Pandas, SciPy, Matplotlib, scikit-learn, NLTK, StatsModels, Seaborn, Selenium.

Deep Learning: Keras, TensorFlow, PyBrain.

Analysis Methods: Unsupervised Learning (K-means Clustering, Hierarchical Clustering, Centroid Clustering, Principal Component Analysis, Gaussian Mixture Models, Singular Value Decomposition); Supervised Learning (Naive Bayes, Time Series Analysis, Survival Analysis, Linear Regression, Logistic Regression, ElasticNet Regression, Multivariate Regression).

Applied Data Science: Natural Language Processing, Deep Learning, Transfer Learning, Autoencoding/decoding.

Soft Skills: Quick learner; goal-oriented; write well-documented code; able to manage time and prioritize tasks; practice ego-free collaboration and communication; skilled at presenting complex results to a non-technical audience.

PROFESSIONAL EXPERIENCE

Senior Data Scientist and ML Developer, PNC Bank, Pennsylvania (Remote work)

June 2022 – Present

At PNC, worked as part of the Intelligent Automation team, reporting to the team manager. The project aimed to develop a way to ensure that new code meets security standards and audit compliance requirements. The Intelligent Automation team is an internal consultancy that assists line-of-business (LOB) and technology teams in delivering AI-enabled solutions to production.

Led the implementation and delivery of ML models

Advised on architecture for ML-based cloud solutions

Implemented components of ML-based solutions in AWS

Automated the end-to-end process of transaction fraud detection

Built AWS CloudFormation templates used with Terraform and existing plugins

Developed FastAPI services for feature engineering/data quality, model execution, and business logic (a minimal serving sketch appears at the end of this section)

Communicated with business stakeholders to establish the logic of the processes

Worked with LOB technology counterparts as a member of an Agile Scrum crew to implement ML-based solutions

Established best practices within Agile crews for model development workflow, DevOps/MLOps methodology, and productizing the ML models

Conducted model review, code refactoring and optimization, containerization, deployment, versioning, and quality monitoring

Created tests and integrated them automatically with the workflows

Developed an efficient, reliable CI/CD pipeline using AWS SageMaker

Built cloud-native infrastructure using Kubernetes, Knative, and TriggerMesh

Applied a sound understanding of NoSQL database concepts

Performed application programming activities including coding, testing, debugging, documenting, maintaining, and modifying machine learning systems

Engaged in data modeling to identify the underlying structure of a dataset and find patterns or properties that lead to machine learning improvement recommendations

Managed the risks associated with business objectives and activities to ensure adherence to and support of PNC's Enterprise Risk Management Framework
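For illustration, a minimal sketch of the kind of FastAPI scoring service described above. The endpoint, request schema, threshold, and model artifact are hypothetical, and the model is assumed to be a trained scikit-learn pipeline that handles its own feature encoding:

import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Fraud scoring service (sketch)")
model = joblib.load("fraud_model.joblib")  # hypothetical trained scikit-learn pipeline

class Transaction(BaseModel):
    amount: float
    merchant_category: str
    hour_of_day: int

@app.post("/score")
def score(txn: Transaction) -> dict:
    # Simple data-quality gate before model execution
    if txn.amount < 0:
        return {"error": "amount must be non-negative"}
    features = pd.DataFrame([txn.dict()])
    prob = float(model.predict_proba(features)[0, 1])
    # Business-logic threshold (assumed) for flagging a transaction
    return {"fraud_probability": prob, "flagged": prob > 0.5}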

Lead Data Scientist, Pfizer, New York, NY (Remote)

January 2020 – May 2022

Pfizer is an industry leader in conducting clinical trials for pharmaceuticals. However, each trial can take years to complete, and nearly 31% of trial participants withdraw before its conclusion, costing Pfizer millions of dollars. To stay ahead of the curve, Pfizer maintains a robust research effort; one area of research is cell mutation and protein folding. Genetic mutations can change protein stability through alterations in protein conformation and dynamics. To investigate this, the unfolded state of proteins was described using a statistical coil model, and the folding free-energy changes were obtained using Molecular Dynamics simulation and the MM-PBSA method. Machine learning algorithms such as Regression, Random Forest (RF), Gradient Boosting (GB), and Artificial Neural Networks (ANN) were implemented for this purpose, and the experimental values for stability changes were used to check the performance of the models.
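As an illustration of that modeling step, a minimal sketch of the random forest variant, assuming a table of simulation-derived descriptors per mutation with experimental stability changes (ddG) as the target; the file name and column names are hypothetical:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical table of MD/MM-PBSA-derived descriptors per mutation
data = pd.read_csv("mutation_features.csv")
X = data.drop(columns=["ddG_experimental"])
y = data["ddG_experimental"]  # experimental stability changes as targets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
rf = RandomForestRegressor(n_estimators=500, random_state=42)
rf.fit(X_train, y_train)

# Held-out experimental values check model performance, as in the project
print("R^2 on held-out mutations:", r2_score(y_test, rf.predict(X_test)))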

Used unsupervised learning techniques such as K-means clustering and Gaussian Mixture Models to cluster customers into risk groups based on health parameters captured by wearable technology, reflecting their activities and health goals

Multiple statistical modeling approaches were applied to determine the usefulness of wearable technology data for various insurance products.

Survival modeling techniques, such as Poisson regression, hidden Markov models, and Cox proportional hazards, were used to model time to different events utilizing wearable data (time to death for life insurance, time to next hospital visit, time to next accident, time to critical illness, etc.).

Data required extensive cleaning and preparation for machine learning modeling, as some observations were censored without any clear notification.

We solved a binary classification problem (transferring to a lower-risk group or not with a given financial incentive) with logistic regression.

An artificial neural network was utilized with Keras/TensorFlow in Python to solve binary classification problems for premiums and their intersection with the discriminant.

We used modifiers, including L1 regularization, dropout, and Nesterov momentum, to enhance the neural network and optimize generalization (sketched below).
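A minimal sketch of such a Keras/TensorFlow classifier, combining the L1 regularization, dropout, and Nesterov momentum described above; the layer sizes and hyperparameter values are illustrative assumptions:

import tensorflow as tf

def build_classifier(n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        # L1 regularization on the hidden-layer weights (strength assumed)
        tf.keras.layers.Dense(64, activation="relu",
                              kernel_regularizer=tf.keras.regularizers.l1(1e-4)),
        tf.keras.layers.Dropout(0.3),  # dropout to improve generalization
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(transfer to lower-risk group)
    ])
    # SGD with Nesterov momentum, one of the modifiers listed above
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model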

Data Scientist (AI & NLP), IBM, Atlanta, GA

January 2018 – December 2019

Worked with IBM’s L1 support team to automate assistance for users experiencing software issues and crashes. Used an anomaly detection algorithm to find periods of increased error rates, combined with an NLP solution to determine which categories the anomalous logs belonged to. Utilized Watson Discovery to surface relevant articles and direct users to them. Incorporated user feedback for continuous model training.
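As one illustration of the anomaly detection step, a rolling z-score detector over hourly error counts; this is a sketch under assumed window and threshold values, not the production algorithm:

import pandas as pd

def flag_anomalies(errors_per_hour: pd.Series, window: int = 24, z: float = 3.0) -> pd.Series:
    # Mark hours whose error count sits more than z rolling standard
    # deviations from the rolling mean over the trailing window
    mean = errors_per_hour.rolling(window, min_periods=window).mean()
    std = errors_per_hour.rolling(window, min_periods=window).std()
    return (errors_per_hour - mean).abs() > z * std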

Implemented tokenization using NLTK, along with stemming, stop-word filtering, and individual functions within a class for each prompt to the user (see the sketch after this list)

Implemented K-Means clustering, Cosine similarity, and Flask deployment.

Used Keras, TensorFlow, PyTorch, and other frameworks for neural network generation and optimization.

Utilized Python libraries such as Pandas, NumPy, and Plotly to preprocess, clean, and visualize text data.

Trained classification models on text classes using transfer learning techniques.

Used RNNs, LSTMs, and BERT on text data for sentiment analysis or classification.

Implemented Anomaly Detection and Root Cause Analysis.

Unified consumer profiles with probabilistic record linkage.

Architected, built, maintained, and improved new and existing suites of algorithms and their underlying systems.

Implemented deployment solutions using TensorFlow, Keras, Docker, and Kubernetes Service.
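A minimal sketch of the NLTK preprocessing mentioned at the start of this list (tokenization, stop-word filtering, and stemming); the sample log message is invented:

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

def preprocess(text: str) -> list:
    # Tokenize, lowercase, drop stop words and non-alphabetic tokens, then stem
    stop = set(stopwords.words("english"))
    stemmer = PorterStemmer()
    tokens = word_tokenize(text.lower())
    return [stemmer.stem(t) for t in tokens if t.isalpha() and t not in stop]

print(preprocess("Application crashed repeatedly after the latest update"))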

ML Engineer (AI/Computer Vision), Cryovac-SealedAir, Greenville, SC

September 2016 – January 2018

Cryovac is a food packaging and packaging machinery company and the second-largest employer in South Carolina. As part of the Computer Vision initiative, my responsibilities included creating a computer vision-based algorithm to identify improperly sealed items and establishing a checklist for future computer vision-aided QA systems. The work was at an early stage and was intended as a proof of concept only.

Using the Keras, TensorFlow, and PyTorch Python APIs, the team built the architecture and trained convolutional neural networks (CNNs).

Exploited transfer learning with custom-built classifiers in PyTorch to speed up production time and improve results.

Fine-tuned VGG16 and other models to adapt their pre-trained weights to our use case (a sketch appears after this list).

Used a fully convolutional network (a pre-trained YOLO model) to speed up predictions.

Accounted for inference overhead to make sure our predictions happened in real time.

Regularized the data by applying transformations to the images using Pillow.

Worked with large stores of video imaging data stored on AWS S3 buckets for training the model.

Supplied pickled model to the software development team to integrate into QA machinery.
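A minimal sketch of the VGG16 transfer-learning setup described above, freezing the convolutional features and swapping in a two-class head for properly vs. improperly sealed items; the learning rate and optimizer choice are assumptions:

import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

model = vgg16(weights=VGG16_Weights.DEFAULT)  # ImageNet pre-trained weights

# Freeze the convolutional features so only the custom head is trained
for param in model.features.parameters():
    param.requires_grad = False

# Swap the final layer for a two-class head: properly vs. improperly sealed
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)

optimizer = torch.optim.Adam(model.classifier[6].parameters(), lr=1e-4)  # lr assumed
criterion = nn.CrossEntropyLoss()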

Data Scientist (Supply Chain Solutions), Michelin, Greenville, SC

January 2015 – August 2016

Michelin is one of the world’s leading companies and the largest single employer in South Carolina. As a Supply Chain Solutions consultant, my responsibilities included creating demand-forecasting analyses for the whole automotive industry in the United States. I developed a sales-prediction tool that forecast the company’s sales from its previous sales records, using time series analysis techniques with TensorFlow and Keras. The goal was to provide the company with analysis, insight, and suggestions for the future; because time series analysis applies readily to different use cases, the model can be reused in many other enterprises. Data was scraped from online sources using SQL queries, among other tools, and Python’s statsmodels package and the ARIMA model were used in this project. The franchises were analyzed both individually and in groups, and the model successfully identified branches that were doing well as well as those that were not performing as expected:

Data scraping and preprocessing.

Classification of the branches based on their size.

Training and testing a time series model to forecast the future sales of each group.

Identifying branches that perform the best and the worst among all branches.

Using StatsModels to decompose the time series into trend, seasonality, and residual data.

Using the Dickey-Fuller test to confirm that the residual data is stationary.

Using the ARIMA model to model the stationary data (sketched after this list).

Training and testing a time series model to forecast the future sales of the individual branches.

Modeling the trend, seasonality, and stationary data, combining them to provide the forecast for the future.

Providing insight and suggestions to the managerial staff for the future.

Responsible for planning, monitoring, and execution of deployment and product releases.

Participated in status meetings and progress reporting to track progress, risk management, defect triage, defect tracking, and resolution.
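A minimal sketch of the decompose/test/forecast workflow from the list above, using statsmodels; the data file, ARIMA order, and forecast horizon are illustrative assumptions:

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

# Hypothetical monthly sales series for one branch group
sales = pd.read_csv("branch_sales.csv", index_col="month", parse_dates=True)["sales"]

# Decompose into trend, seasonality, and residual components
decomp = seasonal_decompose(sales, model="additive", period=12)
residual = decomp.resid.dropna()

# Dickey-Fuller test: a small p-value indicates the residual is stationary
adf_stat, p_value, *_ = adfuller(residual)
print(f"ADF statistic {adf_stat:.2f}, p-value {p_value:.3f}")

# Fit an ARIMA model and forecast the next 12 months
fit = ARIMA(sales, order=(1, 1, 1)).fit()
print(fit.forecast(steps=12))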

EDUCATION

BS in Physics, International IK University

MS in Condensed Matter Physics, Beheshti University, Tehran

Ph.D. in Computational Biophysics, Clemson University, USA


