Data Scientist / ML Engineer

Location: New York, NY
Posted: February 03, 2023

Demetri person

Senior Data Scientist & ML Engineer...

E: ads9uh@r.postjobfree.com P: 718-***-****

SUMMARY

•Senior Data Scientist and Machine Learning Engineer with 8+ years of experience applying Data Science, Machine Learning, Deep Learning, and Data Analytics to transform business requirements into analytical and statistical solutions and data products.

•Leverage Python, R, and SQL with packages such as NumPy, Pandas, scikit-learn, TensorFlow with Keras, PyTorch, Matplotlib, Seaborn, Plotly, and R Shiny.

•Constructed statistical models in Python and R and implemented data mining and BI reporting solutions that scale across massive volumes of structured and unstructured data.

•Skilled in data mining, SQL querying, data modeling, data/business analytics, data visualization, and machine learning in Python and R using NumPy, Pandas, and the tidyverse.

•Experienced using R, Python, and SQL on big data platforms such as Hadoop (HQL/HiveQL) and with databases such as MemSQL and MySQL.

•Deployed models as Python packages, as APIs for backend integrations, and as services in a microservices architecture, with Kubernetes orchestrating the Docker containers.

•Experienced in Python (NumPy, TensorFlow, Matplotlib) and R (tidyverse), using data modeling, data analytics, and evidence-based approaches to find lean, actionable solutions and insights for real-world business problems.

•Applied advanced statistical and predictive modeling techniques to build, maintain, and improve real-time decision systems.

•Experienced with Naïve Bayes, regression analysis, neural networks/deep neural networks, support vector machines (SVM), decision trees, random forests, and XGBoost in Python and R.

•Experienced in implementing statistical models on big datasets using AWS and Azure cloud computing resources.

•Experienced in MLOps and in building CI/CD pipelines for model deployment.

•Applied creative thinking to devise and propose innovative ways of looking at problems using business acumen, mathematical theories, data models, and statistical analysis.

•Discovered patterns in data using statistical analytics, machine learning algorithms, and SQL queries, and used experimental and iterative approaches to validate findings.

•Extensive experience designing and presenting interactive data visualizations and widgets in Python using Matplotlib, Plotly, and Seaborn, and in R using ggplot2 (tidyverse) and R Shiny.

•Produced custom BI reporting dashboards in Python using Dash with Plotly for rapid dissemination of actionable, data-driven insights.

•Experienced working with relational databases, with advanced SQL skills.

•In-depth knowledge of statistical procedures that are applied in both supervised and unsupervised machine learning problems.

TECHNICAL SKILLS

•Machine Learning: Natural Language Processing & Understanding, Machine Learning algorithms including text recognition, image classification, and forecasting

•Analysis Methods: Advanced Data Modeling, Statistical Analysis, Exploratory Analysis, Bayesian Analysis, Inference, Clustering, Sentiment Analysis, Predictive Analytics, Decision Analytics, Design and Analysis of Experiments, Regression Analysis, Multivariate Analysis, Sampling Methods, Forecasting, Segmentation, Factorial Design, Response Surface Methodology, Optimization, and State-Space Analysis

•Analysis Techniques: Random Forest, Gradient Boosting Machine (GBM), TensorFlow, Classification and Regression Trees (CART), PCA, RNN including LSTM, CNN, Transfer learning, Linear and Logistic Regression, Naïve Bayes, Simplex, Markov Models, and Jackson Networks

•Data Modeling: Stochastic Modeling, Linear Modeling, Behavioral Modeling, Bayesian Analysis, Statistical Inference, Predictive Modeling, Probabilistic Modeling, Time-Series Analysis

•Deep Learning: Machine Perception, Data Mining, Machine Learning algorithms, Neural Networks, RNN, CNN, Transfer learning, TensorFlow, Keras, PyTorch

•Data Query: Azure, Google, SQL, data lakes, and various SQL databases and data warehouses

•Applied Data Science: Natural Language Processing, Machine Learning, Text Recognition, Image Classification, Social Analytics, Predictive Maintenance

•Python Packages: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, TensorFlow, PyTorch, SciPy

•Analytic Development: Python, R, SQL, Excel

•Artificial Intelligence: Text Understanding, Classification, Pattern Recognition, Recommendation Systems, Targeting Systems, Ranking Systems, and Time Series

PROFESSIONAL WORK EXPERIENCE

Senior Data Scientist & Machine Learning Engineer

Strategy& (PwC), New York, NY

Sep 2020 – Present

Strategy& is PwC's strategy consulting business. I worked with two PwC clients, both large corporations. I led a Data Science team that built a pipeline for automatic data entry from handwritten forms, using AWS for database storage. The pipeline accepts two images (the image to extract information from and the reference form) and outputs information such as names, IDs, and other tax information. The information extraction steps were implemented with Tesseract OCR and Amazon Textract, as well as Convolutional Neural Networks. We performed OCR on the image to extract the required information and stored it in our AWS database. This automation helped reduce manual data entry by 30% and was deployed to production smoothly.

Another project used machine learning techniques and Python libraries to derive relevant analyses and metrics, including building POCs to assess the value of implementing them in future projects. The work focused on understanding the customer experience across the organization. I applied text summarization and extracted insights by performing sentiment analysis on customer feedback and chat texts.
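
The form-entry pipeline described above pairs a filled form with a reference form. One common way to use the reference form is as a template of field locations, cropping each field from the filled image and passing the crop to an OCR engine. The sketch below illustrates only that step, under stated assumptions: the reference form has already been reduced to named bounding boxes, the filled image is already aligned to it (the registration step is omitted), and the OCR engine (for example Tesseract or Amazon Textract) is passed in as a plain function. `FieldBox`, `crop`, and `extract_fields` are hypothetical names, and images are modeled as nested lists of pixel values to keep it dependency-free.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical image model: a grayscale image as rows of pixel values.
Image = List[List[int]]


@dataclass(frozen=True)
class FieldBox:
    """Hypothetical field location read off the reference form (pixel coordinates)."""
    name: str
    top: int
    left: int
    height: int
    width: int


def crop(image: Image, box: FieldBox) -> Image:
    """Return the sub-image covered by `box` (assumes the form is already aligned)."""
    return [row[box.left:box.left + box.width]
            for row in image[box.top:box.top + box.height]]


def extract_fields(image: Image,
                   template: List[FieldBox],
                   ocr: Callable[[Image], str]) -> Dict[str, str]:
    """Run `ocr` on each templated region and return {field name: cleaned text}."""
    results: Dict[str, str] = {}
    for box in template:
        text = ocr(crop(image, box))
        # Collapse the stray spaces and line breaks OCR engines often emit.
        results[box.name] = " ".join(text.split())
    return results
```

Injecting `ocr` as a function keeps the template logic testable with a fake engine and lets the same code wrap either a local Tesseract call or a cloud OCR service.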

•Applied business analytics skills to integrate and prepare large, varied datasets, and communicated results.

•Worked with specialized database architectures and computing environments for structured data.

•Developed analytic approaches to strategic business decisions.

•Performed analysis using predictive modeling, data/text mining, and statistical tools.

•Built predictive models using Machine Learning algorithms such as Random Forests, Naive Bayes, Neural Networks, MaxEnt, SVM, Topic Modeling/LDA, Ensemble Modeling, Gradient Boosting (GB), etc.

•Used common NLP pre-processing techniques such as tokenization, part-of-speech tagging, parsing, and stemming.

•Performed semantic analysis (named entity recognition, sentiment analysis) and modeling, and built word representations (GPT-2, BERT, ELMo, Word2vec, Doc2vec).

•Synthesized analytic results with business input to drive measurable change.

•Performed data visualization and developed presentation material using Tableau.

•Participated in product redesigns and enhancements to determine how changes would be tracked and to suggest product direction based on data patterns.

•Applied statistics and organized large datasets of structured and unstructured data.

•Worked with applied statistics and applied mathematics tools for performance optimization.

•Facilitated data collection to analyze and document data processes, scenarios, and information flows.

•Determined data structures and their relations in supporting business objectives and provided useful data in reports.

•Experienced in using Kubernetes to deploy, scale, load-balance, and manage Docker containers with multiple namespaced versions, with a good understanding of the OpenShift platform for managing Docker and Kubernetes clusters.

•Assisted in the continual improvement of the AWS data-lake environment.

•Implemented MLOps with Amazon SageMaker by automating the CI/CD pipeline across data preprocessing, model building, model deployment, model monitoring, and maintenance.

•Built Python packages with Amazon SageMaker to automate and streamline workflows for continuous model validation, data cleaning, data quality testing, reporting, and dependency management.

•Utilized Python and TensorFlow to build proofs of concept for new models, new features for existing models, and new internal products, then took them to production as finished models, features, or products.

•Built a Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network in Python and TensorFlow for text summarization.

•Worked with the team in an Agile environment using Scrum.

•Led weekly Scrum meetings to prioritize and assign tasks to members of the team.
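
The pre-processing steps listed for this role (tokenization and stemming; part-of-speech tagging and parsing would normally come from a library such as NLTK or spaCy) can be sketched without dependencies as below. This is a simplified illustration only: the regex tokenizer, the tiny stop-word list, and the suffix-stripping `simple_stem` rules are assumptions chosen for demonstration, not the Porter stemmer or any production pipeline.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "was", "to", "of", "my", "i"}

# Suffixes tried longest-first; a crude stand-in for a real stemmer such as Porter.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens, keeping contractions."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def simple_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    """Tokenize, drop stop words, then stem each remaining token."""
    return [simple_stem(t) for t in tokenize(text) if t not in STOP_WORDS]
```

For customer chat text, `preprocess("The agent was helpful and I loved the quick replies!")` keeps only content words and reduces "replies" to "repli"; a real stemmer or lemmatizer would handle irregular forms far better than these suffix rules.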

Senior Data Scientist (ML/Computer Vision)

Freeport-McMoRan, Phoenix, AZ

Feb 2019 – Sep 2020

Freeport-McMoRan is a large mining company with a major interest in copper and other metals. Its operations span the globe, with main operations in the United States and Peru. The Data Science/Machine Learning mandate was to create analysis tools to optimize the delivery and production of process ore, as well as the preventive maintenance of mining and crushing equipment. I worked as a Data Scientist/Computer Vision Researcher to find state-of-the-art solutions for classifying crushable and uncrushable materials in the company's production pipeline. The dev team built a full pipeline for real-time inference: before material was sent for further processing, our model had to decide whether it was crushable. We tested various computer vision architectures such as VGG16, ResNet, Inception, and EfficientNet. ResNet gave the best performance, with a recall of 0.85, a precision of 0.7, and fast inference (~3 s). We aimed to maximize recall because it was more cost-effective to filter out as many uncrushable materials as possible, even at the cost of rejecting some crushable ones.
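
A recall-over-precision preference like the one described above is often implemented by tuning the decision threshold on the classifier's score rather than accepting the default 0.5 cutoff. The sketch below treats "uncrushable" as the positive class (an assumption) and, among all thresholds that meet a recall target, picks the one with the best precision. The function names, scores, and labels are hypothetical and are not the project's data; the recall target is a free parameter.

```python
from typing import List, Optional, Tuple


def precision_recall(labels: List[int], preds: List[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = uncrushable, assumed)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(scores: List[float], labels: List[int],
                   min_recall: float) -> Optional[Tuple[float, float, float]]:
    """Among thresholds with recall >= min_recall, return the one with the best
    precision as (threshold, precision, recall), or None if none qualifies."""
    best = None
    for t in sorted(set(scores)):
        # Flag as uncrushable every item whose score reaches the threshold.
        preds = [1 if s >= t else 0 for s in scores]
        precision, recall = precision_recall(labels, preds)
        if recall >= min_recall and (best is None or precision > best[1]):
            best = (t, precision, recall)
    return best
```

Lowering the threshold flags more items as uncrushable, which can only raise recall and usually costs precision; this is the same trade-off as accepting a precision of 0.7 in exchange for a recall of 0.85.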

•Worked in a Git-based development environment.

•Applied expert-level Python and SQL Server development skills.

•Utilized TensorFlow, Keras, Python, and deep neural network and analytical techniques.

•Transformed business requirements into analytical models, designed algorithms, built models and developed data mining and reporting solutions that scaled across massive volumes of structured and unstructured data.

•Working knowledge of Kubernetes and Docker for managing container clusters.

•Worked on proofs of concept (POCs) and gap analysis, and gathered the necessary data for analysis from different sources.

•Conducted large-scale data labeling by building a Python-based web tool for crowdsourced data annotation.

•Prepared data for exploration through data wrangling.

•Implemented advanced and deep CNNs for image classification

•Implemented transfer learning from pre-trained models such as ResNet and VGG16.

•Worked with Random Forests, Decision Trees, Linear Regression, Logistic Regression, SVM, Clustering, neural networks, Principal Component Analysis, and Recommender Systems.

•Used Pandas in Python to perform exploratory data analysis.

•Worked with data modeling tools Power Designer and ER Studio.

•Extracted data from the Hadoop infrastructure for basic analysis and data summarization.

•Created visualization tools and dashboards with Tableau, ggplot2, and D3.js.

•Worked with and extracted data from various database sources like Oracle, SQL Server, and DB2.

Data Scientist (ML Engineering)

PNC Financial Services Group, Inc., Pittsburgh, PA

Apr 2017 – Jan 2019

The PNC Financial Services Group, Inc. is an American bank holding company and financial services corporation. Worked with the Data Science team in the Decision Science Department to build models that evaluate and improve compliance with new regulations implemented by the Federal Reserve, the Consumer Financial Protection Bureau (CFPB), and the Office of the Comptroller of the Currency (OCC), combining unstructured data, analyzed with text analytics and Natural Language Processing (NLP), with structured data. Performed text analysis, constructed advanced predictive models and analytics supporting lines of business (LOBs) across the enterprise, and produced business intelligence reports from varied data. Insights from the data were used to identify target markets and their issues, plan the supply and demand of financial products, find cost savings through predictive analytics, reduce risk, and evaluate strategic partnerships.

•Built predictive models using Machine Learning algorithms such as Random Forests, Neural Networks, MaxEnt, SVM, Naive Bayes, Topic Modeling/LDA, Ensemble Modeling, Gradient Boosting (GB), etc.

•Forecasted key performance indicators with an Attention-LSTM predictive model in TensorFlow at 80% accuracy, reducing network downtime and improving log-collection triggering.

•Experience in containerizing and migrating applications to Kubernetes.

•Detected anomalies in KPIs with an autoencoder-based classification model, achieving 60% accuracy and reducing unnecessary log analysis for cell maintenance.

•Led development of an MLflow-based machine learning life cycle tracking system that centralized experiment tracking and visualization and reduced model hyperparameter tuning time by 20%.

•Mentored summer interns and helped systematize the onboarding process for the AI/ML team, reducing onboarding time for new hires from 3 to 2 months.

•Applied common NLP pre-processing techniques such as tokenization, part-of-speech tagging, parsing, and stemming.

•Analyzed unstructured data leveraging an Open Source Data Science Platform (OSDS) and other analytical tools to solve complex business objectives.

•Defined key business problems to be solved while developing and maintaining relationships with stakeholders, SMEs, and cross-functional teams.

•Determined data structures and their relations in supporting business objectives and provided useful data in reports.

•Completed semantic analysis (named entity recognition, sentiment analysis), modeling, and word representations (RNN/LSTM, word2vec, doc2vec, TF-IDF, LDA).

•Worked with big data infrastructure and tools such as Hive and Spark.

•Performed data visualization and developed presentation material using Tableau.

•Provided knowledge and understanding of current best practices and emerging trends within the analytics industry.

•Participated in product redesigns and enhancements to determine how changes would be tracked and to suggest product direction based on data patterns.

•Facilitated data collection to analyze and document data processes, scenarios, and information flow.

•Assisted in the continual improvement of the AWS data lake environment.

•Promoted enterprise-wide business intelligence by enabling report access in SAS BI Portal and Tableau Server.
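
Of the word representations listed for this role, TF-IDF is simple enough to sketch directly. The version below is an illustration, not the pipeline used at PNC: it weights each term by its raw count in a document times the smoothed inverse document frequency ln((1 + N) / (1 + df)) + 1, which is the default IDF in scikit-learn's `TfidfVectorizer`; scikit-learn additionally L2-normalizes each row by default, which is omitted here. The function name and example documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    tf  = raw count of the term in the document
    idf = ln((1 + N) / (1 + df)) + 1, where N is the number of documents and
          df is the number of documents containing the term (smoothed variant)
    """
    n_docs = len(docs)
    # Count each term at most once per document to get document frequency.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1
           for term, df in doc_freq.items()}
    return [{term: count * idf[term] for term, count in Counter(doc).items()}
            for doc in docs]
```

Terms that appear in fewer documents (such as "payment" in a small complaint corpus) receive a larger IDF than terms that appear everywhere, which is what makes TF-IDF useful as input to topic models or classifiers.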

Data Scientist

Coldwell Banker Real Estate LLC, Madison, NJ

Oct 2014 – Mar 2017

Coldwell Banker Real Estate LLC is an American real estate franchise. Worked with a Data Science team on a forecasting and analytics engagement. Built a model to predict house prices in specified areas, trained on comprehensive housing data using the number of bathrooms, number of bedrooms, number of crimes, and square footage as attributes. To test the model, the same attributes were extracted from Zillow, a website for buying and selling property. The model's predictions were 85% accurate compared with the prices listed on the website. Additional forecasting of sales and overall trends was done in parallel with this work.
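
The price model above is described only by its input attributes; the source does not say which algorithm produced the 85% figure (the bullets below mention linear regression, tree ensembles, and boosting). As a hedged illustration, the sketch fits an ordinary least-squares linear regression on those four attributes by solving the normal equations (XᵀX)β = Xᵀy with Gaussian elimination, using only the standard library. The feature order, the helper names, and the `mape` function (mean absolute percentage error, where 1 − MAPE is just one possible reading of "85% accurate") are assumptions.

```python
from typing import List, Sequence

# Assumed feature order for each row: bathrooms, bedrooms, crimes, square feet.


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], prices: Sequence[float]) -> List[float]:
    """Return [intercept, coef_1, ..., coef_k] minimizing squared error."""
    x = [[1.0, *map(float, row)] for row in rows]  # prepend intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, prices)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta: Sequence[float], row: Sequence[float]) -> float:
    """Linear prediction: intercept plus the weighted sum of the features."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error of predictions against listed prices."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)
```

Fitting on listings whose prices follow an exact linear rule recovers that rule. With real data, rescaling features (for example, square feet in thousands) keeps the normal equations better conditioned, and a library routine such as `numpy.linalg.lstsq` is preferable to hand-rolled elimination.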

•Constructed and integrated logistic and linear regression models, balancing various internal requirements of covariance and variable criteria.

•Applied advanced statistical and predictive modeling techniques to build, maintain, and improve real-time decision systems using ARIMA, SARIMA, and SARIMAX.

•Devised and proposed innovative ways to look at problems by using business acumen, mathematical theories, data models, and statistical analysis.

•Utilized decision trees and random forests to grade the validity of the variables used in the regression models. Implemented additional tools such as bagging and boosting (AdaBoost, XGBoost) to strengthen these models.

•Identified patterns in data using algorithms and used an experimental and iterative approach to validate findings.

•Communicated results and recommendations to business stakeholders weekly. Implemented feedback and features based on the evolving needs of the business in a rapidly changing social landscape.

•Deployed the final model as a Flask app on AWS, callable via a REST API.

EDUCATION

Bachelor of Science in Computer Science, Middle Tennessee State University (Completed)

AWS Certified Machine Learning - Specialty (Ongoing)


