Resume

Senior Data Scientist

Location:
Issaquah, WA
Posted:
April 27, 2022

Contact this candidate

Profile Summary

* *****’ experience designing, building, and deploying data modeling solutions that solve real-world business problems and help businesses better understand their market demographics to drive business performance.

Build and deploy data models using Linear Discriminant Analysis (LDA), Logistic Regression, Naive Bayes, etc.

Design, develop, and implement informative data visualizations in Tableau, ggplot2, Matplotlib, Seaborn, and R Shiny.

Build statistical models in Python using TensorFlow, data mining, data streaming, and BI reporting solutions that scale across massive volumes of structured and unstructured data.

Build recommender systems by applying content-based and user-based collaborative filtering.

Configure/apply Machine Learning algorithms (Logistic Regression, Random Forest, XGBoost, KNN, SVM, Neural Network, Linear Regression, Lasso/Ridge Regularization, K-Means).

Discover patterns in data using algorithms and SQL queries and use an experimental, iterative approach to validate findings in Python using TensorFlow.

Hands-on skill with tidyverse, dplyr, readr, zoo, and glmnet in R, and Pandas, NumPy, Matplotlib, Seaborn, and Scikit-Learn in Python, for performing exploratory data analysis (EDA) and data preprocessing.

Experience with data analysis methods such as data reporting, ad-hoc data reporting, graphs, scales, pivot tables, OLAP reporting with Microsoft Excel, R Markdown, R Shiny, Python Markdown.

Advanced skills with relational databases and SQL programming/scripting.

Skilled in conducting Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and Non-Negative Matrix Factorization (NMF) for predictive and cluster analysis.

Conduct statistical data analysis in Python, R, and SQL to build complex statistical models for predictive analysis.

Hands-on skill working with Support Vector Machines (SVM), K Nearest Neighbors (KNN), Random Forests, Decision Trees, Bagging, Boosting (AdaBoost, Gradient Boosting, XGBoost), and Neural Networks (FNN, CNN, RNN, LSTM).

Publish and present dashboards and storylines on web and desktop platforms.

Skilled with statistical analysis programming languages R and Python (including Big Data technologies such as Spark, Hadoop, Hive, HDFS, MapReduce).

Advanced knowledge of CAD modeling (AutoCAD) for three-dimensional models.

Certified in SolidWorks (CSWA) for 3D part modeling, with applications in augmented reality and experiences in Unity video games.
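As a brief illustration of the PCA-before-clustering work described above, the projection step can be sketched with a plain NumPy SVD; the data here is synthetic and all names are illustrative, not code from any of the projects listed:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)                 # center each feature
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]                  # principal axes (rows of Vt)
    return X_centered @ components.T                # coordinates in reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                       # synthetic 5-feature dataset
Z = pca(X, 2)
print(Z.shape)                                      # (100, 2)
```

The reduced coordinates Z would then feed a clustering step such as K-Means; SVD orders the axes by explained variance, so the first column of Z always carries at least as much variance as the second.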

Technical Skills

Machine Learning and Deep Learning: Artificial Neural Network (ANN), Multivariate regression, Feature engineering, K-Nearest Neighbors (KNN), K-means clustering, Random forests, Decision trees, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Linear regression, Logistic regression, XGBoost, Reinforcement learning, Transfer learning.

Analytical Methods: Predictive Analytics, Statistical Analysis, Advanced Data Modeling, Forecasting Time Series Models, Regression Analysis, Sentiment Analysis, Exploratory Data Analysis, Stochastic Optimization, Predictive Modeling with Time Series (AR, MA, and ARIMA) analysis, Principal Component Analysis (PCA), Linear Discriminant Analysis for feature selection in cluster analysis, Bayesian Analysis and information theory, Linear/Logistic Regression, Classification and Regression Trees (CART).

Programming: Python, R, SQL, Scala, Java.

Libraries: TensorFlow, PyTorch, NumPy, Pandas, Scikit-Learn, Keras, Matplotlib, Statsmodels, ggplot2, Gensim.

Data Visualization: Tableau, Matplotlib, Seaborn, Altair, Plotly.

NLP: TensorFlow, Keras, spaCy, PyTorch, LSTM, NLTK, Gensim, Textblob, CoreNLP.

IDE: Jupyter Notebook, PyCharm, Visual Studio, IDLE, Spyder, Eclipse.

RDBMS: Microsoft SQL Server, MySQL, Oracle DB, AWS RDS, PostgreSQL, Amazon, Azure SQL.

NoSQL Databases: Cassandra, MongoDB, HBase, Oracle NoSQL, Amazon DynamoDB, Azure Cosmos DB.
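The time series methods listed above (AR, MA, ARIMA) reduce, in the simplest AR(1) case, to a one-parameter least-squares fit. A minimal sketch on synthetic data, purely for illustration:

```python
import numpy as np

def fit_ar1(series):
    """Estimate phi in x_t = phi * x_{t-1} + noise by least squares."""
    x_prev, x_next = series[:-1], series[1:]
    return (x_prev @ x_next) / (x_prev @ x_prev)

# simulate an AR(1) process with phi = 0.7
rng = np.random.default_rng(1)
x = [0.0]
for _ in range(2000):
    x.append(0.7 * x[-1] + rng.normal())
x = np.array(x)

phi_hat = fit_ar1(x)                 # estimate should land near 0.7
```

In practice a library such as statsmodels handles differencing and the MA terms of a full ARIMA model; this sketch only shows the autoregressive core.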

Professional Work Experience

Costco SHIPT, Issaquah, WA September 2019 to Present

Senior Data Scientist

Shipt.com and the Shipt app provide delivery from your local store via a local community of reliable shoppers. Once you become a member, you can use Shipt to shop Target and Costco groceries, essentials, and more. A Shipt shopper will fulfill and deliver your order at a time that is convenient during normal store business hours. As Senior Data Scientist at Shipt, I developed time-to-delivery and time-to-collection models using customer location and product data. I served as Team Lead on the geographic information system problem and developed a geotagging system that increased shopper pick predictability to 90 percent.
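Time-to-delivery models built on customer location data typically start from a store-to-customer distance feature. A common ingredient is the haversine great-circle distance; this is a generic sketch under that assumption, not Shipt's actual implementation, and the coordinates are illustrative:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))    # 6371 km = mean Earth radius

# roughly Issaquah, WA to downtown Seattle, WA
d = haversine_km(47.5301, -122.0326, 47.6062, -122.3321)
```

Such a distance would be one feature among many (product count, time of day, store traffic) in a time-to-delivery regression.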

Contributions:

Built, deployed, and monitored AI products, partnering with data scientists, product managers, and other cross-functional teams.

Led and defined best practices for robust and efficient AI product development, including streaming and batch pipeline development and automation, model versioning, and model deployment, serving models via microservices to meet scalability, maintainability, throughput, and latency requirements.

Designed and managed CI/CD pipelines for machine learning product deployment.

Collaborated with cross-functional data science, peer engineering, product, and business teams to deliver AI services with quality, efficiency, security, and affordability.

Utilized Big Data technologies (Hadoop, Spark, and Hive) to develop and deploy machine learning models in production.

Utilized cloud platforms, Python/Scala, and microservices.

Developed personalization models.

Developed price optimization, promotion optimization, and Ad Tech optimization models.

Contributed as hands-on senior technical contributor in the Membership organization.

Contributed to architecture, design, and implementation.

Consumed, deployed, and monitored AI models and served those models in public cloud environments.

Applied in-depth knowledge and understanding of machine learning, software development, and statistical modeling.

Hands-on with Python/Scala.

Developed Python APIs using frameworks such as FastAPI, Django, and Flask.

Hands-on with cloud computing services: AWS, Azure, and GCP.

Applied SQL on Snowflake, Postgres, and Redshift.

Designed data structures and algorithms.

Applied linear algebra, statistics, etc.

Developed demo UIs using tools like Python Dash and R Shiny.

Utilized version control and peer code review with GitLab, Bitbucket, etc.

Dole Juice Company, Westlake Village, CA January 2017 to August 2019

Senior Machine Learning Engineer

At Dole, a fruit juice company, I worked with manufacturing equipment datasets to provide complex data extracts, programming, and analytical modeling. This work supported the automation of various routine manufacturing processes by predicting time-to-failure to prevent extended downtime and schedule appropriate preventive maintenance. I incorporated IoT data for up-to-date predictions. When COVID-19 hit, we focused on generating automated system alerts and predictive solutions to increase the reliability of the plants under reduced staff.
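Survival analysis for time-to-failure commonly starts from the Kaplan-Meier estimator, which handles machines still running at analysis time (censored observations). A self-contained sketch on made-up run-times, not Dole's data or code (ties are processed in simple sort order here, a simplification of the usual convention):

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve from run-times and failure flags."""
    at_risk = len(durations)
    surv, curve = 1.0, []
    for t, event in sorted(zip(durations, observed)):
        if event:                        # an observed failure at time t
            surv *= (at_risk - 1) / at_risk
        at_risk -= 1                     # failed or censored units leave the risk set
        curve.append((t, surv))
    return curve

# machine run-times in hours; True = failed, False = still running (censored)
curve = kaplan_meier([100, 150, 150, 200, 300],
                     [True, True, False, True, False])
```

Each point (t, surv) estimates the probability a part survives past time t; censored units reduce the risk set without dropping the curve, which is exactly why this estimator suits preventive-maintenance data.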

Contributions:

Applied survival analysis techniques and produced machine learning algorithms to improve how manufacturing teams could predict part failures.

Applied data mining methods, hypothesis testing, regression analysis, and various other statistical analysis and modeling methods.

Presented weekly updates to managers and key stakeholders to preview the user interface designs and analytical results of stress analysis findings, etc.

Presented using PowerPoint, Tableau, and Excel for data work and charts.

Participated in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, following Agile methodologies. Operated in 2-week sprints with weekly stand-ups.

Worked in Git development environment.

Responsible for preparing data for use in machine learning models.

Used Python to create a semi-automated conversion process to generate raw archive-linked data files.

Provided software training and further education about model applications to incoming team members.

Reported initial findings on the conversion of Excel, text, and image files to CSV.

Collaborated with the computer vision team to better understand how to extract meaning from images and PDF files.

Used Predictive Modeling, Data Mining Methods, Factor Analysis, ANOVA, Hypothesis Testing, and Normal Distribution.

Implemented projects with custom APIs in Python and used visualization tools such as Tableau and ggplot2 to create dashboards.

Index Fresh, Bloomington, CA (Remote) January 2015 to December 2016

Computer Vision Specialist/ Data Scientist

Index Fresh is a worldwide marketer of avocados, sourcing from all major growing regions around the globe. To ensure appropriate procurement of avocados, Index Fresh needed an image recognition tool to help its growers identify potential diseases on avocado trees. I was tasked with building a classification model using a Keras Convolutional Neural Network that could take in photos from field workers and return specific disease classifications. The model utilized transfer learning from DenseNet201, and my demo had an accuracy rate of 93%. After finishing model training, the weights were stored and shared with the deployment team. At the same company, I worked to predict and forecast the price of avocados using linear regression and ARIMA time series modeling. I also worked on a classification problem to predict whether an avocado was organic or conventional.
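The linear-regression side of the price-forecasting work above amounts to an ordinary least squares fit of price against time. A minimal sketch on synthetic weekly prices (the trend, noise level, and data are all made up for illustration):

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares: solve for beta in y ~ [1, X] @ beta."""
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

# synthetic weekly avocado prices: base 1.20 USD, rising ~1 cent/week, plus noise
rng = np.random.default_rng(2)
weeks = np.arange(52, dtype=float)
prices = 1.2 + 0.01 * weeks + rng.normal(scale=0.05, size=52)

intercept, slope = ols_fit(weeks.reshape(-1, 1), prices)   # slope ~ 0.01
```

Extrapolating the fitted line gives a naive forecast; the ARIMA modeling mentioned above would additionally capture autocorrelation in the residuals.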

Contributions:

Used SQLAlchemy to perform queries and pull data from Amazon S3 MemSQL database into Pandas DataFrames in Python.

Created Python scripts that converted images to JPEG and resized them to a set size.

Utilized TensorFlow's Keras image preprocessing to develop ImageDataGenerator objects to augment (rotate, blur, flip) images for a Convolutional Neural Network.

Imported InceptionResNetV2 from TensorFlow Keras Applications as the base model for the initial CNN, using sample images from field workers and online resources.

Utilized a second pre-trained CNN, DenseNet201, as a transfer learning technique to train on data collected from the MemSQL database.

Developed a basic Flask app using a static HTML template as a UI to upload images and return a processed image with a disease classification.

Coordinated with field workers, logistic warehouse workers, and other departments to ensure the image recognition product fit the Avocado disease use case in deployment.

Defined different metrics and indicators to ensure the CNN model maintained its integrity.

Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and Scikit-learn in Python to develop various machine learning algorithms, including linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN, for data analysis.

Implemented machine learning algorithms and concepts such as K-means Clustering (varieties), Gaussian distribution, Decision Tree, etc.

Analyzed data using data visualization tools and feature extraction tools and supervised machine learning techniques to achieve project objectives.

Analyzed large data sets and applied machine learning techniques and developed predictive models and statistical models.

Applied Supervised, Unsupervised, Semi-Supervised classification and clustering of warehouse inventory in analysis.

Established machine learning classification of documents: Neural Networks and Deep Learning, K-Nearest Neighbors, K-means, Random Forest, Logistic Regression, SVM.

Created an analysis model to optimize resource allocation and warehouse layout.

Documented logical data models, semantic data models, and physical data models.

Performed complex modeling, simulation, and analysis of data and processes.

Performed analysis through different regression and ensemble models in machine learning to perform forecasting.

Created and managed GitHub repo for Data Science EET and Data Science semantic search projects.

Instructed team members about Data Science tools such as SageMaker, Comprehend and Gensim.

Explained technical AI work to non-technical stakeholders.

Pickled models and deployed SageMaker pipelines.

Operationalized AI models using AWS tools such as AWS Batch, EKS, and AWS Lambda.

Deployed ML pipeline monitoring and orchestration using Apache Airflow and MLflow.

SEDICOSA - Soluciones en Ingenieria, State of Mexico February 2014 to January 2015

Project Engineer

Supervised and executed engineering projects. Attended to clients' needs regarding process problems, improvement projects, and growth projects. Managed internal and external projects applying PMI and PMBOK guidelines, 5S methodology, and other good practices. Developed layouts, CapEx, KPIs, kick-offs, efficiency calculations, renders, project plans, quotes, and basic and detailed engineering deliverables (DTIs, system architecture, wiring diagrams, rack equipment arrangements, bills of materials, functional descriptions, flow diagrams, signal lists, FAT and SAT protocols, datasheets, one-line diagrams).

Contributions:

Continuous Emission Monitoring System (CEMS) for a combustion system; gases analyzed: NOx, O2, CO2 - Generation Central TULA HIDALGO, ALDESA Guerrero Negro IV.

Air Quality Monitoring System (AQMS); gases analyzed: NOx, O2, O3 - Generation Central TULA HIDALGO, ALDESA Baja California.

Quality assurance for packaging laboratories (N2, H2, O2, Ar, and CO2) - Praxair, Air Liquide, Gaspro.

Education

Master’s in Computer Science (International Competition: PNPC)

Cum Laude: Graduated with Honors

Centro de Investigación en Computación (CIC-IPN) CONACYT

Bachelor of Engineering: Control and Automation Engineering

Escuela Superior de Ingeniería Mecánica y Eléctrica (ESIME-IPN)


