Top Skills
Python (* years)
R (* years)
Model Development
Machine Learning Deployment
Deep Learning
Statistical Methods
Cloud Development
Big Data Analytics
Communication & Leadership
Summary
7 Years Data Science Experience
An accomplished data scientist with over seven years of experience applying machine learning and artificial intelligence techniques, together with rigorous statistical methods, to a variety of real-world business problems, yielding lean, actionable insights for improvement. A highly organized and efficient professional whose leadership and thorough, precise approach to projects has consistently delivered excellent results.
Expert Python developer specializing in developing and deploying machine learning solutions.
Always on top of the current trends in relevant technologies, shifts in the data science climate, and improvements in existing methodologies.
Strong leadership skills with specific experience in the Agile framework.
Ability to take machine learning beyond the proof-of-concept stage and into full production and deployment.
Extensive experience with third-party cloud platforms: AWS, Google Cloud, and Azure.
Expertise in all common supervised machine learning methodologies: Naïve Bayes Classifiers, Linear Regression, Logistic Regression, Support Vector Machines, Support Vector Regression, Random Forests, Regression Forests, and Survival Modeling.
Strong proficiency with TensorFlow for building, testing, validating, selecting, and deploying successful and reliable machine learning algorithms using Python or R.
Proficiency with a variety of Python and R libraries, such as the NumPy stack (NumPy, SciPy, Pandas, and Matplotlib), BeautifulSoup, and RoboBrowser in Python, or RSelenium, RCurl, curl, httr, and rvest in R.
Experience with ensemble meta-algorithm techniques, including bagging, boosting, and stacking.
Strong knowledge of Natural Language Processing (NLP) methods, such as word2vec, sentiment analysis, named entity recognition, part-of-speech tagging, Recurrent Neural Networks, and Transformers.
All Technical Skills
Programming
Python, R, SQL, Scala, Java, JavaScript, Shell, MATLAB, C#
Libraries
NumPy, Pandas, Scikit-Learn, Keras, Matplotlib, statsmodels, SciPy, TensorFlow, PyTorch, Deeplearning4j, TSA, ggplot2, Gensim, Seaborn, NLTK, BeautifulSoup4, Scrapy, Selenium
Machine Learning
Supervised and unsupervised learning algorithms
Machine Learning, Survival Analysis, Machine Intelligence, Deep Learning, Machine Perception, Data Mining, Neural Networks, Gradient Boosted Decision Trees, Convolutional Neural Networks (CNN), Cluster Analysis, K-Means, K-Nearest Neighbors (KNN), Recurrent Neural Networks (RNN), Support Vector Machines (SVM), Bagging, Boosting, Random Forest, Bootstrap Aggregating, Additive Models, Ensemble Learning Meta-Algorithms, Stacking, Regression, Generalized Linear Models (GLM), Gradient Boosting Machine (GBM), DBSCAN, Ensemble Regressors, Naïve Bayes Classifier, PCA, ARIMA, Pipelines.
Analytical Methods
Advanced Data Modeling, Time Series Forecasting Models, Regression Analysis, Predictive Analytics, Statistical Analysis (ANOVA, correlation analysis, t-tests and z-tests, descriptive statistics), Sentiment Analysis, Exploratory Data Analysis, Stochastic Optimization, Capital/Project Justification and Budgeting, Linear Programming, VBA, Equity Options Trading and Analysis, Predictive Modeling with Time Series (ARIMA) Analysis, Principal Component Analysis (PCA) and Linear Discriminant Analysis for feature selection and cluster analysis, Bayesian Analysis and Information Theory, Linear/Logistic Regression, Classification and Regression Trees (CART)
Data Visualization
Tableau, Matplotlib, Seaborn, Altair, ggplot2, Plotly, missingno
NLP
TensorFlow, Keras, spaCy, PyTorch, LSTM, NLTK, Gensim, AWS Transcribe, Comprehend
Version Control
GitHub, Git, SVN, Mercurial, AWS CodeCommit, Azure DevOps Repos
IDE
Jupyter Notebook, PyCharm, Visual Studio, Spyder, Eclipse, Atom
Big Data Ecosystems
Hadoop, Snowflake, Oracle Exadata, Vertica, Teradata, Pivotal Greenplum, SAP IQ
SQL RDBMS
Microsoft SQL, MySQL, Oracle DB, AWS RDS, T-SQL, PostgreSQL, IBM DB2, Amazon Aurora, Azure SQL, MariaDB, SQLite, Microsoft Access
NoSQL ONDMs
PyMongo, HappyBase, Boto3 (DynamoDB), EclipseLink, Hibernate
NoSQL Databases
MongoDB, Cassandra, Redis, HBase, Neo4j, Oracle NoSQL, Amazon DynamoDB, Couchbase, CouchDB
DATA SCIENTIST / SR. MACHINE LEARNING OPTIMIZATION ENGINEER
Microsoft
Redmond, WA
March 2018 - Present
Microsoft is a multinational technology company that develops, manufactures, licenses, supports, and sells computer software, consumer electronics, personal computers, and related services and products. My primary role at Microsoft is to ensure end-to-end development and execution of Microsoft’s Cortana Virtual Assistant, drawing on data science technologies such as NLP, advanced transformer models, and Microsoft Azure for pipeline implementation and deployment.
Utilized Azure Kubernetes Service (AKS) to manage data ingestion clusters
Worked with Azure Designer to design and upgrade existing data pipelines
Automated key end-to-end dataflow transformations and load balancing
Assisted in the creation of multiple endpoint APIs for Cortana services
Worked with Bidirectional Encoder Representations from Transformers (BERT) to implement various models for Microsoft’s Cortana Virtual Assistant
Created new API triggers using Azure Functions, providing simple solutions for complex orchestration challenges
Managed Docker containers via Kubernetes to coordinate node clusters at scale in production
Utilized NumPy and Pandas for exploratory data analysis
Used the NLTK, Gensim, and GloVe libraries for NLP preprocessing and embeddings
Utilized Apache Spark-based Azure Databricks to ingest data from Azure Data Factory in batches and in real time using Kafka
Optimized dashboards on Power BI to ensure stable workflow and updated visualizations.
Led a team of five to ensure proper work distribution and that project deadlines were met
Served as Scrum Master for daily stand-up meetings, presenting my team’s accomplishments and upcoming goals
Utilized Ingress Controllers in Azure to route HTTP traffic to different applications
Made use of multiple cognitive APIs, including speech, language, Bing Search, and QnA services
Optimized and redeployed core and value-add services surrounding Cortana across multiple platforms, including Windows, smartphones, Xbox consoles, the Edge browser, and VR headsets
Managed the code repository using Git to keep the codebase stable and ready to deploy at all times
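The transformer-based intent modeling described above can be sketched in miniature. The toy PyTorch encoder below classifies tokenized utterances into intents; the vocabulary size, dimensions, and label count are invented for illustration and are not Cortana's actual architecture:

```python
# Minimal sketch of a transformer-based intent classifier for a
# virtual assistant. All sizes and data here are illustrative toys.
import torch
import torch.nn as nn

VOCAB, D_MODEL, N_INTENTS, MAX_LEN = 100, 32, 4, 8

class IntentClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, N_INTENTS)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))  # (batch, seq, d_model)
        return self.head(h.mean(dim=1))       # mean-pool, then classify

model = IntentClassifier()
batch = torch.randint(0, VOCAB, (3, MAX_LEN))  # 3 fake tokenized utterances
logits = model(batch)
print(logits.shape)  # torch.Size([3, 4]) — one score per intent
```

In practice a pretrained BERT encoder would replace the randomly initialized one, with a classification head fine-tuned on labeled utterances.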
DATA SCIENTIST
State Farm
Bloomington, IL
April 2016 – March 2018
State Farm is an insurance company offering auto, home, and business insurance. My role as a data scientist at State Farm was to provide end-to-end solutions for migrating key services, such as insurance ratemaking and several NLP models, to AWS. I worked with NLP to help improve their phone-call AI virtual assistant. I achieved this by deploying several models: a rate maker using GLMs and GBMs that worked together to create insurance plans, and a plan-purchase predictor using GBMs to identify which plan a customer would purchase. I also implemented the Agile Scrum framework within the company’s data science and analytics teams and led weekly scrum meetings to prioritize and assign tasks to team members.
Analyzed and processed complex data sets using advanced querying, visualization and analytics tools.
Queried data from AWS RDS using Aurora Query Editor.
Used NumPy, Pandas, Altair for exploratory data analysis and visualization
Utilized a Generalized Linear Model (GLM) with a log link in scikit-learn and a Gradient Boosted Machine (GBM) in XGBoost to create the rate maker.
Created a classification model with XGBoost to determine which plan a customer was most likely to choose.
Collaborated with the data engineering and e-commerce teams to successfully deploy and integrate the model on the company website.
Built a REST API using Flask to deploy the models on EC2 instances within the existing software ecosystem.
Built visualizations with libraries such as Matplotlib, Seaborn, ggplot2, and Plotly.
Engineered an automated ETL pipeline for data ingestion and feature engineering using AWS SageMaker.
Built internal Python packages to automate or streamline workflows for continuous model validation, reporting, data cleaning, data-quality testing, and dependency checking within a given project.
Performed continuous validation, testing, and implementation of models, and integrated new features.
Continuously engineered and tested new features from new data sources to improve model accuracy in all areas.
Managed the code repository using Git to ensure the integrity of the codebase was maintained at all times.
Used AWS tools such as Transcribe, Comprehend, and SageMaker to update and improve the framework of the phone virtual assistant.
Created an NLP text classification model that received data from speech-to-text transcription of phone calls.
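A minimal sketch of the log-link GLM stage of the rate maker, using scikit-learn's PoissonRegressor (which fits a Poisson GLM with a log link). The features and claim counts below are synthetic, invented for illustration, and the regularization strength is a placeholder, not a tuned value:

```python
# Hedged sketch of a ratemaking-style GLM with a log link.
# Feature meanings and data are synthetic, not State Farm's.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))           # e.g. scaled age, vehicle value, territory
true_coef = np.array([0.5, -0.3, 0.8])
y = rng.poisson(np.exp(X @ true_coef))   # simulated claim counts

# PoissonRegressor fits E[y] = exp(X @ coef + intercept)
glm = PoissonRegressor(alpha=1e-4).fit(X, y)
expected_claims = glm.predict(X)         # always positive, thanks to the log link
```

A GBM stage could then refine these GLM predictions; XGBoost's `count:poisson` objective is one common choice for count targets like claims.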
DATA SCIENTIST
Airbus
Mirabel, QC
June 2014 – April 2016
Airbus is an aerospace manufacturer specializing in manufacturing and providing engineering support for business jet aircraft. My role as a data scientist at Airbus was to provide predictive maintenance analytics for aircraft in use by customers. I built and deployed a proprietary LSTM neural network model on flight recorder data using Theano. Airbus then used the model to formulate dynamic maintenance schedules for its customers.
Worked in a Cloudera Hadoop environment, pulling data from clusters.
Used SQL for Hadoop data extraction with Hive.
Used a Gaussian kernel smoothing filter built in Python for sensor data noise reduction
Utilized NumPy, Pandas, Matplotlib, Seaborn, for exploratory data analysis and visualization.
Analyzed panel signal data received from flight data recorders.
Explored several different models for survival analysis such as Aalen’s Additive, Accelerated Failure Time Regression, and Cox Proportional Hazards.
Built a Long Short-Term Memory (LSTM) neural network using the NumPy, scikit-learn, and Theano packages.
Built gradient-boosted decision tree regression models with XGBoost.
Created hardware-accelerated neural network models utilizing distributed GPU clusters with Theano and Dask.
Deployed a pickled model inside a Flask app containerized with Docker.
Performed hyperparameter tuning using randomized grid search and K-fold cross-validation with a stratified train-validate-test split.
Analyzed the performance of my proportional hazards model using the concordance index.
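The Gaussian kernel smoothing step used on the sensor data can be sketched with SciPy's `gaussian_filter1d`. The signal below is synthetic, standing in for a noisy flight-recorder channel, and the kernel width is an illustrative choice, not a tuned value:

```python
# Sketch of sensor-noise reduction with a Gaussian kernel smoothing
# filter; the "sensor" signal here is synthetic.
import numpy as np
from scipy.ndimage import gaussian_filter1d

t = np.linspace(0, 1, 500)
clean = np.sin(2 * np.pi * 5 * t)                            # underlying signal
noisy = clean + np.random.default_rng(1).normal(0, 0.3, t.shape)

smoothed = gaussian_filter1d(noisy, sigma=5)                 # sigma in samples

# Smoothing should bring the signal closer to the clean waveform.
err_noisy = np.mean((noisy - clean) ** 2)
err_smooth = np.mean((smoothed - clean) ** 2)
print(err_smooth < err_noisy)  # True
```

Choosing `sigma` trades noise suppression against attenuation of real signal features, so it would normally be validated against downstream model accuracy.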
DATA ANALYST
PCC Aerostructures
Toronto, ON
September 2013 – June 2014
PCC Aerostructures is a tier-one manufacturer of aerospace parts and assemblies, primarily landing gear used in Boeing aircraft. My role as a data analyst at PCC was to analyze quality data and create insights to help inform business decisions. In addition to these duties, I mentored, trained, and taught junior members of the data science and analytics team.
Used the R package dplyr for data manipulation and analysis
Maintained and contributed to many internal R packages used for building and diagnosing models, and automated reporting
Used R to perform ad-hoc analyses and deeper drill downs into spend categories of particular interest to clients on a project-to-project basis
Performed large data cleaning and preparation tasks using R and SQL to gather information from disparate and incompatible data sources from across a client’s entire enterprise to provide a complete view of all indirect spend
Helped maintain a large database of commodity and vendor information using SQL
Maintained various visualization tools and dashboards used to provide data-driven insights
EDUCATION
Bachelor of Applied Science in Electrical Engineering
University of Windsor
Windsor, ON