
Mike Ghosn - Data Science

Location:
Redmond, WA
Salary:
95
Posted:
July 31, 2020

Contact this candidate


Top Skills

Python (* years)

R (* years)

Model Development

Machine Learning Deployment

Deep Learning

Statistical Methods

Cloud Development

Big Data Analytics

Communication & Leadership

Summary

7 Years Data Science Experience

An accomplished data scientist with over seven years of experience applying machine learning / artificial intelligence techniques and rigorous statistical methods to a variety of real-world business problems, yielding lean, actionable results and insights for improvement. A highly organized and efficient professional whose leadership and thorough, precise approach to projects have consistently delivered excellent results.

Expert Python developer specializing in developing and deploying machine learning solutions.

Always on top of the current trends in relevant technologies, shifts in the data science climate, and improvements in existing methodologies.

Strong leadership skills with specific experience in the Agile framework.

Ability to take machine learning beyond the proof-of-concept stage and into full production and deployment.

Extensive experience with third-party cloud resources: AWS, Google Cloud, Azure

Expertise in all common supervised machine learning methodologies: Naïve Bayes Classifiers, Linear Regression, Logistic Regression, Support Vector Machines, Support Vector Regression, Random Forests, Regression Forests, and Survival Modeling.
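As an illustration only (synthetic data, scikit-learn defaults, not drawn from any project described below), several of these supervised methods can be fit side by side:

```python
# Illustrative sketch: fitting several of the supervised methods listed above
# (Naive Bayes, logistic regression, SVM, random forest) on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (GaussianNB(), LogisticRegression(max_iter=1000),
              SVC(), RandomForestClassifier(random_state=0)):
    # fit on the training split, report held-out accuracy
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(model).__name__, round(acc, 3))
```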

Strong proficiency with TensorFlow for building, testing, validating, selecting, and deploying successful and reliable machine learning algorithms using Python or R.

Proficiency with a variety of Python libraries, such as the NumPy stack (NumPy, SciPy, Pandas, and Matplotlib), BeautifulSoup and RoboBrowser in Python, or RSelenium, RCurl, curl, httr, and rvest in R

Experience with ensemble meta-algorithm techniques, including Bagging, Boosting, and Stacking
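A minimal sketch of these three meta-algorithms using scikit-learn's built-in implementations (toy data, illustrative only):

```python
# Bagging, boosting, and stacking side by side on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "bagging": BaggingClassifier(random_state=0),            # bootstrap aggregating
    "boosting": GradientBoostingClassifier(random_state=0),  # sequential boosting
    "stacking": StackingClassifier(                          # meta-learner over base models
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression()),
}
for name, model in models.items():
    print(name, round(model.fit(X_tr, y_tr).score(X_te, y_te), 3))
```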

Strong knowledge of Natural Language Processing (NLP) methods such as word2vec, sentiment analysis, named entity recognition, part-of-speech tagging, Recurrent Neural Networks, and Transformers.
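Of these, sentiment analysis is the simplest to sketch self-containedly; a toy bag-of-words classifier, with a made-up corpus and labels purely for illustration:

```python
# Toy sentiment-analysis sketch: TF-IDF features feeding a logistic regression.
# The corpus and labels below are invented stand-ins, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

corpus = ["great product, works perfectly", "terrible, broke after a day",
          "love it, highly recommend", "awful experience, do not buy",
          "fantastic value for the price", "worst purchase ever"]
labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(corpus, labels)
print(clf.predict(["highly recommend, fantastic product"]))  # → [1]
```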

All Technical Skills

Programming

Python, R, SQL, Scala, Java, Javascript, Shell, MATLAB, C#

Libraries

NumPy, Pandas, Scikit-Learn, Keras, Matplotlib, statsmodels, SciPy, TensorFlow, PyTorch, Deeplearning4j, TSA, ggplot2, Gensim, Seaborn, NLTK, BeautifulSoup4, Scrapy, Selenium

Machine Learning

Supervised & unsupervised learning algorithms

Machine Learning, Survival Analysis, Machine Intelligence, Deep Learning, Machine perception, Data Mining, Neural Networks, Gradient Boosting Decision Trees, Convolutional Neural Networks, Cluster Analysis, K-Means, K Nearest Neighbors, KNN, Recurrent Neural Networks, RNN, Support Vector Machines, SVM, Bagging, Boosting, Random Forest, Bootstrap Aggregating, Additive Models, Ensemble Learning Meta Algorithms, Stacking, Regression, Generalized Linear Regression, GLM, Gradient Boosting Machine, GBM, DBScan, Ensemble Regressors, Naïve Bayes Classifier, PCA, ARIMA, Pipelines.

Analytical Methods

Advanced Data Modelling, Time Series Forecasting, Regression Analysis, Predictive Analytics, Statistical Analysis (ANOVA, correlation analysis, t-tests and z-tests, descriptive statistics), Sentiment Analysis, Exploratory Data Analysis, Stochastic Optimization, Capital/Project Justification and Budgeting, Linear Programming, VBA, Equity Options Trading and Analysis, Predictive Modelling with Time Series (ARIMA) Analysis, Principal Component Analysis (PCA) and Linear Discriminant Analysis for feature selection and cluster analysis, Bayesian Analysis and Information Theory, Linear/Logistic Regression, Classification and Regression Trees (CART)

Data Visualization

Tableau, Matplotlib, Seaborn, Altair, ggplot2, Plotly, missingno

NLP

TensorFlow, Keras, spaCy, PyTorch, LSTM, NLTK, Gensim, AWS Transcribe, Comprehend

Version Control

GitHub, Git, SVN, Mercurial, AWS CodeCommit, Azure DevOps Repos

IDE

Jupyter Notebook, PyCharm, Visual Studio, Spyder, Eclipse, Atom

Big Data Ecosystems

Hadoop, Snowflake, Oracle Exadata, Vertica, Teradata, Pivotal Greenplum, SAP IQ

SQL RDBMS

Microsoft SQL, MySQL, Oracle DB, AWS RDS, T-SQL, PostgreSQL, IBM DB2, Amazon Aurora, Azure SQL, MariaDB, SQLite, Microsoft Access

NoSQL ONDMs

PyMongo, HappyBase, Boto3 (DynamoDB), EclipseLink, Hibernate

NoSQL Databases

MongoDB, Cassandra, Redis, HBase, Neo4j, Oracle NoSQL, Amazon DynamoDB, Couchbase, CouchDB

DATA SCIENTIST / SR. MACHINE LEARNING OPTIMIZATION ENGINEER

Microsoft

Redmond, WA

March 2018 - Present

Microsoft is a multinational technology company that develops, manufactures, licenses, supports, and sells computer software, electronics, personal computers, and related services and products. My primary role at Microsoft is to ensure end-to-end development and execution of Microsoft’s Cortana Virtual Assistant, drawing on data science technologies such as NLP, advanced transformer models, and Microsoft Azure for pipeline implementation and deployment.

Utilized Azure Kubernetes Service (AKS) to manage data ingestion clusters

Worked with Azure Designer to design and upgrade existing data pipelines

Automated key end-to-end dataflow transformations and load balancing

Assisted in the creation of multiple endpoint APIs for Cortana services

Worked with bidirectional encoder representations from transformers (BERT) to implement various models for Microsoft’s Cortana Virtual Assistant

Created new API triggers using Azure Functions, providing simple solutions to complex orchestration challenges

Managed Docker containers via Kubernetes to coordinate node clusters at scale in production.

Utilized NumPy and Pandas for exploratory data analysis

Used the NLTK, Gensim, and GloVe libraries for NLP preprocessing and embedding

Utilized Apache Spark-based Azure Databricks to ingest data from Azure Data Factory in batches and in real time using Kafka.

Optimized dashboards on Power BI to ensure stable workflow and updated visualizations.

Led a team of five, ensuring proper work distribution and that project deadlines were met

Served as Scrum Master for daily stand-up meetings, presenting my team’s accomplishments and upcoming goals

Utilized Ingress controllers in Azure to route HTTP traffic to different applications

Made use of multiple cognitive APIs, including speech, language, Bing Search, and QnA services.

Optimized and redeployed core and value-add services surrounding Cortana on multiple platforms, such as Windows, smartphones, the Xbox console, the Edge browser, and VR headsets

Managed the code repository using Git to ensure the code base remained stable and ready to deploy at all times

DATA SCIENTIST

State Farm

Bloomington, IL

April 2016 – March 2018

State Farm is an insurance company specializing in auto, home, and business insurance. My role as a data scientist at State Farm was to provide end-to-end solutions for migrating key services, such as insurance ratemaking and several NLP models, to AWS. I worked with NLP to help improve their phone-call AI virtual assistant. I deployed several models: a rate maker using GLMs and GBMs that worked together to create insurance plans, and a plan-purchase predictor using GBMs to identify which plan a customer would buy. I also implemented the Agile Scrum framework within the company’s data science and analytics teams and continued to lead weekly scrum meetings to prioritize and assign tasks to members of the team.

Analyzed and processed complex data sets using advanced querying, visualization and analytics tools.

Queried data from AWS RDS using Aurora Query Editor.

Used NumPy, Pandas, Altair for exploratory data analysis and visualization

Utilized a Generalized Linear Model (GLM) with a log link in scikit-learn and a Gradient Boosted Machine (GBM) in XGBoost to create the rate maker.

Created a classification model using XGBoost to determine which plans a customer was most likely to choose.

Collaborated with the data engineering and e-commerce teams to successfully deploy and integrate the model on the company website.

Built a REST API using Flask to deploy the models on EC2 instances within the existing software ecosystem.
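A minimal sketch of such a Flask scoring endpoint; the route name, payload schema, and inline stand-in model are illustrative (a real deployment would unpickle the trained model from disk):

```python
# Minimal Flask REST API serving model predictions. The model here is a
# stand-in fit inline; a real deployment would load a pickled trained model.
import numpy as np
from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)
model = LogisticRegression().fit([[0], [1], [2], [3]], [0, 0, 1, 1])

@app.route("/predict", methods=["POST"])
def predict():
    # expects JSON like {"features": [3]}
    features = np.array(request.get_json()["features"]).reshape(1, -1)
    return jsonify({"prediction": int(model.predict(features)[0])})

# app.run(host="0.0.0.0", port=5000)  # uncomment to serve locally
```

On an EC2 instance this would typically sit behind a WSGI server such as gunicorn rather than Flask's development server.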

Built visualizations with libraries such as Matplotlib, Seaborn, ggplot2, and Plotly.

Engineered an automated ETL pipeline for data ingestion and feature engineering using AWS SageMaker.

Built internal Python packages to automate or streamline workflows for continuous model validation, data cleaning, data quality testing, reporting, and dependency checking within a given project

Performed continuous validation, testing, and implementation of models, and integrated new features

Continuously engineered and tested new features from new data sources to improve model accuracy in all areas

Managed the code repository using Git to ensure the integrity of the code base was maintained at all times

Used AWS tools such as Transcribe, Comprehend, and SageMaker to update and improve the framework of the Phone Virtual Assistant.

Created an NLP text classification model that received data from speech-to-text transcriptions of phone calls.

DATA SCIENTIST

Airbus

Mirabel, QC

June 2014 – April 2016

Airbus is an aerospace manufacturer specializing in manufacturing and providing engineering support for business jet aircraft. My role as a data scientist at Airbus was to provide predictive maintenance analytics for aircraft in use by customers. I built and deployed a proprietary LSTM neural network model on flight recorder data using Theano. Airbus then used our model to formulate dynamic maintenance schedules for its customers.

Worked in a Cloudera Hadoop environment, pulling data from clusters.

Used SQL for Hadoop data extraction with Hive.

Used a Gaussian kernel smoothing filter built in Python for sensor data noise reduction
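The idea can be sketched with SciPy's 1-D Gaussian filter on a synthetic noisy signal (illustrative only, not actual flight sensor data):

```python
# Gaussian kernel smoothing of a noisy 1-D sensor signal with SciPy.
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
signal = np.sin(t)                                    # underlying sensor trend
noisy = signal + rng.normal(scale=0.3, size=t.size)   # sensor noise

smoothed = gaussian_filter1d(noisy, sigma=5)          # wider sigma = heavier smoothing
print(np.abs(noisy - signal).mean(), np.abs(smoothed - signal).mean())
```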

Utilized NumPy, Pandas, Matplotlib, and Seaborn for exploratory data analysis and visualization.

Analyzed panel signal data received from flight data recorders.

Explored several different models for survival analysis such as Aalen’s Additive, Accelerated Failure Time Regression, and Cox Proportional Hazards.

Built a Long Short-Term Memory (LSTM) neural network using the NumPy, scikit-learn, and Theano packages.

Built gradient-boosted decision tree regression models with XGBoost.

Created hardware-accelerated neural network models utilizing distributed GPU clusters with Theano and Dask.

Deployed the pickled model inside a Flask app that was containerized using Docker.

Performed hyperparameter tuning using randomized grid search and K-Fold cross-validation with a stratified train-validate-test split.
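A hedged sketch of that tuning loop with scikit-learn; the random forest and parameter ranges are illustrative stand-ins for the actual models tuned:

```python
# Stratified split + K-fold CV + randomized hyperparameter search.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (RandomizedSearchCV, StratifiedKFold,
                                     train_test_split)

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(50, 200),   # sampled per draw
                         "max_depth": randint(2, 10)},
    n_iter=5,                                                # random draws
    cv=StratifiedKFold(n_splits=5),                          # K-fold CV
    random_state=0,
)
search.fit(X_tr, y_tr)
print(search.best_params_, round(search.score(X_te, y_te), 3))
```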

Analyzed the performance of my proportional hazards model using the concordance index.

DATA ANALYST

PCC AEROSTRUCTURES

Toronto, ON

September 2013 – June 2014

PCC Aerostructures is a tier-one manufacturer of aerospace parts and assemblies, primarily landing gear used in Boeing aircraft. My role as a data analyst at PCC was to analyze quality data and generate insights to inform business decisions. In addition to those duties, I mentored, trained, and taught junior members of the data science and analytics team.

Used the R package dplyr for data manipulation and analysis

Maintained and contributed to many internal R packages used for building and diagnosing models and for automated reporting

Used R to perform ad-hoc analyses and deeper drill downs into spend categories of particular interest to clients on a project-to-project basis

Performed large data cleaning and preparation tasks using R and SQL to gather information from disparate and incompatible data sources from across a client’s entire enterprise to provide a complete view of all indirect spend

Helped maintain a large database of commodity and vendor information using SQL

Maintained various visualization tools and dashboards used to provide data-driven insights

EDUCATION

Bachelor of Applied Science in Electrical Engineering

University of Windsor

Windsor, ON


