
Data Scientist Machine Learning

Location:
Windsor, ON, Canada
Posted:
May 17, 2023

Contact this candidate

Resume:

Mahmood Abd

Mississauga, ON, Canada, L*M*Y*

Cell: 1-519-***-****

Cell: +1-313-***-****

Email: *******@*****.***

Background

Good experience in mathematics, statistics, data analysis, and machine learning algorithms.

Hands-on experience in text mining, text scraping/wrangling, data streaming, and extracting data from social media and websites using Python and Beautiful Soup for pulling data out of HTML.
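
A minimal sketch of this kind of HTML extraction, assuming the requests and beautifulsoup4 packages; the URL and the h2 selector are placeholders, not taken from any specific project:

```python
# Minimal sketch: pull text out of HTML with requests + Beautiful Soup.
# The URL and the <h2> tag selector are placeholders for illustration.
import requests
from bs4 import BeautifulSoup

def scrape_headlines(url="https://example.com/news"):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Collect the text of every <h2> element; adjust the tag/selector per site.
    return [h.get_text(strip=True) for h in soup.find_all("h2")]

if __name__ == "__main__":
    for title in scrape_headlines():
        print(title)
```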

Excellent experience in Python, SciPy, NumPy, Pandas, and Matplotlib, with scikit-learn, TensorFlow, Keras, and Anaconda.

Proven knowledge in big data analytics and data mining, including running queries and compiling data.

Excellent skills in computer vision and image processing, such as image reconstruction and camera calibration.

Excellent experience with database management systems such as PL/SQL, SQLite, DbVisualizer, and MySQL.

Excellent experience in data cleaning and preparation techniques such as standardization, normalization for time series, data augmentation, transfer learning, data binarization, imputation, and data visualization.
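
For illustration, a small sketch of several of these preparation steps (imputation, standardization, normalization, binarization) with scikit-learn; the toy array values and the binarization threshold are made up:

```python
# Illustrative cleaning/preparation steps on toy data using scikit-learn.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Binarizer

X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 180.0]])  # toy data with a gap

X = SimpleImputer(strategy="mean").fit_transform(X)    # imputation of the missing value
X_std = StandardScaler().fit_transform(X)              # standardization (zero mean, unit variance)
X_minmax = MinMaxScaler().fit_transform(X)             # normalization to [0, 1]
X_bin = Binarizer(threshold=190.0).fit_transform(X)    # binarization against a cutoff
print(X_std, X_minmax, X_bin, sep="\n")
```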

Proven natural language processing (NLP) experience using Word2Vec and GloVe with scikit-learn, NLTK, and PySpark.

Excellent experience in supervised learning, including design and prediction for classification and regression problems using neural networks, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), Linear Regression Analysis (LRA), and Naïve Bayes (NB).

Excellent experience in unsupervised machine learning algorithms such as K-means, hierarchical clustering, and DBSCAN.

Experience in deep learning (CNN, RNN, LSTM) and in Random Forests with boosting and ensemble algorithms, using TensorFlow and Keras.

Good experience implementing ARIMA models for analyzing and forecasting time series data.
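
A minimal ARIMA forecasting sketch with statsmodels; the synthetic monthly series and the (1, 1, 1) order are illustrative assumptions, not values from an actual engagement:

```python
# Minimal ARIMA forecasting sketch on a synthetic monthly series.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

idx = pd.date_range("2020-01-01", periods=48, freq="MS")
y = pd.Series(np.linspace(10, 30, 48) + np.random.normal(0, 1, 48), index=idx)

model = ARIMA(y, order=(1, 1, 1)).fit()   # AR(1), first difference, MA(1)
forecast = model.forecast(steps=6)        # forecast the next six months
print(forecast)
```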

Experience in algorithm evaluation techniques such as resampling (cross-validation), improving results, feature selection, result visualization, validation, and making predictions.

Experience in time series treatment such as stationarization, aggregation, smoothing, polynomial fitting, and recursive prediction.

Over 5 years of experience in image processing and computer vision, including feature extraction, normalization, camera calibration, 3D image reconstruction, and linear algebra and numerical analysis techniques.

Good experience in vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, with proven research.

Strong understanding of big data clusters and their architecture using Hadoop, Spark, and Sqoop.

Experience in data engineering, data analysis pipelines, and big data ETL pipelines.

Excellent experience across the machine learning pipeline: problem definition, data ingestion, data preparation, data segregation, model training, model evaluation, model deployment, and performance monitoring.
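
A compact sketch of the training and evaluation stages of such a pipeline using scikit-learn; the iris dataset and logistic-regression model are stand-ins chosen only to keep the example self-contained:

```python
# Sketch of pipeline stages: ingestion, segregation, preparation, training, evaluation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                                    # data ingestion
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)            # data segregation

pipe = Pipeline([("scale", StandardScaler()),                        # data preparation
                 ("clf", LogisticRegression(max_iter=200))])
print("CV accuracy:", cross_val_score(pipe, X_tr, y_tr, cv=5).mean())  # model evaluation
pipe.fit(X_tr, y_tr)                                                  # model training
print("Held-out accuracy:", pipe.score(X_te, y_te))                   # monitoring proxy
```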

Excellent team player with the ability to develop productive relationships in the workplace.

Good experience in Linux with makefiles and shell scripts, the CUDA and cuDNN parallel programming platform, and unit testing.

Good experience in Snowflake, Anaconda Enterprise, Domino, and Azure for machine learning.

Software Skills:

Machine Learning: Python, scikit-learn, NLTK, Spark, deep learning (RNN, CNN, LSTM, GAN, BERT), autoencoders/decoders, TensorFlow, Keras, and clustering such as k-means and DBSCAN.

Programming: C++, Python, Visual Studio, Flask, PL/SQL, and PySpark.

Operating Systems: Windows, Linux.

Cloud Computing: AWS, OpenShift, Anaconda, Databricks, Domino, and Hadoop.

Databases and warehouses: Snowflake, SQL Server, Postgres, Teradata, HBase, Hive, Flume, Kafka, MongoDB, and Oracle.

Project Management Tools: Agile, Git, Bitbucket, Confluence, Rally, and Jira.

PROFESSIONAL EXPERIENCE:

Data Scientist and AI Developer

GDI & A, Ford, Michigan, USA July 2019 – to Date

Responsibilities:

Design scalable pipeline systems, including design, documentation, implementation, and testing. This role also includes data analysis, data mining, extracting useful data patterns, and data schema implementation.

Design ML models, including data gathering, data cleaning and labeling, training, and testing.

Developed complete machine learning pipelines, including design, documentation, testing, and deployment on OpenShift and Anaconda, then scheduled the jobs to run at set times.

Developed complete data extraction ETL pipelines by extracting data from different sources such as Flume, Kafka, HDFS, Hive, SQL, and Postgres.

Provide Hadoop and Anaconda Enterprise (AE) technical support, guiding other users in solving technical problems and programming issues and in deploying and scheduling jobs.

Develop machine learning models to solve both supervised and unsupervised problems. The programming environment includes Hadoop, PySpark, Anaconda Enterprise, Python, SQL, Hive, HDFS, and bash configuration files.

I work as a data scientist and AI developer with a data analysis team to develop a recommendation system, linear regression, and other machine learning models. In these projects, individual customer profiles are built to tailor the user's experience and recommend the most relevant content (articles, videos, etc.) in real time. Data manipulation and analysis are implemented to meet the algorithm requirements. Deep machine learning models are designed and tested on Databricks clusters based on PySpark, with terabytes of data stored in Snowflake.
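
As one hedged illustration of a collaborative-filtering recommender of this general kind, the sketch below uses PySpark's ALS; the column names and toy interaction data are assumptions for the example and do not reflect the actual project data or approach:

```python
# Hedged sketch of a collaborative-filtering recommender with PySpark ALS.
# The schema ("user_id", "item_id", "rating") and the toy rows are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("recommender-sketch").getOrCreate()
ratings = spark.createDataFrame(
    [(0, 10, 1.0), (0, 11, 4.0), (1, 10, 5.0), (1, 12, 2.0), (2, 11, 3.0)],
    ["user_id", "item_id", "rating"])

als = ALS(userCol="user_id", itemCol="item_id", ratingCol="rating",
          rank=8, maxIter=10, coldStartStrategy="drop")
model = als.fit(ratings)
model.recommendForAllUsers(3).show(truncate=False)  # top-3 items per user
```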

My duties include designing a complete machine learning workflow: problem definition, data ingestion, data preparation, data segregation, model training, model evaluation, model deployment, and performance monitoring.

Troubleshooting and configuring the Hadoop cluster and writing shell scripts to configure PySpark, Hive, and Flume with Jupyter notebooks.

Porting data from PostgreSQL to MySQL Server with automated daily scheduling on Anaconda Enterprise using Python. Data engineering, data analysis pipelines, and big data ETL pipelines.

Using Azure to develop machine learning pipeline applications such as regression, classification, and deep learning.

Develop Data mining and data analysis using RStudio and R under Anaconda.

Data Scientist and AI Developer

Dell, Austin, Texas, USA July 2018 – July 2019

Responsibilities:

This project was done by the Dell development team in cooperation with Intel; I worked as a data scientist and AI developer with the data analysis team to determine customer requirements in terms of data manipulation and analysis. I designed and tested deep machine learning (BigDL and Analytics Zoo, which are built on MLlib) on a Hadoop cluster based on PySpark, and programmed and evaluated Analytics Zoo and BigDL deep learning with PySpark using Jupyter notebooks, Cloudera CDSW, and spark-submit. I dealt with different data types, such as images and text, stored in HDFS or on a local machine, and designed and tested applications for multi-class and multi-label problems.

Evaluated Intel BigDL/Analytics Zoo and Dell hardware infrastructure using big data with different test cases, from petabyte-scale image datasets to millions of rows and columns of structured and unstructured data.

Data analytics, cleansing and visualization.

Train the machine learning model and tune it with different parameters.

Utilizing transfer learning on Hadoop and PySpark, using spark-submit, Cloudera, and Jupyter notebooks.

I led the team, organized stories on JIRA, and documented all coding, testing, roadblocks, and completed tasks on Confluence, using Crucible/Fisheye for code review. I used Bitbucket as the code repository, Confluence for documentation, and the Anaconda environment.

Developed a smart quotation system built on a historical database stored in SQL Server as a semi-structured DB. The project is mainly heavy NLP, with tokenization, lemmatization, and Word2Vec, using a neural network with an embedding layer. The system achieves very good accuracy even though the data is highly technical.

My duties include designing a complete machine learning workflow: problem definition, data ingestion, data preparation, data segregation, model training, model evaluation, model deployment, and performance monitoring.

Data engineering, data analysis pipelines, and big data ETL pipelines using Python and RStudio. Porting structured and unstructured data from MySQL Server.

Data Scientist and AI developer

EASi, Troy, MI, USA November 2017 – July 2018

Responsibilities:

The project was for the ABB energy company. Worked with the data analysis team to determine customer requirements in terms of data manipulation and analysis. Designed relational databases using different DBMSs such as SQL, SQLite, Hadoop Hive, and Sqoop. Developed and designed many applications using deep machine learning. Analyzed and visualized data using NumPy, Pandas, and Matplotlib in Python, along with Tableau.

Implemented a standardized data ingestion flow for big data in different formats, structured and unstructured, up to petabyte scale, into a big data platform data lake using tools such as MapR, Hive, Pig, Sqoop, Spark, Python, and R.

Developed a deep neural network for sentiment analysis on a gigabyte-scale dataset. A deep neural network in TensorFlow 1.0 is used with embedding layers and word2vec (skip-gram). This project includes data preprocessing, such as dropping the words that appear most often using subsampling techniques in order to get better representations and faster training. Batching is also used to pick the words around each input word within a certain window, and negative sampling is then utilized to update the weight matrix in each iteration. The objective of this system is to find all words with similar semantic meanings.
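
The project above was implemented in TensorFlow 1.0; purely to illustrate the same skip-gram settings (context window, frequent-word subsampling, negative sampling), here is a compact sketch using gensim on a toy corpus:

```python
# Skip-gram word2vec sketch with gensim; the toy corpus is illustrative only.
from gensim.models import Word2Vec

corpus = [
    ["the", "movie", "was", "great", "and", "fun"],
    ["the", "film", "was", "boring", "and", "slow"],
    ["great", "acting", "fun", "plot"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # embedding size
    window=3,         # words picked around each input word
    sg=1,             # skip-gram training
    negative=5,       # negative sampling to update the weights
    sample=1e-3,      # subsampling of very frequent words
    min_count=1,
)
print(model.wv.most_similar("great", topn=3))  # words with similar meaning
```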

I developed an application for customer interest prediction using historical data. This project dealt with a large dataset. An AWS cloud instance with a GPU, CUDA, and cuDNN was configured to handle the dataset and speed up performance. Correlations within the data and outliers were identified and processed further before being fed to the machine learning algorithm. Hadoop was utilized with Spark, Hive, and Sqoop, along with programming languages such as Python.

Developed a transfer learning approach for a huge dataset using deep learning with VGG16, VGG19, and Inception architectures in Keras and TensorFlow; the results were evaluated and were very promising in terms of classifying new, unseen images.
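
A minimal transfer-learning sketch in Keras with a frozen VGG16 base; the five-class head and 224x224 input size are placeholder assumptions:

```python
# Transfer-learning sketch: frozen VGG16 feature extractor plus a new classifier head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # reuse ImageNet features, train only the new head

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(5, activation="softmax"),  # hypothetical 5-class problem
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # with a real dataset
```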

Developed Generative Adversarial Networks (GANs) for script generation using a large dataset in order to predict answers to customer inquiries. In this project, unstructured (text) data was converted into structured data and saved in a SQL database before being fed to the model.

Developed sentiment analysis to predict customer interest using the Word2Vec algorithm with an RNN. The data was manipulated to reduce sparsity and fit the customer requirements.

My duties include writing code and unit tests, performing statistical analysis, reviewing and debugging code, and evaluating results against a standard dataset using unittest, as well as configuring and using the GPU on the server.

Developed an NLP application for real-time speech recognition, including voice capture, feature extraction, normalization, standardization, and model design. In this application an LSTM is used, and Mel-frequency cepstral coefficients (MFCC) and wavelets serve as language feature descriptors. The application was successfully tested and verified with accuracy of around 91% on a dataset of commands (instructions).
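
A hedged sketch of the MFCC-plus-LSTM idea for command recognition, assuming librosa for feature extraction and Keras for the model; the file handling, command count, and layer sizes are illustrative only:

```python
# MFCC feature extraction + LSTM classifier sketch; paths and sizes are placeholders.
import numpy as np
import librosa
from tensorflow.keras import layers, models

def mfcc_features(path, sr=16000, n_mfcc=13, max_frames=100):
    y, sr = librosa.load(path, sr=sr)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)
    m = (m - m.mean(axis=0)) / (m.std(axis=0) + 1e-8)       # per-coefficient normalization
    out = np.zeros((max_frames, n_mfcc), dtype=np.float32)  # pad/truncate to fixed length
    out[: min(max_frames, len(m))] = m[:max_frames]
    return out

num_commands = 10  # hypothetical command vocabulary size
model = models.Sequential([
    layers.Input(shape=(100, 13)),
    layers.LSTM(64),
    layers.Dense(num_commands, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```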

My duties include designing a complete machine learning workflow: problem definition, data ingestion, data preparation, data segregation, model training, model evaluation, model deployment, and performance monitoring.

Data Scientist and Machine Learning developer

CITI Group, Mississauga, ON, Canada January 2017 – November 2017

Responsibilities:

Developed an NLP application based on a large dataset collected for a major stock market data source (Bloomberg). The data is unstructured text. The application includes data cleaning and conversion into structured data using regular expressions in Python scripts within the Anaconda environment. The AWS cloud and GPUs are utilized for the big data workload.

Developed applications for data classification and regression based on deep learning (CNN, RNN, GAN) using Keras and TensorFlow. Good experience in Hadoop, PySpark, and Python for dealing with big data. Worked both in a team and independently, with very strong statistical and analytical skills. Used GPUs and AWS EC2 for big data processing.

Used data visualization tools such as Tableau; imported data into SQL and MS Excel for archiving, retrieval, and processing; and designed statistical models for data analysis using Python 3.5, NumPy, SciPy, scikit-learn, Pandas, and deep learning (CNN, RNN), word2vec, TensorFlow, and Keras for data cleaning, visualization, and the training, validation, and testing of machine learning algorithms.

Dealt with large structured and unstructured datasets; the machine learning algorithms were run on GPUs with AWS cloud computing.

Model optimization and tuning utilizing dropout, regularization, and early stopping in order to prevent overfitting and improve generalization.
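
A short Keras sketch of the three tactics named above (dropout, L2 weight regularization, early stopping); the layer sizes and hyperparameters are arbitrary:

```python
# Regularization sketch: L2 weight decay, dropout, and early stopping in Keras.
from tensorflow.keras import layers, models, regularizers, callbacks

model = models.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 regularization
    layers.Dropout(0.5),                                     # dropout
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2, epochs=50,
#           callbacks=[early_stop])  # stops once validation loss stops improving
```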

Developing sentiment analysis and text generation based on historical text data using an LSTM with the TensorFlow library in a Jupyter notebook for Python 2.3.

This work was presented as a research paper at the Spark Summit conference in cooperation with the Intel development team.

Data Scientist

ONYX Engineering, Windsor, ON, Canada August 2016 – December 2017

Responsibilities:

I worked as a data engineer; my role was extracting, transforming, and loading (ETL) data from large data files, then analyzing and processing it in Spark over a Hadoop cluster with a varying number of nodes and cores.

Designed, developed, and programmed an application for a real-time routing protocol for vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication based on DSRC with IEEE 802.11p. This application is designed to meet the requirements of a real-time routing protocol and to evaluate the effect of dense traffic on message delivery.

Developed and programmed an NLP and machine learning application in Python for collecting tweets from news agencies' Twitter accounts, analyzing those tweets, recognizing moments when something extraordinary happens, and sending the news to another module, which pushes notifications to users. The service launches every 10 minutes and downloads all news tweets, groups tweets about the same event into one cluster, recognizes the hottest news, and then selects the best tweet from each cluster. NumPy, SciPy, Pandas, Python, and Matplotlib are used for data cleaning, visualization, and preparation for machine learning.
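
As a rough sketch of the clustering step, assuming a TF-IDF representation with scikit-learn K-means (the actual feature pipeline may have differed); the sample tweets are invented, and "best tweet" here simply means the one closest to its cluster centroid:

```python
# Grouping tweets about the same event with TF-IDF + K-means; toy data only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

tweets = [
    "Breaking: earthquake hits the coast",
    "Strong earthquake reported near the coast",
    "Stock markets rally after rate decision",
    "Markets surge following central bank decision",
]

X = TfidfVectorizer(stop_words="english").fit_transform(tweets)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

for c in range(km.n_clusters):
    idx = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[idx].toarray() - km.cluster_centers_[c], axis=1)
    print(f"cluster {c}: best tweet ->", tweets[idx[np.argmin(dists)]])
```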

Developed and programmed an NLP and machine learning application with Pandas, NumPy, SciPy, Matplotlib, scikit-learn, Keras, TensorFlow, and NLTK. The dataset is a set of documents classified based on their contents. This project includes OCR, conversion of unstructured data to structured data, data cleaning, tokenization, evaluation, and prediction.

Defined and programmed algorithms for vehicle detection based on deep learning (CNN and neural networks) in order to detect and count vehicles in video sequences. The vehicle images are segmented, morphological opening is used to remove noise and fill gaps in the object image, and bounding boxes are used to count the detected objects. The system was successfully tested and its performance evaluated.
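
A hedged sketch of the mask post-processing described here, morphological opening followed by bounding boxes to count objects, using OpenCV; the input mask path and area threshold are placeholders:

```python
# Clean a binary segmentation mask with morphological opening, then count objects.
import cv2
import numpy as np

mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)          # placeholder vehicle mask
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

kernel = np.ones((5, 5), np.uint8)
opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)      # remove small noise
closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)   # fill small gaps

contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]
print("detected objects:", len(boxes))
```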

Developed an algorithm for object detection based on deep learning. Vehicle features were used to train the model and an out-of-sample dataset was used to test it; cross-validation and train/test splits were also used to evaluate performance and detection accuracy. Machine learning algorithms such as CNNs with Keras and TensorFlow were utilized.

Developed and programmed an application for time series forecasting of goods sales; a multilayer perceptron (MLP) and an RNN were utilized. The project includes historical data cleaning, normalization, machine learning training, and performance evaluation.

Developed and programmed algorithms for text mining and text scraping from social media and websites, clustering relevant news into groups using K-means and hierarchical clustering, and document classification applications.

Developed and programmed an IoT application that acquires data from external measurement/sensing devices. The approach includes data cleaning, normalization, statistical analysis, and presentation of useful results on a GUI.

Developed a tracking and monitoring system that communicates with IoT devices and a web-based application; satellite modems (transmitter and receiver) are utilized to set up the communication between the application and a central computer. The web-based application is used to retrieve data from the cloud and store it in SQL Server.

Software Engineer and Data Scientist

Department of Electrical and Computer Engineering,

University of Windsor, Windsor, ON, Canada July 2013--August 2016

Designed and programmed a natural language processing (NLP) application for Arabic handwritten character recognition. Supervised machine learning algorithms were used to classify out-of-sample input, and the AlteconDB Arabic handwriting online database was utilized. Data cleaning and preparation were done first, then normalization, feature selection, training, validation, and testing, to achieve better generalization and good classification results. The features considered for training and testing the application are wavelets and Mel-frequency cepstral coefficients (MFCC). Cross-validation with a test harness is employed for dataset validation. An RNN is employed in this application, with Python, NumPy, SciPy, and Matplotlib.

Designed and programmed an application for 3D image reconstruction in computer vision. This project includes the following: camera calibration, constructing the fundamental matrix from a set of points captured from a checkerboard image, finding highly correlated points using the Zero-Mean Normalized Cross-Correlation (ZNCC) function, matching the points using epipolar geometry, and then using dense matching to reconstruct the 3D image.
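
A hedged sketch of two steps in this workflow, checkerboard-based camera calibration and fundamental-matrix estimation, using OpenCV; the image paths and the 9x6 inner-corner board size are placeholder assumptions:

```python
# Checkerboard camera calibration sketch with OpenCV; paths and board size are placeholders.
import glob
import numpy as np
import cv2

board = (9, 6)
objp = np.zeros((board[0] * board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2)

objpoints, imgpoints, shape = [], [], None
for path in glob.glob("calib/*.png"):                 # hypothetical calibration shots
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    shape = gray.shape[::-1]
    ok, corners = cv2.findChessboardCorners(gray, board)
    if ok:
        objpoints.append(objp)
        imgpoints.append(corners)

if objpoints:
    ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, shape, None, None)
    print("camera matrix:\n", K)

# With matched point sets pts1/pts2 from two views, the fundamental matrix used in
# the epipolar-geometry matching step can be estimated with RANSAC:
# F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
```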

Developed and programmed an offline Arabic character recognition system in C++ using the IFN/ENIT database of handwritten Arabic words. The Arabic word images were scaled, standardized, and normalized; statistical and structural features were then extracted and fed to a one-versus-rest SVM classifier model in order to predict the out-of-sample word, and the recognized image was displayed on a GUI.

Designed and programmed a complete machine learning application for unsupervised sunspot segmentation. The dataset was taken from the Solar Oscillations Investigation / Michelson Doppler Imager (SOI/MDI). The sunspots are extracted from full-disk solar images after data cleaning, normalization, and smoothing. Invariant morphological (structural) and statistical features were extracted and fed to the classifier. The recognition accuracy was around 88% based on the Zurich classification scheme.

Designed and developed a deep learning application that visually diagnoses skin cancer, based on transfer learning with VGG16 and Inception. The dataset was pulled from the 2017 ISIC Challenge on Skin Lesion Analysis Towards Melanoma Detection. This application distinguishes three types of skin lesions (melanoma, nevus, or seborrheic keratosis). Data preprocessing is performed, and patch normalization is utilized for better performance.

Developed and programmed unsupervised sunspot classification based on the K-means algorithm. Data cleaning and normalization are performed, then features are extracted. Similar sunspots are grouped into one cluster, and a set of experiments is then carried out to evaluate feature selection and classifier performance and to determine which cluster each out-of-sample input belongs to.

Designed and programmed a set of applications for edge detection and filtering, such as Sobel and Canny edge detection, using a large dataset of raw images.
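
A minimal Sobel and Canny edge-detection sketch with OpenCV; the input file and thresholds are illustrative:

```python
# Sobel and Canny edge detection sketch; "sample.png" is a placeholder input.
import cv2

img = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(img, (5, 5), 0)                  # reduce noise first

sobel_x = cv2.Sobel(blur, cv2.CV_64F, 1, 0, ksize=3)     # horizontal gradients
sobel_y = cv2.Sobel(blur, cv2.CV_64F, 0, 1, ksize=3)     # vertical gradients
edges = cv2.Canny(blur, threshold1=50, threshold2=150)   # Canny edge map

cv2.imwrite("sobel_x.png", cv2.convertScaleAbs(sobel_x))
cv2.imwrite("sobel_y.png", cv2.convertScaleAbs(sobel_y))
cv2.imwrite("canny.png", edges)
```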

Vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication with secure safety message exchange. Vehicle authentication and message validation are tested using a stochastic routing approach. This project was implemented with Dedicated Short-Range Communication (DSRC) based on the IEEE 802.11 protocol. The feasibility of the results was confirmed, and the project was tested in real life. The project was designed for an alliance of automotive companies, and a team of six members worked closely together to achieve it.

Designed and developed a heuristic real-time routing protocol for wireless sensor networks (WSNs) deployed in 2D and 3D space. The objective of this project was to meet strict real-time requirements in terms of data (packet) delivery and deadlines. The project was tested using the TelosB mote platform with the IEEE 802.15.4 protocol.

Developed a TV script generation application with deep learning using an RNN (LSTM) in TensorFlow. The dataset was collected from 27 seasons. This application is used to generate a new TV script for a scene at Moe's Tavern.

Developed image classification using a CNN with Keras based on transfer learning with VGG16, VGG19, Inception, and ResNet50. This application deals with a very large dataset. The dataset is pre-processed so that all images are standardized and normalized before being fed to the CNN. The application was tested with out-of-sample images, and the prediction results were promising.

Software Developer,

TaTeches, Waterloo, ON, Canada January 2011 – July 2013

Responsibilities:

Developed applications that receive data from an external measurement device as an analog signal, perform DSP preprocessing such as filtering, then perform statistical analysis and produce visualization graphs. This project can flexibly read data from any input device and store the results in data sheets in a specific format. I used Python, C++, and Visual C++.

Software Developer,

Research in Motion (RIM), Waterloo, ON January 2009 – January 2011

Responsibilities:

Developed an application that receives data from external measurement devices, performs calculations and analysis, and provides visual charts on a GUI. This project can flexibly read data from any input device and store the results in data sheets in a specific format.

Programming Environment:

C, C++, MATLAB, Python, Excel, SciPy, NumPy, Matplotlib, RStudio, and MS Office applications

Machine Learning algorithms:

Linear Discriminant Analysis (LDA)

K-Nearest Neighbors (KNN).

Classification and Regression Trees (CART).

Support Vector Machines (SVM).

Developing and verifying the mathematical model and testing it with the real application.

Responsibilities

Cooperating with other team members in data gathering, testing, debugging, and documentation.

Handwriting Feature detection, extraction, pre-processing and selection.

Prepare data and feed it to the classifier.

Evaluate the classifier classification results.

Documentation.

Data Analyst

Dubai Capital Market, September 1999 – September 2008

Performed daily predictions of stock market prices using historical data. Dealt with data quality issue management and improvement, handling, and alerting processes. Reported, documented, and tracked issues that may impact data quality. Worked on price prediction for real estate.

PROFESSIONAL CERTIFICATES

SQL and relational databases 101, IBM 2019

Spark MLlib, IBM 2019

Big Data Hadoop and Spark Developer 2019

Machine Learning, Coursera 2018

Deep Learning Nanodegree, Udacity 2018

Certificate from Google in Deep machine learning, 2017

Microsoft Certificate: Microsoft Certified Professional (MCP), 2009

Oracle Certificate: Introduction to Oracle 9i: PL/SQL, 2008

EDUCATION

PhD - Electrical and Computer Engineering, University of Windsor, ON, 2015

MSc - Information Technology, University of Liverpool, UK, 2009

B.Sc. – Computer Science, University of Baghdad, Iraq. 1992

RESEARCH PUBLICATIONS:

Mehmmood Abd, “Chest X-ray: fast and accurate training with a big and imbalanced dataset”, Spark and AI Summit, San Francisco, USA, April 21, 2019.

Mehmmood A. Abd, Brajendra K. Singh, Sarab F. Al Rubeaai, Kemal E. Tepe, Rachid Benlamri, “Extending Wireless Sensor Network Lifetime with Global Energy Balance”, IEEE Sensors Journal, vol. 15, no. 9, pp. 5053-5063, 2015.

Sarab F. Al Rubeaai, Mehmmood A. Abd, Brajendra K. Singh, Kemal E. Tepe, “3D Real-Time Routing Protocol with Tunable Parameters for Wireless Sensor Networks”, to appear in IEEE Sensors Journal.

Mehmmood A. Abd, Sarab F. Al Rubeaai, and George Paschos, “Hybrid Feature for an Arabic Word Recognition System,” Journal of Computer Technology and Application, vol. 3, pp. 685-691, 2012.

Mehmmood A. Abd, Sarab F. Al Rubeaai, Kemal E. Tepe, Rachid Benlamri, “Game Theoretic Energy Balancing Routing in Three Dimensional Wireless Sensor Networks”, IEEE Wireless Communication and Network Conference WCNC2015, New Orleans, USA, pp. 1596-1601, 2015.

Mehmmood A. Abd, Brajendra K. Singh, Sarab F. Al Rubeaai, Kemal E. Tepe, Rachid Benlamri, “Game Theoretic Energy Balanced GTEB Routing Protocol for Wireless Sensor Networks”, IEEE Wireless Communication and Network Conference WCNC2014, Istanbul, Turkey, pp. 2564 - 2569, 6-9 April 2014.

Sarab F. Al Rubeaai, Brajendra K. Singh, Mehmmood A. Abd, Kemal E. Tepe “Adaptive Packet Forwarding Region Based Three Dimensional Real-Time Geographical Routing Protocol (3DRTGP) for Wireless Sensor Networks”, IEEE Wireless Communication and Network Conference WCNC2014, Istanbul, Turkey, pp. 2120 - 2125, 2014.

Sarab F. Majed Al Rubeaai, Brajendra K. Singh, Mehmmood A. Abd, Kemal E. Tepe, “Region based three dimensional real-time routing protocol for wireless sensor networks,” Sensors, 2013 IEEE, pp. 1-4, 3-6 Nov. 2013.

Mehmmood A. Abd, Sarab F. Majed, V. Zharkova, “Automated Sunspots Groups Classification Using Support Vector Machines,” Int. Confr. Technological Developments in Networking, Education and Automation, Springer, pp. 321-325, 2010.

Sarab F. Majed Al Rubeaai, Mehmmood A. Abd, and V. Zharkova, “Unsupervised segmentation of sunspot groups on full disk images,” Int. Joint Confr. on Computer, Information, and Systems Sciences, and Engineering, Springer, pp. 297-301, 2010.

Mehmmood A. Abd and G. Paschos, “Effective Arabic Character Recognition Using Support Vector Machines,” Int. Confr. Innovations and Advanced Technologies in Computer and Information Sciences and Engineering, Springer, ISBN 978-1-4020-6267-4, pp. 7-11, 2007.

Awards and honors

Frederick Atkins Graduate Award for outstanding research achievement for 2013-2014.

REFERENCES AVAILABLE UPON REQUEST


