Resume

Mustafa Alam - DATA SCIENTIST

Location:

Dallas, TX

Posted:

June 15, 2021

Resume:

Mustafa Alam

214-***-****

adm45z@r.postjobfree.com

Summary

12+ Years Data Science Experience

Data science professional with 12+ years of experience in statistical analysis, data analytics, data modeling, and creation of custom algorithms. Applies machine learning and neural networks, using a variety of systems and methods to train algorithms on different cloud platforms. Industry experience includes predictive analytics in finance, marketing, advertising, geospatial, and Internet of Things (IoT). Experienced with NLP and computer vision technologies.

Skills:

• Experience with a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction

• Familiarity with developing, deploying, and maintaining production NLP models with scalability in mind

• Worked on Natural Language Processing with NLTK, spaCy, and other modules to develop applications for automated customer response

• Wrote automation processes using Python and the AWS Lambda service

• Utilized Docker to handle deployment on heterogeneous platforms such as Linux, Windows, OSX, and AWS

• Reviewed the use of MongoDB, node.js, and Hadoop to automate the data ingestion and initial analysis processes

• Reviewed and deployed the infrastructure on AWS to minimize cost while providing the required functionality

• Scaled analytics solutions to Big Data with Hadoop, Spark/PySpark, and other Big Data tools

• Experience with Public Cloud (Google Cloud, Amazon AWS and/or Microsoft Azure)

• Experience working with big data infrastructure using tools such as Hive, Spark, H2O, and Sparkling Water

• Implementing solutions with common NLP frameworks and libraries in Python (NLTK, spaCy, gensim) or Java (Stanford CoreNLP, NLP4J)

• Experience with knowledge databases and language ontologies

• Quantitative training in probability, statistics and machine learning

• Experience in the application of Neural Network, Support Vector Machines (SVM), and Random Forest.

• Thinks creatively and proposes innovative ways to look at problems by applying data mining approaches to the available information.

• Identifies or creates the appropriate algorithm to discover patterns, and validates findings using an experimental and iterative approach.

Technical Skills

Programming

Python, Spark, SQL, R, Git, MATLAB, bash

Libraries

NumPy, Pandas, SciPy, scikit-learn, TensorFlow, Keras, PyTorch, statsmodels, Prophet, lifelines, PyFlux, arch, FeatureTools, Lime

Version Control

GitHub, Git, BitBucket, Box, Quip

IDE

PyCharm, Sublime, Atom, Jupyter Notebook, Spyder

Data Stores

Large Data Stores, both SQL and NoSQL, data warehouse, data lake, Hadoop HDFS, S3

RDBMS

SQL, MySQL, PL/SQL, T-SQL, PostgreSQL

NoSQL

Amazon Redshift, Amazon Web Services (AWS), Cassandra, MongoDB, MariaDB

Computer Vision

Convolutional Neural Network (CNN), Faster R-CNN, YOLO

Big Data Ecosystems

Hadoop (HBase, Hive, Pig, RHadoop, Spark, HDFS), Elasticsearch, Cloudera Impala

Cloud Data Systems

AWS (RDS, S3, EC2, Lambda), Azure, GCP

Data Visualization

Matplotlib, Seaborn, rasterio, Plotly, Bokeh

NLP

NLTK, spaCy, Gensim, BERT, ELMo

Machine Learning

Supervised and unsupervised Learning algorithms

Machine Learning, Natural Language Processing, Deep Learning, Data Mining, Neural Networks,

Linear Regression, Lasso and Ridge, Logistic Regression, Ensemble Classifiers (Bagging, Boosting and Voting), Ensemble Regressors, KNN,

Naïve Bayes Classifier, Clustering (K-Means, GMMs, DBSCAN), PCA, SVD, ARIMA

Analytical Methods

Advanced Data Modeling, Regression Analysis, Predictive Analytics, Statistical Analysis (ANOVA, correlation analysis, t-tests and z-tests, descriptive statistics), Sentiment Analysis, Exploratory Data Analysis. Time Series analysis (ARIMA) and forecasting (TBATS, LSTM, ARCH, GARCH), Principal Component Analysis (PCA) and SVD; Linear and Logistic Regression, Decision Trees and Random Forest.

Professional Experience

AEP Texas in Dallas, Texas

October 2019 – Present

Senior Machine Learning Scientist

AEP Texas is a subsidiary of American Electric Power, based in Columbus, Ohio. Led a small team of data scientists and data engineers that built numerous demand forecasting models from AEP Texas historical data hosted on Hadoop HDFS and Hive, estimating short-term demand peaks to optimize economic load dispatch. The project involved predicting electricity demand within the market area at 2-hour to 2-week outlooks. Multiple algorithms were explored and implemented.

• Tried multiple approaches for predicting day-ahead energy demand with Python, including exponential smoothing, ARIMA, Prophet, TBATS, and RNNs (LSTM).

• Built a Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model using PyFlux to model the uncertainty of other time series, ensuring a ‘safety’ stock of generating units.

• Incorporated geographical and socio-economic data scraped from outside resources to improve accuracy.

• Continually validated models using a train-validate-test split to ensure forecasts were accurate enough to optimize the number of generation facilities needed to meet system load.

• Prevented over-fitting with the use of a validation set while training.

• Built a meta-model to ensemble the predictions of several different models.

• Performed feature engineering with the use of NumPy, Pandas, and FeatureTools to engineer time-series features.

• Coordinated with facility engineers to understand the problem and ensure our predictions were beneficial.

• Participated in daily standups in an Agile Kanban environment.

• Queried Hive through Spark using Python's PySpark library.

JCPenney in Plano, Texas

June 2017 – September 2019

Senior Data Scientist

JCPenney is the largest U.S.-based retail department store company. Along with its physical stores, JCPenney has a successful online business that serves customers' fashion and home needs through its website, jcpenney.com, and its mobile app. As a data science consultant for the JCPenney online store division, I led a team to optimize a recommendation engine using a mixed hybrid recommender system. Deployment of the optimized recommender system on the website and mobile apps was expected to increase online sales by more than 9%.

The recommendation engine was reinforced with advanced NLP techniques using TensorFlow and Keras-based Transformer models.

• Combined collaborative filtering, content-based, and demographic recommender techniques to create the mixed hybrid recommender system.

• Used A/B testing to compare the effectiveness of different types of recommender systems, then optimized the most effective one after careful testing and research.

• Partially solved the “cold start” problem by incorporating the demographic-based recommender into the final mixed hybrid recommender system.

• Preprocessed and cleaned data with Pandas and NumPy, and performed feature engineering and imputation of missing values using Python.

• Performed stemming and lemmatization of text to remove superfluous components and make the resulting corpus as small as possible while containing all important information.

• Experience with Keras and TensorFlow in developing predictive algorithms.

• Solved analytical problems, and effectively communicated methodologies and results.

• Tested concept space embedding via ELMo and found results similar to bag of words, with a significant increase in computation time.

• Constructed an NLP-based filter utilizing embedding and LSTM layers in TensorFlow and Keras.

Micron in Boise, Idaho

November 2014 – January 2017

Data Scientist

Micron is a well-integrated semiconductor company with foundries and fabs all over the world. Micron specializes in planar-based multi-stack systems. Their Austin facility hosts automotive, memory, and institutional research. Their principal problem is the detection and forecasting of Angstrom-scale device failure. Several solutions were implemented to address it. First, a combination of logistic regression and decision trees was used to classify failures based on various parameters during the production process. Finally, a machine vision stage was set up to detect visible physical defects, using a convolutional neural network to verify production stages and aid in feature engineering for the regression stages.

• Developed a predictive model and validated a neural network classification model to support prediction algorithms.

• Applied boosting to the prediction model to improve its efficiency.

• Used Convolutional Neural Networks and machine vision to detect and predict flaws in stereolithography nano-machined stacks.

• Used R and Python to improve the model, and explored regression and ensemble models for prediction.

• Developed a predictive model and validated Neural Network Classification model to predict the feature label.

• Developed machine learning algorithms utilizing Caffe, TensorFlow, Scala, Spark, MLlib, R, SciPy, Matplotlib, NLTK, Python, scikit-learn, etc.

• Performed statistical analysis and built statistical models in R and Python using various supervised and unsupervised machine learning algorithms such as regression, decision trees, random forests, support vector machines, K-means clustering, and dimensionality reduction.

• Used MLlib, Spark's machine learning library, to build and evaluate different models.

• Transformed the logical data model into a physical model using ERwin, ensuring primary key/foreign key relationships and consistent definitions of data attributes and indexes.

• Designed the Data Marts in dimensional data modeling using star and snowflake schemas.

• Used ERwin for effective model management, including sharing, dividing, and reusing model information and designs to improve productivity.

• Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.

• Defined the list codes and code conversions between the source systems and the data mart, and kept the enterprise metadata library current with any changes or updates.

• Developed and enhanced statistical models by leveraging best-in-class modeling techniques.

Petso Financial Consultants, LLC in Boise, Idaho

January 2010 – October 2014

Data Analyst and Expert System Specialist

Petso Financial is a private asset management firm with over 600,000,000 in assets. As a financial analyst I performed risk management calculations, survival analysis and customer lifetime value predictions.

• Designed a suite of interactive dashboards that measured performance and allowed executives to adjust business strategies.

• Worked on outlier detection with data visualizations using box plots, and on feature engineering using Gaussian Mixture Models and K-NN distances, built using Pandas and NumPy.

• Analyzed data using data visualization tools and reported key features using statistical tools and supervised machine learning techniques to achieve project objectives.

• Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models.

• Used the TensorFlow library in a GPU environment for training and testing of Deep Neural Networks.

• Used R and Python for Exploratory Data Analysis, A/B testing, ANOVA testing, and hypothesis testing to compare and evaluate effectiveness.

• Built and analyzed datasets using R, MATLAB and Python.

• Designed and developed various machine learning frameworks using Python, R, and MATLAB.

• Utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & K-Nearest Neighbor for data analysis.

• Experience with Keras and TensorFlow in developing deep learning based predictive algorithms.

• Implemented machine learning algorithms and concepts such as K-means clustering (several varieties), Gaussian mixture distributions, and decision trees.

• Analyzed large data sets and applied machine learning and predictive statistical models.

• Created and delivered data presentations to executives to guide business decisions.

• Worked with millions of rows of data using SQL and performed Exploratory Data Analysis.

• Interpreted and analyzed data and performed predictive modeling using Python with the NumPy and Pandas packages.

• Worked with TensorFlow, Caffe2 and Torch.

Education

Ph.D. in Electrical Engineering

University of Texas at Arlington - Arlington, TX

M.S. in Electrical Engineering

University of Idaho - Moscow, ID

B.S. in Electrical & Electronic Engineering

Bangladesh University of Engineering and Technology - Dhaka, BD
