Data Analyst Engineering

Location:
Hammond, IN
Posted:
June 18, 2020

Resume:

Call: +1-219-***-**** SHAHRUKH ALAM KHAN *******************@*****.***

LinkedIn: https://www.linkedin.com/in/shahrukh-alam-2086b5157/

GitHub: https://github.com/shahrukh-ak

Willing to work remotely

Pursuing a career in Data Science, Data Analysis, Machine Learning & AI

QUALIFICATION

Purdue University Northwest

NED University of Engineering & Technology

TECHNICAL SKILLS

Programming: Python (Pandas, NumPy, Scikit-Learn, Seaborn, Matplotlib, NLTK, Keras, TensorFlow, SQLite), R, C/C++, MATLAB/Octave etc.

Tools: Power BI, Tableau, Excel, RStudio, TensorFlow etc.

Statistical skills: Null Hypothesis, Probability Distributions, Basic Inference Methods, Linear Models, Nonparametric Statistics, Sampling, Resampling methods, Hypothesis Testing, Confidence Interval, P-value, Critical value, Confusion Matrix, Z-Test, T-Test, ANOVA (Analysis of Variance), Chi-Square Test, Correlation, Feature Engineering / Feature Selection techniques, Regression, Shapiro-Wilk normality test, A/B testing etc.

Databases: SQL, Spark SQL etc.

Big Data Tools & Techniques: Apache Hadoop architecture & HDFS (Flume, Sqoop, Hive, Pig, Spark), MapReduce programming, YARN, Spark programming & DataFrames, Spark SQL, MLlib etc.

Machine Learning: Linear & Logistic Regression, Rule-based decision trees and Random Forests, Model fitting, model selection, Bayesian regression, classification, clustering, Naive Bayes and Discriminant Analysis, k-Means, EM, SVM, Hierarchical clustering, Neural Networks, k-fold cross-validation, Deep Learning (TensorFlow, Keras), NLP, Computer vision, ASR, KNN, data mining, ID3 algorithm and C5 decision tree for classification and prediction, Association Analysis and Dimension Reduction Techniques, several machine learning/deep learning libraries etc.

Deep Learning Techniques: Deep learning pipeline, Deep learning in context of big data, data pre-processing techniques for deep learning, feature engineering, one-hot encoding, word embeddings, deep learning evaluation methods, Neural Network, Deep Neural Network, multi-layer Neural Networks, CNN, RNN etc.

Operating Systems: Windows Vista/XP/7/8/10, macOS, Linux etc.

Version Control: Git, GitHub.

WORK EXPERIENCE

BEULAHWORKS, LLC, USA

DATA ANALYST INTERN Jan/2020-May/2020

Implemented and tested a statistical data analysis software module using various technologies including R language, statistical algorithms, and Google Cloud Platform.

Conducted One Sample t-test, Welch Two Sample t-test, ANOVA, Paired t-test, 2-sample test for equality of proportions with continuity correction, One sample Chi-squared test for variance, Shapiro-Wilk normality test, One-sample Sign-Test, and made decisions based on p-value obtained from these tests.

Communicated analysis results to decision makers, presenting actionable insights through data visualizations including scatterplots, box-and-whisker plots, normal Q-Q plots, histograms, and bar plots.

Applied sampling, resampling, hypothesis testing, confidence intervals, p-values, critical values, confusion matrices, Z-tests, t-tests, analysis of variance, correlation, feature engineering / feature selection techniques, and A/B testing in R to improve data quality.

PURDUE UNIVERSITY, USA

TEACHING ASSISTANT-GRADER Aug/2019-Dec/2019

Managed 25+ undergraduate students, compiling homework, tests, programming assignments and exams.

Helped students find bugs and errors in their code, and assisted and guided them in their projects.

GRADUATE RESEARCH ASSISTANT Jan/2019-Dec/2019

Big data-Research:

Performed research on data related to the construction industry and used machine learning and statistical modelling techniques to develop and evaluate algorithms.

Conducted hypothesis testing (one-sample test, Welch t-test, chi-squared test for variance, Shapiro-Wilk normality test, ANOVA) and summary methods using R to improve data quality.

Athletic Autonomous Drone:

Successfully completed hardware implementation of an autonomous athletic drone.

Applied a deep learning-based approach and developed a motion capture system.

Master of Science in Engineering (GPA = 3.5/4) Dec/2020

Bachelor of Engineering Nov/2017

GREEN WAYS ENERGY

DATA ANALYST

Worked on smart sensors and energy monitoring software that help homeowners manage their home battery usage intelligently, developing an algorithm that analyzes data points such as real-time electricity usage, power generated from solar, the weather, the billing tariff of the utility company, and the battery capacity.

Updated company data warehousing techniques such as data recall and segmentation, resulting in a 20% increase in usability for non-technical staff members.

Performed market analysis on the forecast, demand, and capital of the products and achieved a 40% increase in sales.

Utilized R and Python to analyze data through model building, design, and statistical analysis. Wrote efficient and well-documented code for descriptive and inferential statistics.

PROJECTS (More on GitHub: https://github.com/shahrukh-ak )

Linear regression for the analysis of GDP of US products: (R) Fitted a linear regression model on the GDP of US products and made a scatter plot for data visualization. Determined the coefficient of determination and improved the model accuracy from 96.7% to 99.5% by analyzing the residual plots and applying a Box-Cox transformation.

Link: https://github.com/shahrukh-ak/Analysis-of-GDP-of-US-Products

Application of Logistic Regression, Neural network and Deep neural network to MNIST dataset: (Python) Applied logistic regression, a neural network, and a deep neural network to the MNIST dataset, achieving accuracies of 91%, 96.8%, and 97% respectively. The deep neural network achieved the highest accuracy.

Link: https://github.com/shahrukh-ak/Deep_learning_Projects/tree/master/MNIST

Implementation of Logistic Regression, neural network & 2-layer, 3-layer, 4-layer deep neural network for commercial detection: (Python)

Applied logistic regression, a neural network, and 2-layer, 3-layer, and 4-layer deep neural networks to the well-known “TIMESNOW” dataset for TV commercial detection. The neural network gave the best results, with accuracy and precision of 100%.

Link: https://github.com/shahrukh-ak/Deep_learning_Projects/tree/master/TV%20Channel%20Commercial%20Detection

Comparison of PSO and KMPSO techniques to analyze various datasets: (Python) Used PSO and KMPSO techniques to analyze different datasets like breast cancer, iris etc. KMPSO implementation was successful with promising results showing 90.5% accuracy.

Link: https://github.com/shahrukh-ak/Big-data-Project

Implementation of autoencoder through deep learning: (Python) Deployed an autoencoder on MNIST data to predict different flower species. Achieved an accuracy of about 99%.

Link: https://github.com/shahrukh-ak/Deep_learning_Projects/tree/master/Auto-Encoder

Application of Logistic Regression, neural network and deep neural networks on iris using TensorFlow: (Python) Compared logistic regression, a neural network, and a deep neural network on the iris data. The neural network gave the best results, with 96% accuracy.

Link: https://github.com/shahrukh-ak/Deep_learning_Projects/tree/master/iris

scikit-learn k-means algorithm to check water quality: (Python) Cleaned the data, then applied the scikit-learn k-means algorithm to cluster water quality into GOOD, FAIR, and BAD. Used the model for water quality prediction and achieved an accuracy of 89.6%.

Link: https://github.com/shahrukh-ak/To-predict-water-quality-and-check-if-it-is-useable-or-not

Linear regression to find dependency of the plasma level: (R) Modeled the dependency of the plasma level of a polyamine on age using linear regression. Used a Box-Cox transformation to improve the coefficient of determination from 0.75 to 0.86.

Link: https://github.com/shahrukh-ak/Dependancy-of-plasma-level-of-a-polyamine-on-age

Applying CNN on color image dataset: (Python)

Classified scenery images into 4 classes (sea, mountain, forest, and building) using a CNN, achieving an accuracy of 76%.

Link: https://github.com/shahrukh-ak/Applying-convolutional-neural-netwrok-on-color-image-dataset

Dec/2017-Nov/2018


