
Python Data Analyst

Atlanta, GA
August 01, 2020



Alok Sheth

**** ******* **, *******- **, Phone: 469-***-****


A results-oriented professional with 2+ years of experience in Business Intelligence/Analytics, programming, data modeling, statistical analysis, data visualization, and machine learning.

EDUCATION

The University of Texas at Dallas Dec 2019

M.S. in Business Analytics (Specialization in Data Science and Analytics), GPA: 3.51

Imarticus Learning Pvt. Ltd. Oct 2016 - Mar 2017

Post-Graduate Diploma in Big Data and Analytics

University of Mumbai Aug 2016

B.E. in Computers



SKILLS

Programming - SQL, Python, R, SAS, HiveQL, Spark, PySpark, NoSQL

Analytical Tools - Jupyter, AWS, GCP, RStudio, MySQL, MSSQL, Oracle, Zeppelin, Hadoop, MS Excel

Visualization Tools - Python (Matplotlib, Seaborn, Bokeh), R (ggplot2), Tableau, Power BI

Statistical Packages - scikit-learn, NumPy, pandas, Matplotlib, Seaborn, Keras, NLTK, TensorFlow

Machine Learning - Regression, Classification, Clustering, NLP, LSTM, Random Forest, SVM, XGBoost, K-Means, HDBSCAN, TensorFlow, Spark MLlib

Statistical Methods - Regression, Hypothesis Testing, ANOVA, Confidence Intervals, PCA, Time Series

Certifications - AWS Data Analytics Fundamentals, SQL (Standard and Window Functions), Data Science with Python


EXPERIENCE

Nebula Partners, Atlanta, GA, USA March 2020 - Present

Data Scientist

• Develop database-centric SQL solutions, both independently and in collaboration with a distributed team

• Build and productionize predictive models on large datasets in Python using advanced statistical modeling, machine learning, and other data mining techniques

• Perform DevOps tasks to publish and deploy ML models to live production

CPS Energy, San Antonio, TX, USA June 2019 - Dec 2019

Enterprise Advanced Analytics

• Used SQL queries in a Hadoop environment to extract large quantities of data

• Cleaned and preprocessed the dataset in Zeppelin and imported it into Jupyter for model building

• Developed a time series XGBoost model in Python to forecast the number of meters out (meter outages) caused by weather

• Reduced RMSE from 532 to 144 by using an LSTM model and tuning hyperparameters with Hyperopt

• Built a multiple-output Keras model to predict 6 different meter outage variables simultaneously

• Clustered voltage faults with HDBSCAN based on the stop time of voltage sag and swell events

Simple and Real Analytics Pvt Ltd, Mumbai, India Oct 2016 - Oct 2017

Data Analyst

• Evaluated project efficiency across multiple business departments by conducting proofs of concept (POCs) on sample data

• Developed and implemented new business processes and strategies to turn vaguely defined strategies into actionable ones

• Analyzed and forecast sales and consumer behavior trends and recommended marketing activities based on the results

• Analyzed and processed complex datasets using advanced querying, visualization and analytics tools

• Preprocessed a 5-million-row dataset in Python, improving data quality by 37%

• Improved understanding of the dataset by creating statistical plots with Python's Matplotlib and Seaborn

• Performed complex operations on a MySQL database to reduce processing time by 27%

• Increased ROI for the sales department by 23% by applying the K-means algorithm and performing EDA

ACADEMIC PROJECTS

NLP for Finding Similarity Between Movies: Oct 2019 - Nov 2019

• Imported the dataset into Jupyter and tokenized the text, removing punctuation, stopwords, and whitespace

• Reduced tokens to their root forms with the Snowball stemmer and converted them into vectors with a TF-IDF vectorizer

• Used K-means clustering to group movies with similar plots and calculated the similarity distance between them

NBA Player of the Week Analysis Using Hadoop, Python, and Spark: Nov 2018 - Dec 2018

• Imported data into Hadoop and performed feature selection and engineering to improve accuracy by 30%

• Built a K-means clustering model with Spark MLlib to segment players by their attributes, achieving 74% accuracy

• Visualized player attributes in Tableau to understand the characteristics of players who received the award

Predictive Analytics and Developing Marketing Insights Using SAS: Aug 2018 - Dec 2018

• Segmented customers into high-value and low-value groups by total purchase amount using K-means clustering, and recommended which group to focus on to increase sales

• Predicted next month's business opportunities by performing time series analysis on the brand's sales

• Interpreted customer preferences from purchase history and other historical data using a generalized logit model to improve product quality and services

Integrated Analysis Using Tableau and R: Jan 2018 - Mar 2018

• Retrieved and preprocessed the dataset, treating missing values and encoding variables

• Performed K-means clustering using Tableau-R integration to segment victims based on their survival status

• Built dashboards to visualize trends and summarize insights on road accidents in the USA

LEADERSHIP & ORGANIZATION: Jan 2018 - Dec 2019

• Intelligence Analytics Society, UT Dallas: Assisted with organizing seminars that invited industry leaders to speak about analytics
