Data Analyst Python

Location:

Arlington, TX

Posted:

May 17, 2020

Resume:

Aayush Baid

+1-682-***-**** ******.******@*****.*** Arlington, Texas

EDUCATION

The University of Texas at Arlington, TX USA August 2018 - May 2020

Master of Science, Major: Information Systems, GPA 3.67/4.0

Relevant Coursework: Big Data, Data Mining, Database Management, Data Science, Enterprise Resource Planning, Project Management

Shri Shankaracharya Institute of Engineering & Technology, India August 2013 - July 2017

Bachelor of Engineering, Major: Computer Science, GPA 8.5/10

TECHNICAL SKILLS

Programming languages: Python, SQL, R

Competencies: Machine Learning, Natural Language Processing, Agile

Tools: SAS, Tableau, Orange, MS Excel, PySpark, TensorFlow, Hadoop Ecosystem, SAP, HTML, VBA

PROFESSIONAL EXPERIENCE

36Central, Raipur, India August 2017 - July 2018

Data Analyst

• Transformed and cleansed unstructured data using Python to conform to the business requirements for data-driven decision making.

• Trained a machine learning model to predict users' click-through rate (CTR) for targeted marketing with 86% accuracy, resulting in a 20% increase in the firm's revenue.

• Evaluated trends, correlations, and patterns in large data sets and maintained the database.

• Designed grammar rules to extract use-case-defined aspect words; chunked parse trees using Python NLTK.

• Implemented a Word2Vec (skip-gram) model in TensorFlow and classified keywords.

• Developed a recurrent neural network for sentiment analysis of parsed reviews, achieving 82% accuracy.

Meridian Studies, Raipur, India January 2017 – June 2017

Data Analyst Intern

• Worked as a developer on solutions that promoted workflow activities, performed unit testing, and delivered the projects.

• Developed Hive tables to store the processed data in tabular format and mapped the data residing in HDFS.

• Analyzed the metadata and loaded the data into the Hadoop Distributed File System (HDFS) from the local file system.

• Created visualizations in Python using Matplotlib and seaborn.

PROJECTS

Prediction of Cab Booking Cancellation

• Built a predictive model to identify the customers most likely to cancel a cab booking after making the reservation. The model achieved an accuracy of 81.53% on the test datasets.

• Used the BeautifulSoup package to parse the HTML document of the website for data extraction.

• Used tools such as SAS and Excel, including pivot tables, to visualize the dataset.

• Implemented models such as Decision Tree, Logistic Regression, and Neural Networks to improve predictive performance.

Big Data – Identifying customer complaints from online reviews of cell phones

• Processed 55 GB of text data; scored the severity of the 20 most frequent complaints using VADER sentiment analysis.

• Implemented Topic Modeling (LDA) on Databricks (PySpark) to identify the most frequent customer complaints.

Titanic – Machine Learning from Disaster

• Applied machine learning to predict which passengers survived the sinking of the RMS Titanic.

• Used data analysis techniques and implemented models such as KNN, Neural Networks, SVM, Random Forest, and Naïve Bayes in Python. Before data cleaning and filling of missing values, the model reached an accuracy of 65%.

• Filled missing values in the dataset, which increased accuracy to 85%.

TMDB Box Office Prediction

• Worked with metadata on over 7,000 past films from The Movie Database (TMDB), containing 20 columns with varying percentages of missing values.

• Used Excel and cleaning functions in Python for data collection, cleaning, processing, and interpretation.

• Scraped the missing data from the internet and transformed data into dummy variables.

• The Kaggle submission ranked in the top 2% of all submissions.
