+1-682-***-**** email@example.com Arlington, Texas
The University of Texas at Arlington, TX, USA August 2018 – May 2020
Master of Science, Major: Information Systems, GPA 3.67/4.0
Relevant Coursework: Big Data, Data Mining, Database Management, Data Science, Enterprise Resource Planning, Project Management
Shri Shankaracharya Institute of Engineering & Technology, India August 2013 – July 2017
Bachelor of Engineering, Major: Computer Science, GPA 8.5/10
Programming languages: Python, SQL, R
Competencies: Machine Learning, Natural Language Processing, Agile
Tools: SAS, Tableau, Orange, MS Excel, PySpark, TensorFlow, Hadoop Ecosystem, SAP, HTML, VBA
36Central, Raipur, India August 2017 – July 2018
• Transformed and cleansed unstructured data using Python to meet business requirements for data-driven decision-making.
• Trained a machine learning model to predict users' click-through rate (CTR) for targeted marketing with 86% accuracy, resulting in a 20% increase in company revenue.
• Evaluated trends, correlations, and patterns in large datasets and maintained the database.
• Designed grammar rules to extract use-case-defined aspect words and chunked parse trees using Python NLTK.
• Implemented a Word2Vec (skip-gram) model in TensorFlow and classified keywords.
• Developed a recurrent neural network (RNN) for sentiment analysis of parsed reviews, achieving 82% accuracy.
Meridian Studies, Raipur, India January 2017 – June 2017
Data Analyst Intern
• Developed solutions to support workflow activities, performed unit testing, and delivered projects.
• Created Hive tables to store processed data in tabular format, mapped to data residing in the Hadoop Distributed File System (HDFS).
• Analyzed metadata and loaded data from the local file system into HDFS.
• Created visualizations in Python using Matplotlib and Seaborn.
Prediction of Cab Booking Cancellation
• Built a predictive model to identify customers most likely to cancel a cab booking after making a reservation, achieving 81.53% accuracy on the test data.
• Used the BeautifulSoup package to parse the website's HTML for data extraction.
• Visualized the dataset in SAS and Excel, including pivot tables.
• Implemented Decision Tree, Logistic Regression, and Neural Network models to improve predictive performance.
Big Data – Identifying customer complaints from online reviews of cell phones
• Processed 55 GB of text data and scored the severity of the 20 most frequent complaints using VADER sentiment analysis.
• Implemented topic modeling (LDA) on Databricks (PySpark) to identify the most frequent customer complaints.
Titanic – Machine Learning from Disaster
• Applied machine learning to predict which passengers survived the sinking of the RMS Titanic.
• Applied data analysis techniques and implemented KNN, neural network, SVM, random forest, and Naïve Bayes models in Python; accuracy was 65% before data cleaning and imputation of missing values.
• Imputed missing values in the dataset, which increased accuracy to 85%.
TMDB Box Office Prediction
• Worked with metadata on over 7,000 past films from The Movie Database (TMDB), comprising 20 columns with varying proportions of missing values.
• Used Excel and Python cleaning functions for data collection, cleaning, processing, and interpretation.
• Scraped missing data from the web and transformed categorical data into dummy variables.
• Kaggle submission ranked in the top 2% of all submissions.