EDUCATION
University of Illinois AUG 2018 – MAY 2020
Master of Science in Mathematics and Computer Science, GPA: 3.92
Coursework: Machine Learning, Data Mining, Data Visualization, NoSQL Databases, Big Data Analytics, Statistics
Manipal University Jaipur AUG 2014 – MAY 2018
Bachelor of Technology in Computer Science, GPA: 8.43
SKILLS
Programming Languages: R, Python, Scala, Spark, SAS
Data Management: Data Modelling, Data Mining, Web Scraping, Data Pipelining
Database: MySQL, Microsoft SQL Server, Oracle, NoSQL Databases – Cassandra, MongoDB, HBase, Riak
Software: Tableau, MATLAB, MS Office, Spark, Docker, Power BI, Google Cloud Platform, GSuite, IBM Watson Local, SSIS
Data Science Technologies: Hadoop – MapReduce, PySpark, TensorFlow, scikit-learn, Hive, Pandas, NumPy, NLTK, PyEntitymatching, AWS Redshift, AWS SageMaker, AWS tools and techniques
CERTIFICATION
AWS Certified Cloud Practitioner, AWS Data Science Professional, IBM
Machine Learning, Stanford, Coursera
Statistical Learning, Stanford Online
EXPERIENCE
Data Analyst Intern, Illinois Department of Innovation and Technology NOV 2018 – MAY 2020
• Automated database processes for the Customer Relations, Survey Research, and Security teams by building R and Python scripts, reducing elapsed time by 90% and eliminating manual effort.
• Improved the accuracy of text extraction and classification from PDFs from 60% to 78% using a Python natural language processing library in IBM Watson Local.
• Implemented a classification model to reduce the growing number of lead poisoning cases across Illinois by analyzing key trends at the zip code level.
• Performed predictive modeling with a regression model to analyze time metrics for open work orders in a given category.
Research Intern, GfK Pvt. Ltd. SUMMER 2017
• Worked with a team of researchers to perform cluster classification on survey results collected by various organizations for product recommendations.
• Used Power Query and pivot tables to manipulate survey data in MS Excel.
VOLUNTEER
Volunteer Data Analyst, Morning Star Foundation JUL 2020 – Present
• Build the data architecture for the organization using AWS cloud services.
• Use Google Cloud Platform technologies to store raw data.
• Use Google Data Studio to analyze and visualize data collected from raw sources.
Volunteer Data Analyst, NoSchoolViolence.org JUL 2020 – Present
• Use natural language processing to build an application linking children's behavior to patterns in school violence.
• Use Python NLP tools such as BERT and NLTK to collect and refine raw data and build the data model.
PROJECTS
Automating Customer Support Table for Power BI Visualization (R, SSIS, SQL, T-SQL, SSMS)
• Built an R script to apply date-time calculations to a database table on the server side in SQL Server Management Studio.
• Reduced calculation time from 2 hours to 70 seconds and eliminated the need to create a new master table with millions of records at each execution.
Facial Attributes Recognition and Gender Identification (Python, TensorFlow, Seaborn)
• Used deep learning techniques on a facial dataset to identify attributes such as smiling or angry expressions and hair texture, and to generate a video for a specific person.
• Trained Inception and MobileNet neural networks to classify the images with an accuracy of 91%.
Storage Data Unification (Python, SQL, APIs, Web Scraping)
• Created a Python script that builds a mapping table from the names of various fields to the standard nomenclature, giving the table a hierarchical structure that lets data be drilled down to specific detail.
Face Recognition in MATLAB (MATLAB)
• Performed face recognition on streaming video data using the Viola-Jones algorithm, increasing the accuracy to 94%.
Predict User Rating for Medicine Based on Reviews (Hadoop, Spark, Regression, Spark ML)
• Performed sentiment analysis using text mining on review text in Spark.
• Performed regression to predict medicine ratings using one-hot encoding of the review text.
• Tuned regression model hyperparameters to obtain an accuracy of over 85%.