Charith Reddy Musku
email@example.com | +1-812-***-**** | LinkedIn | GitHub
Indiana University Bloomington | Bloomington, Indiana
Master of Science in Data Science; GPA: 3.8/4 | Aug 2018 - May 2020
Relevant Coursework: Machine Learning, Natural Language Processing, Deep Learning, Text Mining, Big Data, Information Retrieval
Dhirubhai Ambani Institute of Information Technology | Gujarat, India
Bachelor of Technology in Computer Science; GPA: 3.8/4 | Aug 2012 - Jan 2016
RELEVANT EXPERIENCE
Data Scientist, Intern (Marketing) | Bloomington, IN
Indiana University - Data Analysis & Decision Support Team, VP Research | Aug 2019 - Present
Student Retention: Predicting students at risk of dropping out of the university using machine learning models such as Random Forest, SVM, and Logistic Regression. Presented a visual analysis of the results along with the factors contributing to each dropout.
Email Marketing: Decision tree analysis, used in place of statistical A/B testing, to identify the factors that increase an email's open rate.
Reporting: Develop Power BI and Tableau reports to interpret and visualize data for expenditure analysis across marketing campaigns.
Data Scientist, Intern (B2B Procurement) | Palo Alto, CA
SAP Labs - Leonardo Machine Learning | June 2019 - Aug 2019
Intelligent Approval: Predicted the confidence level for approval of a purchase requisition using Random Forests, XGBoost, and neural networks. End-to-end process covering data collection, analysis, training, and a tree explanation module to interpret results.
Machine Learning Engineer, R&D (NLP) | Bangalore, India
SAP Labs - Innovation Center Network | Jan 2017 - Jun 2018
Text Classification: Automatic classification of incoming support requests received through email or conversational AI. Experimented with linear classifiers and a deep learning CNN with pre-trained word, sub-word, and contextual embeddings (ELMo).
Named Entity Recognition: Extracted essential entities from support requests to anonymize personal information in tickets. Experimented with models such as CRF and BERT, trained with spaCy and Flair using word2vec and GloVe pre-trained word embeddings.
Data Labeling: Active learning approach for smart labeling of data, which reduced the human effort of manual labeling by 60%.
Deployment: Worked closely with developers to deploy models to production using TensorFlow Serving, Docker, and microservices.
Software Engineer | Bangalore, India
SAP Labs - Analytics | Feb 2016 - Jan 2017
Dashboard Analytics: Admin Cockpit application to analyze and monitor customer tickets. Developed dashboard visualizations for topic modeling, sentiment analysis, anomaly detection, and more using UI5 (JS library) and d3.js. Built RESTful web services using Spring Boot in Java.
ACADEMIC PROJECTS
Automatic Speech Recognition: [Tech: TensorFlow, Python, ASR, CNN, MFCC, Speech, RNN, Speech-to-text]
An end-to-end neural network approach to an ASR system that converts speech to text. Used features extracted from a Mel filter bank (MFCC) with a Recurrent Neural Network, using CTC as the loss function to handle silence, blank, and repeated characters.
Trained on the TIMIT corpus of 630 speakers across 8 different dialects. Achieved a word error rate of 38%.
Hybrid Restaurant Recommendation System [Tech: Hadoop, MapReduce, PySpark, Spark SQL, MLLib, Apache Parquet]
A personalized restaurant recommendation system using a hybrid of collaborative filtering with matrix factorization, content-based matching using NLP (Word2Vec similarity), social network analysis (friends’ opinions), and location-based recommendation for the cold-start problem.
Trained on the Yelp Dataset using restaurants from Toronto and presented a map visualization of the recommendations from all algorithms.
Time Series Forecasting of Stock Prices [Tech: Finance, Time-series, Forecasting, NLP, Sentiment Analysis, Python]
A deep learning approach to stock price prediction using time-series data. Used Stacked Autoencoders for feature extraction and an LSTM for prediction. Integrated a text mining approach to boost the model by performing sentiment analysis of the company’s news headlines.
Trained on 13 years of data downloaded from Yahoo Finance. Predicted stock prices with a mean squared error of 0.006.
Distracted Driver Detection [Tech: Computer Vision, Image classification, CNN, Transfer learning, VGG16]
Used transfer learning with a pre-trained VGG-16 Convolutional Neural Network to detect and classify driver behavior in the given images into 10 classes, such as operating a mobile phone, drinking, and talking.
Trained the network over 24k driver images curated by the State Farm Insurance company and classified them with a log-loss of 0.22.
SKILLS
Libraries: TensorFlow, PyTorch, Keras, scikit-learn, NumPy, pandas, spaCy, Gensim, fastText, NLTK, Matplotlib, Seaborn, Lucene, MLLib
Database & Big Data: Hadoop, MapReduce, Apache Kafka, Spark, Elasticsearch, MongoDB, Cassandra, AWS S3, RDS, MySQL, SAP HANA
Tools & Frameworks: GitHub, Jupyter Notebooks, Linux, Docker, AWS, SCP, Jira, Flask, SAP UI5, d3.js, Tableau, Power BI, Logstash, Kibana