Charith Reddy Musku
email@example.com | +1-812-***-**** | LinkedIn | GitHub
EDUCATION
Indiana University Bloomington Bloomington, Indiana
Master of Science in Data Science; GPA: 3.9/4 Aug 2018 - May 2020
Relevant Coursework: Machine Learning, Natural Language Processing, Deep Learning, Text Mining, Big Data, Information Retrieval
Dhirubhai Ambani Institute of Information Technology Gujarat, India
Bachelor of Technology in Computer Science; GPA: 3.8/4 Aug 2012 - Jan 2016
EXPERIENCE
Data Scientist, Intern Palo Alto, CA
SAP - Leonardo Machine Learning Jun 2019 - Aug 2019
Procurement Fraud: Built a fraud-monitoring workflow that predicts the risk score of a procurement request using Random Forests. The end-to-end process covered data collection, cleaning, analysis, and training, plus an explanation module to interpret and visualize the results. Also experimented with a deep learning autoencoder model to identify fraud patterns.
Software Developer Bangalore, India
SAP Labs India Feb 2016 - Jun 2018
Integrated Kafka with ELK Stack (Elasticsearch, Logstash, Kibana) for real-time log analytics (performance monitoring & alerting).
Service Ticket Intelligence - R&D: [Tech: Python, SpaCy, Gensim, NLP, Docker, Flask, REST, Text Classification, NER]
- Developed & exposed ML models as RESTful API microservices, hosted from within Docker containers.
- Experimented with then state-of-the-art deep learning models, such as a char-level CNN, for automatic classification of new support tickets.
- Recommended KB articles to users based on solved-ticket history using NLP techniques (LSA/LDA/Doc2vec/cosine similarity).
- De-identified support tickets (hiding personal information such as name/ID) using a custom Named Entity Recognition model.
ACADEMIC PROJECTS
Question Answering over Bio-Medical Text: [Tech: TensorFlow, Python, QA, CNN, GloVe, Word Embeddings, Bi-LSTM]
Developed an automatic question answering system using a deep learning architecture called Bi-Directional Attention Flow (BiDAF). Used a combination of GloVe word embeddings, a char-level CNN for special words, and contextual embeddings with an attention layer.
Trained on BioASQ data with 30k QA pairs and achieved an F1 score of 60.34. Also tried BioBERT (BERT pretrained on biomedical data).
Automatic Speech Recognition: [Tech: TensorFlow, Python, ASR, CNN, MFCC, Speech, RNN, Speech-to-text]
An end-to-end neural network approach to an ASR system that converts speech to text. Used features extracted from a Mel filter bank (MFCC) with a Recurrent Neural Network, using CTC as the loss function to handle silence/blank/repeated characters.
Trained on the TIMIT corpus of 630 speakers covering 8 different dialects. Achieved a word error rate of 38%.
Hybrid Restaurant Recommendation System: [Tech: AWS, S3, RDS, EC2, Kafka, PySpark, Flask, Python, Java, ZooKeeper]
A personalized restaurant recommendation system using a hybrid of collaborative filtering (matrix factorization), content-based matching using NLP (Word2Vec similarity), social network analysis (friends’ opinions), and location-based recommendations for the cold-start problem.
A real-time recommendation generator (Databricks + Spark + AWS), with Kafka as the stream processor and a Flask-powered web page.
Stock Price Prediction using Time Series data: [Tech: Finance, Time-series, Forecasting, NLP, Sentiment Analysis, Python]
A deep learning approach to stock price prediction using time-series data. Used stacked autoencoders for feature extraction and an LSTM for prediction. Integrated a text-mining approach to boost the model by performing sentiment analysis of the company’s news headlines.
Trained on 13 years of data downloaded from Yahoo Finance. Predicted stock prices with a mean squared error of 0.006.
Distracted Driver Detection: [Tech: Computer Vision, Image classification, CNN, Transfer learning, VGG16]
Used transfer learning with a pre-trained VGG-16 Convolutional Neural Network to classify driver behavior in the given images into 10 classes, such as operating a mobile phone, drinking, and talking.
Trained the network on 24k driver images curated by the State Farm insurance company and classified them with a log-loss of 0.22.
SKILLS
Libraries: TensorFlow, PyTorch, Keras, scikit-learn, NumPy, Pandas, SpaCy, Gensim, fastText, CoreNLP, NLTK, Matplotlib, Seaborn, Lucene
Database & Big Data: Hadoop, Apache Kafka, Spark, Elasticsearch, MongoDB, AWS S3, RDS, MySQL, PostgreSQL, Neo4j, HANA
Tools & Frameworks: GitHub, Docker, AWS, SCP, EC2, Linux, Jira, Spring Boot, MVC, Flask, REST, Logstash, Kibana, d3.js, Tableau