
Data Engineer

Bloomington, IN
February 27, 2020



Charith Reddy Musku | +1-812-***-**** | LinkedIn | GitHub


Indiana University Bloomington, Bloomington, Indiana
Master of Science in Data Science; GPA: 3.8/4 (Aug 2018 - May 2020)
Relevant Coursework: Machine Learning, Natural Language Processing, Deep Learning, Text Mining, Big Data, Information Retrieval

Dhirubhai Ambani Institute of Information Technology, Gujarat, India
Bachelor of Technology in Computer Science; GPA: 3.8/4 (Aug 2012 - Jan 2016)

RELEVANT EXPERIENCE

Data Scientist, Intern (Marketing), Bloomington, IN
Indiana University - Data Analysis & Decision Support Team, VP Research (Aug 2019 - Present)

Student Retention: Predicting students at risk of dropping out of the university using machine learning models such as Random Forest, SVM, and Logistic Regression. Presented a visual analysis of the results along with the factors contributing to each dropout.

Email Marketing: Decision tree analysis to replace statistical A/B testing in identifying the factors that raise an email's open rate.

Reporting: Develop Power BI and Tableau reports to interpret and visualize data for expenditure analysis across marketing campaigns.

Data Scientist, Intern (B2B Procurement), Palo Alto, CA
SAP Labs - Leonardo Machine Learning (June 2019 - Aug 2019)

Intelligent Approval: Predicting the confidence level for the approval of a purchase requisition using Random Forests, XGBoost, and neural networks. End-to-end process involving data collection, analysis, training, and a tree explanation module to interpret results.

Machine Learning Engineer, R&D (NLP), Bangalore, India
SAP Labs - Innovation Center Network (Jan 2017 - Jun 2018)

Text Classification: Automatic classification of incoming support requests received through email or conversational AI. Experimented with linear classifiers and a deep learning CNN with pre-trained word, sub-word, and contextual embeddings (ELMo).

Named Entity Recognition: Extracting essential entities from the support request to anonymize personal information in the ticket. Experimented with models such as CRF and BERT, and trained with SpaCy and Flair using word2vec and GloVe pre-trained word embeddings.

Data Labeling: Active learning approach for smart labeling of data, which reduced manual labeling effort by 60%.

Deployment: Worked closely with developers to deploy models in production using TensorFlow Serving, Docker, and microservices.

Software Engineer, Bangalore, India
SAP Labs - Analytics (Feb 2016 - Jan 2017)

Dashboard Analytics: Admin Cockpit application to analyze and monitor customer tickets. Developed dashboard visualizations for topic modeling, sentiment analysis, anomaly detection, etc. using UI5 (JS library) and d3.js. Built RESTful web services using Spring Boot in Java.

ACADEMIC PROJECTS

Automatic Speech Recognition [Tech: TensorFlow, Python, ASR, CNN, MFCC, Speech, RNN, Speech-to-text]

An end-to-end neural network approach to an ASR system that converts speech to text. Used features extracted from a Mel filter bank (MFCC) with a Recurrent Neural Network, with CTC as the loss function to handle silence, blank, and repeated characters.

Trained on the TIMIT corpus, with 630 speakers across 8 dialects. Achieved a word error rate of 38%.

Hybrid Restaurant Recommendation System [Tech: Hadoop, MapReduce, PySpark, Spark SQL, MLLib, Apache Parquet]

A personalized restaurant recommendation system using a hybrid of collaborative filtering (matrix factorization), content-based matching using NLP (Word2Vec similarity), social network analysis (friends’ opinions), and a location-based approach for the cold-start problem.

Trained on the Yelp Dataset using restaurants from Toronto and presented a map visualization of the recommendations from all algorithms.

Time Series Forecasting of Stock Prices [Tech: Finance, Time-series, Forecasting, NLP, Sentiment Analysis, Python]

A deep learning approach to stock price prediction using time-series data. Used Stacked Autoencoders for feature extraction and an LSTM for prediction. Integrated a text mining approach to boost the model by performing sentiment analysis of the company’s news headlines.

Trained on 13 years of data downloaded from Yahoo Finance. Predicted stock prices with a mean squared error of 0.006.

Distracted Driver Detection [Tech: Computer Vision, Image classification, CNN, Transfer learning, VGG16]

Used transfer learning with the VGG-16 Convolutional Neural Network as the pre-trained model to detect and classify driver behavior in the given images into 10 classes, such as operating a mobile phone, drinking, and talking.

Trained the network on 24k driver images curated by the State Farm Insurance company and classified them with a log-loss of 0.22.

SKILLS

Languages: Python, R, Java, JavaScript, SQL, C, C++, PHP, SPARQL

Libraries: TensorFlow, PyTorch, Keras, Scikit-Learn, NumPy, Pandas, SpaCy, Gensim, fastText, NLTK, Matplotlib, Seaborn, Lucene, MLLib

Database & Big Data: Hadoop, MapReduce, Apache Kafka, Spark, Elasticsearch, MongoDB, Cassandra, AWS S3, RDS, MySQL, SAP HANA

Tools & Frameworks: GitHub, Jupyter Notebooks, Linux, Docker, AWS, SCP, Jira, Flask, SAP UI5, d3.js, Tableau, Power BI, Logstash, Kibana
