Data Software Engineer

Location:

San Ramon, CA

Posted:

January 17, 2018

Resume:

Chandan Uppuluri

Email: *******.********@*****.*** Phone: 682-***-****

Web: http://chandan-u.github.io/ GitHub: github.com/chandan-u LinkedIn: linkedin.com/in/chandanu

EDUCATION

Master's in Data Science, INDIANA UNIVERSITY, Bloomington Dec 2017

(Courses: Statistics, Machine Learning, Data Mining, Information Visualization, Artificial Intelligence, Sentiment Analysis, Computational Linguistics, Deep Learning, Spark, Tableau)

Bachelor of Technology in Computer Science, GITAM UNIVERSITY, INDIA Apr 2012

SKILLS SUMMARY

Statistical Methods: time series, hypothesis testing and confidence intervals, principal component analysis and dimensionality reduction, A/B tests, Exploratory Analysis

Software and Programming Languages: Python (TensorFlow, Keras, scikit-learn, NumPy, NLTK, pandas, gensim), R (ggplot2, plotly, tidyr), C++, Java, SQL, Hadoop (Hive, Spark, YARN, Apache Airflow, HBase), Linux, LaTeX, Git, TiMBL, MaltParser, StanfordNLP

Web/Visualization: Shiny Web app, sci2, ggplot2, plotly, Django, HTML5, CSS3, RESTful API, JavaScript, D3.js

WORK EXPERIENCE

World Well Being Project, UNIVERSITY OF PENNSYLVANIA May 2016 - Aug 2016

Data Science Research Assistant (Modeling, NLP, Data extraction)

•Pioneered techniques for measuring the psychological and physical well-being of people by analyzing social media data.

•Built models such as LSTMs to predict emotions from text (using n-grams and word embedding techniques as features).

•Gathered required structured and unstructured data from various static and streaming data sources to create data lakes.

•Cleaned and analyzed unstructured data (English and Arabic) for labelling on Amazon Mechanical Turk and for model building.

School of Informatics and Computing, INDIANA UNIVERSITY Jun 2016 - Sep 2016

Instructor (Natural Language Processing in Python)

•Developed content for NLP in Python (INFO-I590) as part of the Data Science online courses.

HCL Technologies, India Jul 2012 - Sep 2015

Software Engineer (Data Visualization, Data Engineering)

•Developed dashboards with Sqoop, Hadoop, Hive, Django, HTML, CSS, and jQuery to provide insights to the client (T-Mobile) (POC).

•Enabled backend integration for the T-Mobile MetroPCS merger through web services (Agile).

•Automated workflows, saving more than a hundred thousand dollars in a single quarter.

PUBLICATIONS

Building Customized Text Mining Tools via Shiny Framework: The Future of Data Visualization. (28th Modern Artificial Intelligence and Cognitive Science Conference, MAICS)

DATA SCIENCE PROJECTS

Exploring Word Similarity Algorithms (nltk, gensim) (GitHub)

•Explored the disadvantages of Word2Vec in terms of scalability, concepts, and disambiguation.

•Addressed these issues with alternative solutions such as LDA (topic modeling) and PageRank.

Predicting Stock Growth from News Headlines (TensorFlow, Keras, Spark, gensim) (GitHub)

•Improved accuracy by 9 percent in predicting the growth of the DJIA stock index from the top 25 news headlines.

•Derived features such as n-grams, tf-idf, word2vec, and doc2vec (word embeddings) for Naive Bayes, SGD, SVM, CNN, and LSTM models.

•Deployed on Spark to parallelize hyperparameter tuning and optimization across various feature sets.

Client project: Analysis and Visualization of Trends in Translation Studies (R, leaflet, Shiny) (live project)

•Gathered and analyzed data from 1600+ publications and built interactive spatial, time-series, word-cloud, topic-analysis, and network-graph (co-authorship network) visualizations.

Stance Detection Using Dependency Parsing (TiMBL, NLTK, MaltParser, Java) (GitHub)

•Detected stance in tweets using POS tagging, n-grams, and dependency parsing techniques.

•Using the MPQA subjectivity lexicon increased accuracy by 4 percent.

•Compared the memory-based learner TiMBL with random forests.

Tweet Analysis using NLP (tweepy, Twitter-Streaming, nltk, plotly, jupyter, Airflow, gensim) (GitHub)

•Extracted and preprocessed streaming Twitter data using Tweepy.

•Analyzed words and hashtags using word counts, tf-idf, word clouds, POS tagging, and word2vec clusters.

•Discovered the effect of hashtags on word2vec and word-cloud visualizations.

Forecasting Housing Rental Demand (R, ggplot2, plotly, tidyr) (GitHub)

•Feature extraction and clustering improved accuracy from 70% to 75%.

•Built multi-class classification models (KNN, random forests, decision trees, and XGBoost) to predict housing demand in NY.

Data Pipelining and Orchestration for Movie Recommendation System (Airflow, Spark, Python 3.5) (GitHub)

•Orchestrated an end-to-end ETL pipeline using Apache Airflow and performed data transformations using Spark.
