Surya Alla
972-***-**** *********@******.*** www.linkedin.com/in/surya-alla-800862190
EDUCATION
The University of Texas at Dallas Jan 2019 – Dec 2020
Master of Science, Computer Science (Specialization in Data Science)
Relevant Courses: Big Data Management, Machine Learning, Natural Language Processing, Design of Algorithms
Birla Institute of Technology and Science (BITS) Sept 2014 - Sept 2018
Bachelor of Engineering, Computer Science (Hons.)
SKILL SET
Programming Languages: Python, R, Java, C++, Scala, HTML, CSS, PHP, Objective-C, Swift, JavaScript, Node.js
Big Data: Hadoop, Hive, Spark, Kafka, Logstash, MapReduce
Data Science Libraries: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, PyTorch, OpenCV
Machine/Deep Learning: Scikit-learn, Microsoft Cognitive Toolkit, Keras, Firebase ML Kit, TensorFlow, PyTorch,
Apache SINGA, Azure ML Studio
Cloud: Azure ML
SQL Databases: Oracle, MySQL, PostgreSQL
NoSQL Databases: MongoDB, Cassandra, Couchbase
Tools: PyCharm, Xcode, Eclipse, IntelliJ IDEA, Git, Maven, Gradle, PyBuilder, AppCode,
Jenkins, Docker, Kubernetes
Data Visualization: MS Excel, Power BI, Tableau, R Markdown, Elasticsearch, Kibana
Algorithms: Linear/Logistic Regression, SVM, Ensemble Tree, Random Forest, Clustering, Gradient
Boosted Trees, Graph Theory
Interpersonal Skills: Analytical, Multitasking, Critical Thinking, Problem Solving, Time Management,
Continuous Learning
EXPERIENCE
Projects Intern
Delphi Consulting, Dubai, United Arab Emirates Jan 2018 - Jul 2018
Implemented Azure Active Directory with AD Migration
Migrated applications to Azure Cloud and built VMs in Azure using Azure Resource Manager templates to improve security for various clients
Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environments
Developed conceptual solutions and created proofs of concept to demonstrate the viability of solutions
Implemented end-to-end data solutions (storage, integration, processing, visualization) in Azure
Implemented ETL and data movement solutions using Azure Data Factory
Developed and maintained multiple Power BI dashboards/reports
Intern
Elements WGD, Dubai, United Arab Emirates Jun 2016 - Aug 2016
Assisted in establishing and enforcing standards to ease automation of the build process and the development cycle.
Carried out root cause analysis of problems reported by users and provided resolutions.
Enabled automated monitoring for one of the systems, “Element Service Tracker”, to proactively monitor and improve application availability.
Automated report generation using SQL per business users’ requirements, saving them time and supporting timely decisions.
ACADEMIC PROJECTS
Emotion Detection of Reddit posts
Advisor: Prof. Sanda Harabagiu
Scraped data from Reddit posts and applied NLP to derive insights about the corresponding Reddit page.
Developed classifiers (LSTM neural networks) with a variety of featurization methods (word embeddings, bag-of-words) and GridSearchCV hyperparameter tuning to identify sentiment with an F1 score of 0.81.
Developed intuitive dashboards and a reporting module with various reports using Python's Matplotlib library
Sentiment Analysis of COVID-19 tweets
Advisor: Prof. Latifur Khan
Crawled data from Twitter using the big data tools Hadoop and Apache Spark, explored semantic features (function words, readability) and textual features (word embeddings), and developed classifiers (VADER algorithm) to achieve an F1 score of 0.90.
Developed interactive dashboards using the analytics and visualization tool Kibana to visualize and analyze the sentiment analysis output
Malaria Cells Detection using Machine Learning
Advisor: Prof. Anjum Chida
Developed a novel KNN classifier for detecting uninfected cell images based on image arrays, achieving a high F1 score (0.96) with a highly imbalanced training set.
Explored models like Decision Tree, Naïve Bayes and KNN to identify the best model for classification.
Developed comprehensive dashboards and reports using the visualization tool Tableau to visualize and analyze the classification output
Titanic: Machine Learning from Disaster [Paper]
Investigated the influence of various features in the Titanic dataset using principal component analysis in Python
Identified the top features for correct prediction, improving prediction ability with each iteration and achieving an accuracy of 90%
Developed dashboards using the visualization tool Tableau to visualize and analyze the passenger survival status output
Stock Market Prediction
Advisor: Prof. Anurag Nagar
Developed a predictive model that leverages past stock market data to create a client portfolio consisting of stocks worth investing in.
Validated the Monte Carlo method and a random seed generator method for estimating future stock market prices
Created an intelligent graphical reporting module using the ggplot package in R and analyzed the generated output.
Search Engine for Cricket
Advisor: Prof. Sanda Harabagiu
Crawled website data using Apache Solr and Nutch, created document indexes, performed clustering using machine learning algorithms such as K-means and agglomerative clustering, and implemented query expansion in Java (Rocchio algorithm) to create a cricket-specific search engine
Displayed the results in a GUI built with Java and RESTful APIs, with a Flask server to store all the data
ACHIEVEMENTS
Certifications: Developing Data Products, Statistical Inference, Regression Models, Getting and Cleaning Data, Practical Machine Learning, R Programming, Exploratory Data Analysis, The Data Scientist’s Toolbox, Reproducible Research
Submitted a paper to UTD Computer Science department on analysis of Malware Detection using Data Mining approaches [Paper]