Data Scientist

Location:
Wylie, TX
Posted:
December 13, 2020

Resume:

Surya Alla

972-***-**** *********@******.*** www.linkedin.com/in/surya-alla-800862190

EDUCATION

The University of Texas at Dallas Jan 2019 – Dec 2020

Master of Science, Computer Science (Specialization in Data Science)

Relevant Courses: Big Data Management, Machine Learning, Natural Language Processing, Design of Algorithms

Birla Institute of Science and Technology (BITS) Sept 2014 - Sept 2018

Bachelor of Engineering, Computer Science (Hons.)

SKILL SET

Programming Languages: Python, R, Java, C++, Scala, HTML, CSS, PHP, Objective-C, Swift, JavaScript, Node.js

Big Data: Hadoop, Hive, Spark, Kafka, Logstash, MapReduce

Data Science Libraries: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, PyTorch, OpenCV

Machine/Deep Learning: Scikit-learn, Microsoft Cognitive Toolkit, Keras, Firebase ML Kit, TensorFlow, PyTorch, Apache SINGA, Azure ML Studio

Cloud: Azure ML

SQL Databases: Oracle, MySQL, PostgreSQL

NoSQL Databases: MongoDB, Cassandra, Couchbase

Tools: PyCharm, Xcode, Eclipse, IntelliJ IDEA, Git, Maven, Gradle, PyBuilder, AppCode, Jenkins, Docker, Kubernetes

Data Visualization: MS Excel, Power BI, Tableau, R Markdown, Elasticsearch, Kibana

Algorithms: Linear/Logistic Regression, SVM, Ensemble Tree, Random Forest, Clustering, Gradient Boosted Trees, Graph Theory

Interpersonal Skills: Analytical, Multitasking, Critical Thinking, Problem Solving, Time Management, Continuous Learning

EXPERIENCE

Projects Intern

Delphi Consulting, Dubai, United Arab Emirates Jan 2018 - Jul 2018

Implemented Azure Active Directory with AD Migration

Migrated applications to Azure Cloud and built VMs in Azure using Azure Resource Manager templates to improve security for various clients

Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment

Developed conceptual solutions and created proofs of concept to demonstrate the viability of solutions

Implemented end-to-end data solutions (storage, integration, processing, visualization) in Azure

Implemented ETL and data movement solutions using Azure Data Factory

Developed and maintained multiple Power BI dashboards/reports

Intern

Elements WGD, Dubai, United Arab Emirates Jun 2016 - Aug 2016

Assisted in establishing and enforcing standards to ease automation of the build process and the development cycle.

Carried out root cause analysis of problems reported by users and provided resolutions.

Enabled automated monitoring for one of the systems, “Element Service Tracker”, to proactively monitor and improve application availability.

Automated report generation using SQL per business users’ requirements, which saved them time and supported timely decisions.

ACADEMIC PROJECTS

Emotion Detection of Reddit posts

Advisor: Prof. Sanda Harabagiu

Scraped data from Reddit posts and applied NLP to derive insights about the corresponding Reddit page.

Developed classifiers (LSTM neural networks) with a variety of featurization methods (word embeddings, bag-of-words), tuned with GridSearchCV, to identify sentiment with an F1 score of 0.81.

Developed intuitive dashboards and a reporting module with various reports using Python's Matplotlib library

Sentiment Analysis of COVID-19 tweets

Advisor: Prof. Latifur Khan

Crawled data from Twitter using the big data tools Hadoop and Apache Spark, explored semantic features (function words, readability) and textual features (word embeddings), and developed classifiers (VADER algorithm) to achieve an F1 score of 0.90.

Developed interactive dashboards using the analytics and visualization tool Kibana to visualize and analyze the sentiment analysis output

Malaria Cells Detection using Machine Learning

Advisor: Prof. Anjum Chida

Developed a novel classifier using KNN for detecting uninfected images based on image arrays, which achieved a high F1 score (0.96) with a highly imbalanced training set.

Explored models such as Decision Tree, Naïve Bayes, and KNN to identify the best model for classification.

Developed comprehensive dashboards and reports using the visualization tool Tableau to visualize and analyze the classification output

Titanic: Machine Learning from Disaster [Paper]

Investigated the influence of various features on the Titanic dataset using principal component analysis in Python

Identified the top features for correct prediction and improved predictive ability with each iteration, achieving an accuracy of 90%

Developed dashboards using the visualization tool Tableau to visualize and analyze the passenger survival status output

Stock Market Prediction

Advisor: Prof. Anurag Nagar

Developed a predictive model that leverages past stock market data to create a client portfolio of stocks that are worth investing in.

Validated the Monte Carlo method and a random seed generator method for projecting future stock market prices

Created an intelligent graphical reporting module using the ggplot package in R and analyzed the output generated.

Search Engine for Cricket

Advisor: Prof. Sanda Harabagiu

Crawled website data using Apache Solr and Nutch, created document indexes, performed clustering using machine learning algorithms such as K-means and agglomerative clustering, and implemented query expansion in Java (Rocchio algorithm) to create a cricket-specific search engine

Displayed the results using a GUI created with Java, RESTful APIs, and a Flask server to store all the data

ACHIEVEMENTS

Certifications: Developing Data Products, Statistical Inference, Regression Models, Getting and Cleaning Data, Practical Machine Learning, R Programming, Exploratory Data Analysis, The Data Scientist’s Toolbox, Reproducible Research

Submitted a paper to the UTD Computer Science department on the analysis of malware detection using data mining approaches [Paper]


