Data Engineer

Location:

Tempe, AZ

Salary:

100000

Posted:

March 11, 2019

Resume:

Venkata Ravi Teja Reddy Yanamala (RAVI)

973-***-**** • ********@***.*** • linkedin.com/in/ravitejareddyyv/ • github.com/nameisrtr

SUMMARY

M.S. Computer Engineer with experience in building machine learning models and large-scale data processing, seeking a full-time position starting June 10, 2019.

EDUCATION

M.S., Computer Engineering; Graduating May 2019

Arizona State University, Tempe, AZ 3.81 GPA

B.E., Electronics and Communication Engineering; Graduated May 2017

Birla Institute of Technology, Mesra, Ranchi 7.66 GPA

RELEVANT COURSEWORK

Foundations of Algorithms, Statistical Machine Learning, Artificial Intelligence, Deep Learning Media Processing & Understanding, Data Mining, Distributed Database Systems, Information Theory.

TECHNICAL SKILLS

Programming: Python (past experience in Java, MATLAB, C)

Machine Learning: Regression, Naïve Bayes, SVM, Random Forests, K-Means, Feature Engineering, Dimensionality Reduction

Big Data: Hadoop, Spark, MapReduce

Packages: Sklearn, PyTorch, FastAi, NumPy, Pandas, Matplotlib

Others: Git, MySQL, AWS

PROFESSIONAL EXPERIENCE

Data Collection Intern at CYR3CON Dec 2018 – ongoing

• Developed quality customized crawlers and parsers for data collection from the dark web.

• Maintained and cleaned the data in a MongoDB database.

• Derived insights from the data and advised clients on improvements to their security.

• Responsible for troubleshooting script-related issues in the data pipeline.

ACADEMIC PROJECTS

Predicting User Retention on Stack Overflow Fall 2018

• Built features that best capture the signals that a user is likely to leave the forum, and compared the performance of SVM and Decision Trees using these features. Decision Trees had the highest accuracy, 71.4%.

• Predicted the importance of question-answer pairs in terms of the information they add to the Stack Overflow website. Of the features constructed, the length of the highest-scoring answer was the most dominant.

Geospatial Hotspot Detection Spring 2018

• Deployed a 3-node Hadoop cluster on AWS and executed Spatial Range Join and KNN queries using the GeoSpark JTS library.

• Developed user-defined functions in SparkSQL for Range and Distance queries.

• Computed a spatial statistic (Getis-Ord statistic) with SparkSQL on spatio-temporal data (NYC yellow cabs) to locate hotspots so that resources can be diverted as required.

Life-long planning for path finding problem Fall 2018

• Improved path finding using lifelong A* search and D* Lite algorithms, assuming that Pacman can observe only the immediately adjacent cells.

• Compared the performance of this improvement against a baseline A* approach that simply re-plans whenever the current plan would lead to a collision.

Movie recommendations using MovieLens dataset Fall 2017

• Recommended additional movies to users with models based on SVD, LDA, and tensor decomposition.

• Improved the accuracy of recommendations with probabilistic relevance feedback.

• Built a similar-movie search using a KNN-based approach and improved it with Locality Sensitive Hashing (LSH), which creates an in-memory index structure containing the given set of movies for efficient retrieval.

• Implemented a movie classifier using KNN-based, decision tree-based, and multi-class SVM-based classification algorithms.
