
Computer Science Data

Phoenix, Arizona, United States
February 20, 2018




Email: GitHub: Naveen481

Phone: +1-346-***-**** Medium: naveenbalusu

LinkedIn: naveen481 Kaggle: naveen481

OBJECTIVE: A recent Computer Science graduate and skilled Software Engineer with a strong analytical and programming background, actively seeking full-time opportunities in Machine Learning and Big Data.

EDUCATION:

Master’s in Computer Science December 2017

Arizona State University, Tempe, USA GPA: 3.5/4.0

Bachelor’s in Computer Science May 2015

Vellore Institute of Technology, Vellore, India GPA: 3.6/4.0

WORK EXPERIENCE:

Big Data Intern, Clairvoyant LLC, Chandler, Arizona September 2017 – Present

• Built an Intrusion Detection System (IDS) that identifies intrusions on sensitive information stored on Big Data clusters by detecting anomalies in various logs using unsupervised learning.

• The IDS will be demonstrated at Strata Data Conference 2018. Published blog posts about this research on Medium.

• Analyzed the effects of factors such as connection type and security protocol on the audit logs.

• Built various end-to-end ETL pipelines using Sqoop, Hive, Impala, Airflow, and RabbitMQ.

• Built test clusters using EC2 instances.

Research Assistant, ASU, Tempe, Arizona September 2016 – August 2017

• Worked as a Python Developer in the “Cognitive Information Processing Systems Laboratory”.

• Developed a data visualization tool called “Looking Glass” to track the diffusion of online social movements.

• Built Natural Language Processing pipelines, parallelized various programs using Spark, created web-based visualizations using D3.js, and performed database administration (PostgreSQL).

• Scraped over 150 GB of social media and news data using various APIs and scrapers to train our models.

RELEVANT COURSES: Statistical Machine Learning, Artificial Intelligence, Fundamentals of Statistical Learning, Data Mining, Distributed Database Systems, Semantic Web Mining, Foundations of Algorithms.

KAGGLE COMPETITION:

Toxic Comment Classifier January 2018 – Present

• Implemented a multi-label classifier using a Recurrent Neural Network that classifies toxic comments into six classes with an accuracy of 98.3%.

• Used Facebook's fastText word embeddings. Trained the model on a Google Compute Engine instance.

• Ranked in the top 13% on the public leaderboard at the time of submission.

PROJECTS:

Pattern Detection in News Articles May 2017 – August 2017

• Implemented a MapReduce program that extracts text with similar meaning across documents, and processed over 100,000 documents on a Spark cluster.

• Implemented co-reference resolution using Stanford CoreNLP in Java and extracted triplets using spaCy in Python.

• Co-clustered the generated stories with an accuracy of 90% using non-negative matrix factorization.

Game playing using Deep Reinforcement Learning August 2017 – November 2017

• Built a Neural Network using TensorFlow and OpenAI that played the Lunar Lander game and scored over 200 points.

• Implemented Pac-Man agents that performed well in the presence of ghosts using Particle Filtering, Q-Learning, Value Iteration, Expectimax, and alpha-beta minimax algorithms.

Geospatial Data Analysis using Spark February 2017 – May 2017

• Identified the top 50 taxi-pickup hotspots in the Manhattan area for January 2015 by analyzing geospatial data from the New York Taxi database on a five-node Spark cluster.

• Implemented a MapReduce program in Scala that uses the time-series data to count the pickups at each location every hour, then heap-sorts the counts to rank locations by number of pickups.

TECHNICAL SKILLS:

• Programming Languages: Python, Java, MATLAB, C++, HTML, CSS, JavaScript, Shell scripting.

• Machine Learning and Visualization: Scikit-Learn, TensorFlow, Pandas, SciPy, NumPy, NLTK, Matplotlib, Plotly.

• Big Data: Hadoop, Spark, Sqoop, Hive, Pig, Impala, Airflow, RabbitMQ, Kerberos, Ganglia.

• Databases & Other tools: MySQL, PostgreSQL, Cassandra, Apache Solr, Elasticsearch, Git.

VENKATA NAVEEN BALUSU
