VENKATA NAVEEN BALUSU
Email: *****************@*****.*** GitHub: Naveen481
Phone: +1-346-***-**** Medium: naveenbalusu
LinkedIn: naveen481 Kaggle: naveen481
OBJECTIVE: A recent Computer Science graduate and skilled Software Engineer with a strong analytical and programming background, actively looking for full-time opportunities in the fields of Machine Learning and Big Data.
EDUCATION:
Master’s in Computer Science December 2017
Arizona State University, Tempe, USA GPA: 3.5/4.0
Bachelor’s in Computer Science May 2015
Vellore Institute of Technology, Vellore, India GPA: 3.6/4.0
WORK EXPERIENCE:
Big Data Intern, Clairvoyant LLC, Chandler, Arizona September 2017 – Present
• Built an Intrusion Detection System (IDS) that detects intrusions on sensitive information stored on Big Data clusters by detecting anomalies in various logs using unsupervised learning.
• The IDS will be demonstrated at Strata Data Conference 2018. Published blog posts about this research on Medium.
• Analyzed the effects of factors such as connection types and security protocols on the audit logs.
• Built various end-to-end ETL pipelines using Sqoop, Hive, Impala, Airflow, and RabbitMQ.
• Built test clusters using EC2 instances.
Research Assistant, ASU, Tempe, Arizona September 2016 – August 2017
• Worked as a Python Developer in “Cognitive Information Processing Systems Laboratory”.
• Developed a data visualization tool called “Looking Glass” to track the diffusion of online social movements.
• Built Natural Language Processing pipelines, parallelized various programs using Spark, created web-based visualizations using D3.js, and performed database administration (PostgreSQL).
• Scraped over 150GB of social media and news data using various APIs and scrapers to train our models.
RELEVANT COURSES: Statistical Machine Learning, Artificial Intelligence, Fundamentals of Statistical Learning, Data Mining, Distributed Database Systems, Semantic Web Mining, Foundations of Algorithms.
KAGGLE COMPETITION:
Toxic Comment Classifier January 2018 – Present
• Implemented a multi-label classifier using a Recurrent Neural Network that classifies toxic comments into six classes with an accuracy of 98.3%.
• Used Facebook's fastText word embeddings. Trained the model on a Google Compute Engine instance.
• Ranked in the top 13% on the public leaderboard at the time of submission.
PROJECTS:
Pattern Detection in News Articles May 2017 – August 2017
• Implemented a MapReduce program which extracts text with similar meaning in documents and processed over 100,000 documents using a Spark cluster.
• Implemented co-reference resolution using Stanford CoreNLP in Java and extracted triplets using spaCy in Python.
• Co-clustered the generated stories with an accuracy of 90% using non-negative matrix factorization.
Game playing using Deep Reinforcement Learning August 2017 – November 2017
• Built a neural network using TensorFlow and OpenAI that played the Lunar Lander game and scored over 200 points.
• Implemented Pac-Man agents that performed well in the presence of ghosts using particle filtering, Q-learning, value iteration, expectimax, and alpha-beta minimax algorithms.
Geospatial Data Analysis using Spark February 2017 – May 2017
• Identified the top 50 hotspots in Manhattan with the most taxi pickups in January 2015 by analyzing geospatial data from the New York Taxi database on a five-node Spark cluster.
• Implemented a MapReduce program in Scala that uses the time-series data to count the pickups at each location every hour, then heap-sorted the counts to rank locations by number of pickups.
TECHNICAL SKILLS:
• Programming Languages: Python, Java, MATLAB, C++, HTML, CSS, JavaScript, Shell scripting.
• Machine Learning and Visualization: scikit-learn, TensorFlow, pandas, SciPy, NumPy, NLTK, Matplotlib, Plotly.
• Big Data: Hadoop, Spark, Sqoop, Hive, Pig, Impala, Airflow, RabbitMQ, Kerberos, Ganglia.
• Databases & Other Tools: MySQL, PostgreSQL, Cassandra, Apache Solr, Elasticsearch, Git.