Data Project Engineer

Santa Clara, California, United States
January 28, 2017

Naga Venkateshwarlu Yadav Dokku

**** ********** **,***#*, Mountain View, CA 95051

818-***-**** • LinkedIn


DePaul University, Chicago, IL

Master of Science (Predictive Analytics) 12/2016

Indian School of Mines, Dhanbad, India

Integrated Master of Science (Mathematics & Computing) 05/2011


Data science background with data engineering skills; passionate about learning, innovation, analysis, and problem solving.


Wipro Technologies Pvt Ltd, Bangalore, India

Project Engineer 10/2011 – 12/2014

Work Summary:

Analyzed business rules for PLM data migration and configured various agile software components to meet data migration specifications. Defined and executed data cleansing on source data, including Extract-Transform-Load (ETL) methods and procedures.

Migrated the required data from RDBMS into HDFS using Sqoop, imported flat files in various formats into HDFS, and analyzed the imported data using Hadoop components.

Technologies: SQL, Hadoop MapReduce, Apache Sqoop

Key Academic Projects

Neural Networks, Machine Learning, and Natural Language Processing

1. Sentiment analysis of movie reviews with neural networks in Python, using the Keras deep learning library.

2. Predicting cab booking cancellations (Python, Machine Learning).

3. A data mining approach to predicting Yelp review ratings (Python, NLP).

Big Data

4. Comparing the performance of different AWS Hadoop cluster configurations, using ‘311’ service requests for the city of New York from 2010 to the present as sample data (Hadoop, MapReduce, Pig, Hive).

Data Mining and Analysis

5. Ensemble learners using the bagging method: a data mining approach to defining health professional shortage areas (R, Data Mining).

6. Forecasting the Gross Domestic Product using linear and ARIMA models (R, Data Analysis).

7. Performance evaluation of different classification and regression techniques (Python, Data Analysis).


Programming: Python, R, SQL (Familiarity: Scala, SAS)

Tools: Spyder, IPython Notebook/Jupyter, Spark Notebook, Zeppelin notebook (Familiarity: Git, Docker)

Cloud: AWS/EMR/EC2/S3 (also direct-Hadoop-EC2)

Big Data: Spark, Hadoop, Hive, Pig, Sqoop (Familiarity: Cloudera Search)

Deep Learning: Keras (CNN, RNN, LSTM, etc.), TensorFlow

SQL/NoSQL: Hive, Spark SQL (Familiarity: Cassandra, Elasticsearch)

Domain: Big Data, Data Mining, Data Analytics, Machine Learning, Natural Language Processing
