
Big Data Developer

Location:
San Jose, CA
Posted:
April 04, 2018


Overview

*+ years of experience designing and developing Hadoop and Spark applications using Java and related technologies.

Hands-on experience with Cloudera and MapR Hadoop distributions (HDFS, MapReduce, YARN, Apache Spark), NoSQL databases, SQL, Git, Amazon EMR, Glue, and Athena.

Currently working on the Comcast Advanced Advertising team as a Spark Developer, supporting requirements for analyzing ad-reach and program-ranking reports.

Hands-on experience handling multiple large datasets from different sources and joining them to produce analytic reports.

Experience in designing and developing ETL pipelines in Hadoop using Hive and Spark.

Worked on a new class of Oracle Cloud products leveraging Big Data, Hadoop, and Data Mining technologies.

Worked on designing and developing the real-time time-series analysis and contextualization module for Voice of Factory using Oracle IoT, Kafka, Spark Streaming, and Spark SQL.

Worked on designing and developing a contextualization algorithm that matches IoT machine data with the corresponding ERP information.

Built a NoSQL write layer to write time-series data from Spark to Oracle NoSQL.

Experience with continuous integration and deployment tools such as Jenkins and Hudson, and build tools such as Maven and Ant.

AWS Certified Solutions Architect – Associate (2018)

AWS Certified Cloud Practitioner (2018)

Experience with deep learning Python frameworks such as TensorFlow and Keras.

Trained as a Machine Learning Engineer through Udacity and Kaggle.

Experience with Python libraries such as Scikit-Learn, NumPy, Pandas, MatPlotLib, and SciPy.

Two years of experience teaching Databases, Algorithms, and Data Structures at the graduate level.

Technical Skills

Big Data Tools: Apache Spark, Apache Hadoop, Sqoop, Hive, NoSQL databases.

Languages: Python, Java, SQL, PL/SQL.

Databases: MySQL, SQL Server, Oracle DB.

Other technologies: Amazon EC2, Amazon S3, JSON, XML, Parquet, Avro, Git, Anaconda.

Python Libraries: Scikit-Learn, NumPy, Pandas, MatPlotLib, Jupyter, TensorFlow, Keras, PySpark, SciPy.

Machine Learning Models: PCA, Isomap, K-Means, K-NN, Linear Regression, Decision Trees, Random Forests, Reinforcement Learning, AdaBoost, SVM, RNN, CNN, Q-Learning, Gaussian NB, MDP, SGD, Gradient Boosting.

Certifications

Machine Learning Engineer from Udacity

AWS Certified Solutions Architect – Associate (2018)

AWS Certified Cloud Practitioner (2018)

edX – Microsoft Professional Program Certificate in Data Science.

Publications

1. Personalized QoS-based Ranking Approach for Cloud Service Selection (International Journal of Computer Applications, Vol. 127, No. 18).

http://www.ijcaonline.org/archives/volume127/number18/22832-201-***-****

Education

Udacity – Machine Learning Engineer, May 2017 – Dec 2017

Osmania University – Master's in Computer Science, CGPA 8.7, Oct 2013 – Aug 2015

Sri Venkateswara University – Bachelor's in Computer Science, CGPA 7.2, Aug 2008 – May 2012

Professional Experience

Employer: DWH Consulting Nov 2017 – Present

Client: Comcast Cable

Role: Apache Spark Developer

Location: Sunnyvale, CA

The Advanced Advertising division of Comcast works with multiple advertisers, programmers, and third-party vendors such as Nielsen, Experian, LiveRamp, and Adobe to provide TV-viewing and ad-reach analytics. It also provides user-segmentation data services.

My work involved the following tasks:

1. Creating ad-reach reports for different partners and third-party vendors based on Comcast's Set-Top Box (STB) data feed.

2. Creating customer-segmentation-based program-viewing rankings using STB data and audience-segmentation data.

3. Migrating on-premises Spark jobs to AWS EMR and AWS Glue.

Technology Components Involved:

Apache Spark, MapR Hadoop, Apache Airflow, Hive, AWS EMR, AWS Glue.

Employer: DWH Consulting Sep 2015 – Oct 2016

Client: Oracle Corporation, India.

Role: Big Data Developer

Location: Hyderabad/Bangalore, India

Voice of Factory (VoF) is the first of many pre-packaged applications from Oracle that leverage IoT, in-memory, Big Data, and Data Mining technologies to help manufacturers identify opportunities to improve yield and quality and to reduce manufacturing costs. VoF is designed to identify hidden patterns and correlations in yield and quality, predict yield, defects, and downtimes, perform rapid root-cause analysis, and analyze the impact of yield and quality deviations.

Users can perform what-and-why analysis on their data using VoF's correlation, clustering, genealogy, and tracing features, and can take actions to resolve issues or initiate audit workflows.

The fast-data/streaming-analytics component of VoF can handle millions of events per second, contextualize those events with ERP data, perform predictive analytics using the deployed machine learning models, and prescribe the next best action based on automated rules.

VoF is a SaaS on PaaS offering from Oracle that provides:

1. Consolidation and contextualization of information from multiple sources through Oracle IoT Cloud, Kafka, Spark Streaming, and Oracle NoSQL Database.

2. Yield and quality dashboards with multiple screens of correlation and genealogy charts that support what-and-why analysis. These dashboards are built with the Oracle JET frontend framework and Jersey REST APIs, with data sourced from Oracle DBaaS.

3. Insights from historical data using tools such as Oracle JET, Jersey REST APIs, and Oracle R with Data Mining tools.

4. Real-time predictive analytics using Spark Streaming, Spark SQL, and Oracle Data Mining tools.

Technology Components Involved:

Apache Spark, Cloudera Distributed Hadoop (5.5.1), Oracle NoSQL Database, Oracle 12c Database, Oracle IoT Cloud Service.

Osmania University Oct ’13 – Aug ’15

Role: Graduate Teaching Assistant

Location: Hyderabad, India

Provided academic support and instruction to students in two graduate-level courses: Database Systems and Algorithms.

Assisted the head faculty member with classroom instruction material, exams, assignments, and record keeping.

Responsible for grading assignments, class participation, and exams.

Wipro Technologies Pvt Ltd June ’12 – Oct ’13

Client – AstraZeneca

Role: Software Engineer

Location: Hyderabad, India

I was part of a team that developed the query backend for reporting tools.

My work involved:

Developing SQL queries using joins, grouping, aggregation, and nested subqueries.

Creating and feeding base tables from various sources.

Tuning query performance for optimal execution.

Adding new sources to feed reporting tools.

Technologies: Oracle 10g, SQL, PL/SQL, Java.

ML Project Experience May ’17 – Dec ‘17

Diabetic Retinopathy (DR) Detection

Built a binary-classification neural network on 10,000 retinal images to detect the presence of DR using TensorFlow and Python.

Achieved an accuracy of 85% using Convolutional Neural Networks.

Document Classification of News Articles

Developed a supervised learning model to classify a collection of text documents (the mini 20 Newsgroups dataset) into twenty topics using Apache Spark ML and Scikit-Learn.

Extracted features by tokenizing the text and building a TF-IDF-weighted bag-of-words representation.

Increased the F1 score to 91% using Support Vector Machines.
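The TF-IDF plus linear-SVM pipeline described above can be sketched with scikit-learn. The toy corpus and labels below are illustrative stand-ins for the mini 20 Newsgroups data, not the actual dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the mini 20 Newsgroups corpus (illustrative only).
docs = [
    "the team won the baseball game last night",
    "the pitcher threw a perfect inning",
    "the new gpu renders graphics much faster",
    "driver updates improved the graphics card performance",
]
labels = ["sports", "sports", "hardware", "hardware"]

# Tokenize the text, build a TF-IDF-weighted bag-of-words representation,
# then fit a linear Support Vector Machine on top of those features.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)

print(model.predict(["a perfect pitcher inning"])[0])
```

On the real dataset, the same pipeline would be fit on the twenty newsgroup labels and scored with F1.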

Customer Segmentation for Wholesale Distributors

Developed a predictive model using unsupervised learning methods to analyze patterns in purchase orders among existing customers of a wholesale food distributor.

Achieved a silhouette score of 0.42 using PCA dimensionality reduction and K-Means clustering in Scikit-Learn and Python.

The model assists in implementing A/B tests when changing the delivery scheme.

Created a biplot visualization showing the projection of the original features along the principal components using MatPlotLib.
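The PCA-plus-K-Means segmentation above can be sketched as follows. The synthetic spend data is an illustrative stand-in for the wholesale customers dataset, and the cluster count and random seeds are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in for the wholesale customers data (illustrative only):
# two purchasing profiles in a 6-feature spend space.
grocery_heavy = rng.normal(loc=[8, 8, 1, 1, 1, 1], scale=0.5, size=(50, 6))
fresh_heavy = rng.normal(loc=[1, 1, 8, 8, 1, 1], scale=0.5, size=(50, 6))
X = np.vstack([grocery_heavy, fresh_heavy])

# Reduce to 2 principal components, then cluster customers with K-Means.
X2 = PCA(n_components=2).fit_transform(X)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X2)

# Silhouette score measures cluster cohesion vs. separation (range -1..1).
score = silhouette_score(X2, clusters)
print(round(score, 2))
```

The resulting cluster assignments are what feed an A/B test: each segment can receive a different delivery scheme and be compared separately.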

Amazon Reviews Classification

Implemented a sentiment analysis model to predict the positive or negative sentiment of 20,000 Amazon reviews using Apache Spark ML, Spark SQL, and Scala.

Achieved an AUROC of 0.845, an error metric more robust to class imbalance, using Logistic Regression.
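The project above used Spark ML and Scala; as a minimal stand-in, the same idea of scoring a Logistic Regression by AUROC can be sketched in scikit-learn on synthetic imbalanced data (all values illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic imbalanced stand-in for review features (illustrative only):
# 90% negative class, positives shifted along both feature axes.
X_neg = rng.normal(0.0, 1.0, size=(900, 2))
X_pos = rng.normal(1.5, 1.0, size=(100, 2))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 900 + [1] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

clf = LogisticRegression().fit(X_tr, y_tr)
# AUROC scores the ranking of predicted probabilities, so it is less
# sensitive to the 9:1 class imbalance than plain accuracy would be.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```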

Finding Donors for Non-Profits

Developed a predictive model that classifies whether an individual earns more than a certain income, helping to determine the size of a donation request.

Increased accuracy to 84% with a Stochastic Gradient Descent classifier using Jupyter, Scikit-Learn, and Python.

Created a bar chart using MatPlotLib to compare the performance of AdaBoost, Gaussian NB, SVM, and Decision Tree models.

Power Prediction of Generation Plant

Implemented a supervised learning model to predict power output from sensor readings in a gas-fired power generation plant using Apache Spark ML, Spark SQL, and Scala.

Achieved an R² score of 0.92 using multiple regression techniques.

Smart Cab

Developed an AI driving agent by applying reinforcement learning to teach a smart cab to drive, using NumPy, Pandas, and Python.

Achieved a 100% safety level and a 90% reliability level over 300 trips using an optimized Q-Learning technique.
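The Q-Learning technique above can be illustrated in miniature. This toy corridor environment is a stand-in for the Udacity smart-cab simulator; the states, rewards, and hyperparameters are all assumptions:

```python
import random

random.seed(0)

# Minimal tabular Q-learning sketch on a 1-D corridor (illustrative only):
# the agent starts at cell 0 and is rewarded for reaching cell 4.
N_STATES, GOAL = 5, 4
ACTIONS = [1, -1]  # move right / move left
alpha, gamma, epsilon = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(300):  # training episodes
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection balances exploration/exploitation.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)   # walls clip movement
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update: bootstrap from the best next-state value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy policy moves right (+1) from every non-goal cell.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)
```

The real smart-cab agent works the same way, with a larger state space (traffic lights, oncoming traffic, waypoint) and rewards encoding the safety and reliability metrics.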

Image Classification - CIFAR 10

Trained a neural network on 50,000 images to classify objects in the CIFAR-10 dataset across ten categories.

Achieved a test loss below 0.001 with a six-layer CNN using TensorFlow, Keras, and Python.

QoS based Cloud Recommender System

Developed a recommendation system for cloud service providers in Java, considering quality-of-service parameters such as response time, throughput, and failure probability.

Trained on QoS data from 3,000 users across 10 cloud providers using KRCC ranking and the Pearson Correlation Coefficient.

References available on request.


