********@*****.*** Hadoop/Spark Developer +1-832-***-****
Objective
Seeking a position as a Hadoop Developer and Administrator that makes use of my exceptional knowledge of Hadoop components, SQL, Python, and Spark.
Professional Summary
Hadoop Ecosystem:
Hands-on experience with Big Data Hadoop ecosystem tools like HDFS, MapReduce, YARN, Hive, Pig, HBase, Impala, Oozie, Spark, and Kylin for ingestion, storage, querying, processing, and analysis of data.
Hands-on experience with Cloudera and Hortonworks Hadoop distributions.
Hands-on experience developing applications using Spark Core and good knowledge of Spark Streaming.
Performance tuning in Hive and Impala using methods including, but not limited to, dynamic partitioning, bucketing, indexing, file compression, vectorization, and cost-based optimization.
Hands-on experience using the data ingestion tools Kafka, Flume, and Sqoop.
Hands-on experience handling different file formats like JSON, Avro, ORC, and Parquet.
Hands-on experience using Apache Drill for low-latency, sub-second queries.
Integrated Drill and Impala with Tableau using JDBC for data visualization.
Experience analyzing data in NoSQL databases like HBase and MongoDB.
Hands-on experience with connecting Tableau to different data sources and creating dashboards and worksheets.
Java & Other:
Hands-on programming experience in C and Java.
Hands-on experience with UNIX commands, shell scripting, and setting up cron jobs.
Experience in software configuration management using Git.
Experience across the full SDLC: design, development, and deployment of high-performance, scalable, distributed applications using Agile Scrum and waterfall methodologies.
Technical Skills
Hadoop Components
HDFS, Hue, MapReduce, Pig, Hive, HBase, Sqoop, Impala, Oozie, ZooKeeper, Flume, Kafka, YARN, and Cloudera Manager.
Spark Components
Apache Spark, DataFrames, Spark SQL, Spark, YARN, Pair RDDs.
Databases
Microsoft SQL Server, MySQL, Oracle.
Programming Languages
C, C++, Java, ASP.NET.
Web Servers
Windows Server 2005/2008 and Apache Tomcat.
IDE
Eclipse, PyCharm.
OS/Platforms
Windows, Linux (All major distributions), Unix.
NoSQL Databases
HBase, MongoDB, and Cassandra.
Educational Qualifications
Master's in Software Engineering 2015-2017
University of Houston-Clear Lake, Houston, TX
Bachelor’s in Computer Science and Engineering 2010-2014
Jawaharlal Nehru Technological University, Hyderabad, India
Professional Experience
Data Analytics
Role: Hadoop Developer. Jan 2014 – July 2015
Description: Project to predict customers' buying habits and display ads on an online store. This was a proof of concept developed to determine whether buying habits can be extracted without adding any monitoring scripts, based solely on the consumer's ad interactions and social interactions. All data was gathered from the social networks Twitter and Facebook to observe how vocal consumers are about their purchases.
Responsibilities:
Wrote design documents and specifications for near real-time data analytics using Hadoop and HBase.
Installed Cloudera Manager on the clusters.
Used a 60-node cluster with the Cloudera Hadoop distribution on Amazon EC2.
Developed ad-click-based data analytics for keyword analysis and insights.
Crawled public Facebook posts and tweets.
Wrote MapReduce jobs with the Data Science team to analyze this data.
Converted the output to structured data and imported it into Spotfire with the analytics team.
Defined problems to identify the right data and analyzed results to make room for a new project.
Used TIBCO Spotfire with an in-house custom application to perform and generate analytics.
Environment: Hadoop, HBase, HDFS, MapReduce, Java, Spotfire, Cloudera Manager, Amazon EC2
Academic Projects
Master’s Capstone Project Jan 2017 – May 2017
Name: Analysis on Restaurants Data
Description: Our system analyzes a historical dataset to determine how a business performs on different days. Future owners can use this information to improve their business during the rush hours of the day. Our analysis helps future business owners find the best location for better profits. Beyond regular services, owners can also determine which extra offers to provide to customers to improve their business.
Responsibilities:
Took the business data and reviews from Yelp.
Extracted the data from JSON files using PyCharm.
Responsible for designing and implementing the end-to-end data pipeline for this project.
Queried the files as required, converted the results into JSON files, and used them for visual analytics.
Created worksheets and dashboards in Tableau for data visualization and analysis.
Fortune Cookie Server Aug 2016 – Dec 2016
Description: A virtual project, written in C with multi-threading, that accepts requests from multiple clients and generates fortune cookies for one client at a time. It generates a different cookie for each client on its first request and supports fortune cookie encryption and decryption using symmetric-key encryption.
Responsibilities:
Designed and implemented a test automation framework using object-oriented Python for test execution, MySQL for accepting test requests, and SOAP to propagate tests to different systems.
Developed the front end for test planning, execution, and result display.
Created a framework in C for cookie server testing that calls different server driver APIs written in C, executes them, and generates a report.
User Maintenance Tool Jan 2016 – May 2016
Description: The User Maintenance Tool allows users to edit user access profiles and maintain user roles.
Responsibilities:
Helped document the detailed design of the application.
Developed the application using the Struts MVC pattern.
Worked with Java, AJAX, JSP, JavaScript, jQuery, and CSS.
Used CVS for version control and Maven as the build tool.
Deployed the application on Apache Tomcat Server.
Helped write a JUnit test suite for unit testing the application.