PROFESSIONAL QUALITIES
Experience in Big Data analytics using Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, YARN, Spark, Kafka, ZooKeeper, and Flume.
Strong knowledge of object-oriented programming and application development using Microsoft VB.NET, C#, and Java.
Designed and developed a Windows application that lets both administrators and students track complete details of their academic and non-academic progress.
Extensive experience writing MapReduce jobs on the Hadoop ecosystem, including major components such as Hive, Pig, Sqoop, Spark, and HBase.
Solid understanding of the Hadoop Distributed File System (HDFS), Spark in-memory computing, YARN, MapReduce, and Hadoop infrastructure.
Strong understanding of data warehousing concepts, recommendation engines, Spark, and Kafka.
Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Good knowledge of installing, configuring, and testing Hadoop ecosystem components.
Experience in installing, configuring and using Hadoop, HDFS, Hive, Pig, HBase, Sqoop and Flume.
Good knowledge of and hands-on experience with Cassandra, Flume, and Spark on YARN.
Migrated complex MapReduce programs into Spark RDD transformations and actions (see the sketch below).
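A minimal Scala sketch of this kind of migration, using a classic word-count job as a stand-in for the original MapReduce programs (input and output paths are illustrative assumptions):

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountMigration {
      def main(args: Array[String]): Unit = {
        // SparkContext replaces the MapReduce driver; paths are illustrative.
        val sc = new SparkContext(new SparkConf().setAppName("WordCountMigration"))

        val counts = sc.textFile("hdfs:///data/input")
          .flatMap(_.split("\\s+"))   // mapper: emit one token per word
          .map(word => (word, 1))     // mapper: emit (word, 1) pairs
          .reduceByKey(_ + _)         // reducer: sum counts per key

        counts.saveAsTextFile("hdfs:///data/wordcount_output")
        sc.stop()
      }
    }

The mapper and reducer collapse into pair-RDD transformations (map, reduceByKey), with the shuffle handled by Spark rather than hand-written MapReduce classes.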
Technical Skills
Big Data Ecosystems: Spark, Hadoop, Hive, Sqoop, MapReduce, HDFS, Pig, HBase
Programming Languages: C#, Scala, Python, Java
Scripting and Markup Languages: PHP, XML, HTML, PySpark
Databases: SQL Server, Oracle, PostgreSQL, HBase
Tools: SQL Server 2014, MS Visual Studio, IntelliJ IDEA
Platform: Linux, Windows
Professional Experience
VP Tech., VA, February 2016 to December 2016
Big Data Analytics
Involved in requirement gathering and business analysis, and translated business requirements into technical designs on Hadoop and Big Data.
Successfully exported large data sets from the RDBMS into the Hive data warehouse using Sqoop, making them available to HiveQL.
Major contributor to the project, involved in requirement gathering, design, development, and implementation, i.e. the complete SDLC.
Worked as a developer on an efficient mid-level team, helped build better solutions, and contributed to architecting solutions.
Collaborated with the data analysis team to analyze data sets and identify their complexity.
Used Pig as an ETL tool for transformations, joins, and pre-aggregations before storing data in HDFS, and developed MapReduce programs for parsing data and loading it into HDFS; a sketch of this flow follows.
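A hedged Spark/Scala stand-in for this join-then-pre-aggregate ETL flow (not the original Pig script; the data sets, paths, and columns such as customer_id, customer_region, txn_date, and amount are assumptions):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object EtlPreAggregate {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("EtlPreAggregate").getOrCreate()
        import spark.implicits._

        // Hypothetical raw inputs landed from the source systems.
        val transactions = spark.read.option("header", "true").csv("/staging/transactions")
        val customers    = spark.read.option("header", "true").csv("/staging/customers")

        // Join and pre-aggregate before persisting to HDFS.
        val daily = transactions
          .join(customers, Seq("customer_id"))
          .groupBy($"customer_region", $"txn_date")
          .agg(sum($"amount".cast("double")).as("daily_amount"))

        daily.write.csv("hdfs:///warehouse/daily_amounts")
        spark.stop()
      }
    }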
Projects
Revenue per product for a given month
Technologies: Sqoop, Spark, Scala, HDFS
Used the orders, order_items, and products data sets to compute revenue per product for a given month.
Filtered orders falling in the month passed as an argument, joined the filtered orders with order_items to get order_item details for that month, and derived the revenue for each product_id, as sketched below.
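A minimal Spark/Scala sketch of this pipeline. The paths, date format, and column names (order_id, order_date, order_item_order_id, order_item_product_id, order_item_subtotal) are assumptions for illustration; the join against products for product names is omitted:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object RevenuePerProduct {
      def main(args: Array[String]): Unit = {
        val month = args(0) // e.g. "2014-01"; format is an assumption
        val spark = SparkSession.builder.appName("RevenuePerProduct").getOrCreate()
        import spark.implicits._

        val orders     = spark.read.option("header", "true").csv("/data/orders")
        val orderItems = spark.read.option("header", "true").csv("/data/order_items")

        // Keep only orders that fall in the month passed as an argument.
        val monthlyOrders = orders.filter(substring($"order_date", 1, 7) === month)

        // Join with order_items and sum revenue per product.
        val revenue = monthlyOrders
          .join(orderItems, $"order_id" === $"order_item_order_id")
          .groupBy($"order_item_product_id")
          .agg(sum($"order_item_subtotal".cast("double")).as("revenue"))

        revenue.write.csv(s"/output/revenue_per_product/$month")
        spark.stop()
      }
    }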
US-Domestic Flights Analysis
Technologies: Sqoop, Hive, HDFS
Set up a VMware Workstation environment and participated in gathering and analyzing a data set of US domestic flights.
Implemented a table using HiveQL with columns such as flight date, airline ID, flight number, origin and destination airports, departure time and delay in minutes, arrival time and delay in minutes, time in the air, and distance in miles.
Ran MapReduce jobs to aggregate the maximum departure delay for each originating airport, the average arrival delay by flight, and the minimum arrival delay for all origin-destination airport combinations; a sketch of these aggregations follows.
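A minimal sketch of these aggregations expressed as Spark SQL over the Hive table (a stand-in for the original Hive/MapReduce jobs; the table name flights and column names are assumptions):

    import org.apache.spark.sql.SparkSession

    object FlightDelays {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .appName("FlightDelays")
          .enableHiveSupport() // read the table created with HiveQL
          .getOrCreate()

        // Maximum departure delay for each originating airport.
        spark.sql(
          """SELECT origin, MAX(dep_delay_minutes) AS max_dep_delay
            |FROM flights GROUP BY origin""".stripMargin).show()

        // Average arrival delay by flight (airline + flight number).
        spark.sql(
          """SELECT airline_id, flight_num, AVG(arr_delay_minutes) AS avg_arr_delay
            |FROM flights GROUP BY airline_id, flight_num""".stripMargin).show()

        // Minimum arrival delay per origin-destination combination.
        spark.sql(
          """SELECT origin, dest, MIN(arr_delay_minutes) AS min_arr_delay
            |FROM flights GROUP BY origin, dest""".stripMargin).show()

        spark.stop()
      }
    }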
New York Taxi Data Analysis
Technologies: Hive, Pig, HDFS
Used Hive to quantify total pick-ups and drop-offs by time of day and location. A second analysis, also executed in Hive, computed the total pick-ups and drop-offs per hour for a day by location; the final output gives the total pick-up or drop-off count for every hour of a day by location (see the sketch below).
Analyzed the driver with the most distance travelled, the driver with the most fare collected, the driver with the most time travelled, the most frequent drop-off location, and the most efficient driver based on distance and time.
Determined the average revenue per hour, including both gross revenue and net revenue, using Pig.
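A minimal Spark/Scala sketch of the hourly pick-up count by location (a stand-in for the original Hive query; the path and column names pickup_datetime and pickup_location are assumptions):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object TaxiPickupsByHour {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("TaxiPickupsByHour").getOrCreate()
        import spark.implicits._

        val trips = spark.read.option("header", "true").csv("/data/nyc_taxi")

        // Total pick-ups for every hour of the day, by location.
        val pickupsByHour = trips
          .withColumn("pickup_hour", hour(to_timestamp($"pickup_datetime")))
          .groupBy($"pickup_location", $"pickup_hour")
          .agg(count("*").as("pickups"))
          .orderBy($"pickup_location", $"pickup_hour")

        pickupsByHour.show()
        spark.stop()
      }
    }

The drop-off counts follow the same pattern with the drop-off timestamp and location columns swapped in.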
Employee Detail Analysis
Technologies: MySQL, Sqoop, Pig, HDFS
Created a table in MySQL and inserted the employee data set.
Imported and exported data between the RDBMS and HDFS using Sqoop.
Implemented aggregation functions in Pig scripts to count employees by department and compute total and average salary by department, as sketched below.
Converted files from CSV to the Parquet and Avro file formats.
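A hedged Spark/Scala stand-in for these aggregations and the format conversion (not the original Pig script; the path and the department and salary column names are assumptions):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object EmployeeAnalysis {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("EmployeeAnalysis").getOrCreate()
        import spark.implicits._

        val employees = spark.read.option("header", "true").csv("/data/employees")

        // Employee count, total salary, and average salary per department.
        val byDept = employees
          .groupBy($"department")
          .agg(
            count("*").as("employee_count"),
            sum($"salary".cast("double")).as("total_salary"),
            avg($"salary".cast("double")).as("avg_salary"))

        byDept.show()

        // Convert the CSV input to Parquet and Avro formats.
        employees.write.parquet("/data/employees_parquet")
        employees.write.format("avro").save("/data/employees_avro") // spark-avro package assumed

        spark.stop()
      }
    }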
Education and Training
Master's in Computer Information and Systems (December 2016)
California University of Management and Science, VA GPA: 3.81
Bachelor's in Electronics and Communication Engineering (September 2012)
Rajasthan Technical University, India GPA: 3.18
Big Data training: Hadoop Essentials and Fundamentals, Edureka (February 2016)