
Data Engineer

Location:
Mohali, Punjab, India
Posted:
July 01, 2019


Resume:

Prasanth

Email: ac9qla@r.postjobfree.com Contact: 91-703*******

Professional Summary:

4 years of experience in the IT industry in Big Data using Hadoop, Hive, Pig, Sqoop, Oozie, Scala, Apache Spark, and Spark MLlib

Good knowledge of and experience with Scala, Apache Spark, ZooKeeper, and HBase

Good knowledge of the Hadoop ecosystem and of HDFS, Hadoop, and Spark architectures

Good experience using the Hortonworks and Cloudera distributions

Expertise in working with the Spark framework using Spark SQL and Spark Streaming

Experience building a path from Kafka to Spark Streaming using Scala (see the sketch at the end of this summary)

Prepared and processed numerous customer input files; parsed and reformatted the data to meet product requirements

Experience in manipulating/analyzing large datasets and finding patterns and insights within structured data

Good understanding of the production/application support life cycle; strong analytical and programming skills

Experience writing Pig scripts to access HDFS data in Hadoop systems

Experience writing Hive reports and scheduling jobs with Oozie

Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop

Experience analyzing data using the K-Means algorithm with Spark MLlib

Proficient in technologies such as SQL, PL/SQL, HiveQL, HBase, and Spark SQL

Good experience working with Oracle Database

Experience implementing Oozie workflows

Hands-on experience with tools such as VPN clients, PuTTY, WinSCP, and VNC Viewer

Experience working with UNIX commands

Knowledge of and experience with the complete installation of JDK 1.6.0, HDFS, Pig, Hive, and IntelliJ

Knowledge of AWS architecture and services

Good experience with Python

Good experience with OOP concepts and functional programming

Good knowledge of and experience with other utilities: TOAD, SQL*Loader, SQL*Plus
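A minimal Scala sketch of the Kafka-to-Spark Streaming path referenced above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and batch interval are illustrative placeholders, not details from the projects below:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToSparkStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToSparkStreaming")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10s micro-batches (illustrative)

    // Kafka consumer settings; broker, group id, and topic are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "demo-group",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("input-topic"), kafkaParams))

    // Count the records in each micro-batch and print the result on the driver
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}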

Experience Summary:

Currently working as a Senior Engineer at Emerson Information Technology Solutions, Mohali, since June 2015.

Technical Proficiency:

Big Data Ecosystems: Hadoop, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Scala, Spark

ERP Tool : Oracle Applications 11i/R12

Database : Oracle 9i, 10g, 11g

Languages : Scala, SQL, PL/SQL, Java SE, C

Tools : TOAD, PuTTY, SQL*Plus, SQL*Loader, Automic (UC4), HPSDM

GUI Tools : Developer 2000, XML Publisher, Web Console

Operating Systems : Windows NT/2000 Server/XP and Linux

Education:

M.Tech (Software Engineering) from Gitam University, 2014.

Project 2.

Company : Emerson Automation Solutions, Mohali

Project : PF EDM (Process Factory Enterprise Data Model)

Duration : Feb 2017 to date

Role : Data Engineer

Description:

The purpose of the project is to analyze the effectiveness and validity of controls, to store the terabytes of log information generated by the source providers as part of the analysis, and to extract meaningful information from it. The solution is based on the open-source Big Data software Hadoop. The data is stored in the Hadoop file system and processed using Apache Spark jobs, which in turn involve ingesting the raw data, processing it to obtain controls and redesign/change-history information, extracting various reports from the controls history, and exporting the information for further processing.
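As a hedged illustration of the processing described above, a skeleton Spark job in Scala might look like the following; all paths, column names, and the specific aggregation are hypothetical, not the project's actual code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, max}

object ControlsHistoryJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PF-EDM-ControlsHistory")
      .getOrCreate()

    // Read raw source-provider logs from HDFS; path and schema are hypothetical
    val rawLogs = spark.read.json("hdfs:///data/pfedm/raw/controls_logs/")

    // Derive a simple change-history summary per control
    val history = rawLogs
      .groupBy("control_id")
      .agg(count("*").as("change_count"), max("event_time").as("last_changed"))

    // Export the report for further downstream processing
    history.write.mode("overwrite")
      .parquet("hdfs:///data/pfedm/reports/controls_history/")

    spark.stop()
  }
}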

Roles and Responsibilities:

Involved in the design and development of technical specifications using Hadoop technologies.

Involved in moving data generated by various sources to HDFS for further processing.

Responsible for building scalable distributed data solutions using Hadoop.

Developed Spark scripts using the Scala shell as per requirements.

Prepared Hive reports for end users' analysis.

Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.

Involved in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, efficient joins, transformations, and other optimizations during the ingestion process itself (see the sketch after this list).

Involved in creating tables, partitioning and bucketing tables, and creating UDFs in Hive.

Monitored Hadoop scripts that take input from HDFS and load the data into Hive.

Loaded data into Hive tables using Sqoop as required.

Wrote Pig scripts to generate the required data.

Tested the data/results using ML algorithms.

Created PL/SQL packages to generate the required data and moved it to HDFS for further processing.
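A short Scala sketch of the partitioning, bucketing, and broadcast-join techniques mentioned in the list above; the table name, schema, bucket count, and paths are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object IngestionTuning {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedIngestion")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical partitioned and bucketed Hive table for the large side
    spark.sql(
      """CREATE TABLE IF NOT EXISTS events (
        |  event_id STRING, user_id STRING, amount DOUBLE
        |) PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    val events = spark.table("events")
    val users  = spark.read.parquet("hdfs:///data/dim/users/") // small dimension table

    // Broadcast the small side so the join avoids shuffling the large table
    val enriched = events.join(broadcast(users), Seq("user_id"))

    enriched.write.mode("overwrite")
      .partitionBy("event_date")
      .parquet("hdfs:///data/out/enriched_events/")

    spark.stop()
  }
}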

Project 1.

Client : Emerson Process Management, USA

Project Role : Hadoop Developer

Duration : June 2015 to Jan 2017

Designation : Technical Analyst

Description:

Maintaining customer member details and reward-point transactions is very difficult in terms of storage and processing. The member loyalty management system replaces the existing reward management system, which was developed as a web service provider with the help of database sharing. The aim of this system is to reduce the response time of the web service. The solution is based on the open-source Big Data software Hadoop.

Responsibilities:

Installed the Hadoop, Hive, Spark, and Sqoop applications.

HDFS support and maintenance, including adding/removing nodes and data rebalancing.

Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, efficient joins, transformations, and other optimizations during the ingestion process itself.

Involved in developing Pig scripts.

Involved in developing Hive reports.

Implemented partitioning, dynamic partitions, and buckets in Hive.

Resolved performance issues in Hive and Pig jobs by analyzing joins, grouping, and aggregation functions.

Built the physical data model for customer review and approval, and constructed the registration database using Oracle 9i on a Windows platform.

Integrated multiple logical data models into a single data model

Analyzed data using the K-Means algorithm with Spark MLlib (see the sketch after this list).

Created and implemented ER models and dimensional models

Produced documentation per company standards and the SDLC.

Responsible for loading data files from external sources such as Oracle.
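A minimal Scala sketch of K-Means clustering with Spark MLlib as referenced above; the input path, feature columns, and k = 5 are illustrative assumptions, not values from the project:

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object MemberSegmentation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KMeansSegmentation")
      .getOrCreate()

    // Hypothetical member features: reward points and transaction count
    val members = spark.read.parquet("hdfs:///data/loyalty/member_features/")

    // MLlib expects the features in a single vector column
    val assembler = new VectorAssembler()
      .setInputCols(Array("reward_points", "txn_count"))
      .setOutputCol("features")
    val dataset = assembler.transform(members)

    // k = 5 is an arbitrary illustrative choice, not a tuned value
    val model = new KMeans().setK(5).setSeed(42L).fit(dataset)
    model.transform(dataset).select("member_id", "prediction").show(10)

    spark.stop()
  }
}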


