Prasanth
Email: ********@*****.*** Contact: 91-703*******
Professional Summary:
4 years of experience in the IT industry working on Big Data using Hadoop, Hive, Pig, Sqoop, Oozie, Scala, Apache Spark, and Spark MLlib.
Good knowledge of and experience with Scala, Apache Spark, ZooKeeper, and HBase.
Good knowledge of the Hadoop ecosystem, HDFS, and the Hadoop and Spark architectures.
Good experience using Hortonworks and Cloudera.
Expertise in working with the Spark framework using Spark SQL and Spark Streaming.
Experience building a data path from Kafka to Spark Streaming using Scala.
Prepared and processed numerous customer input files; parsed and reformatted the data to meet product requirements.
Experience in manipulating and analysing large datasets and finding patterns and insights within structured data.
Good understanding of the production/application support life cycle; strong analytical and programming skills.
Experience writing Pig scripts to access HDFS data in Hadoop systems.
Experience writing Hive reports and scheduling jobs with Oozie.
Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
Experience analyzing data using the K-Means algorithm with Spark MLlib.
Proficient in technologies such as SQL, PL/SQL, HiveQL, HBase, and Spark SQL.
Good experience working with Oracle Database.
Experience implementing Oozie workflows.
Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
Experience working with UNIX commands.
Knowledge of and experience with complete installation of JDK 1.6.0, HDFS, Pig, Hive, and IntelliJ.
Knowledge of AWS architecture and services.
Good experience with Python.
Good experience with object-oriented (OOP) and functional programming concepts.
Good knowledge of and experience with other utilities: TOAD, SQL*Loader, and SQL*Plus.
Experience Summary:
Currently working as a Senior Engineer at Emerson Information Technology Solutions, Mohali, from June 2015 to date.
Technical Proficiency:
Big Data Ecosystems : Hadoop, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Scala, Spark
ERP Tool : Oracle Applications 11i/R12
Database : Oracle 9i, 10g, 11g
Languages : Scala, SQL, PL/SQL, Java SE, C
Tools : TOAD, PuTTY, SQL*Plus, SQL*Loader, Automic (UC4), HPSDM
GUI Tools : Developer 2000, XML Publisher, Web Console
Operating Systems : Windows NT/2000 Server/XP and Linux
Education:
M.Tech (Software Engineering) from GITAM University, 2014.
Project 2.
Company : Emerson Automation Solutions, Mohali
Project : PF EDM (Process Factory Enterprise Data Model)
Duration : Feb 2017 to date
Role : Data Engineer
Description:
The purpose of the project is to analyze the effectiveness and validity of controls, to store terabytes of log information generated by the source providers as part of that analysis, and to extract meaningful information from it. The solution is based on the open-source Big Data software Hadoop. The data is stored in the Hadoop file system and processed using Apache Spark jobs, which in turn involve getting the raw data, processing it to obtain controls and redesign/change history information, extracting various reports from the controls history, and exporting the information for further processing.
Roles and Responsibilities:
Involved in the design and development of technical specifications using Hadoop technology.
Involved in moving data generated from various sources to HDFS for further processing.
Responsible for building scalable distributed data solutions using Hadoop.
Developed Spark scripts using Scala shell commands as per requirements.
Prepared Hive reports for end users' analysis.
Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Handled large datasets using partitioning, Spark in-memory capabilities, broadcast variables, efficient joins, transformations, and other techniques during the ingestion process itself.
Involved in creating tables, partitioning and bucketing tables, and creating UDFs in Hive.
Monitored Hadoop scripts that take input from HDFS and load the data into Hive.
Loaded data into Hive tables using Sqoop as required.
Wrote Pig scripts to generate the required data.
Tested the data and results using ML algorithms.
Created a PL/SQL package to generate the required data and moved the data to HDFS for further processing.
Project 1.
Client : Emerson Process Management, USA
Project Role : Hadoop Developer
Duration : June 2015 to Jan 2017
Designation : Technical Analyst
Description:
Maintaining customer member details and reward point transactions is very difficult in terms of storage and processing. The member loyalty management system replaces the existing reward management system, which was developed as a web service provider with the help of database sharing. The aim of this system is to reduce the response time of the web service. The solution is based on the open-source Big Data software Hadoop.
Roles and Responsibilities:
Installed Hadoop, Hive, Spark, and Sqoop.
Provided HDFS support and maintenance, including adding/removing nodes and data rebalancing.
Handled large datasets using partitioning, Spark in-memory capabilities, broadcast variables, efficient joins, transformations, and other techniques during the ingestion process itself.
Involved in developing Pig scripts.
Involved in developing Hive reports.
Implemented partitioning, dynamic partitions, and buckets in Hive.
Solved performance issues in Hive and Pig jobs by understanding joins, grouping, and aggregation functions.
Built the physical data model for customer review and approval and constructed the registration database using Oracle 9i on a Windows platform.
Integrated multiple logical data models into a single data model
Analyzed data using the K-Means algorithm with Spark MLlib.
Created and implemented ER models and dimensional models
Produced documentation as per the company standards and SDLC.
Responsible for loading data files from various external sources such as Oracle.