Hadoop Developer
Santhoshi
*****************@*****.***
I aspire to work with a collaborative team of talented individuals and produce work that functionally and creatively fulfills client needs. As part of the team, I will continue to seek out knowledge of the field that complements my hard work.
Experience Summary:
Dynamic IT professional with 8 years of experience (including 2 years onsite) in project development, project maintenance, business analysis, and project support.
Experience in Commercial and retail banking applications.
Extensive knowledge of all stages of the Software Development Life Cycle (SDLC), from initiation and definition through implementation and support.
Experience in software development methodologies such as the Waterfall model and Agile.
Hands-on experience in Big Data technologies: Hadoop MapReduce, Pig, Hive, HBase, Sqoop, Flume, Hadoop YARN, and machine learning. Basic knowledge of Storm and Spark.
Experienced in performing analytics on structured and unstructured data using Hive queries.
Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
Efficient in building Hive, Pig, and MapReduce scripts.
Experience in Big Data platforms such as Hortonworks and Cloudera.
Experience in creating web pages using HTML, JavaScript, and CSS.
Experience in C, C++, Java, and mainframe technologies such as COBOL, DB2, VSAM, CICS, and REXX.
Experience in troubleshooting & performance tuning of Mainframe applications.
Extensively used dynamic SQL, stored procedures, functions, and joins to interact with databases; experienced in database backup and recovery.
Strong interpersonal and communication skills with the ability to understand both business and technical needs from clients and customers.
Strong analytical and quantitative skills, as well as verbal and presentation skills, to build good working relationships with team members and clients.
Technical Skills:
Big Data: Hadoop, Spark, Scala, HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Flume, ZooKeeper
Programming Languages: Mainframe, Core Java
Scripting Languages: JavaScript, CSS, HTML
Databases: DB2, VSAM, MySQL
Operating Systems: Windows, Linux (Ubuntu), IBM Mainframes
Mainframe Technologies: COBOL, CICS, JCL, VSAM, REXX
IDEs and Tools: Eclipse, Microsoft Visual Studio
Certifications:
IBM Certified Database Administrator - DB2 9 DBA for z/OS (version 9.0), acquired 23/09/2011
IBM Certified Database Associate - DB2 9 Fundamentals Certification (version 1.0), acquired 19/08/2010
Big Data Programming, acquired 8/20/2018
Machine Learning - AI (Coursera), acquired 11/10/2018
Professional Experience:
DATAFLAIR INC October 2018 – September 2019
Big Data Developer
World Bank Data Analytics:
This application is used for analytics on World Bank log data.
The data is stored in HDFS in a distributed manner across a cluster of nodes, which addresses scalability, high availability, and fault tolerance.
The web logs are processed using Spark, making use of core features such as transformations and actions.
It contains data on population, health, internet, and GDP.
Imported data from relational data stores to Hadoop using Sqoop.
Involved in developing Hive DDLs to create, alter and drop tables.
Tuned various joins into map-side joins and improved Hive query performance using partitioning and bucketing (sketched after this project summary).
Contributed to data movement to the Data lake from multiple data sources.
Experience in working with different data sources such as flat files, XML, JSON, ORC, Parquet, and Avro files using different SerDes in Hive.
Configured and performance-tuned Sqoop jobs for importing data from RDBMS into the data lake.
Performed big data analysis using Hive queries and wrote Hive UDFs for functionality not available out of the box in Apache Hive.
Developed Spark jobs and Hive jobs to summarize and transform data.
Converted unstructured data to structured data using PySpark.
Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS (see the sketch after this project summary).
Created raw Avro data as an efficient feed for MapReduce processing.
Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
Implemented Spark using Python and Spark SQL for faster testing and processing of data.
Extensively worked with Spark DataFrames to ingest data from flat files into RDDs and transform unstructured data into structured data.
Performed various data validations, data cleansing and data aggregation using series of Spark transformations.
Used features such as parallelize, partitioning, and caching (both in-memory and disk serialization).
Environment: Hive, Sqoop, Spark Core, Spark SQL, Python, PySpark, Kafka.
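The Kafka-to-HDFS streaming ingestion above can be sketched roughly as follows in PySpark; the broker address, topic name, record schema, and paths are hypothetical, and the Kafka connector package is assumed to be available when the job is submitted:

# Minimal sketch, not the project's actual code. Requires the
# spark-sql-kafka-0-10 connector package when submitting the job.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = (SparkSession.builder
         .appName("weblog-stream-ingest")
         .getOrCreate())

# Assumed JSON layout of a web-log event arriving on Kafka (hypothetical).
event_schema = StructType([
    StructField("country", StringType()),
    StructField("indicator", StringType()),
    StructField("value", StringType()),
    StructField("ts", LongType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
       .option("subscribe", "worldbank-weblogs")          # hypothetical topic
       .load())

events = (raw
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Persist the stream to HDFS as Parquet; the checkpoint directory lets the
# query recover cleanly if the job restarts.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/worldbank/weblogs")          # hypothetical path
         .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
         .outputMode("append")
         .start())

query.awaitTermination()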
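The partitioning, bucketing, and map-side join tuning mentioned above can be sketched along these lines; the table names, columns, paths, and bucket count are hypothetical:

# Minimal sketch, not the project's actual code; table, column, and path
# names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("hive-tuning-sketch")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.read.parquet("hdfs:///data/worldbank/raw_indicators")  # hypothetical path

# Partition by load date and bucket on the join key; queries that filter on
# load_date prune partitions, and bucketing pre-organizes data on country_code.
(raw.write
    .partitionBy("load_date")
    .bucketBy(32, "country_code")
    .sortBy("country_code")
    .format("orc")
    .mode("overwrite")
    .saveAsTable("indicators"))

facts = spark.table("indicators").where("load_date = '2019-01-01'")
countries = spark.table("country_dim")  # small dimension table (hypothetical)

# broadcast() turns the shuffle join into a map-side (broadcast hash) join,
# which avoids shuffling the large fact table.
joined = facts.join(broadcast(countries), on="country_code", how="inner")
joined.groupBy("region").agg(F.avg("value").alias("avg_value")).show()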
JP MORGAN CHASE (USA) Feb 2013 – June 2017
Hadoop Developer
Tax System:
Captured Sales/Purchase tax KPIs/use cases from Business Analysts and users.
Analyzed the retail Sales/Purchase data set.
Developed Spark Scala and Spark SQL programs using the Eclipse IDE in Windows/Linux environments (a rough sketch follows this project summary).
Created KPI test scenarios, test cases, and test result documents.
Tested the Scala programs in Linux Spark standalone mode.
Set up a multi-node cluster on AWS and deployed the Spark Scala programs.
Provided solution using Hadoop ecosystem-HDFS, MapReduce, Pig, Hive, HBase, and Zookeeper.
Provided solution using large scale server-side systems with distributed processing algorithms.
Created reports for the BI team using Sqoop to export data into HDFS and Hive.
Provided solution in supporting and assisting in troubleshooting and optimization of MapReduce jobs and Pig Latin scripts.
Deep understanding of Hadoop design principles, cluster connectivity, security, and the factors that affect system performance.
Worked on importing and exporting data from different databases into HDFS and Hive using Sqoop.
Imported and exported data between RDBMS and HDFS/HBase.
Wrote a client-side script so that data moved to HDFS is first staged in a temporary file and then loaded into Hive tables.
Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
Involved in developing Hive reports and partitioning Hive tables.
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Involved in running Hadoop jobs to process millions of records of text data.
Environment: Java, Hadoop, HDFS, Map-Reduce, Pig, Hive, Sqoop, Flume, HBase, Spark, Scala, Linux, Putty.
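The Spark SQL KPI programs described above were written in Scala; purely for illustration, a minimal PySpark sketch of a comparable sales-tax KPI computation is shown below, with hypothetical column names, paths, and KPI definition:

# Minimal sketch, not the project's actual (Scala) code; paths, columns, and
# the KPI definition are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sales-tax-kpi-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Retail sales/purchase records landed in HDFS (for example by a Sqoop import).
sales = spark.read.option("header", "true").csv("hdfs:///data/retail/sales")

# Example KPI: total tax collected per state per month (txn_date assumed to be
# in yyyy-MM-dd format).
kpi = (sales
       .withColumn("month", F.date_format(F.to_date("txn_date"), "yyyy-MM"))
       .groupBy("state", "month")
       .agg(F.sum(F.col("tax_amount").cast("double")).alias("total_tax")))

kpi.write.mode("overwrite").saveAsTable("tax_kpi_by_state_month")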
Credit Card offer:
Analyzed the data by performing Hive queries and running Pig scripts to gain insights into existing customers and potential users.
Imported data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
Built scalable distributed data solutions using Hadoop.
Installed and configured the Cassandra database and loaded data using the Java API.
Loaded third-party data by writing MapReduce jobs and Hive queries.
Imported and exported data into HDFS and Hive using Sqoop.
Logged various levels of information (error, info, and debug) into log files using Log4j.
Wrote Hive UDFs and configured Hive (an illustrative sketch follows this project summary).
Thoroughly analyzed the Business Specification Document, gathered the requirements, and created technical specifications.
Environment: Hadoop MapReduce, HDFS, Hive, Cassandra
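The Hive UDFs above were written for Hive itself (typically in Java); as a rough, hypothetical illustration of the same idea, a PySpark UDF registered for use in SQL looks like this:

# Minimal, hypothetical sketch; the project's UDFs were Hive UDFs, and the
# table and column names here are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("udf-sketch")
         .enableHiveSupport()
         .getOrCreate())

def mask_card(card_number):
    """Keep only the last four digits of a card number."""
    return "****-****-****-" + card_number[-4:] if card_number else None

# Register the function so it can be called from SQL, the way a Hive UDF would be.
spark.udf.register("mask_card", mask_card, StringType())

spark.sql("""
    SELECT mask_card(card_number) AS masked_card, offer_code
    FROM customer_offers
""").show()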
Log Analytics System:
Perform analytics over huge volumes of IVR log data.
Perform pre-processing using MapReduce jobs to convert the raw data stored in XML format in HDFS into CSV files (a rough sketch follows this project summary).
The processed data is loaded into Hive tables and analyzed by running HQL queries.
The architecture solves the problems of limited storage and processing capacity; the system is scalable, reliable, and fault tolerant.
Environment: Hadoop, Hive, Flume, Java
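The XML-to-CSV pre-processing above was done with MapReduce jobs; a rough PySpark stand-in is sketched below, assuming one XML record per line and hypothetical tag names and paths:

# Minimal sketch, not the project's actual MapReduce code; assumes one XML
# record per line, and the tag names and paths are hypothetical.
import xml.etree.ElementTree as ET

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ivr-log-preprocess").getOrCreate()

def xml_record_to_csv(xml_text):
    """Parse one IVR log record and flatten it into a CSV line."""
    try:
        node = ET.fromstring(xml_text)
        fields = [node.findtext("callId", ""),
                  node.findtext("timestamp", ""),
                  node.findtext("menuOption", ""),
                  node.findtext("durationSec", "")]
        return ",".join(fields)
    except ET.ParseError:
        return None

raw = spark.sparkContext.textFile("hdfs:///data/ivr/raw_xml")
csv_lines = raw.map(xml_record_to_csv).filter(lambda line: line is not None)
csv_lines.saveAsTextFile("hdfs:///data/ivr/csv")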
CITI BANK (USA) Dec 2009 – Feb 2013
Domain : Banking and Financial Services
Intraday Transaction:
Preparation of the architectural design document, analyzing all the interfaces impacted by this project.
Impact analysis
High-level design document describing the changes to the existing programs and the functionality of the new COBOL-CICS program.
Development of a new COBOL-CICS program that interfaces with the vendor for balance enquiry and pre-validates balances on Checking/Savings/Loan accounts before transfer.
Development of a new COBOL-DB2 program to extract valid accounts from the database.
Code walkthrough for changes to the existing impacted modules.
Testing turnover documents depicting how the new functionality is tested in CICS.
Developing UTP (Unit test plan) and UTR (Unit test results).
Post implementation checkout in Production.
Programming language: COBOL, CICS, DB2, VSAM.
Rusa Test Data Management Systems:
Impact analysis
Program requirement document (PRD) describing the data required for creating the master file and the files from which this data can be picked up.
Development of a COBOL program to read the VSAM files and populate the data in the master file.
Development of a COBOL-DB2 program to load the critical data of all valid customers into the database.
Scheduling of batch jobs to run DB2 utilities at regular intervals to take image copies of the database for recovery purposes.
Database maintenance. System and UAT testing.
Preparation of Unit test plan document.
Programming language: COBOL, JCL, DB2, SQL.
Exstream Notices:
Technical design document stating the proposed logic and all the changes required.
Impact analysis
Technical turnover document describing the functionality of the new COBOL programs and changes to the existing programs.
Development of a new COBOL program that accepts input from various programs and prepares the notice in Exstream format.
Preparing Unit test results document.
Working with upstream systems for testing needs.
Post implementation checkout.
Programming language: COBOL, JCL.