Hadoop Developer
Santhoshi
*****************@*****.***
I aspire to work with a collaborative team of talented individuals and produce work that functionally and creatively fulfills client needs. As part of the team, I will continue to seek out knowledge of the field that complements my hard work.
Experience Summary:
Dynamic IT professional with 8 years of experience (including 2 years onsite) in project development, project maintenance, business analysis, and project support.
Experience in Commercial and retail banking applications.
Extensive knowledge of all stages of the Software Development Life Cycle (SDLC), from initiation and definition through implementation and support.
Experience in software development methodologies such as the Waterfall model and Agile.
Hands-on experience in Big Data technologies: Hadoop MapReduce, Pig, Hive, HBase, Sqoop, Flume, Hadoop YARN, and machine learning. Basic knowledge of Storm and Spark.
Experienced in performing analytics on structured and unstructured data using Hive queries.
Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
Efficient in building Hive, Pig, and MapReduce scripts.
Experience in Big Data platforms such as Hortonworks and Cloudera.
Experience in creating web pages using HTML, JavaScript, and CSS.
Experience in C, C++, Java, and mainframe technologies such as COBOL, DB2, VSAM, CICS, and REXX.
Experience in troubleshooting & performance tuning of Mainframe applications.
Extensively used dynamic SQL, stored procedures, functions, and joins to interact with databases; experienced in database backup and recovery.
Strong interpersonal and communication skills with the ability to understand both business and technical needs from clients and customers.
Strong analytical and quantitative skills, as well as verbal and presentation skills, to build good working relationships with team members and clients.
Technical Skills:
Big Data: Hadoop, Spark, Scala, HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Flume, ZooKeeper
Programming Languages: Mainframe, Core Java
Scripting Languages: JavaScript, CSS, HTML
Databases: DB2, VSAM, MySQL
Operating Systems: Windows, Linux (Ubuntu), IBM Mainframes
Mainframe Technologies: COBOL, CICS, JCL, VSAM, REXX
IDEs and Tools: Eclipse, Microsoft Visual Studio
Certifications:
IBM Certified Database Administrator - DB2 9 DBA for z/OS (version 9.0), acquired 23/09/2011
IBM Certified Database Associate - DB2 9 Fundamentals Certification (version 1.0), acquired 19/08/2010
Big Data Programming, acquired 8/20/2018
Machine Learning - AI (Coursera), acquired 11/10/2018
Professional Experience:
DATAFLAIR INC October 2018 – September 2019
Big Data Developer
World Bank Data Analytics:
This application is used for analytics on World Bank log data.
The data is stored in HDFS in a distributed manner across a cluster of nodes, which addresses scalability, high availability, and fault tolerance.
The web logs are processed using Spark, making use of core features such as transformations and actions.
It contains data on population, health, internet, and GDP.
Imported data from relational data stores to Hadoop using Sqoop.
Involved in developing Hive DDLs to create, alter and drop tables.
Tuned various joins into map-side joins and improved Hive query performance using partitioning and bucketing (sketched after this project summary).
Contributed to data movement to the Data lake from multiple data sources.
Experience in working with different data sources such as flat files, XML, JSON, ORC, Parquet, and Avro files using different SerDes in Hive.
Configured and performance-tuned Sqoop jobs for importing data from RDBMS into the data lake.
Performed big data analysis using Hive queries and wrote Hive UDFs for functionality not available out of the box in Apache Hive.
Developed Spark jobs and Hive jobs to summarize and transform data.
Converted unstructured data to structured data using PySpark.
Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS (see the sketch after this project summary).
Created raw Avro data as an efficient feed for MapReduce processing.
Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
Implemented Spark using Python and Spark SQL for faster testing and processing of data.
Extensively worked with Spark DataFrames to ingest data from flat files into RDDs and transform unstructured data into structured data.
Performed various data validations, data cleansing and data aggregation using series of Spark transformations.
Used features such as parallelize, partitioning, and caching (both in-memory and disk serialization).
Environment: Hive, Sqoop, Spark Core, Spark SQL, Python, PySpark, Kafka.
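The Kafka-to-HDFS streaming ingestion above can be sketched roughly as follows in PySpark; the broker address, topic name, record schema, and paths are hypothetical, and the Kafka connector package is assumed to be available when the job is submitted:

# Minimal sketch, not the project's actual code. Requires the
# spark-sql-kafka-0-10 connector package when submitting the job.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = (SparkSession.builder
         .appName("weblog-stream-ingest")
         .getOrCreate())

# Assumed JSON layout of a web-log event arriving on Kafka (hypothetical).
event_schema = StructType([
    StructField("country", StringType()),
    StructField("indicator", StringType()),
    StructField("value", StringType()),
    StructField("ts", LongType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
       .option("subscribe", "worldbank-weblogs")          # hypothetical topic
       .load())

events = (raw
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Persist the stream to HDFS as Parquet; the checkpoint directory lets the
# query recover cleanly if the job restarts.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/worldbank/weblogs")          # hypothetical path
         .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
         .outputMode("append")
         .start())

query.awaitTermination()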
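The partitioning, bucketing, and map-side join tuning mentioned above can be sketched along these lines; the table names, columns, paths, and bucket count are hypothetical:

# Minimal sketch, not the project's actual code; table, column, and path
# names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("hive-tuning-sketch")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.read.parquet("hdfs:///data/worldbank/raw_indicators")  # hypothetical path

# Partition by load date and bucket on the join key; queries that filter on
# load_date prune partitions, and bucketing pre-organizes data on country_code.
(raw.write
    .partitionBy("load_date")
    .bucketBy(32, "country_code")
    .sortBy("country_code")
    .format("orc")
    .mode("overwrite")
    .saveAsTable("indicators"))

facts = spark.table("indicators").where("load_date = '2019-01-01'")
countries = spark.table("country_dim")  # small dimension table (hypothetical)

# broadcast() turns the shuffle join into a map-side (broadcast hash) join,
# which avoids shuffling the large fact table.
joined = facts.join(broadcast(countries), on="country_code", how="inner")
joined.groupBy("region").agg(F.avg("value").alias("avg_value")).show()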
JP MORGAN CHASE (USA) Feb 2013 – June 2017
Hadoop Developer
Tax System:
Captured Sales/Purchase tax KPIs/use cases from Business Analysts and users.
Analyzed the retail Sales/Purchase data set.
Developed Spark Scala and Spark SQL programs using the Eclipse IDE in Windows/Linux environments (a rough sketch follows this project summary).
Created KPI test scenarios, test cases, and test result documents.
Tested the Scala programs in Linux Spark standalone mode.
Set up a multi-node cluster on AWS and deployed the Spark Scala programs.
Provided solution using Hadoop ecosystem-HDFS, MapReduce, Pig, Hive, HBase, and Zookeeper.
Provided solution using large scale server-side systems with distributed processing algorithms.
Created reports for the BI team using Sqoop to export data into HDFS and Hive.
Provided solution in supporting and assisting in troubleshooting and optimization of MapReduce jobs and Pig Latin scripts.
Deep understanding of Hadoop design principles, cluster connectivity, security, and the factors that affect system performance.
Worked on importing and exporting data from different databases into HDFS and Hive using Sqoop.
Imported and exported data between RDBMS and HDFS/HBase.
Wrote a client-side script so that data moved to HDFS is first staged in a temporary file and then loaded into Hive tables.
Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
Involved in developing Hive reports and partitioning Hive tables.
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Involved in running Hadoop jobs to process millions of records of text data.
Environment: Java, Hadoop, HDFS, Map-Reduce, Pig, Hive, Sqoop, Flume, HBase, Spark, Scala, Linux, Putty.
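The Spark SQL KPI programs described above were written in Scala; purely for illustration, a minimal PySpark sketch of a comparable sales-tax KPI computation is shown below, with hypothetical column names, paths, and KPI definition:

# Minimal sketch, not the project's actual (Scala) code; paths, columns, and
# the KPI definition are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sales-tax-kpi-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Retail sales/purchase records landed in HDFS (for example by a Sqoop import).
sales = spark.read.option("header", "true").csv("hdfs:///data/retail/sales")

# Example KPI: total tax collected per state per month (txn_date assumed to be
# in yyyy-MM-dd format).
kpi = (sales
       .withColumn("month", F.date_format(F.to_date("txn_date"), "yyyy-MM"))
       .groupBy("state", "month")
       .agg(F.sum(F.col("tax_amount").cast("double")).alias("total_tax")))

kpi.write.mode("overwrite").saveAsTable("tax_kpi_by_state_month")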
Credit Card offer:
Analyzed the data by performing Hive queries and running Pig scripts to gain insights into existing customers and potential users.
Imported data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
Built scalable distributed data solutions using Hadoop.
Installed and configured the Cassandra database and loaded data using the Java API.
Loaded third-party data by writing MapReduce jobs and Hive queries.
Imported and exported data into HDFS and Hive using Sqoop.
Logged various levels of information (error, info, and debug) into log files using Log4j.
Wrote Hive UDFs and configured Hive (an illustrative sketch follows this project summary).
Thoroughly analyzed the Business Specification Document, gathered the requirements, and created technical specifications.
Environment: Hadoop MapReduce, HDFS, Hive, Cassandra
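The Hive UDFs above were written for Hive itself (typically in Java); as a rough, hypothetical illustration of the same idea, a PySpark UDF registered for use in SQL looks like this:

# Minimal, hypothetical sketch; the project's UDFs were Hive UDFs, and the
# table and column names here are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("udf-sketch")
         .enableHiveSupport()
         .getOrCreate())

def mask_card(card_number):
    """Keep only the last four digits of a card number."""
    return "****-****-****-" + card_number[-4:] if card_number else None

# Register the function so it can be called from SQL, the way a Hive UDF would be.
spark.udf.register("mask_card", mask_card, StringType())

spark.sql("""
    SELECT mask_card(card_number) AS masked_card, offer_code
    FROM customer_offers
""").show()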
Log Analytics System:
Perform analytics over huge volumes of IVR log data.
Perform pre-processing using MapReduce jobs to convert the raw data stored in XML format in HDFS into CSV files (a rough sketch follows this project summary).
The processed data is loaded into Hive tables and analyzed by running HQL queries.
The architecture solves the problems of limited storage and processing capacity; the system is scalable, reliable, and fault tolerant.
Environment: Hadoop, Hive, Flume, Java
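The XML-to-CSV pre-processing above was done with MapReduce jobs; a rough PySpark stand-in is sketched below, assuming one XML record per line and hypothetical tag names and paths:

# Minimal sketch, not the project's actual MapReduce code; assumes one XML
# record per line, and the tag names and paths are hypothetical.
import xml.etree.ElementTree as ET

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ivr-log-preprocess").getOrCreate()

def xml_record_to_csv(xml_text):
    """Parse one IVR log record and flatten it into a CSV line."""
    try:
        node = ET.fromstring(xml_text)
        fields = [node.findtext("callId", ""),
                  node.findtext("timestamp", ""),
                  node.findtext("menuOption", ""),
                  node.findtext("durationSec", "")]
        return ",".join(fields)
    except ET.ParseError:
        return None

raw = spark.sparkContext.textFile("hdfs:///data/ivr/raw_xml")
csv_lines = raw.map(xml_record_to_csv).filter(lambda line: line is not None)
csv_lines.saveAsTextFile("hdfs:///data/ivr/csv")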
CITI BANK (USA) Dec 2009 – Feb 2013
Domain : Banking and Financial Services
Intraday Transaction:
Preparation of the architectural design document, analyzing all the interfaces impacted by this project.
Impact analysis
High-level design document describing the changes to the existing programs and the functionality of the new COBOL-CICS program.
Development of a new COBOL-CICS program that interfaces with the vendor for balance enquiry and pre-validates balances on Checking/Savings/Loan accounts before transfer.
Development of a new COBOL-DB2 program to extract valid accounts from the database.
Code walkthrough for changes to the existing impacted modules.
Testing turnover documents depicting how the new functionality is tested in CICS.
Developing UTP (Unit test plan) and UTR (Unit test results).
Post implementation checkout in Production.
Programming language: COBOL, CICS, DB2, VSAM.
Rusa Test Data Management Systems:
Impact analysis
Program requirement document (PRD) describing the data required for creating the master file and the files from which this data can be picked up.
Development of a COBOL program to read the VSAM files and populate the data in the master file.
Development of a COBOL-DB2 program to load the critical data of all valid customers into the database.
Scheduling of batch jobs to run DB2 utilities at regular intervals to take image copies of the database for recovery purposes.
Database maintenance. System and UAT testing.
Preparation of Unit test plan document.
Programming language: COBOL, JCL, DB2, SQL.
Exstream Notices:
Technical design document stating the proposed logic and all the changes required.
Impact analysis
Technical turnover document describing the functionality of the new COBOL programs and changes to the existing programs.
Development of a new COBOL program that accepts input from various programs and prepares the notice in Exstream format.
Preparing Unit test results document.
Working with upstream systems for testing needs.
Post implementation checkout.
Programming language: COBOL, JCL.