Senior Hadoop Developer

Location:
Charlotte, NC, 28262
Posted:
February 16, 2017

Resume:

Naveen Kumar

SUMMARY

* ***** ** ** **********, including 3 years of experience with Apache Hadoop components such as HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Spark, Scala, Sentry navigation, algorithms and Big Data analytics.

6 years of experience in Database Architecture, Administration, System Analysis, Design, Development and Support of MS SQL Server, MSBI ETL tools, Core Java, JSP, Servlets, JavaScript, XML, JQuery, Python and Scala scripting.

Worked extensively on database programming, database architecture and Hadoop.

3 years of hands-on experience working with HDFS, the MapReduce framework and Hadoop ecosystem components such as Hive, HBase, Sqoop and Oozie.

Good understanding of Hadoop architecture and the underlying Hadoop framework, including storage management.

Hands-on experience in installing, configuring and using Hadoop components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig and Flume.

Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.

Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.

Worked on the backend using Scala and Spark to implement several aggregation routines.

Experienced in working with Spark DataFrames and optimizing jobs to meet SLAs.

Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.

Hands-on experience in Linux shell scripting. Worked with the Cloudera Big Data distribution (CDH 5.8.3).

Hands-on experience with MicroStrategy and Tableau to generate reports on Hadoop data.

Worked extensively on production deployment of month-end Hadoop release items.

Responsible for Hadoop production support: running the Hadoop Autosys jobs, validating the data and communicating results to the business.

Worked on installing Autosys JIL files and configuring Autosys jobs to schedule Hadoop tasks.

Involved in creating POCs to ingest and process streaming data using Spark and HDFS.
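
For illustration, a minimal sketch of such a streaming POC, assuming the Spark 1.x DStream API; the application name, HDFS paths and pipe delimiter are placeholders, not the actual project code:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingIngestPoc {
      def main(args: Array[String]): Unit = {
        // 30-second micro-batches over files landing in an HDFS directory (paths are placeholders)
        val ssc = new StreamingContext(new SparkConf().setAppName("StreamingIngestPoc"), Seconds(30))
        val lines = ssc.textFileStream("hdfs:///data/landing/")

        // Per-batch aggregation: count records by the first pipe-delimited field
        val counts = lines.map(_.split("\\|")(0)).map((_, 1L)).reduceByKey(_ + _)

        // Write each processed batch back to HDFS for downstream consumers
        counts.saveAsTextFiles("hdfs:///data/processed/counts")

        ssc.start()
        ssc.awaitTermination()
      }
    }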

Increased system scalability by implementing various algorithms to consume large datasets.

Expert in SQL Server RDBMS and have worked extensively on PL/SQL.

Experience in Data Loading, ETL process and Supported Cube Processing in SSAS and Transferred data to Data Warehouse Environment (ETL) using SSIS.

Expert in writing complex SQL queries and in database analysis for good performance.

Very good understanding and working knowledge of Object-Oriented Programming (OOP), multithreading in Core Java, J2EE, Web Services (REST, SOAP), JDBC, JavaScript and jQuery.

Very good experience with the complete project life cycle (design, development, testing and implementation) of client-server and web applications.

Expert in Building, Deploying and Maintaining the Applications.

Experienced in preparing and executing Unit Test Plan and Unit Test Cases after software development.

Experience in Scrum, Agile and Waterfall models.

Worked in 24x7 environments to provide production support.

Coordinated with offshore and cross-functional teams to ensure that applications are properly tested, configured and deployed.

Excellent analytical, interpersonal and communication skills; fast learner, hardworking and a good team player.

Technical Skills

Big Data Technologies: MapReduce, Pig, Hive, Sqoop, HBase, Flume, Cassandra, Spark, YARN, Oozie, Cloudera CDH 5.x

Databases: SQL Server, Netezza, Oracle, Teradata, HBase.

Operating Systems: Red Hat Linux, Ubuntu, Unix/Linux and Windows.

Web Development: HTML, JSP, JavaScript, JQuery, CSS.

Database Tools: SQL Server Enterprise Manager, SQL Query Analyzer, SQL Profiler, SQL Alerts, Index Tuning Wizard, BCP, DTS, Import/Export Wizard, SQL Mail, SQL Server Management Studio, SSIS 2012/2008, SSAS 2012/2008, SSRS 2012/2008.

ETL Tools: SSIS 2008/2010/2012.

Reporting Tools: MicroStrategy, Tableau, SSRS, Crystal Reports.

Programming Languages: C, C++, Java, C#.Net, Python, Scala.

Version Controls: TFS, SVN

Web Servers: Tomcat, IIS

Query Languages: HiveQL, Spark SQL, Pig, SQL, PL/SQL.

Education: Bachelor’s in Computer Science, 2008, India.

Professional Experience:

Bank of America, Charlotte - NC

August 2015 to Present

Sr. Hadoop/Spark Developer.

Project Title: ASPEN

Project Description: ASPEN is the Enterprise Credit Risk platform for Top of House reporting. The platform enables end users to comply with Regulatory Disclosures and Accounting Policy Standards and reconciles to the General Ledger across Credit Risk portfolios. Both commercial and consumer portfolios are contained within ASPEN for an integrated view of the Risk systems of record and the Finance General Ledger.

Responsibilities:

-Collaborated with internal and client BAs to understand requirements and architect a data flow system.

-Developed complete end-to-end Big Data processing in the Hadoop ecosystem.

-Optimized Hive scripts to use HDFS efficiently by applying various compression mechanisms.

-Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.

-Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.

-Developed Scala scripts and UDFs, using both DataFrames/Spark SQL and RDDs/MapReduce in Spark, for data aggregation and queries and for writing data back into the RDBMS through Sqoop.
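
As an illustrative sketch of that pattern (assuming Spark 1.6 on CDH with a HiveContext; table and column names such as credit_risk.exposures and exposure_amt are hypothetical, and the export back to the RDBMS was handled separately by Sqoop):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions.{sum, udf}

    object ExposureAggregation {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ExposureAggregation"))
        val sqlContext = new HiveContext(sc)
        import sqlContext.implicits._

        // Hypothetical UDF: bucket a total exposure amount into a coarse risk band
        val riskBand = udf((exposure: Double) =>
          if (exposure > 1000000) "HIGH" else if (exposure > 100000) "MEDIUM" else "LOW")

        // Aggregate exposures per portfolio and tag each total with a risk band
        val summary = sqlContext.table("credit_risk.exposures")
          .groupBy($"portfolio_id")
          .agg(sum($"exposure_amt").as("total_exposure"))
          .withColumn("risk_band", riskBand($"total_exposure"))

        // Stage the result as a Hive table; Sqoop then exports it to the RDBMS
        summary.write.mode("overwrite").saveAsTable("credit_risk.exposure_summary")
      }
    }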

-Loaded data into Spark RDDs and performed in-memory computation to generate the output response.

-Migrated complex MapReduce programs and Hive scripts into Spark RDD transformations and actions.
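
A small illustration of that kind of rewrite, assuming a simple key-count job in a spark-shell style session where sc is already available; the paths and pipe delimiter are placeholders:

    // Map and reduce phases of a counting job expressed as RDD transformations and an action
    val records = sc.textFile("hdfs:///data/input/transactions")    // replaces the MapReduce input format

    val counts = records
      .map(line => (line.split("\\|")(0), 1L))                      // map phase: emit (key, 1)
      .reduceByKey(_ + _)                                           // reduce phase: sum counts per key

    counts.saveAsTextFile("hdfs:///data/output/transaction_counts") // action triggers execution and writes output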

-Wrote UDFs and MapReduce jobs depending on the specific requirement.

-Created Java algorithms to find the mortgage risk factor and credit risk factors.

-Created algorithms for the complex map and reduce functionality of all MapReduce programs.

-Tested all month-end changes in the DEV, SIT and UAT environments and obtained business approvals to perform the same in production.

-Successfully migrated a Hadoop cluster of 120 edge nodes to another shared cluster (HaaS, Hadoop as a Service) and set up the DEV, SIT and UAT environments from scratch.

-Wrote shell scripts to schedule Hadoop processes in Autosys by creating JIL files.

-Wrote Spark SQL scripts to optimize query performance.
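
For example, a hedged sketch of two common levers, caching reused tables and allowing small dimension tables to be broadcast, assuming a session where sqlContext (a HiveContext) is available; the table names and the 50 MB threshold are illustrative:

    // Broadcast dimension tables smaller than ~50 MB so joins avoid shuffling the large fact table
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (50 * 1024 * 1024).toString)

    // Cache a table that several downstream queries reuse, so it is read from HDFS only once
    sqlContext.cacheTable("credit_risk.exposure_summary")

    // Typical reporting query run through Spark SQL
    val report = sqlContext.sql(
      """SELECT p.portfolio_name, SUM(e.total_exposure) AS exposure
        |FROM credit_risk.exposure_summary e
        |JOIN credit_risk.portfolio_dim p ON e.portfolio_id = p.portfolio_id
        |GROUP BY p.portfolio_name""".stripMargin)
    report.show(20)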

-Converted all VAP processing from Netezza and re-implemented it using Spark DataFrames and RDDs.

-Worked extensively on code reviews and code remediation to meet coding standards.

-Wrote Sqoop scripts to import and export data across various RDBMS systems.

-Wrote Pig scripts to process unstructured data and make it available for processing in Hive.

-Created Hive schemas using performance techniques such as partitioning and bucketing.
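
As a sketch of the kind of layout meant here, HiveQL DDL issued through the same kind of HiveContext; the table, columns and bucket count are illustrative:

    // Partition by business date and bucket by account id so date-bounded scans and
    // account-level joins read only a fraction of the files
    sqlContext.sql(
      """CREATE TABLE IF NOT EXISTS credit_risk.exposure_detail (
        |  account_id   STRING,
        |  exposure_amt DOUBLE,
        |  product_code STRING)
        |PARTITIONED BY (business_dt STRING)
        |CLUSTERED BY (account_id) INTO 32 BUCKETS
        |STORED AS PARQUET""".stripMargin)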

-Used SFTP to transfer and receive the files from various upstream and downstream systems.

-Configured Unix service IDs and AD groups in all environments (DEV, SIT, UAT and PROD) to control access to resources based on AD group membership.

-Developed Oozie workflow jobs to execute Hive, Pig, Sqoop and MapReduce actions.

-Developed Autosys JIL scripts for defining, scheduling and monitoring jobs (Unix shell scripts).

-Involved in the complete end-to-end code deployment process in production.

-Prepared automated scripts to deploy month-end code changes to all environments.

-Worked on all CDH upgrades and performed regression testing.

-Worked on exporting data from Hive tables into the Netezza database.

-Implemented all VAP processing in Hive tables.

-Worked with Hadoop administration team for configuring servers at the time of cluster migration.

-Responsible to the business and clients for monthly job schedules and change requirements, and for validating the data.

-Responsible for meeting all SLA times to ensure Hadoop jobs run on time.

-Coordinated with the offshore team to explain business requirements and prepare code changes for month-end releases.

Environment: CDH 5.8.3, HDFS, Spark, Pig, Hive, Beeline, Sqoop, MapReduce, Oozie, PuTTY, HaaS (Hadoop as a Service), Java 6/7, Netezza, SQL Server 2012, Subversion, Toad, Teradata, Oracle 10g, YARN, UNIX Shell Scripting, Autosys, Agile Methodology, JIRA, VersionOne.

Altria Client Services at Richmond, VA

February 2013 to July 2015

Hadoop Developer

Project Title: Master Data Management (MDM)

Project Description: Master Data Management is a system that receives huge feed files from 5 different sources and processes them to improve the consistency and quality of the Customer Master Data. This process integrates and cleanses customer master data and makes the resulting “certified” records available to Altria applications and stakeholders.

Address verification and matching are the major components of this process. The address verification process collects all the data from the source, cleanses it, and integrates the verified addresses back into the data store.

In the matching process, large volumes of data are collected from the sources and processed to create a master record from the child data for each set of child records, which is then integrated into the data store.

This master data is used by all stakeholders and other sales applications.

Responsibilities:

-Involved in requirement analysis, design, coding and implementation.

-Developed solutions to process data into HDFS, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems.

-Wrote Apache Pig scripts to process the HDFS data.

-Used Sqoop to import the data from Hadoop Distributed File System (HDFS) to RDBMS.

-Built custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.

-Created algorithms for address cleansing and address-matching count factors.

-Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.

-Created Hive tables to store the processed results in a tabular format.

-Created external tables in Hive.

-Actively involved in verifying and testing data in HDFS and Hive tables while Sqooping data from Hive to RDBMS tables.

-Developed scripts and Autosys jobs to schedule a bundle (a group of coordinators) consisting of various Hadoop programs, using Oozie.

-Used different file formats such as text, CSV, Excel and JSON files.

-Generated reports using Tableau.

Environment: HDFS, Hadoop, Pig, Hive, Sqoop, MapReduce, Oozie, Java 6/7, SQL Server 2012, Subversion, Toad, Teradata, Oracle 10g, YARN, UNIX Shell Scripting, SOAP, REST services, Agile Methodology.

Client: Altria Client Services

June 2012 to Feb 2013

Hadoop Developer

Project Title: SalesEDGE Reporting

Project Description: SalesEDGE is a multi-project program to build a new, sustainable platform for applications used by Sales and Marketing. The project aims to deliver an integrated, multi-category point-of-sale management and ordering solution capable of sustaining leadership in the retail tobacco space and adaptable to diverse, changing business dynamics. The application takes in scanned data daily, such as unstructured feed files, applies business rules to it to produce structured data, and imports the result into an RDBMS. This process runs on a schedule.

Responsibilities:

-Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as MapReduce, Hive, Pig and Sqoop.

-Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.

-Managed and scheduled jobs on a Hadoop cluster.

-Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.

-Developed a MapReduce program to produce structured data.

-Wrote Hive queries for data analysis to meet the business requirements.

-Created Hive tables and worked on them using Hive QL.

-Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.

-Managed and reviewed Hadoop log files.

-Scheduled the MapReduce and Pig jobs using Oozie workflows.

-Created Dimensions and Measures in Tableau.

-Created and analyzed reports using Tableau Desktop and Tableau Server.

Environment: HDFS, Hadoop, Pig, Hive, Sqoop, MapReduce, Oozie, Java 6/7, SQL Server 2012, Tableau Server, UNIX Shell Scripting, Agile Methodology.

Altria Client Services India (Off-shore)

October 2010 to June 2012

Sr. SQL Server / BI Developer

Project Title: Wholesale Payments

Project Description: Wholesale Payments is an integrated system responsible for calculating all Distributors' incentive amounts against certain performance components based on the Trade Program Contracts. A Trade Program Contract has certain contracts that in turn contain performance components and metrics against which a Distributor's performance is evaluated; an invoice containing the incentive amount is generated and sent to the Wholesaler. The system processes the feed file received from Teradata on a weekly/monthly/quarterly basis. A Wholesaler is issued a Non-Performance if it does not meet the constraints defined by the business requirements for that Trade Program Contract component. This system is capable of adapting to diverse, changing business dynamics.

Responsibilities

-Involved extensively in Requirement elicitation and analysis.

-Created Functional specifications documents based on the requirements

-Worked extensively on stored procedures to migrate legacy data onto the warehouse, accommodating various business transformations.

-Designed T-SQL scripts to identify long-running queries and blocking sessions.

-Performed data migration (import & export via BCP) from text files to SQL Server.

-Developed Backup and Restore scripts for SQL Server 2000

-Installed and configured SQL Server 2005 in the test environment with the latest service packs.

-Involved in Unit, Functional and Integration testing process

-Migrated databases from SQL Server 2000 to 2005

-Created highly complex SSIS packages using various data transformations such as Conditional Split, Fuzzy Lookup, For Each Loop Container, Multicast, column conversion, Fuzzy Grouping, Script Components, incremental loading and Slowly Changing Dimension transformations.

-Created SSIS packages for file transfer from one database to another using the FTP task, and created SSIS packages for extracting data from Excel, flat files and MS Access into SQL Server 2005 using SSIS, BCP and Bulk Insert.

-Created SSIS templates for developing SSIS packages so that they can be dynamically deployed into development, testing and production environments.

-Created logging for ETL loads at the package and task level to record the number of records processed by each package and each task, using SSIS.

-Responsible for Deploying, Scheduling Jobs, Alerting and Maintaining SSIS packages

-Assisted development team in deploying and testing the application, which uses SQL Server as a database.

Environment: SQL Server 2010, Business Intelligence Studio (BIDS), SSIS, SSRS.

Farmers, India

October 2008 to September 2010

SQL and JAVA Developer

Project Title: Farmers Agent

Project Description:

“Farmers Agent” is a CMS-based web application developed for Farmers Insurance Group of Companies, the US's 2nd-largest insurance company. This application was developed to provide all the resources, information and activities needed to improve Farmers' business processes. The application provides an individual web site for every Farmers agent; based on the agent's plan type, the UI and privileges of the website change.

Responsibilities

-Involved extensively in Requirement elicitation and analysis.

-Created SSIS packages and stored procedures.

-Worked extensively on performance tuning and query optimization.

-Coordinated with the offshore team.

-Involved in Client business Meetings.

-Investigated, analyzed and documented reported defects.

-Implemented stored procedures, views, synonyms and functions.

-Created, documented and performed unit test plans to ensure product quality.

-Played a key role in preparing LLD and Functional Specification documents.

Environment: SQL Server 2008, Business Intelligence Studio (BIDS), SSIS, Windows XP, Java, jQuery, JavaScript, JSP, Servlets.


