Data Project

Albany, New York, United States
October 18, 2016


Hadoop Senior Developer - Lead

Deepan G

Mobile: 518-***-**** E-mail:

Professional Summary:

8+ years of professional IT experience, including data warehousing, ETL, big data, and the Hadoop ecosystem, in the Healthcare, Banking, Retail and Communication sectors.

4+ years of experience in development and deployment of Hadoop ecosystem components such as HDFS, MapReduce, Hive, HBase, Cassandra, Phoenix, SQuirreL SQL Client, Pig, Sqoop, Oozie, and Kafka.

1+ years of experience in development and deployment of Spark and Scala

Experience in loading data from Legacy Systems to Hadoop Distributed File System

Experience in loading data into Hive tables, analyzing and transforming it using UDFs, SerDes, and Spark, and loading the results into Hive target tables.

Expertise in transforming data using Pig and loading it into Hive tables.

Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.

Experience in managing and reviewing Hadoop log files.

Experience in troubleshooting errors in Pig, Hive, HBase Shell and MapReduce.

Expertise in loading XML data and JSON data into Hive Tables.

Performance tuning of the Hadoop, Hive, Cassandra and Spark clusters.

Experience in handling different file formats such as text files, Sequence files, and Avro data files using different SerDes in Hive

Good experience in file storage concepts of HDFS and optimization

Excellent understanding and knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB

Experience with the Hortonworks Hadoop distribution

Experience managing data ingestion and processing operations

Extensive experience leading data warehousing development and maintenance projects using ETL technology, including project management activities such as estimation, project planning, scheduling, deployment, tracking, resource management, risk management, and coordination of onsite and offshore teams; strong experience in building client relationships

Extensively worked on data warehousing tools: Informatica Power Center 9.6.1, Informatica Power Exchange 9.6.1, Informatica IDQ 9.6.1, Oracle 11g, MS SQL, PL/SQL, UNIX Shell scripting, IBM ClearQuest, Control-M and Autosys

Advanced knowledge of Informatica parallel processing and partitioning, Power Exchange data map creation, COBOL copybook creation, Java Transformation, Web Service Consumer, XML Parser, XML target, Transaction Control, Normalizer, Data Masking, Data Profiling, Golden Gate and Data Integration Hub

Worked on Informatica Big Data Parser to process structured and semi-structured HIPAA data

Sound experience in data architecture, data analysis, HLD and LLD, data modelling, data migration, data integration, and data deployment and management.

Proficient in data warehousing concepts, data modeling, dimensional Star Schema and Snowflake Schema methodologies, implementing Slowly Changing Dimensions, and converting legacy data into the enterprise environment

Worked on waterfall and agile methodologies.

Expertise in performance tuning of Informatica mappings and sessions and in optimizing SQL queries using SQL trace, SQL plans, Oracle partitioning, join types and various indexes

Excellent interpersonal and analytical skills; able to work on multiple tasks, learn quickly, and meet stringent deadlines in fast-paced, demanding environments

Experience in Data Analysis, Data Profiling, Data Cleansing, Transformation, Consolidation, Integration, Data Import, and Data Export using various sources (Oracle, MS SQL Server, XML and flat files) and targets


Big Data Fundamentals

Hadoop Fundamentals I

Accessing Hadoop Data Using Hive

Introduction to NoSQL and DBaaS

Introduction to Pig

Oracle Certified Associate, Java SE 8 Programmer


Dovetail Topper Award from Cognizant Academy

BRAVO Associate of the Quarter from DWBI – Cognizant

Master Blaster Quarterly Winners Award from EIM – Cognizant

Worked on the proposal for the California state project “CA-MMIS PBM OS+”, which won the bid

Trained many Cognizant employees and shared technical and healthcare knowledge with them

Chronological Work Experience

Hadoop Senior Developer

Xerox Corp - Albany, New York 01/15 - Present

Project 1 - NY MMIS PBM Data Conversion

Xerox works with the state of New York to update its Medicaid claims processing system to a next-generation technology platform that will help manage Medicaid rolls. New York State generates a huge volume of claims, second only to California. Xerox implemented its PBM Health Enterprise solution, a flexible, adaptable and analytical Medicaid Management Information System (MMIS). The PBM Conversion team converts legacy data into Enterprise PBM data so that it meets HIPAA standards and PHI data requirements and helps the business meet customer expectations.


Work with the architect, business managers, and Golden Gate DBA group to understand the requirements and source systems, and prepare design documents describing the candidate Big Data approaches, their pros and cons, and the recommended approach

Study client requirements and design, and prepare project estimates

Analyze client requirements, perform feasibility studies and impact analysis, and prepare high- and low-level design documents and detailed technical design documents

Load data from various data sources into HDFS

Load data from legacy systems (mainframe) into the Hadoop Distributed File System

Loading Mainframe data to Hive tables with static/dynamic partitions

Load Member, Provider, and SA data into Hive bucketed tables for analysis with the Functional Team

Loading data into Hive tables using UDF and SerDe

Loading data into Hive tables using Spark and Scala

Loading and transforming large sets of structured, semi structured and unstructured data

Install, configure, and operate Hadoop ecosystem components: Hadoop, Hive, HBase, ZooKeeper, Phoenix, SQuirreL SQL Client, Pig, Sqoop, Oozie

Delegate work to team members and review it on completion

Load data into Hive tables, analyze and transform it using UDFs and SerDes, and load the results into Hive target tables.

Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop

Export data from HDFS to Oracle using Sqoop for BI Reports

Develop Informatica mappings and workflows, and prepare unit tests, unit test data, and data validation for converting legacy data.

Worked on Informatica Big Data Parser to process structured and semi-structured HIPAA data

Set up daily meetings with the scrum master and keep the project on track to meet each sprint goal

End to End ETL development and ETL to Hadoop for PBM Data Conversion Project

Meet with the product owner and scrum master to report the status of each sprint

Interface with business users to understand business requirements

Assisted new team members by explaining the project's functional background

Environment: Hadoop 2.x, Hive, Pig, MapReduce, Spark, Scala, Kafka, Sqoop, Oozie, Informatica Power Center 9.6.1, Power Exchange 9.6.1, Oracle 11g, PL/SQL, UNIX Shell script, Control-M

Hadoop Developer

Bank of America - Charlotte, NC 01/12–12/14

Project 2 – BOA Enterprise Data Management applications

The File Maintenance application is part of the bank's Enterprise Data Management applications. It is the central repository hub that receives transactional data in various formats from the bank's applications and stores and maintains the data history. Data is pulled from it as needed for audit compliance, litigation requests, and strategic planning.


• Gathering data requirements and identifying sources for acquisition.

• Data discovery to translate and map business rule attributes to low-level data elements.

• Create Sqoop jobs for importing the data from different application tables to Hive tables.

• Develop Hive scripts for end user / analyst requirements to perform ad hoc analysis.

• Integrate Hive and HBase for storing the data in HBase.

• Writing Shell Scripts to load data after preprocessing the data

• Create UDFs for Hive for standardizing data wherever required.

• Unit testing MR components using MRUnit.

• Create and schedule workflows using the Oozie workflow coordinator.

• Coding and peer review of assigned task, Unit testing and Volume Testing and Bug fixing.

• Responsible for test case review for all components of the project

• Participate and contribute to estimations and Project Planning with team and Project Manager.

• Create deployment plan, run book and implementation checklist.

• Perform root cause analysis and provide permanent fixes for the problems identified.

• Present project induction sessions to new joiners.

• Ensure availability of document/code for review.


Environment: Hadoop, Java, Hive, Sqoop, Spark SQL, Oozie, UNIX, MySQL, MapReduce, YARN, Kafka

Senior ETL Developer

Walmart - Bentonville, AR 01/11–12/11

Project 3 - Tax and Treasury project

The Tax and Treasury project is part of Walmart's financial process. It reads the Walmart sales database and credit card transactions, splits the transactions by card type, and generates settlement files for the respective providers. The project is critical because it handles millions of dollars daily. The platform is built on the Hadoop ecosystem, with HDFS/HBase as the primary data storage.


Coordinated with business customers to gather business requirements, worked with technical peers to derive technical requirements, and delivered the BRD and TDD documents.

Extensively involved in Design phase and delivered Design documents.

Analyzed the Hadoop cluster and various Big Data components, including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, databases and Sqoop.

Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.

Imported and exported data to and from HDFS and Hive using Sqoop.

Migration of 100+ TBs of data from different databases (Oracle, SQL Server) to Hadoop.

Wrote Hive jobs to parse logs and structure them in tabular format for effective querying of the log data.

Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.

Experienced in defining job flows.

Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.

Experienced in managing and reviewing the Hadoop log files.

Used Pig as ETL tool to do Transformations with joins and pre-aggregations before storing the data onto HDFS.

Responsible for developing data pipelines from different sources.

Used the Cloudera distribution of Apache Hadoop.

Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.

Worked on Oozie workflow engine for job scheduling.

Involved in Unit testing and delivered Unit test plans and results documents.

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Hue, Java (JDK 1.6), SQL, Oracle, Ambari, Sqoop, Flume, Oozie, Eclipse.

Senior ETL Developer

Xerox Corp - Atlanta, Georgia 09/09–12/10

Project 4 - Alaska Medicaid Management Information Systems

The Alaska MMIS project involves testing performed by the Functional, SIT and UAT teams to meet the business criteria for the MMIS enterprise environment. The Alaska Conversion Team converts legacy data from 8 functional areas into Enterprise System data in the ACS Phase 1 project, which covers coding and testing. After loading the data into the development environment, the Conversion Team extracts it and deploys it to the various testing environments with the help of the database administrator.

The Conversion Team verifies the data between the development and testing environments and asks the Functional and Testing teams to validate it. The Functional and Testing teams analyze and validate the data against their business requirements and track defects in ClearQuest (CQ). The Conversion Team analyzes each defect and prepares its resolution. Additional requirements, and requirements missed in ACS Phase 1, are added in CQ as Change Requests (CRs). The Conversion Team analyzes each CR, performs the coding and testing, and prepares the resolution in CQ.


Studied the client requirement and design, and prepared project estimation

Performed the feasibility study, impact analysis, prepared the high and low level design documents, and detailed technical design document

Developed mappings and workflows, and prepared unit test cases, unit test data and data validation

Validated legacy source files provided by the client prior to Job execution by using Informatica Power Exchange

Performed Data cleansing in the source and loaded into staging tables for each data conversion

Ownership of Provider Enrolment Portal (PEP) module – successful implementation of PEP Go Live

End to End development of the claims system of Medical Claims, Pharmacy Claims, 837 and 835 Claims

Before data conversion, identified the risks in the various environments, defined the probability, severity and impact of each risk, and shared them with management

Completed migration of ETL Informatica code from version 8.5.1 to 9.0.1 in a UNIX environment.

Performed impact analysis of change requests based on revised requirements and implemented the CR

Identified the Risks, Mitigation and Contingency plan with target dates

Monitored and revisited the risks on a weekly basis and categorized them by severity

Prepared the balancing report that helps in identifying the area where records are growing or falling according to the Business requirements, thereby assisting the client to substantiate the data and to amend the tangible requirement

Responsible for performing the Unit Testing and Integration Testing after implementation of Change Requests

Followed up with the respective POCs on environment issues and defects and on their resolution

Performed the pre-conversion activities as well as the data conversion runs for all the functional areas

Responsible for the data extract process and delivered the data to SIT and FIT testing environments in the client level

Prepared Oracle SQL queries and implemented the changes in the development environment

Validated data in the development environment and requested SME validation of the same data

Prepared the data release document that includes data extract from development environment, prepared Oracle SQL queries that will modify the production data and created a request in Alaska Release Management Portal to deploy the data in Production environment

Successful implementation of Alaska MMIS Data Conversion - Go Live

Prepared balancing report for Go-Live data across various functional areas

Prepared the Go-Live documents and shared them with the onsite team, and also maintained documents for other US states for future reference

Environment: Informatica Power Center 9.0.1/9.1.0, Power Exchange 8.6.1/9.0.1, Oracle 11g, PL/SQL, UNIX Shell script, Control-M

ETL Developer

WellPoint - Worthington, Ohio 04/08–08/09

Project 5 - BH and Life and Disability Development

The Operational Data Store (ODS) is a central data store for all Life and Disability (L&D) data. Data from various functional areas such as membership and billing, claims, sales, and quote tracking is loaded into the ODS. This data can be used by all L&D business functions for their non-transactional and decision support reporting needs. Developed ETL code and loaded the data into the ODS from the above-mentioned source systems.


Created design specification documents, developed functional and technical specification documents

Used Informatica Power Center to create mappings, sessions and workflows for populating the data into the dimension, facts, and lookup tables simultaneously from different source systems

Used various transformations of Informatica, such as Source Qualifier, Expression, Look-up transformation, Update Strategy, Filter, Rank, Normalizer, Router transformation and Joiner.

Peer review of ETL coding, prepared Unit Test Case and Unit Test Data

Unit testing and defect resolutions, Resolved issues/bugs raised by QA team

Worked with the Informatica administration team on installation of Power Center Server and Power Center Client, creating and configuring services, Power Center Repository Service administration, managing domain folders, users, permissions and logs, repository management, repository security and folder setup.

Extensively transformed the existing PL/SQL scripts into stored procedures to be used by Informatica mappings with the help of Stored Procedure Transformations.

Analyzed session log files when sessions failed, in order to resolve errors in mapping or session configurations.

Used debugger to test the mapping and fixed the bugs.

Created effective test cases and performed unit testing for the respective ETL mappings in Informatica.

Performed Integration testing to ensure the successful execution of data load process.

Monitored batch jobs running in production and UAT regions with workflow

Worked on incident tickets raised by users in Remedy (a tool for raising and viewing tickets)

Performed problem analysis, debugging and coding as part of new development

Participated in Systems Testing (SIT), User Acceptance Testing (UAT), data analysis and troubleshooting for existing products

Environment: Informatica Power Center 8.1/8.6.1, Teradata, TPT, MLOAD, UNIX Shell Scripting, Windows XP