Hadoop Senior Developer - Lead
Deepan G
Mobile: 518-***-**** E-mail: acw4iu@r.postjobfree.com
Professional Summary:
8+ years of professional IT experience spanning Data Warehousing, ETL, Big Data, and the Hadoop ecosystem in the Healthcare, Banking, Retail and Communication sectors.
4+ years of experience in development and deployment of Hadoop ecosystem components such as HDFS, MapReduce, Hive, HBase, Cassandra, Phoenix, SQuirreL SQL Client, Pig, Sqoop, Oozie, and Kafka.
1+ years of experience in development and deployment of Spark and Scala applications
Experience in loading data from Legacy Systems to Hadoop Distributed File System
Experience in loading data into Hive tables, analyzing and transforming the data using UDFs, SerDes, and Spark, and loading it into Hive target tables.
Expertise in transforming data using Pig and loading it into Hive tables.
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
Experience in managing and reviewing Hadoop log files.
Experience in troubleshooting errors in Pig, Hive, HBase shell, and MapReduce.
Expertise in loading XML data and JSON data into Hive Tables.
Performance tuning of Hadoop, Hive, Cassandra, and Spark clusters.
Experience in handling different file formats such as text files, Sequence files, and Avro data files using different SerDes in Hive
Good experience in HDFS file storage concepts and optimization
Excellent understanding and knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB
Experience with the Hortonworks Hadoop distribution
Experience managing data ingestion and processing operations
Extensive experience leading Data Warehousing development and maintenance projects using ETL technology; performed project management activities including estimation, project planning, scheduling, deployment, tracking, resource management, risk management, and coordination of onsite and offshore teams, building strong client relationships
Extensively worked on Data Warehousing tools: Informatica Power Center 9.6.1, Informatica Power Exchange 9.6.1, Informatica IDQ 9.6.1, Oracle 11g, MS SQL, PL/SQL, UNIX Shell scripting, IBM ClearQuest, Control-M, and Autosys
Advanced knowledge of Informatica parallel processing and partitioning, Power Exchange data map creation, COBOL copybook creation, Java Transformation, Web Service Consumer, XML Parser, XML Target, Transaction Control, Normalizer, Data Masking, Data Profiling, GoldenGate, and Data Integration Hub
Worked on Informatica Big Data Parser to process structured and semi-structured HIPAA data
Sound experience in data architecture: data analysis, HLD and LLD, data modeling, data migration, data integration, and data deployment and management
Proficient in Data Warehousing concepts, data modeling, dimensional Star Schema and Snowflake Schema methodologies, implementing Slowly Changing Dimensions, and converting Legacy into Enterprise environments
Worked on waterfall and agile methodologies.
Expertise in performance tuning of Informatica mappings and sessions, and in optimizing SQL queries using SQL trace, SQL plans, Oracle partitioning, join types, and various indexes
Excellent interpersonal and analytical skills; a quick learner able to handle multiple tasks and meet stringent deadlines in fast-paced, demanding environments
Experience in Data Analysis, Data Profiling, Data Cleansing, Transformation, Consolidation, Integration, Data Import and Data Export using various sources (Oracle, MS SQL Server, XML and Flat files) and targets
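The Sqoop import/export work mentioned above boils down to assembling the right command-line invocation; a minimal sketch, where the JDBC URL, table name, and HDFS paths are placeholder values, not the actual project settings:

```python
# Sketch of building Sqoop import/export invocations.
# All connection details below are placeholders for illustration only.
def sqoop_import_cmd(jdbc_url, table, target_dir, num_mappers=4):
    """Build the argv for a Sqoop import from an RDBMS table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]

def sqoop_export_cmd(jdbc_url, table, export_dir):
    """Build the argv for a Sqoop export from HDFS back to an RDBMS table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,
        "--export-dir", export_dir,
    ]

cmd = sqoop_import_cmd("jdbc:oracle:thin:@//db-host:1521/ORCL",
                       "MEMBER", "/data/member")
```

In practice the argv would be passed to a scheduler or a shell wrapper; building it in one place keeps the mapper count and connection settings auditable.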
Certification:
Big Data Fundamentals
Hadoop Fundamentals I
Accessing Hadoop Data Using Hive
Introduction to NoSQL and DBaaS
Introduction to Pig
Oracle Certified Associate, Java SE 8 Programmer
Achievements:
Dovetail Topper Award from Cognizant Academy
BRAVO Associate of the Quarter from DWBI –Cognizant
Master Blaster Quarterly Winners Award from EIM – Cognizant
Worked on the proposal for the California state project “CA-MMIS PBM OS+” and won the project in bidding
Trained many Cognizant employees, sharing technical and healthcare domain knowledge
Chronological Work Experience
Hadoop Senior Developer
Xerox Corp - Albany, New York. 01/15 - Present
Project 1 - NY MMIS PBM Data Conversion
Xerox works with the State of New York to update its Medicaid claims processing system to a next-generation technology platform that helps manage Medicaid rolls. New York generates the largest volume of claims after California. Xerox implemented its PBM Health Enterprise solution, a flexible, adaptable, and analytical Medicaid Management Information System (MMIS). The PBM Conversion team converts Legacy data into Enterprise PBM data to meet HIPAA standards, protect PHI data, and extend the business globally to meet customer expectations.
Responsibilities:
Working with architects, business managers, and the Golden Gate DBA group to understand the requirements and source systems in order to prepare design documents specifying the various Big Data approaches, with pros and cons of each and a recommendation of the best approach
Studying the client requirements and design, and preparing project estimates
Analyzing the client requirements, performing the feasibility study and impact analysis, and preparing the high-level, low-level, and detailed technical design documents
Load data from various data sources into HDFS
Loading data from Legacy systems (Mainframe) to the Hadoop Distributed File System
Loading Mainframe data into Hive tables with static/dynamic partitions
Loading Member, Provider, and SA data into Hive bucketed tables to analyze the data with the Functional Team
Loading data into Hive tables using UDF and SerDe
Loading data into Hive tables using Spark and Scala
Loading and transforming large sets of structured, semi structured and unstructured data
Installing, configuring, and operating Hadoop ecosystem components: Hadoop, Hive, HBase, Zookeeper, Phoenix, SQuirreL SQL Client, Pig, Sqoop, and Oozie
Delegating work to team members and reviewing it on completion
Loading data into Hive tables, analyzing and transforming the data using UDFs and SerDes, and loading it into Hive target tables.
Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop
Export data from HDFS to Oracle using Sqoop for BI Reports
Developing Informatica mappings and workflows, and preparing unit tests, unit test data, and data validation for converting Legacy data.
Worked on Informatica Big Data Parser to process structured and semi-structured HIPAA data
Setting up daily meetings with the Scrum Master and keeping the project on track to meet each sprint's goal
End-to-end ETL development and ETL-to-Hadoop conversion for the PBM Data Conversion Project
Meeting with the Product Owner and Scrum Master to deliver the status of each sprint
Interfacing with business users to understand business requirements
Assisting new joiners by sharing project functional knowledge
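The static/dynamic partition loads above work by routing each record into a warehouse directory derived from its partition-column values; a minimal sketch of that routing in Python, with made-up column names (the actual PBM schema is not shown here):

```python
# Minimal sketch of Hive-style dynamic partitioning: each record lands in a
# directory named after its partition-column values. Column names ("state",
# "load_date") are illustrative assumptions, not the real conversion schema.
def partition_path(base, record, partition_cols):
    """Build the warehouse path a record would be written under."""
    parts = ["{}={}".format(col, record[col]) for col in partition_cols]
    return "/".join([base] + parts)

row = {"claim_id": 101, "state": "NY", "load_date": "2015-06-01"}
path = partition_path("/warehouse/claims", row, ["state", "load_date"])
```

Hive performs this routing itself when `hive.exec.dynamic.partition` is enabled and the partition columns appear last in the `INSERT ... SELECT`; the sketch just makes the directory layout explicit.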
Environment: Hadoop 2.x, Hive, Pig, MapReduce, Spark, Scala, Sqoop, Oozie, Kafka, Informatica Power Center 9.6.1, Power Exchange 9.6.1, Oracle 11g, PL/SQL, UNIX Shell script, Control-M
Hadoop Developer
Bank of America - Charlotte, NC 01/12–12/14
Project 2 – BOA Enterprise Data Management applications
The File Maintenance application is part of the bank's Enterprise Data Management applications. It is the central repository hub that receives transactional data in various formats from various applications of the bank, and where histories of data are stored and maintained. Data is pulled from it as necessary for audit compliance, litigation requests, and strategic planning.
Responsibilities:
• Gathering data requirements and identifying sources for acquisition.
• Data discovery to translate and Map business rule attributes to low-level data elements.
• Create Sqoop jobs for importing the data from different application tables to hive tables.
• Develop Hive scripts for end user / analyst requirements to perform ad hoc analysis.
• Integrate Hive and HBase for storing the data in HBase.
• Writing Shell Scripts to load data after preprocessing the data
• Create UDFs for Hive for standardizing data wherever required.
• Unit testing MapReduce components using MRUnit.
• Creating workflows and scheduling them with the Oozie workflow coordinator.
• Coding and peer review of assigned task, Unit testing and Volume Testing and Bug fixing.
• Responsible for test case review for all components of the project
• Participate and contribute to estimations and Project Planning with team and Project Manager.
• Create deployment plan, run book and implementation checklist.
• Perform root cause analysis and providing a permanent fix to the problems identified.
• Involved in presenting inductions to new joiners in the project.
• Ensure availability of document/code for review.
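The data-standardization UDFs mentioned above can also be written as streaming scripts invoked from Hive's `TRANSFORM` clause; a sketch that trims and upper-cases one field of a tab-separated row, where the field layout is an assumption for illustration:

```python
# Sketch of a Hive streaming "UDF": standardize one column of a TSV row.
# The three-column layout (id, name, state) is assumed, not the real schema.
def standardize(line):
    """Trim whitespace and upper-case the second tab-separated field."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 2:
        fields[1] = fields[1].strip().upper()
    return "\t".join(fields)

# In production this would loop over sys.stdin and print each standardized
# row, invoked from Hive roughly as:
#   SELECT TRANSFORM(id, name, state) USING 'python standardize.py' ...
row = standardize("101\t  john doe \tNY")
```

Built-in Java UDFs are faster for hot paths; streaming scripts like this trade speed for quick iteration on cleansing rules.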
Environment:
Hadoop, Java, Hive, Sqoop, Spark SQL, Oozie, UNIX, MySQL, MapReduce, YARN, Kafka
Senior ETL Developer
Walmart - Bentonville, AR 01/11–12/11
Project 3 - Tax and Treasury project
Tax and Treasury project, which is part of Walmart financial process. This project is responsible to read the Walmart sales database and the credit card transactions. Then, it splits the transactions based on the card type and generates the settlement files for the respective providers. The project is critical as it deals with millions of dollars on a daily basis. The platform is built on Hadoop ecosystem with HDFS/HBase being the primary data storage.
Responsibilities:
Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
Extensively involved in Design phase and delivered Design documents.
Worked on analyzing the Hadoop cluster and different Big Data components including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, and Sqoop.
Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
Imported and exported data into HDFS and Hive using Sqoop.
Migration of 100+ TBs of data from different databases (Oracle, SQL Server) to Hadoop.
Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
Experienced in defining job flows.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Experienced in managing and reviewing the Hadoop log files.
Used Pig as ETL tool to do Transformations with joins and pre-aggregations before storing the data onto HDFS.
Responsible for developing data pipelines from different sources
Utilized the Cloudera Distribution of Apache Hadoop.
Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
Worked on Oozie workflow engine for job scheduling.
Involved in Unit testing and delivered Unit test plans and results documents.
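Structuring raw logs into tabular form, as in the Hive log-parsing jobs above, amounts to a per-line regex extraction; a sketch against an assumed transaction-log layout (the real Walmart log format is not shown here):

```python
import re

# Assumed log layout for illustration:
#   "2011-06-01 12:00:05 INFO checkout txn=123 amount=19.99"
LOG_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) (?P<event>\w+) txn=(?P<txn>\d+) amount=(?P<amount>[\d.]+)"
)

def parse_line(line):
    """Extract one log line into named fields; None if the line doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

rec = parse_line("2011-06-01 12:00:05 INFO checkout txn=123 amount=19.99")
```

Once lines are reduced to named fields like this, loading them into a Hive table (or emitting them from a `TRANSFORM`/streaming step) makes the log data queryable with plain HiveQL.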
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Hue, Java (JDK 1.6), SQL, Oracle, Ambari, Sqoop, Flume, Oozie, Eclipse.
Senior ETL Developer
Xerox Corp - Atlanta, Georgia 09/09–12/10
Project 4 - Alaska Medicaid Management Information Systems
Alaska MMIS involves various testing performed by the Functional, SIT, and UAT teams to meet the business criteria for the MMIS Enterprise environment. The Alaska Conversion team converts the Legacy data into Enterprise System data from 8 functional areas in the ACS Phase 1 project, which covers coding and testing. After loading the data into the development environment, the Conversion team extracts it and deploys it to the various testing environments with the help of a database administrator.
The Conversion team reconciles the data between the development and testing environments and requests the Functional and Testing teams to validate it. The Functional and Testing teams analyze and validate the data against their business rules and track defects in ClearQuest. The Conversion team analyzes each defect and prepares its resolution. Additional requirements, and requirements missed in ACS Phase 1, are added in ClearQuest as Change Requests (CRs); the Conversion team analyzes the CR, codes, tests, and prepares the resolution.
Responsibilities:
Studied the client requirement and design, and prepared project estimation
Performed the feasibility study, impact analysis, prepared the high and low level design documents, and detailed technical design document
Developed mappings and workflows, prepared of unit test case, unit test data and data validation
Validated legacy source files provided by the client prior to Job execution by using Informatica Power Exchange
Performed Data cleansing in the source and loaded into staging tables for each data conversion
Ownership of the Provider Enrollment Portal (PEP) module – successful implementation of PEP Go-Live
End to End development of the claims system of Medical Claims, Pharmacy Claims, 837 and 835 Claims
Before data conversion, identified risks in the various environments, defined the probability, severity, and impact of each risk, and shared them with management
Completed migration of ETL Informatica code from version 8.5.1 to 9.0.1 in the UNIX environment.
Performed impact analysis of change requests based on revised requirements and implemented the CR
Identified the Risks, Mitigation and Contingency plan with target dates
Monitored and revisited risks on a weekly basis and categorized them by severity
Prepared the balancing report identifying areas where record counts grow or fall against business requirements, helping the client substantiate the data and amend the tangible requirements
Responsible for performing the Unit Testing and Integration Testing after implementation of Change Requests
Followed up with the respective POCs on environment issues/defects and on the resolution of the defects
Performed the pre-conversion activities as well as the data conversion runs for all the functional areas
Responsible for the data extract process and delivered the data to SIT and FIT testing environments in the client level
Prepared Oracle SQL queries and implemented the changes in the Development environment
Validated the data in the Development environment and requested the SME to validate it there as well
Prepared the data release document, including the data extract from the Development environment and the Oracle SQL queries that modify the production data, and created a request in the Alaska Release Management Portal to deploy the data to Production
Successful implementation of Alaska MMIS Data Conversion - Go Live
Prepared balancing report for Go-Live data across various functional areas
Prepared the Go-Live documents and shared them with the onsite team, and maintained documents for other US states for future reference
Environment: Informatica Power Center 9.0.1/9.1.0, Power Exchange 8.6.1/9.0.1, Oracle 11g, PL/SQL, UNIX Shell script, Control-M
ETL Developer
WellPoint - Worthington, Ohio 04/08–08/09
Project 5 - BH and Life and Disability Development
The Operational Data Store (ODS) is a central data store for all Life and Disability (L&D) data. Data from various functional areas such as membership and billing, claims, sales, and quote tracking is loaded into the ODS. This data can be used by all L&D business functions for their non-transactional and decision-support reporting needs. Developed ETL code and loaded the data into the ODS from the above-mentioned source systems.
Responsibilities:
Created design specification documents, developed functional and technical specification documents
Used Informatica Power Center to create mappings, sessions and workflows for populating the data into the dimension, facts, and lookup tables simultaneously from different source systems
Used various Informatica transformations, such as Source Qualifier, Expression, Lookup, Update Strategy, Filter, Rank, Normalizer, Router, and Joiner.
Peer review of ETL coding, prepared Unit Test Case and Unit Test Data
Unit testing and defect resolutions, Resolved issues/bugs raised by QA team
Worked with the Informatica administration team on installation of Power Center Server and Power Center Client, creating and configuring services, Power Center Repository Service administration, managing domain folders, users, permissions, and logs, repository management, repository security, and folder setup.
Extensively transformed the existing PL/SQL scripts into stored procedures to be used by Informatica mappings with the help of Stored Procedure Transformations.
Analyzed Session Log files in case the session failed to resolve errors in mapping or session configurations.
Used debugger to test the mapping and fixed the bugs.
Created effective test cases and performed unit testing for the respective ETL mappings in Informatica.
Performed Integration testing to ensure the successful execution of data load process.
Monitored batch jobs running in production and UAT regions with workflow
Worked on incident tickets raised by users in Remedy (a tool to raise or view tickets)
Performed problem analysis, debugging and coding as part of new development
Participated in Systems Testing (SIT), User Acceptance Testing (UAT), data analysis and troubleshooting for existing products
Environment: Informatica Power Center 8.1/8.6.1, Teradata, TPT, MLOAD, UNIX Shell Scripting, Windows XP