
Hadoop Administrator

Location:
Plano, TX
Salary:
140K
Posted:
July 22, 2016


Resume:

Kancharla Venkatesh, Cloudera Hadoop Administrator

Plano, Texas

Phone: 469-***-**** acvtp6@r.postjobfree.com

Summary

9+ years of demonstrated experience in the retail and telecom industries as an ETL (Informatica), UNIX/Linux and Hadoop administrator, including 3 years of experience with the Hadoop ecosystem installing and configuring Hadoop components in existing clusters.

Involved in various stages of the Software Development Life Cycle (SDLC) such as requirement analysis, design, development, testing, implementation and maintenance.

Expert in responsibilities associated with Data such as Data analysis, Data validation, Data modeling and Data cleaning.

Hands-on experience in business and data analysis, including gathering user requirements, developing technical specifications and creating data models.

Adept in creating and maintaining documentation related to Technical design and specifications, Business rules, Data mappings, ETL processes and Testing.

Vast knowledge of Business Intelligence (BI), data warehousing, dimensional modeling, design methodologies and hybrid concepts.

Proficient in ETL development with Informatica PowerCenter (Admin, Designer, Workflow Manager, Workflow Monitor, Repository Manager, Metadata Manager) for extracting, cleaning, managing, transforming and loading data.

Good knowledge of UNIX shell scripting and application maintenance on UNIX platforms.

Knowledge of Hadoop ecosystem components such as MapReduce (with Python via Hadoop Streaming), Pig, Hive, HBase and Sqoop.
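As an illustration of the MapReduce-with-Python work noted above, a minimal sketch of a Hadoop Streaming job launched from the shell; the streaming jar path and the mapper.py/reducer.py scripts are hypothetical placeholders and will vary by cluster.

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -D mapreduce.job.name="wordcount_streaming" \
    -files mapper.py,reducer.py \
    -mapper "python mapper.py" \
    -reducer "python reducer.py" \
    -input /data/raw/weblogs \
    -output /data/out/wordcount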

Trained by MapR on HBASE, HIVE and HDFS ecosystems.

Experience in Hadoop administration; responsibilities include software installation, configuration, upgrades, backup and recovery, cluster setup, daily cluster performance monitoring, and keeping the cluster up and running in a healthy state.

Experience in managing the Hadoop infrastructure with Cloudera Manager

Strong experience in installation and configuration of the Cloudera Distribution of Hadoop (CDH 5).

Hands-on experience using Hadoop technologies such as HDFS, Hive, Sqoop, Impala, Flume and Solr.

Hands-on experience writing MapReduce jobs through Hive and Pig.

Experience importing and exporting data between external systems and the Hadoop file system using Sqoop.

Experience creating databases, tables and views with HiveQL, Impala and Pig Latin.
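A minimal sketch of the kind of HiveQL DDL run from the shell for such objects; the database, table and HDFS path names are hypothetical.

hive -e "
CREATE DATABASE IF NOT EXISTS sales_db;
CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.daily_sales (
    store_id INT,
    item_id  INT,
    amount   DECIMAL(10,2),
    sale_dt  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/warehouse/daily_sales';
CREATE VIEW IF NOT EXISTS sales_db.v_store_sales AS
SELECT store_id, SUM(amount) AS total_amount
FROM sales_db.daily_sales
GROUP BY store_id;"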

Strong knowledge of Hadoop and Hive’s analytical functions.

Good experience with multiple Hadoop distributions (Cloudera, MapR, etc.).

Good knowledge of information retrieval concepts.

Accomplished in resolving performance issues and implementing complex business rules by creating reusable transformations and robust mappings/mapplets.

Extensive experience working with RDBMSs such as Oracle and MS SQL, as well as non-relational data sources such as flat files and XML.

Proficient in SQL and PL/SQL especially writing complex queries and procedures.

Well versed in working with and developing reporting applications in OBIEE.

Skilled in designing UML diagrams using tools such as MS Visio for Entity Relationship modeling, component diagrams, class diagrams and flowcharts.

Vast practice in preparing and working with test scenarios, test cases and debugging. Experienced in developing definitions and processes for test phases including unit test, product test, integration test and system test.

Led a team of 10 members offshore, managing day-to-day planning, operations and problem solving to meet service levels and production targets.

Technical Skills

Languages: SQL, PL/SQL and UNIX shell scripting

Operating systems: UNIX/Linux, Windows

Databases: DB2, Oracle and Teradata

Tools: DB2 CLP, Teradata SQL Assistant, TOAD, SQL*Navigator, BMC Remedy

Special software: Informatica PowerCenter 9.1 and Oracle Business Intelligence Enterprise Edition 11g (OBIEE)

Big data technologies: Hadoop, HDFS, HBase, Sqoop, Pig, Hive, Flume, Solr

Products: DCM (Teradata), ReSA (Oracle)

Certifications:

IBM DB2 Certified Associate

IBM Certified Database Administrator - DB2 9.7 for Linux, Unix and Windows

Professional Experience

Client: AT&T, Plano, TX March 2014 to Present

Role: Senior Hadoop Administrator

AT&T is an American multinational telecommunications corporation. It is the largest provider of both mobile and landline telephone service and also provides broadband and subscription television services. As one of the largest telecommunications providers, AT&T holds a huge volume of customer data that can be analyzed and taken advantage of. Data about mobile network users is highly valuable to consumer marketing professionals, so the US-based network operator is turning access to and collaboration on its data into a new business service. Ensuring secure data sharing while easing access and use of the data requires good data management, which involves data aggregation from multiple sources. AT&T has created programmable interfaces to each of its data sets that ensure read-only access to the data.

Environment: Hadoop, Map Reduce, HDFS, Hive, Pig, SQL, Cloudera Manager, Sqoop, Flume, Oozie

Responsibilities:

Deployed and managed multi-node development, testing and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, HBase, ZooKeeper) using Cloudera Manager.

Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.

Experience in benchmarking, performing backup and recovery of NameNode metadata and data residing in the cluster.
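A minimal sketch of a NameNode metadata backup of the sort described here; the backup directory is a hypothetical path.

hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace                                  # merge edits into a fresh fsimage
hdfs dfsadmin -fetchImage /backup/namenode/$(date +%Y%m%d)    # copy the latest fsimage off-cluster
hdfs dfsadmin -safemode leave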

Experience in performing minor and major upgrades, commissioning and decommissioning of data nodes on Hadoop cluster
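A minimal sketch of decommissioning a DataNode outside Cloudera Manager, assuming dfs.hosts.exclude points at /etc/hadoop/conf/dfs.exclude; the hostname is hypothetical.

echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes                      # begin decommissioning
hdfs dfsadmin -report | grep -A 2 datanode07     # watch until the node shows Decommissioned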

Involved in the installation of CDH5 and the upgrade from CDH4 to CDH5.

Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
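A minimal sketch of submitting such an Oozie workflow; the NameNode, ResourceManager, Oozie URL and workflow application path are hypothetical.

cat > job.properties <<'EOF'
nameNode=hdfs://nn01.example.com:8020
jobTracker=rm01.example.com:8032
oozie.wf.application.path=${nameNode}/user/etl/workflows/hive_pig_wf
EOF
oozie job -oozie http://oozie01.example.com:11000/oozie -config job.properties -run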

Strong knowledge in configuring Name Node High Availability.
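A minimal sketch of the NameNode HA state checks and manual failover that go with this configuration; nn1/nn2 are hypothetical NameNode IDs from hdfs-site.xml.

hdfs haadmin -getServiceState nn1      # expect "active" or "standby"
hdfs haadmin -getServiceState nn2
hdfs haadmin -failover nn1 nn2         # make nn2 the active NameNode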

Setting up automated 24x7 monitoring and escalation infrastructure for Hadoop cluster using Nagios and Ganglia.

Analyzed the client's existing Hadoop infrastructure, identified performance bottlenecks and provided performance tuning accordingly.

Worked with Sqoop in Importing and exporting data from different databases like MySQL, Oracle into HDFS and Hive.
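A minimal sketch of the Sqoop import/export pattern used for this kind of work; the JDBC URLs, credentials and table names are hypothetical.

# Import a MySQL table straight into a Hive table
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/orders_db \
  --username etl_user -P \
  --table orders \
  --hive-import --hive-table staging.orders \
  --num-mappers 4

# Export aggregated results from HDFS back to the relational side
sqoop export \
  --connect jdbc:mysql://dbhost.example.com:3306/reports_db \
  --username etl_user -P \
  --table order_summary \
  --export-dir /user/hive/warehouse/reports.db/order_summary \
  --input-fields-terminated-by '\001'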

Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop

Defining job flows in Hadoop environment using tools like Oozie for data scrubbing and processing.

Experience in configuring Zookeeper to provide high availability and Cluster services coordination.

Loading logs from multiple sources directly into HDFS using tools like Flume.
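A minimal sketch of a Flume agent configuration that tails a web server log into HDFS; the agent name, log path and NameNode host are hypothetical.

cat > /etc/flume-ng/conf/weblog-agent.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://nn01.example.com:8020/data/weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
EOF
flume-ng agent --name a1 --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/weblog-agent.conf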

Experienced in importing and exporting data from a relational database to HDFS using Apache Sqoop.

Worked on disaster recovery planning for the Hadoop cluster.

Strong knowledge on Hadoop HDFS architecture and Map-Reduce framework.

Set up rack-aware configuration for quick availability and processing of data.
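A minimal sketch of the kind of topology script behind rack awareness (referenced by net.topology.script.file.name in core-site.xml); the subnet-to-rack mapping is hypothetical.

#!/bin/bash
# rack-topology.sh: map each host/IP passed in by the NameNode to a rack
for host in "$@"; do
  case "$host" in
    10.10.1.*) echo "/dc1/rack1" ;;
    10.10.2.*) echo "/dc1/rack2" ;;
    *)         echo "/default-rack" ;;
  esac
done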

Understood the security requirements for Hadoop and integrated it with a Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, and managing principals and keytabs.
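A minimal sketch of the KDC-side steps implied here, creating service principals and a keytab; the realm, hostnames and keytab path are hypothetical.

kadmin.local -q "addprinc -randkey hdfs/nn01.example.com@EXAMPLE.COM"
kadmin.local -q "addprinc -randkey HTTP/nn01.example.com@EXAMPLE.COM"
kadmin.local -q "xst -k /etc/security/keytabs/hdfs.keytab hdfs/nn01.example.com@EXAMPLE.COM HTTP/nn01.example.com@EXAMPLE.COM"
klist -kt /etc/security/keytabs/hdfs.keytab    # verify the keytab entries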

Good knowledge of the MapReduce framework, including the MR daemons, the sort and shuffle phase, and task execution.

Worked with Puppet for automated deployments.

Good experience troubleshooting production-level issues in the cluster and its functionality.

Client: AT&T, Plano, TX Mar 2012 to Feb 2014

Role: Hadoop Developer

Description: AT&T (see the client description above).

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse

Responsibilities:

Responsible for building scalable distributed data solutions using Hadoop

Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster

Setup and benchmarked Hadoop/HBase clusters for internal use

Developed Simple to complex Map/reduce Jobs using Hive and Pig

Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms

Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop

Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior

Used UDFs to implement business logic in Hadoop

Continuous monitoring and managing the Hadoop cluster using Cloudera Manager

Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required

Installed Oozie workflow engine to run multiple Hive and Pig jobs

Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team

Client: JCPenney Inc., Plano, TX Feb 2011 to Mar 2012

Role: ETL & UNIX Developer

J.C. Penney Company Inc. is a chain of American mid-range department stores based in Plano, Texas, a suburb north of Dallas. The company operates 1,093 department stores in 49 of the 50 U.S. states (all except Hawaii) and Puerto Rico, plus 6 JCPenney Mexico stores controlled by Mexican capital. J.C. Penney operates through stores, online and catalog sales merchant offices nationwide in many small markets.

Project: DataMart development for Executive Dashboard

Executive Dashboard provides key metrics on sales, pricing and traffic to top-level executives at the company, division and regional levels. The objective of this project is to build a data mart by extracting data from various systems and integrating traffic information with them. Traffic information is provided by third-party companies such as ShopperTrak and IBM Coremetrics, for which secured paths must be established. Designed the data mart to report flash sales and flash traffic information.

Environment: UNIX Shell scripting, Informatica PowerCenter Client 9.1, SVN, DB2

Key Responsibilities / Achievements:

Understand and document business processes, collaborate with business users in requirement analysis, functional design, and technical design.

Establish a secured connection with third-party systems to bring in traffic information every 15 minutes.
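A minimal sketch of the sort of scheduled secure pull that supports this; the host, key file, paths and cron schedule are hypothetical.

#!/bin/sh
# pull_traffic.sh: fetch third-party traffic files over SFTP; scheduled via cron:
#   */15 * * * * /apps/etl/bin/pull_traffic.sh >> /apps/etl/logs/pull_traffic.log 2>&1
sftp -i /home/etl/.ssh/traffic_rsa vendor@sftp.example.com <<'EOF'
cd /outbound/traffic
get traffic_*.csv /apps/etl/inbound/traffic/
EOF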

Design a datamart to integrate data from various systems like Stock Ledger, Pricing and EDW (Enterprise Data Warehouse).

Create the functional specification documents for Promote Inbound and Outbound interfaces.

Develop test plans, test cases, test scripts, and test validation data sets for INFORMATICA integration/ETL processes.

Develop data management, information governance strategy and ETL standards documents.

Perform peer reviews on Architecture, Design, Code, and standards documentations for ETL processes.

Analyze and design the implementation for different business rules in processing the Events/Offers.

Build UNIX shell scripts to automate the ETL processes.
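A minimal sketch of a shell wrapper of this kind around Informatica's pmcmd; the domain, service, folder and workflow names are hypothetical.

#!/bin/sh
# run_wf_daily_sales.sh: start an Informatica workflow and alert on failure
pmcmd startworkflow -sv INT_SVC_PRD -d DOM_PRD \
  -u "$INFA_USER" -pv INFA_PASSWD \
  -f FLD_EXEC_DASHBOARD -wait wf_LOAD_DAILY_SALES
rc=$?
if [ $rc -ne 0 ]; then
  echo "wf_LOAD_DAILY_SALES failed with return code $rc" \
    | mailx -s "ETL failure: wf_LOAD_DAILY_SALES" etl_support@example.com
  exit $rc
fi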

Work on troubleshooting, performance tuning and performance monitoring for enhancement of Informatica jobs.

Develop common processes for error handling and error reporting.

Utilize standard corporate tools to record changes, service requests and problem managing activities for the purpose of tracking.

Control the quality of deliverables and Risk Management Planning.

Daily monitor and track service levels during warranty period.

Provide support in post-deployment phase and involve in project transition to production support.

Provide production support and implement the changes to improve system performance.

Client: JCPenney Inc., Plano, TX Jan 2010 to Feb 2011

Role: ETL & UNIX Developer

Project: Enterprise Data Warehouse (EDW)

The enterprise requires weekly Demand through Gross Sales, Gross Profit and Inventory information at the subdivision/item/SKU level for the direct channel. During this phase of the project, the EDW (Enterprise Data Warehouse) focuses primarily on three work streams, Demand, Inventory and Gross Profit, and on the population of data into the EDW at the lowest level of granularity. Provide daily/weekly/month-to-date/year-to-date, this-year and previous-year actual data with drilldown capability for Demand, Alternate Sales, Omissions and Cancellations.

Environment: UNIX Shell scripting, Informatica PowerCenter Client 9.1, SVN, DB2

Key Responsibilities / Achievements:

Populate foundation data within the EDW in order to support weekly downstream data requirements

Leverage contributions of the DSAS (Direct Sales Accounting System) project

Provide the infrastructure for reporting.

Build reconciliation mechanism to ensure no discrepancy in data from source and target systems.
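A minimal sketch of a source-to-target row-count reconciliation of the kind described; the database, schema and table names are hypothetical.

#!/bin/sh
# recon_counts.sh: compare staging and EDW row counts for today's load
db2 connect to EDWDB >/dev/null
db2 -x "SELECT COUNT(*) FROM STG.SALES_TXN   WHERE LOAD_DT = CURRENT DATE" > /tmp/src_cnt.txt
db2 -x "SELECT COUNT(*) FROM EDW.F_SALES_TXN WHERE LOAD_DT = CURRENT DATE" > /tmp/tgt_cnt.txt
db2 connect reset >/dev/null
src_cnt=$(tr -d ' ' < /tmp/src_cnt.txt)
tgt_cnt=$(tr -d ' ' < /tmp/tgt_cnt.txt)
if [ "$src_cnt" -ne "$tgt_cnt" ]; then
  echo "Row count mismatch: source=$src_cnt target=$tgt_cnt" >&2
  exit 1
fi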

Involve in building the strategy to implement a customer and transaction-level data warehouse on DB2.

Document user requirements and translate requirements into system solutions.

Architect Star & Snowflake based logical & physical data models for Data Warehouse systems using data modeling tools such as Erwin.

Involve in creating the functional specification documents for ETL interfaces.

Design, develop, deploy and support integration processes across the enterprise by utilizing Informatica V9.1

Develop test plans, test cases, test scripts and test validation data sets for Data Mart, Data Warehouse integration/ETL processes.

Perform software testing including Unit Testing, Functional Testing, Database Testing, Load Testing, Performance Testing, and User Accepting Testing.

Provide platform for testing team to perform White/Black Box Testing, System Testing, Regression Testing, Integration Testing, End to End Testing.

Design and Implement ETL processes for History load and Incremental loads of EDW, Customer and Transaction level data warehouse.

Document all the interface processes in current data warehouse system and translate them into new ETL processes using Informatica.

Involve in Data Assessment to identify key data sources and run the system extracts and queries.

Perform Data cleansing activities to improve the data quality.

Automate all the new ETL jobs using UNIX shell scripts and add data validation checks including business rules and referential integrity checks.

Create dimension and fact tables in DB2 and perform bulk data loads.

Perform troubleshooting, performance tuning and performance monitoring for enhancement of jobs.

Maintain warehouse metadata, naming standards and warehouse standards for future application development.

Provide support in post-deployment phase and involve in project transition to production support.

Train production support team to support the application.

Client: JCPenney Inc., Plano, TX Mar 2009 to Jan 2010

Role: Program Analyst

Project: Application Support and Maintenance

Marketing and Business Intelligence (MBI) constitutes a centralized, refined, retail-rules-applied database repository (Data Store) that stores key performance indicator information. MBI has different applications which create reports using the standard reporting tools (MicroStrategy, OBIEE and .NET) for the end users (store managers, executives and regional managers).

Environment: UNIX Shell scripting, Informatica PowerCenter Client 9.1, SVN, DB2, Oracle, BI Publisher, Microstrategy.

Key Responsibilities / Achievements:

Application Maintenance and enhancements of BI application.

Gathering Enhancement requirements.

System Analysis and Design.

Demonstrate screens to users as functionality becomes available.

Involve in various administration activities.

Ensure the Production problems are resolved within SLA for the global support project

Ensure the overall quality of the project

Resolve Issues/Disputes in the application

Train and mentor the team to improve performance and help them provide better support

Provide value addition to the customer by suggesting changes which would increase the usability of the system.

Own INFORMATICA ETL jobs’ performance enhancements.

Perform effective Onsite-Offshore Communication.

Client: JCPenney Inc., Plano, TX Apr 2007 to Mar 2009

Role: UNIX and Teradata BTEQ Program Developer

Project: Demand Chain Management Application Upgrade (Teradata DCM)

This project is to upgrade the JCPenney Demand Chain Management (DCM) application from the current functionality used in the DCM R3.2.3 to the comparable functionality in DCM R4.3.5. JCPenney currently runs the DCM application for the stores and Subs that are on Replenishment. The project includes implementation of additional functionality introduced with the new DCM R4.3.5, as well as any specific new functionality requested by the business users

Environment: UNIX Shell scripting, Teradata BTEQ.

Key Responsibilities / Achievements:

• Set up appropriate Test environments to validate R4.3.5 upgrade

• Set up the DCM R4.3.5 Production environment

• Deliver business and technical training to the JCPenney resources

• Perform benchmark comparisons to determine batch window and CPU utilization between R3.2.3 running on TD 12.0.03 and R4.3.5 running on TD 14.0.

• Evaluate and develop/modify required ETL in order to satisfy DCM R4.3.5 requirements

• Execute System tests

• Support JCPenney User Acceptance Test

• Enable new DCM R4.3 functions as per business requirements

• Develop and Implement additional functions required and approved as part of the Change control process

• Data analysis and issue identification

• Propose architectural design changes to improve data warehouse performance

• Visualize a data architecture design from high level to low level, and design performance objects for each level

• Troubleshooting database issues related to performance, queries, stored procedures

• Create ER diagrams and conceptual, logical, physical data models

• Fine-tune the existing scripts and processes to achieve increased performance and reduced load times for faster user query performance

• Accountable for architect-related deliverables to ensure all project goals are met within the project timelines

• Perform mapping between source and target data, as well as logical-to-physical model mapping and mapping from third normal form to dimensional (presentation layer)

• Create, validate and update the data dictionary and analyse documentation to make sure that the information captured is correct

• Design logical and physical data models using the Erwin data modelling tool and Visio

• Architecture and design support to provide solutions for business-initiated requests/projects

• Write Teradata SQL queries for joins and any modifications to the tables

• Create customized MLoad scripts on the UNIX platform for Teradata loads

• Provide design for CDC implementation for real-time data solutions

• Interact with business to collect critical business metrics and provide solutions to certify data for business use

• Analyse and recommend solutions for data issues

• Write Teradata BTEQ scripts to implement the business logic
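A minimal sketch of a BTEQ job of this kind driven from a UNIX shell script; the TDPID, credentials and table names are hypothetical.

#!/bin/sh
# load_store_sales.sh: refresh a DCM staging table via BTEQ
bteq <<'EOF'
.LOGON tdprod/etl_user,etl_password
.SET ERROROUT STDOUT
DELETE FROM DCM_WRK.STORE_SALES_STG;
INSERT INTO DCM_WRK.STORE_SALES_STG
SELECT store_nbr, sku_nbr, SUM(sales_amt)
FROM   DCM_DB.STORE_SALES_DLY
WHERE  sales_dt = CURRENT_DATE - 1
GROUP  BY store_nbr, sku_nbr;
.IF ERRORCODE <> 0 THEN .QUIT 8
.LOGOFF
.QUIT 0
EOF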

Environments & Technologies worked with:

Cloudera CDH4.3 & CDH5, Hadoop, Map Reduce, HDFS, Hive, Impala, Tableau, INFORMATICA, Unix Shell Scripting, DB2, OBIEE, Teradata, SQL, Teradata SQL Assist, SDS version Control, BMC Remedy


