
Manoj Kumar

Hadoop Administration

Email: *******@*******.***

Phone: 269-***-****

SUMMARY:

7+ years of professional IT experience, including around 4 years of hands-on experience in Hadoop administration using Cloudera and Hortonworks distributions on large distributed clusters.

Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.

Experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Spark, Drill, Solr, Flume, HCatalog, HBase, ZooKeeper) using MapR Control System, Cloudera Manager, and Hortonworks Ambari.

Good knowledge of Apache Hadoop cluster planning, including choosing the hardware and operating systems to host an Apache Hadoop cluster.

Well versed in installing, configuring, managing, and supporting Hadoop clusters using various distributions such as Apache Hadoop, Cloudera CDH, and Hortonworks HDP.

Experience in benchmarking, backup, and disaster recovery of NameNode metadata.

Experience in performing minor and major upgrades of Hadoop clusters.

Experience with AWS cloud services such as EC2, VPC, S3, IAM, and CloudWatch.

Experience in deploying Hadoop 2.0 (YARN).

Capable of processing large sets of structured, semi-structured, and unstructured data, and supporting systems application architecture.

Experience in managing and reviewing Hadoop log files.

Experience in securing Hadoop clusters using Kerberos and Sentry.

Experience in administering Linux systems to deploy Hadoop clusters and monitoring them using Ambari, Nagios, and Ganglia.

Skilled in setting up Apache Knox, Ranger, and Sentry.

Provided support to data analysts in running Pig and Hive queries.

Experience in minor and major upgrades, commissioning and decommissioning of data nodes on Hadoop cluster.

Experience as a system administrator on Linux (CentOS, Ubuntu, Red Hat).

Extensive experience in Linux administration activities on RHEL and CentOS.

Good knowledge of Java, J2EE, HTML, JSP, Servlets, CSS, JavaScript, and XML.

Familiar with Java virtual machine (JVM) and multi-threaded processing.

Extensive experience in working with customers to gather the information required to analyze and provide data or code fixes for technical problems, and in providing technical solution documents for users.

Also involved in upgrading CDH4 to the latest version, including upgrading Cloudera Manager for CDH5.

Technical Skills:

Hadoop Ecosystem: Hive, Pig, Sqoop, MapReduce, Flume, Impala, Oozie, Sentry, Spark, ZooKeeper, Storm.

Hadoop Management: Ambari, Cloudera Manager, Hortonworks, AWS.

Hadoop Paradigms: MapReduce, YARN, High Availability.

Application Software: SSH, telnet, ftp, Terminal client and Remote Desktop Connection.

RDBMS: Oracle 10g/11g/12c, MS SQL Server 2000/2003/2008R2/2012, DB2, Teradata, MySQL.

Programming Languages: Linux/Unix shell scripting, Java, SQL, C/C++, Pig Latin, PL/SQL.

Monitoring and Alerting: CloudWatch, Ambari Metrics Collector, Nagios, AppDynamics, Ganglia.

Operating Systems: CentOS 5/6, Red Hat 6, Ubuntu Server 14.04 (Trusty), Windows Server 2012, Red Hat Enterprise Linux 3, 4.x, 5.x, 6.x (ES/AS/WS), HP-UX 10.x/11.

PROFESSIONAL EXPERIENCE

Client: CSL Behring, King of Prussia, Pennsylvania Sep 2016 – Present

Role: Sr. Hadoop Administrator

Environment: Hadoop HDFS, YARN, Hortonworks, SAP HANA Vora, ZooKeeper, Pig, Hive and HCatalog, Oozie, Sqoop, HBase, Spark, Hue, Zeppelin, Ranger, Kerberos, Knox, Atlas, Apache NiFi and MiNiFi.

Responsibilities:

Designed and implemented a Hadoop architecture for a 23-node cluster for staging and production with components such as HDFS, YARN, NameNodes, and DataNodes.

Configured an Oracle 12c database for Hive, Ranger, Oozie, and Hue metadata management and security.

Created a local YUM repository for Ambari, HDP, HDP-UTILS, and EPEL installation and update packages.
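
For illustration, a minimal sketch of such a local repository definition, assuming a hypothetical mirror host and CentOS 6 layout:

# Define a local Ambari mirror (host and path are hypothetical)
cat > /etc/yum.repos.d/ambari.repo <<'EOF'
[ambari]
name=Ambari local mirror
baseurl=http://repo-host.example.com/ambari/centos6/
gpgcheck=0
enabled=1
EOF
yum clean all && yum repolist   # confirm the new repo is picked up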

Pre-configured Hadoop clusters with passwordless SSH access and other networking prerequisites such as ports, FQDNs, iptables, user accounts, file permissions, and HTTP/FTP access for Hadoop installation.
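
A minimal sketch of the passwordless-SSH part of that setup, with hypothetical host names:

# Generate a key pair once on the management node
ssh-keygen -t rsa -b 2048 -N '' -f ~/.ssh/id_rsa
# Push the public key to every cluster host (names are hypothetical)
for host in nn1 nn2 dn1 dn2 dn3; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"
done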

Installed and deployed the Hadoop cluster with Hortonworks HDP 2.4.3 via Ambari 2.2.

Implemented NameNode High Availability on the Hadoop cluster in both stages to overcome the single point of failure.

Implemented NameNode backup using the Quorum Journal Manager (QJM) for high availability.

Involved in installing and configuring Kerberos to secure the Hadoop cluster and provide authentication for users.
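
As a sketch of the kind of principal and keytab setup involved (realm, host, and paths are hypothetical):

# Create a service principal and export its keytab on the KDC
kadmin.local -q "addprinc -randkey hdfs/nn1.example.com@EXAMPLE.COM"
kadmin.local -q "xst -k /etc/security/keytabs/hdfs.keytab hdfs/nn1.example.com@EXAMPLE.COM"
# Authenticate with the keytab and confirm the ticket
kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/nn1.example.com@EXAMPLE.COM
klist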

The analyzed data, mined from huge volumes of SAP data, was moved to HDFS using Sqoop.

Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components like Hive, Pig, HBase, Zookeeper and Sqoop.

Integrated SAP HANA Vora 1.3, which provides an in-memory processing engine, for data ingestion into HDFS using the Spark 1.6.3 execution framework.

Installed Apache NiFi and MiNiFi to make data ingestion fast, easy, and secure from the Internet of Anything with Hortonworks DataFlow.

Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability, for analyzing HDFS audit data.
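
A minimal sketch of submitting and checking such a job, assuming a hypothetical Oozie host and properties file:

# Submit a workflow and poll its status (URL and file names are hypothetical)
oozie job -oozie http://oozie-host.example.com:11000/oozie -config job.properties -run
oozie job -oozie http://oozie-host.example.com:11000/oozie -info <job-id>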

Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.

Client: Vanco Payment Solutions, Minneapolis, MN May 2015 – Aug 2016

Role: Sr. Hadoop Administrator

Environment: Hadoop HDFS, MapReduce, Hortonworks, Hive, Pig, Oozie, Flume, Sqoop, HBase.

Responsibilities:

Installed and configured Hortonworks HDP 2.2 with Hadoop 2.6 using Ambari.

Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop.

Responsible for building scalable distributed data solutions using Hadoop.

Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slot configuration.
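
A sketch of a typical DataNode decommission, assuming the exclude file referenced by dfs.hosts.exclude in hdfs-site.xml lives at a hypothetical path:

# Add the host to the exclude file and tell the NameNode to re-read it
echo "dn5.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes
# Watch the node drain before shutting it down
hdfs dfsadmin -report | grep "Decommission"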

Created HBase tables to store variable data formats of PII data coming from different portfolios.

Managed and reviewed Hadoop log files and debugged failed jobs.

Implemented Kerberos Security Authentication protocol for production cluster.

Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
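
A minimal sketch of such a Sqoop import (connection string, table, and column names are hypothetical):

# Pull an Oracle table straight into an HBase table
sqoop import \
  --connect jdbc:oracle:thin:@db-host.example.com:1521:ORCL \
  --username etl_user -P \
  --table SYSPRIN_INFO \
  --hbase-table sysprin \
  --column-family cf \
  --hbase-row-key SYSPRIN_ID \
  -m 4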

Implemented test scripts to support test driven development and continuous integration.

Worked on tuning the performance of Pig queries.

Managed the design and implementation of data quality assurance and data governance processes.

Worked with Infrastructure teams to install operating system, Hadoop updates, patches, version upgrades as required.

Backed up data on a regular basis to a remote cluster using distcp.
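
A minimal sketch of such a copy, with hypothetical NameNode addresses and paths:

# Incremental copy to the remote cluster, preserving file attributes
hadoop distcp -update -p hdfs://prod-nn.example.com:8020/data hdfs://backup-nn.example.com:8020/data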

Responsible for managing data coming from different sources.

Involved in data analysis projects using Map Reduce on the HORTONWORKS DATA PLATFORM.

Provided cluster coordination services through ZooKeeper.

Loaded datasets into Hive for ETL operations.

Automated all jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.

Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Implemented the Fair Scheduler to allocate a fair amount of resources to small jobs.
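
A minimal sketch of what such an allocation might look like (queue name and path are hypothetical; the scheduler itself is enabled via yarn.resourcemanager.scheduler.class in yarn-site.xml):

# Write a Fair Scheduler allocation file with a guaranteed slice for small jobs
cat > /etc/hadoop/conf/fair-scheduler.xml <<'EOF'
<allocations>
  <queue name="small_jobs">
    <minResources>2048 mb,2 vcores</minResources>
    <weight>1.0</weight>
  </queue>
</allocations>
EOF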

Assisted the BI team by partitioning and querying the data in Hive.

Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.

Client: Phone.com, Poway, CA June 2014 – April 2015

Role: Hadoop Administrator

Environment: Cloudera, HDFS, ZooKeeper, Oozie, MapReduce, HBase, Flume, Sqoop, shell scripting, Cloudera Manager, Nagios, and Ganglia.

Responsibilities:

Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.

Day-to-day responsibilities included resolving developer issues, deploying code from one environment to another, providing access to new users, and delivering immediate solutions to reduce impact, documenting them to prevent future issues.

Experienced in adding and installing new components, and removing them, through Cloudera Manager.

Implemented and configured a quorum-based High Availability Hadoop cluster.

Installed and configured Hadoop monitoring and administration tools: Nagios and Ganglia.

Backed up data from the active cluster to a backup cluster using distcp.

Periodically reviewed Hadoop-related logs, fixed errors, and prevented future errors by analyzing warnings.

Hands-on experience working with Hadoop ecosystem components such as MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Flume.

Experience in configuring ZooKeeper to coordinate the servers in the cluster and maintain data consistency.

Experience in using Flume to stream data into HDFS from various sources.
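
A minimal sketch of a Flume agent of this kind, with hypothetical file paths and NameNode address:

# Define a simple agent: tail a log file into HDFS through a memory channel
cat > flume-agent.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://nn1.example.com:8020/flume/events
a1.sinks.k1.channel = c1
EOF
flume-ng agent -n a1 -f flume-agent.conf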

Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.

Installed the Oozie workflow engine to run multiple Hive and Pig jobs.

Worked on analyzing data with Hive and Pig.

Helped in setting up rack topology in the cluster.
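
For illustration, a tiny topology script of the sort HDFS calls (subnets and rack names are hypothetical; it is wired in via the topology script property in core-site.xml):

#!/bin/bash
# HDFS passes one or more IPs/hostnames; print one rack per argument
for node in "$@"; do
  case "$node" in
    10.0.1.*) echo "/rack1" ;;
    10.0.2.*) echo "/rack2" ;;
    *)        echo "/default-rack" ;;
  esac
done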

Upgraded the Hadoop cluster from CDH3 to CDH4.

Deployed a Hadoop cluster using CDH3 integrated with Nagios and Ganglia.

Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).

Deployed a Network File System (NFS) mount for NameNode metadata backup.

Performed cluster backups using distcp, Cloudera Manager BDR, and parallel ingestion.

Moved data from HDFS to a MySQL database and vice versa using Sqoop.

Client: Opera Solutions, New Jersey Feb 2013 – May 2014

Role: Hadoop Administrator

Environment: Amazon EC2, Apache Hadoop 1.0.1, MapReduce, HDFS, CentOS 6, HBase, Hive, Java (JDK 1.6), Eclipse, VMware ESX 5.1/5.5, Apache and Tomcat web servers, Oracle RAC 12c, Bash scripting, Red Hat Enterprise Linux 4/5, Solaris 9, Sun Fire V, DMX.

Responsibilities:

Implemented CDH Hadoop cluster on RHEL. Assisted with performance tuning and monitoring.

Installed all ecosystem components as part of the project.

Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

Successfully loaded all types of data from different versions of RDBMS and UNIX sources by creating HBase tables.

Supported the development of strategy and project plans, along with code and design analysis.

Worked on an open-source cluster computing framework based on Apache Spark.

Tracked all the daemons, along with Navigator data, in the cluster by creating usage reports.

Worked on analyzing the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.

Setup and benchmarked Hadoop / HBase clusters for internal use.

Assisted with data capacity planning and node forecasting.

Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.

Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.

Responsible for onboarding new users to the Hadoop cluster (creating a home directory for each user and providing access to the datasets).
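
A minimal sketch of the HDFS side of onboarding a user (user and group names are hypothetical):

# Create the home directory, hand it to the user, and restrict access
hadoop fs -mkdir /user/jdoe
hadoop fs -chown jdoe:analysts /user/jdoe
hadoop fs -chmod 750 /user/jdoe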

Added new DataNodes when needed and ran the balancer.
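
A sketch of the usual step after bringing a new DataNode online (the threshold value is illustrative):

# Rebalance block placement until nodes sit within 10% of average utilization
hadoop balancer -threshold 10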

Responsible for building scalable distributed data solutions using Hadoop.

Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.

Continuously monitored and managed the Hadoop cluster through Cloudera Manager and Cloudera health tests.

Upgraded the Cloudera Hadoop ecosystem components in the cluster using Cloudera distribution parcels.

Commissioned and decommissioned DataNodes in the cluster in case of problems.

Client: IGATE Global Solutions, Hyderabad, India May 2011 – Dec 2012

Role: Linux System Administrator

Environment: Linux, TCP/IP, LVM, RAID, XEN, Networking, Security, RPM, user management.

Responsibilities:

Built, installed, and configured Sun/HP/Dell servers from scratch with Solaris (10/8) and Linux (Red Hat 6.x, 5.x, 4.x) operating systems.

Performed network-based automated installations of operating systems using Jumpstart for Solaris and Kickstart for RHEL through TPM (Tivoli Provisioning Manager).

Installed and configured a Sendmail mail server.

Installed, configured, and maintained virtualization technologies such as VMware.

Design, development, and implementation of the package and patch management process.

Troubleshot VMs using VirtualCenter and the VMware Infrastructure Client.

Cloned and troubleshot VMware ESX hosts and guest servers.

Installed, set up, configured, secured, and maintained servers such as Active Directory, NFS, FTP, Samba, NIS, NIS+, LDAP, DHCP, DNS, SMTP/mail, Apache, and proxy servers in a heterogeneous environment.

Implemented software and hardware RAID.

Implemented LVM.

Scheduled backups using the rsync tool.
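
A minimal sketch of such a scheduled backup, with hypothetical paths and host:

# Nightly crontab entry: mirror /data to the backup host at 02:00
0 2 * * * rsync -az --delete /data/ backup-host.example.com:/backup/data/ >> /var/log/rsync-backup.log 2>&1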

Wrote shell scripts for system maintenance and server automation.

Created virtual machines in the vSphere Client.

Configured YUM.

Client: Twinix Infomedia Pvt. Ltd., Hyderabad, India Aug 2009 – March 2011

Role: Linux Administrator

Environment: Linux (Red Hat Enterprise, CentOS), Windows 2000/NT, HP, IBM, Solaris, Oracle 8i, Cisco routers/switches, Dell 6400, 1250, Sun E450, E250.

Responsibilities:

Installation and configuration of Red Hat Linux, Solaris, Fedora and CentOS on new server builds as well as during the upgrade situations.

Managed logs, including monitoring and cleaning old log files.

Administered RHEL 4.x and 5.x, including installation, testing, tuning, upgrading, and patching, and troubleshot both physical and virtual server issues.

Produced system audit reports covering number of logins, successes and failures, and running cron jobs.

Monitored system performance on an hourly or daily basis.

Remotely copied files using SFTP, FTP, SCP, WinSCP, and FileZilla.

Created user roles and groups to secure resources using local operating system authentication.

Experienced in tasks such as managing user accounts and groups and managing disks and file systems.

Installed Red Hat Linux using Kickstart and applied security policies to harden servers based on company policy.

Installed and configured Intrusion Detection Systems (IDS) such as Tripwire, Snort, and LIDS.

Configured and monitored the DHCP server.

Took backups using tar and recovered data in the event of data loss.
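
For illustration, the basic pattern (paths and dates are hypothetical):

# Create a dated, compressed archive of /etc
tar -czf /backup/etc-$(date +%F).tar.gz /etc
# Restore a chosen archive into a staging directory
tar -xzf /backup/etc-2010-06-01.tar.gz -C /restore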

Experience in writing bash scripts for job automation.

Documented the installation of third-party software.

Configured printers on Solaris and Linux servers and installed third-party software.

Maintained relationships with project managers, DBAs, developers, application support teams, and operational support teams to facilitate effective project deployment.

Managed system installation, troubleshooting, maintenance, performance tuning, storage resources, and network configuration to fit application and database requirements.

Responsible for modifying and optimizing backup schedules and developing shell scripts for it.

Performed regular installation of patches using RPM and YUM.

Maintained LVM, VxVM and SVM file systems along with NFS.


