Yaswanth
ID: *********.******@*****.*** Ph: 716-***-****
Hadoop Administrator
Professional Summary:
+Overall 8 years of experience in the IT industry, including 3+ years in Big Data Hadoop administration and 4 years in Linux ecosystems.
+Experience with the complete software development life cycle, including design, development, testing and implementation of moderately to highly complex systems.
+Hadoop cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
+Designed Big Data solutions for traditional enterprise businesses.
+Backup configuration and recovery from NameNode failures.
+Excellent command of backup, recovery and disaster recovery procedures, including implementing backup and recovery strategies for offline and online backups.
+Experience monitoring and troubleshooting Linux memory, CPU, OS, storage and network issues.
+Hands-on experience analyzing log files for Hadoop and ecosystem services and identifying root causes.
+Experience in commissioning, decommissioning, balancing and managing nodes, and tuning servers for optimal cluster performance.
+As an administrator, involved in cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies.
+Good experience setting up Linux environments: passwordless SSH, creating file systems, disabling firewalls, tuning swappiness, configuring SELinux and installing Java (see the example following this summary).
+Good experience planning, installing and configuring Hadoop clusters on Cloudera and Hortonworks distributions.
+Installed and configured Hadoop ecosystem components such as Pig and Hive.
+Experience importing and exporting data using Sqoop between HDFS and relational database systems/mainframes.
+Experience ingesting logs using Flume.
+Optimized performance of HBase/Hive/Pig jobs.
+Hands-on experience with ZooKeeper and ZKFC for managing and configuring NameNode failover scenarios.
+Expertise using Talend, Pentaho and Sqoop for ETL operations.
+Solid experience in Linux administration activities on RHEL and CentOS.
+Experience in deploying Hadoop 2.0 (YARN).
+Expertise in implementing enterprise level security using AD/LDAP, Kerberos, Knox, Sentry and Ranger.
+Extensive experience developing SOA middleware based on Fuse ESB and Mule ESB; configured Elasticsearch, Logstash and Kibana to monitor Spring Batch jobs.
+Diligently teamed with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
+Familiar with writing Oozie workflows and Job Controllers for job automation.
+Hands-on experience provisioning and managing multi-tenant Hadoop clusters on public cloud (Amazon Web Services EC2) and on private cloud infrastructure (the OpenStack cloud platform).
+Effective problem-solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines. Ability to learn and use new technologies quickly.
+Excellent written, verbal and interpersonal communication skills. Strong drive to learn and apply new technologies in real-world situations.
+Ability to work both in a team and independently. Ability to work in fast-paced environments.
+Highly proficient in formulating and refining product requirements
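A minimal sketch of the Linux preparation steps called out above (passwordless SSH, swappiness, firewall and SELinux); the hostname and values are illustrative only, assuming a RHEL/CentOS 7-style node:

    # generate a key pair and copy the public key to each node for passwordless SSH
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    ssh-copy-id -i ~/.ssh/id_rsa.pub admin@datanode01.example.com   # hypothetical host

    # lower swappiness and disable the firewall and SELinux (typical Hadoop prerequisites)
    sudo sysctl -w vm.swappiness=1
    echo 'vm.swappiness=1' | sudo tee -a /etc/sysctl.conf
    sudo systemctl stop firewalld && sudo systemctl disable firewalld
    sudo setenforce 0   # permissive until next reboot; edit /etc/selinux/config to persist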
KEY KNOWLEDGE, SKILLS & EXPERTISE
Operating Systems
Windows 7 Enterprise, Windows 8, Windows XP/95/98/2000/2010, Linux, CentOS, Ubuntu
Hadoop Tools & Distributions
HDFS, MapReduce, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Pentaho and Avro; Cloudera (CDH4/CDH5), Hortonworks, MapR
Frameworks & IDE Tools
Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x, Eclipse, NetBeans
Web Servers
WebLogic, Apache Tomcat, JBoss, Apache HTTP Server, AWS, Red Hat
Databases & NoSQL
Oracle 11g/10g, DB2, SQL Server, MySQL, HBase, MongoDB, Cassandra
Methodologies & Management Tools
Scrum, Waterfall & Agile methodologies; Puppet, Chef, Ansible
Scripting & Security
Shell Scripting, HTML, Python, Kerberos, Docker
Cluster Management Tools
HDP Ambari, Cloudera Manager, Hue, SolrCloud
Programming
C, C++, Core Java, PL/SQL.
Education:
Master's in Computer Science, Silicon Valley University, CA (2013)
Professional Work History
GE, San Ramon, CA (October 2015 to Present)
Hadoop Administrator
Responsibilities
Currently working as an administrator on the Cloudera (CDH 5.5.2) distribution for 4 clusters ranging from POC to PROD.
Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
Added/installed new components and removed them through Cloudera Manager.
Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
Monitored workload, job performance and capacity planning using Cloudera Manager.
Involved in analyzing system failures, identifying root causes and recommending courses of action.
Imported logs from web servers with Flume to ingest the data into HDFS.
Exported data from HDFS to relational databases with Sqoop. Parsed, cleansed and mined useful, meaningful data in HDFS using MapReduce for further analysis.
Fine-tuned Hive jobs for optimized performance.
Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
Worked on NoSQL databases including HBase, MongoDB and Cassandra.
Implemented multi-data center and multi-rack Cassandra cluster.
Configured internode communication between Cassandra nodes and client using SSL encryption.
Partitioned and queried the data in Hive for further analysis by the BI team.
Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, the HBase database and Sqoop.
Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
Monitored and reviewed the Solr servers.
Creating and deploying a corresponding SolrCloud collection.
Created and truncated HBase tables in Hue and backed up submitter IDs; configured and managed permissions for users in Hue.
Responsible for building scalable distributed data solutions using Hadoop.
Created and managed cron jobs.
Imported data from sources such as HDFS/HBase into Spark RDDs.
Implemented Ranger, Ranger plug-ins and the Knox security tools.
Implemented Kerberos Security Authentication protocol for existing cluster.
Worked with data delivery teams to set up new Hadoop users, including creating Linux users, setting up Kerberos principals and testing HDFS and Hive access (see the sketch following this section).
Researched applications in use that are unable to use TLS and evaluated standing up a proxy LDAP server to fulfill this requirement.
Integrated Impala to use the same file and data formats, metadata, security and resource management frameworks.
Implemented test scripts to support test driven development and continuous integration.
Worked on tuning the performance of Pig queries.
Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
Experience configuring Storm to load data from MySQL to HBase using JMS.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
Troubleshooting, debugging and fixing Talend-specific issues while maintaining the health and performance of the ETL environment.
Involved in running Hadoop jobs for processing millions of records of text data. Troubleshot build issues during the Jenkins build process. Implemented Docker to create containers for Tomcat servers and Jenkins.
Developed and Coordinated deployment methodologies (Bash, Puppet & Ansible).
Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
Environment: HDFS, MapReduce, Hive 1.1.0, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, CDH5, Apache Hadoop 2.6, Spark, Solr, Storm, Knox, Impala, Red Hat, MySQL and Oracle.
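A minimal sketch of the user-onboarding flow described in the responsibilities above (Linux account, Kerberos principal, HDFS home directory); the user name, realm and keytab path are hypothetical, assuming an MIT KDC and a valid ticket for the HDFS superuser:

    # create the Linux account on the gateway node
    sudo useradd -m analyst1                      # hypothetical user

    # create a Kerberos principal and export a keytab for the user
    sudo kadmin.local -q "addprinc -randkey analyst1@EXAMPLE.COM"
    sudo kadmin.local -q "xst -k /etc/security/keytabs/analyst1.keytab analyst1@EXAMPLE.COM"

    # provision an HDFS home directory as the HDFS superuser, then verify access as the new user
    sudo -u hdfs hdfs dfs -mkdir -p /user/analyst1
    sudo -u hdfs hdfs dfs -chown analyst1:analyst1 /user/analyst1
    kinit -kt /etc/security/keytabs/analyst1.keytab analyst1@EXAMPLE.COM
    hdfs dfs -ls /user/analyst1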
Deloitte, Houston, TX (Feb 2014 to August 2015)
Hadoop Administrator
Responsibilities
+Worked on setting up Hadoop cluster for the Production Environment.
+Responsible for implementation and ongoing administration of Hadoop infrastructure.
+Installed, configured and deployed a 50-node MapR Hadoop cluster for development and production.
+Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
+Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files (see the sketch following this section).
+Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
+Experience installing, configuring, supporting and monitoring Hadoop clusters using Apache and Cloudera distributions and AWS.
+Involved in architecting Hadoop clusters using major Hadoop Distributions - CDH3 & CDH4.
+Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
+Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
+Worked on installation of DataStax Cassandra cluster.
+Experience in projects involving movement of data from other databases to Cassandra with basic knowledge of Cassandra Data Modeling.
+Worked on a Hadoop CDH upgrade from CDH 3.x to CDH 4.x.
+Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
+Used Informatica Data Explorer (IDE) to find hidden data problems.
+Optimized the full-text search function by connecting MongoDB and Elasticsearch.
+Utilized the AWS framework for content storage and Elasticsearch for document search.
+Developed a framework for automated testing of Elasticsearch index validation using Java and MySQL.
+Created user-defined types to store specialized data structures in Cloudera.
+Wrote a technical paper and created slideshow outlining the project and showing how Cloudera can be potentially used to improve performance.
+Set up monitoring tools for Hadoop monitoring and alerting; monitored and maintained the Hadoop/HBase/ZooKeeper cluster.
+Wrote scripts to automate application deployments and configurations. Performed Hadoop cluster performance tuning and monitoring. Troubleshot and resolved Hadoop cluster-related system problems.
+As an administrator, followed standard backup policies to ensure high availability of the cluster.
+Involved in analyzing system failures, identifying root causes and recommending courses of action. Documented system processes and procedures for future reference.
+Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
+Screened Hadoop cluster job performance and performed capacity planning.
+Monitored Hadoop cluster connectivity and security, and managed and monitored Hadoop log files.
+Performed automation/configuration management using Chef, Ansible, and Docker based containerized applications.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, YARN, HBase, Sqoop, Flume, ZooKeeper, Hortonworks, Eclipse, MySQL, UNIX Shell Scripting.
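A minimal sketch of decommissioning a data node as referenced above, using the HDFS exclude file and refreshNodes; the hostname and exclude-file path are illustrative and vary by distribution:

    # add the node to the exclude file referenced by dfs.hosts.exclude in hdfs-site.xml
    echo "datanode07.example.com" | sudo tee -a /etc/hadoop/conf/dfs.exclude

    # tell the NameNode to re-read the include/exclude lists and begin decommissioning
    sudo -u hdfs hdfs dfsadmin -refreshNodes

    # watch until the node reports "Decommissioned", then stop its DataNode process
    sudo -u hdfs hdfs dfsadmin -report | grep -A 3 datanode07.example.com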
Kaiser Permanente, Los Angeles, CA (June 2013 – Dec 2013)
Hadoop Administrator
Responsibilities
Worked on multiple projects spanning architecting, installing, configuring and managing Hadoop clusters.
Implemented authentication and authorization service using Kerberos authentication protocol.
Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (like MapReduce, Pig, Hive, Sqoop) as well as system specific jobs.
Integrated Kafka with Flume in sand box Environment using Kafka source and Kafka sink.
Performance tuning for infrastructure and Hadoop settings for optimal performance of jobs and their throughput.
Involved in analyzing system failures on production and lab clusters, identifying root causes and recommending courses of action.
Designed the Cluster tests before and after upgrades to validate the cluster status.
Performed regular maintenance, commissioning/decommissioning nodes as disk failures occurred, using Cloudera Manager.
Documented and prepared run books of system processes and procedures for future reference.
Helping users and teams with incidents related to administration and development.
Onboarded and trained new users migrated to our clusters on best practices.
Guided users in development and worked closely with developers to prepare a data lake.
Migrated data from SQL Server to HBase using Sqoop (see the sketch following this section).
Log data stored in HBase was processed and analyzed and then imported into the Hive warehouse, enabling business analysts to write HQL queries.
Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in Hive queries.
Created Hive external tables for loading the parsed data using partitions.
Designed table architecture and developed DAO layer using Cassandra NoSQL database.
Determined groups using LDAP and migrated them to LDAPS.
Developed various workflows using custom MapReduce, Pig, Hive and scheduled them using Oozie.
Responsible for installing, setting up and configuring Apache Kafka and Apache ZooKeeper.
Extensive knowledge in troubleshooting code related issues.
Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MRUnit testing library.
Designed and coded application components in an agile environment utilizing test driven development approach.
Environment: Hadoop, HDFS, MapReduce, Shell Scripting, Spark, Splunk, Solr, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, cluster health monitoring, security, Red Hat Linux, Cloudera Manager, Hortonworks.
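A minimal sketch of the SQL Server to HBase migration with Sqoop mentioned above; the JDBC URL, table, column family and row key are hypothetical placeholders:

    sqoop import \
      --connect "jdbc:sqlserver://sqlhost.example.com:1433;databaseName=logs" \
      --username etl_user --password-file /user/etl/.sqlserver.pwd \
      --table audit_log \
      --hbase-table audit_log \
      --column-family d \
      --hbase-row-key event_id \
      --hbase-create-table \
      --num-mappers 4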
CSC - Hyderabad, India (Aug 2007 – June 2011)
Linux/Unix Systems Administrator
Responsibilities
Installed, Configured and Maintained Debian/RedHat Servers at multiple Data Centers.
Configured RedHat Kickstart server for installing multiple production servers.
Configuration and administration of DNS, LDAP, NFS, NIS, NIS+ and Sendmail on RedHat Linux/Debian servers.
Hands on experience working with production servers at multiple data centers.
Involved in writing scripts to migrate consumer data from one production server to another production server over the network with the help of Bash and Perl scripting.
Installed and configured the monitoring tools Munin and Nagios for monitoring network bandwidth and hard drive status.
Automated server building using System Imager, PXE, Kickstart and Jumpstart.
Planning, documenting and supporting high availability, data replication, business persistence, and fail-over, fail-back using Veritas Cluster Server in Solaris, RedHat Cluster Server in Linux and HP Service Guard in HP environment.
Automated tasks using shell scripting for doing diagnostics on failed disk drives.
Configured Global File System (GFS) and Zettabyte File System (ZFS).
Troubleshot production servers with the IPMI tool, connecting over SOL.
Configured system imaging tools Clonezilla and System Imager for data center migration.
Configured yum repository server for installing packages from a centralized server.
Installed FUSE to mount the keys on every production server for passwordless authentication on Debian servers.
Installed and configured a DHCP server to give IP leases to production servers.
Management of RedHat Linux user accounts, groups, directories and file permissions.
Implemented the Clustering Topology that meets High Availability and Failover requirement for performance and functionality.
Configured and managed ESX VMs with Virtual Center and the VI client.
Performed performance monitoring using sar, iostat, vmstat and mpstat on servers, and also logged data to the Munin monitoring tool for a graphical view.
Used LDAP in Active Directory to add new users to a directory, remove and modify users, and grant privileges and policies.
Performed kernel tuning with sysctl and installed packages with yum and rpm.
Installed and configured PostgreSQL databases on RedHat/Debian servers.
Performed disk management with the help of LVM (Logical Volume Manager) (see the sketch following this section).
Configuration and Administration of Apache Web Server and SSL.
Backup management and recovery through Veritas NetBackup (VNB).
Set up passwordless SSH login and agent forwarding using the ssh-keygen tool.
Established and maintained network users, user environment, directories, and security.
Thoroughly documented the steps involved in data migration on production servers, as well as testing procedures before the migration.
Provided 24/7 on call support on Linux Production Servers. Responsible for maintaining security on Red Hat Linux.
Environment: RHEL 5.x/4.x, Solaris 8/9/10, Sun Fire, IBM blade servers, WebSphere 5.x/6.x, Apache 1.2/1.3/2.x, iPlanet, Oracle 11g/10g/9i, Logical Volume Manager, Veritas NetBackup 5.x/6.0, SAN multipathing (MPIO, HDLM, PowerPath), VMware ESX 3.x/2.x.
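A minimal sketch of the LVM disk-management workflow referenced above; the device, volume group, logical volume and size are illustrative:

    # initialize a new disk for LVM and extend an existing volume group
    sudo pvcreate /dev/sdb
    sudo vgextend vg_data /dev/sdb          # hypothetical volume group

    # grow the logical volume and the filesystem on it
    sudo lvextend -L +50G /dev/vg_data/lv_app
    sudo resize2fs /dev/vg_data/lv_app      # ext3/ext4; use xfs_growfs for XFS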