
Hadoop Administrator

Location:
San Jose, CA
Posted:
November 21, 2019

KARTHIK VELLANKI

CONTACT: 408-***-****

EMAIL: adatyd@r.postjobfree.com

POSITION: Hadoop Administrator

EXECUTIVE SUMMARY

Overall 4 years of experience in software analysis, design, development, and maintenance across client-server, distributed, and embedded applications.

Good working and theoretical knowledge of Apache Druid, including cluster configuration.

Performed data ingestion to load data into Druid.

Integrated Druid with Hadoop, Kafka, and other components.

Monitored service performance of Druid components, including memory and disk usage.

Hands-on experience with the Hadoop stack (HDFS, MapReduce, YARN, Sqoop, Flume, Hive/Beeline, Pig, ZooKeeper, Oozie, Kerberos, Kafka).

Experienced with Hortonworks Hadoop clusters.

Implemented Kerberos authentication on enterprise Hadoop and Druid clusters.

Implemented autoscaling on Hadoop and Druid nodes, and rebalanced clusters after adding or decommissioning nodes.

Provided L2/L3 support for application, development, and production clusters.

Hands-on day-to-day operation of the environment, with knowledge of and deployment experience across the Hadoop ecosystem.

Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, ZooKeeper, and Sqoop.

Hands-on experience installing and configuring Cloudera and Hortonworks clusters and installing Hadoop ecosystem components such as Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume, and ZooKeeper.

Used Oozie, cron, and Autosys workflows to schedule jobs.

Wrote shell scripts and successfully migrated data from RDBMS sources into Druid, as sketched below.
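A minimal sketch of this kind of migration, assuming a MySQL source, an HDFS staging path, and a Druid Overlord reachable on port 8090; all host, database, table, and file names here are illustrative placeholders, and the exact ingestion-spec layout depends on the Druid version:

# Export a table to CSV (mysql --batch prints tab-separated rows; convert to commas).
mysql --batch --silent -h mysql-host -u etl_user -p \
      -e "SELECT id, ts, region, amount FROM orders" sales_db \
      | tr '\t' ',' > /tmp/orders.csv

# Stage the file on HDFS where the Druid ingestion task can read it.
hdfs dfs -mkdir -p /staging/druid/orders
hdfs dfs -put -f /tmp/orders.csv /staging/druid/orders/

# Submit a native batch ingestion task to the Overlord; index-orders.json
# would hold an index_parallel (or index_hadoop) spec pointing at the staging path.
curl -X POST -H 'Content-Type: application/json' \
     -d @index-orders.json \
     http://overlord-host:8090/druid/indexer/v1/task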

Good understanding and hands-on experience of Hadoop cluster capacity planning for production clusters, performance tuning, cluster monitoring, and troubleshooting.

Configured, set up, and upgraded Hadoop and Druid clusters.

Installed and configured systems for use with the Cloudera distribution of Hadoop (with consideration given to other variants such as Apache, Hortonworks, Pivotal, etc.).

Added new users to the Hadoop platform and maintained AD integration with Centrify.

Commissioned and decommissioned cluster nodes; performed data migration.

Implemented several scheduled Spark, Hive, and MapReduce jobs on the Hadoop distribution.

Assisted developers with troubleshooting MapReduce and BI jobs as required.

Provided granular ACLs for local file datasets as well as HDFS.

Cluster monitoring and troubleshooting using tools such as Cloudera Manager and Ambari Metrics.

Managed and reviewed HDFS data backups and restores on the production cluster.

Monitored Hadoop clusters through the Ambari Metrics UI and with Nagios.

Implemented new Hadoop infrastructure, OS integration, and application installation. Installed OS (RHEL 5/6, CentOS, Ubuntu) and Hadoop updates, patches, and version upgrades as required.

Responsible for continuous monitoring.

Implemented and maintained LDAP and Kerberos security as designed for the cluster.

Expert in setting up Hortonworks clusters with and without Ambari.

Implemented Hive optimization techniques such as partitioning, bucketing, CBO query optimization, and vectorization, as sketched below. Good knowledge of the Azure cloud and its services.
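As an illustration of these techniques, a partitioned and bucketed ORC table plus the CBO and vectorization session settings might look like the following sketch; the connection string, table, and column names are placeholders:

beeline -u "jdbc:hive2://hiveserver2-host:10000/default" -e "
  SET hive.cbo.enable=true;
  SET hive.compute.query.using.stats=true;
  SET hive.vectorized.execution.enabled=true;
  SET hive.vectorized.execution.reduce.enabled=true;

  CREATE TABLE IF NOT EXISTS sales_part (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DOUBLE
  )
  PARTITIONED BY (order_date STRING)
  CLUSTERED BY (customer_id) INTO 32 BUCKETS
  STORED AS ORC;
"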

Implemented Spark tuning methodologies such as data serialization and memory tuning.

Experienced in setting up Cloudera clusters using both packages and parcels with Cloudera Manager.

Hadoop Administrator

In-depth understanding of Hadoop architecture and components such as HDFS, YARN, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.

Experience in configuring, installing, and managing Hortonworks and Cloudera distributions. Extensive experience in understanding clients' big data business requirements and translating them into Hadoop-centric technologies.

Expertise in Red Hat Linux tasks including YUM-based upgrades, kernel patching, and configuring SAN disks, multipath, and LVM file systems.

Experience in deploying and managing multi-node development and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HBase, ZooKeeper) using Hortonworks Ambari.

Worked on creating Ansible playbooks to install Tomcat instances and manage configuration files for multiple applications.

Provided hands-on engineering expertise to assist with support and operation of the cloud infrastructure. Responsibilities included the design, creation, configuration, and delivery of cloud infrastructure environments.

Hands-on experience in Unix/Linux environments, including software installations/upgrades, shell scripting for job automation, and other maintenance activities.

Well versed in writing Hive queries and in Hive query optimization by setting up different queues.

Troubleshooting, security, backup, disaster recovery, and performance monitoring on Linux systems. Experience in Jumpstart, Kickstart, infrastructure setup, and installation methods for Linux.

Ran Ansible playbooks and created various roles for applications, then deployed the applications/services on hosts.

Experience in implementing and troubleshooting clusters and JDBC connectivity.

Experience importing real-time data into Hadoop using Kafka, implementing Oozie jobs, and scheduling recurring Hadoop jobs with Apache Oozie.

Knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.

TECHNICAL SKILLS

Hadoop Ecosystem and Automation Tools

MapReduce, HDFS, Pig, Hive, HBase, Sqoop, ZooKeeper, Oozie, Hue, Storm, Kafka, Spark, Flume, Apache Druid

Hadoop/Big Data/Druid Technologies

HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Storm, ZooKeeper, Kafka, Apache Spark, Spark Streaming, Spark SQL, HBase, Cassandra, Hortonworks, Cloudera, Autosys, Imply, Apache Druid

Programming Languages

Core Java, C, C++, HTML

Databases

MySQL, Teradata, HBase, NoSQL, MS Access

Scripting Languages

Shell, Bash, Python, HTML

Web Servers

Apache Tomcat, Windows Server 2003/2008/2012

Security Tools

ACLs, Sentry, Ranger, Kerberos

Cluster Management Tools

Cloudera Manager, HDP Ambari, Imply

Operating Systems

Red Hat Linux 4.0, RHEL 5.4, RHEL 6.4, RHEL 7, Oracle Linux 7.6

Scripting & Programming Languages

Shell and Python programming

Platforms

Linux (RHEL, Ubuntu), Oracle Linux

WORK EXPERIENCE

Druid Platform Engineer/Hadoop Admin July 2019 to Present

Apple Inc. (contract role through Metanoia Solutions Inc.), Sunnyvale, CA

Responsibilities:

•Set up the Druid cluster with its dependencies, including ZooKeeper, HDFS, Kafka, Imply Pivot, and Nginx.

•Maintained the metadata store for the Druid cluster.

•Performed OS patching and upgrades for prod and non-prod clusters.

•Set up VM and bare-metal clusters and checked for setup-related errors.

•Performed ingestion tasks by writing ingestion specs to load data into Druid.

•Maintained and configured the various Druid components, such as Historicals, MiddleManagers, and Brokers.

•Wrote ingestion specs to ingest batch data into Druid.

•Debugged and resolved log-generation issues in the Overlord UI console.

•Configured Druid with Kafka to ingest streaming data into Druid (see the sketch after this list).

•Integrated Druid with Kafka using a multi-node ZooKeeper quorum for streaming data.

•Set up Hadoop paths in the Druid cluster for ingesting batch data.

•Monitored ingestion processes and verified that segments were created on the Overlord console.

•Tuned MiddleManager buffer settings to handle varying workloads.

•Set up metadata, ZooKeeper, and Hadoop extensions as well as Clarity monitoring.

•Analyzed Nginx config files for network-related issues.

•Checked query results in the Imply Pivot UI, compared them against RDBMS results, and pruned, modified, and improved data quality.

•Implemented scripts to fetch data from RDBMS sources and ingest it into Druid.

•Worked on a one-of-its-kind VM Druid cluster with a network file storage component.

•Analyzed keytab, Java version, and Nginx-related issues during Druid ingestion and resolved them accordingly.

•Installed Python, Java, Hadoop, and Imply-related services on the Druid cluster.

•Extended cluster capability and connectivity for free flow of data between the Hadoop and Druid clusters by enabling ACLs.

•Enabled Druid basic auth for the Druid clusters and completed Druid Kerberos integration.

•Implemented autoscaling for Druid with Kubernetes.

•Knowledge of global server load balancing for Druid, with Apache load balancing implemented.
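A simplified sketch of wiring Kafka into Druid through a supervisor spec submitted to the Overlord. The hostnames, topic, datasource, and task settings are placeholders, the dataSchema is abridged, and the exact spec layout varies by Druid version:

# Minimal (abridged) Kafka supervisor spec; a real spec also needs timestamp,
# dimension, and tuning sections appropriate to the Druid version in use.
cat > kafka-supervisor.json <<'EOF'
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "events",
    "granularitySpec": { "segmentGranularity": "HOUR", "queryGranularity": "NONE" }
  },
  "ioConfig": {
    "topic": "events-topic",
    "consumerProperties": { "bootstrap.servers": "kafka1:9092,kafka2:9092" },
    "taskCount": 2,
    "taskDuration": "PT1H"
  }
}
EOF

# Submit the supervisor to the Overlord, then list running supervisors.
curl -X POST -H 'Content-Type: application/json' \
     -d @kafka-supervisor.json \
     http://overlord-host:8090/druid/indexer/v1/supervisor
curl http://overlord-host:8090/druid/indexer/v1/supervisor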

Environment:

Big Data, HDFS, YARN, Hive, Sqoop, ZooKeeper, HBase, Oozie, Kerberos, Ranger, Knox, Spark, Red Hat Linux, Oracle Linux, Oracle JDK 8, Kafka, Imply, Autosys.

Hadoop Administrator March 2019 – June 2019

Techno Bytes, Ashland, MA

•Configured, developed, introduced, tested, and maintained data management systems.

•Performed data manipulation, transformation, and cleansing.

•Built high-performance algorithms, predictive models, and proofs of concept.

•Researched data acquisition methods and new uses for existing data.

•Created/extracted datasets for data mining and modelling.

•Built custom program components (for example, specialized UDFs) and analytical applications.

•Introduced and updated disaster recovery procedures.

•Recommended ways to improve data reliability, efficiency, and quality.

•Collaborated with database architects, modelers, and IT colleagues on project objectives.

Environment:

Big Data, HDFS, YARN, Hive, Sqoop, ZooKeeper, HBase, Oozie, Kerberos, Ranger, Knox, Spark, Red Hat Linux, Oracle Linux, Oracle JDK 8, Kafka, Imply, Autosys.

Hadoop Administrator January 2016 – December 2016

Fratello Innotech Ltd, Hyderabad, India

Responsibilities:

Upgraded the cluster from HDP 2.6.0 to HDP 2.6.5.

Worked on increasing HDFS I/O efficiency by adding new disks and data directories to HDFS DataNodes; tested HDFS performance before and after adding the data directories.

Worked on HBase performance tuning by following Apache HBase recommendations and changing row keys accordingly.

Implemented Spark tuning methodologies such as data serialization and memory tuning.

Created key performance metrics measuring the utilization, performance, and overall health of the cluster.

Capacity planning and implementation of new/upgraded hardware and software releases, as well as storage infrastructure.

Implemented autoscaling on Hadoop and Druid nodes, and rebalanced clusters after adding or decommissioning nodes.

Provided L2/L3 support for application, development, and production clusters.

Researched and recommended innovative and, where possible, automated approaches for system administration tasks.

Collaborated closely with product managers and lead engineers.

Provided guidance in the creation and modification of standards and procedures.

Proactively monitored and set up alerting mechanisms for the Kafka cluster and supporting hardware to ensure system health and maximum availability.

Experience with Azure components and APIs.

Thorough knowledge of the Azure IaaS and PaaS platforms.

Responsible for daily monitoring of 6 clusters across 3 environments (Dev, Stg, and Prod), for a total of 18 clusters.

Supported the developer team with issues such as Hive query job failures and Zeppelin problems.

Responsible for setting up rack awareness on all clusters, as sketched below.
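Rack awareness is typically wired up by pointing net.topology.script.file.name (in core-site.xml, or the equivalent Ambari setting) at a small mapping script; a minimal sketch with made-up subnets and rack names:

#!/bin/bash
# topology.sh - map DataNode IPs to rack paths; Hadoop passes one or more
# IPs/hostnames as arguments and expects one rack path per argument.
while [ $# -gt 0 ]; do
  ip=$1; shift
  case "$ip" in
    10.0.1.*) echo -n "/dc1/rack1 " ;;
    10.0.2.*) echo -n "/dc1/rack2 " ;;
    *)        echo -n "/default-rack " ;;
  esac
done
echo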

Responsible for DDL deployments as per requirements and validated DDLs across different environments.

Responsible for routine admin activities such as granting users access to edge nodes, raising tickets/requests for account creation, and creating AD users for different services.

Environment:

Big Data, HDFS, YARN, Hive, Sqoop, ZooKeeper, HBase, Oozie, Kerberos, Ranger, Knox, Spark, Red Hat Linux.

Hadoop Administrator January 2015 to December 2015

Speed Consulting Pvt Ltd, Bangalore, India

Responsibilities:

Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

Worked on installing and configuring HDP (Hortonworks) 2.x clusters in dev and production environments.

Worked on capacity planning for the production cluster.

Installed the Hue browser.

Involved in loading data from the UNIX file system into HDFS, creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.

Experience with the MapR, Cloudera, and EMR Hadoop distributions.

Worked on installing Hortonworks 2.1 on Linux servers and configuring Oozie jobs; created a complete processing engine based on the Hortonworks distribution, tuned for performance.

Performed cluster upgrades from HDP 2.1 to HDP 2.3.

Configured queues in the Capacity Scheduler and took snapshot backups of HBase tables, as sketched below. Worked on fixing cluster issues and configuring NameNode high availability in HDP 2.1.
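A short sketch of the HBase snapshot workflow from the hbase shell, with an optional export to a backup cluster; the table, snapshot, and cluster names are illustrative:

# Take and list a snapshot of an HBase table.
hbase shell <<'EOF'
snapshot 'customer_events', 'customer_events-snap-20161001'
list_snapshots
EOF

# Optionally copy the snapshot to another cluster for backup.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot customer_events-snap-20161001 \
  -copy-to hdfs://backup-nn:8020/hbase \
  -mappers 4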

Involved in cluster monitoring, backup, restore, and troubleshooting activities.

Involved in MapR to Hortonworks migration.

Administration and management of the Atlassian tool suite (installation, deployment, configuration, migration, upgrade, patching, provisioning, server management, etc.).

Audited existing plug-ins and uninstalled a few unused plug-ins to save costs and manage the tool efficiently. Automated the transition of issues based on status when work is logged on them.

Automated issue creation from Office 365 email through a mail handler. Configured logging to reduce unnecessary warnings and info messages.

Worked as Hadoop administrator on the MapR Hadoop distribution for 5 clusters, ranging from POC to production, containing more than 1,000 nodes.

Implemented Puppet manifests for automated orchestration of Hadoop and Cassandra clusters.

Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slots configuration.

Responsible for implementation and ongoing administration of Hadoop infrastructure; managed and reviewed Hadoop log files.

Administration of HBase, Hive, Sqoop, HDFS, and MapR.

Imported and exported data between relational databases such as MySQL and HDFS/HBase using Sqoop, as sketched below.
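A representative Sqoop invocation for this kind of transfer, assuming a MySQL source and hypothetical database, table, HDFS, and HBase names:

# Import a MySQL table into HDFS as comma-delimited text.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/sales_db \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4 \
  --fields-terminated-by ','

# Import the same table directly into an HBase table instead.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/sales_db \
  --username etl_user -P \
  --table orders \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key id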

Worked on configuring Kerberos authentication in the cluster.

Experience using the MapR file system, Ambari, and Cloudera Manager for installation and management of Hadoop clusters.

Very good experience with the Hadoop ecosystem in UNIX environments.

Experience with UNIX administration.

Worked on installing and configuring Solr 5.2.1 in the Hadoop cluster.

Hands-on experience in installation, configuration, management, and development of big data solutions using Hortonworks distributions.

Experienced with Hortonworks Hadoop clusters.

Implemented Kerberos authentication on enterprise Hadoop and Druid clusters.

Implemented autoscaling on Hadoop and Druid nodes, and rebalanced clusters after adding or decommissioning nodes.

Provided L2/L3 support for application, development, and production clusters.

Worked on indexing HBase tables and indexing JSON and nested data.

Hands-on experience installing and configuring Spark and Impala.

Successfully installed and configured queues in the Capacity Scheduler and the Oozie scheduler.

Worked on configuring queues and optimizing Hive query performance, tuning at the cluster level and adding users to the clusters.

Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.

Implemented Hive optimization techniques such as partitioning, bucketing, CBO query optimization, and vectorization. Good knowledge of the Azure cloud and its services.

Implemented Spark tuning methodologies such as data serialization and memory tuning.

Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, providing quick resolutions to reduce impact, and documenting them to prevent recurring issues.

Added/installed new components and removed them through Ambari.

Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.

Monitored workload, job performance, and capacity planning.

Involved in analyzing system failures, identifying root causes, and recommending courses of action.

Created and deployed a corresponding SolrCloud collection, and created and managed the cron jobs, as sketched below.
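A sketch of creating the collection through the SolrCloud Collections API and scheduling an indexing job with cron; the host, collection, config set, and script paths are placeholders:

# Create a SolrCloud collection via the Collections API.
curl "http://solr-host:8983/solr/admin/collections?action=CREATE&name=hbase_json_docs&numShards=3&replicationFactor=2&collection.configName=json_conf"

# Schedule the indexing script to run nightly at 01:30 via cron.
( crontab -l 2>/dev/null; \
  echo "30 1 * * * /opt/scripts/index_hbase_to_solr.sh >> /var/log/solr_index.log 2>&1" ) | crontab -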

Environment:

Hadoop, MapReduce, YARN, Hive, HDFS, Pig, Sqoop, Solr, Oozie, Impala, Spark, Hortonworks, Flume, HBase, ZooKeeper, Unix/Linux, Hue.

Linux Administrator June 2014 to November 2014

Speed Consulting Pvt Ltd, Bangalore, India

Responsibilities:

Provided 24x7 on-call support for debugging and fixing Linux and Solaris issues and for installation/maintenance of hardware and software across production, development, and test environments, as an integral part of the Unix/Linux support team.

Installed Red Hat Enterprise Linux Server 5/6 on Dell and HP x86 hardware.

Installed and configured Red Hat Linux 5.1 on HP DL585 servers using Kickstart.

Monitored day-to-day administration and maintenance operations of the company network and systems, working on Linux and Solaris systems.

Responsible for deployment, patching, and upgrades of Linux servers in a large datacenter environment.

Designed, built, and configured RHEL systems.

Responsible for providing 24x7 production support for Linux.

Automated Kickstart image installation, patching, and configuration of 500+ Enterprise Linux servers.

Built a Kickstart server for automated Linux server builds.

Installed Ubuntu servers for migration.

Created shell and Bash scripts to automate a variety of tasks.

Maintained user accounts; sudo was used for management and faceless accounts, while the remaining accounts were LDAP-based. Handled datacenter operations and migration of Linux servers.

Installed, configured, troubleshot, and maintained Linux servers and the Apache web server; configured and maintained security, scheduled backups, and submitted various types of cron jobs.

Installed and configured RPM packages using the YUM package manager, as sketched below. Involved in developing custom shell scripts (bash, ksh) to automate jobs.
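A small sketch of the YUM and scripting side of this work, with illustrative package names, paths, and schedule:

# Install and update packages with YUM.
sudo yum -y install httpd ksh sysstat
sudo yum -y update

# A simple bash backup script run from cron.
cat > /opt/scripts/etc_backup.sh <<'EOF'
#!/bin/bash
# Archive /etc nightly and keep seven days of backups.
dest=/backup/etc
mkdir -p "$dest"
tar czf "$dest/etc-$(date +%F).tar.gz" /etc
find "$dest" -name 'etc-*.tar.gz' -mtime +7 -delete
EOF
chmod +x /opt/scripts/etc_backup.sh

# Register the script to run daily at 02:00.
( crontab -l 2>/dev/null; echo "0 2 * * * /opt/scripts/etc_backup.sh" ) | crontab -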

Defined and developed plans for change, problem, and incident management processes based on ITIL.

Knowledge of networking protocols such as TCP/IP, Telnet, FTP, and SSH.

Coordinated with storage and networking teams.

Environment:

Red Hat Enterprise Linux 4.x/5.x, Logical Volume Manager for Linux, VMware ESX Server 2.x, Hyper-V Manager, VMware.


