Job Description
Senior Cloudera Administrator/Infrastructure SME with Cloudera Data Platform Experience (SDX and CDP – On-Prem, Public Cloud and Private Cloud)
Role & Responsibilities:
Will be responsible for the administration of Cloudera CDP on-prem and cloud infrastructure.
Leverage the CDP features to build the cloud-hybrid infrastructure for CDP (CDP Public Cloud).
Independently install and maintain Big Data (Cloudera) clusters in highly available, load-balanced configurations across multiple environments (Production, QA and Development), both on-prem and in the cloud (AWS).
Implement Knox and Kerberos for cluster security and integrate them with enterprise and cloud IAM.
Develop scripts to automate and streamline operations and configuration.
Manage and automate the installation process (using tools like Ansible) for Cloudera Manager, CDH, and the ecosystem projects. Activities include: Set up a local CDH repository; Perform OS-level configuration for Hadoop installation; Install Cloudera Manager server and agents; Install CDH using Cloudera Manager; Add a new node to an existing cluster; Add a service using Cloudera Manager
Schedule jobs using Apache NiFi or Airflow.
Analyze, recommend and implement improvements to support environment/infrastructure management initiatives.
Perform the basic and advanced configuration needed to effectively administer a Hadoop cluster on-prem and in the cloud. Activities include: Configure a service using Cloudera Manager; Create an HDFS user's home directory; Configure NameNode HA; Configure ResourceManager HA; Configure a proxy for HiveServer2/Impala
Maintain and modify the cluster to support day-to-day operations in the enterprise. Activities include: Rebalance the cluster; Set up alerting for excessive disk fill; Define and install a rack topology script; Install a new type of I/O compression library in the cluster; Revise YARN resource assignment based on user feedback; Commission/decommission a node
Enable relevant services and configure the cluster to meet goals defined by security policies. Activities include: Configure HDFS ACLs; Install and configure Sentry; Configure Hue user authorization and authentication; Enable/configure log and query redaction; Create encryption zones in HDFS
Benchmark the cluster's operational metrics and test the system configuration for operation and efficiency. Activities include: Efficiently copy data within a cluster/between clusters; Create/restore a snapshot of an HDFS directory; Get/set ACLs for a file or directory structure; Benchmark the cluster (I/O, CPU, network)
Under general supervision, manage Big Data administration activities, technical documentation, system performance support, and internal customer support. May provide input into the development of systems architecture for mission-critical corporate development projects.
Research performance issues, configure the cluster according to Cloudera best practices, and optimize specifications and parameters to fine-tune the cluster and proactively avoid performance issues.
Skills:
Must be a Cloudera Certified Hadoop and/or Spark Administrator.
Hands-on experience with Cloudera installation, configuration, debugging, tuning and administration.
Must have infrastructure implementation experience both on-prem and in the cloud (AWS).
Must have knowledge of CDP containers and of integrating on-prem clusters with cloud clusters.
Strong hands-on experience implementing security (Kerberos, Sentry, TLS/SSL) and performing OS upgrades.
Must have knowledge of Cloudera SDX and of Cloudera Public Cloud and Private Cloud infrastructures.
Experience administering distributed applications: Hadoop, Spark, Kafka, MapReduce, Hive, Impala
Experience with large-scale, high-performance distributed systems such as Hadoop, NoSQL or Spark
Assist in preparing and scaling the cluster as required to run mission-critical data processing workloads
Experience setting up and configuring YARN queues using the YARN Queue Manager
Deep understanding of IP, TCP, UDP, SSL/TLS protocols
Experience with DevOps, scripting or other automation
Experience with performance monitoring and tuning
Hadoop cluster maintenance, including the addition and removal of nodes
Working knowledge of networks, Linux OS and Unix shell scripting
Understanding of how the Ranger/Sentry authorization mechanisms work
Systems implementation, operation, and optimization as a Hadoop administrator
Demonstrated ability to find the root cause of a problem in Spark/HDFS and CDP clusters, optimize inefficient execution, and resolve resource contention scenarios: Resolve errors/warnings in Cloudera Manager; Resolve performance problems/errors in cluster operation; Determine the reason for an application failure; Configure the Fair Scheduler to resolve application delays
Experience with Cloudera Data Science Workbench and Cloudera DataFlow products.
Experience working with the Systems Operation Department to resolve a variety of infrastructure issues