Venkateswara Varma Srivatsavaya
Email: ***********@*****.***
Cell Phone: +1-856-***-****
Big Data professional with 10+ years of combined experience in Hadoop and database engineering and administration.
Professional Summary
Experience in designing, implementing and administering multitenant Cloudera Distribution of Hadoop (CDH) clusters for visualization and analytics on Big Data
Experience in engineering and administering StreamSets/Sqoop for data ingestion, HDFS/HBase/Kudu for distributed storage, Hive for data warehousing, Impala for interactive querying, Spark for data transformation and advanced analytics, Flume for streaming event data, Solr for content and metadata indexing on unstructured data, Cloudera Navigator for data governance and Cloudera Data Science Workbench for data science
Experience in working with business and application owners to understand high-level requirements for onboarding to the Big Data and Analytics platform; on a multitenant platform hosting 60+ applications and use cases, working with application development, SQA and DevOps teams to ensure successful production deployments
Experience in setting up high availability, replication, backup and disaster recovery for the different services on the Big Data and Analytics platform
Experience in Unix development and administration; currently maintaining 200 Oracle Enterprise Linux 6 and Red Hat Enterprise Linux 7 servers used for CDH clusters
Experience in setting up and maintaining MySQL/MariaDB relational databases used for metadata storage by Hive, Cloudera Navigator and other Cloudera Management Services
Experience in securing clusters by configuring MIT Kerberos for authentication and Sentry for role-based, fine-grained authorization
Experience in YARN and Impala resource management (CPU/memory) on a heavily multitenant platform
Experience in integrating Kerberized Impala with Business Intelligence tools like Tableau and Spotfire
Experience in installing, configuring, upgrading and maintaining the StreamSets data ingestion platform
Experience in installing, configuring, upgrading and maintaining Cloudera Data Science Workbench to provide data scientists with self-service access for Machine Learning and Artificial Intelligence use cases
Experience in developing and testing Java and Python code to automate redundant tasks, including but not limited to backups, monitoring and connectivity tests to different services on the cluster (a representative sketch follows this summary)
Hands-on knowledge of Amazon Web Services (EC2, EBS, S3, ELB, VPC, Route 53)
Hands-on knowledge of Kafka cluster setup, administration, monitoring and security, as well as Kafka Streams and KSQL
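For illustration, a minimal Python sketch of the kind of connectivity check automated above. All host names are hypothetical and the ports are common CDH defaults, not details of any specific cluster:

#!/usr/bin/env python3
"""Port-level connectivity check for core cluster services.
Host names are hypothetical; ports are common CDH defaults."""
import socket

SERVICES = {
    "HDFS NameNode RPC": ("namenode01.example.com", 8020),
    "YARN ResourceManager": ("rm01.example.com", 8032),
    "HiveServer2": ("hive01.example.com", 10000),
    "Impala (HS2 protocol)": ("impala01.example.com", 21050),
    "HBase Master": ("hbase01.example.com", 16000),
}

def reachable(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, (host, port) in sorted(SERVICES.items()):
        status = "OK" if reachable(host, port) else "UNREACHABLE"
        print("%-25s %s:%-5d %s" % (name, host, port, status))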
Technical Skills
Big Data Ecosystem: Hadoop, MapReduce, YARN, Spark, Hive, Impala, HBase, Kudu, StreamSets, Kafka and Cloudera Data Science Workbench
Cloud Computing: Amazon Web Services
Security: Kerberos and Sentry
Programming Languages: Java, VB
Scripting: Unix Shell Scripting and Python Scripting
Business Intelligence Tools: Tableau, Spotfire
Relational Databases: Microsoft SQL Server 2000, 2008 and 2012, Sybase, MySQL
Operating Systems: Oracle Linux 6, RHEL 7, Windows NT/98/2000/XP, Windows 8, Windows 10
Specialized Training and Certifications
Cloudera Certified Hadoop Administrator (CCA131)
Microsoft Certified Professional (MCP)
Microsoft Certified Technology Specialist (MCTS): Database Development, Implementation and Maintenance
Work Experience
Hadoop Admin
Client: State Street Bank and Trust Company 11/2017 – Present
Domain: Banking
Location: Quincy, MA
Responsibilities:
Installing, configuring, upgrading and maintaining the Cloudera distribution of Hadoop on Linux machines and corresponding services like HDFS, YARN, Hive, Impala, HBase, Kudu, Spark2, Spark on YARN, StreamSets and Cloudera Data Science Workbench
Administering Oracle Enterprise Linux 6 and Red Hat Enterprise Linux 7 on the CDH cluster nodes; developing Unix shell scripts for automation of redundant tasks
Setting up and maintaining MySQL/MariaDB relational databases used for metadata storage by Hive, Cloudera Navigator and other Cloudera Management Services; setting up MySQL/MariaDB replication for backup and disaster recovery
Configuring MIT Kerberos for authentication and Sentry for role-based, fine-grained authorization
Creating pipelines via StreamSets using multiple origins, destinations, processors and executors to transfer batch/real-time data into and out of Hadoop; providing samples and demo walkthroughs to new StreamSets users and working with StreamSets vendor support to troubleshoot issues
Creating sample models and running experiments on Cloudera Data Science Workbench to demonstrate to new users exploring Machine Learning and Artificial Intelligence use cases; supporting those use cases once promoted to higher environments and working with Cloudera vendor support to troubleshoot issues
Checking and tuning CPU and memory at both the system level and the YARN resource management level to ensure MapReduce, Spark and Impala workloads on the platform run at their optimum level
Integrating Kerberized Impala with Business Intelligence tools like Tableau and Spotfire; benchmarking and testing in collaboration with application teams to determine the optimum Impala settings for performance
Developing and testing Java and Python programs that use Spark to read from and write to Kerberized HDFS, Hive and HBase, and providing them as samples to application users on the platform (a PySpark sketch follows this list)
Working with business and application owners to understand high-level requirements for onboarding to the Big Data and Analytics platform; on a multitenant platform hosting 60+ applications and use cases, working with application development, SQA and DevOps teams to ensure successful production deployments
Reviewing Hive/Impala/Spark workloads of different teams, troubleshooting and providing suggestions for performance improvements while ensuring platform stability is not negatively impacted
Troubleshooting and resolving issues related to the operating system, databases and the Big Data & Analytics platform
Tracking production tickets in ServiceNow and providing necessary support
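For illustration, a minimal PySpark sketch of the kind of sample program provided to application teams. It assumes the submitting user already holds a valid Kerberos ticket (via kinit, or --keytab/--principal on spark-submit), since authentication on a Kerberized cluster happens outside the code; database, table and path names are hypothetical:

from pyspark.sql import SparkSession

# Kerberos note: run `kinit` (or pass --keytab/--principal to spark-submit)
# before launching; the code itself needs no extra authentication logic.
spark = (SparkSession.builder
         .appName("sample-hdfs-hive-io")
         .enableHiveSupport()          # talk to the Hive metastore
         .getOrCreate())

# Read a delimited file landed on HDFS (path is hypothetical).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("hdfs:///data/landing/trades/2018-01-01/"))

# Simple cleanup, then persist the result as a Hive table.
cleaned = raw.dropDuplicates().filter("notional_usd > 0")
cleaned.write.mode("overwrite").saveAsTable("analytics_db.trades_clean")

# Read the table back to confirm the round trip.
spark.table("analytics_db.trades_clean").show(5)
spark.stop()

HBase access works the same way from Spark but additionally needs the HBase-Spark connector and HBase delegation tokens; that part is omitted here.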
Hadoop Engineer
Client: The Nielsen Company 12/2015 – 11/2017
Domain: Market Research
Location: Lancaster, PA
Responsibilities:
Analyzed the feasibility for migrating data from legacy databases to Hadoop distributed file system
Created Sqoop jobs to import data from Oracle to HDFS; tuned the number of mappers based on dataset sizes
Created raw/transformation data tables and consumption views in Hive using Hive Query Language
Created and tested jobs to perform transformation on raw data using Hive Query Language
Coordinated with end clients to understand their querying needs and designed the Hive partitioning strategy accordingly (see the sketch after this list)
Analyzed Hive Query profiles to identify and resolve performance issues
Applied Sentry role-based authorization to tables and views
Used WebHDFS/HttpFS to copy files from client nodes directly to HDFS
Created Oozie workflows to schedule jobs with Sqoop and Hive actions
Refreshed Impala metadata for Hive tables and views; tuned Impala performance by following best practices such as computing table and column statistics, join optimization and partition pruning
Integrated Impala with Tableau and performed load tests to benchmark Impala memory requirements
Performed High Availability and Disaster Recovery tests in collaboration with Admin team
Explored Apache Spark Streaming for data transformation on Flume event data
Gained hands-on knowledge of installation, configuration and administration of Cloudera Distribution of Hadoop clusters
Gained hands-on knowledge of configuring NameNode High Availability and ResourceManager High Availability
Gained hands-on knowledge of key administration concepts like Trash/snapshots for backups, balancing data across all DataNodes and analyzing log files of different services
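For illustration, a minimal PySpark sketch of the partitioning pattern described above, with Hive QL embedded as SQL strings; all database, table and column names are hypothetical:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sample")
         .enableHiveSupport()
         .getOrCreate())

# Consumption table partitioned by load date, so queries filtering on
# load_dt scan only the matching partitions (partition pruning).
spark.sql("""
    CREATE TABLE IF NOT EXISTS mart.sales_daily (
        store_id BIGINT,
        upc      STRING,
        units    INT,
        revenue  DECIMAL(18,2))
    PARTITIONED BY (load_dt STRING)
    STORED AS PARQUET
""")

# Dynamic-partition load from the raw layer; the partition column
# comes last in the SELECT list.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE mart.sales_daily PARTITION (load_dt)
    SELECT store_id, upc, units, revenue, load_dt
    FROM raw.sales_staging
""")

# Prunes to a single partition instead of scanning the whole table.
spark.sql("""
    SELECT store_id, SUM(revenue) AS revenue
    FROM mart.sales_daily
    WHERE load_dt = '2017-06-01'
    GROUP BY store_id
""").show()
spark.stop()

On the Impala side, running COMPUTE STATS mart.sales_daily after each load gives the planner the table and column statistics mentioned above.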
SQL/Java/Unix Developer
Client: The Nielsen Company 02/2014 – 12/2015
Domain: Market Research
Location: Cherry Hill, NJ
Responsibilities:
Understood customer requirements and high-level business needs to implement solutions meeting those requirements
Responsible for detailed design: preparation of high-level and low-level design documents
Developed Stored Procedures, Triggers, Cursors, Tables, Views, SQL Joins required by various applications on the database
Developed Unix shell scripts to automate redundant tasks and scheduled them via AutoSys
Involved in the design and development of a code rewrite from C++ to Java for an application that uses customers' latitude and longitude to suggest nearby stores as part of market research (a distance-logic sketch follows this list)
Responsible for enhancement and maintenance of a HelpDesk application used by the Operations team to communicate with customers, run daily, weekly, monthly and annual maintenance jobs, process incentives, and handle shipping and labeling of gifts
Responsible for Unit Test case preparation and Unit testing
Packaged the developed components to be deployed in production environment
Worked on production tickets on a daily basis and provided impact analysis, resolutions, DBCRFs and SCRFs
Communicated with other teams to work on the tickets related to other interfaces
Documented critical modules for future uses
Involved in a POC for migrating data from an RDBMS to the Hadoop Distributed File System
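For illustration, the distance logic at the heart of that store-suggestion rewrite looks roughly like the following, sketched in Python rather than the application's Java, with hypothetical store data:

import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_stores(cust_lat, cust_lon, stores, n=3):
    """Return the n stores closest to the customer's coordinates."""
    return sorted(
        stores,
        key=lambda s: haversine_km(cust_lat, cust_lon, s["lat"], s["lon"]),
    )[:n]

# Hypothetical data for illustration.
stores = [
    {"name": "Store A", "lat": 39.95, "lon": -75.16},
    {"name": "Store B", "lat": 39.93, "lon": -75.02},
]
print(nearest_stores(39.94, -75.10, stores, n=1))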
SQL/Java/Unix Developer
Client: AIG/Chartis – Commercial Insurance 12/2009 - 02/2014
Domain: Insurance
Location: Chennai, India
Responsibilities:
Prepared design documents by analyzing the impact on different interfaces and downstream systems dependent on AIWCS (workers' compensation policy underwriting system)
Developed Stored Procedures, Triggers, Cursors, Tables, Views, SQL Joins and other statements for various applications, maintained referential integrity and implemented complex business logic
Developed UNIX Shell scripts to automate repetitive database processes
Involved in debugging Java code of existing applications in order to resolve bugs
Made code enhancements in Java as part of application upgrade
Developed complex queries based on business team’s needs to derive insights out of huge datasets
Scheduled daily, weekly, monthly and yearly reports based on requirements from business
Used PVCS for labeling and versioning the code components
Prepared Unit test plan and System Test plan and took part in the testing
Provided UAT support and Production support
Participated in weekly status meetings with the onsite team
Education
SRM University (Chennai, India)
Bachelor of Technology, Electronics and Communications Engineering