
Pushpalatha Tellapati

adcpv1@r.postjobfree.com Ph: 510-***-****

PROFESSIONAL SUMMARY:

4+ years of professional IT experience, including experience with Big Data ecosystem technologies.

Around 3 years of hands-on experience working with Hadoop, HDFS, the MapReduce framework, and Hadoop ecosystem tools such as Hive, HBase, Sqoop, and Oozie.

Excellent understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, as well as the MapReduce programming paradigm.

Hands-on experience installing, configuring, and using Hadoop components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.

Extensive experience writing MapReduce jobs and Hive and Pig scripts, and working with HDFS.

In-depth understanding of data structures and algorithms.

Hands-on experience setting up 300+ node Hortonworks, MapR, and PHD clusters.

Experience in managing and reviewing Hadoop log files.

Installed and configured a 15-node Apache Solr cluster.

Experience working on Greenplum Database.

Knowledge of designing, implementing, and managing secure authentication for Hadoop clusters with Kerberos.

Implemented the Capacity Scheduler in Hortonworks and Cloudera.

Installed and configured Tomcat, the HTTPD web server, SSL, LDAP, and SSO for Collibra (a data governance tool). Administered production Collibra DGC services.

Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.

Extensive experience with data lake implementation.

Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.

Used Zookeeper for various types of centralized configurations.

Involved in setting up standards and processes for Hadoop-based application design and implementation.

Good knowledge of using Spark for real-time streaming of data into the cluster.

Experience importing and exporting data with Sqoop between HDFS and relational database systems.

Experience in Object Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.

Experience in managing Hadoop clusters using Cloudera Manager tool.

Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.

Experience in administration, installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning of Red Hat Linux.

Hands-on experience as a Linux system administrator.

Performed Linux admin activities like patching, maintenance and software installations.

Experience managing groups of RHEL/CentOS hosts at a scale of 100+ nodes, including installation and configuration for Hadoop clusters.

Knowledge of Oracle, PostgreSQL, SQL Server, and MySQL databases.

Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.

Wrote scripts to deploy monitors and checks and to automate critical system administration functions.
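
For example, a minimal check-and-alert script of this kind (the threshold, mail recipient, and use of the mail command are assumptions for illustration; hdfs dfsadmin typically needs HDFS admin privileges):

    #!/bin/bash
    # Hypothetical check: alert the admin if overall HDFS usage crosses 80%
    THRESHOLD=80
    used=$(hdfs dfsadmin -report | awk '/^DFS Used%/ {gsub(/%/,"",$3); print int($3); exit}')
    if [ "${used:-0}" -ge "$THRESHOLD" ]; then
      echo "HDFS usage is at ${used}%" | mail -s "HDFS capacity alert" admin@example.com
    fi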

Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.

Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.

TECHNICAL SKILLS:

Big Data Ecosystem : HDFS, HBase, Hadoop MapReduce, Zookeeper, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Cassandra, Spark

Hadoop Distributions : Cloudera, MapR, Hortonworks, AWS EMR and PHD

Languages : C, C++, Java, SQL/PLSQL

Methodologies : Agile, Waterfall

Database : Oracle 10g, DB2, MySQL, MongoDB, CouchDB, MS SQL Server, RDS, PostgreSQL

IDE / Testing Tools : Eclipse, STS.

Operating System : Windows, UNIX, Linux

Scripts : JavaScript, Shell Scripting

EDUCATION:

Bachelor’s in Computer Science from JNT University, India.

Master's in Computer Science from JNT University, India.

Title: Hadoop Administrator Nov. 2019 – Present

Client: American Express

Location: Phoenix, AZ

Responsibilities:

Managed Hadoop cluster environment including deployment, service allocation and configuration for the cluster, capacity planning, performance tuning, and ongoing monitoring.

Screened Hadoop cluster job performance and handled capacity planning.
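
From the command line, this kind of screening typically relies on the standard YARN tools (a sketch; the state filter is just one example):

    # List running YARN applications with queue, progress, and tracking URL
    yarn application -list -appStates RUNNING
    # Per-node container and resource usage, useful for capacity planning
    yarn node -list -all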

Reported utilization and performance metrics.

Installed, maintained, and administered software on Linux servers.

Managed a 500+ node MapR cluster on a daily basis.

Worked on shell scripts for automation.

Implemented YARN ACL checks and Kerberos for the MapR Hadoop cluster.

Worked on installing certificates, requesting load balancers, and updating/creating JKS keystores.

Installed, configured, and enabled high availability for YARN, HDFS, and Hive.
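
A quick way to verify HA state after such a setup (nn1/nn2 and rm1/rm2 stand in for the service IDs defined in the cluster's configs; they are assumptions here):

    # Confirm which NameNode and ResourceManager are active vs. standby
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2
    yarn rmadmin -getServiceState rm1
    yarn rmadmin -getServiceState rm2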

Scheduled daily DistCp data transfer jobs from the production cluster to the analytics cluster for analytics team reporting.
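
A daily transfer of this shape might look like the following sketch (the NameNode hosts and paths are hypothetical):

    # Incrementally copy the day's data, preserving permissions and block size
    hadoop distcp -update -p \
      hdfs://prod-nn:8020/data/daily \
      hdfs://analytics-nn:8020/data/daily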

Wrote Sqoop jobs to migrate data from Oracle and SQL Server to HDFS.
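
A representative Sqoop import of this kind, as a hedged sketch (the connection string, credentials, table, and target directory are placeholders):

    # Pull an Oracle table into HDFS with four parallel mappers
    sqoop import \
      --connect jdbc:oracle:thin:@//oracle-host:1521/ORCL \
      --username etl_user -P \
      --table SALES.ORDERS \
      --target-dir /data/raw/orders \
      --num-mappers 4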

Worked on LDAP configuration for MapR and Hue.

Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.

Managed and reviewed Hadoop log files.

Extensively used the Oozie scheduler, with a clear understanding of Oozie workflows, coordinators, and bundles.
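
Day-to-day Oozie work goes through its CLI, roughly as below (the Oozie URL and properties file are placeholders):

    # Submit and start a workflow defined by job.properties
    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
    # Check the status of a workflow, coordinator, or bundle
    oozie job -oozie http://oozie-host:11000/oozie -info <job-id>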

Worked extensively with Sqoop for importing metadata from Oracle.

Designed a data warehouse using Hive.

Configured Splunk to generate alerts over system/service failures.

Worked as a production code deployment engineer on a sprint basis.

Environment: Hadoop, MapR, Cloudera 5.15, MapReduce, HDFS, Hive, HBase, Kafka, Java 7 & 8, MongoDB, Collibra DGC 5.6.4, Pig, Informatica, Oracle, Informatica BDM, Linux, Eclipse, Zookeeper, Apache Solr, R and RStudio, Control-M, Redis, Tableau, QlikView, DataStax, Spark, Splunk

Title: Hadoop Administrator Apr. 2018 – Nov. 2019

Client: DELL EMC

Location: Round Rock, TX

Responsibilities:

Engineer on the Big Data team; worked with Hadoop and its ecosystem.

Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, Hive, Oozie, and Sqoop.

Developed Hive queries to analyze data and generate end reports for business users.

Managed 500+ node HDP and Cloudera clusters on a daily basis.

Worked on shell scripts for automation.

Implemented YARN ACL checks, Ranger, and Kerberos for HDP and Cloudera Hadoop clusters.

Scheduled daily DistCp data transfer jobs from the production cluster to the analytics cluster for analytics team reporting.

Wrote Sqoop jobs to migrate data from Oracle and SQL Server to HDFS.

Investigated and implemented Hortonworks SmartSense recommendations.

Implemented bug fixes on Hive and Tez per Hortonworks recommendations.

Worked on LDAP configuration for Cloudera Manager, Ambari, and Hue.

Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.

Managed and reviewed Hadoop log files.

Extensively used the Oozie scheduler, with a clear understanding of Oozie workflows, coordinators, and bundles.

Worked extensively with Sqoop for importing metadata from Oracle.

Installed and configured a large-scale Kafka cluster in Cloudera.
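
Topic creation and a smoke test on a Kafka cluster of that generation (ZooKeeper-based tooling; the hosts and topic name are assumptions):

    # Create a replicated topic, then verify it end to end
    kafka-topics.sh --create --zookeeper zk1:2181 \
      --replication-factor 3 --partitions 6 --topic click-stream
    echo "hello" | kafka-console-producer.sh --broker-list broker1:9092 --topic click-stream
    kafka-console-consumer.sh --bootstrap-server broker1:9092 \
      --topic click-stream --from-beginning --max-messages 1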

Responsible for smooth, error-free configuration of the DWH-ETL solution and its integration with Hadoop.

Designed a data warehouse using Hive.

Used the Control-M scheduling tool to schedule daily jobs.

Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
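
A minimal sketch of that pattern (the table, columns, and location are illustrative only):

    hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
      ip STRING, ts STRING, url STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/raw/web_logs';
    -- This aggregation compiles down to a MapReduce job
    SELECT url, COUNT(*) AS hits FROM web_logs GROUP BY url;
    "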

Configured Splunk to generate alerts over system/service failures.

Worked as a production code deployment engineer on a sprint basis.

Environment: Hadoop, HDP 2.6, Hortonworks, Cloudera 5.15, MapReduce, HDFS, Hive, HBase, Kafka, Java 7 & 8, MongoDB, Collibra DGC 5.6.4, Pig, Informatica, Oracle, Informatica BDM, Linux, Eclipse, Zookeeper, Apache Solr, R and RStudio, Control-M, Redis, Tableau, QlikView, DataStax, Spark, Splunk

Title: Hadoop Administrator May 2016 – Mar. 2018

Client: Autodesk

Location: Bengaluru, India

Responsibilities:

Responsible for cluster maintenance, monitoring, and management; commissioning and decommissioning DataNodes; troubleshooting; reviewing data backups; and managing and reviewing log files for Cloudera and Hortonworks.

Added, installed, and removed components through Cloudera and Hortonworks tooling.

Monitored workload, job performance, and capacity planning using Cloudera.

Performed major and minor upgrades and patch updates.

Created and managed cron jobs.
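
A typical crontab entry for such jobs (the script path and schedule are examples):

    # m h dom mon dow  command -- run the nightly cleanup at 01:30
    30 1 * * * /opt/scripts/nightly_cleanup.sh >> /var/log/nightly_cleanup.log 2>&1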

Installed Hadoop ecosystem components such as Pig, Hive, HBase, and Sqoop in a cluster.

Experience setting up tools such as Nagios for monitoring the Hadoop cluster.

Handled data movement between HDFS and various web sources using Flume and Sqoop.
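
On the Flume side, a minimal agent for this kind of movement could be configured as below and started with flume-ng (the agent name, log path, and HDFS path are assumptions):

    # flume.conf -- tail an application log into HDFS
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = sink1
    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/app/access.log
    agent1.sources.src1.channels = ch1
    agent1.channels.ch1.type = memory
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = /data/flume/access
    agent1.sinks.sink1.channel = ch1

    # Start the agent
    flume-ng agent --conf conf --conf-file flume.conf --name agent1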

Extracted data from SQL databases such as MSSQL and Oracle through Sqoop and placed it in HDFS for processing.

Installed Oozie workflow engine to run multiple Hive and Pig jobs.

Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components such as Hive and HBase.

Installed and configured high availability for Hue pointing to the Hadoop cluster in Cloudera Manager.

Deep and thorough understanding of ETL tools and how they apply in a Big Data environment while supporting and managing Hadoop clusters.

Installed and configured MapReduce and HDFS, and developed multiple MapReduce jobs in Hive for data cleaning and pre-processing.

Used Kafka for building real-time data pipelines between clusters.

Ran log aggregation, website activity tracking, and commit logs for distributed systems using Apache Kafka.

Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.

Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.

Experience with Python and shell scripts.

Commissioned DataNodes as data grew and decommissioned DataNodes from the cluster when hardware degraded.
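
Decommissioning usually follows this shape (the exclude-file path depends on the cluster's dfs.hosts.exclude setting, and the hostname is hypothetical):

    # Add the host to the NameNode's exclude file and trigger a refresh
    echo "datanode42.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes
    # Wait until the node reports "Decommissioned" before removing it
    hdfs dfsadmin -report | grep -A 3 "datanode42.example.com"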

Worked with data delivery teams to set up new Hadoop and Linux users, set up Kerberos principals, and test HDFS and Hive access.
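
Creating and testing a principal usually looks like this sketch (the realm, admin principal, user, and Hive host are placeholders):

    # Create the principal, obtain a ticket, then test HDFS and Hive access
    kadmin -p admin/admin -q "addprinc jdoe@EXAMPLE.COM"
    kinit jdoe@EXAMPLE.COM
    hdfs dfs -ls /user/jdoe
    beeline -u "jdbc:hive2://hive-host:10000/default;principal=hive/_HOST@EXAMPLE.COM"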

Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.

Environment: Hadoop, Cloudera, Hortonworks, MapReduce, HDFS, Hive, HBase, Java 6 & 7, MongoDB, Pig, Informatica, Oracle, Informatica BDM, Linux, Eclipse, Zookeeper, Apache Solr, R and RStudio, Control-M, Redis, Tableau, Spark, Splunk


