
Jarina Afrin

Hadoop Admin

Long Island, New York, USA

Mobile: 929-***-****

E-mail: adzpay@r.postjobfree.com

LinkedIn: http://linkedin.com/in/jarina-afrin-73446a281

Professional Summary

9+ years of experience in IT, including 7 years in Big Data using the Hadoop ecosystem and related tools.

Extensive knowledge and understanding of Hadoop architecture and its components, including HDFS, YARN, NameNode, DataNode, Hive, Hue, HBase, Impala, Spark, Ansible, Kafka, Python, OpenBSD, Grafana, and basic shell scripting.

Installing, configuring, and managing ecosystem components like HDFS, Hive, MapReduce, Oozie, YARN, and ZooKeeper using CDH and CDP.

Experience in performing backup and disaster recovery of NameNode metadata and important sensitive data residing on the cluster.

Experience in Hadoop cluster capacity planning, performance tuning, cluster monitoring, troubleshooting.

Experience in minor and major upgrades of Hadoop and its ecosystem, including CDH to CDP.

Good knowledge on Linux/UNIX administration.

Enabling High Availability for HDFS, YARN, Hive, and Impala services to prevent failures and maintain service availability.

Highly experienced in maintaining security by enabling authentication and authorization using Kerberos and Ranger.

Experience in managing UPNs, SPNs, and keytabs for authentication, as well as creating roles, policies, encryption, and masking using Ranger, KMS, and Sentry.
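
For illustration, a minimal keytab workflow on a Kerberized cluster (the keytab path, principal, and realm below are hypothetical):

# List the principals and key versions stored in a keytab
klist -kt /etc/security/keytabs/hdfs.headless.keytab
# Obtain a ticket as a service principal without an interactive password
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
# Verify the ticket cache
klist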

Involved in various projects related to data modeling, system/data analysis, design, and development in data warehousing environments.

Facilitated data requirement meetings with business and technical stakeholders and resolved conflicts to drive decisions.

Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, data manipulation.

Solid understanding of all phases of development using multiple methodologies, e.g., Agile with JIRA and Kanban boards, along with ticketing tools.

Experience in importing and exporting data between HDFS and relational database management systems using Sqoop, and troubleshooting any issues.
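
A minimal sketch of such a transfer (connection string, credentials, table names, and paths are placeholders):

# Import a table from MySQL into HDFS with four parallel mappers
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Export processed results back to the relational database
sqoop export \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table order_summary \
  --export-dir /data/out/order_summary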

Experience in monitoring the health of the Hadoop cluster and performing administrative cluster maintenance such as commissioning/decommissioning data nodes.
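
Decommissioning a data node typically looks like the sketch below (the exclude-file path depends on the cluster's dfs.hosts.exclude setting; the host name is hypothetical):

# Add the node to the HDFS exclude file referenced by dfs.hosts.exclude
echo "dn07.example.com" >> /etc/hadoop/conf/dfs.exclude
# Tell the NameNode to re-read the include/exclude lists
hdfs dfsadmin -refreshNodes
# Watch replication drain until the node reports "Decommissioned"
hdfs dfsadmin -report | grep -A 5 "dn07.example.com"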

Experienced in converting Hive/SQL queries into Spark transformations using Spark RDDs.

Experienced in ETL/BI tools like Talend and Tableau, including their usage, support, and maintenance in a Big Data environment.

Hands on experience in analyzing Log files for Hadoop and Eco-System services and finding root cause.

Practical knowledge on functionalities of Hadoop Daemons, interactions between them, resource utilization and dynamic tuning of resources to ensure efficiency of cluster performance.

Creating resource pools, job pools, and queues (parent and child queues), assigning users/groups to pools, and restricting production job submissions based on pools to ensure optimal performance.

Experience with replicating data across data centers for disaster recovery scenarios through BDR and HDFS snapshots.
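
As a sketch, snapshot-based replication to a DR cluster can be driven as below (paths and NameNode addresses are hypothetical; on CDH/CDP this is usually scheduled through BDR in Cloudera Manager):

# Allow and take a point-in-time snapshot of a critical directory
hdfs dfsadmin -allowSnapshot /data/critical
hdfs dfs -createSnapshot /data/critical nightly-2023-09-01
# Copy the immutable snapshot to the DR cluster
hadoop distcp -update \
  hdfs://prod-nn:8020/data/critical/.snapshot/nightly-2023-09-01 \
  hdfs://dr-nn:8020/data/critical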

Work Experience

Hadoop Administrator

Progressive Insurance

Mayfield, OH

Jun/2022 - current

Summary:

Expert knowledge in installing, configuring, monitoring, and using Hadoop components like YARN, HDFS, HBase, Hive, Tez, Sqoop, Hue, Atlas, Solr, Key Management Service (KMS), Key Trustee Server, ZooKeeper, Oozie, Apache Spark, Impala, Ansible, and Python.

Involved in migrating from CDH 6.x to CDP 7.x and troubleshooting any problems that came with it.

Upgraded to the latest version of CDP, performing pre-transition steps for components deployed in the clusters.

Architecting & Modelling of Data for Distributed processing platforms by proper Partitioning, Bucketing, compressions, NoSQL, Hive/Spark, Realtime processing and querying etc. for best performance and data usage by business users and customers

Implemented scheduling for YARN resource pools and Queue Manager, allocating resources using FIFO, Fair Scheduler, and Capacity Scheduler.
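
On a vanilla YARN deployment the equivalent is an allocation file plus a queue refresh; on CDH/CDP this is normally managed through Cloudera Manager / Queue Manager. Queue names and weights below are illustrative:

# Illustrative fair-scheduler.xml fragment
cat > /tmp/fair-scheduler.xml <<'EOF'
<allocations>
  <queue name="prod">
    <weight>3.0</weight>
    <aclSubmitApps>etl_group</aclSubmitApps>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
  </queue>
</allocations>
EOF
# Reload queue configuration without restarting the ResourceManager
yarn rmadmin -refreshQueues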

Involved in Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.

Worked on a live Big Data Hadoop production environment with multiple nodes.

HA implementation of NameNode to avoid a single point of failure.

Automated day-to-day activities using shell scripting and used Cloudera Manager to monitor the health of Hadoop daemon services, responding to any warning or failure conditions.
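
A sketch of one such health check against the Cloudera Manager REST API (host, credentials, cluster name, and API version are assumptions):

# Pull service names and health summaries from Cloudera Manager
curl -s -u "$CM_USER:$CM_PASS" \
  "http://cm-host.example.com:7180/api/v19/clusters/Cluster1/services" \
  | grep -E '"name"|"healthSummary"'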

Responsible for Cluster maintenance, Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups, Manage and review Hadoop log files.

Installed an Ansible server and developed Ansible playbooks to automate nightly patching activity.
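
For example, a cron-driven run of a patching playbook (inventory, playbook, and log paths are hypothetical):

# Run the patch playbook against the data-node group
ansible-playbook -i /opt/ansible/inventories/prod /opt/ansible/nightly_patch.yml --limit datanodes
# Crontab entry to trigger it nightly at 02:00
0 2 * * * /usr/bin/ansible-playbook -i /opt/ansible/inventories/prod /opt/ansible/nightly_patch.yml >> /var/log/nightly_patch.log 2>&1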

Installed and configured Hive in the Hadoop cluster and helped business users/application teams fine-tune their HiveQL for optimal performance and efficient use of cluster resources.

Experience in large scale Hadoop cluster, handling all Hadoop environment builds, including design, cluster setup and performance tuning.

Implemented Oozie workflows for ETL processes for critical data feeds across the platform.
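
Submitting and monitoring such a workflow from the CLI looks roughly like this (the Oozie URL, properties file, and job ID are placeholders):

# Submit and start the workflow defined by the job properties
oozie job -oozie http://oozie-host:11000/oozie -config etl_job.properties -run
# Check the status of the running workflow using the returned job ID
oozie job -oozie http://oozie-host:11000/oozie -info 0000001-230901123456789-oozie-oozi-W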

Responsible for ETL tools like Sqoop to bring data into the cluster from different RDBMSs such as MySQL.

Implemented the Kerberos authentication protocol for the existing cluster.

Created Hive databases and granted appropriate permissions through Ranger policies.
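
As a sketch (the HiveServer2 host, Kerberos principal, and database name are hypothetical; the matching access policy is then defined in the Ranger Admin console):

# Create the database through HiveServer2 on a Kerberized cluster
beeline -u "jdbc:hive2://hs2-host:10000/default;principal=hive/_HOST@EXAMPLE.COM" \
  -e "CREATE DATABASE claims_db"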

Transitioned from Sentry to Ranger by exporting policy rules from Sentry and continuing with the upgrade steps to convert Sentry policies to Ranger policies.

Built high availability for major production cluster and designed automatic failover control using Zookeeper Failover Controller (ZKFC) and Quorum Journal nodes.
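
Typical verification commands after such a setup (the NameNode service IDs nn1/nn2 are common defaults but depend on hdfs-site.xml):

# Confirm which NameNode is active and which is standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# One-time initialization of the failover znode during HA setup
hdfs zkfc -formatZK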

Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats into Parquet files.

Performed server failover and high availability testing of Prod and Disaster recovery servers.

Perform annual configuration review and identify environment differences to audit platform and fix inconsistencies.

Tools and components: Cloudera CDH/CDP, RHEL, HDFS, Cloudera Manager, MapReduce, YARN, Hive, Impala, Hue, Sqoop, Oozie, ZooKeeper, Spark, Cloudera Navigator, Kerberos, Apache Sentry, Apache Ranger, AutoSys, Talend, HBase, LDAP, CDP 7.x.

Sr. Bigdata Consultant

Guardian Life Insurance

New York, New York

Sept/2020 - May/2022

Summary:

Installation, configuration, monitoring and maintenance of Hadoop cluster in CDH6.x environment.

Configuration of Cloudera Manager, Hive, Sqoop, Impala, Oozie, Spark, HBase, and ZooKeeper.

Resource management among users and groups using different types of resource schedulers.

Creating new users and setting roles according to requirements.

Ensuring client authentication using Kerberos and authorization using Apache Sentry.

Monitoring and troubleshooting jobs and log files for seamless performance and prevention of future downtime.

Supporting the offshore development team in running queries and jobs using Hive and Spark to analyze data, bring out meaningful insights, and save the results in HDFS, later importing them into SQL Server for future use.

Ensuring security and confidentiality of sensitive data through periodic auditing, tracking, and monitoring of client activity using Cloudera Navigator.

Monitoring health of HDFS, commissioning and decommissioning data nodes.

Keeping track of the health of the active and standby NameNodes to ensure HA.

Installation and upgrading of daemons and services.

Transferring data between clusters.

Monitoring and troubleshooting ETL jobs of Spark and Hive.

Loading data from different RDBMSs like Teradata and SQL Server into HDFS or the Hive warehouse using Sqoop.

Configuring jobs using Oozie.

Working with other Hadoop administrators to upgrade CDH and related component and daemon versions, implement important configuration changes, and fix bugs.

Integration of BI/ETL tools like Talend with Cloudera CDH for data analysis and report generation.

Tools and components: Cloudera CDH5.x, HDFS, MapReduce, YARN, Hive, Sqoop, Oozie, ZooKeeper, HBase, Impala, Cloudera Manager, Cloudera Navigator, Kerberos, Apache Sentry, Talend.

Hadoop Administrator

UBS Bank

Nashville, TN

Mar/2018 - Aug/2020

Summary:

Working on POCs to bring in new functionalities that the developers and internal Data scientists can leverage for their environments.

Troubleshooting day-to-day issues with the Hadoop cluster by removing Tez and Spark bottlenecks.

Assisting the system team in upgrading RHEL from 6.x to 7.x: cross-checking dependencies with HDP, JDK, and related component versions; planning and provisioning for server downtime; deploying updates and configuration changes needed for the project; and keeping continuous communication between the system team and Hortonworks support for successful completion.

Enabling TLS/SSL on the Hadoop cluster.

Experience enabling TLS/SSL for Ambari and HDP services.

Experience setting up high availability and fault tolerance using HAProxy.

Responsible for day-to-day activities which includes HDFS support and maintenance, Cluster maintenance, creation/removal of nodes, Cluster Monitoring/ Troubleshooting, Manage and review Hadoop log files, Backup and restoring, capacity planning.

NameNode HA implementation to avoid a single point of failure.

Involved in troubleshooting issues on the Hadoop ecosystem, understanding of systems capacity, bottlenecks, basics of memory, CPU, OS, storage, and networks.

Involved in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.

Implemented the Kerberos authentication protocol for the existing cluster.

Experience in implementing Hadoop ACLs and authorization using Ranger.

Experience with Cloudera Navigator for Auditing Hadoop access.

Responsible for setting up multi-tenant clusters in Hadoop using FIFO, Fair Scheduler, and Capacity Scheduler.

Experience setting up and configuring Oozie coordinator and workflow jobs.

Experience working with Spark and troubleshooting memory issues to improve performance.

Involved in analyzing system failures, identifying root causes, and recommending courses of action.

Experience in importing and exporting data between HDFS and relational database systems/mainframes using Sqoop.

Experience in importing and exporting logs using Flume.
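
A minimal agent launch for reference (the agent name and configuration paths are assumptions):

# Start a Flume agent named a1 with the given configuration
flume-ng agent --name a1 \
  --conf /etc/flume-ng/conf \
  --conf-file /etc/flume-ng/conf/log_agent.conf \
  -Dflume.root.logger=INFO,console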

Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.

Experienced in optimizing Hive queries to handle different data sets.

Identified and created ORC-formatted Hive tables for high-usage data.
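
For instance, converting a hot text-format table to ORC with a CTAS (table names and the HiveServer2 URL are placeholders):

# Create a Snappy-compressed ORC copy of a frequently queried table
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
CREATE TABLE web_logs_orc
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY')
AS SELECT * FROM web_logs_text;"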

Designed and developed Oozie workflows for sequence flow of job execution.

Experienced in working with the Spark ecosystem using Spark SQL.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.

Experience in installing and configuring Hive, its services, and the Metastore. Exposure to Hive Query Language (HQL) and table operations such as importing data, altering tables, and dropping tables.

Experience in tuning and debugging Spark applications.

Experience with setting up data governance on Hive databases using Ranger and AD groups.

Tools and components: Hortonworks HDP 2.x/3.x, HDFS, MapReduce, YARN, Hive, HBase, Sqoop, Oozie, ZooKeeper, Apache Tez, LLAP, Apache Ambari, Kerberos, Apache Ranger, Tableau, Talend.

System Administrator (Bigdata)

Merc & co

Boston, MA

Mar/2017 - Feb/2018

Summary:

Working experience in designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie, and ZooKeeper.

Used Sqoop to migrate data to and from HDFS and MySQL/Oracle and deployed Hive and HBase integration to perform OLAP operations.

Designed, planned, and delivered a proof of concept and business function/division-based implementation of a Big Data roadmap and strategy project.

Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.

Involved in exporting the analyzed data to the databases such as Teradata/MySQL/Oracle.

Worked on the Oozie scheduler to automate the pipeline workflow and orchestrate Sqoop, Hive, and Pig jobs that extract data in a timely manner.

Created internal/external Hive tables as per requirements, defined with appropriate static and dynamic partitions for efficiency.

Transformed the data using Hive, Pig for BI team to perform visual analytics, according to the client requirement.

Developed scripts and automated data management from end to end and sync up between all the Clusters.

Implemented Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.

Tools and components: Cloudera CDH5.x, HDFS, MapReduce, YARN, Hive, Sqoop, Oozie, ZooKeeper, Impala, Cloudera Manager, Cloudera Navigator, Kerberos, Apache Sentry, Tableau.

System Administrator

Discovery Communications

New York, New York

Oct/2014 - Feb/2017

Summary:

Created and managed users and groups, their permissions, ownerships, group privileges and related ACLs for files, folders and system services.

Managed Database workload batches with automated shell scripts and Cron utility schedules.
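
A representative crontab entry for such a batch (script and log paths are hypothetical):

# Run the nightly database batch at 01:30 and log each run
30 1 * * * /opt/scripts/db_batch.sh >> /var/log/db_batch.log 2>&1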

Managed disk space using Logical Volume Management (LVM).
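
A typical grow operation as a sketch (volume group, logical volume, and mount point are hypothetical):

# Extend the logical volume by 50 GiB and grow the filesystem in one step
lvextend -r -L +50G /dev/vg_data/lv_data
# Verify the new size
df -h /data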

Monitored virtual memory performance: swap space, disk, and CPU utilization.

Monitored system performance using Nagios.

Prepared monthly system performance reports, procedures and process documents.

Experience in system builds, server builds, installs, upgrades, patches, migration, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning.

Experience in deploying virtual machines using templates and cloning, and taking backups with snapshots.

Setup, configured, and maintained UNIX and Linux servers, RAID subsystems, desktop/laptop machines including installation/maintenance of all operating system and application software.

Install, configure, and maintain Ethernet hubs/switches/cables, new machines, hard drives, memory, and network interface cards.

Manage software licenses, monitor network performance and application usage, and make software purchases.

Provide user support including troubleshooting, repairs, and documentation.

Education Background

Bachelor of Computer Science

National University Dhaka


