Big Data Management

Location:
Denton, TX
Posted:
July 01, 2024

Resume:

Summary: Results-driven IT professional with ** years of experience, specializing in Hadoop Administration and Big Data Technologies for the past decade. Demonstrated success in implementing and supporting enterprise Hadoop environments, both on-premise clusters and cloud-based platforms. Proficient in installation, configuration, and management of Hadoop Clusters using Hortonworks and Cloudera distributions.

Key Highlights:

●Expertise in performance monitoring, tuning, and troubleshooting, with a track record of resolving complex technical issues.

●Skilled in working collaboratively with vendors and internal teams to identify root causes and implement effective solutions.

●Proven experience supporting and upgrading Cloudera Data Hub, Cloudera Manager, and Cloudera Navigator.

●Adept at designing, configuring, and tuning replication processes, such as BDR, for efficient data management.

●Strong proficiency in setting up and securing various Hadoop components, including Kafka, Impala, Hue, Hive, YARN, Oozie, Sentry, and Key Trustee.

●Extensive background in setting up new clusters with a focus on security, user authentication, backup/recovery, data replication, failover, and load balancing.

●Demonstrated ability to design and optimize clusters for changing workloads and user/data growth.

●Skilled in integrating 3rd party tools into Hadoop environments and conducting proof of concepts for enhancing functionality.

●Experienced in implementing security measures such as Ranger/Sentry, data-at-rest encryption, data-in-transit encryption, and Kerberos.

●Proven expertise in monitoring and troubleshooting Linux memory, CPU, OS, storage, and network-related issues.

●Hands-on experience in analyzing Hadoop and ecosystem service log files to identify and address root causes.

●Proficient in planning, installing, and configuring Hadoop clusters using Cloudera and Hortonworks distributions.

●Skilled in data import/export using Sqoop between HDFS and Relational Database systems.

●Capable of creating and enhancing data pipelines with Apache NiFi to facilitate data ingestion from diverse sources.

●Implemented end-to-end secured streaming solutions using open source tools to reduce licensing costs.

●Exceptional problem-solving skills and ability to excel both independently and within a team.

●Excellent communication, presentation, and interpersonal skills.

Experience

NCR. 05/20 - Present

Big Data Administrator

●Design, deploy and architect big data clusters and ingestion frameworks

●Responsible for transitioning the organization from a traditional RDBMS to Hadoop.

●Sized the cluster based on the amount of data being brought in and the resources needed to run it.

●Integrated with external analysis tools such as Logi Analytics for the ETL.

●Migrated data from cloud to on-premise cluster using AWS Snowball and other AWS tools, working with network and infosec teams on ports and security details.

●Designed and developed the data ingestion process from SQL Server instances to Hadoop using SSIS jobs and Spark jobs.

●Designed job schedules, coordinating with different countries in Europe, Spain, and the US.

●Designed, deployed and upgraded CDP cluster on OneFS Dell EMC Isilon.

●Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.

●Tuning configurations based on user requirements to improve job performance.

●Experience with automation (dashboards, alerting, scripting); automated deployments using ClusterShell and Ansible playbooks, cutting execution time by 50%.

●Upgrading CDH to CDP

●Converting Sentry RBAC to Apache Ranger Authorization.

●Skills: Hadoop Administration · Data Architecture · Extract, Transform, Load (ETL) · Data Migration · Isilon · Network-Attached Storage (NAS) · Apache Spark · Big Data · Hadoop

Inmar Intelligence. 05/17 - 05/20

Hadoop Administrator

●Responsible for four clusters: Dev, Test, Stage and Prod.

●Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.

●Tuning configurations based on user requirements to improve job performance.

●Automated deployments using ClusterShell and Ansible playbooks, cutting execution time by 50%.

●Designed, installed and configured MapR cluster.

●Cluster benchmarking with memory, network, I/O and disk tests, comparing the results with TestDFSIO and sort tests.

●Implementing POC for big data tools like CASK.

●Configured Spark on YARN on a MapR cluster, decreasing the run time of Spark programs by 50%.

●Configuring security with truststores and keystores for data at rest.

●Implementing security using Kerberos and Active Directory.

●Created authorization policies for all ecosystem components in tools such as Ranger and Sentry.

●Troubleshooting developer issues daily, addressing performance bottlenecks, and mentoring on the use of big data tools.

●Implemented cluster on a virtual environment using SAN disks

●Infrastructure support for the whole MDA clusters integrating with various KPIs like custom Ingest framework, Tableau, SSIS and CDAP.

●Performed upgrades, backups on Ambari and HDP Stacks on secured cluster

●Capacity scheduling by setting up queues for different teams making sure enough cluster resources are assigned based on the team’s requirements like the variety, velocity and volume

●Configuring clusters by setting up users, groups and system settings, configuring topology, and configuring volumes, job logs and scheduling.

●Data access and protection: setting up data access, configuring client NFS access, setting up access control for the cluster, and configuring snapshots and mirrors.

●Monitoring the cluster by configuring and responding to alarms, balancing cluster resources, managing logs and snapshots, and adding and removing services.

●Replacing failed disks, removing disks, performing node maintenance and adding nodes.

●Automated deployments by writing Ansible playbooks

●Tuning Hadoop cluster for multi-tenancy and optimal resource utilization, performance and high availability

●Working on streams using Kafka, setting up topics and authorization policies using ACLs and Apache Ranger.

●Securing data at rest by creating encryption zones in Hadoop for healthcare data.

●Working with shell scripts to schedule log rotation and config backups.

●Working with OpsGenie to schedule and manage the on-call rotation.

●Worked with AppDynamics to generate utilization reports for YARN queues.

●Monitoring the SmartSense Activity Explorer to find memory- and CPU-intensive jobs and other metrics to help developers tune their queries.

●Working with app dev teams to help them onboard and tune their queries in HBase, Phoenix, Spark, Hive, etc.

●Managing setup and operations with data tools such as Tableau, Incorta, Kognitio, etc.

●Working on data in transit, setting up TLS and SSL for each of the Hadoop ecosystem components, including HDFS, HBase, YARN, Oozie, Spark, Kafka, etc.

●Setting up clusters on cloud in AWS using Cloudbreak and CloudFormation templates.

●Setting up backup scripts and running daily cron jobs for the major ecosystem components, including Ambari, Ranger, Hive and Oozie.

●Adding nodes from AWS to create a hybrid cluster, with some nodes in the cloud and some physical.

●Working on performance enhancements by enabling ACID and LLAP functionalities in Hive.

●Working with other in-memory tools such as Kognitio: allotted some nodes using node labels and created YARN queues to isolate them from the rest of the cluster.

●Assisting with issues that came up related to the PFM Raw Claim project, which is a huge initiative to get all their claims loaded into Hadoop

●Built a secured Kafka cluster with SASL_SSL, Kafka Monitor, Kafka Manager and ZooNavigator.

●Implemented Kafka cluster monitoring with Prometheus and Grafana.

●Added authentication and authorization to Kafka cluster with Kerberos and ACLs.

●Carried out migration of topics successfully from existing cluster to the newly built secured cluster using Kafka mirrors.

AT&T. 09/13 - 05/17

Hadoop Administrator

●Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.

●Tuning configurations based on user requirements to improve job performance.

●Automated deployments by writing Ansible playbooks

●Tuning Hadoop cluster for multi-tenancy and optimal resource utilization, performance and high availability

●Troubleshooting Hadoop/Yarn/MapReduce

●Experience tuning and troubleshooting MapReduce and general YARN jobs for optimal performance, both on the command line and via the provided UIs.

●Providing High Availability NameNodes using Quorum Journal Managers and ZooKeeper Failover Controllers.

●Extensive knowledge of the configuration parameters available in mapred-site.xml and yarn-site.xml and the locations of various log files.

●Usage of the JobHistoryServer (and newer App Timeline Server) UI and API to monitor cluster usage, overall job health, forensics, etc.

●Implemented Dominant Resource Calculator and capacity scheduler queues to enhance the performance of the cluster

●Experience tuning and optimizing Hive, HiveServer2, and the Hive Metastore. Knowledge of Hive SQL and SQL syntax and usage (selects, inserts, joins). Experience with Tez and newer file formats such as ORC (Optimized Row Columnar), including why they may be superior to Text, Sequence and RC formats and what the tradeoffs are.

●Administering large Hadoop clusters (from over 100 to thousands of physical nodes) using Ambari, with the Ambari REST API for automating common tasks and monitoring overall cluster health.

●Oozie knowledge and experience developing and deploying Oozie workflows, including coordinator flows and Oozie actions such as the Hive action.

●Deploying and maintaining a ZooKeeper ensemble inside and outside Hadoop.

●Experience using Sqoop to transfer data between HDFS and RDBMS.

●Experience tuning & maintaining HBase

●Adding and removing Data Nodes and Services using Ambari

FATPIIPE Inc. 05/05 - 09/13

TFS and QC Admin

●Developing and executing software test plans in order to identify software problems and their causes.

●Organizing, documenting, maintaining and executing automated test scripts, and performing unit test and system test activities as required.

●Exporting test cases from Quality Center to MTM using the MTM migrator tool and maintaining them.

●Designed and enhanced a QTP hybrid-driven framework for regression testing, using a modular approach with a user-defined function library, shared object repository, static data file, calendar sheet (master datasheet), dynamic datasheet, user-defined result sheet, etc.

●Designed and developed automation test scripts using QuickTest Professional and managed testing activities using Mercury Quality Center.

●Design and implementation of the automation framework, formulating the driver script and the requisites.

●Designed function libraries, functions, subroutines, common functions, utility functions, regular expressions and environment variables in VBScript using QTP.

●Executed automated test scripts, analyzed the results and reported bugs in the Quality center.

●Modified and Executed automated and manual test scripts for different modules using QTP.

●Followed a Hybrid framework to adopt keyword driven, data-driven and script modularity methods.

●Created repeatable user defined functions and stored them as function libraries (.vbs files)

●Development in Agile environment and following SCRUM methodology

●Performing formal validation per FDA regulations.

●Testing the EncoreAnywhere website per FDA regulations for transferring Rx to the device and back to the website.

●Verifying the integration of the data card utilities with EncoreAnywhere

●Validating the Rx generated on the website using the EDIU tool

●Generation and review of the test plans and procedures, including automation assessment

●Independent analysis of technical details and definition of feature test strategy, provide estimates for efforts

●QA for a distributed enterprise web server application, following the documentation practices of a regulated industry.

●Collaboration with the development team to ensure appropriate coverage

●Analyzing Requirements for HCA by actively participating in the deep dive sessions

●Preparing Test Strategy document per the analysis by leading the Offshore Team

●Ensure all the deliverables are per the schedule

●Prepared project plan using Microsoft Project.

●Preparing Quality Acceptance Level for the product discussing with client

●Coordinating with offshore for Test cases and Knowledge Transfer of the Product

●Preparing Test Cases for new Requirements and updating existing with enhancements

●Involved in review of Functional and Technical requirement documents to understand the functionalities of the project.

●Interacting with business users, stakeholders, subject matter experts and business analysts to understand requirements and define the scope of the project.

●Created and managed project templates, use case templates, requirement types and traceability matrix.

●Scheduled meetings with developers, system analysts and testers to coordinate resource allocation and project completion.

●Conducted meetings with Management, SME, users and other stakeholders for open and pending issues.

●Test Effort Estimation, Identification and Allocation of Resources and Coordination with the team and Management.

●Creation & Evaluation of Test plan (Test methodology, Test Strategy), Testing (Matrix) and Conduct Project review meetings with the Test team.

●Preparation of Readiness reports and QA related documents for each release.

●Execution of test cases, defect logging, and review and verification in Quality Center 9.2

●Responsible for smoke test automation for new builds using QTP 9.2

●Developed functions using VBScript for automation with QuickTest Professional

●Preparing Anomaly reports from Quality Center and timely reporting to client

●Developing business scenarios to help clients execute User Acceptance Testing

●Coordinating testing efforts with the client's UAT team and providing knowledge transfer to help them perform business User Acceptance Testing

Education

Master of Information Technology; Manipal Academy of Higher Education

Bachelor’s degree in Computer Science; Nagarjuna University


