
Manager Data

Location:
Brooklyn, NY
Salary:
$130K/Yearly
Posted:
November 12, 2020


Resume:

Muhammad Sufian

Hadoop Administrator

adhsaq@r.postjobfree.com

Contact: +1-646-***-****

https://www.linkedin.com/in/muhammad-sufian-94620418b/

Professional Summary:

** ***** ** ********** ** Information Technology including 6 years of experience in Big Data/Hadoop ecosystem.

Hands-on experience with components like HDFS, YARN, Oozie, Hive, Impala, Tez, Sqoop, Spark, HBase, Hue, Sentry, Ranger, Cloudera Manager, Apache Ambari, and ZooKeeper.

Experienced in cluster installation, configuration, data management, and monitoring.

Involved in daily troubleshooting, performance maintenance, query optimization, and load balancing after large-volume deployments.

Hands-on experience managing cluster resources using YARN components such as the ResourceManager, NodeManager, resource pools, and schedulers.

Experienced in code management using Git and Bitbucket.

Knowledge of Jupyter Notebook, which can be used to create and share documents that contain live code, equations, visualizations, and text.

Knowledge of Great American Supplemental Benefits, an ideal strategic fit with Lighthouse's growth plans to expand its presence in the U.S. individual and seniors segments through a broad range of supplemental health solutions.

Installing and configuring Kerberos, syncing users from Active Directory, and managing authentication.

Experience in setting up Kerberos principals and testing HDFS, Hive, Impala and Spark access for the new users.
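
For illustration, a minimal sketch of that workflow; the realm, user, and host names below are hypothetical:

    # Create a principal for a new user (hypothetical realm and user)
    kadmin.local -q "addprinc jdoe@EXAMPLE.COM"

    # Obtain a ticket as the new user and verify service access
    kinit jdoe@EXAMPLE.COM
    hdfs dfs -ls /user/jdoe                                        # HDFS access
    beeline -u "jdbc:hive2://hs2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM" -e "SHOW DATABASES;"   # Hive access
    impala-shell -k -i impalad.example.com -q "SHOW DATABASES;"    # Impala access with Kerberos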

Highly experienced in automating Hive jobs, regular Spark processes, Spark ETL processes, and Sqoop jobs using Oozie, scheduling them according to client requirements.
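
A hedged sketch of submitting such a scheduled job to Oozie; the server URL, properties file, and job id are placeholders:

    # job.properties points to a workflow/coordinator definition stored in HDFS
    oozie job -oozie http://oozie.example.com:11000/oozie -config job.properties -run

    # Check the status of a submitted coordinator (job id is hypothetical)
    oozie job -oozie http://oozie.example.com:11000/oozie -info 0000012-200101000000000-oozie-oozi-C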

Experienced in using the Directed Acyclic Graph (DAG) of lineage to recover RDDs, ensuring fault tolerance and better global optimization than systems like MapReduce.

Highly experienced in troubleshooting data pipeline issues and monitoring them on a daily basis.

Maintaining cluster security by implementing LDAP authentication and SSL/TLS.

Experienced in both Cloudera CDH/CDP and Hortonworks HDP environments.

Implementing database-level authorization by setting up Sentry and Ranger and by implementing policies for users and groups.

Implementing file-level authorization using file ACLs (FACLs) and directory zoning.

Monitoring data governance using Cloudera Navigator.

Cluster monitoring, instance management, service maintenance and configuration deployment using Cloudera Manager, Apache Ambari.

Performance tuning and benchmarking of different components in a Hadoop cluster.

Configuring cluster-level High Availability (HA) by implementing a Standby NameNode, JournalNodes, and ZKFC, with the active lock maintained in ZooKeeper.
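
A small sketch of verifying such an HA setup from the command line; the NameNode service IDs nn1 and nn2 are assumptions:

    # Check which NameNode is active and which is standby
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2

    # Manually fail over from nn1 to nn2 if required
    hdfs haadmin -failover nn1 nn2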

Configuring service-level High Availability (HA) for Hive, Impala, and Hue using a load balancer, as well as for the ResourceManager and HBase.

Setting up backup and Disaster Recovery by configuring peers and scheduling backup jobs.

Ensuring cluster performance by implementing best practices for HDFS, YARN, Hive and Impala.

Experience in using Spark SQL to analyze data: creating RDDs, converting them to DataFrames or Datasets for development, and finally saving them as Hive tables in the Hive warehouse.
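
For illustration of the final save-to-warehouse step only, a hedged sketch using the spark-sql shell; the database and table names are hypothetical:

    # Analyze staged data and persist the result as a Parquet-backed Hive table
    spark-sql --master yarn -e "
      CREATE TABLE IF NOT EXISTS analytics.daily_summary
      STORED AS PARQUET
      AS
      SELECT region, COUNT(*) AS events
      FROM staging.raw_events
      GROUP BY region;"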

Experience in setting up and configuring an external database as the metastore.

Hands-on experience in managing and configuring Linux/Unix-based operating systems for cluster management.

Importing/exporting data between RDBMS and Hive/HDFS using Sqoop and Hive.
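
A minimal sketch of such an import; the MySQL host, credentials, and table names are hypothetical:

    # Import a table from MySQL directly into a Hive table
    sqoop import \
      --connect jdbc:mysql://db.example.com:3306/sales \
      --username etl_user -P \
      --table orders \
      --hive-import --hive-database staging --hive-table orders \
      --num-mappers 4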

Transferring files between hosts and clusters using WinSCP, scp, and DistCp.
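
Two hedged examples of these transfers; host names and paths are placeholders:

    # Copy a local file to an edge node with scp
    scp report.csv etluser@edge01.example.com:/tmp/

    # Copy a directory between clusters with DistCp
    hadoop distcp hdfs://cluster-a-nn:8020/data/events hdfs://cluster-b-nn:8020/data/events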

Knowledge of handling a growing volume of connected data with Neo4j, a non-relational graph database optimized for managing relationships.

Analyzing log files, enabling DEBUG mode to find out the root cause of a misconfiguration or error.

Experience in submitting jobs using YARN, Hive, Impala, and PySpark.

Experience in deploying and configuring Hadoop cluster for different environments with VMWare and Amazon Web Services (AWS).

Host management by commissioning and decommissioning DataNodes using HDFS commands.
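
A small sketch of decommissioning a DataNode, assuming the cluster references an exclude file via dfs.hosts.exclude; the host name and file path are assumptions:

    # Add the host to the exclude file referenced by dfs.hosts.exclude
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Tell the NameNode to re-read the include/exclude lists
    hdfs dfsadmin -refreshNodes

    # Watch the decommissioning progress for that node
    hdfs dfsadmin -report | grep -A3 datanode07.example.com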

Assisted in managing, coordinating, and implementing software and OS upgrades, patches, and hotfixes on servers, workstations, and network hardware.

Documenting events, deployments, and configuration changes on the cluster.

Manage resource pool and scheduler.

Experience providing 24/7 on-call support for client teams.

EXPERIENCE:

Hadoop Administrator October 2019 to Present

Lighthouse Guild

New York, NY

Job Responsibilities:

Created a new project solution based on the company's technology direction and ensured that infrastructure services were planned according to current standards.

Implemented HA for the NameNode and Hue using Cloudera Manager.

Created and configured the cluster monitoring services: Activity Monitor, Service Monitor, Reports Manager, Host Monitor, Event Server, and Alert Publisher.

Configured HAProxy for the Impala and Hive services.

Created HDFS snapshots for Hive backups.
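
A hedged sketch of this snapshot flow; the warehouse path is an assumption:

    # Allow snapshots on the Hive warehouse directory, then take one
    hdfs dfsadmin -allowSnapshot /user/hive/warehouse
    hdfs dfs -createSnapshot /user/hive/warehouse hive-backup-$(date +%Y%m%d)

    # List existing snapshots of that directory
    hdfs dfs -ls /user/hive/warehouse/.snapshot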

Created SQOOP scripts for ingesting data from transactional systems to Hadoop.

Worked with application teams to install the operating system updates, Hadoop updates, patches, version upgrades as required.

Designed and developed data processing and streaming pipelines using NiFi.

Created scripts to automate balancing data across the cluster using the HDFS balancer utility.
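
A minimal sketch of such a script, assuming a 10% utilization threshold and a hypothetical log path:

    #!/usr/bin/env bash
    # Rebalance block placement so DataNode usage stays within 10% of the cluster average
    hdfs balancer -threshold 10 >> /var/log/hadoop/balancer-$(date +%F).log 2>&1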

Performed daily troubleshooting, monitoring, performance maintenance, query optimization, and load balancing after large deployments.

Created POCs for implementing different use cases, such as the usability of CDP and Spark.

Working experience in maintaining the MySQL database, setting up users, and maintaining backups.

Implemented Kerberos Security Authentication protocol for existing cluster.

Integration of clusters with LDAP.

Implemented TLS for CDH Services and for Cloudera Manager.

Worked with application teams to set up new Hadoop users, including creating Linux users and Kerberos principals.

Managed the backup and disaster recovery for cluster using BDR utility tool.

Analyzed errors and configurations to find root causes and minimize future system failures.

Configured and troubleshot job performance and performed capacity planning.

Supported application teams in choosing the right file formats in Hadoop file systems, such as Text, Avro, and Parquet, and compression techniques such as Snappy, bzip2, and LZO.

Substantially improved all areas of the software development life cycle for the company products, introducing frameworks, methodologies, reusable components and best practices to the team.

Environment: Cloudera Distribution of Hadoop (CDH) 5.x/CDP 7.x, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Flume, Cloudera Navigator, Airflow, Hue, Talend, TLS, LDAP, MySQL, Neo4j (graph database).

Hadoop Consultant March 2017 to September 2019

MetLife

Cary, NC

Roles and Responsibilities:

Administration and monitoring of Hadoop.

Worked on upgrading Hadoop from a lower version to the current version.

Configuring and troubleshooting Hadoop cluster job performance and capacity planning.

Installing, Upgrading and Managing Hadoop Cluster on Hortonworks distribution.

Closely worked with Hortonworks support team to resolve the issues by raising tickets.

Worked on fixing cluster issues and configuring High Availability for the NameNode in HDP.

Responsible for managing and scheduling jobs on Hadoop Cluster.

Replaced retired Hadoop slave (worker) nodes.

Performed updates of Hadoop Yarn and MapReduce memory settings.

Worked with DBA team to migrate Hive and Oozie meta store Database from PostgreSQL to MySQL.

Worked with the Fair and Capacity Schedulers: created new queues, added users to queues, increased mapper and reducer capacity, and administered permissions to view and submit MapReduce jobs.
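
For illustration, a hedged sketch of rolling out a new Capacity Scheduler queue; the queue name is hypothetical, and the queue itself is defined beforehand in capacity-scheduler.xml:

    # After adding the queue definition to capacity-scheduler.xml,
    # reload the scheduler configuration without restarting the ResourceManager
    yarn rmadmin -refreshQueues

    # Verify the new queue is visible and running
    yarn queue -status analytics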

Experience in Administration/Maintenance of source control management systems, such as GIT.

Operations - Custom Shell scripts, VM and Environment management.

Experience working with Amazon EC2, S3, and Glacier for POCs.

Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
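
A minimal sketch with the AWS CLI; the AMI ID, key pair, security group, and tag are placeholders:

    # Launch a single Linux instance from an AMI and tag it for the Hadoop POC
    aws ec2 run-instances \
      --image-id ami-0123456789abcdef0 \
      --instance-type m5.xlarge \
      --key-name hadoop-poc-key \
      --security-group-ids sg-0123456789abcdef0 \
      --count 1 \
      --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=hadoop-poc-node}]'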

Worked with the IAM service, creating new IAM users and groups, defining roles and policies, and configuring identity providers.

Loaded data from Oracle, MS SQL Server, MySQL, and flat-file databases into HDFS and Hive.

Fixed NameNode partition failures and MR job failures with too many fetch failures, and troubleshot common Hadoop cluster issues.

Maintaining GitHub repositories for configuration management.

Managing cluster coordination services through ZooKeeper.

Securing cluster by implementing SSL/TLS, LDAP and Kerberos.

Dealt with restarting several services and killing processes to clear alerts.

Monitored log files of several services and cleared files in case of disk-space issues on shared nodes.

Environment: Red Hat, Hortonworks HDP 3.x, YARN, Hive, Kafka, Spark, Sqoop, HBase, MySQL, Oozie, AWS (S3, EC2, IAM, EMR), Git, NiFi, LDAP, Kerberos, SSL/TLS.

Hadoop System Administrator September 2014 to February 2017

NBCUniversal Media, LLC

Englewood Cliffs, NJ

Responsibilities:

Worked on scaling the Hadoop cluster from the development environment to the pre-production stage and up to production.

Involved in complete Implementation lifecycle, specialized in writing custom MapReduce and Hive programs.

Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Extensively used Hive/HQL queries to query or search for a particular string in Hive tables stored in HDFS.

Possess good Linux and Hadoop system administration, networking, and shell scripting skills, plus familiarity with open-source configuration management and deployment tools such as Chef.

Worked with Puppet for application deployment.

Configured Kafka to read and write messages from external programs.

Continuously monitored and managed the Hadoop cluster using Cloudera Manager.

Developed MapReduce and Spark jobs to discover trends in data usage by users.

Implemented Spark using Python and Spark SQL for faster processing of data.

Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for huge volumes of data.

Exported the analyzed patterns back into Teradata using Sqoop.

Used the Spark-Cassandra Connector to load data to and from Cassandra.

Performed real-time data streaming using Spark, secured with Ranger.

Developed several business services using Java RESTful Web Services using Spring MVC framework.

Managing and scheduling Jobs to remove the duplicate log data files in HDFS using Oozie.

Used Apache Oozie for scheduling and managing Hadoop jobs. Knowledge of HCatalog for Hadoop-based storage management.

Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).

Implemented test scripts to support test driven development and continuous integration.

Moved data from HDFS to the MySQL database and vice versa using Sqoop.

Responsible for managing data coming from different sources.

Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to find which one better suits the current requirements.

Used File System check (FSCK) to check the health of files in HDFS.
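
For example, a typical health check of this kind; the root path and output filtering are assumptions:

    # Report overall filesystem health, including under-replicated and corrupt blocks
    hdfs fsck / -files -blocks -locations | tail -n 30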

Developed the UNIX shell scripts for creating the reports from Hive data.
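
A minimal sketch of such a report script, assuming Beeline connectivity; the HiveServer2 host, table names, and output path are hypothetical:

    #!/usr/bin/env bash
    # Export a daily summary from Hive to a CSV report
    OUT=/reports/daily_summary_$(date +%F).csv
    beeline -u "jdbc:hive2://hs2.example.com:10000/default" \
            --outputformat=csv2 --silent=true \
            -e "SELECT region, COUNT(*) FROM analytics.events GROUP BY region;" > "$OUT"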

Used Java/J2EE application development skills with Object-Oriented Analysis extensively and was involved throughout the Software Development Life Cycle (SDLC).

Involved in the pilot of a Hadoop cluster hosted on Amazon Web Services (AWS).

Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.

Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.

Used Spark Streaming to collect this data from Kafka in near-real time and perform the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in a NoSQL store (HBase).

Configured Kerberos for the clusters.

Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, Apache Kafka, Oozie, SQL, Alteryx, Flume, Spark, Cassandra, Java, AWS, GitHub.

Quality Assurance Engineer March 2011 to August 2014

VeriFone

Miami, FL

Role and achievements:

Designed and developed Test Plans and Test Cases based on requirement documents such as the BRS and FRS.

Strong knowledge of scripting and automation tools and strategies, e.g. Shell, Python, PowerShell

Performed defect review meetings with Business Analysts, Developers, and QA Team Members, and worked closely with them to define the scope of testing and mitigate defects prior to testing.

Involved in defect tracking and defect management using HP ALM.

Arranging demos with the client on modular functionalities delivered.

Provide QA estimates, QA progress report, defects report and test execution results to key stakeholders.

Helping the QA Team to identify the test scenarios and reviewing the test cases.

Executing SQL queries to perform database testing.

Performed browser compatibility and configuration testing on various browsers such as Chrome, Firefox, and IE.

Developed automation test scripts using UFT.

Provided UAT test support for Minute Clinic testing.

Executed automation test scripts in batch mode, analyzed the test results, and logged defects.

Conducted regression testing after bugs were fixed by the development team.

Used HP ALM and have in-depth knowledge of the Test Lab, Test Plan, Defects, and Requirements modules.

Work with Jenkins and CI tools to automate software delivery (build, test, deploy).

Reviewed User Stories, the Compliance Plan, the Release Plan, and Stage-Gate Assessments, including analysis of functional and business requirement documents such as the BRD, FRD, and Process Flow.

Environment: MapReduce, HDFS, Ambari, Hive, Oozie, SQL, DB2, MS Office, Outlook, PowerPoint, MS Excel, Alteryx, Flume, Cassandra, Scala, Java, AWS, GitHub, QC/ALM, JIRA, TD, QTP/UFT.

EDUCATION:

Master's: Master of Commerce, National University, Dhaka.


