
Hadoop Developer Data

Location:
Queens, NY
Salary:
120000
Posted:
October 06, 2022


MD Fazlay Rabbi

929-***-****

adsv35@r.postjobfree.com

United States Citizen, New York

PROFESSIONAL SUMMARY

• Over 6 years of IT industry and software development experience, including 5 years in Hadoop development.

• Experience in installing, upgrading, configuring, monitoring, supporting, and managing Hadoop clusters using Cloudera CDH4/CDH5 and Hortonworks HDP 2.1, 2.2, and 2.3 on Ubuntu, RedHat, and CentOS systems.

• Worked on CDH and HDP components including HDFS, MapReduce, JobTracker, TaskTracker, Sqoop, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark, and Kafka.

• Deployed Hadoop clusters on public and private cloud environments like AWS and OpenStack.

• Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.

• Experienced in administering Linux systems to deploy Hadoop clusters and in monitoring clusters using Nagios and Ganglia.

• Experienced in performing backup, recovery, failover and DR practices on multiple platforms.

• Implemented Kerberos and LDAP authentication for all services across Hadoop clusters.

• Experienced in automating the provisioning processes and system resources using Puppet.

• Implemented Hadoop-based solutions to store archives and backups from multiple sources.

• Familiar with importing and exporting data using Sqoop from RDBMSs such as MySQL, Oracle, and Teradata, using fast loaders and connectors.

• Built an ingestion framework using Flume to stream logs and aggregate the data into HDFS.

• Worked with application team via scrum to provide operational support, install Hadoop updates, patches and version upgrades as required.

• Imported and exported data into and out of HDFS and processed it for commercial analytics.

• Installed, monitored and performance tuned standalone multi-node clusters of Kafka.

• Successfully loaded files to Hive and HDFS from MongoDB, Cassandra, and HBase.

• Implemented authentication using Kerberos and authorization using Apache Sentry.

• Exposure to installing Hadoop and its ecosystem components such as Hive and Pig.

• Experienced in collaborative platforms including Jira, Rally, SharePoint, and Discovery.

• Experience in understanding and managing Hadoop log files and in managing Hadoop infrastructure with Cloudera Manager.

• Experienced as a SQL DBA in HA and DR techniques such as replication, log shipping, mirroring, and clustering, as well as database security and permissions.

• Experienced in upgrading SQL Server software, patches, and service packs.

• Experience providing 24x7 production support, including weekends, on a rotation basis.

PROFESSIONAL EXPERIENCE

Hadoop/Spark Developer

State Street Bank & Trust

Boston, Massachusetts June 2021 – August 2022

Responsibilities:

# Went through onboarding documents, raised requests, and obtained laptop/VDI access.

# Raised and got approved a SailPoint request for HDFS (edge node) access; also raised several software requests for AppStore, Security Access Management, and Clarity access.

# Created a VM and obtained CCMS, RTC-prod, and Git access.

# Created Autosys jobs for different servers (UAT/UAT2/DEV).

# Created Ingestion and Transformation jobs in UAT2 region.

# Set up the IntelliJ project, installed Maven dependencies, and created packages.

# Pulled the source code of ftb_dal (Data Lake) and compiled the project.

# Set proxy settings for incoming Maven projects.

# Prepared Excel sheet for versioning of images for UNIX access and shared with team.

# Worked with the Investors file and prepared JIL (Job Information Language) files for the UAT and BUAT regions.

# Also worked with Investor Ingestion jobs in UAT/UAT2/BUAT regions.

# Prepared Investors Transformation JIL for various regions.

# Wrote Scala code for file name validation.

# Worked with Exception Process Framework in Data Lake.

# Worked with SOP for FTB entitles_V2.7 modification.

# Worked with HIVE Database tables to test the “Delete Rows” script in DEV region.

# Worked with UTF-8 format.

# Worked with Apache Spark for data processing, Spark job submission, troubleshooting, and performance tuning.

# Worked with New Investor files named “IIS Retiree Services”.

# Worked with Trust Data Lake Investors AML (Anti Money Laundering) Consolidation full and Delta file & Recon log generation.

# Also worked in TDL Brokers AML consolidation full and Delta file and Recon log generation.

# Built high-performance batch and interactive data processing applications using Apache Tez.

# Created HIVE Tables for IIS Retiree service US & NON-US full and Delta file in UAT.
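
A minimal sketch of the kind of Hive table creation these ingestion bullets describe, issued through PySpark's Hive support; the database, table, column names, delimiter, and HDFS paths are hypothetical placeholders, not the actual Data Lake schema.

```python
# Hypothetical sketch only: illustrative external Hive tables for "full" and
# "delta" delimited file feeds; names and paths are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("create-feed-tables")
         .enableHiveSupport()
         .getOrCreate())

for suffix in ("full", "delta"):
    spark.sql(f"""
        CREATE EXTERNAL TABLE IF NOT EXISTS uat_db.retiree_feed_{suffix} (
            record_id     STRING,
            account_no    STRING,
            country_code  STRING,
            business_date STRING
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
        LOCATION '/data/uat/retiree_feed/{suffix}'
    """)
```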

# Developed source-to-FTB ETL server connectivity and file transfers.

# Implemented transformation code for IIS Retiree service US files.

# Created HIVE tables for IIS Retiree Services full and delta files in UAT and UAT2.

# Created Autosys jobs for IIS Retiree Service US file Consolidation.

# Created Autosys jobs for IIS Retiree service Recon Log.

# Supported existing Hadoop applications during development and production.

# Supported the Applications during the production time on a rotation basis with the offshore team.

# Generated TDL Investors AML consolidation full and delta files and recon log files.

# TDL Investors AML PEP NN Incorporate Exception process framework.

# TDL Brokers AML consolidation full & delta file and recon log generation.

# Tested all the Autosys jobs in the UAT region against different test cases and created a document with all the unit test case results.

# Tested all the Autosys jobs, also ran them from the command line, received an Exception Process failure email, and wrote a Python script on the UAT server to count the number of columns in each line (see the sketch below).
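
A minimal sketch of such a column-count check, assuming a pipe-delimited feed file; the delimiter and file path are placeholders for whatever the actual job used.

```python
# Hypothetical sketch: count the number of columns in each line of a delimited
# feed file so mismatched rows can be spotted before the Exception Process fires.
import sys

DELIMITER = "|"  # assumed delimiter; the real feed may differ

def count_columns(path: str, delimiter: str = DELIMITER) -> None:
    with open(path, "r", encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            columns = line.rstrip("\n").split(delimiter)
            print(f"line {line_no}: {len(columns)} columns")

if __name__ == "__main__":
    count_columns(sys.argv[1])
```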

# Tested and created an EP report for IIS Retiree Service non-restricted files in the UAT region and shared the document with the team.

# Tested Autosys jobs for IIS Retiree Service in both the DEV and UAT regions; created several documents and shared them with the team on a daily basis.

# Also worked with the scenario tracking document (end-to-end negative testing) for IIS Retiree Service.

# Involved in agile methodologies, daily standup meetings, and iteration planning meetings.

Hadoop Developer

Bank of America

Jacksonville, Florida Nov 2020 - Feb 2021

Responsibilities

• Took all the training provided to become familiar with the Bank of America DG Supervision Hadoop team.

• Used JIRA to track assigned tasks on a daily basis.

• DB2 to HIVE ingestion:

• Created STG (stage) and PERM (permanent) tables in Hive mirroring the DB2 schemas and tables, which is necessary for ingesting data from DB2 to Hive; also created insert queries for each table to move data from the stage to the perm tables (sketched below).
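
A hedged sketch of that stage-to-permanent pattern using PySpark's Hive support; the database, table, and column names and types are illustrative stand-ins, not the actual DB2 or bank schema.

```python
# Hypothetical sketch of the STG -> PERM pattern described above; table and
# column names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("db2-to-hive-ingest")
         .enableHiveSupport()
         .getOrCreate())

# Stage table mirrors the DB2 layout; data lands here first (e.g. via Sqoop).
spark.sql("""
    CREATE TABLE IF NOT EXISTS stg.customer_stg (
        customer_id   BIGINT,
        customer_name STRING,
        open_dt       STRING
    )
    STORED AS TEXTFILE
""")

# Permanent table in an optimized storage format.
spark.sql("""
    CREATE TABLE IF NOT EXISTS perm.customer (
        customer_id   BIGINT,
        customer_name STRING,
        open_dt       DATE
    )
    STORED AS ORC
""")

# Insert query moving data from stage to perm, casting as needed.
spark.sql("""
    INSERT INTO perm.customer
    SELECT customer_id, customer_name, CAST(open_dt AS DATE)
    FROM stg.customer_stg
""")
```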

• Created a load metadata entry file by using Load_Metadata_Table_Description from Sqoop1HFramework and entered metadata for each table in the Hadoop file system.

• Executed the necessary commands in Edge Node to run ingest operation successfully.

• Autosys Job Box:

Created an Autosys box job for new data pipelines.

Collected and populated insert_job, box_name, command, description, and profile attributes to create Autosys jobs that run at different time intervals.

• Created production-level metadata for more than 100 tables following the given template and standard:

• Table Metadata.

• Element Metadata.

• Collected all the data columns from BSDWeb, following the given instructions and class-word rules.

• Submitted all sprint tasks on time.

Big Data Developer

GAP Inc.

Columbus, Ohio June 2017 – Oct 2020

Responsibilities

• Developed architecture documents, process documentation, server diagrams, and requisition documents.

• Performed real-time streaming of data using Spark with Kafka.

• Worked on all major Hadoop distributions, including Cloudera and Hortonworks.

• Involved in agile methodologies, daily scrum meetings, and sprint planning; worked on a 350-node Hadoop cluster on Cloudera distribution 7.7.0.

• Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.

• Ran Apache Hadoop, CDH, and MapR distributions on Elastic MapReduce (EMR) over EC2.

• Experienced in Spark Streaming, which collects data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in MongoDB.

• Responsible for installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster and created hive tables to store the processed results in a tabular format.

• Experience with designing and building solutions for data ingestion both real time & batch using Sqoop/PIG/Impala/Kafka.

• Built a NiFi dataflow to consume data from Kafka, transform it, place it in HDFS, and expose a port to run a Spark Streaming job.

• Assisted in exporting analyzed data to relational databases using Sqoop.

• Used Informatica PowerCenter for data integration.

• Experienced in connecting to and fetching data from heterogeneous sources and processing it; can connect to both SQL Server and Oracle databases and integrate the data into a third system.

• Leveraged Informatica's ability to publish database processes as web services and to balance load between the database box and the ETL server.

• Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala (see the sketch below).
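
The project code was written in Scala; purely as an illustration of the same Kafka-to-HDFS pattern, here is a minimal PySpark Structured Streaming sketch with hypothetical broker, topic, and path values.

```python
# Hypothetical sketch: read a Kafka topic as a stream and persist it to HDFS as
# Parquet. Broker address, topic name, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
          .option("subscribe", "events_topic")                # assumed topic
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/events")            # assumed path
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .start())

query.awaitTermination()
```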

• Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data.

• Hands-on with real-time data processing using distributed technologies such as Storm and Kafka.

• Created demos in Tableau Desktop and published onto Tableau server.

• Worked on motion charts, bubble charts, and drill-down analysis using Tableau Desktop.

• Experienced with the PySpark API in Python; worked on querying data using Spark SQL on top of the PySpark engine.

• Worked with the PySpark library to apply SQL-like analysis to large amounts of structured and semi-structured data, as in the sketch below.
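
A small illustration of that SQL-on-DataFrames style in PySpark; the database, table, and column names are made up for the example.

```python
# Hypothetical sketch: expose a Hive table as a temp view and run a SQL-style
# aggregation over it with PySpark. Table and column names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("pyspark-sql-analysis")
         .enableHiveSupport()
         .getOrCreate())

orders = spark.table("sales_db.orders")      # assumed Hive table
orders.createOrReplaceTempView("orders")

daily_totals = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
    ORDER BY order_date
""")
daily_totals.show(20, truncate=False)
```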

• Involved in deployment of services through containerization (Docker, Kubernetes, and/or AWS Elastic Container Service (ECS)).

• Completed end-to-end design and development of an Apache NiFi flow that acts as the agent between the middleware team and the EBI team and executes all the actions mentioned above.

• Developed Sqoop scripts to handle the interaction between Hive and the Vertica database.

• Processed data into HDFS by developing solutions, and analyzed the data using MapReduce, Pig, and Hive to produce summary results from Hadoop for downstream systems.

• Built servers using AWS: importing volumes, launching EC2 instances, creating security groups, auto-scaling, load balancers, Route 53, SES, and SNS.

• Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.

• Streamed an AWS log group into a Lambda function to create ServiceNow incidents (see the sketch below).
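
A hedged sketch of such a Lambda handler: it decodes a CloudWatch Logs subscription payload and posts an incident through ServiceNow's Table API. The instance URL, the environment-variable credentials, and the use of the requests package (which would need to be bundled with the function) are assumptions, not the actual implementation.

```python
# Hypothetical sketch: Lambda handler triggered by a CloudWatch Logs subscription.
# It unpacks the gzipped, base64-encoded payload and creates a ServiceNow incident
# via the Table API. Instance name, credentials, and field values are placeholders.
import base64
import gzip
import json
import os

import requests  # assumed to be packaged with the function

SNOW_URL = "https://example.service-now.com/api/now/table/incident"  # placeholder

def lambda_handler(event, context):
    payload = base64.b64decode(event["awslogs"]["data"])
    log_data = json.loads(gzip.decompress(payload))

    messages = [entry["message"] for entry in log_data.get("logEvents", [])]
    incident = {
        "short_description": f"Errors in log group {log_data.get('logGroup')}",
        "description": "\n".join(messages)[:4000],
    }

    response = requests.post(
        SNOW_URL,
        auth=(os.environ["SNOW_USER"], os.environ["SNOW_PASSWORD"]),
        headers={"Content-Type": "application/json"},
        json=incident,
        timeout=30,
    )
    response.raise_for_status()
    return {"status": response.status_code}
```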

• Developed Spark code using Scala and Spark SQL for faster processing and testing, and performed complex HiveQL queries on Hive tables.

• Used Hive to perform data validation on the data ingested using Sqoop and Flume; the cleansed data set was pushed into HBase.

• Scheduled several time-based Oozie workflows by developing Python scripts (see the sketch below).
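
One way such a scheduling script could look, submitting a time-based coordinator through Oozie's web services API; the host, HDFS paths, and property values are hypothetical, and the real scripts may have used a different mechanism.

```python
# Hypothetical sketch: submit and start a time-based Oozie coordinator through the
# Oozie web services API. Host, HDFS paths, and property values are placeholders.
import requests

OOZIE_URL = "http://oozie-host:11000/oozie/v1/jobs?action=start"  # assumed host

CONFIG_XML = """<configuration>
  <property><name>user.name</name><value>etl_user</value></property>
  <property>
    <name>oozie.coord.application.path</name>
    <value>hdfs:///apps/oozie/daily_ingest/coordinator.xml</value>
  </property>
  <property><name>nameNode</name><value>hdfs://namenode:8020</value></property>
  <property><name>jobTracker</name><value>resourcemanager:8032</value></property>
</configuration>"""

response = requests.post(
    OOZIE_URL,
    data=CONFIG_XML,
    headers={"Content-Type": "application/xml"},
    timeout=30,
)
response.raise_for_status()
print("Submitted Oozie coordinator:", response.json().get("id"))
```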

• Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.

• Worked on S3 buckets on AWS to store Cloud Formation Templates and worked on AWS to create EC2 instances.

• Responsible for installing Talend in multiple environments, creating projects, setting up user roles and job servers, configuring TAC options, adding Talend jobs, handling job failures, providing on-call support, and scheduling.

• Enabled a load balancer for Impala to distribute load across all Impala daemons in the cluster.

• Designed an ETL data pipeline flow to ingest data from an RDBMS source into Hadoop using shell scripts, Sqoop, packages, and MySQL.

• Experienced with Hive data warehousing, which aggregates structured data from one or more sources so that it can be compared and analyzed for greater business intelligence.

• Analyzed business requirements and segregated them into use cases; created use case diagrams, activity diagrams, and sequence diagrams.

• Organized JAD sessions to flesh out requirements, performed use case and workflow analysis, outlined business rules, and developed domain object models.

• End-to-end architecture and implementation of client-server systems using Scala, Akka, Java, JavaScript, and related technologies.

• Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance.

• Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.

• Involved in Spark and Spark Streaming: creating RDDs and applying operations (transformations and actions).

• Created partitioned tables and loaded data using both static and dynamic partition methods, as in the sketch below.
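
A small sketch of both load styles against a partitioned Hive table, run through PySpark's Hive support; the table, column, and partition names are illustrative only.

```python
# Hypothetical sketch: create a partitioned Hive table, then load it with a static
# partition insert and with a dynamic partition insert. Names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partition-loads")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_db.orders_part (
        order_id BIGINT,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
""")

# Static partition: the partition value is fixed in the statement.
spark.sql("""
    INSERT INTO sales_db.orders_part PARTITION (order_date = '2020-01-01')
    SELECT order_id, amount FROM sales_db.orders_stage
    WHERE order_date = '2020-01-01'
""")

# Dynamic partition: partitions are derived from the selected data.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT INTO sales_db.orders_part PARTITION (order_date)
    SELECT order_id, amount, order_date FROM sales_db.orders_stage
""")
```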

• Developed custom Apache Spark programs in Scala to analyze and transform unstructured data.

• Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.

• Used Kafka for publish-subscribe messaging as a distributed commit log; experienced with its speed, scalability, and durability (see the sketch below).
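
A minimal publish/subscribe sketch with the kafka-python client; the broker address and topic are placeholders, and the actual work described here may have used the Java/Scala clients instead.

```python
# Hypothetical sketch: one producer and one consumer on the same topic using
# kafka-python. Broker and topic names are placeholders.
import json

from kafka import KafkaConsumer, KafkaProducer

BROKERS = ["broker1:9092"]      # assumed broker list
TOPIC = "user-events"           # assumed topic

producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
)
producer.send(TOPIC, {"user_id": 42, "action": "login"})
producer.flush()

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    consumer_timeout_ms=10_000,  # stop iterating if no messages arrive
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)
```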

• Followed the Test-Driven Development (TDD) process; extensive experience with Agile and Scrum methodologies.

• Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.

• Involved in cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files.

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, Elastic, NiFi, Linux, Splunk, Java, Puppet, Apache YARN, Pig, Spark, Tableau, Machine Learning.

Hadoop Developer

Deutsche Bank

New York, NY November 2016 – May 2017

Responsibilities

• Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, Hive and Sqoop.

• Provided an application demo to the client by designing and developing a search engine, report analysis trends, and application administration prototype screens using AngularJS and Bootstrap.

• Took ownership of the complete application design for the Java part and Hadoop integration.

• Apart from normal requirements gathering, participated in business meetings with the client to gather security requirements.

• Implemented Apache NiFi to allow integration of Hadoop and PostgreSQL into the day-to-day usage (SDS) of the team's projects.

• Designed and implemented ETL jobs using BIML, SSIS, and Sqoop to move data from SQL Server to Impala.

• Assisted the architect in analyzing the existing and future systems; prepared design blueprints and application flow documentation.

• Experienced in managing and reviewing Hadoop log files; loaded and transformed large sets of structured, semi-structured, and unstructured data.

• Responsible for managing data coming from different sources and applications; supported MapReduce programs running on the cluster.

• Responsible for working with message broker systems such as Kafka; extracted data from mainframes, fed it to Kafka, and ingested it into HBase to perform analytics.

• Wrote an event-driven link-tracking system to capture user events and feed them to Kafka, which pushes them to HBase.

• Created MapReduce jobs to extract content from HBase and configured them in an Oozie workflow to generate analytical reports.

• Developed JAX-RS web services code using the Apache CXF framework to fetch data from Solr when the user searched for documents.

• Participated in Solr schema design and ingested data into Solr for indexing.

• Wrote MapReduce programs to organize the data and ingest it in a client-specified format suitable for analytics.

• Hands-on experience writing Python scripts to optimize performance; implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.

• Created the Talend Development Standards document, which describes the general guidelines for Talend developers, the naming conventions to be used in transformations, and the development and production environment structures.

• Involved in writing Spark applications using Scala; hands-on experience creating RDDs, transformations, and actions while implementing Spark applications.

• Good knowledge of creating DataFrames using Spark SQL; involved in loading data into the Cassandra NoSQL database.

• Implemented record-level atomicity on writes using Cassandra; wrote Pig scripts to query and process the datasets to find trend patterns by applying client-specific criteria, and configured Oozie workflows to run those jobs along with the MR jobs.

• Stored the derived analysis results in HBase and made them available for ingestion into Solr for indexing.

• Involved in integrating the Java search UI, Solr, and HDFS; involved in code deployments using Jenkins as the continuous integration tool.

• Documented all the challenges and issues involved in dealing with the security system and implemented best practices.

• Created project structures and configurations according to the project architecture and made them available for junior developers to continue their work.

• Handled the onsite coordinator role to deliver work to the offshore team; involved in code reviews and application lead support activities.

Environment:

Apache Hadoop, Hive, Scala, PIG, HDFS, Cloudera, Java Map-Reduce, Python, Maven, GIT, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Oracle, JDK 1.8/1.7, Agile and Scrum Development Process, NoSQL

Bachelor's (BSc) in Computer Science (CS) from The City University of New York - Brooklyn, New York.


