MOHAMMED ABDULLAH

Email ID: adjjrf@r.postjobfree.com

Phone: 872-***-****

KEY SKILLS

Java, Scala, Linux, AWS

Hadoop – Apache Spark, Spark ML, MapReduce, Hive, Python, StreamSets, Kafka, Kudu, Flume, NiFi, Oozie, Sqoop, Docker, Jenkins; Distributions: Cloudera, Hortonworks

CERTIFICATIONS

Certified Apache Spark Developer – Hortonworks

AWS Certified Solutions Architect – Associate

AWS Certified Developer – Associate

SUMMARY

7 years of technical expertise across the complete software development life cycle (SDLC), including 5 years of Hadoop development and 2 years of core Java development, design and testing.

Hands-on experience working with Apache Spark and Hadoop ecosystem components such as MapReduce (MRv1 and YARN), Sqoop, Hive, Oozie, Flume, Kafka, Zookeeper and NoSQL databases like Cassandra.

Apache Spark:

Excellent knowledge on Spark Core architecture.

Hands on expertise in writing different RDD (Resilient Distributed Datasets) transformations and actions using Scala, Python and Java.

Created Data Frames and performed analysis using Spark SQL.

Working knowledge of Spark Streaming and the Spark machine learning libraries.
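For illustration, a minimal Scala sketch of the kind of RDD and Spark SQL work listed above; the data, names and values below are made up:

import org.apache.spark.sql.SparkSession

object SparkBasicsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("spark-basics-sketch").getOrCreate()
    import spark.implicits._

    // RDD transformations (filter, map) followed by an action (reduce)
    val amounts = spark.sparkContext.parallelize(Seq(12.5, 40.0, 7.25, 99.9))
    val total = amounts.filter(_ > 10.0).map(_ * 1.08).reduce(_ + _)
    println(s"adjusted total: $total")

    // DataFrame built from a local collection, analysed with Spark SQL
    val orders = Seq(("north", 120.0), ("south", 80.0), ("north", 45.0)).toDF("region", "amount")
    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT region, sum(amount) AS revenue FROM orders GROUP BY region").show()

    spark.stop()
  }
}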

Apache Sqoop:

Used Sqoop to import data from relational databases (RDBMS) into HDFS and Hive, storing it in formats such as Text, Avro, Parquet, SequenceFile and ORC, with compression codecs like Snappy and Gzip.

Performed transformations on the imported data and exported the results back to the RDBMS.

Apache Hive:

Experience in writing queries in HQL (Hive Query Language) to perform data analysis.

Created Hive External and Managed Tables.

Implemented Partitioning and Bucketing on Hive tables for Hive Query Optimization.
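As an illustration of partitioned and bucketed tables of this kind, a minimal sketch that declares a partitioned external Hive table and then writes a bucketed copy using Spark's native bucketing (shown in place of Hive's CLUSTERED BY DDL); the schema, location and bucket count are hypothetical:

import org.apache.spark.sql.SparkSession

object HiveTablesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-tables-sketch")
      .enableHiveSupport() // required so the tables land in the Hive metastore
      .getOrCreate()

    // External table over existing files, partitioned by load date (hypothetical schema and location)
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
        user_id STRING,
        event   STRING,
        amount  DOUBLE)
      PARTITIONED BY (dt STRING)
      STORED AS PARQUET
      LOCATION 'hdfs:///warehouse/raw_events'
    """)

    // Managed copy, partitioned and bucketed so joins and sampling on user_id read fewer files
    spark.table("raw_events")
      .write
      .partitionBy("dt")
      .bucketBy(32, "user_id")
      .sortBy("user_id")
      .mode("overwrite")
      .saveAsTable("events_bucketed")

    spark.stop()
  }
}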

Apache Oozie:

Experienced in writing Oozie workflows and coordinator jobs to schedule sequential Hadoop jobs.

Apache Flume and Apache Kafka:

Used Apache Flume to ingest data from different sources into sinks like Avro and HDFS.

Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.

Excellent knowledge of and hands-on experience with fan-out and multiplexing flows.

Excellent knowledge on Kafka Architecture.

Integrated Flume with Kafka, using Flume as both a producer and a consumer (the Flafka pattern).

Used Kafka for activity tracking and Log aggregation.
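A minimal Scala sketch of publishing an activity-tracking event to Kafka with the standard producer client; the broker address and topic name are assumptions:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ActivityProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Publish one activity event to a hypothetical topic, keyed by user id
    producer.send(new ProducerRecord[String, String]("user-activity", "user-123", """{"action":"page_view"}"""))
    producer.flush()
    producer.close()
  }
}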

SQL and NoSQL:

Good understanding of Relational Databases like MySQL.

Ability to write complex SQL queries to analyze structured data.

Version Control and Build Tools:

Experienced in using GIT, SVN.

Ability to deal with build tools like Apache Maven, SBT.

Java experience:

Excellent knowledge of object-oriented analysis and design; skilled at analyzing user requirements and applying design patterns.

Designed and developed Java enterprise and web applications using Java, J2EE, Spring framework, JDBC API and Hibernate.

Utilized the concepts of Multi-threaded programming in developing applications.

Implemented unit test cases and documented all the code and applications.

PROFESSIONAL EXPERIENCE

Client: CITI BANK, TAMPA, FL

Role: Sr. Hadoop Developer SEPTEMBER 2019 – PRESENT

Project Description:

DDL Audience Projection

The project delivers targeted advertising to customers by using linear STB (set-top box) data to provide customer viewership insight and to determine average audiences by network; those results are then used to extrapolate the full footprint from the average audience data.

Responsibilities:

•Built a Data Quality framework consisting of a common set of model components and patterns that can be extended to implement complex process controls and data quality measurements using Hadoop.

•Created and populated bucketed tables in Hive to allow faster map-side joins, more efficient jobs and more efficient sampling. Also partitioned data to optimize Hive queries.

•Implemented the DDL Curated Data Store logic using Spark Scala and DataFrame concepts.

•Used Spark and Hive to implement the transformations needed to join daily ingested data with historical data.

•Enhanced the performance of queries and daily Spark jobs through efficient design of partitioned Hive tables and Spark logic.

•Implemented Spark Scala code for data validation in Hive.

•Implemented automated workflows for all jobs using Oozie and shell scripts.

•Used Spark SQL functions to move data from stage Hive tables to fact and dimension tables.

•Implemented dynamic partitioning in Hive tables and used appropriate file formats and compression techniques to improve the performance of MapReduce jobs (see the sketch after this list).

•Worked with the Data Engineering Platform team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
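The sketch below illustrates the dynamic-partition load pattern referenced above; the database, table and column names are hypothetical, and the stage table is assumed to already exist:

import org.apache.spark.sql.SparkSession

object DynamicPartitionLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-partition-load-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hive settings needed for dynamic-partition inserts
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Hypothetical fact table, partitioned by load date and stored as Parquet
    spark.sql("""
      CREATE TABLE IF NOT EXISTS dw.fact_viewership (
        household_id STRING,
        network      STRING,
        view_seconds BIGINT)
      PARTITIONED BY (load_dt STRING)
      STORED AS PARQUET
    """)

    // Move the daily stage load into the fact table; load_dt drives the partition written
    spark.sql("""
      INSERT OVERWRITE TABLE dw.fact_viewership PARTITION (load_dt)
      SELECT household_id, network, view_seconds, load_dt
      FROM stage.stb_viewership_daily
    """)

    spark.stop()
  }
}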

Environment: Spark, Scala, Hadoop, Hive, Sqoop, Oozie, Design Patterns, SOLID & DRY principles, SFTP, Code Cloud, Jira, Bash.

Client: ALLIANCERX, DEERFIELD, IL

Role: Sr. Hadoop Developer AUGUST 2018 – SEPTEMBER 2019

Roles and responsibilities:

•Preparing Design Documents (Request-Response Mapping Documents, Hive Mapping Documents).

•Involved in designing the Cassandra data model; used CQL (Cassandra Query Language) to perform CRUD operations on Cassandra.

•Implemented batch processing of data sources using Apache Spark and Elasticsearch.

•Developed Spark RDD transformations and actions to implement business analysis.

•Migrated HiveQL queries on structured data into Spark SQL to improve performance (see the sketch after this list).

•Documented the data flow from the application through Kafka, Storm, HDFS and Hive tables.

•Configured, deployed and maintained a single-node Storm cluster in the DEV environment.

•Developed predictive analytics using the Apache Spark Scala APIs.

•Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (Text, Avro, SequenceFile, XML, JSON, ORC and Parquet).

•Handled importing of data from RDBMS into HDFS using Sqoop.

•Performed data cleansing using Pig Latin operations and UDFs.

•Analyzed data in Hive warehouse using Hive Query Language (HQL).

•Involved in creating Hive tables, loading data and writing hive queries to process the data.

•Created scripts to automate the process of Data Ingestion.

•Developed PIG scripts for source data validation and transformation.

•Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability, for analyzing HDFS audit data.
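A minimal sketch of the HiveQL-to-Spark SQL migration pattern referenced above; the table and column names are hypothetical, and the equivalent DataFrame form is shown for comparison:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hiveql-to-sparksql-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // The original HiveQL-style aggregation, now executed by Spark's engine
    val byStatus = spark.sql(
      "SELECT status, count(*) AS orders FROM pharmacy.orders GROUP BY status")

    // The same query expressed with the DataFrame API
    val byStatusDf = spark.table("pharmacy.orders")
      .groupBy("status")
      .agg(count(lit(1)).as("orders"))

    byStatus.show()
    byStatusDf.show()

    spark.stop()
  }
}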

Tools and technologies used: Hadoop, HDFS, Apache Spark, Kafka, Cassandra, Hive, Pig, Scala, Java, Sqoop, MySQL, Shell scripting.

Client: NIELSEN, TAMPA, FL

Role: Sr. Hadoop Developer JANUARY 2016 – DECEMBER 2017


Project #1

Clickstream Analysis:

This project aims at importing clickstream data into a distributed environment and performing transformations to provide solutions that enable the BI and Data Science teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.

Roles and Responsibilities:

Worked with Apache Kafka to get data from web servers through Flume.

Leveraged Flume to stream data from Spool Directory source to HDFS Sink using AVRO protocol.

Developed Scala scripts to parse clickstream data using complex RegEx (see the sketch after this list).

Developed Pig UDFs for processing complex data making use of Eval, Load and Filter Functions.

Created Hive internal and external tables as required, defined with appropriate static and dynamic partitions for efficiency.

Implemented Hive queries using indexes and bucketing for time efficiency.

Implemented UDFs, UDAFs and UDTFs in Java for Hive to handle processing that cannot be done with Hive's built-in functions.

Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed data.

Implemented Oozie Coordinator to schedule the workflow, leveraging both data and time dependent properties.

Worked closely with BI and Data Science teams to gather requirements on data.

Debugged and troubleshot issues in MapReduce development using test frameworks like MRUnit and JUnit.

Used Git as Version Control System and extensively used Maven as build tool.
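A sketch of the kind of Scala RegEx parsing referenced above, assuming a hypothetical combined-log-style clickstream format; lines that do not match are dropped:

object ClickstreamParser {
  // Example line: 10.0.0.1 - - [19/Jan/2021:10:15:32 +0000] "GET /product/42 HTTP/1.1" 200 512
  private val LogLine =
    """^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)$""".r

  case class Click(ip: String, timestamp: String, method: String, path: String, status: Int)

  def parse(line: String): Option[Click] = line match {
    case LogLine(ip, ts, method, path, status, _) =>
      Some(Click(ip, ts, method, path, status.toInt))
    case _ => None // malformed records are dropped
  }
}

Such a parser would typically be applied to the raw files with something like spark.sparkContext.textFile(path).flatMap(ClickstreamParser.parse).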

Project #2

Log Data Analysis using Apache Spark:

This project analyzed log data using Apache Spark.

Roles and Responsibilities:

Implemented Batch Data Import and also worked on Stream processing using Spark Streaming.

Developed this project using Spark in YARN mode, with in-depth knowledge of Standalone mode.

Created RDDs on the log files and converted them to DataFrames (see the sketch after this list).

Developed Spark SQL queries to perform analysis on the log data.

Used Hive Context to connect with Hive Metastore and write HQL queries.
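A minimal Scala sketch of the RDD-to-DataFrame-to-Spark SQL flow described above, assuming a hypothetical space-delimited log format and input path:

import org.apache.spark.sql.SparkSession

object LogAnalysisSketch {
  // Hypothetical line: 2021-01-19 10:15:32 ERROR payment-service Timeout calling gateway
  case class LogEntry(date: String, time: String, level: String, service: String, message: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("log-analysis-sketch")
      .enableHiveSupport() // lets spark.sql also reach tables registered in the Hive metastore
      .getOrCreate()
    import spark.implicits._

    // RDD over the raw log files, converted to a DataFrame
    val entries = spark.sparkContext
      .textFile("hdfs:///logs/app/*.log") // assumed input path
      .map(_.split(" ", 5))
      .filter(_.length == 5)
      .map(a => LogEntry(a(0), a(1), a(2), a(3), a(4)))
      .toDF()

    entries.createOrReplaceTempView("log_entries")

    // Spark SQL analysis: error counts per service per day
    spark.sql("""
      SELECT date, service, count(*) AS errors
      FROM log_entries
      WHERE level = 'ERROR'
      GROUP BY date, service
      ORDER BY errors DESC
    """).show(20)

    spark.stop()
  }
}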

Tools and Technologies: Cloudera Manager (CDH5), MapReduce, HDFS, Sqoop, Pig, Hive, Oozie, Kafka, Flume, Java, Git, Maven, Jenkins.

Client: PARALLON HEALTHCARE, NASHVILLE, TN

Role: Hadoop Developer SEPTEMBER 2014 – JANUARY 2016

Project Description:

Parallon Business Solutions is a subsidiary of the Hospital Corporation of America. It offers a variety of services in revenue cycle management, payroll, health information management and physician credentialing. Data is generated from sensors and machines attached to patients; this continuously streaming data is collected and stored in HDFS, even when the patient is no longer in the hospital, and is used to monitor patients' vitals.

RESPONSIBILITIES:

•Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.

•Designed and developed a flattened view (merged and flattened dataset) by de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream systems.

•Worked on NoSQL (HBase) to support enterprise production; loaded data into HBase using Impala and Sqoop.

•Handled importing of data from various data sources, performed transformations using Hive, PIG, and loaded data into HDFS.

•Analyzed, designed and developed ETL strategies and processes, writing ETL specifications.

•Involved in design and development of complex ETL mapping.

•Implemented error handling in Talend to validate the data integrity and data completeness for the data from flat file.

•Moved data using Sqoop from HDFS to relational database systems and vice versa; handled maintenance and troubleshooting.

•Used Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.

•Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.

•Configured Hive metastore with MySQL, which stores the metadata for Hive tables.

•Created tables in HBase to store variable data formats of PII data coming from different portfolios.

•Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.

•Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.

•Imported data from HDFS to MYSQL database and vice-versa using SQOOP.

•Implemented Map Reduce jobs in HIVE by querying the available data.

•Performance tuning of Hive queries, MapReduce programs for different applications.

•Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.

•Developed a standard ETL framework to enable reusability of similar logic across the board. Involved in system documentation of data flow and methodology.

•Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.

•Used Cloudera Manager for installation and management of Hadoop Cluster.

•Developed data pipeline using Flume, Sqoop, Pig and Java map reduce to ingest customer behavioral data and financial histories into HDFS for analysis.

•Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.

•Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.

•Integrated Kafka with Spark Streaming for high throughput and reliability (see the sketch after this list).

•Worked on Apache Flume to collect and aggregate huge amounts of log data and store it on HDFS for further analysis.

•Worked on tuning Hive and Pig scripts to improve performance and resolved performance issues in both.

•Deployed Informatica objects in the production repository.

•Monitored and debugged Informatica components in case of failures or performance issues.
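A minimal Scala sketch of the Kafka and Spark Streaming integration referenced above, using the spark-streaming-kafka-0-10 direct stream API; the broker address, consumer group and topic name are assumptions:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-spark-streaming-sketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092", // assumed broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "vitals-consumer", // hypothetical consumer group
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from a hypothetical topic carrying streamed sensor readings
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("patient-vitals"), kafkaParams)
    )

    // Count records per micro-batch as a stand-in for the real processing
    stream.map(_.value()).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}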

Tools and Technologies: HADOOP TECHNOLOGIES (CLOUDERA, SPARK, HIVE, PIG, HBASE, IMPALA, SQOOP), INFORMATICA 9.1, ORACLE, AUTOSYS, UNIX

Client: CARE HOSPITALS, DELHI

Title: Java Developer JUNE 2013 – SEPTEMBER 2014

Project Description:

The project was to build a focused search engine leveraging the latest open source software, big data technologies and Java web services. The search engine is for the organization's internal use and searches the data in the organization's knowledge repository.

Eco-System: Java 1.8, JavaScript, HTML, CSS, Spring, Hibernate, REST web services, JUnit, Oracle, Eclipse, Tomcat, JIRA, Postman, GIT.

Responsibilities:

•Responsible for analyzing functional specifications and preparing technical design specifications.

•Involved in all Software Development Life Cycle (SDLC) phases of the project from domain knowledge sharing, requirement analysis, system design, implementation and deployment.

•Developed REST web services implementing the business logic for the different functionalities in the features developed.

•Utilized CSS, HTML and JavaScript for the development of the front-end screens.

•Wrote Junit test cases for testing the functionality of the developed web services.

•Involved in writing the SQL queries to fetch data from database.

•Utilized Postman to verify the application workflow, how the application changed with newly developed functionality, and the output of the web services.

• Created user login, search and portfolio pages using HTML5, CSS3, JavaScript and jQuery.

• Worked extensively on both the Enterprise and Community editions of Mule ESB. Experience working with the Mule API, Runtime Manager and RAML.

• Designed and implemented UI layer using JSP, JavaScript, HTML, DHTML, JSON, XML, XHTML, XSL, XSLT, XSL-FO and business logic using Servlets, JSP, SWING, EJBs and J2EE framework.

• Worked on the logging mechanism; the Web NMS SNMP API supports logging of SNMP requests.

• Responsible for debugging, fixing and testing existing bugs in the application.

• Developed builds using continuous integration server Jenkins.

• Extensively used GIT for push and pull requests of the code.

• Actively participated in the daily scrum meetings and bi-weekly retro meetings for knowledge sharing.

• Wrote DAO classes using Spring and Hibernate to interact with the database for persistence.

• Used Eclipse for application development.

• Used JIRA as the task and defect tracking system.

Followed Agile Methodologies to manage the life-cycle of the project. Provided daily updates, sprint review reports, and regular snapshots of project progress.


