Data Analysis Hadoop Developer

Location:
Queens, NY
Posted:
January 12, 2024

Resume:

Atiq Zaman

Hadoop Developer

Email: ad2pm3@r.postjobfree.com

Ph #:

PROFESSIONAL SUMMARY

Qualified IT professional with over 8 years of experience as a Hadoop consultant.

Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.

Proficient in installing, configuring, migrating, and upgrading Hadoop ecosystem components, including MapReduce, Hive, HDFS, HBase, Sqoop, Oozie, Pig, Cloudera, ZooKeeper, Flume, and Cassandra.

Experience in installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH3 and CDH4 clusters.

Good exposure to the Apache Spark ecosystem, including Spark and Spark Streaming, using Scala and Python.

Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs written in Java and Python.

Experience leveraging Hadoop ecosystem components, including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling, and HBase as a NoSQL data store.

Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.

Experience with the NoSQL databases MongoDB and Cassandra.

Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.

Experienced in deploying Hadoop clusters using Puppet.

Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.

Proficient in configuring ZooKeeper, Cassandra, and Flume on existing Hadoop clusters.

In-depth knowledge of JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.

Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.

Experience in big data analysis using Pig and Hive, and an understanding of Sqoop and Puppet.

Good understanding of HDFS design, daemons, federation, and high availability (HA).

Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.

Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.

Good experience in implementing and setting up standards and processes for Hadoop based application design and implementation.

Experience with middleware architectures using Java technologies such as J2EE, JSP, and Servlets, and application servers such as WebSphere and WebLogic.

Familiarity with popular frameworks such as Struts, Hibernate, Spring MVC, and AJAX.

Experience in object-oriented programming with Java and Core Java.

Experience in creating web-based applications using JSP and Servlets.

Experience in managing Hadoop clusters using Cloudera Manager Tool.

Very good experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.

Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.

Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.

Hands-on experience in application development using Java, RDBMSs, and Linux shell scripting.

Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.

Developed RESTful web services using the Grails framework in Python.

Developed a framework for converting existing PowerCenter mappings to PySpark jobs.

Created PySpark data frames to bring data from DB2 to Amazon S3 (a brief sketch of this pattern follows this summary).

Translated business requirements into maintainable software components and assessed their technical and business impact.

Provided guidance to the development team using PySpark as an ETL platform.
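
The following is a minimal, illustrative PySpark sketch of the DB2-to-S3 pattern mentioned above; the connection URL, credentials, table name, and S3 bucket are placeholders rather than details from any actual engagement.

# Minimal PySpark sketch: copy a DB2 table to Amazon S3 as Parquet.
# All connection details, table names, and paths below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("db2-to-s3").getOrCreate()

# Read the source table over JDBC (the IBM DB2 JDBC driver must be on the classpath).
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://db2-host:50000/SAMPLEDB")
    .option("dbtable", "SCHEMA.CLAIMS")
    .option("user", "db2_user")
    .option("password", "db2_password")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .load()
)

# Stamp each row with a load date and write to S3 as Parquet, partitioned for downstream queries.
df = df.withColumn("load_date", F.current_date())
df.write.mode("overwrite").partitionBy("load_date").parquet("s3a://example-bucket/claims/")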

Technical Skills:

Hadoop Technologies: HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Flume, Oozie, ZooKeeper, Ambari, Hue, Spark, PySpark, Storm, Talend, Ganglia

Operating Systems: Windows, Linux

Languages: Java, J2EE, Scala, Python, SQL, PL/SQL, Shell Script

Project Management / Tools: MS Project, MS Office, TFS, HP Quality Center

Front-End: HTML, JSTL, DHTML, JavaScript, CSS, XML, XSL, XSLT

Databases: MySQL, Oracle 11g/10g/9i, SQL Server

NoSQL Databases: HBase, Cassandra

File System: HDFS

Reporting Tools: Jasper Reports, Tableau

IDE Tools: Eclipse, NetBeans

Application Servers: Apache Tomcat, WebLogic

WORK EXPERIENCE

Client: Wells Fargo, Dallas, TX May 2021 – May 2023

Role: Hadoop Developer

Technologies: HDFS, MongoDB, Hive (Avro/Parquet), Sqoop, Autosys, Spark

Responsibilities:

•Designed and implemented solutions for onboarding to WF On-premises Data Lake, ensuring efficient and scalable data ingestion processes.

•Played a pivotal role as an architect, overseeing the implementation and design of Hive tables with optimized partitioning and storage formats to enhance query performance (see the sketch following these responsibilities).

•Developed and maintained Sqoop jobs and Hive objects, ensuring seamless data transfer and processing between different systems.

•Scheduled job execution and monitoring using Autosys, ensuring timely and efficient data processing and analytics.

•Developed, implemented, and supported data analytics protocols, standards, and documentation, ensuring consistency and adherence to best practices throughout the project lifecycle.

•Successfully worked on diverse use cases, such as DCAR, T2A Credit Card, Experian, Experian parsing, MU-DQ, Flex Loan, and RateSale, effectively leveraging Hadoop technologies to address specific business requirements.

•Played a vital role in the production support for the Flex Loan use case, ensuring the stability and availability of critical data processing workflows.

•Collaborated closely with the team to standardize code in the Consumer Data and Engagement Platform (CDEP), formerly known as the Customer Analytics Platform (CAP), fostering code quality and maintainability.

•Provided extensive support and expertise in handling SDG (Software Development Group) data across different environments (DEV/SIT/UAT), ensuring data integrity and smooth transitions.

•Deployed codebase using Jenkins, facilitating continuous integration and deployment processes.

•Demonstrated a solid understanding of the Software Development Life Cycle (SDLC) process and effectively coordinated with DevOps and cross-functional teams to ensure seamless codebase deployment to production environments.

•Utilized GIT for version control, ensuring effective collaboration, code traceability, and version management.

•Actively participated in Sprint planning, Sprint review, retrospectives, and other ceremonies, contributing to the agile development process and fostering team collaboration and communication.

•Engaged directly with business stakeholders and product owners, actively gathering requirements and providing valuable insights to drive successful solution development and delivery.

•Thrived in an Agile environment, leveraging iterative and collaborative methodologies to deliver high-quality Hadoop solutions.
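
Below is a hedged Spark SQL sketch of the kind of partitioned, Parquet-backed Hive table design referenced above; the database, table, columns, and dates are purely illustrative assumptions.

# Hypothetical sketch of a partitioned, Parquet-backed Hive table; names are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-partitioned-table").enableHiveSupport().getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS cdep")
spark.sql("""
    CREATE TABLE IF NOT EXISTS cdep.flex_loan_events (
        account_id   STRING,
        event_type   STRING,
        event_amount DECIMAL(18,2),
        event_ts     TIMESTAMP
    )
    PARTITIONED BY (load_date DATE)
    STORED AS PARQUET
""")

# Partition pruning keeps a single-day query cheap on a large table.
daily = spark.sql("""
    SELECT event_type, COUNT(*) AS events
    FROM cdep.flex_loan_events
    WHERE load_date = DATE '2023-01-15'
    GROUP BY event_type
""")
daily.show()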

Environment: Apache Hadoop, MapReduce, HDFS, Hive, Kafka, Autosys, Java, Linux, Teradata, Tableau.

Client: Dillons, Hutchinson, KS Oct 2019 – May 2021

Role: Hadoop Developer

Responsibilities:

Responsible for building scalable distributed data solutions using Hadoop.

Wrote multiple MapReduce programs in Java for data analysis.

Wrote MapReduce jobs using Pig Latin and the Java API.

Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.

Worked with HiveQL on large volumes of log data to perform trend analysis of user behavior across various online modules (see the sketch following these responsibilities).

Passionate about working on the most cutting-edge Big Data technologies.

Developed Pig scripts for analyzing large data sets in the HDFS.

Collected the logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.

Designed and presented a plan for a proof of concept (POC) on Impala.

Involved in migrating HiveQL into Impala to minimize query response time.

Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.

Responsible for creating Hive tables, loading structured data produced by MapReduce jobs into those tables, and writing Hive queries to further analyze the logs and identify issues and behavioral patterns.

Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance and storage improvements.

Imported data from mainframe datasets into HDFS using Sqoop; also handled importing data from various sources (Oracle, DB2, Cassandra, and MongoDB) into Hadoop and performed transformations using Hive and MapReduce.

Implemented daily jobs that automate parallel loading of data into HDFS using Oozie coordinator jobs.

Responsible for performing extensive data validation using Hive.

Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases, for comparison with historical data.

Involved in loading data from Teradata database into HDFS using Sqoop queries.

Involved in submitting and tracking MapReduce jobs using Job Tracker.

Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.

Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.

Exported data to Tableau and to Excel with Power View for presentation and refinement.

Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.

Implemented Hive generic UDFs to encode business logic.

Implemented test scripts to support test driven development and continuous integration.

Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
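
The following is an illustrative Spark SQL sketch of the kind of HiveQL trend analysis over log data described above; the database, table, columns, and date range are assumptions for demonstration only.

# Illustrative sketch: daily active users per online module from a hypothetical log table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clickstream-trends").enableHiveSupport().getOrCreate()

trends = spark.sql("""
    SELECT log_date,
           module_name,
           COUNT(DISTINCT user_id) AS active_users
    FROM   weblogs.page_events
    WHERE  log_date BETWEEN DATE '2020-06-01' AND DATE '2020-06-30'
    GROUP  BY log_date, module_name
    ORDER  BY log_date, active_users DESC
""")

# Persist the aggregated trend table for reporting tools such as Tableau.
trends.write.mode("overwrite").saveAsTable("weblogs.module_daily_trends")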

Environment: Apache Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Java, Linux, Maven, Teradata, ZooKeeper, Tableau.

Client: National Western Life Insurance, Austin, TX Feb 2017 – Sep 2019

Role: Hadoop Developer

Responsibilities:

Installed and configured Cloudera Hadoop on a 100-node cluster.

Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning and processing.

Developed data pipeline using Sqoop, Hive, Pig and Java MapReduce to ingest claim and policy histories into HDFS for analysis.

Implemented the workflows using Apache Oozie framework to automate tasks.

Applied MapReduce framework jobs in Java for data processing after installing and configuring Hadoop and HDFS.

Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions (a sketch of this approach follows these responsibilities).

Created Hive external tables, loaded data into them, and queried the data using HiveQL.

Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.

Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.

Responsible for architecting Hadoop clusters with CDH3.

Imported and exported data into HDFS and Hive using Sqoop.

Worked on NoSQL databases including HBase and ElasticSearch.

Performed cluster coordination through ZooKeeper.

Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.

Installed and configured Hive and wrote Hive UDFs.

Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.

Analyzed the Hadoop cluster and various big data analytics tools, including Pig, the HBase NoSQL database, and Sqoop.

Developed shell scripts to pull data from third-party systems into the Hadoop file system.

Supported in setting up QA environment and updating configurations for implementing scripts with Pig.

Loaded log data into HDFS using Flume and worked extensively on creating MapReduce jobs to prepare data for search and aggregation.
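
Below is a hedged sketch of the daemon health-check idea described above, written in Python for illustration (the original scripts were shell scripts); the expected-daemon list, mail host, and addresses are placeholders.

# Hypothetical health check: verify that core Hadoop daemons are running and alert if not.
import subprocess
import smtplib
from email.message import EmailMessage

# Daemons expected on this (hypothetical) MRv1-era node.
EXPECTED = {"NameNode", "DataNode", "JobTracker", "TaskTracker"}

def running_daemons():
    # jps lists JVM processes as "<pid> <name>"; keep only the names.
    output = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    return {parts[1] for parts in (line.split() for line in output.splitlines()) if len(parts) > 1}

def main():
    missing = EXPECTED - running_daemons()
    if missing:
        msg = EmailMessage()
        msg["Subject"] = "Hadoop daemon check failed: " + ", ".join(sorted(missing))
        msg["From"] = "hadoop-monitor@example.com"   # placeholder address
        msg["To"] = "ops-team@example.com"           # placeholder address
        msg.set_content("The following daemons are not running: " + ", ".join(sorted(missing)))
        with smtplib.SMTP("localhost") as smtp:      # placeholder mail host
            smtp.send_message(msg)

if __name__ == "__main__":
    main()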

Environment: Hadoop, MapReduce, HDFS, Flume, Cassandra, Sqoop, Pig, HBase, Hive, ZooKeeper, Cloudera, Oozie, ElasticSearch, NoSQL, UNIX/Linux.

Client: MidFirst Bank, Oklahoma City, OK Jan 2015 – Jan 2017

Role: Hadoop Developer/Admin

Responsibilities:

Obtained requirement specifications from SMEs and business analysts in BR and SR meetings for the corporate workplace project, and interacted with business users to build sample report layouts.

Involved in writing HLDs along with RTMs tracing back to the corresponding BRs and SRs, and reviewed them with the business.

Loaded log data into HDFS using Flume and created MapReduce jobs to prepare data for search and aggregation.

Installed and configured Apache Hadoop and the Hive and Pig ecosystems.

Installed and configured Cloudera Hadoop CDH4 via Cloudera Manager in pseudo-distributed and cluster modes as a proof of concept.

Created MapReduce jobs using Hive and Pig queries.

Extensively used Pig for data cleansing (an illustrative sketch follows these responsibilities).

Developed Pig UDFs to pre-process the data for analysis.

Scheduled workflows in Oozie to automate tasks of loading data in HDFS and pre-processing with Pig and HiveQL.

Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.

Involved in configuring Sqoop to map SQL types to appropriate Java classes.

Loaded and transformed large sets of structured, semi-structured, and unstructured data.

Provided cluster coordination services through ZooKeeper.
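
The following is an illustrative PySpark analogue of the Pig-style data cleansing described above (the original work used Pig); the HDFS paths and field names are assumptions.

# Hypothetical cleansing pass over semi-structured log data before analysis.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("log-cleansing").getOrCreate()

raw = spark.read.json("hdfs:///data/raw/workplace_logs/")    # placeholder input path

clean = (
    raw.dropDuplicates(["event_id"])                          # drop duplicate events
       .filter(F.col("user_id").isNotNull())                  # discard records with no user
       .withColumn("event_ts", F.to_timestamp("event_ts"))    # normalize the timestamp
       .withColumn("module", F.lower(F.trim("module")))       # standardize module names
)

clean.write.mode("overwrite").parquet("hdfs:///data/clean/workplace_logs/")  # placeholder output path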

Environment: Hadoop, Cloudera Hadoop CDH4, Oracle, HiveQL, Pig Latin, MapReduce, HDFS, HBase, ZooKeeper, Oozie, PL/SQL, SQL*Plus, Windows, UNIX, Shell Scripting.

Education: Completed a B.Sc. from National University, Bangladesh, in 1999.


