Data Hadoop

Location:
St. Catharines, ON, Canada
Posted:
May 28, 2020

Resume:

Name: Ameya

Phone Number: 581-***-****

Email Id: addg23@r.postjobfree.com

Professional Summary:

- Over 5 years of experience designing and building high-performance, scalable systems with the Big Data ecosystem in Windows and Linux environments.

- Strong end-to-end experience in Hadoop development, with varying levels of expertise across different Big Data Hadoop projects.

- Experience with Hadoop distributions such as Cloudera (CDH).

- Hands-on knowledge of cleansing and analyzing data using HiveQL, Pig Latin, and custom MapReduce programs.

- Experience with Apache Hadoop technologies: the Hadoop Distributed File System (HDFS), the MapReduce framework, YARN, Pig, Hive, Sqoop, and Flume.

- Expertise in developing MapReduce programs for data transformation.

- Good experience with shell scripting.

- Experience writing MapReduce jobs on the Hadoop ecosystem, including Hive and Pig.

- Good knowledge of Hadoop architecture and the various components of the Hadoop stack.

- Strong knowledge of Spark Core, Spark SQL, and Spark Streaming.

- Familiarity with real-time streaming data in Spark for fast, large-scale, in-memory processing.

- Gathered and refined business requirements and performed complex data profiling, analysis, and data modeling.

- Experience in Hadoop development in an AWS environment with EMR.

- Experience developing pipelines that ingest data from various sources and process it with Hive and Pig.

- Good understanding of NoSQL databases and hands-on experience writing applications against NoSQL databases such as HBase.

- Experience with cluster monitoring tools such as Ganglia.

- Experience developing and deploying custom Hadoop applications on both in-house clusters and cloud-based environments.

- Improved the performance of existing Hadoop algorithms by moving them to Spark, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.

Technical Skills:

Big Data Ecosystem: Hadoop, Spark, MapReduce, YARN, Hive, Spark SQL, Impala, Pig, Sqoop, HBase, Flume, Oozie, ZooKeeper, Avro, Parquet, Maven, Snappy

Hadoop Distributions: Cloudera

NoSQL Databases: Cassandra, MongoDB, HBase

Languages: Scala, SQL, and C/C++

Databases: SQL Server, MySQL, PostgreSQL, Oracle

Operating Systems: UNIX, Linux, and Windows variants

Work Experience:

Role: Hadoop Developer

Client: Manulife Financial, Quebec City, QC

Duration: Jan 2018 – Present

Responsibilities:

- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.

- Responsible for building scalable, distributed data solutions using Hadoop.

- Imported data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.

- Analyzed data by running Hive queries and Pig scripts to study customer behaviour.

- Installed and configured Cloudera Manager for easier management of the existing Hadoop cluster.

- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.

- Developed Spark scripts, invoked through shell scripting commands, as per requirements.

- Responsible for managing and reviewing Hadoop log files; designed and developed a data management system using MySQL.

- Wrote shell scripts to parse XML documents and load the data into the database.

- Performed cluster maintenance, including adding and removing nodes, using tools such as Cloudera Manager Enterprise.

- Involved in data processing using Spark.

- Developed a data pipeline using Kafka and Storm to store data in HDFS.

- Worked with NoSQL databases including HBase, as well as Elasticsearch.

- Imported and exported data between Oracle/DB2 and HDFS/Hive using Sqoop.

- Used Tableau for reporting and data visualization.

- Followed agile methodology; interacted directly with the client to exchange feedback on features, suggest and implement optimal solutions, and tailor the application to customer needs.

- Set up application proxy rules on the Apache server and created Spark SQL queries for faster requests (see the sketch at the end of this role).

- Designed and developed the database design document and database diagrams based on the requirements.

Environment: HDFS, Hive, Pig, UNIX, SQL, Kafka, MapReduce, Spark, Hadoop cluster, HBase, Sqoop, Oozie, Linux, data pipelines, Cloudera Hadoop Distribution, MySQL, Git, shell scripting.
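To illustrate the Spark SQL work above, here is a minimal Scala sketch. It assumes a Spark 2.x build with Hive support enabled; the table and column names (claims, policy_id, claim_amount) and the output path are hypothetical placeholders, not the client's actual schema.

    import org.apache.spark.sql.SparkSession

    object ClaimsReport {
      def main(args: Array[String]): Unit = {
        // Spark session with Hive support so existing Hive tables are visible
        val spark = SparkSession.builder()
          .appName("ClaimsReport")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical Hive table populated by the Sqoop import from Oracle
        val totals = spark.sql(
          """SELECT policy_id, SUM(claim_amount) AS total_claims
            |FROM claims
            |GROUP BY policy_id""".stripMargin)

        // Write the aggregate back to HDFS as Parquet for downstream reporting
        totals.write.mode("overwrite").parquet("/data/reports/claim_totals")

        spark.stop()
      }
    }

A job like this would typically be submitted with spark-submit on the cluster; the same query can also be run interactively in spark-shell while tuning it.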

Role: Hadoop Developer

Client: Costco Wholesale, Nepean, ON

Duration: Sep 2016 – Dec 2017

Responsibilities:

- Developed Pig scripts to transform raw data into meaningful data as specified by business users.

- Worked closely with the data modellers to model the new incoming data sets.

- Involved in the end-to-end lifecycle of Hadoop jobs that used technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).

- Extracted and loaded data into a data lake environment (Amazon S3) using Sqoop, where it was accessed by business users and data scientists.

- Assisted in upgrading, configuring, and maintaining Hadoop infrastructure components such as Pig, Hive, and HBase.

- Improved the performance of existing Hadoop algorithms by moving them to Spark, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.

- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data; imported data from sources such as HDFS and HBase into Spark RDDs.

- Built a proof of concept for Single Member Debug on Hive/HBase and Spark.

- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.

- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.

- Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the sketch at the end of this role).

- Loaded data into HBase using both bulk and non-bulk loads.

- Performed real-time analysis on incoming data.

- Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.

- Performed transformations such as event joins, bot-traffic filtering, and pre-aggregations using Pig.

- Developed MapReduce jobs to convert data files into Parquet format.

- Executed Hive queries on Parquet tables to perform data analysis that met the business requirements.

- Developed business-specific custom UDFs in Hive and Pig.

- Configured Oozie workflows to run multiple Hive and Pig jobs that trigger independently based on time and data availability.

- Optimized MapReduce code and Pig scripts, and performed performance tuning and analysis.

Environment: MapReduce, HDFS, Hive, Pig, Spark, Spark Streaming, Spark SQL, Apache Kafka, Sqoop, Scala, CDH5, Eclipse, Oracle, Git, shell scripting, and Cassandra.
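A minimal Scala sketch of the RDD-style in-memory computation described above. The HDFS paths and the comma-separated record layout (store_id,sale_amount) are hypothetical, and the classic SparkContext/pair-RDD API is assumed.

    import org.apache.spark.{SparkConf, SparkContext}

    object SalesByStore {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SalesByStore"))

        // Hypothetical CSV files on HDFS, one record per line: store_id,sale_amount
        val lines = sc.textFile("hdfs:///data/sales/*.csv")

        // Build a pair RDD keyed by store and sum the amounts in memory
        val totals = lines
          .map(_.split(","))
          .filter(_.length == 2)                    // drop malformed rows
          .map(fields => (fields(0), fields(1).toDouble))
          .reduceByKey(_ + _)

        totals.saveAsTextFile("hdfs:///data/output/sales_by_store")
        sc.stop()
      }
    }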

Role: Hadoop Developer

Client: Saba Software, Ottawa, ON

Duration: Jun 2015 - Aug 2016

Responsibilities:

- Evaluated business requirements and prepared detailed specifications that followed project guidelines in order to keep application development on track.

- Involved in loading data from the Linux file system to HDFS.

- Managed streaming data from a variety of sources using Flume.

- Assisted in exporting and importing analyzed data to and from relational databases using Sqoop.

- Performed transformations using Pig, Hive, and MapReduce (including UDFs).

- Created partitioned Hive external tables over the daily ingested files.

- Analyzed the data to extract customer metrics, for example the number of satisfied customers per period, page views, visit duration, and the most purchased products on the website.

- Worked with job workflow scheduling tools such as Oozie.

- Analyzed large data sets using HBase to aggregate and report on them.

- Wrote Solr queries for various search documents.

- Monitored the cluster using Cloudera Manager.

- Developed reports and dashboards in Tableau for quick reviews presented to the business.

- Ingested source data into the Hadoop data lake from various databases using Sqoop.

- Read, processed, and parsed CSV source data files with Spark/Scala and ingested them into Hive tables (see the sketch at the end of this role).

- Worked extensively with Hive tables, partitions, and buckets for analyzing large volumes of data.

- Scheduled the Hive queries on a daily basis by writing an Oozie workflow and coordinator.

- Worked on database testing and QA validation to ensure the product was bug free.

- Provided knowledge transfer to end users and junior developers on the Hive queries and the business requirements.

Environment: HDFS, Hive, Pig, UNIX, SQL, Kafka, MapReduce, Spark, Hadoop cluster, HBase, Sqoop, Oozie, Linux, data pipelines, Cloudera Hadoop Distribution.
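A minimal Scala sketch of the CSV-to-Hive ingestion described above, assuming Spark 2.x with Hive support. The landing path, table name (analytics.page_visits), and partition column (visit_date) are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    object CsvToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CsvToHive")
          .enableHiveSupport()
          .getOrCreate()

        // Parse the daily CSV drop; header and schema inference keep the sketch short
        val visits = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///landing/weblogs/visits.csv")

        // Append into a partitioned Hive table so daily queries can prune by visit_date
        visits.write
          .mode("append")
          .partitionBy("visit_date")
          .saveAsTable("analytics.page_visits")

        spark.stop()
      }
    }

In practice a job like this would be wrapped in an Oozie workflow and triggered by a daily coordinator, as noted above.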

Role: Hadoop Developer

Client: Heckyl Technologies, India

Duration: Aug 2014 - May 2015

Responsibilities:

- Converted the existing relational database model to the Hadoop ecosystem.

- Generated data sets and loaded them into the Hadoop ecosystem.

- Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.

- Worked with Spark to create structured data from the pool of unstructured data received.

- Managed and reviewed Hadoop and HBase log files.

- Involved in reviewing functional and non-functional requirements.

- Responsible for managing data coming from different sources.

- Loaded CDRs from the relational database using Sqoop, and from other sources into the Hadoop cluster using Flume.

- Involved in loading data from the UNIX file system and FTP to HDFS.

- Designed and implemented Hive queries and functions for evaluating, filtering, loading, and storing data.

- Created Hive tables and worked on them using HiveQL.

- Wrote Spark code to convert unstructured data to structured data (see the sketch at the end of this role).

- Developed Hive queries to analyze the output data.

- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.

- Handled cluster coordination services through ZooKeeper.

- Collected log data from web servers and integrated it into HDFS using Flume.

- Used Hive for transformations, event joins, and pre-aggregations before storing the data on HDFS.

- Designed and implemented Spark jobs to support distributed data processing.

- Supported the existing MapReduce programs running on the cluster.

- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.

- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.

- Followed agile methodology for the entire project.

- Prepared technical and detailed design documents.

Environment: Linux (Ubuntu), Hadoop pseudo-distributed mode, HDFS, Hive, Flume, Spark.
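A rough Scala sketch of converting unstructured data into structured form with Spark, as described above. The log line format, the regex, and the paths are hypothetical, and the Spark 2.x Dataset API is used here for brevity even though work from this period may have relied on earlier APIs.

    import org.apache.spark.sql.SparkSession

    // Hypothetical structured record derived from raw call-log lines
    case class CallRecord(caller: String, callee: String, durationSec: Int)

    object LogsToStructured {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("LogsToStructured").getOrCreate()
        import spark.implicits._

        // Hypothetical raw line format: "CALL <caller> -> <callee> <seconds>s"
        val pattern = """CALL (\S+) -> (\S+) (\d+)s""".r

        val records = spark.read.textFile("hdfs:///raw/cdr/*.log")
          .flatMap {
            case pattern(caller, callee, dur) => Seq(CallRecord(caller, callee, dur.toInt))
            case _                            => Seq.empty[CallRecord]  // skip unmatched lines
          }

        // Persist the structured result as Parquet so it can be queried from Hive
        records.toDF().write.mode("overwrite").parquet("hdfs:///structured/cdr")
        spark.stop()
      }
    }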


