Data Manager

BRANDON HASAN

516-***-****

adiocp@r.postjobfree.com

GC/EAD, Long Island, NY

PROFESSIONAL SUMMARY

* years of professional experience in design and development of Java and Big Data technologies, with an in-depth understanding of the Hadoop distributed architecture and its components such as NodeManager, ResourceManager, NameNode, DataNode, HiveServer2, HBase Master, and RegionServer.

Strong experience developing end-to-end data transformations using the Spark Core API.

Strong experience creating real-time data streaming solutions using Spark Streaming and Kafka.
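
For illustration, a minimal sketch of a Spark Structured Streaming job reading from Kafka, assuming the spark-sql-kafka connector is on the classpath; the broker address, topic, and app name are placeholder assumptions, not details from this work:

    import org.apache.spark.sql.SparkSession

    object KafkaStreamSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-stream-sketch").getOrCreate()

        // Subscribe to a Kafka topic as a streaming DataFrame.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
          .option("subscribe", "events")                     // placeholder topic
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

        // Print each micro-batch to the console and block until stopped.
        events.writeStream.format("console").start().awaitTermination()
      }
    }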

Worked extensively on fine-tuning Spark applications and worked with various memory settings in Spark.

Strong knowledge of real-time processing using Apache Storm.

Developed simple to complex MapReduce jobs using Java.

Expertise in writing end-to-end data processing jobs to analyze data using MapReduce, Spark, and Hive.

Experience using Kafka to collect, aggregate, and move large volumes of data from various sources such as web servers and telnet feeds.

Experience working in Databricks environments and integrating tools such as Zeppelin.

Developed Sqoop scripts for transferring large datasets between Hadoop and RDBMSs.

Experience using Hortonworks Distributions to fully implement and leverage new Hadoop features.

Strong experience in working with UNIX/LINUX environments, writing shell scripts.

Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
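
For illustration, HiveQL of the kind this implies, issued through a Scala SparkSession named spark; the table, columns, bucket count, and path are hypothetical, and on some Spark/Hive version combinations bucketed Hive DDL must be run in the Hive shell instead:

    // External table partitioned by load date and bucketed by customer id;
    // dropping the table removes only metadata, not the files at LOCATION.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS txn_external (
        customer_id BIGINT,
        amount      DOUBLE
      )
      PARTITIONED BY (load_date STRING)
      CLUSTERED BY (customer_id) INTO 32 BUCKETS
      STORED AS ORC
      LOCATION '/data/warehouse/txn_external'
    """)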

Extensive experience working with semi-structured and unstructured data, implementing complex MapReduce programs using design patterns.

Sound knowledge of J2EE architecture, design patterns, and object modeling using various J2EE technologies and frameworks.

Good working experience in design and application development using IDEs such as IntelliJ and Eclipse.

Experience in writing test cases in a Java environment using JUnit.

Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.

Good team player with ability to solve problems, organize and prioritize multiple tasks.

Ability to blend technical expertise with strong conceptual, business, and analytical skills to provide quality solutions, with a results-oriented approach to problem solving and leadership.

PROFESSIONAL EXPERIENCE

HADOOP DEVELOPER

Bridgewater Associates, Westport, CT
November 2017 - Present

Responsibilities

Responsible for building scalable distributed data solutions using Hadoop.

Extensively involved in Design phase and delivered Design documents.

Wrote a Hive UDF to sort struct fields and return a complex data type.
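
For illustration, a simplified Hive UDF in Scala; the class and function names are hypothetical, and a UDF that reorders struct fields and returns a complex type would extend GenericUDF instead, though registration and use follow the same pattern:

    import org.apache.hadoop.hive.ql.exec.UDF
    import scala.collection.JavaConverters._

    // Sorts an array<string> column; Hive passes arrays as java.util.List.
    class SortArrayUdf extends UDF {
      def evaluate(values: java.util.List[String]): java.util.List[String] =
        if (values == null) null else values.asScala.sorted.asJava
    }

    // In Hive, after ADD JAR /path/to/udfs.jar:
    //   CREATE TEMPORARY FUNCTION sort_arr AS 'SortArrayUdf';
    //   SELECT sort_arr(tags) FROM articles;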

Worked with the business team to gather the requirements and participated in the Agile planning meetings to finalize the scope.

Imported and exported data into HDFS and Hive using Sqoop, and migrated large volumes of data from different databases to Hadoop.

Used Hive and Spark SQL to analyze health insurance data, extracting data sets for meaningful information such as medicines, diseases, symptoms, opinions, and geographic region details.

Cleaned input text data and extracted features using Spark machine learning feature extraction.
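
For illustration, a minimal Spark ML feature-extraction sketch in Scala, assuming an existing SparkSession named spark; the DataFrame, column names, and sample rows are hypothetical:

    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

    // Hypothetical cleaned-text DataFrame with an id and a text column.
    val docs = spark.createDataFrame(Seq(
      (0L, "patient reports mild symptoms"),
      (1L, "prescribed medicine for chronic condition")
    )).toDF("id", "text")

    // Tokenize the text, then hash tokens into term-frequency feature vectors.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val features  = hashingTF.transform(tokenizer.transform(docs))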

Used Scala as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.

Worked closely with data scientists to build predictive models using Spark.

Loaded and transformed large sets of structured and semi-structured data; responsible for managing data coming from different sources.

Developed data pipeline using Sqoop and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.

Involved in creating Hive tables, loading data, and writing Hive queries.

Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.

Created Oozie workflow and coordinator jobs to trigger jobs on schedule as data became available.

Developed multiple MapReduce jobs in Hive for data cleaning and pre-processing.

Involved in defining job flows. Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.

Managed and reviewed the Hadoop Log files.

Developed complex Hive queries using joins and partitions for huge data sets per business requirements, loaded the filtered data from source to edge-node Hive tables, and validated the data.

Performed bucketing and partitioning of data using Apache Hive, which saved processing time and produced better sample insights.

Moved all log/text files generated by various services into an HDFS location.

Exported data from the HDFS environment into RDBMSs using Sqoop for report generation and visualization purposes.

Created workflows in Oozie along with managing/coordinating the jobs and combining multiple jobs sequentially into one unit of work.

Imported and exported data from different RDBMS systems such as Oracle, Teradata, SQL Server, and Netezza, and from Linux systems such as SAS Grid.

Handled semi-structured data such as Excel and CSV files, imported from SAS Grid to HDFS using an SFTP process.

Ingested data into Hive tables using Sqoop and an SFTP process.

Used compression techniques such as Snappy and Gzip for data loads and archival.
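
For illustration, how a Snappy-compressed load might look in Scala; the DataFrame df and the output path are assumptions, and swapping "snappy" for "gzip" changes the codec:

    // Write a DataFrame as Snappy-compressed Parquet for loads and archival.
    df.write
      .option("compression", "snappy")
      .mode("overwrite")
      .parquet("/data/archive/events")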

Performed data-level transformations in intermediate tables before forming final tables.

Handled data integrity checks using Hive queries, Hadoop, and Spark.

Automated daily, monthly, quarterly, and ad hoc data loads in Control-M, running per the scheduled calendar dates.

Involved in Production Support, BAU Activities and Release management.

Expertise in writing custom UDFs in Hive.

Environment: Hadoop, Spark, Hive, Shell, Sqoop, Oozie Workflows, Teradata, Netezza, SQL Server, Oracle, Hue, Impala, Cloudera Manager.

HADOOP DEVELOPER

Oppenheimer & Co. Inc., New York, NY
December 2016 - October 2017

Responsibilities

Worked on analyzing Cloudera Hadoop and Hortonworks clusters and different big data analytic tools, including Pig, Hive, and Sqoop.

Performance-tuned and managed growth of the OS, disk usage, and network traffic.

Responsible for building scalable distributed data solutions using Hadoop.

Analyzed the latest Big Data analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.

Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data.

Created and ran Sqoop jobs with incremental load to populate Hive external tables.

Developed optimal strategies for distributing web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.

Extensive experience writing Pig scripts to transform raw data from several data sources into baseline data.

Implemented Hive generic UDFs to incorporate business logic into Hive queries.

Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
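
For illustration, a HiveQL aggregation of that shape, run here through a Scala SparkSession named spark; the table and column names are hypothetical:

    // Unique visitors and page views per day from a hypothetical web-log table.
    spark.sql("""
      SELECT log_date,
             COUNT(DISTINCT visitor_id) AS unique_visitors,
             COUNT(*)                   AS page_views
      FROM web_logs
      GROUP BY log_date
      ORDER BY log_date
    """).show()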

Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).

Created Hive tables and worked on them using HiveQL.

Designed and implemented partitioning (static and dynamic) and buckets in Hive.
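
For illustration, a dynamic-partition insert in HiveQL via a Scala SparkSession named spark; table and column names are hypothetical, and Hive routes each row to its partition from the load_date value:

    // Allow fully dynamic partitioning, then insert by partition column.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT INTO TABLE txn_partitioned PARTITION (load_date)
      SELECT customer_id, amount, load_date FROM txn_staging
    """)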

Worked on cluster coordination services through ZooKeeper.

Monitored workload, job performance and capacity planning using Cloudera Manager.

Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.

Worked collaboratively with all levels of business stakeholders to architect, implement, and test Big Data analytical solutions from disparate sources.

Involved in Agile methodologies, daily scrum meetings, and sprint planning.

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python

HADOOP DEVELOPER

Capital Bank, McLean, VA
January 2016 - December 2016

Responsibilities

Launched and configured Amazon EC2 cloud instances and S3 buckets on AWS using Ubuntu Linux and RHEL.

Installed applications on AWS EC2 instances and configured storage on S3 buckets.

Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.

Worked in AWS environment for development and deployment of Custom Hadoop Applications.

Worked closely with the data modelers to model the new incoming data sets.

Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).

Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools, including Hive, Oozie, ZooKeeper, Sqoop, Spark, Impala, and Cassandra, with the Hortonworks distribution.

Involved in creating Hive tables, loading data, and writing Hive queries.

Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
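
For illustration, one such optimization in Scala: moving a pair-RDD aggregation onto the DataFrame API so Catalyst can plan it, with the result cached for reuse. The path, field positions, and an existing SparkSession named spark are assumptions:

    import spark.implicits._

    // Classic pair-RDD aggregation over CSV records: (accountId, amount).
    val pairs = spark.sparkContext
      .textFile("hdfs:///data/tx/*.csv")
      .map(_.split(","))
      .map(f => (f(0), f(1).toDouble))
    val totalsRdd = pairs.reduceByKey(_ + _)

    // Equivalent DataFrame version, optimized by Catalyst and cached for reuse.
    val totalsDf = pairs.toDF("account_id", "amount")
      .groupBy("account_id")
      .sum("amount")
      .cache()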

Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.

Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.

Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.

Imported data from different sources such as HDFS, HBase, or local files into Spark RDDs.

Developed a data pipeline using Kafka and Storm to store data into HDFS.

Performed real time analysis on the incoming data.

Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
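
For illustration, a minimal DStream sketch in Scala with a 10-second batch interval; each interval of input becomes one RDD handed to the Spark engine. The source host/port and an existing SparkSession named spark are placeholder assumptions:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Group the incoming text stream into 10-second micro-batches.
    val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
    ssc.socketTextStream("localhost", 9999)
      .map(_.toUpperCase)
      .print()

    ssc.start()
    ssc.awaitTermination()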

Implemented Spark using Scala and Spark SQL for faster testing and processing of data.

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Hive, HBase, Oozie, Scala, Spark, Linux.

EDUCATION

Master of Science in Economics

University of Rome (Tor Vergata)

Rome, Italy

Bachelor of Business Administration

Stamford University Bangladesh

Dhaka, Bangladesh


