Abdel Bileomon - Hadoop Developer

Location: Santa Clara, CA

Posted: March 29, 2021

Resume:

ABDEL BILEOMON

HADOOP BIG DATA ENGINEER

Phone: 408-***-**** Email: ***********@*****.***

Skills

STRENGTHS

Skilled in managing data analytics, data processing, database, and data-driven projects

Skilled in architecture of big data systems, ETL pipelines, and analytics systems for diverse end users

Skilled in database systems and administration

Proficient in writing technical reports and documentation

Adept with various distributions and platforms such as Cloudera Hadoop, Hortonworks, MapR, Elastic Cloud, and Elasticsearch

Expert in bucketing and partitioning

Expert in performance optimization

Technical Skills

APACHE

Apache Ant, Apache Flume, Apache Hadoop, Apache YARN, Apache Hive, Apache Kafka, Apache Maven, Apache Oozie, Apache Spark, Apache Tez, Apache ZooKeeper, Cloudera Impala, HDFS, Hortonworks, MapR, MapReduce

SCRIPTING

HiveQL, MapReduce, XML, FTP, Python, UNIX, Shell scripting, Linux

OPERATING SYSTEMS

Unix/Linux, Windows 10, Ubuntu, Apple OS

FILE FORMATS

Parquet, Avro, JSON, ORC, text, CSV

DISTRIBUTIONS

Cloudera CDH 4/5, Hortonworks HDP 2.5/2.6, Amazon Web Services (AWS), Elastic, ELK

DATA PROCESSING (COMPUTE) ENGINES

Apache Spark, Spark Streaming, Flink, Storm

DATA VISUALIZATION TOOLS

Pentaho, QlikView, Tableau, Power BI, Matplotlib

DATABASE

Microsoft SQL Server (2005, 2008 R2, 2012)

Database & Data Structures, Apache Cassandra, Amazon Redshift, DynamoDB, Apache HBase, Apache Hive, MongoDB

SOFTWARE

Microsoft Project, Primavera P6, VMware, Microsoft Word, Excel, Outlook, PowerPoint; Technical Documentation Skills

Experience

Freddie Mac Remote

DATA ENGINEER December 2020 – Present

Implemented Spark jobs on AWS EMR using PySpark, using the DataFrame and Spark SQL APIs for faster data processing

Registered datasets to AWS Glue through the REST API

Used AWS API Gateway to trigger Lambda functions

Queried data residing in AWS S3 buckets with Athena

Used AWS Step Functions to run a data pipeline

Used DynamoDB to store metadata and logs

Monitored and managed services with AWS CloudWatch

Performed transformations using Spark SQL

Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation

Developed Spark code using Python and Spark SQL for faster testing and data processing

Tuned Spark to improve job performance

Configured ODBC and Presto drivers with Okera and RapidSQL

Used Dremio as a query engine, with Dremio data reflections, for faster joins and complex queries over data in AWS S3 buckets

Realtor Santa Clara, California

HADOOP DATA ENGINEER December 2018 – December 2020

Installed the entire Hadoop ecosystem on new servers, including Hadoop, Spark, PySpark, Kafka, Hortonworks, Hive, and Cassandra

Developed a data pipeline used for extracting historic flood information from online sources

Used Python to scrape relevant articles about the most recent floods

Used Python to make requests to news source APIs as well as social media APIs such as Facebook and Twitter

Stored unprocessed JSON and HTML files in an HDFS data lake

Retrieved structured and unstructured data from HDFS and MySQL into Spark to perform MapReduce jobs

Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark in Scala

Used SparkContext and SparkSession to process text files by flat mapping, mapping to RDDs, and reducing RDDs by key to identify sentences containing valuable information

Worked with analytics team to provide querying insights and helped develop methods to map informative sentences more efficiently

Adjusted tables and schema to provide more informative data to be used in machine learning models

Worked with Apache Spark, a fast, general-purpose engine for large-scale data processing, integrated with the functional programming language Scala

Created a Kafka producer to connect to different external sources and bring the data to a Kafka broker

Handled schema changes in the data stream

Created Kafka topics via the CLI for Structured Streaming to obtain structured data by schema

Implemented Hive partitioning and bucketing and performed joins on Hive tables

Performed transformations and analysis using Hive

Medline Industries Northfield, IL

HADOOP DATA ENGINEER May 2017 – December 2018

Installed and configured Hadoop HDFS and developed multiple jobs in Java for data cleaning and preprocessing.

Developed MapReduce jobs using Scala for data transformations.

Developed different components of the system, such as Hadoop processes involving MapReduce and Hive.

Migrated ETL processes from Oracle to Hive to test ease of data manipulation.

Used Sqoop to export data back to a relational database for business reporting.

Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.

Involved in Hadoop cluster administration, including adding and removing cluster nodes, capacity planning, performance tuning, and cluster monitoring.

Developed Hive queries and UDFs to analyze and transform the data in HDFS.

Designed and implemented partitioning (static and dynamic) and bucketing in Hive.

Used Sqoop to efficiently transfer data between databases and HDFS.

Debugged and identified issues reported by QA in Hadoop jobs by configuring the jobs to run against the local file system.

Implemented Flume to import streaming log data and aggregate it into HDFS.

Ran Hadoop streaming jobs to process terabytes of data.

Involved in evaluating and analyzing the Hadoop cluster and various big data analytics tools, including HBase and Sqoop.

Wells Fargo San Francisco, CA

AWS CLOUD DATA ENGINEER October 2015 – May 2017

Imported data from different sources such as AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.

Developed Spark scripts using Scala shell commands as required.

Implemented Spark jobs in Scala, using the DataFrame and Spark SQL APIs for faster data processing.

Used Spark for interactive queries, streaming data processing, and integration with a popular NoSQL database for large volumes of data.

Developed Spark programs using PySpark APIs to compare the performance of Spark with Hive and SQL.

Used Scala libraries to process XML data stored in HDFS and wrote the processed data back to HDFS.

Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.

Loaded data into Spark RDDs and performed in-memory computation to generate the output response.

Implemented Spark using Scala and Spark SQL for faster testing and processing of data.

Wrote various Pig scripts to clean up the ingested data and created partitions for the daily data.

Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.

Involved in HBase setup and in storing data in HBase for use in analysis.

Used Impala for querying HDFS data to achieve better performance.

Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables, and handled structured data using Spark SQL.

Implemented HQL scripts to load data from and store data into Hive.

Developed Spark jobs to parse JSON and XML data.

Used JSON and XML serialization and deserialization to load JSON and XML data into Hive tables.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.

Analyzed SQL scripts and designed solutions to implement them in PySpark.

Tested MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.

Used Avro, Parquet, and ORC data formats to store data in HDFS.

Used Oozie workflows to coordinate Pig and Hive scripts.

Deployed to Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.

Stored data in various HDFS file formats such as Avro and SequenceFile, with compression formats such as Snappy.

Dick's Sporting Goods Oakdale, PA

DATA ENGINEER April 2013 – October 2015

Involved in architectural design of cluster infrastructure, resource mobilization, risk analysis, and reporting.

Commissioned and decommissioned DataNodes and was involved in NameNode maintenance.

Regularly backed up and cleared logs from HDFS space to use the DataNodes optimally. Wrote shell scripts for time-bound command execution.

Edited and configured HDFS and tracker parameters.

Scripted the requirements using BigSQL and provided timing statistics for running jobs.

Involved in code review of simple to complex MapReduce jobs using Hive and Pig.

Monitored the cluster using the Big Insights ionosphere tool.

Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.

Installed the Oozie workflow engine to run multiple Hive and Pig jobs.

Education

Bachelor of Science in Bioengineering, Bioinformatics

University of Illinois at Chicago

Certifications

IBM Scala Certificate

IBM Hadoop Certificate

IBM Big Data Certificate
