
Data Engineer

Atlanta, GA

Modupe OyeDiran

470-***-****

adljj9@r.postjobfree.com

Professional Summary

I am a highly experienced professional with 10+ years of industry experience across a wide range of applications and data, in roles including Big Data Engineer, Business Data Management, Applications Research Analyst, Analyst/Programmer, and IT Applications Consultant.

Skills

Operating System

Windows, Linux

Big Data Technology

Hadoop, Kafka, Spark, ZooKeeper, Sqoop, Flume, Spark SQL, Cloudera and Hortonworks HDP platforms

Other Technology

Microsoft Word, Excel, Outlook, PowerPoint, Oracle VirtualBox/Manager, VMware virtual machine manager, Azure Data Lake, Twitter API (Tweepy Python wrapper), Spotify API (Spotipy Python wrapper), Alpha Vantage API, Yahoo Finance API

Data Structures/Storage

Text files, CSV files, JSON files, SQL Server (SSMS/T-SQL), MySQL, Hadoop Distributed File System (HDFS), Azure Data Lake, AWS S3, Hive, HBase, MongoDB, basic Avro

Programming Languages

Python, SQL (MySQL), HiveQL, HBase, MongoDB, Hadoop MapReduce, Hadoop Distributed File System (HDFS), Spark Resilient Distributed Datasets (RDDs)

Command scripting/formula

Excel, Windows, Linux, Ubuntu

Business Applications

Human Resources, Recruitment, FX Trading, Client Reporting, Onboarding, Audits, Procurement, Insurance, Accounting, Reinsurance, Finance, News

Strengths and Soft skills

Stakeholder management, communication, requirements/information gathering, data collection, cleaning, and processing

Work History

Data Engineer, 03/2019 to Current

Deloitte – Atlanta, GA

Created a cluster of Kafka brokers to serve structured data for Spark Structured Streaming

Designed, developed, and tested Spark SQL clients with PySpark.

Participated in Hadoop data ingestion and Hadoop cluster handling for real-time processing using Kafka and Spark.

Collected data from REST APIs: built HTTPS connections to the client server, sent GET requests, and collected the responses in a Kafka producer
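The REST-to-Kafka collection described in the item above could look roughly like the minimal sketch below, assuming the requests library and the kafka-python client; the endpoint URL, topic name, and broker address are placeholders, not details from the actual project.

```python
import json

import requests
from kafka import KafkaProducer  # kafka-python client (assumed)

# Hypothetical endpoint and broker; placeholders only.
API_URL = "https://api.example.com/v1/records"
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Open an HTTPS connection, send a GET request, and publish each record to a Kafka topic.
response = requests.get(API_URL, timeout=30)
response.raise_for_status()
for record in response.json():
    producer.send("api-records", value=record)

producer.flush()  # ensure buffered messages are delivered before exiting
```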

Stored the data pulled from diverse APIs into HBase on Hortonworks

Designed HBase tables and queries for data analysis and data transfer

Developed PySpark applications as ETL processes.

Imported data from web services into HDFS and transformed the data using Spark.

Hands-on experience with Spark Core, Spark SQL, and Data Frames/Data Sets/RDD API

Split JSON files into DataFrames to be processed in parallel for better performance and fault tolerance

Decoded raw data from JSON and streamed it out using the Kafka producer API

Integrated Kafka with Spark Streaming for real-time data processing using DStreams
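A minimal sketch of this kind of Kafka-to-DStream integration is shown below, assuming Spark 2.x (where the Python KafkaUtils DStream API was still available); the topic and broker address are placeholders.

```python
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # Spark 2.x Python API (assumed)

sc = SparkContext(appName="kafka-dstream-sketch")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

# Direct stream from a Kafka topic; each element is a (key, value) pair.
stream = KafkaUtils.createDirectStream(
    ssc, ["api-records"], {"metadata.broker.list": "localhost:9092"}
)

# Parse the JSON payload and print the record count for each batch.
parsed = stream.map(lambda kv: json.loads(kv[1]))
parsed.count().pprint()

ssc.start()
ssc.awaitTermination()
```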

Developed Cloud-based Big Data Architecture using Hadoop and AWS

Created Hive and SQL queries to spot emerging trends by comparing data with historical metrics.

Used Spark SQLContext to parse out the needed data, selecting features with the target information and assigning column names

Created HBase tables, loaded them with data, and wrote HBase queries to process the data.
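One way to create, load, and scan an HBase table from Python is sketched below using the happybase client over the HBase Thrift server; this is an assumed tooling choice, and the host, table, and column names are placeholders.

```python
import happybase  # common Python HBase client (assumed)

# Connect to the HBase Thrift server (hostname is a placeholder).
connection = happybase.Connection("hbase-thrift-host")

# Create a table with one column family, load a row, and scan it back.
connection.create_table("api_events", {"cf": dict()})
table = connection.table("api_events")

table.put(b"row-001", {b"cf:source": b"twitter", b"cf:count": b"42"})

for row_key, columns in table.scan(columns=[b"cf:source"]):
    print(row_key, columns)
```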

Conducted exploratory data analysis and managed dashboard for weekly report

Set up Ambari and troubleshot the distribution of different Apache big data components to ensure pipeline performance

Utilized Spark transformations and actions on DataFrames to display and process data

Aws Data Engineer, 05/2018 to 03/2019

GameStop – Grapevine, TX

Hands-on experience in developing Python applications on Linux and UNIX platforms.

Developed AWS CloudFormation templates to create custom infrastructure

Worked on AWS Kinesis for processing large volumes of real-time data
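Publishing records into a Kinesis data stream with boto3 could look like the sketch below; the stream name, region, and event fields are placeholders, not project details.

```python
import json

import boto3

# Region and stream name are placeholders.
kinesis = boto3.client("kinesis", region_name="us-east-1")

def put_event(event: dict) -> None:
    """Publish one JSON event to a Kinesis data stream."""
    kinesis.put_record(
        StreamName="store-events",
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event.get("store_id", "default")),
    )

put_event({"store_id": 101, "sku": "ABC-123", "qty": 2})
```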

Wrote Hive queries and custom UDFs

Processed data stored in AWS RDS using EMR and ingested it into AWS Redshift

Used Kafka to transmit live streaming data alongside batch processing to generate reports.

Configured the Airflow workflow engine to run multiple Hive jobs.
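A minimal Airflow DAG for chaining Hive jobs might look like the sketch below, assuming the Airflow 1.x HiveOperator import path; the DAG id, schedule, and HiveQL statements are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.hive_operator import HiveOperator  # Airflow 1.x path (assumed)

# DAG id, schedule, and queries are placeholders for illustration.
with DAG(
    dag_id="daily_hive_reports",
    start_date=datetime(2018, 6, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    refresh_staging = HiveOperator(
        task_id="refresh_staging",
        hql="INSERT OVERWRITE TABLE staging.sales SELECT * FROM raw.sales",
    )

    build_report = HiveOperator(
        task_id="build_report",
        hql=(
            "INSERT OVERWRITE TABLE reports.daily_sales "
            "SELECT sale_date, SUM(amount) FROM staging.sales GROUP BY sale_date"
        ),
    )

    refresh_staging >> build_report  # run the Hive jobs in sequence
```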

Added support for Amazon S3 and RDS to host media files

Created Hive queries to spot emerging trends by comparing Hadoop data with historical metrics

Created multiple batch Spark jobs using Python

Accessed the Hadoop file system (HDFS) and managed data in Hadoop data lakes using Spark.

Used Python and Shell Scripts to generate inventory

Analyzed large amounts of datasets to determine the optimal way to aggregate and report trends

Installed, configured, and managed tools such as the ELK stack for resource monitoring

Developed multiple Spark Streaming and batch Spark jobs using Python on EMR

Wrote Hive queries in HiveQL to analyze data in the Hive warehouse from external tables hosted on S3 buckets
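Querying an external Hive table over S3 can also be driven from PySpark with Hive support enabled, as in the sketch below; the bucket, table schema, and query are placeholders.

```python
from pyspark.sql import SparkSession

# Enable Hive support so Spark SQL can create and query Hive tables.
spark = SparkSession.builder.appName("hive-on-s3").enableHiveSupport().getOrCreate()

# External table over CSV files in an S3 bucket (bucket and schema are placeholders).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_external (
        sale_date STRING,
        store_id  INT,
        amount    DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3a://example-bucket/sales/'
""")

# HiveQL query against the external table.
spark.sql("""
    SELECT sale_date, SUM(amount) AS total
    FROM sales_external
    GROUP BY sale_date
    ORDER BY sale_date
""").show()
```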

Data Engineer, 11/2017 to 05/2018

Verizon – New York, NY

Used data from text files in HDFS to create RDDs and manipulated the data using PySpark and SparkContext in Jupyter notebooks
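A minimal sketch of this RDD workflow is shown below; the HDFS path and record layout are placeholders.

```python
from pyspark import SparkContext

sc = SparkContext(appName="hdfs-rdd-sketch")

# HDFS path is a placeholder.
lines = sc.textFile("hdfs:///data/events/*.txt")

# Transformations: split comma-separated lines and keep non-empty records.
records = (
    lines.map(lambda line: line.strip().split(","))
         .filter(lambda fields: len(fields) > 1)
)

print(records.count())  # action: number of parsed records
print(records.take(5))  # action: peek at a few rows
```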

Used data from text files with Python to transfer data from a Kafka producer to a Kafka consumer

Streamed selected real-time data from REST APIs, using Python wrapper libraries and Kafka to produce a topic.

Used SparkContext and the SparkSession builder to create Spark RDDs to be merged.

Developed a Spark job to split data on schema and transform it into Spark DataFrames

Formatted the required information and created topics using a Kafka producer, SparkContext, and SparkSession

Created Spark jobs that loaded data into Spark DataFrames and created Hive tables.

Used a Twitter agent to retrieve tweets in Avro format from Flume into HDFS, using different configuration files

Processed JSON callbacks from Kafka using Python.

Formatted Avro data into DataFrames using schema matching

Created Spark jobs with SparkSession in cluster mode

Extracted metadata from Hive tables using Impala

Wrote SQL queries for data validation of the reports and dashboards as necessary.

Imported data from different sources, such as HDFS and APIs, into Spark DataFrames for further processing.

Installed the Airflow workflow engine to execute Spark jobs that run independently based on time and data.

Used Sqoop to export the data from Oracle.

Processed data as DataFrames and saved it in Parquet format to HDFS using Spark.
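The DataFrame-to-Parquet step could be as simple as the sketch below; the source format, paths, and column names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("to-parquet-sketch").getOrCreate()

# Source and destination HDFS paths are placeholders.
df = spark.read.json("hdfs:///data/raw/events/")

# Light cleanup, then persist as Parquet on HDFS.
cleaned = df.dropna(subset=["event_id"]).withColumnRenamed("ts", "event_time")
cleaned.write.mode("overwrite").parquet("hdfs:///data/curated/events/")
```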

Big data Developer, 09/2017 to 11/2017

CitiBank – New York, NY

Installed software on virtual machines hosted on VMware

Set up a Hadoop multi-node cluster.

Used virtual machines for the master and slave nodes; all machines ran a DataNode.

Extracted data from SQL databases and text files and loaded it into Hadoop HDFS

Set up virtual machines and instances using Oracle VM VirtualBox Manager

Wrote multiple Spark programs in Python for data extraction, transformation, and aggregation from multiple file-formats

Assisted in exporting analyzed data to relational databases using Sqoop

Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation.

Responsible for the performance optimization of Spark jobs

Responsible for managing data coming from different sources into an HDFS data lake

Used the Hive JDBC to verify the data stored in the Hadoop cluster

Extracted the data from RDBMS (Oracle, MySQL) to HDFS using Sqoop

Good knowledge of SQL and HQL, with hands-on experience writing medium-complexity SQL queries

Used Spark modules to store the data on HDFS

Installed the Oozie workflow engine to run multiple Hive jobs.

Hadoop Developer, 12/2013 to 06/2017

Unity Bank – Lagos, Nigeria

Worked on analyzing a Hadoop Cloudera cluster and different big data analytics tools, including Pig, Hive, and Sqoop.

Created ETL pipelines to ingest data.

Ingested data using Sqoop from various databases and moved the data into HDFS.

Used Pig scripts to clean and transform the data in the raw layer and moved it into the curated layer.

Moved the transformed data into Hive using bash scripts.

Orchestrated the pipelines using Apache Oozie.

Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the business users.

Created and maintained technical documentation for launching Sqoop jobs, Pig scripts, and Hive queries.

Data Analyst, 05/2009 to 11/2013

Media Track – Lagos, Nigeria

Used Oracle SQL to extract data and MS Excel to analyze it and draw accurate, reliable conclusions that provided insights to upper management and the finance team.

Applied knowledge of the software development life cycle to analyze data on internal work efficiency and identify areas of wasted time, increasing total productivity in production by 10 percent.

Used different kinds of charts in MS Excel.

Conducted research using focus groups on 3 different products and increased sales by 17% due to the findings.

Spearheaded data flow improvement.

Developed Key Performance Indicators to monitor sales and decreased costs by 18%.

Education

Austin Peay State University, Clarksville, TN

MS Predictive Analytics


