Modupe OyeDiran
adljj9@r.postjobfree.com
Professional Summary
Highly experienced professional with 10+ years of industry experience across diverse applications and data domains, in roles including Big Data Engineer, Business Data Management, Applications Research Analyst, Analyst/Programmer, and IT Applications Consultant.
Skills
Operating System
Windows, Linux
Big Data Technology
Hadoop, Kafka, Spark, Zookeeper, Sqoop, Flume, Spark SQL, Cloudera and Hortonworks HDP platforms
Other Technology
Microsoft Word, Excel, Outlook, PowerPoint, Oracle VirtualBox, VMware,
Azure Data Lake, Twitter API (Tweepy Python wrapper), Spotify API (Spotipy Python wrapper), Alpha Vantage API, and Yahoo Finance API
Data Structures/Storage
Text files, CSV files, JSON files, SSMS T-SQL, MySQL, Hadoop Distributed File System (HDFS), Azure Data Lake, AWS S3, Hive, HBase, MongoDB, basic Avro
Programming Languages
Python, SQL (MySQL), HiveQL, HBase, MongoDB, Hadoop MapReduce, HDFS, Spark Resilient Distributed Datasets (RDDs)
Command scripting/formula
Excel, Windows, Linux, Ubuntu
Business Applications
Human Resources, Recruitment, FX Trading, Client Reporting, Onboarding, Audits, Procurement, Insurance, Accounting, Reinsurance, Finance, News
Strengths and Soft skills
Stakeholder management, communication, requirements/information gathering, data collection, cleaning, and processing
Work History
Data Engineer, 03/2019 to Current
Deloitte – Atlanta, GA
Created a cluster of Kafka brokers to fetch structured data in structured streaming
Designed, developed, and tested Spark SQL clients with PySpark.
Participated in Hadoop data ingestion and Hadoop cluster handling for real-time processing using Kafka and Spark.
Collected data via REST APIs: built an HTTPS connection with the client server, sent GET requests, and collected responses in a Kafka producer
Stored the data pulled from diverse APIs into HBase on Hortonworks
Designed HBase queries to perform data analysis, data transfer and table design
Developed PySpark application as ETL processes.
Imported data from web services into HDFS and transformed data using Spark.
Hands-on experience with Spark Core, Spark SQL, and Data Frames/Data Sets/RDD API
Split JSON files into DataFrames to be processed in parallel for better performance and fault tolerance
Decoded raw data from JSON and streamed it over using the Kafka producer API
Integrated Kafka with Spark Streaming for real-time data processing using DStreams
Developed Cloud-based Big Data Architecture using Hadoop and AWS
Created Hive and SQL queries to spot emerging trends by comparing data with historical metrics.
Used the Spark SQL context to parse out the needed data, selected features with target information, and assigned column names
Created HBase tables, loaded them with data, and wrote HBase queries to process the data.
Conducted exploratory data analysis and managed dashboards for weekly reports
Set up Ambari and troubleshot the distribution of different Apache big data components to ensure pipeline performance
Used transformations and actions in Spark to interact with DataFrames to display and process data
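The JSON decoding and splitting steps described above can be sketched in plain Python with the standard library (no Spark dependency); the record contents and batch count here are hypothetical, purely for illustration:

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw JSON strings, standing in for records pulled from an API.
raw_records = [
    '{"symbol": "AAA", "price": 10.0}',
    '{"symbol": "BBB", "price": 20.0}',
    '{"symbol": "CCC", "price": 30.0}',
    '{"symbol": "DDD", "price": 40.0}',
]

def split_into_batches(records, n_batches):
    """Split records into roughly equal batches for parallel decoding."""
    return [records[i::n_batches] for i in range(n_batches)]

def decode_batch(batch):
    """Decode one batch of raw JSON strings into dicts."""
    return [json.loads(r) for r in batch]

# Decode batches in parallel, then flatten the partial results.
batches = split_into_batches(raw_records, 2)
with ThreadPoolExecutor(max_workers=2) as pool:
    decoded = [rec for part in pool.map(decode_batch, batches) for rec in part]

symbols = sorted(d["symbol"] for d in decoded)
print(symbols)  # ['AAA', 'BBB', 'CCC', 'DDD']
```

In the actual pipeline, the parallel batches would be Spark DataFrame partitions rather than thread-pool tasks.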
AWS Data Engineer, 05/2018 to 03/2019
GameStop – Grapevine, TX
Hands-on experience developing Python applications on Linux and UNIX platforms.
Developed AWS CloudFormation templates to create custom infrastructure
Worked on AWS Kinesis for processing huge amounts of real-time data
Wrote Hive queries and custom UDFs
Processed data stored in AWS RDS using EMR and ingested to AWS Redshift
Used Kafka to transmit live streaming with batch processing to generate reports.
Configured Airflow workflow engine to run multiple Hive jobs.
Added support for Amazon AWS S3 and RDS to host media files
Created Hive queries to spot emerging trends by comparing Hadoop data with historical metrics
Created multiple batch Spark jobs using Python
Accessed Hadoop file system (HDFS) using Spark and managed data in Hadoop data lakes with Spark.
Used Python and Shell Scripts to generate inventory
Analyzed large amounts of datasets to determine the optimal way to aggregate and report trends
Installed, Configured and Managed tools such as ELK for Resource Monitoring
Developed multiple Spark Streaming and batch Spark jobs using Python on EMR
Wrote Hive queries in Hive Query Language to analyze data in the Hive warehouse from external tables hosted on S3 buckets
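The trend-spotting queries mentioned above compare current data against historical metrics. A minimal sketch of that query pattern, using an in-memory SQLite database as a stand-in for the Hive warehouse (table, column names, and figures are hypothetical, not from the actual schema):

```python
import sqlite3

# In-memory SQLite stands in for the Hive warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, period TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("console", "hist", 100), ("console", "curr", 110),
     ("game", "hist", 50), ("game", "curr", 90)],
)

# Join current figures to the historical baseline per product and flag
# products whose volume grew by more than 25% -- an "emerging trend".
rows = conn.execute("""
    SELECT c.product,
           CAST(c.units AS REAL) / h.units AS growth
    FROM sales c
    JOIN sales h ON h.product = c.product AND h.period = 'hist'
    WHERE c.period = 'curr'
      AND CAST(c.units AS REAL) / h.units > 1.25
""").fetchall()

print(rows)  # [('game', 1.8)]
```

The same self-join-against-baseline shape carries over to HiveQL; only the dialect details (e.g. partitioning by date instead of a `period` column) would change.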
Data Engineer, 11/2017 to 05/2018
Verizon – New York, NY
Created RDDs from text files in HDFS and manipulated the data using PySpark and SparkContext in Jupyter notebooks
Used Python to transfer text-file data from a Kafka producer to a Kafka consumer
Streamed selected real-time data from REST APIs, using Python wrapper libraries and Kafka to produce topics.
Used SparkContext and the SparkSession builder to create Spark RDDs to be merged.
Developed a Spark job to split data on schema and transform it into Spark DataFrames
Formatted required information and created topics using the Kafka producer, SparkContext, and SparkSession
Created Spark jobs that loaded data into Spark DataFrames and created Hive tables.
Used a Twitter agent to retrieve tweets in Avro format from Flume into HDFS using different configuration files
Processed JSON callbacks from Kafka using Python.
Formatted Avro data into DataFrames using schema matching
Created Spark jobs using SparkSession in cluster mode
Extracted metadata from Hive tables using Impala
Wrote SQL queries for data validation of reports and dashboards as necessary.
Imported data from different sources such as HDFS and APIs into Spark DataFrames for further processing.
Installed the Airflow workflow engine to execute Spark jobs that run independently based on schedules and data.
Used Sqoop to export the data from Oracle.
Processed data as DataFrames and saved it in Parquet format to HDFS using Spark.
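The producer-to-consumer transfer of text-file data described above can be sketched without a running Kafka broker; here a stdlib queue stands in for the topic, and the file contents are hypothetical:

```python
import io
import queue
import threading

# A stdlib queue stands in for a Kafka topic in this sketch; in the real
# pipeline these roles are played by a Kafka producer and consumer client.
topic = queue.Queue()
SENTINEL = None  # end-of-stream marker

def producer(text_file):
    """Read lines from a text file and publish each as a message."""
    for line in text_file:
        topic.put(line.strip())
    topic.put(SENTINEL)  # signal end-of-stream to the consumer

def consumer(out):
    """Consume messages until the end-of-stream sentinel arrives."""
    while True:
        msg = topic.get()
        if msg is SENTINEL:
            break
        out.append(msg)

source = io.StringIO("alpha\nbeta\ngamma\n")  # stands in for an HDFS text file
received = []
t_prod = threading.Thread(target=producer, args=(source,))
t_cons = threading.Thread(target=consumer, args=(received,))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()

print(received)  # ['alpha', 'beta', 'gamma']
```

With a real broker, the queue operations become `producer.send(topic, msg)` and a consumer poll loop, but the shape of the hand-off is the same.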
Big data Developer, 09/2017 to 11/2017
CitiBank – New York, NY
Installed software on virtual machines hosted on VMware
Set up a multi-node Hadoop cluster, using virtual machines for the master and slave nodes; every machine ran a DataNode
Extracted data from SQL databases and text files and loaded it into Hadoop HDFS
Set up virtual machines and instances using Oracle VM VirtualBox Manager
Wrote multiple Spark programs in Python for data extraction, transformation, and aggregation from multiple file-formats
Assisted in exporting analyzed data to relational databases using Sqoop
Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation.
Responsible for the performance optimization of spark jobs
Responsible to manage data coming from different sources into a HDFS Data Lake
Used the Hive JDBC to verify the data stored in the Hadoop cluster
Extracted the data from RDBMS (Oracle, MySQL) to HDFS using Sqoop
Applied good knowledge of SQL and HQL, with hands-on experience writing intermediate-level SQL queries
Used Spark modules to store the data on HDFS
Installed Oozie workflow engine to run multiple Hive Jobs.
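The validation, cleansing, and custom-aggregation work described above follows a common filter-then-aggregate shape. A minimal sketch in plain Python (field names and figures are hypothetical, not from the actual CitiBank data):

```python
from collections import defaultdict

# Hypothetical raw rows of the kind a validation/cleansing job might see:
# some fields are blank or malformed and must be dropped before aggregating.
raw_rows = [
    {"branch": "NYC", "amount": "120.50"},
    {"branch": "NYC", "amount": ""},        # missing amount -> rejected
    {"branch": "BOS", "amount": "80.25"},
    {"branch": "",    "amount": "10.00"},   # missing branch -> rejected
    {"branch": "BOS", "amount": "19.75"},
]

def clean(rows):
    """Keep only rows with a branch and a parseable numeric amount."""
    valid = []
    for row in rows:
        if not row["branch"]:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        valid.append({"branch": row["branch"], "amount": amount})
    return valid

def aggregate(rows):
    """Custom aggregation: total amount per branch."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["branch"]] += row["amount"]
    return dict(totals)

totals = aggregate(clean(raw_rows))
print(totals)  # {'NYC': 120.5, 'BOS': 100.0}
```

In the Spark applications this would be a `filter`/`map` over a DataFrame followed by `groupBy().agg()`, but the validation-before-aggregation logic is identical.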
Hadoop Developer, 12/2013 to 06/2017
Unity Bank – Lagos, Nigeria
Worked on analyzing Hadoop Cloudera cluster and different big data analytics tools including Pig, Hive, and Sqoop.
Created ETL pipelines to ingest data.
Ingested data using Sqoop from various databases and moved the data into HDFS.
Used Pig scripts to clean and transform the data in the raw layer and moved into curated layer.
Moved the transformed data into Hive using bash scripts.
Orchestrated the pipelines using Apache Oozie.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the business users.
Created and maintained technical documentation for launching the Sqoop job, pig scripts and Hive queries.
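The Sqoop-to-Pig-to-Hive pipeline orchestrated above can be sketched as an Oozie workflow definition; all paths, script names, and the JDBC URL below are placeholders, not the actual Unity Bank configuration:

```xml
<!-- Hypothetical Oozie workflow: Sqoop ingest -> Pig cleanup -> Hive load -->
<workflow-app name="ingest-pipeline" xmlns="uri:oozie:workflow:0.5">
    <start to="sqoop-ingest"/>
    <action name="sqoop-ingest">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://db-host/source --table customers --target-dir /data/raw/customers</command>
        </sqoop>
        <ok to="pig-clean"/>
        <error to="fail"/>
    </action>
    <action name="pig-clean">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>clean_raw.pig</script>
        </pig>
        <ok to="hive-load"/>
        <error to="fail"/>
    </action>
    <action name="hive-load">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>load_curated.hql</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pipeline failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Each action's `ok`/`error` transitions encode the raw-to-curated ordering described in the bullets.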
Data Analyst, 05/2009 to 11/2013
Media Track – Lagos, Nigeria
Used Oracle SQL to extract data and MS Excel to analyze it, drawing accurate and reliable conclusions that provided insights to upper management and the finance team.
Applied knowledge of the software development life cycle to analyze internal work-efficiency data, identified areas of wasted time, and increased total production productivity by 10 percent.
Used various chart types in MS Excel to present findings.
Conducted research using focus groups on 3 different products and increased sales by 17% due to the findings.
Spearheaded data flow improvement.
Developed Key Performance Indicators to monitor sales and decreased costs by 18%.
Education
Austin Peay State University, Clarksville, TN
MS Predictive Analytics