
Ayooluwa Amole

AWS Cloud Big Data Engineering

Achievement-driven professional with nearly 8 years of experience in Big Data/Cloud; targeting assignments with an organization of repute

aduyo0@r.postjobfree.com | 281-***-****

Location: Houston, TX
Posted: February 06, 2023


Key Skills


AWS Cloud

Big Data Systems

ETL

Project Management

Data Visualization

SQL Server Database

Project Execution

Documentation

Soft Skills: Adaptable, Communicator, Strategic Thinker, Collaborative, Team Player, Problem-solving

Profile Summary

•Skilled in databases, data management, analytics, data processing, data cleansing, data modeling, and data-driven projects

•Experience working on Big Data systems, ETL pipelines, and real-time analytics systems, including Machine Learning algorithms, slicing/dicing OLAP cubes, and drilling into tabular models

•Proficient in various distributions and platforms such as the Apache Hadoop ecosystem, Microsoft Azure, and Databricks Spark

•Experienced in bucketing, partitioning, multi-threaded computing, and streaming (Python, PySpark)

•Accustomed to working with large complex data sets, real-time/near real-time analytics, and distributed Big Data platforms

•Experienced in the design, development, and system migration of high-performance, metadata-driven data pipelines with Kafka and Hive/Presto on Qubole, providing data export capability through API and UI

•Worked with existing EDS platforms and strategic initiatives built for future phases of EDS/EBI

•Experience collecting log data from various sources and integrating it into HDFS using Flume; staging data in HDFS for further analysis

•Used Python for Big Data pipelines and customizations; built transfer pipelines for transforming and moving data using Flume, Spark, Spark Streaming, and Hadoop

•Worked with various file formats (Parquet, Avro, and JSON) and compression codecs (Snappy and Gzip)

•Deployed large multi-node Hadoop and Spark clusters

•Developed custom large-scale enterprise applications using Spark for data processing

•Experience in developing Apache Airflow workflows for scheduling and orchestrating ETL processes (a minimal DAG sketch follows this list)

•Possess excellent communication, analytical, and interpersonal skills
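
A minimal sketch of how such an Airflow-orchestrated ETL workflow might be wired up, assuming Airflow 2.x; the DAG id, schedule, and task callables are illustrative placeholders rather than details of any project described here:

    # Illustrative Airflow 2.x DAG for a daily ETL run; dag_id, schedule, and task
    # callables are placeholders, not details from a specific project above.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Placeholder extract step: pull raw records from a source system
        return [{"id": 1, "value": 42}]

    def transform(ti):
        # Pull the extract output from XCom and apply a placeholder transformation
        rows = ti.xcom_pull(task_ids="extract")
        return [dict(r, value=r["value"] * 2) for r in rows]

    def load(ti):
        # Placeholder load step: write the transformed rows to the warehouse
        rows = ti.xcom_pull(task_ids="transform")
        print(f"loading {len(rows)} rows")

    with DAG(
        dag_id="daily_etl_example",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> transform_task >> load_task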

Education

•Master's in Business Analytics from the University of Alabama in Huntsville, Alabama, in 2021

•Bachelor of Science in Civil Engineering from Georgia Institute of Technology, Atlanta in 2015

•Associate degree in Civil Engineering from Westchester Community College, New York in 2011

Technical Skills

•Databases: SQL Server, Snowflake, MongoDB, Redshift, Hive

•Programming Languages: Java, Python, Scala, Spark, SQL, Shell

•Cloud Platforms: AWS, Azure

•CI/CD: GitHub, Gitlab, Jenkins

•Cloud Tools: Redshift, S3, Lambda, SQS, SNS, Step Functions, RDS, Secrets Manager, Glue, EC2, and EMR

•Big Data: Hadoop, Hive, Flume, Sqoop, Airflow, NiFi, Spark, Spark Streaming, YARN, Kafka, ZooKeeper

Work Experience

Sr. Big Data Engineer

VROOM, Houston

September’21 to Present

•Designed Snowflake queries to perform data analysis, data transfer, and table design

•Adept at project management methodologies such as Waterfall and Agile/Scrum (sprints, epics, user stories), and tools such as Rational Rose, with good knowledge of SOLID patterns

•Developed Cloud-based Big Data architecture using Hadoop and AWS and developed PySpark applications as ETL processes

•Created Snowflake tables, loaded them with data, and wrote complex SQL queries to process the data

•Created Hive and SQL queries to spot emerging trends by comparing data with historical metrics

•Performed troubleshooting on the distribution of different components of Apache Big Data tools to ensure the performance of pipelines

•Developed a cluster of Kafka brokers to retrieve structured data for Structured Streaming

•Set up Hadoop data ingestion and Hadoop cluster handling for real-time processing using Kafka and Spark

•Established data collection using a REST API, built an HTTPS connection with the client server, sent GET requests, and collected responses in a Kafka producer

•Integrated Kafka with Spark Streaming for real-time data processing using Structured Streaming (see the sketch at the end of this role's bullets)

•Stored data pulled from diverse APIs into HBase on Hortonworks and imported data from web services into HDFS and transformed data using Spark

•Used Spark SQLContext to parse data, select features with target information, and assign column names

•Decoded raw data from JSON and streamed it using the Kafka producer API

•Conducted exploratory data analysis and managed dashboards for weekly reports

•Utilized transformations and actions in Spark to interact with data frames to show and process data

•Split JSON files into DataFrames to be processed in parallel for better performance and fault tolerance
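
A hedged sketch of the REST-to-Kafka-to-Spark flow described above (a GET response published through a Kafka producer, then consumed with Structured Streaming); the endpoint URL, broker address, topic name, schema fields, and the kafka-python client are illustrative assumptions:

    # Illustrative producer: HTTPS GET from a REST endpoint, each record published to Kafka.
    import json
    import requests
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    resp = requests.get("https://example.com/api/vehicle-events", timeout=30)
    for record in resp.json():
        producer.send("vehicle_events", record)  # publish each JSON record to the topic
    producer.flush()

    # Illustrative consumer: Spark Structured Streaming parses the JSON payloads.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("kafka_structured_streaming").getOrCreate()
    schema = StructType([
        StructField("vin", StringType()),
        StructField("price", DoubleType()),
    ])
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "vehicle_events")
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )
    query = events.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()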

Data Engineer

Hy Vee, Remote

September’19 to August’21

•Created data frames in Apache Spark by passing schema as a parameter to the ingested data using case classes

•Participated in the development/implementation of Cloudera and Hortonworks Hadoop environments

•Involved in the implementation of analytics solutions through Agile/Scrum processes for development and quality assurance

•Interacted with data residing in HDFS using Spark to process the data

•Automated, configured, and deployed instances in AWS and Azure environments

•Populated DataFrames inside Spark jobs, using Spark SQL and the DataFrames API to load structured data into Spark clusters

•Forwarded requests to a source REST-based API from a Scala script via a Kafka producer

•Developed a PySpark application to read data from various file system sources, apply transformations, and write to a SQL database (sketched after this role's bullets)

•Gained knowledge of Hadoop, Spark, and similar frameworks

•Attended meetings with managers to determine the company's Big Data needs and developed Hadoop systems accordingly

•Loaded disparate data sets and conducted pre-processing using Hive or Pig

•Finalized the scope of the system and delivered Big Data solutions

•Collaborated with the software research and development teams and built cloud platforms for the development of company applications

•Trained staff on data resource management

•Collected data using REST API, built HTTPS connection with client-server, sent GET request, and collected response in Kafka Producer

•Imported data from web services into HDFS and transformed data using Spark

•Executed Hadoop/Spark jobs on AWS EMR using programs, data stored in S3 Buckets, and ingested data through AWS Kinesis Data Stream and Firehose from various sources to S3

•Used Spark SQL for creating and populating the HBase warehouse

•Worked with SparkContext, Spark SQL, DataFrames, and pair RDDs

•Extracted data from different databases and scheduled Oozie workflows to execute the task daily

•Worked with Amazon Web Services (AWS) and was involved in ETL, Data Integration, and Migration

•Worked on AWS Kinesis for processing huge amounts of real-time data
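
A minimal sketch of the PySpark file-source-to-SQL-database pattern referenced above; the S3 paths, JDBC URL, credentials, table, and column names are illustrative assumptions:

    # Illustrative PySpark job: read file sources, transform, write to a SQL database over JDBC.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    spark = SparkSession.builder.appName("files_to_sql").getOrCreate()

    orders = spark.read.option("header", True).csv("s3a://example-bucket/raw/orders/")
    items = spark.read.parquet("s3a://example-bucket/raw/order_items/")

    daily_revenue = (
        orders.join(items, "order_id")
        .withColumn("order_date", to_date(col("order_ts")))
        .withColumn("item_price", col("item_price").cast("double"))
        .groupBy("order_date")
        .sum("item_price")
        .withColumnRenamed("sum(item_price)", "daily_revenue")
    )

    (daily_revenue.write
        .format("jdbc")
        .option("url", "jdbc:sqlserver://example-host:1433;databaseName=analytics")
        .option("dbtable", "dbo.daily_revenue")
        .option("user", "etl_user")
        .option("password", "***")  # placeholder; a real job would pull this from a secrets store
        .mode("overwrite")
        .save())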

Big Data Engineer

Deloitte, NY

July’17 to August’19

With analytics, an organization is better able to be descriptive, predictive, and prescriptive, but only if there is a firm connection between what analytics can deliver and what the business is trying to accomplish. How can your organization use analytics to help deliver deeper insights and enable more effective decision-making? Deloitte serves clients as a multinational information technology services and consulting company.

•Created end-to-end ETL pipelines using Components such as Hadoop, Spark, and Kafka

•Built Real-Time Streaming Data Pipelines with Kafka, Spark Streaming

•Created a Kafka producer to connect to different external sources and bring the data to a Kafka broker

•Developed an ETL pipeline to process log data from Kafka/HDFS sequence files and output to Hive tables in ORC format (sketched after this role's bullets)

•Implemented Spark streaming for real-time data processing with Kafka and handled large amounts of data with Spark

•Wrote streaming applications with Spark Streaming/Kafka

•Used SQL to perform transformations and actions on data residing in HDFS

•Responsible for designing and deploying new ELK clusters

•Participated in various phases of data processing (collecting, aggregating, moving from various sources) using Apache Spark

•Managed structured data via Spark SQL, then stored it in Hive tables for downstream consumption

•Defined the Spark/Python (PySpark) ETL framework and best practices for development and wrote Python code that tracks Kafka message delivery

•Built Jenkins jobs for CI/CD infrastructure from GitHub repos

•Supported clusters and topics in Kafka Manager, coordinated Kafka operation and monitoring with DevOps personnel, and balanced the impact of Kafka producer and consumer message (topic) consumption

•Used Cloudera Manager for installation and management of a multi-node Hadoop cluster

•Versioned code with Git and set up Jenkins CI to manage CI/CD practices

•Interacted with data residing in HDFS using Spark to process the data
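
A hedged sketch of the log-data-to-Hive step referenced above, reading log records landed in HDFS and appending them to an ORC-backed Hive table; the paths, tab delimiter, and database/table names are illustrative assumptions:

    # Illustrative PySpark step: read log lines from HDFS, parse them lightly,
    # and append to a Hive table stored as ORC.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, split

    spark = (SparkSession.builder
             .appName("logs_to_hive_orc")
             .enableHiveSupport()
             .getOrCreate())

    raw = spark.read.text("hdfs:///data/landing/app_logs/")  # one log line per row in column "value"

    parsed = (raw
              .withColumn("parts", split(col("value"), "\t"))
              .select(col("parts")[0].alias("event_ts"),
                      col("parts")[1].alias("level"),
                      col("parts")[2].alias("message")))

    parsed.write.mode("append").format("orc").saveAsTable("analytics.app_logs")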

Jr. Data Engineer

Ingles Markets Inc. (Remote)

October’15 to July’17

•Sourced data using APIs, with data available in JSON converted to Parquet and Avro formats (sketched after this role's bullets)

•Used Kafka to ingest Data and create topics for data streaming

•Utilized Spark for data processing and creating DStreams from data received from Kafka

•Stored results of processed data in Hive

•Automated AWS components like EC2 instances, Security Groups, ELB, RDS, Lambda, and IAM through AWS CloudFormation templates

•Worked on large data warehouse Analysis Services servers and developed various analytical reports from those servers

•Wrote Hive scripts to process HDFS data and wrote shell scripts to automate workflows to pull data from various databases into the Hadoop framework for users to access the data through Hive views

•Launched and configured Amazon EC2 Cloud Servers using AMIs and configured the servers for specified applications

•Developed SQL queries to insert, update, and delete data in a data warehouse

•Documented the requirements, including the available code to be implemented using Spark, Amazon DynamoDB, Redshift, and Elasticsearch

•Imported and exported data into HDFS and Hive using Sqoop
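
A minimal sketch of the JSON-to-Parquet/Avro conversion referenced above; the endpoint, output paths, and the presence of the spark-avro package on the cluster are assumptions:

    # Illustrative conversion of JSON records pulled from an API into Parquet and Avro.
    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json_to_parquet_avro").getOrCreate()

    records = requests.get("https://example.com/api/products", timeout=30).json()
    df = spark.createDataFrame(records)  # schema inferred from the JSON records

    df.write.mode("overwrite").parquet("s3a://example-bucket/curated/products_parquet/")
    # Writing Avro assumes the spark-avro package is available on the classpath
    df.write.mode("overwrite").format("avro").save("s3a://example-bucket/curated/products_avro/")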



