EZRA DUDDEN
BIG DATA DEVELOPER
Email: ************@*****.*** Phone: 805-***-****
Professional Summary
Big Data developer with 6 years of experience managing data analytics, data processing, database, and data-driven projects. Skilled in DataFrames and analytics systems for diverse end users and industries, including heavy equipment, feature film and television entertainment, and industrial technology integration. Skilled in database systems and administration. Proficient in writing technical reports and documentation. Adept with distributions such as Cloudera Hadoop, Hortonworks, and Elastic Cloud (Elasticsearch). Skilled with Spark and Kafka.
Technical Skills
Apache
Kafka, Maven, Oozie, Pig, Hue, Sqoop, Flume, Hadoop, HBase, Cassandra, Airflow, Tez, ZooKeeper
BI Visualization
Kibana, Tableau, PowerBI
Programming
Python, Scala, Java, Shell Scripting
Development
Drupal, UX Design
File Types
XML, CSV, JSON, Avro, Parquet, ORC
APIs
REST API
Development Methodologies
Agile, Kanban, Scrum, Continuous Integration, TDD, Unit Testing, Functional Testing, Design Thinking, Lean, Six Sigma
Soft Skills
Communication, Collaboration, Customer Service, Help Desk, Mentoring, Reviewing
Big Data
RDDs, UDFs, Data Frames, Datasets, Pipelines, Data Lakes, Data Warehouse, Data Analysis
Hadoop
Hadoop, HDFS, Hadoop YARN, Hortonworks, Cloudera, Impala
Spark & Hive
Apache Spark, Spark Streaming, Apache Hive, Hive QL
Database
Redshift, ArangoDB, Cassandra, HBase, MongoDB, SQL, NoSQL, MySQL, RDBMS, Access, Oracle
File Management
HDFS, Snappy, Gzip, DAS, NAS, SAN
Cloud Services & Distributions
AWS, Azure, Anaconda Cloud, Elasticsearch, Solr, Lucene, Cloudera, Databricks, Hortonworks, Cloud Foundry, Elastic Cloud
Software
Microsoft Office Suite, Peoplesoft, WEKA, PandaDoc
Operating Systems
Linux/UNIX, Windows
Virtualization & Network
SQL Injection, Data FTK imager, WAN/LAN, TCP/IP, Routing, VMware, VirtualBox, OSI Model
Experience
02/20 – Current
Big Data Engineer
MGM Studio, Thousand Oaks, CA
Developed Spark jobs using Spark SQL to process structured data on Spark clusters.
Implemented Spark applications in Python, using DataFrames and the Spark SQL API for faster data processing.
Ingested image responses through a Kafka producer written in Python.
Defined a schema for a custom HBase table.
Analyzed the performance of digital strategies to yield business recommendations.
Worked on AWS Kinesis for processing real-time data.
Handled schema changes in the data stream using Spark.
Connected databases and Tableau to HBase and configured the integration for optimized data extraction.
Developed multiple SQL queries in Tableau to join tables and create dashboards.
10/18 – 02/20
Big Data Developer
John Deere, Moline, IL
Communicated between business and development teams to translate requirements into technically feasible solutions.
Used Spark jobs, Spark SQL, and the DataFrames API to load structured data into Spark clusters.
Created a Kafka broker that uses a schema to fetch structured data for Structured Streaming.
Interacted with data residing in HDFS using Spark to process the data.
Handled structured data via Spark SQL, stored into Hive tables for consumption.
Accessed Hadoop file system (HDFS) using Spark and managed data in Hadoop data lakes with Spark.
Processed structured data in real time with Spark SQL and Spark Structured Streaming.
Converted Hive/SQL queries into Spark transformations using Spark RDDs in Python.
Supported clusters and topics through Kafka Manager.
Configured Spark Streaming to receive real-time data to store in HDFS.
Ingested production line data from IoT sources through internal RESTful APIs.
Created Python scripts to download data from the APIs and perform pre-cleaning steps.
Performed ETL operations on IoT data using PySpark.
Wrote user-defined functions (UDFs) to apply custom business logic to datasets using PySpark.
Configured AWS S3 to receive and store data from the resulting PySpark job.
Wrote DAGs for Airflow to allow scheduling and automatic execution of the pipeline.
Performed cluster-level and code-level Spark optimizations.
Created AWS Redshift Spectrum external tables to query data in S3 directly for data analysts.
Wrote simple SQL scripts on the final database to prepare data for visualization with Tableau.
Worked hands-on with Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs.
10/17 – 10/18
Big Data Developer
BlackRock, New York, NY
Used Spark to process data on top of YARN and performed analytics on data in Hive.
Used Cloudera Manager to measure performance for optimization.
Collected real-time log data from diverse sources such as RESTful APIs and social media using Python web crawlers, filtered it, and ingested it through Kafka into HDFS for further analysis.
Imported data from various data sources and performed transformations and analysis using Spark.
Installed, configured, and deployed Hadoop distributions in cloud environments.
Used the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
Wrote shell scripts for exporting log files to Hadoop cluster through automated processes.
Migrated streaming and static data from RDBMS into the Hadoop cluster using Hive, Flume, and Sqoop.
Transferred streaming log data from different data sources into HDFS and HBase using Apache Flume.
Worked on the application's complete Big Data flow, from ingesting upstream data into HDFS to processing that data in HDFS using Spark.
Developed a NiFi pipeline using ELK for quicker testing and handling of information.
Used a Kafka producer to ingest raw data into Kafka topics and ran a Spark Streaming app to process the event-driven data.
Collected real-time data from Kafka using Spark Streaming and performed transformations.
Designed Hive queries to perform data analysis, data transfer and table design.
Developed expertise in Hive queries, with extensive knowledge of joins.
Developed end-to-end Hive queries to parse raw data, populate external and internal tables, and store the refined data in partitioned external tables.
08/16 – 10/17
Big Data Engineer
SAIC, Huntsville, AL
Implemented Spark on EMR to process Big Data across the data lake in AWS.
Worked with the AWS IAM console to create custom users and groups.
Created an AWS Lambda function to extract data from Kinesis Firehose and post it to an AWS S3 bucket on a scheduled basis (every 4 hours) using an AWS CloudWatch event.
Implemented a serverless architecture using AWS Lambda with Amazon S3 and Amazon DynamoDB.
Migrated data between database platforms on AWS, such as from local SQL Server to Amazon RDS and EMR Hive.
Led many critical on-premises data migrations to the AWS cloud, assisting with performance tuning and providing a successful path to Redshift clusters and AWS RDS DB engines.
Worked on AWS S3 bucket integration for application and development projects.
Managed and reviewed Hadoop log files in AWS S3.
Designed logical and physical data models for various data sources on Amazon Redshift.
Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Amazon Redshift.
Education
Bachelor of Science in Mathematics (major concentration: Theoretical Mathematics) from Metropolitan State University of Denver