TECHNICAL SKILLS
APACHE
Apache Ant, Apache Flume, Apache Hadoop, Apache YARN, Apache Hive, Apache Kafka, Apache Maven, Apache Oozie, Apache Spark, Apache Tez, Apache ZooKeeper, Apache Impala, HDFS, MapR, MapReduce
SCRIPTING
Python, Linux shell scripting, R, Scala
OPERATING SYSTEMS
Unix/Linux, Windows 10
FILE FORMATS
Parquet, Avro, JSON, ORC, Text, CSV
DISTRIBUTIONS
Cloudera CDH 4/5, Hortonworks HDP 2.5/2.6
DATA PROCESSING (COMPUTE) ENGINES
Apache Spark, Spark Streaming, Apache Flink, Apache Storm
DATA VISUALIZATION TOOLS
Tableau, Power BI
DATABASE
PostgreSQL, MySQL, Apache Cassandra, Amazon Redshift, Amazon DynamoDB, Apache HBase, Apache Hive, MongoDB
SOFTWARE
Microsoft Word, Excel, Outlook, PowerPoint, LaTeX; technical documentation
WORK EXPERIENCE
Enhance IT, Big Data Engineer, June – Present
Atlanta, GA
Support, maintain, and document the Hadoop and MySQL data warehouse
Iterate and improve existing features in the pipeline as well as add new ones
Design, develop, document, and test new requirements in the data pipeline using Bash, Flume, HDFS, and Spark in the Hadoop ecosystem
Provide full operational support: analyze code to identify root causes of production issues, provide solutions or workarounds, and drive them to resolution
Participate in full development life cycle including requirements analysis, design, development, deployment, and operations support
Created and managed cloud VMs with the AWS EC2 command-line client and the AWS administration console.
Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data (a representative sketch appears after this list).
Added support for Amazon S3 and RDS to host static/media files and the database in the AWS cloud.
Used an Ansible Python script to generate inventory and push deployments to AWS instances.
Executed Hadoop/Spark jobs on AWS EMR against data stored in S3 buckets.
Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with storage in Amazon Simple Storage Service (S3) and AWS Redshift.
Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 (see the Lambda sketch after this list).
Populated database tables via Amazon Kinesis Data Firehose and AWS Redshift (a Firehose sketch appears after this list).
Automated installation of the ELK agent (Filebeat) with an Ansible playbook. Developed a Kafka queue system to collect log data without data loss and publish it to various sources (a producer sketch appears after this list).
Used AWS CloudFormation templates with Terraform and existing plugins.
Developed AWS CloudFormation templates to create the custom infrastructure for our pipeline.
Implemented AWS IAM user roles and policies to authenticate users and control access.
Specified nodes and performed data-analysis queries on Amazon Redshift clusters on AWS.
Processed multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and loaded the results into AWS Redshift.
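The sketch below is a minimal, self-contained example of the kind of Spark DataFrame analysis over Hive data referenced above; the database, table, and column names (sales_db, transactions, region, amount) are hypothetical placeholders, not production identifiers.

# Minimal PySpark sketch: read a Hive table with the DataFrame API and aggregate it.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-analytics-sketch")
    .enableHiveSupport()          # lets Spark resolve tables in the Hive metastore
    .getOrCreate()
)

# Load the (hypothetical) Hive table as a DataFrame.
txns = spark.table("sales_db.transactions")

# Total and average amount per region, largest regions first.
summary = (
    txns.groupBy("region")
        .agg(F.sum("amount").alias("total_amount"),
             F.avg("amount").alias("avg_amount"))
        .orderBy(F.desc("total_amount"))
)

summary.show(20, truncate=False)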
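The next sketch illustrates the Lambda bullet above with a handler reacting to S3 object-created events; the DynamoDB table name processed_files is assumed purely for illustration.

# Minimal AWS Lambda sketch: record each new S3 object in a DynamoDB table.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("processed_files")   # hypothetical table name

def handler(event, context):
    # An S3 notification event carries one record per affected object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Track the processed object for downstream auditing.
        table.put_item(Item={"object_key": key, "bucket": bucket})

    return {"statusCode": 200, "body": json.dumps("ok")}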
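The following is a hedged sketch of populating Redshift-backed tables through Kinesis Data Firehose, as in the Firehose bullet above; the delivery stream name events-to-redshift and the record fields are assumptions, and the stream itself is presumed to be configured to COPY into Redshift.

# Minimal sketch: push newline-delimited JSON records to a Firehose delivery stream.
import json
import boto3

firehose = boto3.client("firehose")

def put_event(event: dict) -> None:
    # Firehose buffers records and delivers them to Redshift (via S3 + COPY).
    firehose.put_record(
        DeliveryStreamName="events-to-redshift",   # hypothetical stream name
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )

put_event({"user_id": 42, "action": "login"})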
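Finally, a small sketch of a loss-averse Kafka producer for log collection, matching the Kafka bullet above; the broker address, topic name, and log path are placeholders, and the kafka-python client is assumed.

# Minimal sketch: publish log lines to Kafka with acks="all" to avoid data loss.
from kafka import KafkaProducer   # kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],   # placeholder broker address
    acks="all",                             # wait for all in-sync replicas
    retries=5,
    value_serializer=lambda v: v.encode("utf-8"),
)

with open("/var/log/app/app.log") as log_file:   # placeholder log path
    for line in log_file:
        producer.send("app-logs", value=line.rstrip("\n"))

# Block until every buffered message has been acknowledged, then shut down cleanly.
producer.flush()
producer.close()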
EDUCATION
Regis University Denver, CO
Master of Science, Data Science, GPA: 3.80 Jan. 2019 – May 2020
University of Alabama at Birmingham Birmingham, AL
Bachelor of Arts, History, GPA: 3.54, Cum Laude, Phi Alpha Theta Aug. 2012 – Aug. 2013
Alabama Southern Community College Monroeville, AL
Associate of Arts, Liberal Arts, GPA: 3.70, Phi Theta Kappa Aug. 2010 – Dec. 2011