TECHNICAL SKILLS
APACHE
Apache Ant, Apache Flume, Apache Hadoop, Apache YARN, Apache Hive, Apache Kafka, Apache Maven, Apache Oozie, Apache Spark, Apache Tez, Apache ZooKeeper, Apache Impala, HDFS, MapR, MapReduce
SCRIPTING
Python, Linux shell scripting, R, Scala
OPERATING SYSTEMS
Unix/Linux, Windows 10
FILE FORMATS
Parquet, Avro, JSON, ORC, Text, CSV
DISTRIBUTIONS
Cloudera CDH 4/5, Hortonworks HDP 2.5/2.6
DATA PROCESSING (COMPUTE) ENGINES
Apache Spark, Spark Streaming, Apache Flink, Apache Storm
DATA VISUALIZATION TOOLS
Tableau, Power BI
DATABASE
PostgreSQL, MySQL, Apache Cassandra, Amazon Redshift, Amazon DynamoDB, Apache HBase, Apache Hive, MongoDB
SOFTWARE
Microsoft Word, Excel, Outlook, PowerPoint, LaTeX; technical documentation
WORK EXPERIENCE
Enhance IT, Big Data Engineer, June – Present
Atlanta, GA
Support, maintain, and document the Hadoop and MySQL data warehouse
Iterate and improve existing features in the pipeline as well as add new ones
Design, develop, document, and test new requirements in the data pipeline using Bash, Flume, HDFS, and Spark in the Hadoop ecosystem
Provide full operational support: analyze code to identify root causes of production issues, provide solutions or workarounds, and drive them to resolution
Participate in full development life cycle including requirements analysis, design, development, deployment, and operations support
Created and managed cloud VMs with the AWS EC2 command-line client and the AWS administration console.
Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data (a representative sketch appears after this list).
Added support for Amazon S3 and RDS to host static/media files and the database in the AWS cloud.
Used an Ansible Python script to generate inventory and push deployments to AWS instances.
Executed Hadoop/Spark jobs on AWS EMR against data stored in S3 buckets.
Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with storage in Amazon Simple Storage Service (S3) and AWS Redshift.
Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 (see the Lambda sketch after this list).
Populated database tables via Amazon Kinesis Data Firehose and AWS Redshift (a Firehose sketch appears after this list).
Automated installation of the ELK agent (Filebeat) with an Ansible playbook. Developed a Kafka queue system to collect log data without data loss and publish it to various sources (a producer sketch appears after this list).
Used AWS CloudFormation templates with Terraform and existing plugins.
Developed AWS CloudFormation templates to create the custom infrastructure for our pipeline.
Implemented AWS IAM user roles and policies to authenticate users and control access.
Specified nodes and performed data-analysis queries on Amazon Redshift clusters on AWS.
Processed multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and loaded the results into AWS Redshift.
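The sketch below is a minimal, self-contained example of the kind of Spark DataFrame analysis over Hive data referenced above; the database, table, and column names (sales_db, transactions, region, amount) are hypothetical placeholders, not production identifiers.

# Minimal PySpark sketch: read a Hive table with the DataFrame API and aggregate it.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-analytics-sketch")
    .enableHiveSupport()          # lets Spark resolve tables in the Hive metastore
    .getOrCreate()
)

# Load the (hypothetical) Hive table as a DataFrame.
txns = spark.table("sales_db.transactions")

# Total and average amount per region, largest regions first.
summary = (
    txns.groupBy("region")
        .agg(F.sum("amount").alias("total_amount"),
             F.avg("amount").alias("avg_amount"))
        .orderBy(F.desc("total_amount"))
)

summary.show(20, truncate=False)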
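The next sketch illustrates the Lambda bullet above with a handler reacting to S3 object-created events; the DynamoDB table name processed_files is assumed purely for illustration.

# Minimal AWS Lambda sketch: record each new S3 object in a DynamoDB table.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("processed_files")   # hypothetical table name

def handler(event, context):
    # An S3 notification event carries one record per affected object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Track the processed object for downstream auditing.
        table.put_item(Item={"object_key": key, "bucket": bucket})

    return {"statusCode": 200, "body": json.dumps("ok")}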
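The following is a hedged sketch of populating Redshift-backed tables through Kinesis Data Firehose, as in the Firehose bullet above; the delivery stream name events-to-redshift and the record fields are assumptions, and the stream itself is presumed to be configured to COPY into Redshift.

# Minimal sketch: push newline-delimited JSON records to a Firehose delivery stream.
import json
import boto3

firehose = boto3.client("firehose")

def put_event(event: dict) -> None:
    # Firehose buffers records and delivers them to Redshift (via S3 + COPY).
    firehose.put_record(
        DeliveryStreamName="events-to-redshift",   # hypothetical stream name
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )

put_event({"user_id": 42, "action": "login"})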
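Finally, a small sketch of a loss-averse Kafka producer for log collection, matching the Kafka bullet above; the broker address, topic name, and log path are placeholders, and the kafka-python client is assumed.

# Minimal sketch: publish log lines to Kafka with acks="all" to avoid data loss.
from kafka import KafkaProducer   # kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],   # placeholder broker address
    acks="all",                             # wait for all in-sync replicas
    retries=5,
    value_serializer=lambda v: v.encode("utf-8"),
)

with open("/var/log/app/app.log") as log_file:   # placeholder log path
    for line in log_file:
        producer.send("app-logs", value=line.rstrip("\n"))

# Block until every buffered message has been acknowledged, then shut down cleanly.
producer.flush()
producer.close()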
EDUCATION
Regis University Denver, CO
Master of Science, Data Science, GPA: 3.80 Jan. 2019 – May 2020
University of Alabama at Birmingham Birmingham, AL
Bachelor of Arts, History, GPA: 3.54, Cum Laude, Phi Alpha Theta Aug. 2012 – Aug. 2013
Alabama Southern Community College Monroeville, AL
Associate of Arts, Liberal Arts, GPA: 3.70, Phi Theta Kappa Aug. 2010 – Dec. 2011