HALLELUJAH LEMMA
*****.************@*****.*** 510-***-****
PROFESSIONAL SUMMARY
Big Data Engineer with 11+ years of IT experience, including 5 years in Big Data technologies. Expertise in Hadoop/Spark using cloud services, automation tools, and software design processes. Outstanding communication skills; dedicated to maintaining up-to-date IT skills.
● 11+ years of extensive experience in IT, including 5 years in Big Data and cloud-related technologies.
● Good understanding of the installation, configuration, management, and deployment of Big Data solutions and the underlying infrastructure of Hadoop clusters and cloud services.
● Hands-on experience with MapReduce programming and the HDFS framework, including Hive, Spark, Pig, Oozie, Sqoop, and Podium.
● In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
● Experience with YARN architecture and its components, such as NodeManager, ApplicationMaster, and ResourceManager.
● Set up standards and processes for Hadoop-based application design and implementation.
● Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java; extended Hive and Pig core functionality with custom UDFs.
● Experience in designing, developing, and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
● Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
● Responsible for building scalable distributed data solutions using Hadoop.
● Hands-on experience in application development using C#/.NET, RDBMS, Linux shell scripting, and Windows command scripting.
● Experienced in coding SQL, stored procedures/functions, triggers, and packages on RDBMS platforms.
● Developed and deployed SSIS packages and ETL operations using SQL stored procedures and queries.
● Strong written, oral, interpersonal, and presentation communication skills.
● Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
SKILLS
IDE: Jupyter Notebooks (formerly IPython Notebooks), Eclipse, IntelliJ, PyCharm
PROJECT METHODS: Agile, Kanban, Scrum, DevOps, Continuous Integration, Test-Driven Development
HADOOP DISTRIBUTIONS: Hadoop, Cloudera Hadoop, Hortonworks Hadoop
CLOUD PLATFORMS: Amazon AWS - EC2, SQS, S3, MapR, Elastic Cloud
CLOUD SERVICES: Databricks
CLOUD DATABASE & TOOLS: Redshift, DynamoDB, Cassandra, Apache HBase, SQL
PROGRAMMING LANGUAGES & FRAMEWORKS: Spark, Spark Streaming, Java, Python, Scala, PySpark, PyTorch
SCRIPTING: Hive, MapReduce, SQL, Spark SQL, Shell Scripting
CONTINUOUS INTEGRATION (CI-CD): Jenkins
VERSIONING: Git, GitHub
PROGRAMMING METHODOLOGIES: Object-Oriented Programming, Functional Programming
FILE FORMAT AND COMPRESSION: CSV, JSON, Avro, Parquet, ORC
FILE SYSTEMS: HDFS
ETL TOOLS: Apache Nifi, Flume, Kafka, Talend, Pentaho, Sqoop
DATA VISUALIZATION TOOLS: Tableau, Kibana
SEARCH TOOLS: Elasticsearch
SECURITY: Kerberos, Ranger
AWS: AWS Lambda, AWS S3, AWS RDS, AWS EMR, AWS Redshift, AWS Kinesis, AWS ELK, AWS Cloud Formation, AWS IAM
DATA QUERY: Spark SQL, DataFrames
WORK HISTORY
Sep 2018 - Current
DATA DEVELOPER
Asics - Oakdale, PA
• Ingested data into AWS S3 data lake from various devices using AWS Kinesis.
• Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and AWS Redshift.
• Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 (see the sketch after this section).
• Used AWS CloudFormation templates to create custom infrastructure for our pipelines.
• Implemented AWS IAM user roles and policies to authenticate and control access.
• Specified nodes and performed data analysis queries on Amazon Redshift clusters and via AWS Athena.
• Processed multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and loaded results into AWS Redshift.
• Assisted in designing, building, and maintaining a database to analyze the life cycle of checking transactions.
• Designed and developed ETL jobs to extract data from AWS S3 and load it into a data mart in Amazon Redshift.
• Implemented and maintained an EMR-to-Redshift pipeline for data warehousing.
• Used the Spark engine and Spark SQL for data analysis, and provided intermediate results to data scientists for transactional and predictive analytics.
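A minimal, illustrative sketch of the event-driven Lambda pattern described above, written in Python with boto3. The bucket trigger, the "ingest-audit" table name, and the recorded fields are hypothetical placeholders, not details from the actual project.

```python
# Sketch of an S3-triggered AWS Lambda handler (Python 3 / boto3).
# Bucket and table names are illustrative placeholders.
import json
import urllib.parse

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ingest-audit")  # hypothetical audit table


def lambda_handler(event, context):
    """Record each object landing in the S3 data lake into DynamoDB."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)

        # Write a small audit row so downstream jobs can track new arrivals.
        table.put_item(Item={"object_key": key, "bucket": bucket, "size_bytes": size})

    return {"statusCode": 200, "body": json.dumps("processed")}
```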
Jan 2017 - Sep 2018
DATA ENGINEER
DELTA - ATLANTA, GA
• Processed multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and loaded results into AWS Redshift.
• Wrote scripts to automate workflow execution, loading, and parsing of data into different database systems.
• Deployed Hadoop to create data pipelines involving HDFS.
• Managed multiple Hadoop clusters to support data warehousing.
• Created and maintained Hadoop HDFS, Spark, HBase pipelines.
• Supported data science team with data wrangling, data modeling tasks.
• Built a continuous Spark Streaming ETL pipeline with Kafka, Spark, Scala, HDFS, and MongoDB.
• Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
• Worked on importing unstructured data into HDFS using Spark Streaming and Kafka.
• Worked on installing clusters, commissioning and decommissioning data nodes, configuring slots, and ensuring NameNode availability with regard to capacity planning and management.
• Implemented Spark using Scala and Spark SQL for optimized analysis and processing of data.
• Configured Spark Streaming to process real-time data via Kafka and store it to HDFS using Scala (see the sketch after this section).
• Loaded data from multiple servers to AWS S3 bucket and configured bucket permissions.
• Used Apache Kafka to transform live streaming with batch processing to generate reports.
• Implemented partitioning, dynamic partitions, and buckets in Hive for performance optimization and organized data into logical groupings.
• Maintained and performed Hadoop log file analysis of errors and access statistics for fine-tuning.
• Optimized data storage in Kafka Brokers within the Kafka cluster by partitioning Kafka Topics.
• Used Impala for faster data analysis on HDFS data over Hive when fault tolerance was not required, and data was in text format.
• Responsible for analysis, design, coding, unit and regression testing, and implementation of new requirements.
• Prepared Visio diagrams for process and execution flow, design documentation, and unit test cases.
• Built a generic tool to import data from flat files and from various other database engines using Podium.
• Developed and automated data movement across clusters using DistCp for daily operations, with reporting in the Hue UI.
• Created Sqoop jobs and commands to pull data from databases, and manipulated the data using Hive queries.
• Developed data ingestion code using Sqoop and performed the ETL and processing phases using Apache Hive and Spark (PySpark and Spark SQL scripting).
• Performed orchestration using Apache Oozie and Autosys jobs and coordinators to schedule and run workflows and packages containing Shell, Sqoop, Spark, Hive, and Email actions.
• Built a generic tool for data validation and automated reporting.
• Performed iterative deployments through Jenkins and maintained code in a Git repository.
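A minimal sketch of the Kafka-to-HDFS streaming pattern referenced above. The production pipeline used Scala; this PySpark version shows the same flow, and the broker address, topic name, and HDFS paths are illustrative placeholders.

```python
# Kafka -> Spark Structured Streaming -> HDFS (Parquet) sketch.
# Requires the spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("kafka-to-hdfs")
    .getOrCreate()
)

# Read the live event stream from Kafka (placeholder broker and topic).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "transactions")
    .option("startingOffsets", "latest")
    .load()
    .select(col("key").cast("string"), col("value").cast("string"), col("timestamp"))
)

# Land the stream on HDFS as Parquet, with checkpointing for fault tolerance.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/raw/transactions")
    .option("checkpointLocation", "hdfs:///checkpoints/transactions")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```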
Oct 2015 - Jan 2017
BIG DATA DEVELOPER
The NIELSON COMPANY - New York, NY
• Ingested data from multiple sources into HDFS data lake.
• Responsible for analysis, design, coding, unit and regression testing, and implementation of new requirements.
• Prepared Visio diagrams for process and execution flow, design documentation, and unit test cases.
• Built a generic tool to import data from flat files and from various other database engines.
• Scripted the Teradata-to-Hadoop connector (TDCH) to pull data from the Teradata source.
• Created Sqoop jobs and commands to pull data from databases, and manipulated the data using Hive queries.
• Developed data ingestion code using Sqoop and performed the ETL and processing phases using Apache Hive and Spark (PySpark and Spark SQL scripting; see the sketch after this section).
• Performed orchestration using Apache Oozie jobs and coordinators to schedule and run workflows and packages containing Shell, Sqoop, Spark, Hive, and Email actions.
• Developed statistics and visualization programs in R (graphs and plots) and loaded the statistics into an RDBMS.
• Built a generic tool for data validation and automated reporting.
• Managed onsite and offshore resources by assigning tasks using Agile methodology, and gathered updates through Scrum calls and sprint meetings.
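A minimal sketch of the Sqoop-landing-then-Hive-processing step referenced above, in PySpark with Spark SQL. The landing path, database, table, and column names are hypothetical placeholders, not the actual project schema.

```python
# Sqoop lands relational extracts on HDFS; PySpark/Spark SQL then transforms
# them into a partitioned Hive table for downstream querying.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sqoop-landing-to-hive")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the Sqoop-imported extract from HDFS (placeholder path; Sqoop import
# into this directory happens upstream of this job).
orders = spark.read.parquet("hdfs:///landing/orders")
orders.createOrReplaceTempView("orders_raw")

# Basic ETL: keep the needed columns and derive a partition column.
cleaned = spark.sql("""
    SELECT order_id, customer_id, amount, to_date(order_ts) AS order_date
    FROM orders_raw
""")

# Write a partitioned Hive table (placeholder database/table name).
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")
    .format("parquet")
    .saveAsTable("analytics.orders")
)
```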
May 2013 - Oct 2015
HADOOP DEVELOPER
United Airlines - Chicago, IL
• Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
• Loaded customer profile, customer spending, and credit data from legacy warehouses onto HDFS using Sqoop.
• Built data pipelines using Pig and Hadoop commands to store data onto HDFS.
• Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
• Ran the processing phase using Hive, Spark, Pig, and R, storing results onto HDFS.
• Built graphs and plots in R for statistics, and visualizations using MATLAB and Plotly.
• Performed analytics and produced recommendations for the business using Hive and Pig scripts through ETL processing.
• Migrated the ETL code base from Hive and Pig to PySpark (see the sketch after this section).
• Developed and deployed spark-submit commands with suitable executor, core, and driver settings on the cluster.
• Applied transformations and filtering to traffic data using Pig.
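A minimal sketch of the Hive/Pig-to-PySpark migration pattern mentioned above: a typical filter-and-group step rewritten with the DataFrame API. The table and column names are hypothetical placeholders.

```python
# A GROUP BY / FILTER step, of the kind previously expressed in Pig or HiveQL,
# rewritten with the PySpark DataFrame API.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("pig-to-pyspark")
    .enableHiveSupport()
    .getOrCreate()
)

# Placeholder Hive table of customer spending records.
spend = spark.table("warehouse.customer_spend")

summary = (
    spend
    .filter(F.col("amount") > 0)              # drop zero/negative records
    .groupBy("customer_id")
    .agg(
        F.sum("amount").alias("total_spend"),
        F.count("*").alias("txn_count"),
    )
)

# Persist the aggregate back to Hive for reporting.
summary.write.mode("overwrite").saveAsTable("warehouse.customer_spend_summary")
```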
Mar 2009 - Apr 2013
SHAREPOINT ADMINISTRATOR
Sutherland Global Services - Rochester, NY
• Worked with SharePoint Server 2010, SharePoint Designer, InfoPath Designer, and MS Office.
• Designed and established SharePoint applications.
• Created web parts, workflows, and content types.
• Developed InfoPath forms allowing programmatic submission to a SharePoint form library and initiating workflow processes.
• Created and managed sites, site collections, lists, form templates, documents, and form libraries.
• Created and managed web applications.
• Created and managed dynamic web parts.
• Customized library attributes and handled import/export of existing data and data connections.
• Provided workflows and initiated workflow processes.
• Worked in SharePoint Designer and InfoPath Designer and developed workflows and forms.
EDUCATION
2006
Bachelor's Degree: Computer Science
Mekelle University
CERTIFICATIONS
Microsoft Certified Database Administrator: MS SQL Server Database Administration