HALLELUJAH LEMMA
*****.************@*****.*** 510-***-****
PROFESSIONAL SUMMARY
Big Data Engineer with 11+ years of IT experience, including 5 years in Big Data technologies. Expertise in Hadoop/Spark using cloud services, automation tools, and software design processes. Outstanding communication skills; dedicated to maintaining up-to-date IT skills.
● 11+ years of extensive experience in IT, including 5 years in Big Data and cloud-related technologies.
● Good understanding of the installation, configuration, management, and deployment of Big Data solutions and the underlying infrastructure of Hadoop clusters and cloud services.
● Hands-on experience with MapReduce programming and the HDFS framework, including Hive, Spark, Pig, Oozie, Sqoop, and Podium.
● In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
● Experience with YARN architecture and its components, such as NodeManager, ApplicationMaster, and ResourceManager.
● Set up standards and processes for Hadoop-based application design and implementation.
● Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java; extended Hive and Pig core functionality with custom UDFs.
● Experience in designing, developing, and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
● Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
● Responsible for building scalable distributed data solutions using Hadoop.
● Hands-on experience in application development using C#/.NET, RDBMS, Linux shell scripting, and Windows command scripting.
● Experienced in coding SQL, stored procedures/functions, triggers, and packages on RDBMS platforms.
● Developed and deployed SSIS packages and ETL operations using SQL stored procedures and queries.
● Strong written, oral, interpersonal, and presentation communication skills.
● Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
SKILLS
IDE: Jupyter Notebooks (formerly IPython Notebooks), Eclipse, IntelliJ, PyCharm
PROJECT METHODS: Agile, Kanban, Scrum, DevOps, Continuous Integration, Test-Driven Development
HADOOP DISTRIBUTIONS: Hadoop, Cloudera Hadoop, Hortonworks Hadoop
CLOUD PLATFORMS: Amazon AWS - EC2, SQS, S3, MapR, Elastic Cloud
CLOUD SERVICES: Databricks
CLOUD DATABASE & TOOLS: Redshift, DynamoDB, Cassandra, Apache HBase, SQL
PROGRAMMING LANGUAGES & FRAMEWORKS: Spark, Spark Streaming, Java, Python, Scala, PySpark, PyTorch
SCRIPTING: Hive, MapReduce, SQL, Spark SQL, Shell Scripting
CONTINUOUS INTEGRATION (CI-CD): Jenkins
VERSIONING: Git, GitHub
PROGRAMMING METHODOLOGIES: Object-Oriented Programming, Functional Programming
FILE FORMAT AND COMPRESSION: CSV, JSON, Avro, Parquet, ORC
FILE SYSTEMS: HDFS
ETL TOOLS: Apache Nifi, Flume, Kafka, Talend, Pentaho, Sqoop
DATA VISUALIZATION TOOLS: Tableau, Kibana
SEARCH TOOLS: Elasticsearch
SECURITY: Kerberos, Ranger
AWS: AWS Lambda, AWS S3, AWS RDS, AWS EMR, AWS Redshift, AWS Kinesis, AWS ELK, AWS Cloud Formation, AWS IAM
DATA QUERY: Spark SQL, DataFrames
WORK HISTORY
Sep 2018 - Current
DATA DEVELOPER
Asics - Oakdale, PA
• Ingested data into AWS S3 data lake from various devices using AWS Kinesis.
• Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and AWS Redshift.
• Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 (see the sketch after this section).
• Used AWS CloudFormation templates to create custom infrastructure for our pipelines.
• Implemented AWS IAM user roles and policies to authenticate and control access.
• Specified nodes and performed data analysis queries on Amazon Redshift clusters and via AWS Athena.
• Processed multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and loaded results into AWS Redshift.
• Assisted in designing, building, and maintaining a database to analyze the life cycle of checking transactions.
• Designed and developed ETL jobs to extract data from AWS S3 and load it into a data mart in Amazon Redshift.
• Implemented and maintained an EMR-to-Redshift pipeline for data warehousing.
• Used the Spark engine and Spark SQL for data analysis, and provided intermediate results to data scientists for transactional and predictive analytics.
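A minimal, illustrative sketch of the event-driven Lambda pattern described above, written in Python with boto3. The bucket trigger, the "ingest-audit" table name, and the recorded fields are hypothetical placeholders, not details from the actual project.

```python
# Sketch of an S3-triggered AWS Lambda handler (Python 3 / boto3).
# Bucket and table names are illustrative placeholders.
import json
import urllib.parse

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ingest-audit")  # hypothetical audit table


def lambda_handler(event, context):
    """Record each object landing in the S3 data lake into DynamoDB."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)

        # Write a small audit row so downstream jobs can track new arrivals.
        table.put_item(Item={"object_key": key, "bucket": bucket, "size_bytes": size})

    return {"statusCode": 200, "body": json.dumps("processed")}
```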
Jan 2017 - Sep 2018
DATA ENGINEER
DELTA - ATLANTA, GA
• Processed multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and loaded results into AWS Redshift.
• Wrote scripts to automate workflow execution, loading, and parsing of data into different database systems.
• Deployed Hadoop to create data pipelines involving HDFS.
• Managed multiple Hadoop clusters to support data warehousing.
• Created and maintained Hadoop HDFS, Spark, HBase pipelines.
• Supported data science team with data wrangling, data modeling tasks.
• Built a continuous Spark Streaming ETL pipeline with Kafka, Spark, Scala, HDFS, and MongoDB.
• Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
• Worked on importing unstructured data into HDFS using Spark Streaming and Kafka.
• Worked on installing clusters, commissioning and decommissioning data nodes, configuring slots, and ensuring NameNode availability with regard to capacity planning and management.
• Implemented Spark using Scala and Spark SQL for optimized analysis and processing of data.
• Configured Spark Streaming to process real-time data via Kafka and store it to HDFS using Scala (see the sketch after this section).
• Loaded data from multiple servers to AWS S3 bucket and configured bucket permissions.
• Used Apache Kafka to transform live streaming with batch processing to generate reports.
• Implemented partitioning, dynamic partitions, and buckets in Hive for performance optimization and organized data into logical groupings.
• Maintained and performed Hadoop log file analysis of errors and access statistics for fine-tuning.
• Optimized data storage in Kafka Brokers within the Kafka cluster by partitioning Kafka Topics.
• Used Impala for faster data analysis on HDFS data over Hive when fault tolerance was not required, and data was in text format.
• Responsible for analysis, design, coding, unit and regression testing, and implementation of new requirements.
• Prepared Visio diagrams for process and execution flow, design documentation, and unit test cases.
• Built a generic tool to import data from flat files and from various other database engines using Podium.
• Developed and automated data movement across clusters using DistCp for daily operations, with reporting in the Hue UI.
• Created Sqoop jobs and commands to pull data from databases, and manipulated the data using Hive queries.
• Developed data ingestion code using Sqoop and performed the ETL and processing phases using Apache Hive and Spark (PySpark and Spark SQL scripting).
• Performed orchestration using Apache Oozie and Autosys jobs and coordinators to schedule and run workflows and packages containing Shell, Sqoop, Spark, Hive, and Email actions.
• Built a generic tool for data validation and automated reporting.
• Performed iterative deployments through Jenkins and maintained code in a Git repository.
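A minimal sketch of the Kafka-to-HDFS streaming pattern referenced above. The production pipeline used Scala; this PySpark version shows the same flow, and the broker address, topic name, and HDFS paths are illustrative placeholders.

```python
# Kafka -> Spark Structured Streaming -> HDFS (Parquet) sketch.
# Requires the spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("kafka-to-hdfs")
    .getOrCreate()
)

# Read the live event stream from Kafka (placeholder broker and topic).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "transactions")
    .option("startingOffsets", "latest")
    .load()
    .select(col("key").cast("string"), col("value").cast("string"), col("timestamp"))
)

# Land the stream on HDFS as Parquet, with checkpointing for fault tolerance.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/raw/transactions")
    .option("checkpointLocation", "hdfs:///checkpoints/transactions")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```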
Oct 2015 - Jan 2017
BIG DATA DEVELOPER
The NIELSON COMPANY - New York, NY
• Ingested data from multiple sources into HDFS data lake.
• Responsible for analysis, design, coding, unit and regression testing, and implementation of new requirements.
• Prepared Visio diagrams for process and execution flow, design documentation, and unit test cases.
• Built a generic tool to import data from flat files and from various other database engines.
• Scripted the Teradata-to-Hadoop connector (TDCH) to pull data from the Teradata source.
• Created Sqoop jobs and commands to pull data from databases, and manipulated the data using Hive queries.
• Developed data ingestion code using Sqoop and performed the ETL and processing phases using Apache Hive and Spark (PySpark and Spark SQL scripting; see the sketch after this section).
• Performed orchestration using Apache Oozie jobs and coordinators to schedule and run workflows and packages containing Shell, Sqoop, Spark, Hive, and Email actions.
• Developed statistics and visualization programs in R (graphs and plots) and loaded the statistics into an RDBMS.
• Built a generic tool for data validation and automated reporting.
• Managed onsite and offshore resources by assigning tasks using Agile methodology, and gathered updates through Scrum calls and sprint meetings.
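A minimal sketch of the Sqoop-landing-then-Hive-processing step referenced above, in PySpark with Spark SQL. The landing path, database, table, and column names are hypothetical placeholders, not the actual project schema.

```python
# Sqoop lands relational extracts on HDFS; PySpark/Spark SQL then transforms
# them into a partitioned Hive table for downstream querying.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sqoop-landing-to-hive")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the Sqoop-imported extract from HDFS (placeholder path; Sqoop import
# into this directory happens upstream of this job).
orders = spark.read.parquet("hdfs:///landing/orders")
orders.createOrReplaceTempView("orders_raw")

# Basic ETL: keep the needed columns and derive a partition column.
cleaned = spark.sql("""
    SELECT order_id, customer_id, amount, to_date(order_ts) AS order_date
    FROM orders_raw
""")

# Write a partitioned Hive table (placeholder database/table name).
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")
    .format("parquet")
    .saveAsTable("analytics.orders")
)
```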
May 2013 - Oct 2015
HADOOP DEVELOPER
United Airlines - Chicago, IL
• Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
• Loaded customer profile, customer spending, and credit data from legacy warehouses onto HDFS using Sqoop.
• Built data pipelines using Pig and Hadoop commands to store data onto HDFS.
• Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
• Ran the processing phase using Hive, Spark, Pig, and R, storing results onto HDFS.
• Built graphs and plots in R for statistics, and visualizations using MATLAB and Plotly.
• Performed analytics and produced recommendations for the business using Hive and Pig scripts through ETL processing.
• Migrated the ETL code base from Hive and Pig to PySpark (see the sketch after this section).
• Developed and deployed spark-submit commands with suitable executor, core, and driver settings on the cluster.
• Applied transformations and filtering to traffic data using Pig.
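A minimal sketch of the Hive/Pig-to-PySpark migration pattern mentioned above: a typical filter-and-group step rewritten with the DataFrame API. The table and column names are hypothetical placeholders.

```python
# A GROUP BY / FILTER step, of the kind previously expressed in Pig or HiveQL,
# rewritten with the PySpark DataFrame API.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("pig-to-pyspark")
    .enableHiveSupport()
    .getOrCreate()
)

# Placeholder Hive table of customer spending records.
spend = spark.table("warehouse.customer_spend")

summary = (
    spend
    .filter(F.col("amount") > 0)              # drop zero/negative records
    .groupBy("customer_id")
    .agg(
        F.sum("amount").alias("total_spend"),
        F.count("*").alias("txn_count"),
    )
)

# Persist the aggregate back to Hive for reporting.
summary.write.mode("overwrite").saveAsTable("warehouse.customer_spend_summary")
```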
Mar 2009 - Apr 2013
SHAREPOINT ADMINISTRATOR
Sutherland Global Services - Rochester, NY
• Worked with SharePoint Server 2010, SharePoint Designer, InfoPath Designer, and MS Office.
• Designed and established SharePoint applications.
• Created web parts, workflows, and content types.
• Developed InfoPath forms allowing programmatic submission to a SharePoint form library and initiating workflow processes.
• Created and managed sites, site collections, lists, form templates, documents, and form libraries.
• Created and managed web applications.
• Created and managed dynamic web parts.
• Customized library attributes and handled import/export of existing data and data connections.
• Provided workflows and initiated workflow processes.
• Worked in SharePoint Designer and InfoPath Designer and developed workflows and forms.
EDUCATION
2006
Bachelor's Degree: Computer Science
Mekelle University
CERTIFICATIONS
Microsoft Certified Database Administrator: MS SQL Server Database Administration