
Deward Seneh

Data Engineer

Washington, VA, 22747

719-***-****

***********.**@*****.***

Data Engineer with 5+ years of experience designing and executing solutions for complex business problems involving large-scale data warehousing, real-time analytics, and reporting. Extensive experience using cloud services with big data tools to build pipelines.

Skills

Hadoop Distributions

Hadoop, Cloudera Hadoop, Hortonworks Hadoop

Cloud Services

Databricks, AWS

Operating Systems

Windows, Linux, and Unix.

IDEs/Development Tools

Jupyter Notebook, Eclipse, CA API Gateway, VS Code, PyCharm, IntelliJ

Languages

Python, Scala, Spark, Spark Streaming, Java, PL/SQL, C#, JavaScript

Scripting Languages

HiveQL, JavaScript, jQuery

Databases

SQL Server, Oracle 9i/10g, MySQL, PostgreSQL

Technologies

API Gateway, HTML5/CSS, Servlets, JSP, XML, JSON, CSV, JSTL, SOAP, Web Services, WSDL, React

Work History

AWS Data Engineer

Liberty Media, Englewood, Colorado

06/2019 - Current

Designed a complete automation system for the data pipeline, from source to storage, implementing schedulers for automation.

Sourced data from data stores and external sources using the API calls each data source provides.

Created an ingestion phase to fetch data from the source and stage it temporarily in a Kafka broker, creating an Apache Kafka topic with the number of partitions specified in the task.

Using Apache Spark, streamed data from the Kafka topic.

Loaded the streamed data as JSON and removed unwanted JSON properties.

Converted the data to a DataFrame and cast its columns to the data types required for analysis and to match the target storage schema.

Saved the DataFrame output to MySQL as requested (see the pipeline sketch after this section).

Wrote Python scripts to manage AWS resources through API calls using the Boto3 SDK, and also worked with the AWS CLI.

Familiar with EC2, CloudWatch, CloudFormation, and managing security groups on AWS.

Experienced in automating, configuring, and deploying instances across AWS, Azure, and data-center environments; monitored logs to better understand system behavior.

Monitored resources such as Amazon database services, CPU and memory utilization, and EBS volumes.

Used Amazon Glacier for archiving data.

Technical knowledge of EC2, IAM, S3, and VPC.

Formatted the API response into a DataFrame using a schema containing country code, artist name, number of plays, and genre.

Wrote producer/consumer scripts in Python to process the JSON responses.
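A minimal sketch of the pipeline described above, assuming PySpark Structured Streaming with the Kafka source; the broker address, topic name, schema field names, and MySQL connection details are placeholders rather than details from the engagement:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # Schema for the fields described above (names are illustrative).
    schema = StructType([
        StructField("country_code", StringType()),
        StructField("artist_name", StringType()),
        StructField("number_of_plays", IntegerType()),
        StructField("genre", StringType()),
    ])

    spark = SparkSession.builder.appName("kafka-to-mysql").getOrCreate()

    # Stream the Kafka topic.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "plays")
           .load())

    # Kafka values arrive as bytes: cast to string, parse the JSON, and keep
    # only the wanted fields (unwanted properties are dropped by not selecting them).
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("d"))
              .select("d.*"))

    # Write each micro-batch to MySQL over JDBC.
    def write_batch(batch_df, batch_id):
        (batch_df.write
         .format("jdbc")
         .option("url", "jdbc:mysql://localhost:3306/analytics")
         .option("dbtable", "plays_by_country")
         .option("user", "etl")
         .option("password", "***")
         .mode("append")
         .save())

    parsed.writeStream.foreachBatch(write_batch).start().awaitTermination()

Similarly, a small sketch of managing AWS resources with the Boto3 SDK; the region and the instance filter are illustrative assumptions:

    import boto3

    # List running EC2 instances in one region.
    ec2 = boto3.client("ec2", region_name="us-east-1")
    reservations = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]
    for r in reservations:
        for instance in r["Instances"]:
            print(instance["InstanceId"], instance["InstanceType"])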

Hadoop Engineer

Travelers Insurance, Hartford, CT

01/2017 - 06/2019

Wrote shell scripts for automating the process of data loading

Performed streaming data ingestion to the Spark distribution environment, using Kafka.

Proficient in writing complex queries against the Apache Hive API on the Hortonworks Sandbox.


Developed distributed query agents for performing distributed queries against shards

Experience with hands-on data analysis and performing under pressure.

Extensive experience writing queries, stored procedures, functions, and triggers in SQL; supported development, testing, and operations teams during new system deployments.

Implemented Spark jobs in Scala, using DataFrames and the Spark SQL API for faster data processing.

Created Hive queries to summarize and aggregate business data, comparing Hadoop data with historical metrics.

Worked closely with stakeholders and data scientists/data analysts to gather requirements and create an engineering project plan.

Performed ETL into the Hadoop file system (HDFS) and wrote Hive UDFs.

Used Hive for queries and ran incremental imports with Spark jobs.

Developed JDBC/ODBC connectors between Hive and Spark to transfer newly populated data frames.

Wrote Hive queries to analyze data in the Hive warehouse using Hive Query Language (see the sketch after this section).
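A minimal sketch of the Hive-on-Spark aggregation pattern described above, assuming a Spark session with Hive support; the table and column names are invented for illustration:

    from pyspark.sql import SparkSession

    # Hive support lets spark.sql() run against tables in the Hive warehouse.
    spark = (SparkSession.builder
             .appName("hive-aggregation")
             .enableHiveSupport()
             .getOrCreate())

    # Summarize current data and compare it with a historical-metrics table.
    summary = spark.sql("""
        SELECT c.region,
               SUM(c.claim_amount)      AS total_claims,
               AVG(h.monthly_claim_avg) AS historical_avg
        FROM   claims c
        JOIN   historical_metrics h ON c.region = h.region
        GROUP  BY c.region
    """)
    summary.show()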

Big Data Engineer

NCR, Atlanta, GA

11/2015 - 01/2017

Ingested data from an online REST API in real time.

Created a Kafka topic and sourced data into it.

Using PySpark, created a producer and ingested real-time data into the Kafka topic.

Created a PySpark consumer to subscribe to the Kafka topic to obtain data in real-time.

Created Spark Streaming jobs for data processing.

Integrated Hive, HDFS, and Spark for data storage.

Created a table in Hive based on the desired output from Spark Streaming.

Conducted data processing, extracted desired fields, and implemented Spark SQL aggregation queries to have data transformed as needed.

Saved the output to a Hive table in HDFS.

Created Hive external tables, loaded data into them, and queried the data using HQL (see the sketch after this section).

Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.

Developed Spark code using Scala and Spark-SQL for faster testing and data processing.

Leveraged the Hadoop ecosystem and cloud platforms to design and develop solutions using Kafka, Spark, Hive, and related big data technologies.

Worked with the Spark SQL context to create data frames that filter input data for model execution.
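A short sketch of the external-table flow described above: declare a Hive external table over files already in HDFS, then save a Spark SQL aggregation back to a Hive table. The HDFS path, table names, and columns are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-external-tables")
             .enableHiveSupport()
             .getOrCreate())

    # External table over raw files already sitting in HDFS.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS transactions (
            store_id INT,
            amount   DOUBLE,
            ts       STRING
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION 'hdfs:///data/raw/transactions'
    """)

    # Spark SQL aggregation whose output lands in a managed Hive table in HDFS.
    (spark.sql("""
        SELECT store_id, SUM(amount) AS daily_total
        FROM   transactions
        GROUP  BY store_id
    """).write.mode("overwrite").saveAsTable("transactions_daily"))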

Data Engineer

Intuit, Inc., Menlo Park, CA

09/2014 - 11/2015

Created a Kafka topic with one broker and one partition.

Created a producer to pull data from Twitter and store it in the Kafka topic.

Created a PySpark consumer to get data from the Kafka topic.

Created a Spark Streaming job with Spark processing to extract the desired fields from the data.

Imported real-time logs to HDFS using Flume.

Designed an appropriate partitioning/bucketing schema to allow faster data retrieval during analysis with Spark (see the sketch after this section).

Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.

Used the Spark SQL module to store data in HDFS.

Used Spark DataFrame API over Hortonworks platform to perform analytics on data.

Proficient in writing complex queries to the Apache Hive API on Hortonworks Sandbox

Collaborated with various teams & management to understand a set of requirements & design a complete system

Implemented a complete big data pipeline with micro-batches.
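A minimal sketch of the partitioning/bucketing idea above, assuming a Spark session with Hive support; the input path and column names are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("partitioned-write")
             .enableHiveSupport()
             .getOrCreate())

    df = spark.read.json("hdfs:///data/raw/events")  # illustrative input

    # Partition by a low-cardinality column and bucket by a high-cardinality
    # one so analytical queries can prune partitions and skip full scans.
    (df.write
     .partitionBy("event_date")
     .bucketBy(8, "user_id")
     .sortBy("user_id")
     .mode("overwrite")
     .saveAsTable("events_partitioned"))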

Education

Bachelor of Science: Information Technology

Amity University, New Delhi, India



