Deward Seneh
Data Engineer
***********.**@*****.***
Data Engineer with 5+ years of experience designing and executing solutions for complex business problems involving large-scale data warehousing, real-time analytics, and reporting. Extensive experience using cloud services and big data tools to build data pipelines.
Skills
Hadoop Distributions
Hadoop, Cloudera Hadoop, Hortonworks Hadoop
Cloud Services
DataBricks, AWS
Operating Systems
Windows, Linux, and Unix.
IDE/Development Tools
Jupyter Notebook, Eclipse, CA API Gateway, VS Code, PyCharm, IntelliJ
Languages
Python, Scala, Spark, Spark Streaming, Java, PL/SQL, C#, JavaScript
Scripting Languages
HiveQL, JavaScript, jQuery
Databases
SQL Server, Oracle 9i/10g, MySQL, PostgreSQL
Technologies
API Gateway, HTML5/CSS, Servlets, JSP, XML, JSON, CSV, JSTL, SOAP, Web Services, WSDL, React
Work History
AWS Data Engineer
Liberty Media, Englewood, Colorado
Designed a complete automation system for the data pipeline, from source to storage, implementing schedulers for automation.
Sourced data from data storage or external sources using API calls provided by the data source.
Created an ingestion phase to fetch data from the source and store it temporarily in a Kafka broker (coordinated by ZooKeeper) by creating an Apache Kafka topic with the number of partitions specified in the task.
Using Apache Spark, streamed data from the Kafka topic.
Loaded the streamed data as JSON and removed unwanted JSON objects (properties).
Converted the data to a DataFrame and transformed DataFrame columns to the data types required for analysis and to match the storage schema.
Saved the DataFrame output to MySQL as requested.
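The parse-drop-cast step described above can be sketched in plain Python; the field names, types, and unwanted properties here are hypothetical stand-ins for the task's actual schema:

```python
import json

# Hypothetical storage schema and unwanted properties (illustrative only).
SCHEMA = {"user_id": int, "event_time": str, "amount": float}
UNWANTED = {"_metadata", "raw_headers"}

def transform(record: str) -> dict:
    """Parse one streamed JSON record, drop unwanted properties,
    and cast the remaining fields to the storage schema's types."""
    obj = json.loads(record)
    for key in UNWANTED:
        obj.pop(key, None)
    return {field: cast(obj[field]) for field, cast in SCHEMA.items()}

row = transform('{"user_id": "42", "event_time": "2019-06-01", '
                '"amount": "3.50", "_metadata": {"topic": "t"}}')
```

In the real pipeline the same casting was expressed on Spark DataFrame columns before the JDBC write to MySQL.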
Wrote Python scripts to manage AWS resources via API calls using the Boto3 SDK, and also worked with the AWS CLI.
Familiar with EC2, CloudWatch, CloudFormation, and managing security groups on AWS.
Experienced in automating, configuring, and deploying instances in AWS, Azure, and data-center environments; monitored logs to better understand system behavior.
Monitored resources such as Amazon database services, CPU and memory usage, and EBS volumes.
Practical experience using Amazon Glacier to archive data.
Technical knowledge on EC2, IAM, S3, VPC.
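A minimal sketch of the CloudWatch monitoring described above: building the parameters for a `get_metric_statistics` call to fetch average EC2 CPU utilization (the instance ID is hypothetical, and the actual boto3 call is shown in a comment since it needs live AWS credentials):

```python
from datetime import datetime, timedelta, timezone

def cloudwatch_cpu_params(instance_id: str, hours: int = 1) -> dict:
    """Build keyword arguments for CloudWatch get_metric_statistics
    to fetch average EC2 CPU utilization over the last `hours` hours."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 300,            # 5-minute datapoints
        "Statistics": ["Average"],
    }

params = cloudwatch_cpu_params("i-0123456789abcdef0")  # hypothetical instance
# With boto3: boto3.client("cloudwatch").get_metric_statistics(**params)
```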
Formatted the response into a DataFrame using a schema containing country code, artist name, number of plays, and genre.
Wrote producer/consumer scripts in Python to process JSON responses.
Hadoop Engineer
Traveler's Insurance, Hartford, CT
Wrote shell scripts to automate the data-loading process.
Performed streaming data ingestion to the Spark distribution environment, using Kafka.
Wrote complex queries to load API data into Apache Hive on the Hortonworks Sandbox.
Formatted the response into a DataFrame, parsing the JSON with a schema containing country code, artist name, number of plays, and genre.
Wrote producer/consumer scripts in Python to process JSON responses.
Developed distributed query agents for performing distributed queries against shards
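The scatter-gather pattern behind those distributed query agents can be sketched with in-memory dicts standing in for the real shards (shard contents and the predicate are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards standing in for distributed stores.
SHARDS = [
    {"alice": 3, "bob": 7},
    {"carol": 1, "dave": 9},
]

def query_shard(shard: dict, predicate) -> dict:
    """Agent-side query: filter one shard locally."""
    return {k: v for k, v in shard.items() if predicate(v)}

def distributed_query(predicate) -> dict:
    """Scatter the query to every shard in parallel,
    then gather and merge the partial results."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(lambda s: query_shard(s, predicate), SHARDS):
            merged.update(partial)
    return merged

result = distributed_query(lambda plays: plays > 5)
```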
Experience with hands-on data analysis and performing under pressure.
Extensive experience writing queries, stored procedures, functions, and triggers in SQL. Supported development, testing, and operations teams during new system deployments.
Implemented Spark using Scala and utilized Data Frames and Spark SQL API for faster processing of data.
Created Hive queries to summarize and aggregate business data by comparing Hadoop data with historical metrics.
Worked closely with stakeholders and data scientists/data analysts to gather requirements and create an engineering project plan.
Performed ETL into the Hadoop file system (HDFS) and wrote Hive UDFs.
Used Hive for queries and performed incremental imports with Spark jobs.
Developed JDBC/ODBC connectors between Hive and Spark to transfer the newly populated DataFrames.
Wrote Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
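A Hive summarization query of the kind described above, comparing current data with historical metrics, might look like this (table and column names are hypothetical):

```sql
-- Aggregate current plays per genre and compare with historical averages
-- (plays_current / plays_history and their columns are illustrative).
SELECT c.genre,
       SUM(c.plays)         AS total_plays,
       AVG(h.monthly_plays) AS historical_avg
FROM   plays_current c
JOIN   plays_history h ON c.genre = h.genre
GROUP BY c.genre;
```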
Big Data Engineer
NCR, Atlanta, GA
Ingested data from an online Rest API in real-time.
Created a Kafka topic and sourced data into it.
Using PySpark, created a producer and ingested real-time data into the Kafka topic.
Created a PySpark consumer to subscribe to the Kafka topic to obtain data in real-time.
Created Spark Streaming jobs for data processing.
Integrated Hive, HDFS, and Spark for data storage.
Created a table in Hive based on the desired output from Spark Streaming.
Conducted data processing, extracted the desired fields, and implemented Spark SQL aggregation queries to transform the data as needed.
Saved the output to a Hive table in HDFS.
Created Hive External tables and loaded the data into tables and query data using HQL.
Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
Leveraged the Hadoop ecosystem and cloud platforms to design and develop solutions using Kafka, Spark, Hive, and related Big Data technologies.
Worked with the Spark-SQL context to create data frames to filter input data for model execution.
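The Spark SQL aggregation logic used in this pipeline can be illustrated with a plain-Python equivalent of a GROUP BY/SUM (PySpark needs a running Spark session, so this uses stdlib stand-ins; the rows and field names are hypothetical sample data):

```python
from collections import defaultdict

# Rows as a Spark DataFrame would hold them (hypothetical sample data).
rows = [
    {"store": "A", "txn_amount": 10.0},
    {"store": "A", "txn_amount": 5.0},
    {"store": "B", "txn_amount": 2.5},
]

def sum_by_key(records, key, value):
    """Equivalent of: SELECT key, SUM(value) FROM records GROUP BY key."""
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += r[value]
    return dict(totals)

totals = sum_by_key(rows, "store", "txn_amount")
```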
Data Engineer
Intuit, Inc., Menlo Park, CA
Created a Kafka topic with one broker and one partition.
Created a producer to fetch data from Twitter and store it in the Kafka topic.
Created a PySpark consumer to get data from the Kafka topic.
Created a Spark Streaming job and Spark processing logic to extract the required fields from the data.
Imported time-period logs into HDFS.
Designed appropriate partitioning/bucketing schemas to permit faster data retrieval during analysis using Spark.
Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
Used SparkSQL module to store data into HDFS
Used Spark DataFrame API over Hortonworks platform to perform analytics on data.
Wrote complex queries against the Apache Hive API on the Hortonworks Sandbox.
Collaborated with various teams and management to understand requirements and design a complete system.
Implemented a complete Big Data pipeline with micro-batches.
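The micro-batch approach mentioned above can be sketched as a simple batching generator; in the real pipeline Spark Streaming handled the batching, and the batch size and data here are illustrative:

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group a (possibly unbounded) iterator of records into
    fixed-size micro-batches, yielding each batch as a list."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Illustrative: 7 records processed in micro-batches of 3.
batches = list(micro_batches(range(7), batch_size=3))
```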
Education
Bachelor of Science: Information Technology
Amity University, New Delhi, India
06/2019 – Current: AWS Data Engineer, Liberty Media
01/2017 – 06/2019: Hadoop Engineer, Traveler's Insurance
11/2015 – 01/2017: Big Data Engineer, NCR
09/2014 – 11/2015: Data Engineer, Intuit, Inc.