Data Engineer

Location:

United States

Posted:

February 04, 2021

Resume:

ERIC FERRIER

Atlanta, GA

***********@*****.***

Phone Number: 781-***-****

PROFESSIONAL SUMMARY

Skilled in databases, data management, analytics, data processing, data cleansing, data modeling and data-driven projects, including Online Transaction Processing.

Skilled in the architecture of Big Data systems, ETL pipelines and real-time analytics systems, including machine learning algorithms, slicing and dicing OLAP cubes and drilling into tabular models.

Proficient in various distributions and platforms such as the Apache Hadoop ecosystem, Microsoft Azure and Spark Databricks; working knowledge of AWS and Hortonworks-Cloudera.

Expert in bucketing, partitioning, multithreaded computing and streaming (Python, PySpark).

Expert in Performance Optimization and Query Tuning (MS SQL)

Adept at project management methodologies such as Waterfall (Rational Rose) and Scrum/Agile (sprints, epics, stories), with good knowledge of SOLID principles and working knowledge of technical reports and documentation.

Proficient web developer using various frameworks including Node.js, Express, Vue.js and .NET Core.

EDUCATION

Master of Science

Enterprise System Management Business Intelligence

Golden Gate University

San Francisco, CA

SKILLS

APACHE

Apache Ant, Apache Flume, Apache Hadoop, Apache Oozie, Apache Sqoop, HDFS, Apache YARN, Presto, Apache Hive, Apache HBase, Apache Kafka, Apache Spark, Apache Airflow, Apache ZooKeeper, Cassandra, Cloudera-Hortonworks, MapR, MapReduce Python, NiFi ETL, Apache JMeter, NGINX

MICROSOFT / AWS

MS SQL, MS SSIS / DQS (Data Tools), SSAS Tabular, OLAP, Azure Synapse, Azure Cosmos DB Emulator, Azure Databricks, AWS Redshift, AWS EMR, AWS DynamoDB, AWS EC2

SCRIPTING

PySpark, Spark SQL, HiveQL, MapReduce Python, MLlib, Python Scikit-learn, R-RevoScaleR, SQL, DAX

Python Anaconda, Flask REST API, Node.js / Vue.js, .NET C# 4.5 / Core 3.0 Web Authentication API, XML/XSLT, UNIX, Shell scripting, Linux, FTP, SSH

OPERATING SYSTEMS

Unix/Linux, Docker

Windows 10, Ubuntu, Apple OS

FILE FORMATS

Parquet, Avro & JSON, ORC, text, csv, XML, SOAP

DISTRIBUTIONS

Cloudera-Hortonworks, Cloudera CDH 4/5

HDP 2.5/2.6, Azure Insights, EMR Amazon Web Services

DATA PROCESSING (COMPUTE) ENGINES

Apache Spark, Spark Streaming, Flume, Kafka, Sqoop, Pentaho Data, Azure Databricks, AWS Kinesis

DATA VISUALIZATION TOOLS

MS SSRS, PowerBI, Pentaho CE, QlikView, Matplotlib, Falcon Client Plotly, Streamlit, Zeppelin

DATABASES

Microsoft SQL Server Database, MySQL, PostgreSQL 9.5, Amazon Redshift, DynamoDB, Presto SQL engine, Apache Cassandra, Apache Hive, NoSQL MongoDB, RDS. Database normal forms and data warehouse models including tabular, star and snowflake models, slowly changing dimensions and bus matrix (Kimball) architecture.

SOFTWARE

Oracle VirtualBox, Eclipse 2019-12, Apache Airflow, Workbench DBeaver, Falcon SQL Client, PyCharm, RStudio, Visual Studio Code, Azure Studio, Dax Studio, Microsoft Visual Studio, Excel, QlikView, PowerBI, SSRS, Acunetix XSS, Microsoft Project, Microsoft Word, PowerPoint, Git/Trello, Slack, Rational Rose XDE and technical documentation.

WORK HISTORY

Data Engineer • TripAdvisor

Needham, MA • February 2018 to Current

Automated, configured and deployed instances in AWS and Azure environments and data centers; familiar with EC2, CloudWatch, CloudFormation and managing security groups on AWS.

Created automated Python scripts to convert data from different sources and generate ETL pipelines.

Extensively used Hive optimization techniques like partitioning, bucketing, Map Join, and parallel execution.

Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive.

Designed and maintained software installation shell scripts.

Configured a 10-node cluster for processing live data.

Developed maintenance and software installation scripts.

Designed data extraction from different databases and scheduled Oozie workflows to execute daily tasks.

Developed distributed query agents for performing distributed queries against Hive

Loaded data from sources such as HDFS and HBase into Spark DataFrames and implemented in-memory computation to generate the output response.

Monitored resources such as Amazon databases, CPU and memory using CloudWatch.

Collaborated on a Hadoop cluster (CDH) and reviewed log files of all daemons.

Used Spark SQL to achieve faster results than Hive during data analysis.

Created Hive external tables and designed data models in Hive.

Developed multiple Spark Streaming and batch Spark jobs using Python

Cloud Data Engineer • Disney

Orlando, FL • November 2016 to February 2018

Wrote Hive queries to analyze data in the Hive warehouse using Hue.

Evaluated and proposed new tools and technologies to meet the needs of the organization.

Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode and DataNode.

Developed multiple Spark Streaming and batch Spark jobs using Python on AWS.

Worked with Amazon AWS IAM console to create custom users and groups.

Learned many technologies on the job as per the requirement of the project.

Developed communication standards.

Configured Spark and Spark SQL for faster testing and processing of data living in HDFS.

Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation.

Parsed JSON into Spark DataFrames using StructType schemas.

Implemented a query parser, query planner, query optimizer to override the native query execution of Hive using replicated logs combined with indexes, supporting full relational SQL queries, including joins.

Transferred data between the Hadoop ecosystem and structured storage in a MySQL RDBMS using Sqoop.

Proficient in writing complex queries into Apache Hive on Hortonworks

Loaded data from servers into AWS S3 buckets.

Configured bucket permissions and bucket policies.

Utilized Spark with AWS EMR for data pipeline automation.

Migrated data between database platforms on AWS, such as from local SQL Server to Amazon RDS and EMR Hive.

Hadoop Engineer • Coinbase

San Francisco, CA • June 2014 to November 2016

Installed and configured a full Kafka cluster, including topics and replicas.

Created and managed topics in Kafka.

Configured replication factors and partitions.

Coordinated and managed consumer groups in Kafka.

Wrote Python scripts to retrieve responses from REST-based APIs through the requests library and pass them to a Kafka producer.

Performed ETL to Hadoop file system (HDFS) and wrote Hive UDFs.

Ingested data from Spark DataFrames into HBase.

Performed aggregation and windowing functions with SQL

Migrated applications from MapReduce to Spark to improve performance.

Benchmarked Cassandra against HBase for fast ingestion.

Processed terabytes of data in real time using Spark Streaming.

Created and managed code reviews

Collaborated on sprint planning and sprint grooming.

Developed unit tests to verify the functionality of Spark applications.

Developed scripts for collecting high-frequency log data from various sources and integrating it into HDFS using Flume; staging data in HDFS for further analysis.

Wrote producer/consumer scripts in Python to process JSON responses.

Wrote streaming applications with Spark Streaming and Kafka.

Developed JDBC/ODBC connectors between Hive and Spark to transfer newly populated DataFrames from MSSQL.

Built Hive views on top of source data tables

Built and provisioned a secured Hive Metastore.

Involved in loading data from the UNIX file system to HDFS.

Data Engineer • Verizon

Los Angeles, CA • September 2011 to June 2014

Participated in planning meetings and assisted with documentation and communication.

Helped move on-premises data repositories to the cloud on AWS to reduce cost and improve scalability.

Implemented all SCD types using server and parallel jobs. Applied error handling, testing, debugging and performance tuning to targets, sources and transformation logic, and used version control to promote jobs.

Involved in loading and transforming large sets of structured, semi-structured and unstructured data.

Involved in loading data from UNIX file system to HDFS.

Developed ETL processes in PL/SQL to pull data from various sources and transform it for reporting applications.

Hands-on experience extracting data from different databases and scheduling Oozie workflows to execute the task daily.

Used Sqoop to efficiently transfer data between relational databases and HDFS, and used Flume to stream log data from servers. Loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.

Captured data and imported it into HDFS, using Flume and Kafka for semi-structured data and Sqoop for existing relational databases.

Used ZooKeeper to provide coordination services to the cluster.

Used the Oozie workflow scheduler to automate pipeline progression and execute jobs in a timely manner.

Moved data from Oracle to HDFS and vice versa.

Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.

CERTIFICATIONS

Microsoft Certified Solutions Expert, MCSE Data Management & Analytics

Udemy Tech Academy Big Data & AWS Analytics

Microsoft MCSA DataWarehouse SSIS 2016

MIT Professional Education, Machine Learning

Microsoft Developer, MCSA SQL 2016

Graduate Certificate in Geographic Information System & Technology, University of Southern California
