
Milwaukee, WI
April 04, 2023

Contact this candidate


Gabriel Corral


Phone: 414-***-****


Professional Profile

IT professional with 8 years of experience in big data development and data administration. Passionate about big data technologies and delivering effective solutions through creative problem solving, with a track record of building large-scale systems using big data technologies.

Excellent academic credentials, with a Bachelor of Mechatronics Engineering from Mexico and working proficiency with Hadoop clusters, HDFS, and the wider Hadoop ecosystem, including Spark, Kafka, and Hive.

Experienced with big data and cloud technologies such as Amazon Web Services (AWS) and Microsoft Azure, cloud services (Amazon S3, EMR, EC2, Glue, SNS, CloudWatch), Cassandra, Hive, NoSQL databases (HBase, MongoDB), and SQL databases (Oracle, PostgreSQL, Microsoft SQL Server, MySQL, Snowflake).

Familiar with the Software Development Life Cycle, with experience preparing and executing unit test plans and unit test cases after software development.

Knowledge of performance-tuning data-heavy dashboards and reports for optimization using various options such as Extracts, Context filters, writing efficient calculations, Data source filters, Indexing, and Partitioning in the data source.

Provides clear, effective testing procedures and systems to ensure an efficient process, and thoroughly documents big data systems, procedures, governance, and policies.


Data Ingestion

Distributed Processing

Data Migration

Data Processing

Cloud Data Pipelines

Data Analysis

Technical Skills

OS: Microsoft Windows, Microsoft Windows Server, Linux, Mac

Programming & Scripting: Spark, Python, Java, Scala, Hive, Kafka, SQL, HTML, CSS, JavaScript

Database Skills: Database partitioning, database optimization, building communication channels between structured and unstructured databases.

Data Stores: Data Lake, Data Warehouse, SQL Database, RDBMS, NoSQL Database, Amazon Redshift, Apache Cassandra, MongoDB, SQL, MySQL, Oracle

Data Pipelines/ETL: Flume, Apache Storm, Apache Spark, Apache NiFi, Apache Kafka

Databases: Cassandra, HBase, Redshift, DynamoDB, MongoDB, MS Access, SQL, MySQL, Oracle, PL/SQL, PostgreSQL, RDBMS

Hadoop Distributions: Apache Hadoop, Cloudera Hadoop, Hortonworks Hadoop

Cloud Platforms: Amazon AWS (EC2, SQS, S3, Elastic Cloud), Microsoft Azure, GCP

IDE: Jupyter Notebooks (formerly IPython Notebooks), Eclipse, IntelliJ, PyCharm

AWS: AWS Lambda, AWS S3, AWS RDS, AWS EMR, AWS Redshift, AWS Kinesis, AWS ELK

Professional Experience

ROCKWELL AUTOMATION, Milwaukee, WI January 2020 – Current

AWS Data Engineer

Working on many projects involving predictive maintenance in industries such as pharmaceutical, automotive, and mass consumer products, and supporting projects in other regions, including Colombia, Brazil, the USA, and Europe.

Worked with a big data dev team that created a solution to join two or more tables from a trusted bucket, save the result as a new table in an enhanced-trusted bucket, query it using Dremio or RapidSQL, and record the data lineage in Collibra.
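A simplified, stdlib-only sketch of this kind of job; bucket, table, and column names are hypothetical, and in the real job the query would be executed with `spark.sql(...)` and the result written back to S3:

```python
# Illustrative sketch: compose the SparkSQL join and the output location for
# an "enhanced-trusted" table. All names here are hypothetical; the real job
# would run the query via spark.sql(...) and write the result to S3.
def build_join_query(left: str, right: str, key: str) -> str:
    return (
        f"SELECT l.*, r.* FROM {left} l "
        f"JOIN {right} r ON l.{key} = r.{key}"
    )

def enhanced_trusted_path(bucket: str, table: str) -> str:
    return f"s3://{bucket}/enhanced-trusted/{table}/"

query = build_join_query("trusted.orders", "trusted.customers", "customer_id")
path = enhanced_trusted_path("analytics-bucket", "orders_enriched")
```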

Implemented AWS EMR Spark using PySpark/Scala and utilized DataFrames and SparkSQL API for faster processing of data.

Registered datasets to AWS Glue using AWS Glue crawlers and the Glue data catalog.
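A hedged sketch of registering a dataset with a Glue crawler; crawler name, role ARN, database, and path are all illustrative, and in practice the dict would be passed to boto3's `glue_client.create_crawler(**params)`:

```python
# Illustrative parameters for an AWS Glue crawler that populates the Glue
# Data Catalog from an S3 prefix. Names and the role ARN are hypothetical;
# in practice the dict is passed to boto3's glue_client.create_crawler().
def crawler_params(name, role_arn, database, s3_path):
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

params = crawler_params(
    "trusted-orders-crawler",
    "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "trusted_db",
    "s3://analytics-bucket/trusted/orders/",
)
```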

Used AWS API Gateway to Trigger Lambda functions.
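A minimal sketch of a Lambda handler invoked through API Gateway's proxy integration; the event shape follows the proxy format, and the business logic is a placeholder:

```python
import json

# Minimal Lambda handler for an API Gateway proxy integration. The event
# carries the request body as a JSON string; the handler returns a response
# dict with statusCode, headers, and a JSON body. Logic is illustrative.
def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }

response = handler({"body": json.dumps({"name": "pipeline"})}, None)
```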

Queried with Athena on data residing in AWS S3 bucket.
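A sketch of the request parameters for such an Athena query; database, query, and output location are illustrative, and in practice the dict would be passed to boto3's `athena_client.start_query_execution(**params)`:

```python
# Illustrative request parameters for an Athena query over data in S3.
# Database and output location are hypothetical; in practice this dict is
# passed to boto3's athena_client.start_query_execution(**params).
def athena_query_params(sql, database, output_s3):
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

params = athena_query_params(
    "SELECT COUNT(*) FROM orders WHERE order_date = DATE '2020-01-01'",
    "trusted_db",
    "s3://analytics-bucket/athena-results/",
)
```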

Utilized AWS Step Functions to run data pipelines in AWS.
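A sketch of an Amazon States Language definition for a two-step pipeline; state names and Lambda ARNs are hypothetical, and the JSON would be supplied when creating the state machine:

```python
import json

# Illustrative Amazon States Language definition: ExtractData runs first,
# then TransformData ends the execution. ARNs are placeholders.
definition = {
    "Comment": "Sketch of a data pipeline state machine",
    "StartAt": "ExtractData",
    "States": {
        "ExtractData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:extract",
            "Next": "TransformData",
        },
        "TransformData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",
            "End": True,
        },
    },
}

state_machine_json = json.dumps(definition)
```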

Used DynamoDB to store metadata and logs.
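A sketch of the shape such a metadata/log item might take; table schema and attribute names are hypothetical, and in practice the item would be passed to boto3's `table.put_item(Item=item)`:

```python
# Illustrative DynamoDB item recording pipeline run metadata. Attribute
# names are hypothetical; in practice the item is written with
# table.put_item(Item=item) via boto3.
def run_log_item(job_name, run_id, status, records):
    return {
        "job_name": job_name,       # partition key (assumed schema)
        "run_id": run_id,           # sort key (assumed schema)
        "status": status,
        "records_processed": records,
    }

item = run_log_item("orders_enrichment", "2020-06-01T00:00:00Z", "SUCCEEDED", 120000)
```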

Monitored and managed services with AWS CloudWatch.

Performed transformations using Apache SparkSQL in AWS Glue.

Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation in Scala.

Developed Spark code using Python/Scala and Spark-SQL for faster testing and data processing.

Tuned Spark to improve job performance.

Used Athena to query data from AWS S3.

Used Dremio as a query engine for faster joins and complex queries over AWS S3 buckets, using Dremio data reflections.

Upgraded Lambda functions from Python 2 to Python 3.
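A small before/after sketch of the kinds of changes such an upgrade typically involves: `print` statements become functions, `dict.iteritems()` becomes `items()`, and `/` performs true division (use `//` for the old behavior). The helper and data are illustrative:

```python
# Illustrative Python 2 -> Python 3 migration patterns in one small helper.
def summarize(counts):
    # Python 2: for key, value in counts.iteritems(): print key, value
    lines = [f"{key}={value}" for key, value in counts.items()]
    # Python 2: 5 / 2 == 2; Python 3: 5 / 2 == 2.5, so // keeps integer math
    average = sum(counts.values()) // len(counts)
    return lines, average

lines, average = summarize({"a": 5, "b": 2})
```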

Tested the upgraded functions using PyCharm and pytest.
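A sketch of a pytest-style unit test for an upgraded helper; pytest discovers functions named `test_*` and uses bare `assert` statements. The helper here is illustrative:

```python
# Illustrative helper plus a pytest-style test for it. pytest collects
# test_* functions automatically and reports failed asserts.
def normalize_phone(raw: str) -> str:
    return "".join(ch for ch in raw if ch.isdigit())

def test_normalize_phone():
    assert normalize_phone("414-555-0100") == "4145550100"
    assert normalize_phone("(414) 555 0100") == "4145550100"
```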

Loaded data into Snowflake on AWS as the data warehouse.

Worked on POCs for ETL with S3, EMR (Spark), and Snowflake.
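A sketch of the kind of Snowflake `COPY INTO` statement used to load staged S3 files into a warehouse table; the table, stage, and file-format names are hypothetical, and the statement would be executed through a Snowflake connection:

```python
# Illustrative Snowflake COPY INTO statement for loading files staged in S3
# into a table. Table, stage, and file-format names are hypothetical.
def copy_into_sql(table, stage, file_format):
    return (
        f"COPY INTO {table} "
        f"FROM @{stage} "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}')"
    )

sql = copy_into_sql("analytics.orders", "s3_trusted_stage/orders/", "parquet_fmt")
```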

Worked as part of the Big Data Engineering team and worked on pipeline creation activities in the Azure environment.

Used Apache Airflow to schedule and monitor jobs.
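In Airflow, a pipeline is a DAG of tasks and the scheduler runs each task only after its upstream tasks succeed. A stdlib-only sketch of that dependency ordering (task names are illustrative; a real pipeline would declare these with `airflow.DAG` and operator `>>` chaining):

```python
from graphlib import TopologicalSorter

# Stdlib-only sketch of the dependency ordering an Airflow scheduler
# enforces: each task maps to the set of upstream tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

# static_order() yields tasks in an order that respects every dependency.
run_order = list(TopologicalSorter(dependencies).static_order())
```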

Used Talend to transfer dataset to S3 bucket as POC.

Used AWS DataSync to move data from NAS (EC2) storage to an S3 bucket.

FORD MOTOR COMPANY, Dearborn, MI Jan 2019 – Dec 2019

Data Engineer

Created a pipeline from a REST API and stored the data in a MySQL database.

Ingested data using Apache Flume and stored it in HDFS in JSON format.

Configured a Flume agent with REST API keys to ingest data.

Used Apache Spark and Spark Streaming to perform transformations.

Configured a JDBC driver in PySpark to connect to the MySQL database.
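A sketch of the JDBC options such a read might use; host, credentials, and table names are hypothetical, and in practice the dict would be passed as `spark.read.format("jdbc").options(**options).load()`:

```python
# Illustrative JDBC connection options for reading a MySQL table from
# PySpark. All connection details are hypothetical; in practice they feed
# spark.read.format("jdbc").options(**options).load().
def mysql_jdbc_options(host, database, table, user, password):
    return {
        "url": f"jdbc:mysql://{host}:3306/{database}",
        "driver": "com.mysql.cj.jdbc.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }

options = mysql_jdbc_options("db.internal", "ingest", "api_events", "etl_user", "secret")
```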

Wrote incremental imports into Hive tables.

Created DataFrames to store processed results in tabular format.

Installed the Airflow engine to run multiple Spark jobs.

Wrote Spark SQL queries and optimized the Spark queries with Spark SQL.

Performed ETL to the Hadoop file system (HDFS) and wrote Spark UDFs.

Established data consumption from information stored on Google Cloud.

Imported real-time logs to HDFS using Flume.

Created UNIX shell scripts to automate the build process and perform regular jobs like file transfers.

Used Hortonworks tooling to install the Hortonworks cluster and monitor performance.

Made incremental imports to Hive with Sqoop.
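A sketch of how such a Sqoop incremental-import command might be assembled; connection details, table, and check column are hypothetical, and the argument list would be run via `subprocess`:

```python
# Illustrative Sqoop incremental import into Hive, appending rows newer than
# the last imported value of a monotonically increasing column. Connection
# details and names are hypothetical; run the list with subprocess in practice.
def sqoop_incremental_cmd(jdbc_url, table, check_column, last_value):
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--hive-import",
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]

cmd = sqoop_incremental_cmd("jdbc:mysql://db.internal/sales", "orders", "order_id", 100000)
```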

Managed Hadoop clusters and checked cluster status using Ambari.

Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.

Initiated data migration from/to traditional RDBMS with Apache Sqoop.

Developed scripts to automate the workflow processes and generate reports


Data Engineer

Created UNIX shell scripts to automate the build process, and to perform regular jobs like file transfers.

Wrote shell scripts to automate workflows to pull data from various databases into FS.

Loaded data into a shared file system using terminal.

Performed upgrades, patches and bug fixes in Hadoop in a cluster environment.

Wrote shell scripts for automating the process of data loading

Created test cases during two-week sprints using agile methodology

Designed data visualization to present current impact and growth using Power BI

Developed a Java program to clean up data streams from server logs

Worked with a Java Messaging system to deliver logs

Monitored multiple SQL servers to improve query performance

Configured, installed, and managed Hadoop binaries (PoC).


Hadoop Data Engineer

Worked with technology and business groups for Hadoop migration strategy.

Created MapReduce jobs for data processing using Java.

Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.

Validated Hadoop infrastructure and data center planning with data growth in mind.

Transferred data to and from clusters, using Sqoop and various storage media such as flat files.

Developed MapReduce programs and Hive queries to analyze sales pattern and customer satisfaction index over the data present in various relational database tables.

Orchestrated hundreds of Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.

Worked extensively on performance optimization, deriving appropriate design patterns for MapReduce jobs by analyzing I/O latency, map time, combiner time, reduce time, etc.

Followed Agile methodology for the entire project.


Education

Bachelor of Mechatronics Engineering
