
Sr. Big Data Engineer

Location:
United States
Posted:
May 31, 2023


Resume:

• Areas of Exposure

Databases: SQL Server, MySQL, SAP HANA, and DynamoDB

AWS services (Glue, Kinesis) and Kafka

Data Modeling, Data Engineering

Tableau

Hive, Apache Spark

Python, PySpark

Extract, Transform, Load (ETL)

Office Access tools, Excel, Project, Problem Solving

Alteryx and KNIME analysis & data visualization skills


Profile Summary

Green Belt Project certified at Honeywell, with 7+ years of experience across all phases of the Big Data engineering and project management life cycle.

o Generated more than $1 million per year in savings for the company through projects implemented with the marketing team during COVID-19.

Excellent academic credentials, with a Mechatronic Engineering degree from Instituto Tecnológico de Monterrey and studies at Abo Akademi University, Finland, along with working proficiency in analytics, OLAP technologies, and agile development methodologies.

Passionate about Big Data technologies and the delivery of effective solutions through creative problem solving, with a track record of building large-scale systems.

Demonstrated excellence in:

o Business intelligence tools such as Tableau, Alteryx, and KNIME

o Databases: DynamoDB, MySQL, HANA, and cloud storage

o AWS services

o Data integration and mining

o Programming in PySpark, Python, SQL, and SAP APIs

o Project techniques: Six Sigma, Kaizen, Scrum, Kanban

Extensive experience with SQL and NoSQL solutions, including Cassandra, Hive, Redshift, Snowflake, MySQL, SAP, Postgres, and Microsoft SQL Server.

Proficient in data ingestion, processing, and manipulation on both on-premises and cloud platforms (AWS and Azure) across sectors such as healthcare, information technology, and finance.

Understanding of the Hadoop architecture and its ecosystem, including HDFS, YARN, MapReduce, Sqoop, Avro, Spark, Hive, Kafka, and ZooKeeper.

Work Experience

HONEYWELL SPS, CHARLOTTE, NC (REMOTE)

Sr. Data Engineer (Mar 2020 – Present)

Built a variety of Big Data platforms to address the needs of health, respiratory, and personal protective equipment businesses operating in the highly regulated markets of the US, Latin America, and Canada. Worked hand in hand with different stakeholders to manage marketing data and transactions.

Designed and implemented Stage 3 of the migration from on-premises to the cloud:

o Implemented a Spark application to push Parquet files into AWS S3 buckets.

o Developed a Snowflake warehouse for the central repository.

o Created Glue jobs using Jupyter notebooks and PySpark to process data into Snowflake.

o Used Lambda functions to trigger Glue jobs when data files were stored in S3 buckets (see the sketch after this list).

o Created a Spark application on AWS EMR to process data and store it in Hive and S3 buckets.

o Used Lambda functions to trigger Spark submits to the EMR cluster for loading into Hive.

o Created tables from S3 data using Glue and queried them with Athena.

o Used SNS to send messages to track the jobs.
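
A minimal sketch of the Lambda-to-Glue trigger described in the list above, assuming a hypothetical Glue job name (load_to_snowflake) and an S3 put-event source; the names and arguments are illustrative, not the actual production configuration.

    import boto3

    glue = boto3.client("glue")

    # Hypothetical Glue job name; the production pipeline used its own jobs.
    GLUE_JOB_NAME = "load_to_snowflake"

    def lambda_handler(event, context):
        """Start a Glue job run for each new object reported by an S3 put event."""
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # Hand the file location to the Glue job as a job argument.
            run = glue.start_job_run(
                JobName=GLUE_JOB_NAME,
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )
            print(f"Started {GLUE_JOB_NAME} run {run['JobRunId']} for s3://{bucket}/{key}")
        return {"status": "ok"}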

Designed all of the company's business intelligence dashboards, drove decision making based on them, and provided solutions to achieve the company's objectives.

Responsible for all company logistics, strategic resource planning, and inventory replenishment.

Implemented projects generating more than $1 million per year in savings for the company, including rolling out the price desk project across different businesses inside Honeywell.

Designed solutions for new clients and projects, led many improvement projects, and developed processes across the software development, planning, warehouse, call center, and pricing teams.

Implemented new technologies for the health sector, including automated pharmacies, software, and interfaces between equipment.

Designed and implemented the company's new lines of business, including software development using SCRUM methodology.

Automated the collection of information from all company processes into databases for analysis, including MySQL, SQL Server, Salesforce, HANA, and Kinesis.


HONEYWELL PMT

Data Engineer (Mar 2018 – Feb 2020)

Worked on building a centralized data lake on the AWS Cloud utilizing primary services such as S3, EMR, Redshift, and Athena.

Worked on migrating datasets and ETL workloads from on-prem to AWS Cloud services.

Created a data ingestion framework for multiple source systems using PySpark.

Worked extensively on fine-tuning Spark applications and provided production support to various pipelines running in production.

Connected business intelligence tools such as Tableau and Power BI to the tables in the data warehouse.

Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.

Created data pipelines using AWS components such as EC2, RDS, Glue, Redshift, Lambda, Step Functions, Kinesis, SageMaker, and DynamoDB.

Implemented a common Kafka consumer for storing published payment lifecycle events in Apache Cassandra.

Wrote scripts using PySpark and Unix Shell Scripting.

Worked with the Airflow scheduler to schedule Spark jobs using the Spark submit operator (sketched below).
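
A minimal sketch of scheduling a Spark job from Airflow with the Spark submit operator, assuming the apache-airflow-providers-apache-spark package is installed; the DAG name, script path, and connection ID are hypothetical.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    # Hypothetical DAG and application names; the production jobs used their own.
    with DAG(
        dag_id="daily_spark_ingest",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest = SparkSubmitOperator(
            task_id="spark_ingest",
            application="/opt/jobs/ingest.py",  # PySpark script to submit
            conn_id="spark_default",            # Airflow connection to the cluster
            conf={"spark.executor.memory": "4g"},
        )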

Developed PySpark scripts for ingestion of structured and unstructured data.

Modeled data based on business requirements using PySpark.

Wrote streaming data into Cassandra tables with Spark Structured Streaming (see the sketch below).
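
A minimal Structured Streaming sketch in the spirit of this bullet, assuming the spark-sql-kafka and DataStax spark-cassandra-connector packages are on the classpath; the topic, keyspace, and table names are placeholders.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("events-to-cassandra")
        .config("spark.cassandra.connection.host", "cassandra-host")
        .getOrCreate()
    )

    # Read a stream of events from Kafka (hypothetical topic).
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "payment_events")
        .load()
        .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")
    )

    def write_batch(batch_df, batch_id):
        # foreachBatch lets the connector's batch writer handle each micro-batch.
        (batch_df.write
            .format("org.apache.spark.sql.cassandra")
            .options(table="payment_events", keyspace="payments")
            .mode("append")
            .save())

    query = (events.writeStream
             .foreachBatch(write_batch)
             .option("checkpointLocation", "/tmp/checkpoints/events")
             .start())
    query.awaitTermination()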

Wrote Bash script to gather cluster information for Spark submits.

Wrote partitioned data into Hive external tables using the Parquet format (a sketch follows).
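
A minimal sketch of the partitioned Parquet write plus the matching Hive external table; the paths, database, and column names are hypothetical, not the production schema.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-parquet-write")
             .enableHiveSupport()
             .getOrCreate())

    df = spark.table("staging.orders")  # assumed upstream table

    # Write the data partitioned by date in Parquet format.
    (df.write
       .mode("overwrite")
       .partitionBy("order_date")
       .parquet("s3://data-lake/curated/orders/"))

    # Point an external Hive table at the Parquet data, then register
    # the partitions that already exist on storage.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS curated.orders (
            order_id STRING, amount DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        STORED AS PARQUET
        LOCATION 's3://data-lake/curated/orders/'
    """)
    spark.sql("MSCK REPAIR TABLE curated.orders")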

Worked on creating and scheduling jobs in AWS Databricks.

HONEYWELL - HOUSTON (REMOTE)

Data Engineer (Mar 2017 – Mar 2018)

Built Lambda functions on AWS for event-driven pipelines consuming from the HANA API.

Defined the Spark/Python (PySpark) ETL framework and best practices for development.

Created structured data from unstructured data using Spark.

Processed and loaded the data into the Snowflake data warehouse (a sketch follows).
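
A minimal sketch of a Spark-to-Snowflake load, assuming the Snowflake Spark connector is available; the account, credentials, and table names are placeholders, not the real configuration.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("load-snowflake").getOrCreate()

    # Placeholder Snowflake connection options.
    sf_options = {
        "sfURL": "<account>.snowflakecomputing.com",
        "sfUser": "<user>",
        "sfPassword": "<password>",
        "sfDatabase": "ANALYTICS",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "LOAD_WH",
    }

    df = spark.read.parquet("s3://data-lake/curated/orders/")  # assumed source

    # Append the processed data into a Snowflake table.
    (df.write
       .format("net.snowflake.spark.snowflake")
       .options(**sf_options)
       .option("dbtable", "ORDERS")
       .mode("append")
       .save())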

Applied advanced procedures such as text analytics and processing using in-memory computing capabilities such as Apache Spark with PySpark.

Used Spark RDDs and PySpark to convert Hive/SQL queries into Spark transformations.

Documented requirements, including the existing code that was to be reimplemented using Spark.

Established continuous discretized streams (DStreams) of data with a high level of abstraction using Spark Streaming.

Analyzed and tuned the Snowflake data model for multiple internal projects and worked with analysts to model tables from business rules and enhance/optimize existing tables.

Migrated data stored in old legacy systems that used batch ETL jobs to streaming pipelines that process real-time events and store them in Snowflake.

Implemented a Spark application to push Parquet files into AWS S3 buckets.

Reviewed functional and non-functional requirements on the AWS project and collaborated with stakeholders and various cross-functional teams.

Moved transformed data to the Spark cluster, where it was set to go live on the application using AWS.

Used Lambda functions to trigger Glue jobs when data files were stored in S3 buckets.

Versioned with Git and used Jenkins CI to manage CI/CD practices.

Implemented upgrade, backup, and restore procedures for the CI/CD tools.

GENERAL MOTORS (GM)

Data Analyst (Mar 2016 – Feb 2017)

Configured tables and supported database operations.

Utilized Azure Data Factory to extract, transform, and load data from different sources into a centralized data warehouse, making it easier to perform data analysis and generate insights.

Deployed Azure Databricks to perform advanced analytics on GM's customer data, including predictive modeling and machine learning algorithms, to predict customer behavior and preferences.

Utilized Azure Event Hubs to ingest and process data from multiple sources, including website visits, social media interactions, and customer support interactions, to gain a comprehensive understanding of GM's customers (see the sketch below).
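
A minimal sketch of an Event Hubs consumer using the azure-eventhub Python SDK; the connection string, consumer group, and hub name are placeholders, not the actual configuration.

    from azure.eventhub import EventHubConsumerClient

    # Placeholder connection details.
    client = EventHubConsumerClient.from_connection_string(
        conn_str="Endpoint=sb://<namespace>.servicebus.windows.net/;...",
        consumer_group="$Default",
        eventhub_name="customer-interactions",
    )

    def on_event(partition_context, event):
        # Each event carries one interaction record (e.g., a website visit).
        print(f"partition {partition_context.partition_id}: {event.body_as_str()}")
        partition_context.update_checkpoint(event)  # no-op without a checkpoint store

    with client:
        # Read from the start of each partition; blocks until interrupted.
        client.receive(on_event=on_event, starting_position="-1")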

Used Azure Synapse Analytics to build a centralized data platform combining data warehousing, big data, and data integration services, making it easier to analyze large volumes of data and generate insights.

Implemented Azure Stream Analytics to process real-time customer data and provide personalized recommendations based on customer behavior and preferences.

Implemented a mail alert mechanism for alerting users when their selection criteria were met.

Conducted unit and systems integration tests to ensure the system functioned to specification.

Established communications interfacing between the software program and the database backend.

Programmed a range of functions (e.g., automate logistical tracking and analysis, automate schematic measurements, etc.).

Developed client-side testing/validation using JavaScript.

Worked hands-on with technologies such as Python, XML, Java, and JavaScript.

L'OREAL - MX

Data Analyst (Jan 2015 – Feb 2016)

Used the Salesforce database, SAP, Tableau, and KNIME data structures to determine the best price for special customers and to approve pricing deviation requests, rebates, etc.

Salesforce automation reporting.

Negotiated with sales reps to increase quotations, with an immediate income impact.

Developed Excel macros and Salesforce dashboards.

Completed the Green Belt Six Sigma course.

Implemented a Kaizen project in the continuous improvement area for customer experience.

Created Tableau dashboards from an Excel database for warranties on the retail team, achieving a better understanding of the customer experience regarding product returns.

Replaced the Credits & Returns Excel dashboard with an automated SAP HANA-to-Tableau version.

Delivered Excel training to other teams.

Education

Mechatronic Engineering from Instituto Tecnológico de Monterrey

Innovation in Engineering from Abo Akademi University

Certifications

Multiple Big Data courses from Udemy

Advanced Tableau Desktop certification from Tableau

Green Belt Six Sigma

Omar Gonzalez de Santiago

SR. DATA ENGINEER

Contact: +1-470-***-****

Email: adw2vk@r.postjobfree.com

Soft Skills

Collaborator

Communicator

Planner

Change Agent

Learner

Thinker

Technical Skills

IDE: Jupyter Notebooks, Databricks, IntelliJ

Project Methods: Agile, Kanban, Scrum, Continuous Integration, Test-Driven Development

Hadoop Distributions: Cloudera, Hortonworks, EMR

Cloud Databases & Tools: AWS S3 buckets, AWS Glue, AWS Lambda, AWS Athena, AWS EMR, AWS Redshift, Snowflake, very strong SQL skills, GCP BigQuery, VPC, IAM roles, Airflow, Kafka, Sqoop

Programming Languages & Frameworks: Spark, Spark Streaming, Java, Python, Scala

Scripting: Shell Scripting

Continuous Integration (CI-CD): Jenkins, GitHub

File Systems & Formats: HDFS, CSV, JSON, Avro, Parquet, ORC

JAN 2013 - DEC 2013: Thermo Fisher Scientific Inc., Indianapolis, IN

SINCE MAR 2015: PiSA BioPHARM, Mexico

JAN 2014 - FEB 2015: Caterpillar, Torreon, Coah.

JAN 2011 - DEC 2012: Aflac, Columbus, GA


