Omar Gonzalez de Santiago
SR. DATA ENGINEER
Contact: +1-470-***-****
Email: *************@*****.***

Areas of Exposure
Databases: SQL Server, MySQL, SAP HANA, and DynamoDB
AWS services (Glue, Kinesis) and Kafka
Data Modeling, Data Engineering
Tableau
Hive, Apache Spark
Python, PySpark
Extract, Transform, Load (ETL)
MS Office tools: Access, Excel, Project; problem solving
Alteryx and KNIME analysis & data visualization skills
Soft Skills: Collaborator, Communicator, Planner, Change Agent, Learner, Thinker
Profile Summary
Green Belt certified at Honeywell, with 7+ years of experience across all phases of the Big Data engineering and project management life cycle.
o Generated more than $1 million per year in savings for the company through projects implemented with the marketing team during COVID-19.
Excellent academic credentials, with Mechatronics Engineering from Instituto Tecnológico de Monterrey and Åbo Akademi University (Finland), along with working proficiency in analytics and OLAP technologies and experience in agile development methodologies.
Passionate about Big Data technologies and the delivery of effective solutions through creative problem solving with a track record of building large-scale systems using Big Data technologies.
Demonstrated excellence in:
o Business intelligence tools such as Tableau, Alteryx, and KNIME
o Strong skills in databases (DynamoDB, MySQL, HANA) and cloud storage
o AWS services
o Data integration and mining
o Programming in PySpark, Python, SAP API, and SQL
o Project techniques: Six Sigma, Kaizen, Scrum, Kanban
Extensive experience with SQL and NoSQL solutions, including Cassandra, Hive, Redshift, Snowflake, MySQL, SAP, Postgres, and Microsoft SQL Server.
Proficiency in data ingestion, processing, and manipulation on both on-premises and cloud platforms (AWS and Azure) across sectors such as healthcare, information technology, and finance.
Understanding of the Hadoop architecture and its ecosystem, including HDFS, YARN, MapReduce, Sqoop, Avro, Spark, Hive, Kafka, and ZooKeeper.
Work Experience
HONEYWELL SPS, CHARLOTTE, NC (REMOTE)
Sr. Data Engineer (Mar 2020 – Present)
Built a variety of Big Data platforms to address the needs of health, respiratory, and personal protective equipment businesses operating in the highly regulated markets of the US, Latin America, and Canada. Worked hand in hand with different stakeholders to manage marketing data and transactions.
Designed and implemented Stage 3 of the migration from on-premises to the cloud:
o Implemented a Spark application to push Parquet files into AWS S3 buckets.
o Developed a Snowflake warehouse for the central repository.
o Created Glue jobs using Jupyter Notebooks and PySpark to process data into Snowflake.
o Used Lambda functions to trigger Glue jobs when data files were stored in S3 buckets (see the sketch at the end of this role).
o Created a Spark application using AWS EMR to process the data and store it in Hive and S3 buckets.
o Used Lambda functions to trigger job submissions to the EMR cluster, loading results into Hive.
o Created tables from S3 using Glue and queried the data with Athena.
o Used SNS to send messages to track the jobs.
Designed all of the company's business intelligence dashboards, drove decision-making based on them, and provided solutions to achieve the company's objectives.
Responsible for all company logistics, strategic resource planning, and inventory replenishment.
Implemented projects generating more than $1 million per year in savings for the company, including rolling out the price desk project across different businesses inside Honeywell.
Designed all the solutions for new clients and projects, led many improvement projects, and developed many processes across the software development, planning, warehouse, call center, and pricing teams.
Implemented new technologies for the health sector, including automated pharmacies, software, and interfaces between equipment.
Designed and implemented the company's new lines of business, including software development using SCRUM methodology.
Automated the collection of information from all company processes into databases for analysis, including MySQL, SQL Server, Salesforce, HANA, and Kinesis.
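A minimal sketch of the Lambda-to-Glue trigger pattern used in the Stage 3 migration above, assuming an S3 PUT event notification; the job name and argument keys are illustrative placeholders, not values from the project:

import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Each S3 event record carries the bucket and key of the file that landed.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Hand the file location to the Glue job as job arguments.
        run = glue.start_job_run(
            JobName="load_parquet_to_snowflake",  # hypothetical Glue job name
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        print(f"Started Glue run {run['JobRunId']} for s3://{bucket}/{key}")
    return {"status": "ok"}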
HONEYWELL PMT
Data Engineer (Mar 2018 – Feb 2020)
Worked on building a centralized data lake on the AWS Cloud utilizing primary services such as S3, EMR, Redshift, and Athena.
Worked on migrating datasets and ETL workloads from on-prem to AWS Cloud services.
Created data ingestion framework for multiple source systems using PySpark.
Worked extensively on fine-tuning Spark applications and providing production support to various pipelines running in production.
Connected business intelligence tools such as Tableau and Power BI to the tables in the data warehouse.
Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Created data pipelines using AWS components such as EC2, RDS, Glue, Redshift, AWS Lambda, Step Functions, Kinesis, SageMaker, and DynamoDB.
Implemented Kafka common consumer for storing published payment lifecycle events in Apache Cassandra.
Wrote scripts using PySpark and Unix Shell Scripting.
Worked on the Airflow scheduler to schedule Spark jobs using the Spark operator (see the DAG sketch at the end of this role).
Developed PySpark scripts for ingestion of structured and unstructured data.
Modeled data based on business requirements using PySpark.
Wrote streaming data into Cassandra tables with Spark Structured Streaming.
Wrote Bash script to gather cluster information for Spark submits.
Wrote partitioned data into Hive external tables using Parquet format.
Worked on job creation and scheduling in AWS Databricks.
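A minimal Airflow DAG sketch for the Spark scheduling described above; the DAG name, script path, and connection ID are assumptions for illustration (requires the apache-airflow-providers-apache-spark package):

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_ingestion_pipeline",  # hypothetical DAG name
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submits the PySpark ingestion script through the configured Spark connection.
    ingest = SparkSubmitOperator(
        task_id="run_pyspark_ingestion",
        application="/opt/jobs/ingest.py",  # hypothetical PySpark script path
        conn_id="spark_default",
        conf={"spark.dynamicAllocation.enabled": "true"},
    )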
HONEYWELL - HOUSTON (REMOTE)
Data Engineer (Mar 2017 – Mar 2018)
Built Lambda functions on AWS for event-driven pipelines consuming from the HANA API.
Defined the Spark/Python (PySpark) ETL framework and best practices for development.
Created structured data from unstructured data using Spark.
Processed and loaded the data into the Snowflake data warehouse (see the sketch at the end of this role).
Applied advanced procedures such as text analytics and processing using in-memory computing capabilities such as Apache Spark with PySpark.
Used Spark RDDs and PySpark to convert Hive/SQL queries into Spark transformations.
Documented requirements, including the available code to be implemented using Spark.
Established a continuous, discretized stream of data with a high level of abstraction using Spark Structured Streaming.
Analyzed and tuned the Snowflake data model for multiple internal projects and worked with analysts to model tables from business rules and enhance/optimize existing tables.
Migrated data stored in legacy systems that used batch ETL jobs to streaming pipelines that stream real-time events and store them in Snowflake.
Implemented a Spark application to push Parquet files into AWS S3 buckets.
Reviewed functional and non-functional requirements on the AWS project and collaborated with stakeholders and various cross-functional teams.
Moved transformed data to a Spark cluster, where the data was set to go live on the application using AWS.
Used Lambda functions to trigger Glue jobs when data files were stored in S3 buckets.
Versioned with Git and used Jenkins CI to manage CI/CD practices.
Implemented upgrades, backups, and restores for CI/CD tools.
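A minimal sketch of the Snowflake load mentioned above, using the Snowflake Spark connector; the account, credentials, S3 path, and table name are placeholders, not values from the project:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("load_to_snowflake").getOrCreate()

# Read the processed data (hypothetical S3 location).
df = spark.read.parquet("s3://example-bucket/processed/")

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "LOAD_WH",
}

# Append into a Snowflake table via the connector's Spark data source.
(df.write
   .format("net.snowflake.spark.snowflake")
   .options(**sf_options)
   .option("dbtable", "PROCESSED_EVENTS")  # hypothetical target table
   .mode("append")
   .save())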
GM MOTORS
Data Analyst (Mar 2016 – Feb 2017)
Configured tables and supported database operations.
Utilized Azure Data Factory to extract, transform, and load data from different sources into a centralized data warehouse, making it easier to perform data analysis and generate insights.
Deployed Azure Databricks to perform advanced analytics on GM's customer data, including predictive modeling and machine learning algorithms, to predict customer behavior and preferences.
Utilized Azure Event Hubs to ingest and process data from multiple sources, including website visits, social media interactions, and customer support interactions, to gain a comprehensive understanding of GM's customers (see the sketch at the end of this role).
Used Azure Synapse Analytics to build a centralized data platform combining data warehousing, big data, and data integration services, making it easier to analyze large volumes of data and generate insights.
Implemented Azure Stream Analytics to process real-time customer data and provide personalized recommendations based on customer behavior and preferences.
Implemented a mail alert mechanism for alerting users when their selection criteria were met.
Conducted unit and systems integration tests to ensure the system functioned to specification.
Established communications interfacing between the software program and the database backend.
Programmed a range of functions (e.g., automate logistical tracking and analysis, automate schematic measurements, etc.).
Developed client-side testing/validation using JavaScript.
Worked hands-on with technologies such as Python, XML, Java, and JavaScript.
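A minimal sketch of the Event Hubs ingestion pattern described above, using the azure-eventhub Python SDK; the connection string, hub name, and event fields are illustrative assumptions:

import json

from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<EVENT_HUBS_CONNECTION_STRING>",  # placeholder connection string
    eventhub_name="customer-interactions",      # hypothetical hub name
)

# One interaction event, e.g. a website visit.
interaction = {"source": "website", "customer_id": 123, "action": "page_view"}

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(interaction)))
    producer.send_batch(batch)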
L'OREAL - MX
Data Analyst (Jan 2015 – Feb 2016)
Used the Salesforce database, SAP, Tableau, and KNIME data structures to determine the best prices for special customers and approve pricing deviation requests, rebates, etc.
Salesforce automation reporting.
Negotiated with sales reps to increase quotations, with an immediate income impact.
Developed Excel macros and Salesforce dashboards.
Completed the Green Belt Six Sigma course.
Implemented a Kaizen project in the continuous improvement area for customer experience.
Created Tableau dashboards from an Excel database for warranties on the retail team, achieving a better understanding of the customer experience regarding product returns.
Replaced the Credits & Returns Excel dashboard with an automated SAP HANA/Tableau version.
Delivered Excel training to other teams.
Education
Mechatronics Engineering from Instituto Tecnológico de Monterrey
Innovation in Engineering from Åbo Akademi University
Certifications
Multiple Big Data courses from Udemy
Certified Advanced Tableau Desktop (Tableau)
Six Sigma Green Belt
Technical Skills
IDE: Jupyter Notebooks, DataBricks, IntelliJ
Project Methods: Agile, Kanban, Scrum, Continuous Integration, Test-Driven Development
Hadoop Distributions: Cloudera, Hortonworks, EMR
Cloud Databases & Tools: AWS S3, AWS Glue, AWS Lambda, AWS Athena, AWS EMR, AWS Redshift, Snowflake, GCP BigQuery, VPC, IAM roles, Airflow, Kafka, Sqoop; very strong SQL skills
Programming Languages & Frameworks: Spark, Spark Streaming, Java, Python, Scala
Scripting: Shell Scripting
Continuous Integration (CI-CD): Jenkins, GitHub
File Systems & Formats: HDFS; CSV, JSON, Avro, Parquet, ORC
Earlier Experience
Thermo Fisher Scientific Inc., Indianapolis, IN (Jan 2013 – Dec 2013)
PiSA BioPHARM, Mexico (since Mar 2015)
Caterpillar, Torreón, Coah. (Jan 2014 – Feb 2015)
Aflac, Columbus, GA (Jan 2011 – Dec 2012)