BIG DATA ENGINEER
IT professional with 8 years of experience in Big Data development and data administration. Passionate about Big Data technologies and delivering effective solutions through creative problem solving, with a track record of building large-scale systems on Big Data technologies.
Excellent academic credentials, with a Bachelor of Mechatronics Engineering from Mexico, along with working proficiency in Hadoop clusters, HDFS, the Hadoop tool ecosystem, Spark, Kafka, and Hive.
Experienced with Big Data and cloud technologies such as Amazon Web Services (AWS), Microsoft Azure, cloud services (Amazon S3, EMR, EC2, Glue, SNS, CloudWatch), Cassandra, Hive, NoSQL databases (HBase, MongoDB), and SQL databases (Oracle, SQL Server, PostgreSQL, MySQL, Snowflake).
Familiar with the Software Development Life Cycle, with experience preparing and executing unit test plans and unit test cases after software development.
Knowledgeable in performance-tuning data-heavy dashboards and reports using options such as extracts, context filters, efficient calculations, data source filters, indexing, and partitioning in the data source.
Provides clear and effective testing procedures and systems to ensure an efficient process; clearly documents Big Data systems, procedures, governance, and policies.
Cloud Data Pipelines
OS: Microsoft Windows, Microsoft Windows Server, Linux, Mac
Database Skills: Database partitioning, database optimization, building communication channels between structured and unstructured databases.
Data Stores: Data Lake, Data Warehouse, SQL Database, RDBMS, NoSQL Database, Amazon Redshift, Apache Cassandra, MongoDB, SQL, MySQL, Oracle
Data Pipelines/ETL: Flume, Apache Storm, Apache Spark, Nifi, Apache Kafka
Databases: Cassandra, HBase, Redshift, DynamoDB, MongoDB, MS Access, SQL, MySQL, Oracle, PL/SQL, PostgreSQL, RDBMS
HADOOP DISTRIBUTIONS: Apache Hadoop, Cloudera Hadoop, Hortonworks Hadoop
CLOUD PLATFORMS: Amazon AWS - EC2, SQS, S3, Elastic Cloud, Microsoft Azure, GCP
IDE: Jupyter Notebook (formerly IPython Notebook), Eclipse, IntelliJ, PyCharm
AWS: AWS Lambda, AWS S3, AWS RDS, AWS EMR, AWS Redshift, AWS Kinesis, AWS ELK
ROCKWELL AUTOMATION, Milwaukee, WI January 2020 – Current
AWS Data Engineer
Working on predictive-maintenance projects across industries such as pharmaceuticals, automotive, and mass consumer products, and supporting projects in other regions, including Colombia, Brazil, the USA, and Europe.
Worked with a Big Data development team on a solution that joins two or more tables from a trusted bucket, saves the result as a new table in an enhanced-trusted bucket, makes it queryable through Dremio or RapidSQL, and records the data lineage in Collibra.
Implemented Spark on AWS EMR using PySpark/Scala, utilizing the DataFrame and Spark SQL APIs for faster data processing.
Registered datasets to AWS Glue using AWS Glue crawlers and the Glue data catalog.
Used AWS API Gateway to trigger Lambda functions.
Queried data residing in AWS S3 buckets with Athena.
Used AWS Step Functions to run data pipelines in AWS.
Used DynamoDB to store metadata and logs.
Monitored and managed services with AWS CloudWatch.
Performed transformations using Apache SparkSQL in AWS Glue.
Wrote Spark applications in Scala for data validation, cleansing, transformation, and custom aggregation.
Developed Spark code using Python/Scala and Spark-SQL for faster testing and data processing.
Tuned Spark to increase job's performance.
Used Dremio as a query engine for faster joins and complex queries over AWS S3 buckets, using Dremio data reflections.
Upgraded Lambda functions written in Python 2 to Python 3.
Tested the upgraded functions using PyCharm and pytest.
Loaded data into Snowflake on AWS as the data warehouse.
Worked on POCs for ETL with S3, EMR (Spark), and Snowflake.
Worked as part of the Big Data Engineering team and worked on pipeline creation activities in the Azure environment.
Used Apache Airflow to schedule and monitor jobs.
Used Talend to transfer datasets to an S3 bucket as a POC.
Used AWS DataSync to move data from NAS (EC2) storage to an S3 bucket.
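The Python 2 to Python 3 Lambda upgrade noted above can be sketched in a few lines; the handler name and the S3-style event shape are assumptions for illustration, not details from the actual project.

```python
import json

def lambda_handler(event, context):
    """Python 3 version of a small Lambda handler: dict access and
    comprehensions replace the removed iteritems(), and json.dumps
    returns str (not bytes) under Python 3 semantics."""
    records = event.get("Records", [])
    sizes = {r["s3"]["object"]["key"]: r["s3"]["object"]["size"]
             for r in records}
    return {"statusCode": 200, "body": json.dumps({"objects": sizes})}

# Hypothetical S3-notification-style event used only to exercise the handler.
event = {"Records": [{"s3": {"object": {"key": "a.csv", "size": 10}}}]}
result = lambda_handler(event, None)
```

Keeping the handler a plain function like this is what makes the pytest-based verification mentioned above straightforward: the event is just a dict, so no AWS resources are needed to test it.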
FORD MOTOR COMPANY, Dearborn, MI Jan 2019 – Dec 2019
Created a pipeline from a REST API and stored the data in a MySQL database.
Ingested data using Apache Flume and stored it in HDFS in JSON format.
Configured a Flume agent with REST API keys to ingest data.
Used Apache Spark and Spark Streaming to perform transformations.
Configured a JDBC driver in PySpark to connect to the MySQL database.
Wrote incremental imports into Hive tables.
Created DataFrames to store the processed results in tabular format.
Installed the Airflow engine to run multiple Spark jobs.
Wrote and optimized Spark SQL queries.
Performed ETL into the Hadoop file system (HDFS) and wrote Spark UDFs.
Established data consumption from information stored on Google Cloud.
Imported real-time logs to HDFS using Flume.
Created UNIX shell scripts to automate the build process and perform regular jobs like file transfers.
Used Hortonworks for installation of the Hortonworks cluster and performance monitoring.
Made incremental imports into Hive with Sqoop.
Managed Hadoop clusters and checked cluster status using Ambari.
Moved relational database data into Hive dynamic-partition tables with Sqoop, using staging tables.
Initiated data migration from/to traditional RDBMS with Apache Sqoop.
Developed scripts to automate workflow processes and generate reports.
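The REST-API-to-SQL ingestion pattern described above can be sketched as follows; sqlite3 stands in for MySQL so the example stays self-contained, and the table and record shapes are hypothetical.

```python
import json
import sqlite3

def ingest(records, conn):
    # Upsert API records into a SQL table; against MySQL the same logic
    # would run through a MySQL driver (or a JDBC connection from PySpark).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS telemetry (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO telemetry (id, payload) VALUES (?, ?)",
        [(r["id"], json.dumps(r)) for r in records],
    )
    conn.commit()

# In the real pipeline these records would come from the REST API response;
# a static sample keeps the sketch runnable.
sample = [{"id": 1, "temp": 21.4}, {"id": 2, "temp": 22.0}]
conn = sqlite3.connect(":memory:")
ingest(sample, conn)
count = conn.execute("SELECT COUNT(*) FROM telemetry").fetchone()[0]
```

Storing the raw payload as JSON alongside a primary key keeps the load idempotent on re-runs, which matters when the same API window is ingested twice.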
SERVIPERF S.A. C.V. / ATLANTA INTERNATIONAL AIRPORT, REMOTE MAR 2017 – DEC 2018
Created UNIX shell scripts to automate the build process and perform regular jobs like file transfers.
Wrote shell scripts to automate workflows that pull data from various databases into the file system.
Loaded data into a shared file system using the terminal.
Performed upgrades, patches, and bug fixes for Hadoop in a cluster environment.
Wrote shell scripts to automate the data-loading process.
Created test cases during two-week sprints using Agile methodology.
Designed data visualizations to present current impact and growth using Power BI.
Developed a Java program to clean up data streams from server logs.
Worked with a Java messaging system to deliver logs.
Monitored multiple SQL servers to improve query performance.
Configured, installed, and managed Hadoop binaries (PoC).
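The server-log cleanup mentioned above was originally a Java program; as an illustration only, the same drop-malformed-lines logic looks like this in Python, with an assumed common-log-style format and field names.

```python
import re

# Common-log-style pattern; the real log format at the site may differ.
LOG_LINE = re.compile(
    r'^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ '
    r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3})'
)

def clean_stream(lines):
    """Yield parsed fields from well-formed log lines, skipping malformed ones."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            yield m.groupdict()

sample = [
    '10.0.0.1 - - [01/Jan/2018:00:00:01 +0000] "GET /index HTTP/1.1" 200',
    'corrupted ### line',  # malformed input is dropped, not propagated
]
rows = list(clean_stream(sample))
```

Yielding dicts of named groups (rather than raw tuples) keeps downstream consumers, such as the Power BI extracts mentioned above, decoupled from field positions.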
METLIFE INSURANCE COMPANY, New York, NY Mar 2015 – Feb 2017
Hadoop Data Engineer
Worked with technology and business groups for Hadoop migration strategy.
Created MapReduce jobs for data processing using Java.
Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture.
Validated Hadoop infrastructure and data center planning against projected data growth.
Transferred data to and from clusters using Sqoop and various storage media such as flat files.
Developed MapReduce programs and Hive queries to analyze sales patterns and the customer satisfaction index over data in various relational database tables.
Orchestrated hundreds of Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
Worked extensively on performance optimization, deriving appropriate MapReduce design patterns by analyzing I/O latency, map time, combiner time, reduce time, etc.
Followed Agile methodology for the entire project.
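The sales-pattern jobs above were written in Java MapReduce; purely as an illustration of the map/shuffle/reduce aggregation pattern they follow, here is the same shape in plain Python (the sales fields are hypothetical).

```python
from collections import defaultdict

def map_phase(records):
    # Mapper: emit (key, value) pairs -- here (region, sale amount).
    for r in records:
        yield r["region"], r["amount"]

def reduce_phase(pairs):
    # Reducer: after the shuffle groups pairs by key, sum each group.
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Hypothetical sales records; the real jobs read from relational tables.
sales = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 30.0},
]
totals = reduce_phase(map_phase(sales))
```

Because the reduce step here is an associative sum, the same function can serve as a combiner, which is exactly the kind of combiner-time optimization referenced in the tuning work above.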
Bachelor of Mechatronics Engineering