Rey Amoodi
Lead Data Engineer | Cloud & Big Data Specialist | Real-Time Pipeline Architect
*********@*****.*** Chicago, Illinois 60604
Profile
Experienced Lead Data Engineer with 6+ years building scalable, cloud-native data platforms across AWS and GCP. Skilled in designing ETL/ELT pipelines using PySpark, Airflow, Snowflake, and Dagster, with hands-on experience in EMR, Kinesis, Athena, and BigQuery. Adept at leading engineering teams, optimizing distributed systems, and delivering production-grade streaming and batch pipelines. Creator of a top-rated Udemy course on Big Data with PySpark and AWS, with a passion for automation, performance tuning, and innovation in data infrastructure.
Skills
Programming & Tools: Python, PySpark, Scala, SQL, Git, GitHub, Shell Scripting, Query Optimization
Cloud Platforms: AWS (S3, EC2, RDS, Glue, Athena, EMR, Kinesis, Redshift, Lambda), GCP (BigQuery, GCS, Dataflow, Dataproc, Pub/Sub, Cloud Storage), Databricks, Azure
Data Platforms: Snowflake, Redshift, Delta Lake, RDS, MPP Databases
Governance & Security: GDPR & HIPAA Compliance, PCI-DSS Controls, Encryption, RBAC, Audit Logging
Data Architecture: Data Mesh, Lakehouse (Delta Lake, Databricks), Snowflake, BigQuery, Redshift, PostgreSQL, Data Modeling (Star Schema, Data Vault), Data Contracts & SLAs
ETL & Orchestration: Apache Flink, Kafka, Amazon Kinesis, Elasticsearch, OpenSearch
Modeling & BI: Data Architecture, Dimensional Modeling, Power BI
CI/CD & DevOps: Docker, Kubernetes, Terraform, GitHub Actions, CI/CD Pipelines
Optimization: Spark Tuning, Caching, Partitioning, Predicate Pushdown, Indexing
Professional Experience
Lead Data Engineer
Self Employed/Contractor
04/2023 – Present
Designed and deployed scalable ETL pipelines using Airbyte, Dagster, dbt, and Snowflake to streamline ingestion and transformation.
Architected a high-availability data platform using Spark, Kubernetes, and Airflow, improving job success rate and system resilience.
Led development of streaming pipelines with Amazon Kinesis and EMR to process real-time data from IoT and telemetry sources.
Implemented cost-optimized query layers using Athena and AWS Glue, enabling serverless analytics at scale.
Integrated BigQuery for cross-cloud reporting and business intelligence across diverse datasets.
Mentored junior engineers, enforced code quality standards, and introduced infrastructure-as-code practices using Terraform.
Lead Data Engineer
Verizon
06/2020 – 01/2023
Built and maintained ETL/ML pipelines across Snowflake, GCS, and BigQuery using PySpark and Airflow to support demand forecasting.
Led development of a custom pipeline to extract and structure email data from PST files using Codex APIs, Snowflake, and Delta Lake.
Utilized EMR and Kinesis to implement near-real-time ingestion for support analytics and customer interaction modeling.
Designed Snowpipes and temporary views to simplify data access for analytics teams and reduce transformation time.
Partnered with cross-functional teams to deliver scalable, production-grade pipelines aligned with business objectives.
Data Engineer
Swoon Staffing
11/2019 – 06/2020
Developed a change data capture (CDC) pipeline to replicate data from AWS RDS to S3 and Snowflake, ensuring low-latency sync.
Built Spark streaming and batch processing jobs for COVID-19 analytics using Kinesis, EMR, Redshift, and Snowflake.
Improved pipeline efficiency through partitioning, broadcast joins, and schema management strategies.
Contributed to data architecture and governance efforts, including schema evolution, validation layers, and metadata tracking.
Built and automated ETL pipelines in Python, SQL, and Airflow to integrate data from APIs, FTPs, and flat files into PostgreSQL and Snowflake.
Designed BI dashboards in Tableau & Looker for operational KPIs (inventory, supplier, sales), enabling near real-time reporting for finance and merchandising.
Software Engineer
ArbiSoft
04/2019 – 10/2019
Created data scrapers using Python, Scrapy, and BeautifulSoup to collect structured data from 50+ websites across multiple domains.
Developed a Django-based inventory system integrated with PySpark and RDS, handling real-time stock and invoice management.
Designed multi-language UI features and REST APIs for key inventory and procurement modules.
Key Projects
Course Creator – Big Data with PySpark & AWS
Developed and launched a comprehensive course on Udemy, now with 5,000+ students worldwide. The course includes real-world projects covering PySpark, AWS Glue, S3, Athena, and Databricks.
Data Platform Development
Led the architecture and implementation of a modern data platform leveraging Apache Spark, Snowflake, Airflow, and Kubernetes. Designed modular ETL components and scalable infrastructure, reducing processing time by 40% and supporting multi-tenant analytics at scale.
Email Analytics Pipeline
Built ETL pipelines to extract email data from PST files, apply Codex-based validations, and load structured outputs into Delta Lake, Snowflake, and Elasticsearch. Enabled analytics on customer support interactions and ticket resolution.
Time Series Forecasting Pipeline
Designed an end-to-end PySpark pipeline integrating BigQuery, GCS, and Airflow to process historical sales data and apply machine learning models (FBProphet), achieving a 15% increase in forecast accuracy.
Education
Bachelor of Science in Computer Science