Sergio D. Munoz
Senior Data Engineer / Analyst
******.*********@*****.***
Greenville, TX
SUMMARY
Experienced Senior Data Engineer with over 7 years of expertise in designing, building, and optimizing scalable data pipelines across AWS, Azure, and GCP. Proficient in Python, SQL, PySpark, and Databricks, with a strong focus on ETL processes, data modeling, and real-time streaming. Adept at leveraging tools such as DBT, Airflow, and Kafka to deliver high-performance, cloud-native solutions while upholding data quality and governance.
SKILLS
Languages: Python, C++, SQL, Scala, Linux Shell Scripting
Data Technologies: Apache Spark, Hadoop, Databricks, Kafka, Apache Iceberg
Cloud & DevOps: Docker, Kubernetes, Linux/Unix, Bash, AWS (Lambda, Step Functions, EKS, S3, Glue, Kinesis, CloudWatch, etc.), GCP (GKE, Cloud Functions, BigQuery, etc.), Azure (Event Hubs, Synapse, AKS), CI/CD, CircleCI, Datadog, Redis, Git, Terraform, Apache Airflow
Tools & Libraries: OpenAI (LLM), NumPy, Pandas, PySpark, PyTorch, Power BI, Apache Superset
WORK EXPERIENCE
Allocations - Miami, FL
Senior Data Engineer Aug 2024 - Present
● Launched and optimized a scalable ETL pipeline on AWS using Lambda, S3, and Redshift to seamlessly integrate investor data and financial transactions into the platform, reducing end-to-end data processing time by 30%.
● Built and maintained high-performance data models in AWS Redshift and leveraged Kafka for real-time streaming of investor onboarding data, capital flows, and fund performance metrics to support SPV management.
● Implemented data governance and security protocols using Great Expectations for data quality validation and AWS IAM for access control, ensuring GDPR and SEC compliance. Monitored pipelines with CloudWatch and Datadog to safeguard data privacy and stability, reducing system downtime by 15%.
● Developed an ELT pipeline, “HappyETL”, using Kafka, Airflow, and modular Python libraries, slashing batch latency by 60% and cutting integration time by 30% while standardizing ETL templates to reduce code duplication by 40%.
● Implemented self-service BI using Metabase by collaborating with executives on KPIs and reporting strategies, enabling 50+ business users to access analytics independently and reducing dashboard update times from hours to minutes.
● Architected a serverless data processing solution using AWS Glue, Airflow, Lambda, and Step Functions that transforms and loads data from various sources into an S3 data lake (Parquet format), reducing operational costs by 40% and accelerating data workflows.
● Mentored junior data engineers on best practices for data pipeline design and SQL optimization, and conducted code reviews and knowledge-sharing sessions.
Astro Sirens - Austin, TX
Full Stack Data Engineer May 2020 - Jul 2024
● Built and optimized ETL pipelines using Azure Data Factory and Databricks to process healthcare patient records and insurance claims for real-time reporting on claims and medical outcomes, maintaining HIPAA compliance through secure data handling with Azure IAM and Azure Key Vault.
● Developed retail data models focused on sales and customer transaction data to power a recommendation engine for personalized marketing, integrating data from multiple sources, including POS and inventory management systems, to deliver actionable insights into product trends and stock levels.
● Managed ETL pipelines using AWS Glue, Lambda, S3, and Redshift to ensure smooth integration and transformation of large-scale enterprise data while optimizing cost efficiency and system performance.
● Enhanced the performance of complex PL/SQL code, including stored procedures, functions, and triggers, to support transactional and analytical workflows in a high-volume relational database environment, improving data processing speed by 20%.
● Led the integration of real-time streaming data from Kafka and Event Hubs into Azure Data Lake Storage (ADLS) to manage data ingestion and process structured and unstructured data in support of AI/ML pipelines and business intelligence solutions.
● Spearheaded the development of data models and dimensional models (Star/Snowflake schema) for high-performance reporting and data analytics, contributing to accurate insights across multiple departments, including AI/ML, business intelligence and compliance.
● Optimized data workflows using Azure Data Factory and Databricks, delivering seamless data transformation processes and ensuring data consistency across multiple platforms, reducing pipeline execution time by 30%.
● Implemented robust data governance practices using Azure Purview to enforce compliance with HIPAA and GDPR across all data operations, enhancing data security and maintaining regulatory standards.
● Set up CI/CD pipelines for data workflows using GitHub Actions and Azure DevOps, automating the deployment of data transformation models and establishing a reliable, fault-tolerant delivery process.
● Collaborated with backend teams to design REST APIs across multiple projects using Flask/FastAPI for efficient API handling and integration with the data platform, improving operational efficiency by 35%.
● Engineered data pipelines for insurance data analytics focused on fraud detection, processing large datasets of claims and policyholder information. Applied SQL and PySpark data transformations to improve the fraudulent-claim detection rate by 25%.
Meta - Fort Worth, TX
Data Engineer Oct 2017 - May 2020
● Engineered data pipelines for personalized product ranking using Apache Spark integrated with TensorFlow, improving search and browse relevance and driving a 12% uplift in user engagement.
● Built and deployed real-time product recommendation systems leveraging Apache Kafka for event streaming and SQL transformations, ensuring efficient and scalable data processing.
● Maintained robust data pipelines for fraud detection, collaborating with the product team to implement ETL processes using XGBoost for anomaly detection and ensuring timely data integration.
● Built A/B testing pipelines for email campaign optimization, using Airflow for scheduling and orchestration, resulting in an 18% increase in conversion rates through streamlined data flows.
● Revamped feature extraction pipelines for multi-modal data, including product descriptions, images, and user clickstream logs, using Apache Spark for large-scale data processing and AWS S3 for storage.
● Developed and implemented data quality monitoring systems using Apache Airflow and DBT, automating the detection of upstream data issues and ensuring reliable data pipelines.
● Collaborated with marketing teams to create scalable pipelines for customer lifetime value prediction, integrating data from multiple sources using AWS Glue and Redshift.
IBM - Austin, TX
Business Intelligence Intern Jan 2017 - Jun 2017
● Designed Kimball data warehouse models in Oracle Database, enabling efficient storage and retrieval of financial data, improving query performance by 40%.
● Revamped the AML component of the company's internal reporting system, cutting ETL time by 30%, reporting time by 20%, and data warehouse size by 40%.
EDUCATION
Master of Science in Computer Science
University of Texas at El Paso (Aug 2015 - Aug 2017)
Bachelor of Science in Computer Science
University of Texas at El Paso (Jan 2011 - Aug 2015)
CERTIFICATES
AWS Certified Solutions Architect - Amazon Web Services, 2022