
Senior Cloud Data Engineer - ETL/ELT & Analytics Solutions

Location:
Miami, FL
Salary:
$150000
Posted:
December 11, 2025


Resume:

SANDEEP MENON
Tampa, FL - 55543
+1-656-***-****
*****.****@*****.***
linkedin.com

PROFILE SUMMARY

Senior Data Engineer with 8 years of experience delivering large-scale, cloud-based data solutions across AWS, Azure, and GCP. Skilled in building high-performance ETL/ELT pipelines, real-time streaming solutions, and scalable data warehouses using Databricks, Spark, Snowflake, BigQuery, and Synapse Analytics. Experienced in modern data architectures including data lakes, Delta/Medallion frameworks, and distributed processing systems, with expertise in workflow automation using Airflow, ADF, and Cloud Composer. Proficient in applying robust data quality frameworks (dbt, Great Expectations, Dataplex), performance tuning (partitioning, clustering, caching, workload optimization), and implementing secure, compliant cloud migration projects. Adept at collaborating with cross-functional teams to deliver reliable, cost-efficient, and production-ready data platforms that power critical analytics for fraud detection, credit risk modeling, and real-time customer intelligence.

WORK EXPERIENCE

Senior Data Engineer, Citibank, FL (Feb 2025 - Present)
Senior Data Engineer, Cambia Health, OR (Jan 2023 - Jan 2025)

Designed and implemented scalable data architectures using Databricks, AWS Glue, EMR, Lambda, Redshift, Step Functions, SNS, and DynamoDB.

Built and automated ETL workflows with AWS Glue and Lake Formation, integrating PySpark, Snowflake, and dbt for advanced data transformations. Optimized Snowflake queries and pipelines, reducing query runtimes and improving large-scale processing efficiency by 40%.

Developed real-time and batch data pipelines using GCP Dataproc, Dataflow, and BigQuery, enabling faster analytics and reporting across business units. Created and maintained Snowflake and dbt transformation frameworks, ensuring data quality, governance, and reusability.

Automated pipeline orchestration and monitoring with Airflow, improving reliability and reducing manual intervention.

Enhanced PySpark and Apache Beam workflows, improving performance and scalability for high-volume enterprise datasets.

EDUCATION

University of Southern California - 2023
D. Y. Patil College of Engineering & Technology, Kasaba Bawada - 2016

SKILLS

Big Data & Streaming: Apache Spark (Scala, Java, PySpark), Kafka, Hadoop, Delta Table, Apache Airflow

AWS: Glue, Redshift, EMR, Athena, Lambda, S3, Step Functions, Kinesis, DynamoDB, RDS, IAM, CloudWatch, CloudFormation.

GCP: BigQuery, Dataflow, Pub/Sub, Dataform, Data Fusion, Cloud Storage, Cloud Composer
Azure: ADF, Synapse, Databricks, Azure Functions, Cosmos DB, Azure Blob Storage, Azure Monitor

Data Warehousing & ETL: Snowflake (dbt, Snowpark, Streams & Tasks), Informatica, Talend, Matillion, SQL Server, PostgreSQL, MongoDB

DevOps & CI/CD: Jenkins, Docker, Kubernetes, Terraform, Git, CI/CD Pipelines
Security & Governance: RBAC, IAM Policies, Encryption, Data Governance (GDPR, HIPAA)
ML Platforms: Vertex AI

CITIBANK, TAMPA, FL IMPL. THROUGH LTIMINDTREE

SENIOR DATA ENGINEER FEB 2025 - PRESENT

Key Responsibilities:

Developed and optimized ETL pipelines on Databricks with PySpark, Delta Tables, and AWS EMR/Glue, leveraging techniques like OPTIMIZE, Z-Ordering, and Deletion Vectors in Delta Lake to handle large datasets efficiently.
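A minimal PySpark sketch of this kind of Delta Lake maintenance on Databricks; the table and column names (sales_db.transactions, event_date, customer_id) are illustrative placeholders rather than details from the role.

```python
# Sketch: Delta Lake table maintenance on Databricks (PySpark).
# Table and column names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-maintenance").getOrCreate()

# Enable deletion vectors so DELETE/UPDATE/MERGE mark rows instead of
# rewriting whole data files (assumes a recent Databricks runtime).
spark.sql("""
    ALTER TABLE sales_db.transactions
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")

# Compact small files and co-locate data on common filter columns (Z-Ordering),
# so queries filtering on these columns prune more files.
spark.sql("""
    OPTIMIZE sales_db.transactions
    ZORDER BY (event_date, customer_id)
""")

# Remove files no longer referenced by the table (default retention applies).
spark.sql("VACUUM sales_db.transactions")
```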

Implemented event-driven streaming solutions using Amazon Kinesis, Kafka (MSK), and AWS Lambda, enabling real-time ingestion and analytics with low latency.

Built and maintained data warehousing solutions on Amazon Redshift and Snowflake, applying partitioning, clustering, and materialized views to manage petabyte-scale data and reduce query latency.

Automated ETL workflows with AWS Glue bookmarks, crawlers, DMS, and Lambda functions for incremental loads, schema evolution, and cost-effective processing.

Migrated data from on-prem systems (Oracle, SQL Server) into AWS Aurora, PostgreSQL, and Snowflake using AWS DMS, SCT, SSIS, and Python-based automation.

Created ingestion pipelines from diverse sources using Apache Sqoop, Python JDBC, Pandas, NumPy, Spark, and Iceberg connectors to ensure seamless integration across systems.

Implemented machine learning pipelines on AWS SageMaker, Glue ML Transforms, and DataRobot, embedding MLOps practices for automated model deployment.

Built and optimized ETL jobs in Matillion for large-scale data transformation, reducing processing time by ~30% through job parallelization and tuning.
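A sketch of the event-driven ingestion pattern described above, written as a PySpark Structured Streaming job reading from Kafka/MSK; the broker address, topic, schema, and S3 paths are assumed placeholders, and the Kinesis variant would use a separate connector.

```python
# Sketch: low-latency ingestion from Kafka (MSK) into a Delta table.
# Brokers, topic, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

schema = StructType([
    StructField("txn_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "b-1.msk.example:9092")
       .option("subscribe", "transactions")
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers bytes; parse the JSON payload into typed columns.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("r"))
          .select("r.*"))

query = (parsed.writeStream
         .format("delta")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/transactions")
         .outputMode("append")
         .start("s3://example-bucket/delta/transactions"))
query.awaitTermination()
```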

Applied Unity Catalog for centralized governance, implementing row-level security and lineage tracking to meet Citi’s compliance requirements.
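A hedged sketch of Unity Catalog row-level security of the kind mentioned above, expressed as Databricks SQL issued from PySpark; the catalog, schema, table, filter function, and group names are placeholders, not Citi's actual objects.

```python
# Sketch: Unity Catalog row-level security via a row filter function (Databricks).
# All object and group names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# SQL function deciding row visibility: a compliance group sees all rows,
# everyone else only sees one region's rows.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.security.region_filter(region STRING)
    RETURN is_account_group_member('compliance_analysts') OR region = 'US'
""")

# Attach the filter; Unity Catalog then applies it on every read of the table.
spark.sql("""
    ALTER TABLE main.risk.transactions
    SET ROW FILTER main.security.region_filter ON (region)
""")

# Standard Unity Catalog grant for governed read access.
spark.sql("GRANT SELECT ON TABLE main.risk.transactions TO `data_analysts`")
```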

Developed pipelines in Python, AWS Lambda, and Snowflake to automate ingestion, transformations, and reporting, and created business insights dashboards using Streamlit.

Leveraged Snowflake advanced features such as Streams, Tasks, Snowpark, and Stored Procedures, and orchestrated workflows with Airflow for dependency tracking and SLA monitoring.

Designed knowledge graph pipelines with Neo4j for complex relationship analytics, using Cypher queries to support fraud detection and compliance use cases.

Built pipelines on AWS Glue Lakehouse integrating S3, Iceberg, and Snowflake to manage structured and unstructured data for scalable analytics.
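A minimal sketch of the Snowflake Streams and Tasks pattern referenced above, driven from Python via the Snowflake connector; the connection parameters, warehouse, and table names are assumed placeholders.

```python
# Sketch: incremental processing with Snowflake Streams and Tasks.
# Connection parameters and object names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="STAGE",
)
cur = conn.cursor()

# Stream captures row changes on the raw table since the last consumption.
cur.execute("CREATE STREAM IF NOT EXISTS raw_orders_stream ON TABLE raw_orders")

# Task runs on a schedule and only does work when the stream has new data.
cur.execute("""
    CREATE OR REPLACE TASK load_orders_task
      WAREHOUSE = ETL_WH
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('RAW_ORDERS_STREAM')
    AS
      INSERT INTO ANALYTICS.CURATED.ORDERS
      SELECT order_id, customer_id, amount, order_ts
      FROM raw_orders_stream
      WHERE METADATA$ACTION = 'INSERT'
""")

# Tasks are created suspended; resume to start the schedule.
cur.execute("ALTER TASK load_orders_task RESUME")
cur.close()
conn.close()
```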

CAMBIA HEALTH, PORTLAND, OR IMPL. THROUGH CITIUSTECH

SENIOR DATA ENGINEER JAN 2023 - JAN 2025

Key Responsibilities:

Developed enterprise-scale ETL/ELT pipelines using Databricks (Delta Tables, Auto Loader, Pipelines, and Workflows), Azure Synapse Analytics, Snowflake, ADF, and dbt to process structured and semi-structured financial and accounting data.
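A short sketch of the Databricks Auto Loader ingestion mentioned above; the storage paths, schema location, and target table are assumed placeholders, and a recent Databricks runtime is assumed.

```python
# Sketch: Auto Loader (cloudFiles) ingesting semi-structured files into a Bronze
# Delta table. Paths and table names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

bronze = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation",
                  "abfss://lake@acct.dfs.core.windows.net/_schemas/ledger")
          .load("abfss://lake@acct.dfs.core.windows.net/raw/ledger/"))

(bronze.writeStream
 .format("delta")
 .option("checkpointLocation",
         "abfss://lake@acct.dfs.core.windows.net/_checkpoints/ledger")
 .outputMode("append")
 .trigger(availableNow=True)   # drain the current backlog, then stop
 .toTable("finance.bronze_ledger"))
```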

Migrated legacy ADF/Synapse pipelines into PySpark- and dbt-based solutions on Databricks, improving processing speed and maintainability by over 40%.

Built real-time streaming pipelines using Apache Spark Streaming, Delta Live Tables, and Snowflake Streaming with watermarking, windowing, and aggregation logic to deliver low-latency financial insights.

Designed incremental processing frameworks in PySpark and Python to enable near real-time ingestion and reduce pipeline latency.
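A minimal sketch of the watermarking and windowed-aggregation logic described above, in Spark Structured Streaming; the source table, columns, and checkpoint path are assumed placeholders.

```python
# Sketch: watermarked, windowed aggregation for low-latency metrics.
# Table and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window, sum as sum_

spark = SparkSession.builder.getOrCreate()

events = spark.readStream.table("finance.bronze_ledger")

# Tolerate events arriving up to 10 minutes late, then aggregate per 5-minute window.
per_window = (events
              .withWatermark("event_time", "10 minutes")
              .groupBy(window(col("event_time"), "5 minutes"), col("account_id"))
              .agg(sum_("amount").alias("total_amount")))

(per_window.writeStream
 .format("delta")
 .option("checkpointLocation", "/tmp/checkpoints/ledger_5min")
 .outputMode("append")   # a window is emitted once the watermark passes its end
 .toTable("finance.silver_ledger_5min"))
```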

Created data models in Snowflake and Synapse Analytics, applying dbt standards for transformations, query optimization, and scalable financial reporting.

Orchestrated workflows with ADF, Airflow, Terraform, and dbt Cloud to automate pipelines, achieving 99.9% uptime with strong error handling.

Tuned Spark and Snowflake jobs using partitioning, clustering, caching, and parallelization, reducing execution time and compute cost by up to 30%.
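A sketch of the partitioning and caching pattern behind this kind of Spark tuning; the table names, join key, and partition column are illustrative assumptions.

```python
# Sketch: caching a reused dimension and writing a fact table partitioned by
# its most common filter column. Names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

txns = spark.read.table("finance.silver_transactions")

# Cache a dimension that several downstream joins in the same job reuse.
accounts = spark.read.table("finance.dim_accounts").cache()

enriched = (txns.join(accounts, "account_id")
            .filter(col("posting_date") >= "2024-01-01"))

# Partition the output by the column most queries filter on, so readers prune partitions.
(enriched.repartition("posting_date")
 .write.mode("overwrite")
 .partitionBy("posting_date")
 .saveAsTable("finance.gold_transactions"))
```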

Applied governance and compliance controls with Azure Purview, Unity Catalog, and Snowflake RBAC for lineage, metadata, and secure access.

Worked closely with finance and engineering teams to deliver efficient technical solutions aligned with business reporting needs.

Delivered hands-on solutions using Python, PySpark, dbt, Databricks, Snowflake, and Azure-native tools to modernize financial analytics workflows.

Key Responsibilities:

Develop high-level and detailed solution architectures for data engineering projects, ensuring alignment with business objectives and scalability.

Design solutions using big data technologies such as PySpark on Amazon EMR, AWS Glue, and related services, and build real-time data processing with services like Amazon Kinesis.

Design and optimize data warehouses using services like Amazon Redshift, and implement strategies to optimize query performance for large datasets.

Design and implement data models that meet business needs and ensure efficient data storage and retrieval

Set up monitoring and logging for data pipelines and warehouses. Diagnose and resolve issues in data pipelines and architectures

Effectively communicate technical decisions, challenges, and solutions to both technical and non-technical stakeholders.

Extensive hands-on experience leveraging PySpark, Spark with Scala and Java on Amazon EMR, and Hadoop with Java for large-scale data processing, ensuring efficient ETL workflows and optimal resource utilization.

Data ingestion using Apache Sqoop, Python JDBC applications, Spark connectors, and Fivetran.

Built and optimized data pipelines in Snowflake using Streams, Tasks, and Stored Procedures, while applying transformation frameworks with dbt to enforce modeling standards, improve query performance, and enable scalable, governed analytics.

HEALTHFIRST IMPL. THROUGH TCS

DATA ENGINEER JAN 2021 - NOV 2022

Key Responsibilities:

Developed enterprise-scale data platforms on GCP using BigQuery, Dataproc, Cloud SQL, and GCS, ensuring scalable and secure infrastructure for analytics and reporting.

Migrated financial and operational datasets from on-prem systems to GCP using Database Migration Service (DMS), Change Data Capture (CDC), and custom ETL scripts, ensuring zero data loss and minimal downtime.

Built and optimized real-time and batch pipelines with Apache Beam, Dataproc (Spark), and Dataflow to process both streaming and historical data at scale.

Implemented ML and feature pipelines in Vertex AI Pipelines by integrating Dataflow, BigQuery, and custom models for fraud detection and forecasting use cases.

Applied data quality checks using GCP Dataplex, integrating profiling, anomaly detection, and rule-based validations across BigQuery and GCS assets.
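A small Apache Beam sketch of the batch pipeline shape described above; the bucket, file format, and output path are assumed placeholders, and it runs on DirectRunner locally or on Dataflow with the appropriate runner options.

```python
# Sketch: Apache Beam pipeline aggregating transaction amounts per account.
# Paths and the input record format are illustrative placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_csv(line: str):
    # Expects "txn_id,account_id,amount" per line (assumed format).
    txn_id, account_id, amount = line.split(",")
    return {"txn_id": txn_id, "account_id": account_id, "amount": float(amount)}

options = PipelineOptions()  # add --runner=DataflowRunner, --project, --region for Dataflow

with beam.Pipeline(options=options) as p:
    (p
     | "Read" >> beam.io.ReadFromText("gs://example-bucket/raw/txns.csv")
     | "Parse" >> beam.Map(parse_csv)
     | "KeyByAccount" >> beam.Map(lambda r: (r["account_id"], r["amount"]))
     | "SumPerAccount" >> beam.CombinePerKey(sum)
     | "Format" >> beam.Map(lambda kv: f"{kv[0]},{kv[1]}")
     | "Write" >> beam.io.WriteToText("gs://example-bucket/agg/per_account"))
```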

Managed data governance and sharing with Analytics Hub, configuring secure access, curated datasets, and lineage tracking for internal and external consumers.

Designed and maintained dimensional and analytical models aligned with finance and operations metrics, ensuring reusable and trusted datasets across teams.

Orchestrated pipelines using Apache Airflow and Cloud Composer with DAG-based scheduling, error handling, and dependency management.
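A minimal Airflow DAG sketch of the scheduling, retry, and dependency handling described above (as deployed on Cloud Composer); the DAG ID, task logic, and schedule are assumed placeholders.

```python
# Sketch: Airflow / Cloud Composer DAG with retries and a simple dependency chain.
# DAG ID, schedule, and task bodies are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**_):
    print("pull from source system")

def load_bigquery(**_):
    print("load staged files into BigQuery")

default_args = {
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="finance_daily_load",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 6 * * *",   # daily at 06:00
    catchup=False,
    default_args=default_args,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load_bigquery", python_callable=load_bigquery)

    extract_task >> load_task   # load only runs after extract succeeds
```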

Implemented data quality frameworks with dbt-utils, dbt-expectations, Apache Griffin, and Great Expectations to ensure accuracy and completeness of critical datasets.

Optimized BigQuery and Cloud SQL performance and cost through partitioning, clustering, indexing, and materialized views, reducing query costs by ~25%.
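A sketch of the BigQuery cost and performance levers named above (partitioning, clustering, a materialized view), issued through the google-cloud-bigquery client; the project, dataset, and table names are assumed placeholders.

```python
# Sketch: partitioned/clustered BigQuery table plus a materialized view.
# Project, dataset, and table names are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Date-partitioned, clustered fact table: queries filtering on txn_date and
# account_id scan far fewer bytes.
client.query("""
    CREATE TABLE IF NOT EXISTS finance.fact_transactions (
      txn_id STRING,
      account_id STRING,
      amount NUMERIC,
      txn_date DATE
    )
    PARTITION BY txn_date
    CLUSTER BY account_id
""").result()

# Materialized view precomputes a common aggregate; BigQuery keeps it refreshed.
client.query("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS finance.mv_daily_totals AS
    SELECT txn_date, account_id, SUM(amount) AS total_amount
    FROM finance.fact_transactions
    GROUP BY txn_date, account_id
""").result()
```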

Automated infrastructure provisioning and deployment using Terraform, Cloud Build, and Jenkins, enabling CI/CD for data services.

Created Tableau dashboards and data stories for finance, operations, and leadership teams, enabling data-driven decision-making.

PAYU FINANCE, LAZYPAY, BANGALORE IMPL. THROUGH AAKAR AI

DATA ENGINEER JAN 2018 - NOV 2020


