Data Engineer Engineering

Location: Plant City, FL
Salary: 90000
Posted: October 15, 2025

Resume:

Aravind Reddy

****************@*****.*** — 904-***-**** — LinkedIn: linkedin.com/in/aravind45

Professional Summary

• 5+ years of experience in designing and scaling data engineering solutions, cloud migrations, and ETL optimization across Healthcare, Banking, and Insurance domains.

• Hands-on expertise in building real-time (Kafka, Spark Streaming, Flink) and batch (PySpark, Hive, Airflow, NiFi) data pipelines.

• Skilled in modern Lakehouse, Data Mesh, and Data Warehouse architectures using Snowflake, Databricks, Synapse, Redshift, BigQuery.

• Proficient in multi-cloud: AWS (EMR, Glue, Redshift, Lambda, S3), Azure (ADF, Synapse, Databricks, ADLS), GCP (BigQuery, Dataflow, Pub/Sub).

• Strong knowledge of CI/CD, IaC, and DevOps using Jenkins, GitLab CI, Terraform, Docker, Kubernetes, Helm, and Ansible.

• Experienced in data governance, quality, and compliance (HIPAA, GDPR, PCI, SOX) with Great Expectations, Deequ, Control-M, and Markit EDM.

Experience

Optum — Boston, MA Jan 2024 – Present

Data Engineer

• Addressed frequent SLA breaches by building exactly-once streaming from Amazon MSK to Spark on EMR, with DLQs in SQS; reduced breaches by 30%.

• Standardized the lakehouse on S3 + Apache Iceberg with Glue Catalog and Parquet; improved analytic query latency by 35%.

• Stopped duplicate and missing records from CDC sources using AWS DMS to S3 with watermarking and idempotent upserts in Spark; data defects reduced by 40%.

• SQL (Athena/Redshift/Snowflake): Wrote performance-critical queries for sessionization and SCD2 using window functions, partition pruning, and materialized views; reduced query cost by 20% and dashboard latency by 35%.

• Built batch backfills and feature prep on Databricks on AWS (Jobs, Workflows, Delta I/O) and loaded curated marts to Snowflake via Snowpipe; simplified model hand-offs to analytics.

• Orchestrated pipelines with MWAA (Airflow) and Step Functions, and alerted with CloudWatch; reduced mean time to detect incidents.

• Enforced HIPAA controls with Lake Formation row and column policies, IAM/KMS, and Macie; enabled safe self-serve access.

• Shipped data and infra via Terraform + CodeBuild/CodePipeline with Great Expectations checks; reduced manual release effort by 50%.

• Integrated Azure ADLS and GCP Pub/Sub feeds into S3 through API Gateway + Lambda and schema contracts; shortened partner onboarding time.

ICICI Bank — Hyderabad, India Jan 2021 – May 2023

Data Engineer

• Built fraud and transactions streaming with Kinesis Data Streams/Firehose and Flink on KDA; reduced false positives by 18%.

• SQL (Redshift/Snowflake): Designed star schema and authored high-impact queries using window functions, late-arrival handling, sort/dist keys, and materialized views; analyst query time reduced by 30%.

• Implemented CDC using DMS to S3 with Hudi tables and MERGE logic; prevented replay errors and late-arriving duplicates.

• Ran batch enrichment and periodic backfills on Databricks on AWS with Delta Lake formats; published curated datasets to Snowflake for BI and risk analytics.

• Met BCBS 239 and PCI needs with Lake Formation lineage, Glue schemas, masking, and KMS encryption; passed audits without findings.

• Orchestrated with MWAA and Step Functions, added retries and DLQs, and monitored with CloudWatch/OpenSearch; reduced unplanned downtime by 25%.

• Automated environments using CloudFormation/Terraform and CodeBuild; saved more than 20 engineering hours per month.

• Integrated partner APIs through API Gateway + Lambda with JSON Schema validation into S3, Glue, and Redshift/Snowflake; partner feed latency reduced by 45%.

Accenture — Bangalore, India Jun 2019 – Dec 2020

Data Engineer (Platform and Analytics Integration)

• Migrated batch and streaming ETL to EMR PySpark with Hive and NiFi landing Parquet/ORC in S3; processing throughput increased by 30%.

• SQL (Redshift/Athena/Snowflake/Hive): Implemented SCD2 merges, partitioned external tables, and CTE-based transformations; report refresh times reduced by 50% for more than 300 users.

• Implemented Databricks on AWS with Delta Lake for reliable upserts and backfills; stabilized downstream joins and reduced small-file issues through compaction.

• Warehousing on Redshift RA3 and Snowflake with Glue Catalog external tables; improved concurrency and lowered cost through workload management and compression.

• Ran pipelines with MWAA and Control-M, triggered by EventBridge, with metrics in CloudWatch; improved SLA adherence across daily loads.

• Enforced governance with Lake Formation permissions, KMS encryption, Macie for PII, and Great Expectations checks; reduced production rollbacks.

• Built CI/CD using Terraform + CodePipeline, containerized Spark jobs in ECR/EKS, and added data contract tests; deployment time reduced significantly.

• Integrated Azure Synapse and GCP BigQuery feeds into S3 with conformed schemas; simplified cross-cloud reporting and reduced manual reconciliation.

Technical Skills

Big Data & Streaming: Spark (PySpark/Scala), Kafka/MSK, Kinesis (Data Streams/Firehose), DMS (CDC), Flink, NiFi

AWS Data Platform: S3 (Bronze/Silver/Gold), Glue (Jobs, Crawlers, Data Quality, Schema Registry), EMR/EMR Serverless, Athena, Redshift/Serverless, Lake Formation, QuickSight

Lakehouse & Formats: Apache Hudi, Apache Iceberg, Delta Lake; Parquet, ORC, Avro; SCD2, CDC, data contracts

Orchestration/Messaging: MWAA (Airflow), Step Functions, EventBridge, SQS, SNS, Control-M

ETL/Modeling & BI: dbt (Redshift/Athena), Glue Studio, Informatica, Talend; Tableau, Power BI

Databases: Snowflake, Redshift, Athena/Glue Catalog, SQL Server, Oracle, PostgreSQL, MySQL, DynamoDB, MongoDB, Cassandra

Programming: Python, SQL, Scala, Java, Shell

DevOps & IaC: Terraform, CloudFormation, Jenkins/GitLab CI, Docker, Kubernetes/EKS, Helm

Security/Governance: IAM, KMS, Macie (PII), CloudTrail, Great Expectations, Deequ, (Amazon DataZone)

Observability: CloudWatch (metrics/logs/alarms), OpenSearch, Grafana

ML & DL Frameworks: scikit-learn, XGBoost, TensorFlow, Keras, PyTorch; Hugging Face Transformers

ML Adjacent (AWS): SageMaker (batch/real-time, Feature Store), MLflow (tracking)

Certifications

• AWS Certified Data Engineer – Associate

• Databricks Certified Data Engineer Associate

• Microsoft Certified: Azure Data Engineer Associate (in progress/optional)


