Aravind Reddy
****************@*****.*** — 904-***-**** — LinkedIn: linkedin.com/in/aravind45
Professional Summary
• 5+ years of experience in designing and scaling data engineering solutions, cloud migrations, and ETL optimization across Healthcare, Banking, and Insurance domains.
• Hands-on expertise in building real-time (Kafka, Spark Streaming, Flink) and batch (PySpark, Hive, Airflow, NiFi) data pipelines.
• Skilled in modern Lakehouse, Data Mesh, and Data Warehouse architectures using Snowflake, Databricks, Synapse, Redshift, BigQuery.
• Proficient in multi-cloud: AWS (EMR, Glue, Redshift, Lambda, S3), Azure (ADF, Synapse, Databricks, ADLS), GCP (BigQuery, Dataflow, Pub/Sub).
• Strong knowledge of CI/CD, IaC, and DevOps using Jenkins, GitLab CI, Terraform, Docker, Kubernetes, Helm, and Ansible.
• Experienced in data governance, quality, and compliance (HIPAA, GDPR, PCI, SOX) with Great Expectations, Deequ, Control-M, and Markit EDM.
Experience
Optum — Boston, MA Jan 2024 – Present
Data Engineer
• Addressed frequent SLA breaches by building exactly-once streaming from Amazon MSK to EMR Spark with DLQs in SQS; reduced SLA breaches by 30%.
• Standardized the lakehouse on S3 + Apache Iceberg with Glue Catalog and Parquet; improved analytic query latency by 35%.
• Stopped duplicate and missing records from CDC sources using AWS DMS to S3 with watermarking and idempotent upserts in Spark; data defects reduced by 40%.
• SQL (Athena/Redshift/Snowflake): Wrote performance-critical queries for sessionization and SCD2 using window functions, partition pruning, and materialized views; reduced query cost by 20% and dashboard latency by 35%.
• Built batch backfills and feature prep on Databricks on AWS (Jobs, Workflows, Delta I/O) and loaded curated marts to Snowflake via Snowpipe; simplified model hand-offs to analytics.
• Orchestrated pipelines with MWAA (Airflow) and Step Functions, and alerted with CloudWatch; reduced mean time to detect incidents.
• Enforced HIPAA controls with Lake Formation row and column policies, IAM/KMS, and Macie; enabled safe self-serve access.
• Shipped data and infra via Terraform + CodeBuild/CodePipeline with Great Expectations checks; reduced manual release effort by 50%.
• Integrated Azure ADLS and GCP Pub/Sub feeds into S3 through API Gateway + Lambda and schema contracts; shortened partner onboarding time.
ICICI Bank — Hyderabad, India Jan 2021 – May 2023
Data Engineer
• Built fraud and transactions streaming with Kinesis Data Streams/Firehose and Flink on KDA; reduced false positives by 18%.
• SQL (Redshift/Snowflake): Designed star schema and authored high-impact queries using window functions, late-arrival handling, sort/dist keys, and materialized views; analyst query time reduced by 30%.
• Implemented CDC using DMS to S3 with Hudi tables and MERGE logic; prevented replay errors and late-arriving duplicates.
• Ran batch enrichment and periodic backfills on Databricks on AWS with Delta Lake formats; published curated datasets to Snowflake for BI and risk analytics.
• Met BCBS 239 and PCI needs with Lake Formation lineage, Glue schemas, masking, and KMS encryption; passed audits without findings.
• Orchestrated with MWAA and Step Functions, added retries and DLQs, and monitored with CloudWatch/OpenSearch; reduced unplanned downtime by 25%.
• Automated environments using CloudFormation/Terraform and CodeBuild; saved more than 20 engineering hours per month.
• Integrated partner APIs through API Gateway + Lambda with JSON Schema validation into S3, Glue, and Redshift/Snowflake; partner feed latency reduced by 45%.
Accenture — Bangalore, India Jun 2019 – Dec 2020
Data Engineer (Platform and Analytics Integration)
• Migrated batch and streaming ETL to EMR PySpark with Hive and NiFi landing Parquet/ORC in S3; processing throughput increased by 30%.
• SQL (Redshift/Athena/Snowflake/Hive): Implemented SCD2 merges, partitioned external tables, and CTE-based transformations; report refresh times reduced by 50% for more than 300 users.
• Implemented Databricks on AWS with Delta Lake for reliable upserts and backfills; stabilized downstream joins and reduced small-file issues through compaction.
• Ran warehousing on Redshift RA3 and Snowflake with Glue Catalog external tables; improved concurrency and lowered cost through workload management and compression.
• Ran pipelines with MWAA and Control-M, triggered by EventBridge, with metrics in CloudWatch; improved SLA adherence across daily loads.
• Enforced governance with Lake Formation permissions, KMS encryption, Macie for PII, and Great Expectations checks; reduced production rollbacks.
• Built CI/CD using Terraform + CodePipeline, containerized Spark jobs in ECR/EKS, and added data contract tests; deployment time reduced significantly.
• Integrated Azure Synapse and GCP BigQuery feeds into S3 with conformed schemas; simplified cross-cloud reporting and reduced manual reconciliation.
Technical Skills
Big Data & Streaming: Spark (PySpark/Scala), Kafka/MSK, Kinesis (Data Streams/Firehose), DMS (CDC), Flink, NiFi
AWS Data Platform: S3 (Bronze/Silver/Gold), Glue (Jobs, Crawlers, Data Quality, Schema Registry), EMR/EMR Serverless, Athena, Redshift/Serverless, Lake Formation, QuickSight
Lakehouse & Formats: Apache Hudi, Apache Iceberg, Delta Lake; Parquet, ORC, Avro; SCD2, CDC, data contracts
Orchestration/Messaging: MWAA (Airflow), Step Functions, EventBridge, SQS, SNS, Control-M
ETL/Modeling & BI: dbt (Redshift/Athena), Glue Studio, Informatica, Talend; Tableau, Power BI
Databases: Snowflake, Redshift, Athena/Glue Catalog, SQL Server, Oracle, PostgreSQL, MySQL, DynamoDB, MongoDB, Cassandra
Programming: Python, SQL, Scala, Java, Shell
DevOps & IaC: Terraform, CloudFormation, Jenkins/GitLab CI, Docker, Kubernetes/EKS, Helm
Security/Governance: IAM, KMS, Macie (PII), CloudTrail, Great Expectations, Deequ, (Amazon DataZone)
Observability: CloudWatch (metrics/logs/alarms), OpenSearch, Grafana
ML & DL Frameworks: scikit-learn, XGBoost, TensorFlow, Keras, PyTorch; Hugging Face Transformers
ML Adjacent (AWS): SageMaker (batch/real-time, Feature Store), MLflow (tracking)
Certifications
• AWS Certified Data Engineer – Associate
• Databricks Certified Data Engineer Associate
• Microsoft Certified: Azure Data Engineer Associate (in progress/optional)