Data Engineer - AWS/Azure Cloud & ETL Expert

Location:

Springfield, IL, 62704

Salary:

$75,000

Posted:

March 20, 2026

Resume:

Sumanth M.

Data Engineer

*+ years’ experience: React, Spring Boot, AWS, Microservices

Work Authorization: F1 OPT (STEM eligible until 2027)

Open to relocation anywhere in the U.S.

Illinois, USA

+1-217-***-**** **************@*****.***

LinkedIn: http://linkedin.com/in/sumanth-marri-a0a483330

PROFESSIONAL SUMMARY

Data Engineer with ~5 years of experience designing scalable cloud data platforms across AWS and Azure.

Proven expertise in building end-to-end ETL/ELT pipelines using Airflow, Azure Data Factory, AWS Glue, and Databricks.

Architected modern data lake and lakehouse solutions using Delta Lake and a Bronze–Silver–Gold architecture.

Strong background in Spark (PySpark & Scala) for large-scale distributed processing and performance tuning.

Delivered reliable real-time and batch pipelines using Kinesis, Kafka, and Azure Event Hubs.

Improved analytics readiness and decision support through optimized data models and warehousing in Redshift & Synapse.

Implemented data quality, governance, and security controls including IAM, RBAC, encryption, and PII compliance.

Experienced in CI/CD automation, monitoring, and cost-efficient cloud architectures.

Exposure to AI-assisted anomaly detection and predictive insights for data quality monitoring.

TECHNICAL SKILLS

Cloud Platforms: AWS (S3, EMR, Redshift, Glue, Lambda, Kinesis, DynamoDB, IAM, CloudWatch); Azure (ADF, ADLS Gen2, Databricks, Synapse, Event Hubs, Functions, Logic Apps).

Data Engineering & ETL: Apache Airflow, AWS Glue, Azure Data Factory, Databricks, Delta Lake, ETL/ELT, Pipelines, Bronze–Silver–Gold Architecture.

Big Data & Streaming: Apache Spark (PySpark, Scala), Hadoop, Hive, Kafka, Kinesis Streams, Azure Event Hubs.

Databases & Warehousing: Amazon Redshift, Azure Synapse, SQL, Dimensional Modeling, Star & Snowflake Schemas, SCD.

DevOps, Security & Observability: Git, Azure DevOps, AWS CodePipeline, CI/CD, IAM, RBAC, Data Governance, CloudWatch, Azure Monitor.

Data Analytics & AI Exposure: Pandas, Trend Analysis, Anomaly Detection, Predictive Insights, Data Quality Monitoring.

PROFESSIONAL EXPERIENCE

MetLife — Data Engineer

New York, USA Jul 2024 – Present

Designed scalable Airflow-orchestrated pipelines, reducing manual operations and improving workflow reliability.

Built real-time ingestion pipelines from Kinesis Firehose to an S3 data lake, enabling fault-tolerant analytics delivery.

Executed large-scale transformations on EMR using Spark, improving processing efficiency for high-volume datasets.

Implemented AWS Glue Data Catalog for centralized metadata management and seamless data discovery.

Engineered DynamoDB-to-S3 synchronization pipelines using Hive SerDe, enabling efficient daily data transfers.

Developed Spark (Scala) transformations for transactional datasets, improving downstream analytics accuracy.

Automated workflows using AWS Lambda & Step Functions, enhancing pipeline flexibility and reducing latency.

Architected Delta Lake solutions on S3, resolving re-ingestion and backfill challenges while ensuring data consistency.

Optimized EMR auto-scaling strategies, improving cost efficiency and maintaining high availability.

Conducted data profiling and quality assessments, improving dataset integrity and trustworthiness.

Environment: AWS, Airflow, EMR, Spark, S3, Redshift, Lambda, Step Functions, Glue, DynamoDB, Delta Lake

TD Bank — Cloud Data Engineer

Michigan, USA Sep 2023 – Jun 2024

Built end-to-end ADF pipelines ingesting high-volume insurance data from batch and streaming sources.

Developed real-time ingestion from Event Hubs to ADLS Gen2, following a Bronze–Silver–Gold architecture.

Implemented large-scale transformations in Azure Databricks (PySpark/Scala) for data cleansing and enrichment.

Enabled schema evolution and CDC using Delta Lake, ensuring reliable historical reprocessing.

Designed Synapse data warehouse models and optimized performance via partitioning and distribution strategies.

Automated pipeline orchestration using parameterized ADF workflows and triggers.

Tuned Spark jobs using caching, partitioning, and broadcast joins, significantly reducing latency.

Built serverless workflows using Azure Functions & Logic Apps for event-driven transformations.

Delivered BI-ready datasets supporting Power BI dashboards for actuarial and underwriting analytics.

Environment: Azure Data Factory, Databricks, Synapse, ADLS Gen2, Event Hubs, Delta Lake, Power BI

Infosys — Data Engineer

India May 2019 – Jul 2022

Designed cloud-based ETL/ELT pipelines on AWS & Azure based on regulatory and business requirements.

Developed PySpark transformations on AWS Glue & Databricks for incremental and historical processing.

Implemented data quality checks and reconciliation rules, improving reporting accuracy.

Built SCD Type implementations for finance data warehouses ensuring historical consistency.

Optimized SQL queries across Redshift and Synapse to improve performance and validation.

Automated deployments using CI/CD pipelines with Git, Azure DevOps, and AWS CodePipeline.

Implemented IAM, RBAC, encryption, and secure secrets management to meet compliance standards.

Migrated data from on-prem Oracle/MySQL systems to cloud storage platforms.

Collaborated in Agile teams delivering high-quality data solutions within sprint cycles.

Environment: AWS, Azure, Glue, Databricks, Redshift, Synapse, SQL, Python, CI/CD

EDUCATION

Master of Science — Management Information Systems (University of Illinois)

Bachelor of Engineering — Mechanical Engineering
