Sumanth M.
Data Engineer
*+ years’ experience | React | Spring Boot | AWS | Microservices
Work Authorization: F1 OPT (STEM eligible until 2027)
Open to relocation anywhere in the U.S.
Illinois, USA
+1-217-***-**** | **************@*****.***
LinkedIn: http://linkedin.com/in/sumanth-marri-a0a483330
PROFESSIONAL SUMMARY
Data Engineer with ~5 years of experience designing scalable cloud data platforms across AWS and Azure.
Proven expertise in building end-to-end ETL/ELT pipelines using Airflow, Azure Data Factory, AWS Glue, and Databricks.
Architected modern data lake and lakehouse solutions using Delta Lake and the Bronze–Silver–Gold (medallion) architecture.
Strong background in Spark (PySpark & Scala) for large-scale distributed processing and performance tuning.
Delivered reliable real-time and batch pipelines using Kinesis, Kafka, and Azure Event Hubs.
Improved analytics readiness and decision support through optimized data models and warehousing in Amazon Redshift and Azure Synapse.
Implemented data quality, governance, and security controls including IAM, RBAC, encryption, and PII compliance.
Experienced in CI/CD automation, monitoring, and cost-efficient cloud architectures.
Exposure to AI-assisted anomaly detection and predictive insights for data quality monitoring.
TECHNICAL SKILLS
Cloud Platforms: AWS (S3, EMR, Redshift, Glue, Lambda, Kinesis, DynamoDB, IAM, CloudWatch); Azure (ADF, ADLS Gen2, Databricks, Synapse, Event Hubs, Functions, Logic Apps).
Data Engineering & ETL: Apache Airflow, AWS Glue, Azure Data Factory, Databricks, Delta Lake, ETL/ELT Pipelines, Bronze–Silver–Gold Architecture.
Big Data & Streaming: Apache Spark (PySpark, Scala), Hadoop, Hive, Kafka, Kinesis Streams, Azure Event Hubs.
Databases & Warehousing: Amazon Redshift, Azure Synapse, SQL, Dimensional Modeling, Star & Snowflake Schemas, SCD.
DevOps, Security & Observability: Git, Azure DevOps, AWS CodePipeline, CI/CD, IAM, RBAC, Data Governance, CloudWatch, Azure Monitor.
Data Analytics & AI Exposure: Pandas, Trend Analysis, Anomaly Detection, Predictive Insights, Data Quality Monitoring.
PROFESSIONAL EXPERIENCE
MetLife — Data Engineer
New York, USA | Jul 2024 – Present
Designed scalable Airflow-orchestrated pipelines reducing manual operations and improving workflow reliability.
Built real-time ingestion pipelines from Kinesis Firehose into an S3 data lake, enabling fault-tolerant analytics delivery.
Executed large-scale transformations on EMR using Spark, improving processing efficiency for high-volume datasets.
Implemented AWS Glue Data Catalog for centralized metadata management and seamless data discovery.
Engineered DynamoDB-to-S3 synchronization pipelines using Hive SerDe, enabling efficient daily data transfers.
Developed Spark (Scala) transformations for transactional datasets, improving downstream analytics accuracy.
Automated workflows using AWS Lambda & Step Functions, enhancing pipeline flexibility and reducing latency.
Architected Delta Lake solutions on S3, resolving re-ingestion and backfill challenges while ensuring data consistency.
Optimized EMR auto-scaling strategies, improving cost efficiency and maintaining high availability.
Conducted data profiling and quality assessments, improving dataset integrity and trustworthiness.
Environment: AWS, Airflow, EMR, Spark, S3, Redshift, Lambda, Step Functions, Glue, DynamoDB, Delta Lake
TD Bank — Cloud Data Engineer
Michigan, USA | Sep 2023 – Jun 2024
Built end-to-end ADF pipelines ingesting high-volume insurance data from batch and streaming sources.
Developed real-time ingestion from Event Hubs into ADLS Gen2, following the Bronze–Silver–Gold architecture.
Implemented large-scale transformations in Azure Databricks (PySpark/Scala) for data cleansing and enrichment.
Enabled schema evolution and CDC using Delta Lake, ensuring reliable historical reprocessing.
Designed Synapse data warehouse models and optimized performance via partitioning and distribution strategies.
Automated pipeline orchestration using parameterized ADF workflows and triggers.
Tuned Spark jobs using caching, partitioning, and broadcast joins, significantly reducing latency.
Built serverless workflows using Azure Functions & Logic Apps for event-driven transformations.
Delivered BI-ready datasets supporting Power BI dashboards for actuarial and underwriting analytics.
Environment: Azure Data Factory, Databricks, Synapse, ADLS Gen2, Event Hubs, Delta Lake, Power BI
Infosys — Data Engineer
India | May 2019 – Jul 2022
Designed cloud-based ETL/ELT pipelines on AWS & Azure based on regulatory and business requirements.
Developed PySpark transformations on AWS Glue & Databricks for incremental and historical processing.
Implemented data quality checks and reconciliation rules, improving reporting accuracy.
Built slowly changing dimension (SCD) implementations for finance data warehouses, ensuring historical consistency.
Optimized SQL queries across Redshift and Synapse to improve query performance and streamline data validation.
Automated deployments using CI/CD pipelines with Git, Azure DevOps, and AWS CodePipeline.
Implemented IAM, RBAC, encryption, and secure secrets management to meet compliance standards.
Migrated data from on-prem Oracle/MySQL systems to cloud storage platforms.
Collaborated in Agile teams delivering high-quality data solutions within sprint cycles.
Environment: AWS, Azure, Glue, Databricks, Redshift, Synapse, SQL, Python, CI/CD
EDUCATION
Master of Science — Management Information Systems (University of Illinois)
Bachelor of Engineering — Mechanical Engineering