
Senior Data Engineer – AWS/Azure Data Solutions Expert

Location: Springfield, IL, 62704
Salary: $70k - $80k
Posted: February 05, 2026


Resume:

Sumanth Marri
Data Engineer
+1-217-***-**** | **************@*****.*** | Illinois

PROFESSIONAL SUMMARY

Around 5 years of experience in designing and implementing scalable AWS and Azure cloud-based data solutions.

Expertise in building end-to-end ETL/ELT pipelines using Apache Airflow, AWS Glue, Azure Data Factory, and Databricks.

Extensive experience with AWS services including S3, EMR, Redshift, Lambda, Kinesis, DynamoDB, Glue, IAM, and CloudWatch.

Proven hands-on experience with Azure ecosystem including ADF, ADLS Gen2, Databricks, Synapse Analytics, Event Hubs, Azure Functions, and Logic Apps.

Skilled in designing modern data lake architectures using Delta Lake and a Bronze–Silver–Gold layered approach.

Strong background in Spark (Scala & PySpark) for large-scale data processing and performance optimization.

Expertise in data warehousing concepts, dimensional modeling, Star and Snowflake schemas, and Slowly Changing Dimension (SCD) implementations.

Proficient in SQL, Python, and Shell scripting for data transformation, automation, and orchestration.

Experience working with streaming and real-time data pipelines using Kinesis, Event Hubs, and Kafka-based systems.

Hands-on experience with Redshift and Azure Synapse performance tuning, partitioning, indexing, and query optimization.

Strong understanding of CI/CD pipelines using Git, Azure DevOps, AWS CodePipeline, and infrastructure automation.

Experienced in data governance, security, and compliance, including IAM, RBAC, encryption, and PII handling.

Proven ability in data quality validation, monitoring, backfills, and troubleshooting production pipelines.

Experience implementing serverless and event-driven architectures using AWS Lambda and Azure Functions.

Self-motivated professional with excellent communication skills, thriving in Agile/Scrum environments.

EDUCATION

Master of Science in Management Information Systems
Bachelor of Engineering in Mechanical Engineering

TECHNICAL SKILLS

Cloud Platforms: AWS (S3, EMR, Redshift, Glue, Lambda, Kinesis, DynamoDB, IAM), Azure (ADF, ADLS Gen2, Databricks, Synapse, Event Hubs, Functions)

Data Engineering & ETL: Apache Airflow, AWS Glue, Azure Data Factory, Databricks, Delta Lake, ELT/ETL Pipelines, Bronze–Silver–Gold Architecture

Big Data & Processing: Apache Spark (PySpark, Scala), Hadoop, Hive, Kafka, Kinesis Streams, Event Hubs

Databases & Warehousing: Amazon Redshift, Azure Synapse (Dedicated/Serverless), SQL, Dimensional Modeling, SCD Types

DevOps, Security & Tools: Git, Azure DevOps, AWS CodePipeline, CI/CD, IAM, RBAC, Data Governance, Monitoring (CloudWatch, Azure Monitor)

PROFESSIONAL EXPERIENCE

MetLife, NY Jul 2024 – Present

Data Engineer
Responsibilities:

Designed and implemented scalable data pipelines and workflows using Apache Airflow, automating data processing tasks and reducing operational overhead.
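
A minimal sketch of such an Airflow pipeline (illustrative only; the DAG name, schedule, and task body are hypothetical, not taken from this resume):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_and_load(**context):
    # Hypothetical task body: process one daily partition.
    print(f"processing partition {context['ds']}")


with DAG(
    dag_id="daily_ingest",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    PythonOperator(task_id="extract_and_load", python_callable=extract_and_load)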

Streamlined real-time data ingestion from Kinesis streams into an S3 data lake using Kinesis Firehose, ensuring efficient and fault-tolerant data delivery for analytics.
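
A hedged sketch of the producer side of such ingestion with boto3 (the stream name and payload are hypothetical; the Firehose delivery stream itself, with its S3 destination and buffering settings, would be configured separately):

import json

import boto3

firehose = boto3.client("firehose")

# Hypothetical event; Firehose buffers records and delivers batches to S3.
event = {"policy_id": "P-1001", "event_type": "click", "ts": "2024-07-01T12:00:00Z"}

firehose.put_record(
    DeliveryStreamName="events-to-s3-lake",  # hypothetical stream name
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)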

Performed complex ETL operations on data stored in S3 using AWS EMR, enabling high-performance transformations and processing at scale.

Leveraged the AWS Glue Data Catalog as a Hive metastore to organize and catalog large datasets, enabling seamless data discovery and querying across the data lake.

Engineered solutions for data synchronization between DynamoDB and S3, using Hive tables with DynamoDB SerDe for efficient daily data transfers.

Developed and executed Spark jobs using Scala for data transformations, leveraging RDDs, DataFrames, and Datasets to manipulate transactional data for analytics.

Leveraged AWS Lambda and Step Functions for building serverless workflows, automating complex data processing pipelines and improving system flexibility.
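
One possible shape for a single step in such a workflow, sketched as a Python Lambda handler that a Step Functions state machine invokes (the Glue job name and input fields are hypothetical):

import boto3

glue = boto3.client("glue")


def handler(event, context):
    # Kick off a Glue job for the partition date passed in the
    # state machine input; Step Functions waits on the returned run.
    run = glue.start_job_run(
        JobName="transform-transactions",  # hypothetical Glue job
        Arguments={"--partition_date": event["partition_date"]},
    )
    return {"job_run_id": run["JobRunId"]}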

Architected and deployed Delta Lake solutions on S3, resolving backfill and re-ingestion challenges and improving data consistency in a highly dynamic environment.
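
A minimal sketch of the backfill pattern this refers to, using Delta Lake's replaceWhere so a re-run overwrites only the affected partition (paths and the partition column are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("backfill").getOrCreate()

# Recompute one day of data from the raw zone (hypothetical locations).
fixed = spark.read.parquet("s3://raw-bucket/transactions/ds=2024-07-01/")

# replaceWhere makes the overwrite idempotent: only rows matching the
# predicate are replaced, not the whole table.
(fixed.write.format("delta")
      .mode("overwrite")
      .option("replaceWhere", "ds = '2024-07-01'")
      .save("s3://lake-bucket/delta/transactions"))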

Ensured high availability and scalability of data pipelines through auto-scaling strategies for EMR clusters based on workload requirements.

Led data profiling and data quality assessments, utilizing Redshift, EMR, and Glue to gain insights into the structure, integrity, and completeness of datasets.

Collaborated cross-functionally with business stakeholders, using AWS CloudWatch to monitor and troubleshoot data workflows, enhancing system reliability and performance.

Environment: Apache Airflow, Redshift, Hadoop, Spark, AWS EMR, DynamoDB, S3 Data Lake, Hive, EC2, AWS Glue, Scala, Kinesis Streams, Kinesis Firehose

TD Bank, MI Sep 2023 – Jun 2024

Cloud Data Engineer
Responsibilities:

Designed and implemented end-to-end data pipelines in Azure Data Factory (ADF) for ingesting high-volume insurance data (policies, claims, premiums) from streaming and batch sources.

Built real-time ingestion pipelines using Azure Event Hubs (including its Kafka-compatible endpoint) and landed data into Azure Data Lake Storage Gen2 (ADLS) following the Bronze–Silver–Gold architecture.

Performed large-scale ETL/ELT using Azure Databricks (Spark/Scala/PySpark) to cleanse, enrich, and transform insurance transactional data.

Implemented Delta Lake on ADLS to support schema evolution, backfills, CDC, and reliable re-processing of historical insurance data.
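
A hedged sketch of applying CDC changes with a Delta Lake MERGE, as this bullet describes (storage paths and the join key are hypothetical):

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cdc-merge").getOrCreate()

updates = spark.read.format("delta").load(
    "abfss://bronze@account.dfs.core.windows.net/claims_cdc")  # hypothetical
target = DeltaTable.forPath(
    spark, "abfss://silver@account.dfs.core.windows.net/claims")  # hypothetical

# Upsert the change feed into the Silver table on the business key.
(target.alias("t")
       .merge(updates.alias("s"), "t.claim_id = s.claim_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())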

Developed and optimized Azure Synapse Analytics (Dedicated & Serverless SQL Pools) for enterprise data warehousing and analytical workloads.

Applied data modeling best practices (star/snowflake schemas) and optimized Synapse performance using distribution keys, partitioning, and indexing strategies.

Automated data orchestration and dependency management using ADF pipelines, triggers, and parameterized workflows.

Created Hive-compatible external tables and managed metadata and cataloging using Azure Purview.

Implemented data lifecycle management policies to move cold insurance data to Azure Archive tier for compliance and cost optimization.

Tuned Spark jobs by optimizing memory, partitioning, caching, broadcast joins, and serialization, reducing processing latency significantly.
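
The kinds of tuning listed here, shown as a short PySpark sketch (table paths, partition count, and keys are illustrative, not from the resume):

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning").getOrCreate()

facts = spark.read.format("delta").load("/mnt/silver/policies")  # hypothetical
dims = spark.read.format("delta").load("/mnt/silver/products")   # hypothetical

# Broadcast the small dimension table to avoid shuffling the large side.
joined = facts.join(broadcast(dims), "product_id")

# Repartition by the downstream grouping key, then cache because the
# result feeds several aggregations.
joined = joined.repartition(200, "region").cache()

joined.groupBy("region").count().show()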

Built serverless workflows using Azure Functions and Azure Logic Apps for lightweight transformations and event-driven processing.
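
A minimal sketch of an event-driven Azure Function in Python (v1 programming model, where the Event Hub trigger binding lives in function.json; the payload fields are hypothetical):

import json
import logging

import azure.functions as func


def main(event: func.EventHubEvent):
    # Lightweight transformation on each Event Hub message; the binding
    # that delivers `event` is declared in function.json.
    payload = json.loads(event.get_body().decode("utf-8"))
    logging.info("received claim event %s", payload.get("claim_id"))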

Monitored and troubleshot data pipelines using Azure Monitor, Log Analytics, and Application Insights.

Developed BI-ready views and aggregates in Synapse to support reporting tools like Power BI for actuarial and underwriting analytics.

Environment: Azure Data Factory (ADF), Azure Databricks (Spark, PySpark, Scala), Azure Synapse Analytics, Azure Data Lake Storage Gen2 (ADLS), Delta Lake, Kafka, Azure Functions, Logic Apps, Power BI, Azure Monitor, Log Analytics, Azure Purview, ETL/ELT, Data Modeling (Star/Snowflake)

Citius Tech, India May 2019 – Jul 2022

Data Engineer
Responsibilities:

Gathered business and regulatory requirements and designed end-to-end cloud-based ETL/ELT pipelines on AWS and Azure.

Developed PySpark transformations on AWS Glue/Azure Databricks to process historical and incremental datasets efficiently.

Implemented data quality checks, reconciliation rules, and schema validation using SQL, Python, and Spark to ensure accuracy of reporting data.
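
A hedged sketch of such rule-based checks in PySpark (path, columns, and rules are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("s3://lake-bucket/silver/gl_entries/")  # hypothetical

total = df.count()
null_keys = df.filter(col("account_id").isNull()).count()
dupes = total - df.dropDuplicates(["entry_id"]).count()

# Fail the run if a rule is violated, so bad data never reaches reporting.
assert null_keys == 0, f"{null_keys} rows missing account_id"
assert dupes == 0, f"{dupes} duplicate entry_id values"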

Designed and implemented Slowly Changing Dimensions (SCD) for finance data warehouses using Spark, SQL, and cloud-native storage.
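
One common way to implement this as SCD Type 2 with a Delta Lake MERGE, sketched under assumptions (table names, keys, and the tracked attribute are hypothetical; the append step is simplified and assumes the staging schema matches the dimension):

from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, lit

spark = SparkSession.builder.appName("scd2").getOrCreate()

changes = spark.read.parquet("/mnt/staging/customers")          # hypothetical
dim = DeltaTable.forPath(spark, "/mnt/warehouse/dim_customer")  # hypothetical

# Step 1: close out current rows whose tracked attribute changed.
(dim.alias("t")
    .merge(changes.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.address <> s.address",
        set={"is_current": lit(False), "end_date": current_date()},
    )
    .execute())

# Step 2: append incoming rows as the new current versions.
(changes.withColumn("is_current", lit(True))
        .withColumn("start_date", current_date())
        .withColumn("end_date", lit(None).cast("date"))
        .write.format("delta").mode("append")
        .save("/mnt/warehouse/dim_customer"))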

Created and optimized SQL queries on Amazon Redshift, Azure Synapse, and relational databases to validate data consistency and performance.

Implemented CI/CD pipelines using Git, Azure DevOps, and AWS CodePipeline for automated deployment of data engineering assets.

Ensured data security and compliance by implementing IAM roles, Azure RBAC, encryption at rest/in transit, and secure secrets management.

Migrated and ingested data from on-prem databases (Oracle, MySQL) to AWS and Azure cloud storage using cloud-native tools.

Collaborated in Agile/Scrum teams, participated in sprint planning, code reviews, and delivered high-quality data solutions on time.

Troubleshot production issues and implemented hot fixes to ensure minimal downtime for critical reporting pipelines.

Documented data models, pipeline designs, and unit test results to support audit and compliance requirements in finance systems.

Environment: AWS (Glue, Redshift, S3, IAM, CodePipeline), Azure (ADF, Databricks, Synapse, RBAC), PySpark, SQL, Python, CI/CD (Git, Azure DevOps), Oracle, MySQL, Agile/Scrum


