
Senior Data Engineer - Cloud-Native ETL & Analytics Expert

Location:
Brewster, NY
Posted:
December 12, 2025


Resume:

Anil MANDADAPU

West Haven, CT-***** | ***************@*****.*** | 203-***-**** | linkedin.com/in/anil-mandadapu-1bb05b313

Professional Summary:

Experienced Data Engineer with over 5 years of expertise in building scalable, cloud-native data pipelines and ETL frameworks across AWS and Azure ecosystems.

Proven success in designing, developing, and optimizing data warehouses using Snowflake, AWS Redshift, and Azure Synapse Analytics for high-performance analytics.

Proficient in data migration and integration, including historical and real-time ingestion from Teradata, SQL Server, and APIs into cloud data platforms.

Hands-on expertise in building Spark and PySpark-based pipelines on AWS EMR, Databricks, and Azure HDInsight for batch and streaming data workloads.

Adept at automating metadata management and job orchestration using AWS Glue, Lambda, Step Functions, and Apache Airflow.

Skilled in ETL/ELT development using Informatica, Talend, Fivetran, and Apache NiFi, integrating diverse data sources into centralized repositories.

Strong scripting background in Python, Scala, and SQL, used for data cleansing, transformation, and developing UDFs and APIs.

Designed and deployed serverless and containerized solutions using Lambda, Docker, Kubernetes, and Jenkins for scalable and maintainable data applications.

Implemented cloud-native monitoring, disaster recovery, and security practices using AWS S3, RDS, CloudWatch, and Snowflake RBAC.

Delivered real-time analytics and dashboards through Spark Streaming, ELK Stack, Tableau, Cassandra, and QlikView, driving business insights.

Experienced in working with distributed systems (HDFS, Kafka, MapReduce, Hive) and managing large-scale data workflows on Hadoop/YARN.

Demonstrated success in CI/CD, version control, and agile collaboration using GitHub, Jenkins, JIRA, and ServiceNow.

Skills:

Programming Languages

Python, Java, Scala, PL/SQL, Shell Scripting, C#, JavaScript, TypeScript, R, Go, Ruby

Cloud Platforms

AWS (S3, Redshift, EC2, Lambda, Glue, EMR, Athena, QuickSight, Lake Formation), Azure (Blob Storage, Synapse, Data Factory, Azure Functions, Azure Data Lake, Cosmos DB, Power BI)

Data Processing Tools

Apache Spark, PySpark, Spark Streaming, Kafka, Flume, dbt

Data Warehousing

AWS Redshift, Azure Synapse Analytics, Snowflake

Database Systems

Oracle, MS SQL Server, MySQL, Teradata, DynamoDB, ElastiCache, Azure SQL Database, Azure Cosmos DB

Real-Time Data Tools

Kafka, Spark Streaming, Azure Event Hubs

ETL Tools

AWS Glue, Azure Data Factory, Talend, Informatica

Data Visualization

AWS QuickSight, Microsoft Power BI, Tableau, Looker

Version Control

Git, GitHub, GitLab

CI/CD Tools

Jenkins, GitLab CI/CD, Azure DevOps, AWS CodePipeline

Containerization

Docker, Kubernetes, OpenShift

Load Balancers

AWS Elastic Load Balancer, Azure Load Balancer

Educational Qualifications:

Lindsey Wilson College – Columbia, KY, USA

Master’s in Information Technology, Oct 2022 – Apr 2024

KL University – Vijayawada, India

Bachelor’s in Computer Science & Engineering, Apr 2015 – May 2019

Certifications:

SnowPro Advanced: Data Engineer

AWS Certified Data Engineer – Associate

Professional Experience:

U.S. Bank | AWS Data Engineer | Jan 2024 – Present | Minneapolis, MN (Remote)

Roles & Responsibilities:

Designed and implemented robust AWS data pipelines to automate data transfer and transformation, improving efficiency in data management using AWS S3, AWS Glue, and Snowflake (a representative PySpark sketch follows this section).

Managed large-scale data warehouses using AWS Redshift and SQL Server, ensuring optimal data storage, retrieval, and scalability for enterprise applications.

As part of the data migration effort, wrote SQL scripts to reconcile data mismatches and loaded historical data from Teradata into Snowflake.

Engineered and executed data processing scripts using PySpark, Scala, and Python, significantly improving data manipulation and batch processing tasks in a Hadoop YARN environment.

Developed and managed ETL workflows using Informatica, enabling seamless data integration across multiple sources, and improving data accuracy and consistency for business reporting.

Utilized Python scripting for data cleansing, transformation, and enrichment, ensuring high-quality data availability for analytical applications.

Implemented comprehensive data backup and recovery solutions using AWS S3 and RDS, safeguarding critical business data against potential loss or corruption.

Developed interactive dashboards and reports using Tableau, Cassandra, and QlikView, providing actionable insights into business operations and customer behavior.

Leveraged Sqoop to efficiently transfer bulk data between Hadoop and relational databases, enhancing data integration and consistency across platforms.

Orchestrated the migration of enterprise data to cloud platforms, utilizing AWS Glue, Informatica, and Talend to ensure seamless data integration and consistency.

Programmed complex ETL processes using Talend, Fivetran, and AWS Glue, facilitating the consolidation of data from multiple sources into a centralized repository.
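For illustration only, below is a minimal PySpark sketch of the kind of S3-to-Snowflake batch load described in this section. It is a hedged example, not production code: the bucket, table, column, and connection values are hypothetical placeholders, and a real pipeline would pull credentials from a secrets manager and typically run inside an AWS Glue or EMR job.

# Minimal sketch: read raw files from S3, apply light cleansing, write to Snowflake.
# All names below (bucket, columns, table, credentials) are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3_to_snowflake_load").getOrCreate()

# Raw transaction files landed in S3 (placeholder path).
raw = spark.read.parquet("s3://example-raw-bucket/transactions/2024/")

# Basic cleansing: trim keys, normalize timestamps, drop duplicate transactions.
curated = (
    raw.withColumn("account_id", F.trim(F.col("account_id")))
       .withColumn("txn_ts", F.to_timestamp("txn_ts"))
       .dropDuplicates(["txn_id"])
)

# Snowflake Spark connector options (placeholder values; normally injected from
# a secrets manager rather than hard-coded).
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "********",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "CURATED",
    "sfWarehouse": "LOAD_WH",
}

(curated.write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "TRANSACTIONS_CURATED")
    .mode("append")
    .save())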

Zepto | Data Engineer | Jun 2021 – Aug 2022 | Hyderabad, India

Roles & Responsibilities:

Architected a scalable data warehouse solution using Azure Synapse Analytics and executed advanced analytical queries within the Azure environment.

Participated in building and integrating a data lake on Azure Data Lake Storage to support a wide range of application and development needs.

Enhanced operational workflows by scripting automation solutions with Azure Functions in Python on the Azure cloud (see the minimal function sketch after this section).

Utilized Azure HDInsight for big data processing across Hadoop clusters, efficiently using Azure Virtual Machines and Blob Storage.

Developed and executed Spark jobs in HDInsight via Azure Notebooks to streamline large-scale data processing tasks.

Built high-performance Spark applications in Python to run on HDInsight clusters, improving data handling efficiency.

Deployed the ELK stack (Elasticsearch, Logstash, Kibana) on Azure to collect, analyze, and visualize website logs.

Designed and deployed robust ETL processes using tools including Apache NiFi, Talend, and Informatica to ingest data from APIs, flat files, and relational databases.

Applied testing best practices by writing thorough unit tests with PyTest to ensure code reliability and maintainability.

Created serverless architectures incorporating Azure API Management, Azure Functions, Azure Blob Storage, and Cosmos DB, with auto-scaling features for enhanced performance.

Leveraged Azure Stream Analytics and Synapse Analytics to populate and manage data warehousing solutions efficiently.

Programmed User Defined Functions (UDFs) in Scala to encapsulate and automate business logic within data applications.

Built end-to-end Azure Data Factory pipelines to ingest, transform, and store data, seamlessly integrating with various Azure services.

Ran Hadoop and Spark jobs on HDInsight using data stored in Azure Blob Storage to support distributed big data processing.

Designed custom infrastructure using Azure Resource Manager (ARM) templates to deploy and manage pipelines effectively.
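As a hedged illustration of the Azure Functions automation mentioned above, here is a minimal timer-triggered function using the Python v1 programming model. The schedule and bindings live in a function.json file (not shown), and the housekeeping logic is only a placeholder.

# Minimal Azure Function sketch (Python v1 model); names and behavior are placeholders.
import logging
import azure.functions as func

def main(mytimer: func.TimerRequest) -> None:
    # Heartbeat plus placeholder housekeeping work; in practice this might prune
    # stale blobs in Azure Blob Storage or call a downstream pipeline's endpoint.
    if mytimer.past_due:
        logging.warning("Timer invocation is running late.")
    logging.info("Automation run completed.")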

The Oriental Insurance Company | Data Engineer | Apr 2020 – Jun 2021 | Hyderabad, India

Roles & Responsibilities:

Designed and built end-to-end data pipelines for cloud migration using AWS Glue, Lambda, Step Functions, PySpark, and Java.

Automated metadata discovery and cataloging with AWS Glue Crawlers across S3, RDS, and other data sources.

Integrated AWS Managed Kafka with Spark clusters in Databricks for real-time streaming and analytics.

Migrated large-scale data from on-prem SQL Server to Amazon RDS, EMR Hive, and Redshift, streamlining cloud adoption.

Developed and optimized ETL workflows in Glue using Python, Java, and Scala for structured and unstructured data transformation.

Managed and scaled Hadoop and Cloudera clusters on EC2 to support high-volume data processing.

Engineered PySpark-based ingestion pipelines and utilized Spark SQL/DataFrames for efficient data loading.

Implemented Snowflake and Snowpipe for downstream processing, analytics, and near real-time ingestion.

Created Lambda functions for job triggering, monitoring, and serverless data transformation tasks (a minimal handler sketch follows this section).

Orchestrated complex workflows with AWS Step Functions, improving automation, fault tolerance, and maintainability.
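The sketch below illustrates the kind of job-triggering Lambda described above: a small Python handler that starts an AWS Glue job via boto3. The Glue job name and argument keys are hypothetical placeholders, not values from this role.

# Minimal Lambda handler sketch that triggers a Glue job run; names are placeholders.
import json
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Typically invoked by an S3 event notification or a Step Functions task.
    run = glue.start_job_run(
        JobName="curate_claims_daily",  # placeholder Glue job name
        Arguments={"--ingest_date": event.get("ingest_date", "latest")},
    )
    return {"statusCode": 200, "body": json.dumps({"JobRunId": run["JobRunId"]})}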

The Oriental Insurance Company | Big Data Engineer | Jan 2019 – Mar 2020 | Hyderabad, India

Roles & Responsibilities:

Built and optimized data pipelines using Spark on AWS EMR to ingest and transform data from S3 and deliver curated datasets to Snowflake.

Developed Spark SQL and PySpark scripts in Databricks for data extraction, transformation, and real-time processing using Spark Streaming.

Designed and deployed end-to-end ETL/ELT workflows using Apache Airflow, integrating Snowflake, Snowpark, and AWS services (a skeleton DAG appears after this section).

Authored Python parsers to extract insights from unstructured data and automated content updates to databases.

Created and maintained complex Snowflake SQL queries, defined roles, and optimized warehouse configurations for cost-efficient performance.

Engineered Hadoop-based data workflows using HDFS, Sqoop, Hive, MapReduce, and Spark; supported Teradata-to-Hive incremental imports using Sqoop.

Developed Java and Talend ETL jobs for data ingestion into Hadoop and Redshift, leveraging Talend Big Data and cloud components.

Implemented disaster recovery plans and security controls for Snowflake, ensuring business continuity and compliance.

Built CI/CD pipelines with Jenkins and Docker, and deployed containers to Kubernetes for scalable runtime environments.

Migrated legacy on-premises applications to AWS, using EC2, S3, and CloudWatch for monitoring, logging, and alerting.

Utilized AWS Athena, Redshift Spectrum, and S3 to enable serverless querying and virtual data lake architecture without traditional ETL.

Collaborated using GitHub for version control and managed issues and change tickets through JIRA and ServiceNow.
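Below is a skeleton Airflow DAG (Airflow 2.x style) illustrating the extract-transform-load orchestration pattern described in this section. Task bodies, the DAG ID, and the schedule are hypothetical placeholders; real workflows would call EMR, Snowflake, and other services through their respective operators or hooks.

# Skeleton Airflow DAG: extract -> transform -> load, with placeholder task bodies.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_from_s3(**context):
    # Placeholder: list and stage new files from the raw S3 prefix.
    pass

def transform_with_spark(**context):
    # Placeholder: submit the PySpark transformation job (e.g. on EMR).
    pass

def load_to_snowflake(**context):
    # Placeholder: COPY INTO the curated Snowflake table.
    pass

with DAG(
    dag_id="daily_curated_load",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_from_s3", python_callable=extract_from_s3)
    transform = PythonOperator(task_id="transform_with_spark", python_callable=transform_with_spark)
    load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)

    extract >> transform >> load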


