
Data Engineer Senior

Location:
Alpharetta, GA
Posted:
September 19, 2025


Resume:

Anudeep Reddy Yendrapalli

Senior Data Engineer

Email: *******.***********@*****.***

Phone: +1-470-***-****

LinkedIn: linkedin.com/in/anudeepreddy

SUMMARY OF EXPERIENCE:

Highly skilled Senior Data Engineer with 11+ years of comprehensive experience, with special emphasis on architecture, design, development, and optimization of data pipelines, ETL workflows, and data lake/warehouse solutions across on-premises and cloud platforms (AWS, GCP, Azure). Proven expertise in building scalable and robust big data solutions with Databricks and Snowflake.

Proficient in AWS and GCP services including S3, EC2, EMR, Glue, Redshift, Athena, Cloud Storage, Dataproc, Dataflow, Pub/Sub, and BigQuery to build scalable and cost-effective data engineering solutions.

Experience with the Snowflake data warehouse on AWS, with a deep understanding of Snowflake architecture and hands-on experience in performance tuning using the Query Profiler, caching, and virtual warehouse scaling.

Proficient in advanced Snowflake features such as Snowpipe for real-time data ingestion, Time Travel, Zero-Copy Cloning, and object cloning for efficient data versioning and recovery. Skilled in configuring data retention, data masking, and security policies to support compliance and governance.
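For illustration, a minimal sketch of how Time Travel and Zero-Copy Cloning are typically exercised from Python; the connection parameters and the ORDERS table below are placeholder assumptions, not details of any specific project:

    # Hypothetical sketch: Snowflake Time Travel and zero-copy cloning via the Python connector.
    # Account, credentials, and table names are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="***",
        warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
    )
    cur = conn.cursor()

    # Time Travel: query the table as it existed one hour ago (offset in seconds)
    cur.execute("SELECT COUNT(*) FROM ORDERS AT(OFFSET => -3600)")
    print(cur.fetchone())

    # Zero-copy clone: instant copy for testing or recovery, no extra storage at clone time
    cur.execute("CREATE TABLE ORDERS_BACKUP CLONE ORDERS")

    cur.close()
    conn.close()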

Proficient in performance tuning of Spark jobs using techniques such as broadcast joins, caching, and partitioning to minimize shuffles and improve efficiency.
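A minimal PySpark sketch of these tuning techniques; the table paths and column names are hypothetical:

    # Hypothetical sketch of broadcast join, repartitioning, and caching in PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

    facts = spark.read.parquet("s3://bucket/facts/")   # large fact table (placeholder path)
    dims = spark.read.parquet("s3://bucket/dims/")     # small dimension table (placeholder path)

    # Broadcasting the small table avoids a shuffle on the large side of the join
    joined = facts.join(broadcast(dims), on="dim_id", how="left")

    # Repartition on the aggregation key to balance partitions, then cache because
    # the result feeds several downstream aggregations
    joined = joined.repartition(200, "dim_id").cache()

    daily = joined.groupBy("event_date").count()
    daily.write.mode("overwrite").parquet("s3://bucket/output/daily/")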

Developed and managed Hive Metastore schemas and partitioned tables for efficient querying and data organization.

Excellent understanding of Apache Spark & Hadoop YARN architecture and ecosystem.

Designed and developed ADF pipelines to orchestrate data ingestion, transformation, and movement across ADLS for HDInsight and Synapse Analytics.

Integrated ADF with Azure Databricks, Azure Data Lake, Synapse Analytics, Snowflake, Blob Storage, and SQL Databases for seamless data flow. Enabled incremental loads and delta processing using watermark and control tables to optimize performance.
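A sketch of the watermark-driven incremental load pattern referenced above, written in PySpark for illustration; the control table, source, and column names are assumptions:

    # Hypothetical sketch: incremental load driven by a watermark stored in a control table.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("incremental-load").getOrCreate()

    # Read the last successfully loaded timestamp for this source from the control table
    control = spark.read.table("etl_control.watermarks")
    last_wm = (control.filter(F.col("source") == "orders")
                      .agg(F.max("watermark_ts")).collect()[0][0])

    # Pull only rows newer than the watermark and append them to the curated table
    delta = spark.read.table("staging.orders").filter(F.col("updated_at") > F.lit(last_wm))
    delta.write.mode("append").saveAsTable("curated.orders")

    # Advance the watermark so the next run starts where this one finished
    new_wm = delta.agg(F.max("updated_at")).collect()[0][0]
    if new_wm is not None:
        spark.createDataFrame([("orders", new_wm)], ["source", "watermark_ts"]) \
             .write.mode("append").saveAsTable("etl_control.watermarks")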

Developed and managed notebooks, orchestrated workflows using Databricks Jobs

Created Airflow workflows and coordinators to automate data pipelines on daily, weekly, and monthly schedules.
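For illustration, a minimal Airflow DAG on a daily schedule; the DAG name and task commands are placeholders:

    # Hypothetical sketch of a daily Airflow DAG; weekly/monthly pipelines only differ in the schedule.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_sales_pipeline",        # placeholder name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",           # "@weekly" or "@monthly" for the other cadences
        catchup=False,
    ) as dag:
        ingest = BashOperator(task_id="ingest", bash_command="python ingest.py")
        transform = BashOperator(task_id="transform", bash_command="spark-submit transform.py")
        load = BashOperator(task_id="load", bash_command="python load.py")

        ingest >> transform >> load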

Proficient in programming and scripting languages such as Java, Python, Scala, shell scripting, and SQL for data manipulation and transformation.

Strong knowledge of Lakehouse architecture, including structured streaming, data versioning, and ACID transactions with Delta tables.

In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, and task scheduler.

Experience in real-time processing using Spark Streaming and Kafka.
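A minimal Structured Streaming sketch reading from Kafka, shown only as an illustration; the broker address, topic, and sink paths are placeholders:

    # Hypothetical sketch: consume a Kafka topic with Spark Structured Streaming and land it as Parquet.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    events = (spark.readStream
                   .format("kafka")
                   .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
                   .option("subscribe", "events")                      # placeholder topic
                   .option("startingOffsets", "latest")
                   .load()
                   .select(F.col("value").cast("string").alias("payload"), F.col("timestamp")))

    query = (events.writeStream
                   .format("parquet")
                   .option("path", "s3://bucket/streaming/events/")                # placeholder sink
                   .option("checkpointLocation", "s3://bucket/checkpoints/events/")
                   .outputMode("append")
                   .start())

    query.awaitTermination()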

Extensive experience in query optimization, performance tuning, and advanced SQL scripting including Unix/Shell scripting, T-SQL, Stored Procedures, Triggers, Views, and User-Defined Functions.

Hands-on experience working with multiple file formats such as ORC, JSON, Avro, and Parquet.

Good experience using scheduling tools such as Oozie and Airflow.

Able to understand requirements and convert them into meaningful results and desired outputs.

Excellent communication, interpersonal, and analytical skills, and a strong ability to perform as part of a team.

Eager to explore new and leading technologies.

Ability to work well in both a team environment and an individual environment

Willingness and ability to quickly adapt to and learn new technologies and software.

TECHNICAL SKILLS:

Cloud Platforms: AWS (S3, CloudWatch, Step Functions, Lambda, EMR, IAM, Glue, Athena), GCP (Cloud Storage, Dataproc, Dataflow, Pub/Sub, BigQuery, Composer, Airflow), Azure (ADF, Databricks, Synapse, Cosmos DB, Kusto, Service Bus, HDInsight)

Cloud Data Warehouses: Snowflake, Azure Synapse Analytics, Amazon Redshift, GCP BigQuery

Big Data: Hadoop, Hive, Sqoop, HDFS, Spark, Spark Streaming, PySpark, YARN

Databases: Oracle 10g, MS SQL Server, Cosmos DB, DynamoDB, HBase, Postgres, MySQL, Cassandra

Scheduling: Oozie, Airflow, Grafana

Data Formats: Parquet, ORC, Avro, JSON, XML

Data Visualization: Tableau, Power BI

Operating Systems: Windows XP, Windows 7, Windows 10 Enterprise, Windows 11 Pro, UNIX, Linux

Tracking Tools: Jira

Versioning Tools: GitHub, Bitbucket

Languages: Java, Python, Scala, SQL, T-SQL, KQL, Shell Script, Groovy

EDUCATION QUALIFICATIONS:

Jawaharlal Nehru Technological University, Anantapur, India: Bachelor of Technology (B.Tech), April 2013

CERTIFICATIONS:

Google Certified: Professional Data Engineer

Microsoft Certified: Fabric Data Engineer Associate

Snowflake SnowPro Certification

PROFESSIONAL EXPERIENCE:

Duration | Company | Role

June 2020 – Mar 2023 | DBS Bank | Senior Data Engineer

May 2018 – June 2020 | Nielsen | Data Engineer

May 2017 – May 2018 | TCS | Data Engineer

Feb 2014 – May 2017 | Capgemini | Software Engineer

DBS BANK

Role: Senior Data Engineer

April 2023 – August 2025

Skills:

Cloud Storage (GCS), Dataflow, Dataproc, BigQuery, SSIS, SQL Server, Flink, Hadoop, HDFS, Hive, Spark, Java, Python, SQL, Airflow, Jenkins, Terraform, Bitbucket, Superset, Grafana

Role & Responsibilities:

Designed, built, and orchestrated scalable pipelines to automate data ingestion, transformation, and loading workflows from diverse sources.

Built dynamic pipelines with parameterized datasets and linked services to enable reusable and modular designs

Optimized cluster configurations and auto scaling features to balance performance and cost.

Developed scalable Spark applications using Java to process batch data.

Tuned Spark jobs using partitioning, broadcast joins, and caching to enhance performance.

Leveraged Spark SQL and DataFrames to perform complex transformations and aggregations.

Led end-to-end migration of big data workloads from a Hadoop-based ecosystem to GCP, ensuring minimal downtime and preserving data integrity.

Enabled workspace security, data masking, and audit logging to ensure enterprise-grade compliance.

Optimized SSIS packages by using parallel processing, staging tables, and indexing, improving performance by 40%.

Configured SSIS error handling, event logging, and email notifications for job failures.

Filtered data using complex Spark SQL logic per business requirements.

Participated in peer reviews as part of an Agile methodology.

Optimized Spark SQL code.

DBS BANK

Role: Data Engineer

June 2020 - Mar 2022

Skills:

Snowflake, DBT, AWS, S3, GitHub, Visual Studio, CI/CD, Jira, SharePoint, Erwin, Python, Java, PL/SQL, Lambda, RDS, DynamoDB, SageMaker, CloudWatch, Agents, SQL, Airflow

Role & Responsibilities:

Design and manage data workflows leveraging Snowflake with AWS services including S3 for storage, Lambda for serverless processing, RDS and DynamoDB for database interactions and CloudWatch for monitoring and alerting.

Used Snowflake Time Travel and data retention settings for crucial tables, which helped analyze historical data for testing.

Established a DBT process to improve performance, scalability, and reliability on AWS.

Migrated legacy transformation code into modular DBT data models.

Wrote and optimized SQL queries within DBT to enhance the data transformation process and improve overall performance.

Implemented Snowflake SQL scripts to achieve business requirements.

Designed complex SQL scripts in Snowflake using joins, subqueries, correlated subqueries, and window functions.

Worked with business in UAT and PROD phases to assist on data deliverables.

Troubleshoot and resolve performance bottlenecks in data pipelines through debugging, profiling, and optimization techniques.

Collaborate with DataOps teams to implement data observability, health checks, and proactive maintenance of mission-critical workflows.

Create and manage monitoring and alerting systems to proactively detect and resolve pipeline or data anomalies.

Work with the support team to provide knowledge transfer as part of post-production support.

Provide critical production support including off-shift and weekend availability to resolve high-priority incidents.

Participate in code reviews, documentation, analysis, troubleshooting, Bug fix and knowledge-sharing sessions to support a scalable data engineering environment.

Work closely with business stakeholders to understand and support Customer and Equipment data domains ensuring accurate representation and data integrity across systems.

Collaborate with onshore and offshore teams including data engineers, designers, and support teams.

Investigate and resolve data quality issues, system performance bottlenecks, and infrastructure-level problems.

Used CloudWatch logs to monitor jobs and analyze issues for any failures.

Developed Python scripts to connect to Snowflake and automate the process.

Tuned query performance by clustering tables and creating views to allow access only to the required data.

Nielsen

Role: Data Engineer

May 2018 - June 2020

Skills: S3, EC2, EMR, IAM, CloudWatch, Glue, Athena, Lambda, SQS, SNS, Terraform, Hadoop, HDFS, Hive, Spark SQL, Java, Scala, Python, Jenkins, GitLab, Oozie, Airflow

Role & Responsibilities:

Design and implement complex Spark jobs by analyzing stored procedures for data transformation and reporting.

Designed an S3-based data lake with raw, curated, and processed zones, enabling scalable storage for structured and semi-structured data.

Developed serverless ETL pipelines using Glue (PySpark) and Lambda for batch and real-time data ingestion.

Built Athena queries and optimized partitions for faster analytics, reducing query costs by 30%.
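For illustration, a minimal boto3 sketch of running a partition-pruned Athena query; the database, table, partition column, and S3 locations are placeholders:

    # Hypothetical sketch: filtering on a partition column so Athena scans only the matching
    # S3 prefixes, which is what reduces the data scanned and therefore the query cost.
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    sql = """
        SELECT customer_id, SUM(amount) AS total
        FROM sales.transactions
        WHERE dt = '2020-01-15'          -- dt is the partition column (placeholder)
        GROUP BY customer_id
    """

    resp = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "sales"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    print(resp["QueryExecutionId"])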

Deployed Spark jobs on EMR clusters to process 5TB+ daily, improving performance with autoscaling and spot instances.

Automated provisioning of AWS infrastructure using Terraform, enabling reproducible deployments across multiple environments.

Implemented CloudWatch monitoring and SNS alerts for proactive issue detection and resolution.

Ensured data security using IAM roles, policies, and KMS encryption

Monitor and analyze YARN logs for identifying bottlenecks and resource inefficiencies.

DBS BANK

Role: Data Engineer

May 2017 - May 2018

Skills: Hadoop, HDFS, Sqoop, Hive, Spark SQL using Java, SQL, Jenkins, GitLab

Role & Responsibilities:

Design and implement complex Spark jobs by analyzing stored procedures for data transformation and reporting.

Develop and maintain Hive schema, tables, views to ensure efficient data storage and retrieval.

Optimize and tune SQL queries to improve performance and reduce execution time.

Collaborate with business analysts and stakeholders to gather and analyze business requirements for database solutions.

Troubleshoot and resolve production incidents related to Spark job performance, connectivity, and data integrity.

Respond to service interruptions or outages and provide timely recovery actions to restore service.

Regularly review and test disaster recovery plans and procedures.

Perform capacity planning and scaling of YARN clusters based on workload requirements.

Monitor and analyze YARN logs for identifying bottlenecks and resource inefficiencies.

Royal Bank of Canada

Role: Software Engineer

Feb 2014 - May 2017

Skills: Java, Spring Core, Spring MVC, Hibernate, SQL Server

Role & Responsibilities:

Designed and implemented persistence logic using Hibernate and business logic using the Spring framework.

Analyzed user stories and participated in story-point estimation of story complexity along with BAs and developers.

Involved in coding and bug fixing.

Followed Agile/Scrum methodology and actively participated in Sprint calls to complete the sprints

Delivered reporting module feature enhancements to the client.


