SAI KRISHNA
+1-925-***-**** ******@*****.*** https://www.linkedin.com/in/sai-krishna-reddy-075467180/
PROFESSIONAL SUMMARY
Software development professional with over 6 years of experience designing, architecting, and delivering scalable, high-performance data solutions across cloud and distributed environments.
Proficient in leveraging Microsoft Azure, AWS, and Google Cloud (GCP) to build end-to-end ETL pipelines, data lakes, and real-time analytics frameworks that drive business intelligence and operational efficiency.
Skilled in Hadoop, Apache Spark, Hive, Snowflake, and Databricks for large-scale data processing and analytics.
Experienced in implementing Azure Synapse, ADF, Data Lake, Cosmos DB, BigQuery, Dataflow, and Pub/Sub for both batch and streaming data workflows.
Strong expertise in SQL Server, Oracle, MySQL, Teradata, and MongoDB, with a focus on query optimization, performance tuning, and data integrity.
Adept at developing data warehousing architectures using OLAP and OLTP models to support enterprise analytics.
Proficient in Python, Scala, Java, and SQL, with hands-on experience automating workflows using Airflow and NiFi.
Experienced in implementing CI/CD pipelines using Jenkins, Bitbucket, GitHub, and Azure DevOps, ensuring efficient deployment and version control.
Demonstrated ability to deliver interactive dashboards and insights using Power BI, Tableau, and Looker, enabling data-driven decisions.
Recognized for strong technical leadership, problem-solving, and collaboration, consistently delivering secure, optimized, and business-focused data solutions that enhance organizational performance.
TECHNICAL SKILLS
Programming Languages: Scala, Python, Java, Unix shell scripting
Big Data Technologies: Batch processing, ETL/ELT, Hadoop, Spark, PySpark, SQL, HDFS, Hive, Sqoop, Impala, Kafka, Oozie, Cosmos, Druid, Automic, Airflow, YARN
Cloud Technologies: GCP, Azure Databricks, AWS, BigQuery, Athena, Azure Data Factory, S3, ADLS, Dataproc, Terraform, Azure SQL, Azure Synapse Analytics, IAM, EMR, Snowflake, Dataflow, Pub/Sub
Databases & Data Tools: Oracle, MySQL, PostgreSQL, DB2, Delta tables, dbt
Operating Systems: Windows, Linux, Unix, CentOS
Hadoop Distributions: Cloudera, Hortonworks, Azure Databricks, Dataproc
Others: Git, Jenkins, Splunk, REST APIs, Presto
ACHIEVEMENTS
Contributed to a team effort that optimized and reduced cloud costs by 40%, significantly enhancing project profitability.
Successfully assisted the team in migrating data pipelines from a legacy system to Hadoop and Spark.
Enhanced reporting capabilities by optimizing data models in BigQuery, leading to a 50% reduction in query times.
Achieved a 35% performance improvement in distributed data processing by tuning Spark job configurations and optimizing data partitioning strategies (illustrated in the sketch below).
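As a hedged illustration of the kind of tuning involved, the sketch below shows representative PySpark session settings and a key-based repartition. The application name, paths, column names, and specific values are illustrative assumptions, not the actual production configuration.

```python
from pyspark.sql import SparkSession

# Illustrative tuning only -- the real production values are not part of this resume.
spark = (
    SparkSession.builder
    .appName("partition-tuning-sketch")  # hypothetical job name
    # Match shuffle parallelism to cluster cores to avoid tiny or oversized tasks.
    .config("spark.sql.shuffle.partitions", "400")
    # Let Adaptive Query Execution coalesce small/skewed shuffle partitions at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    # Kryo gives faster, more compact serialization during shuffles.
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

df = spark.read.parquet("s3a://example-bucket/events/")  # hypothetical path

# Repartition on the join/aggregation key so downstream stages shuffle evenly,
# then write output partitioned by date for partition-pruned reads.
(df.repartition(200, "customer_id")
   .write.mode("overwrite")
   .partitionBy("event_date")
   .parquet("s3a://example-bucket/events_optimized/"))
```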
EXPERIENCE
Data Engineer | WALMART, Dallas, Texas | Oct 2023 – Present
Domain: Retail E-commerce
Channel Performance – Developing a web-based dashboard application that generates comprehensive reports and offers insights into sales data for customers across Walmart stores.
Technologies: ETL Batch processing, Spark, Hive, Scala, PySpark, GCP, Databricks, BigQuery, Druid, Cosmos, SQL, Dataproc, Automic, Airflow, Python, AWS, Dataflow, Pub/Sub
Duties and Operational Responsibilities
Collaborated with product managers and data stewards to understand business requirements and design scalable data pipelines aligned with project goals.
Designed and developed ETL and ELT pipelines using Python, PySpark, and SQL, ensuring seamless data accessibility and transformation across multiple platforms.
Integrated and processed data from Hive, Snowflake, BigQuery, Cosmos DB, and PostgreSQL, creating reusable, enterprise-level data lake and analytics pipelines.
Built PySpark jobs on Dataproc to extract and transform API and BigQuery data with Pandas/PySpark, exporting results to CSV/Excel within minutes (see the sketch after this list).
Implemented Apache Airflow to orchestrate, monitor, and automate complex ETL workflows for reliability and efficiency.
Developed utility tools in Python and JavaScript using REST Assured to validate and compare data with APIs.
Created and executed data and functional testing scenarios for Sales and Inventory metrics, ensuring data quality and accuracy.
Automated and reviewed end-to-end (E2E) flows for Report Builder and Insights applications covering 750+ functional cases.
Worked closely with API and UI teams to address data-related issues and improve integration accuracy through scripting and automation.
Responsible for biweekly production releases, ensuring smooth deployment and minimal downtime.
Collaborated with developers to increase code coverage from 75% to 85%, improving code reliability and maintainability.
Followed agile methodologies for sprint planning, project management, and timely delivery.
Delivered interactive dashboards and insights using Power BI, Tableau, and Looker, supporting data-driven business decisions.
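A minimal sketch of the Dataproc extraction pattern described above. The table name, columns, and output files are hypothetical placeholders, and the spark-bigquery connector is assumed to be available on the cluster; this is illustrative, not the production job.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes a Dataproc cluster with the spark-bigquery connector on the classpath
# (e.g. submitted with --jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12.jar).
spark = SparkSession.builder.appName("bq-extract-sketch").getOrCreate()

# Hypothetical table; the real sales tables are not named in this resume.
sales = (
    spark.read.format("bigquery")
    .option("table", "example-project.sales.channel_performance")
    .load()
)

# Aggregate per store/channel before collecting, so only the small summary
# -- not the raw table -- is pulled to the driver.
summary = (
    sales.groupBy("store_id", "channel")
         .agg(F.sum("net_sales").alias("net_sales"),
              F.countDistinct("order_id").alias("orders"))
)

# Export the summary for report consumers; Pandas handles CSV/Excel output.
pdf = summary.toPandas()
pdf.to_csv("channel_summary.csv", index=False)
pdf.to_excel("channel_summary.xlsx", index=False)  # requires openpyxl
```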
Data Engineer | COMCAST, Philadelphia, PA | July 2022 – Oct 2023
Domain: Telecommunications
Athena – Participated in designing, developing, and supporting scalable solutions and data engineering pipelines to ingest data from various global sources and integrate them for advanced data analytics.
Technologies: PySpark, Hive, Azure, ADLS, ADF, SQL, ETL Batch processing, Databricks, CA scheduler
Duties and Operational Responsibilities
Collaborated with a team of data engineers to ingest raw-layer data into a structured Hive data layer for advanced data analytics.
Collaborated with the platform team to transition the on-premises data platform to Azure Databricks while maintaining business as usual for data pipelines.
Designed and developed internal tools using Angular and Node.js for data validation and monitoring.
Worked with Delta tables to perform incremental loads with ACID transaction guarantees (see the merge sketch after this list).
Developed and implemented comprehensive data governance policies using Azure Purview, which led to a significant reduction in security incidents by 20% within the first year.
Automated CI/CD pipelines with Cloud Build and Source Repositories, improving build, test, and deployment cycles for GCP-based applications.
Identified areas for performance improvement and implemented optimization techniques to enable efficient processing.
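A minimal sketch of the Delta incremental-load pattern referenced above, assuming a Delta-enabled Spark session (as on Databricks). The table name, source path, and key column are hypothetical.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Assumes a Databricks / Delta-enabled Spark session; names below are
# placeholders, not the project's real schema.
spark = SparkSession.builder.appName("delta-incremental-sketch").getOrCreate()

# Read the latest raw increment (hypothetical ADLS path).
updates = spark.read.parquet(
    "abfss://raw@exampleaccount.dfs.core.windows.net/accounts/2024-01-01/"
)

target = DeltaTable.forName(spark, "curated.accounts")

# MERGE performs an atomic upsert -- Delta's ACID guarantees mean readers
# never observe a half-applied increment.
(target.alias("t")
 .merge(updates.alias("s"), "t.account_id = s.account_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```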
Software Developer | AMEX, Phoenix, AZ | May 2021 – Jun 2022
Domain: Financial
Arena – The project focuses on developing data products for individual retirement accounts and integrating them into Tableau dashboards for enhanced visualization and analysis.
Technologies: ETL Batch processing, Spark, Hive, SQL, Databricks, Scala, Azure, ADF, ADLS, Shell Scripting, Oozie, Kafka, Snowflake
Duties and Operational Responsibilities
Developed ETL frameworks using Spark and Python to process financial data for analytics dashboards.
Created and maintained Hive tables and integrated with React.js and Angular dashboards for visualization.
Built Node.js RESTful APIs for data retrieval, transformation, and integration with front-end interfaces.
Implemented Kafka streaming for real-time updates to dashboards and reports (see the streaming sketch after this list).
Followed a solution-driven agile development methodology and actively participated in daily scrum meetings.
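A hedged sketch of the Kafka-to-dashboard streaming path described above, using Spark Structured Streaming. The broker address, topic, payload schema, and sink table are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Hypothetical event payload for retirement-account updates.
schema = StructType([
    StructField("account_id", StringType()),
    StructField("balance", DoubleType()),
    StructField("event_time", StringType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")     # assumed broker
    .option("subscribe", "retirement-account-events")      # assumed topic
    .load()
)

# Kafka delivers bytes; parse the JSON payload into typed columns.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
       .select("e.*")
)

# Stream parsed events into a table the dashboards read from;
# the checkpoint location makes the stream restartable.
query = (
    events.writeStream.format("delta")  # or "parquet", depending on the sink
    .option("checkpointLocation", "/tmp/checkpoints/retirement-events")
    .outputMode("append")
    .toTable("analytics.retirement_account_events")
)
query.awaitTermination()
```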
Big Data Developer | ALLSTATE, Northbrook, Illinois | March 2019 – April 2021
Domain: Insurance
Data Solutions – Worked on the External Data Hub (XDH) project, a centralized data platform designed to integrate and process large-scale external data sources. Ensured data quality and contributed to system scalability, reliability, and continuous delivery.
Technologies: ETL Batch processing, Spark, Hive, Sqoop, AWS S3, Scala, shell scripting, Automic Scheduler
Duties and Operational Responsibilities
Transformed data using Spark with Scala per business requirements.
Configured various APIs for data ingestion into HDFS and AWS S3 from different applications.
Configured Java web services for data migration into AWS S3 using PCF.
Created Hive schemas on the data, converted data into Avro format, and provided data to downstream teams.
Developed PySpark applications for data modeling, transformation, and loading in Hive.
Automated bulk Hive table creation using Python.
Developed PySpark applications using Boto3 and Pandas for data migration to AWS S3 and EC2 (see the sketch after this list).
Configured Jenkins pipeline integration with GitHub for automated builds into Artifactory.
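A minimal sketch of the Boto3/Pandas migration pattern referenced above. The bucket, prefix, and file names are hypothetical stand-ins for the project's actual migration targets, and AWS credentials are assumed to be configured in the environment.

```python
import io

import boto3
import pandas as pd

# Hypothetical bucket; not the project's real landing zone.
s3 = boto3.client("s3")
BUCKET = "example-xdh-bucket"

def migrate_csv_to_s3(local_path: str, key: str) -> None:
    """Load a local extract with Pandas, lightly normalize it, and push it to S3."""
    df = pd.read_csv(local_path)
    # Normalize headers before landing the file in the data hub.
    df.columns = [c.strip().lower() for c in df.columns]
    buf = io.BytesIO()
    df.to_parquet(buf, index=False)  # Parquet preserves types; requires pyarrow
    buf.seek(0)
    s3.upload_fileobj(buf, BUCKET, key)

migrate_csv_to_s3("daily_extract.csv", "landing/daily_extract.parquet")
```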
EDUCATION
Master's in Computer Science June 2017 – Aug 2018
Wilmington University, Delaware
Bachelor of Technology (B.Tech) in Computer Science and Engineering June 2011 – Jun 2015
JNTU, Hyderabad, Telangana, India