SAI KRISHNA
+1-925-***-**** ******@*****.*** https://www.linkedin.com/in/sai-krishna-reddy-075467180/
PROFESSIONAL SUMMARY
Software development professional with over 6 years of experience designing, architecting, and delivering scalable, high-performance data solutions across cloud and distributed environments.
Proficient in leveraging Microsoft Azure, AWS, and Google Cloud (GCP) to build end-to-end ETL pipelines, data lakes, and real-time analytics frameworks that drive business intelligence and operational efficiency.
Skilled in Hadoop, Apache Spark, Hive, Snowflake, and Databricks for large-scale data processing and analytics.
Experienced in implementing Azure Synapse, ADF, Data Lake, Cosmos DB, BigQuery, Dataflow, and Pub/Sub for both batch and streaming data workflows.
Strong expertise in SQL Server, Oracle, MySQL, Teradata, and MongoDB, with a focus on query optimization, performance tuning, and data integrity.
Adept at developing data warehousing architectures using OLAP and OLTP models to support enterprise analytics.
Proficient in Python, Scala, Java, and SQL, with hands-on experience automating workflows using Airflow and NiFi.
Experienced in implementing CI/CD pipelines using Jenkins, Bitbucket, GitHub, and Azure DevOps, ensuring efficient deployment and version control.
Demonstrated ability to deliver interactive dashboards and insights using Power BI, Tableau, and Looker, enabling data-driven decisions.
Recognized for strong technical leadership, problem-solving, and collaboration, consistently delivering secure, optimized, and business-focused data solutions that enhance organizational performance.
TECHNICAL SKILLS
Programming Languages: Scala, Python, Java, Unix shell scripting
Big Data Technologies: Batch processing, ETL/ELT, Hadoop, Spark, PySpark, SQL, HDFS, Hive, Sqoop, Impala, Kafka, Oozie, Cosmos, Druid, Automic, Airflow, YARN
Cloud Technologies: GCP, Azure Databricks, AWS, BigQuery, Athena, Azure Data Factory, S3, ADLS, Dataproc, Terraform, Azure SQL, Azure Synapse Analytics, IAM, EMR, Snowflake, Dataflow, Pub/Sub
Databases & Data Tools: Oracle, MySQL, PostgreSQL, DB2, Delta tables, dbt
Operating Systems: Windows, Linux, Unix, CentOS
Hadoop Distributions: Cloudera, Hortonworks, Azure Databricks, Dataproc
Others: Git, Jenkins, Splunk, REST APIs, Presto
ACHIEVEMENTS
Contributed to a team effort that optimized and reduced cloud costs by 40%, significantly enhancing project profitability.
Successfully assisted the team in migrating data pipelines from a legacy system to Hadoop and Spark.
Enhanced reporting capabilities by optimizing data models in BigQuery, leading to a 50% reduction in query times.
Achieved a 35% performance improvement in distributed data processing by tuning Spark job configurations and optimizing data partitioning strategies (illustrated in the sketch below).
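As a hedged illustration of the kind of tuning involved, the sketch below shows representative PySpark session settings and a key-based repartition. The application name, paths, column names, and specific values are illustrative assumptions, not the actual production configuration.

```python
from pyspark.sql import SparkSession

# Illustrative tuning only -- the real production values are not part of this resume.
spark = (
    SparkSession.builder
    .appName("partition-tuning-sketch")  # hypothetical job name
    # Match shuffle parallelism to cluster cores to avoid tiny or oversized tasks.
    .config("spark.sql.shuffle.partitions", "400")
    # Let Adaptive Query Execution coalesce small/skewed shuffle partitions at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    # Kryo gives faster, more compact serialization during shuffles.
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

df = spark.read.parquet("s3a://example-bucket/events/")  # hypothetical path

# Repartition on the join/aggregation key so downstream stages shuffle evenly,
# then write output partitioned by date for partition-pruned reads.
(df.repartition(200, "customer_id")
   .write.mode("overwrite")
   .partitionBy("event_date")
   .parquet("s3a://example-bucket/events_optimized/"))
```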
EXPERIENCE
Data Engineer | WALMART, Dallas, Texas | Oct 2023 – Present
Domain: Retail E-commerce
Channel Performance – Developing a web-based dashboard application that generates comprehensive reports and offers insights into sales data for customers across Walmart stores.
Technologies: ETL Batch processing, Spark, Hive, Scala, PySpark, GCP, Databricks, BigQuery, Druid, Cosmos, SQL, Dataproc, Automic, Airflow, Python, AWS, Dataflow, Pub/Sub
Duties and Operational Responsibilities
Collaborated with product managers and data stewards to understand business requirements and design scalable data pipelines aligned with project goals.
Designed and developed ETL and ELT pipelines using Python, PySpark, and SQL, ensuring seamless data accessibility and transformation across multiple platforms.
Integrated and processed data from Hive, Snowflake, BigQuery, Cosmos DB, and PostgreSQL, creating reusable, enterprise-level data lake and analytics pipelines.
Built PySpark jobs on Dataproc to extract and transform API and BigQuery data with Pandas/PySpark, exporting results to CSV/Excel within minutes (see the sketch after this list).
Implemented Apache Airflow to orchestrate, monitor, and automate complex ETL workflows for reliability and efficiency.
Developed utility tools in Python and JavaScript using REST Assured to validate and compare data with APIs.
Created and executed data and functional testing scenarios for Sales and Inventory metrics, ensuring data quality and accuracy.
Automated and reviewed end-to-end (E2E) flows for Report Builder and Insights applications covering 750+ functional cases.
Worked closely with API and UI teams to address data-related issues and improve integration accuracy through scripting and automation.
Responsible for biweekly production releases, ensuring smooth deployment and minimal downtime.
Collaborated with developers to increase code coverage from 75% to 85%, improving code reliability and maintainability.
Followed agile methodologies for sprint planning, project management, and timely delivery.
Delivered interactive dashboards and insights using Power BI, Tableau, and Looker, supporting data-driven business decisions.
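A minimal sketch of the Dataproc extraction pattern described above. The table name, columns, and output files are hypothetical placeholders, and the spark-bigquery connector is assumed to be available on the cluster; this is illustrative, not the production job.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes a Dataproc cluster with the spark-bigquery connector on the classpath
# (e.g. submitted with --jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12.jar).
spark = SparkSession.builder.appName("bq-extract-sketch").getOrCreate()

# Hypothetical table; the real sales tables are not named in this resume.
sales = (
    spark.read.format("bigquery")
    .option("table", "example-project.sales.channel_performance")
    .load()
)

# Aggregate per store/channel before collecting, so only the small summary
# -- not the raw table -- is pulled to the driver.
summary = (
    sales.groupBy("store_id", "channel")
         .agg(F.sum("net_sales").alias("net_sales"),
              F.countDistinct("order_id").alias("orders"))
)

# Export the summary for report consumers; Pandas handles CSV/Excel output.
pdf = summary.toPandas()
pdf.to_csv("channel_summary.csv", index=False)
pdf.to_excel("channel_summary.xlsx", index=False)  # requires openpyxl
```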
Data Engineer | COMCAST, Philadelphia, PA | July 2022 – Oct 2023
Domain: Telecommunications
Athena – Participated in designing, developing, and supporting scalable solutions and data engineering pipelines to ingest data from various global sources and integrate them for advanced data analytics.
Technologies: PySpark, Hive, Azure, ADLS, ADF, SQL, ETL Batch processing, Databricks, CA scheduler
Duties and Operational Responsibilities
Collaborated with a team of data engineers to ingest raw-layer data into a structured Hive data layer for advanced data analytics.
Collaborated with the platform team to transition the on-premises data platform to Azure Databricks while maintaining business as usual for data pipelines.
Designed and developed internal tools using Angular and Node.js for data validation and monitoring.
Worked with Delta tables to perform incremental loads with ACID transaction guarantees (see the merge sketch after this list).
Developed and implemented comprehensive data governance policies using Azure Purview, which led to a significant reduction in security incidents by 20% within the first year.
Automated CI/CD pipelines with Cloud Build and Source Repositories, improving build, test, and deployment cycles for GCP-based applications.
Identified areas for performance improvement and implemented optimization techniques to enable efficient processing.
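A minimal sketch of the Delta incremental-load pattern referenced above, assuming a Delta-enabled Spark session (as on Databricks). The table name, source path, and key column are hypothetical.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Assumes a Databricks / Delta-enabled Spark session; names below are
# placeholders, not the project's real schema.
spark = SparkSession.builder.appName("delta-incremental-sketch").getOrCreate()

# Read the latest raw increment (hypothetical ADLS path).
updates = spark.read.parquet(
    "abfss://raw@exampleaccount.dfs.core.windows.net/accounts/2024-01-01/"
)

target = DeltaTable.forName(spark, "curated.accounts")

# MERGE performs an atomic upsert -- Delta's ACID guarantees mean readers
# never observe a half-applied increment.
(target.alias("t")
 .merge(updates.alias("s"), "t.account_id = s.account_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```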
Software Developer | AMEX, Phoenix, AZ | May 2021 – Jun 2022
Domain: Financial
Arena – The project focuses on developing data products for individual retirement accounts and integrating them into Tableau dashboards for enhanced visualization and analysis.
Technologies: ETL Batch processing, Spark, Hive, SQL, Databricks, Scala, Azure, ADF, ADLS, Shell Scripting, Oozie, Kafka, Snowflake
Duties and Operational Responsibilities
Developed ETL frameworks using Spark and Python to process financial data for analytics dashboards.
Created and maintained Hive tables and integrated with React.js and Angular dashboards for visualization.
Built Node.js RESTful APIs for data retrieval, transformation, and integration with front-end interfaces.
Implemented Kafka streaming for real-time updates to dashboards and reports (see the streaming sketch after this list).
Followed a solution-driven agile development methodology and actively participated in daily scrum meetings.
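A hedged sketch of the Kafka-to-dashboard streaming path described above, using Spark Structured Streaming. The broker address, topic, payload schema, and sink table are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Hypothetical event payload for retirement-account updates.
schema = StructType([
    StructField("account_id", StringType()),
    StructField("balance", DoubleType()),
    StructField("event_time", StringType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")     # assumed broker
    .option("subscribe", "retirement-account-events")      # assumed topic
    .load()
)

# Kafka delivers bytes; parse the JSON payload into typed columns.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
       .select("e.*")
)

# Stream parsed events into a table the dashboards read from;
# the checkpoint location makes the stream restartable.
query = (
    events.writeStream.format("delta")  # or "parquet", depending on the sink
    .option("checkpointLocation", "/tmp/checkpoints/retirement-events")
    .outputMode("append")
    .toTable("analytics.retirement_account_events")
)
query.awaitTermination()
```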
Big Data Developer | ALLSTATE, Northbrook, Illinois | March 2019 – April 2021
Domain: Insurance
Data Solutions – Worked on the External Data Hub (XDH) project, a centralized data platform designed to integrate and process large-scale external data sources. Ensured data quality and contributed to system scalability, reliability, and continuous delivery.
Technologies: ETL Batch processing, Spark, Hive, Sqoop, AWS S3, Scala, shell scripting, Automic Scheduler
Duties and Operational Responsibilities
Transformed data using Spark with Scala per business requirements.
Configured various APIs for data ingestion into HDFS and AWS S3 from different applications.
Configured Java web services for data migration into AWS S3 using PCF.
Created Hive schemas on the data, converted data into Avro format, and provided data to downstream teams.
Developed PySpark applications for data modeling, transformation, and loading in Hive.
Automated bulk Hive table creation using Python.
Developed PySpark applications using Boto3 and Pandas for data migration to AWS S3 and EC2 (see the sketch after this list).
Configured Jenkins pipeline integration with GitHub for automated builds into Artifactory.
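A minimal sketch of the Boto3/Pandas migration pattern referenced above. The bucket, prefix, and file names are hypothetical stand-ins for the project's actual migration targets, and AWS credentials are assumed to be configured in the environment.

```python
import io

import boto3
import pandas as pd

# Hypothetical bucket; not the project's real landing zone.
s3 = boto3.client("s3")
BUCKET = "example-xdh-bucket"

def migrate_csv_to_s3(local_path: str, key: str) -> None:
    """Load a local extract with Pandas, lightly normalize it, and push it to S3."""
    df = pd.read_csv(local_path)
    # Normalize headers before landing the file in the data hub.
    df.columns = [c.strip().lower() for c in df.columns]
    buf = io.BytesIO()
    df.to_parquet(buf, index=False)  # Parquet preserves types; requires pyarrow
    buf.seek(0)
    s3.upload_fileobj(buf, BUCKET, key)

migrate_csv_to_s3("daily_extract.csv", "landing/daily_extract.parquet")
```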
EDUCATION
Master's in Computer Science June 2017 – Aug 2018
Wilmington University, Delaware
Bachelor of Technology (B.Tech) in Computer Science and Engineering June 2011 – Jun 2015
JNTU, Hyderabad, Telangana, India