SAI RAMA KRISHNA GANGAVARAPU
********************@*****.*** 925-***-**** linkedin.com/in/krishnagangavarapu Pleasanton, USA
Education
California State University East Bay
Master of Science, Computer Science
08/2022 – 06/2024 United States
R.V.R. & J.C. College of Engineering
Bachelor of Technology, Computer Science
06/2018 – 06/2022 India
Professional Experience
Associate Software Engineer
Pike Solutions
08/2024 – Present United States
•Developed and automated ETL pipelines using Azure Data Factory and Apache Spark in Databricks to ingest and process large datasets from multiple sources such as Azure Data Lake Storage (ADLS) and Oracle databases.
•Engineered scalable data transformations using Apache Spark within Databricks, reducing processing time by 30% and enabling real-time analytics.
•Created reusable frameworks in Databricks to streamline data processing workflows, improving performance for large-scale data products and reports.
•Optimized SQL queries and performed advanced data transformations in Databricks SQL to prepare clean, high-quality datasets for analysis and reporting.
•Improved Delta Lake performance by implementing Z-Ordering and Liquid Clustering, enhancing query performance for both data engineering and analytical workloads.
•Performed exploratory data analysis (EDA) using Databricks Notebooks, applying Spark-based data processing to uncover insights, trends, and anomalies.
•Developed interactive data visualizations in Databricks Notebooks using Python libraries (e.g., Matplotlib, Seaborn) to present analytical findings.
•Collaborated with business stakeholders to define analytical requirements, creating custom reports and dashboards in Databricks.
Software Intern
Cognizant Technology Solutions
01/2022 – 06/2022 India
•Developed SQL queries to extract and analyze data from relational databases for business insights.
•Created basic web interfaces using HTML, CSS, and JavaScript for internal tools.
•Collaborated with analytics team to build data visualizations and summary reports.
•Assisted in building proof-of-concept dashboards using Power BI for data reporting.
•Solved algorithmic challenges and optimized Python code to improve performance and efficiency.
Skills
Core Skills: ETL / ELT Pipeline Development, Data Wrangling & Transformation, Data Modeling, Data Lakes & Data Warehousing, Cloud Computing, RESTful API Development, Microservices Architecture, Dashboard Design & KPI Monitoring, Data Analysis & Reporting, Problem Solving & Debugging, Agile Methodology, CI/CD Pipelines
Programming Languages: C, Python, Java, SQL, JavaScript, HTML, CSS
Tools and Technologies: Big Data Frameworks (Hadoop, Hive, Apache Spark), Azure Cloud Platform (ADLS, Azure HDInsight, Azure Synapse, Azure Data Factory, Azure Cosmos DB), Data Engineering Tools (Databricks, Apache Airflow, Apache Kafka), Development (Python (Flask, Django, REST APIs, JWT), React, PostgreSQL, MongoDB), Data Analysis (Python (NumPy, Pandas, Matplotlib, Seaborn)), Business Intelligence (Power BI (Power BI Desktop, Power BI Service, Advanced DAX)), DevOps & Utilities (Git, Docker, Linux, Redis, Jupyter Notebooks, GitHub Actions, Jenkins)
Projects
NYC Taxi Trips Data Pipeline - Delta Lake & Databricks
•Built an end-to-end ETL pipeline using Apache Spark and Delta Lake on Databricks to process large-scale NYC Taxi data following the Medallion Architecture (Bronze, Silver, Gold).
•Ingested raw CSV data into the Bronze layer with enforced schema validation and enabled Delta Lake time travel for data versioning and rollback.
•Developed PySpark and SQL logic in the Silver layer for data cleansing, anomaly filtering, and accurate trip duration calculations.
•Applied Delta Lake performance optimizations including OPTIMIZE, ZORDER, and VACUUM to reduce query latency and optimize storage.
•Used Azure Data Lake Storage (ADLS) for scalable, secure data storage, orchestrated with fault-tolerant Databricks Jobs.
•Implemented dynamic partitioning strategies based on pickup/drop-off dates to enhance query performance and minimize unnecessary data scans.
•Created interactive dashboards using Databricks SQL to visualize trip volume trends, peak hours, and high-demand zones.
•Exposed Gold layer outputs via Unity Catalog-registered Delta tables, enabling analytics and BI teams to access trusted, curated data easily.
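The Silver-layer cleansing step above can be sketched in plain Python (the actual pipeline uses PySpark on Databricks; the column names `pickup_ts` and `dropoff_ts` are illustrative, not the real schema):

```python
from datetime import datetime

def clean_trips(rows):
    """Keep trips with a positive, plausible duration (under 24h) and
    attach the computed duration in minutes, mirroring the Silver-layer
    anomaly filtering and trip-duration logic."""
    fmt = "%Y-%m-%d %H:%M:%S"
    cleaned = []
    for row in rows:
        pickup = datetime.strptime(row["pickup_ts"], fmt)
        dropoff = datetime.strptime(row["dropoff_ts"], fmt)
        minutes = (dropoff - pickup).total_seconds() / 60
        if 0 < minutes < 24 * 60:  # drop zero/negative or multi-day anomalies
            cleaned.append({**row, "duration_min": round(minutes, 1)})
    return cleaned

trips = [
    {"pickup_ts": "2024-01-05 08:00:00", "dropoff_ts": "2024-01-05 08:22:30"},
    {"pickup_ts": "2024-01-05 09:00:00", "dropoff_ts": "2024-01-05 08:50:00"},  # dropoff before pickup
]
print(clean_trips(trips))  # only the first trip survives, duration_min == 22.5
```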
Employee Performance Analytics Dashboard
•Automated data flows from multiple sources (HR systems, sales databases) using Azure Data Factory (ADF) to land raw performance data in Azure Blob Storage and Azure SQL Database.
•Cleaned and transformed data by applying SQL queries in Azure SQL Database to handle missing values, remove duplicates, and standardize performance metrics.
•Aggregated employee performance metrics (sales, tasks completed, customer satisfaction) using SQL functions like SUM and AVG to create summary tables for reporting.
•Built an interactive Power BI dashboard by connecting Power BI to Azure SQL Database, including visualizations such as KPIs, bar charts, and time-trend analysis to track individual and team performance.
•Implemented filters and drill-through features in Power BI for detailed analysis by department, employee, or time period, enabling managers to gain actionable insights.
•Automated data refresh schedules in both Azure Data Factory and Power BI to keep reports current with the latest employee performance data.
•Published and shared Power BI reports with stakeholders through Power BI Service, ensuring managers and HR teams had access to up-to-date performance insights.
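The SUM/AVG rollup step above can be sketched with an in-memory SQLite database (the `performance` table and its columns are illustrative stand-ins, not the production Azure SQL schema):

```python
import sqlite3

# Build a toy performance table and compute the per-employee summary
# that would feed the Power BI dashboard.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE performance (employee TEXT, sales REAL, csat REAL)")
conn.executemany(
    "INSERT INTO performance VALUES (?, ?, ?)",
    [("ana", 100.0, 4.5), ("ana", 200.0, 4.0), ("raj", 150.0, 5.0)],
)
# Summary table: total sales and average customer satisfaction per employee.
summary = conn.execute(
    "SELECT employee, SUM(sales) AS total_sales, AVG(csat) AS avg_csat "
    "FROM performance GROUP BY employee ORDER BY employee"
).fetchall()
print(summary)  # [('ana', 300.0, 4.25), ('raj', 150.0, 5.0)]
```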
Superstore Sales Analysis Dashboard – Power BI
•Built an interactive Power BI dashboard to analyze Superstore sales data across regions, categories, and time periods, using DAX and Power Query.
•Designed visualizations for KPIs, sales trends, and product performance, enabling drill-down analysis and dynamic filtering.
•Delivered insights on top-performing products, regional profit margins, and customer segments to support data-driven decisions.
•Optimized data model by creating calculated columns and measures in DAX, reducing report load time and improving responsiveness.
•Published and scheduled dashboard refreshes using Power BI Service, enabling stakeholders to access up-to-date insights via web and mobile.
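The regional profit-margin measure behind the dashboard can be sketched in plain Python (in the report itself this is a DAX measure; the `region`/`sales`/`profit` field names are illustrative):

```python
from collections import defaultdict

def profit_margin_by_region(rows):
    """Margin = total profit / total sales per region, guarding against
    division by zero, as a DAX-style calculated measure would."""
    totals = defaultdict(lambda: {"sales": 0.0, "profit": 0.0})
    for r in rows:
        totals[r["region"]]["sales"] += r["sales"]
        totals[r["region"]]["profit"] += r["profit"]
    return {
        region: round(t["profit"] / t["sales"], 3) if t["sales"] else 0.0
        for region, t in totals.items()
    }

orders = [
    {"region": "West", "sales": 200.0, "profit": 50.0},
    {"region": "West", "sales": 100.0, "profit": 10.0},
    {"region": "East", "sales": 300.0, "profit": 30.0},
]
print(profit_margin_by_region(orders))  # {'West': 0.2, 'East': 0.1}
```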
E-Commerce Website
•Developed a dynamic e-commerce website using Python (Django) for backend logic and HTML, CSS, JavaScript for the frontend interface.
•Built and integrated key modules including user authentication, product catalog, shopping cart, and payment gateway.
•Implemented secure and scalable backend functionalities such as session management, database operations, and RESTful APIs for seamless client-server communication.
•Added product filtering, sorting, and search functionality using query parameters and server-side logic for personalized user experiences.
•Implemented user registration with email verification and password reset via tokenized links for secure account management.
•Optimized database queries using indexing and ORM best practices to ensure smooth performance under load.
•Implemented cart persistence and order history using cookies and server-side tracking for returning users.
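The tokenized password-reset links above can be sketched with Python's standard library (Django ships its own token generator; the `SECRET_KEY` value and token layout here are illustrative, not the production implementation):

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"illustrative-only-secret"  # stand-in for the app's secret key

def make_reset_token(user_id: int) -> str:
    """Issue a reset token: random nonce plus an HMAC binding it to the user."""
    nonce = secrets.token_hex(8)
    sig = hmac.new(SECRET_KEY, f"{user_id}:{nonce}".encode(), hashlib.sha256).hexdigest()
    return f"{user_id}:{nonce}:{sig}"

def verify_reset_token(token: str) -> bool:
    """Recompute the HMAC and compare in constant time to resist tampering."""
    user_id, nonce, sig = token.split(":")
    expected = hmac.new(SECRET_KEY, f"{user_id}:{nonce}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

token = make_reset_token(42)
print(verify_reset_token(token))                      # True
print(verify_reset_token("42:deadbeef:" + "0" * 64))  # False: forged signature
```

A real deployment would also expire tokens and invalidate them after use, which this sketch omits.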