Sai Priyanka Tupakula
+1-972-***-**** ******.****@*****.*** http://www.linkedin.com/in/sai-priya-t
SUMMARY
Data Engineer with extensive experience in designing scalable, cloud-native platforms using Azure Data Factory, Databricks, and Python ETL pipelines. Proven expertise in developing efficient data workflows and implementing robust data governance frameworks to ensure data quality and compliance. Committed to driving strategic business decisions through optimized data integration and innovative solutions.
EDUCATION
Trine University Present
Master's, Computer Science
• GPA: 3.8
WORK EXPERIENCE
Ford Motor Company Oct 2024 - Jun 2025
Senior Data Engineer United States
• Developed and optimized data processing scripts using Python, SQL, T-SQL, and Bash, leveraging tools like SSMS and Jupyter Notebooks for in-depth analysis.
• Enhanced data processing speed by constructing scalable data pipelines with Snowflake and Azure Databricks, applying Medallion Architecture and Delta Lake for performance tuning.
• Designed and managed robust ETL/ELT pipelines and workflows with Azure tools, including Azure Data Factory, to improve data integration efficiency.
• Implemented Unity Catalog across Databricks workspaces to centralize governance, apply fine-grained access controls, and maintain data lineage, thereby enhancing data security and compliance.
• Integrated MLflow into Databricks pipelines to facilitate experiment tracking, model versioning, and lifecycle management, ensuring reproducibility and auditability.
• Deployed data solutions across Azure services including Data Lake, Blob Storage, SQL DB, Synapse Analytics, and ADLS Gen2; integrated OneLake and Snowflake for scalable warehousing and managed cloud-based data interchange via AWS S3 (basic).
• Designed and built dimensional models using Star and Snowflake schemas; created Fact and Dimension tables, implemented Slowly Changing Dimensions (SCD), and applied normalization/denormalization techniques.
• Orchestrated scalable big data transformations using Spark SQL and Hadoop on Azure Databricks, while monitoring data flow through Unity Catalog lineage tracking.
• Developed interactive dashboards and reports with Power BI (DAX, Paginated Reports), Tableau, and SSRS; performed analytics using Excel and Google Sheets with PivotTables and Power Query.
• Promoted data trust with Microsoft Fabric tools, implementing governance policies through RBAC, Azure Active Directory, and dynamic access controls via Unity Catalog.
• Secured data workflows with OAuth and Azure Key Vault, managing secrets and credentials in adherence to enterprise compliance standards.
• Collaborated using Git, GitHub, GitLab, TFS, and Azure DevOps while implementing CI/CD pipelines to ensure continuous integration and deployment of data products.
Wells Fargo Jan 2023 - Sep 2024
Data Engineer United States
• Designed and maintained data pipelines using Azure Databricks and PySpark, adhering to best practices in Delta Lake architecture to ensure scalable, reliable, and ACID-compliant data processing.
• Imported data from Oracle and MySQL into HDFS using Sqoop and defined external Hive tables, facilitating distributed querying and transformation.
• Developed Spark SQL and HiveQL scripts to validate data, manage partitions, and prepare datasets for downstream analytics and reporting.
• Built and monitored Flume-based ingestion pipelines to stream server logs into HDFS, enabling centralized log analysis and operational insights.
• Automated job execution on AWS EMR using Bash and monitored performance with Spark UI and CloudWatch, ensuring SLA adherence and operational efficiency.
• Enhanced AWS data pipelines by writing Python scripts for AWS Glue, leading to faster data loading into Redshift.
• Applied data quality checks and anomaly detection using PySpark and SQL, bolstering data reliability across reporting layers.
• Leveraged Git for version control, engaged in code reviews, and collaborated on Agile projects using Jira and Confluence.
• Enhanced IAM role configuration and RBAC policies to improve security compliance across AWS services and reduce unauthorized access incidents.
• Optimized data pipeline efficiency by employing dbt for modular SQL development and Apache Airflow for streamlined workflow automation.
ADP Jul 2020 - Dec 2021
Data Engineer India
• Developed and managed end-to-end ETL pipelines using Azure Data Factory (ADF) to integrate structured and unstructured data from Azure Data Lake, Blob Storage, and SQL Database, reducing manual data handling by 40%.
• Cleansed and transformed large-scale datasets using PySpark in Azure Databricks, processing over 50 million rows monthly from HR and sales systems through distributed, fault-tolerant workflows.
• Created and optimized complex SQL and T-SQL queries to improve ETL performance, reducing pipeline runtimes by 30% and enhancing data freshness for downstream analytics.
• Built enterprise-grade dimensional models in Azure Synapse Analytics using star and snowflake schemas, supporting 100+ business reports with consistent and governed data foundations.
• Enhanced metadata governance and data lineage using Microsoft Fabric tools and Azure Data Catalog, increasing traceability across data domains by 60%.
• Authored ad hoc SQL queries and Databricks notebooks to support real-time decision-making for workforce and compensation analytics.
• Performed data validation and anomaly detection in staging zones, reducing critical data defects in production by 25%.
• Utilized Azure Data Studio, SSMS, and Jupyter Notebooks for iterative development, debugging, and cross-environment testing.
• Applied version control using Git and collaborated in Agile teams using Jira and Confluence, accelerating analytics feature delivery by 20% sprint-over-sprint.
• Developed Hadoop-based ETL jobs for batch processing of log and telemetry data, leveraging HDFS and MapReduce for scalable ingestion.
• Wrote Scala scripts (basic) for Spark-based data transformations in distributed environments, improving job execution times.
• Integrated Snowflake for cross-platform analytics, improving data access and analysis speed with AWS S3 ingestion and Snowflake SQL transformations.
Cognizant India Jan 2019 - Jun 2020
Data Engineer India
• Assisted in developing ETL pipelines using Azure Data Factory (ADF) to integrate structured and semi-structured data from Azure Data Lake, Blob Storage, and SQL Database, reducing manual data handling.
• Supported data transformation tasks using PySpark in Azure Databricks, working with moderately large datasets from HR and sales systems.
• Wrote and optimized SQL/T-SQL queries for data validation and transformation, contributing to improved pipeline performance and data accuracy.
• Gained hands-on experience with dimensional modeling in Azure Synapse Analytics, supporting business reporting through basic star schema designs.
• Helped implement RBAC and AAD-based access controls to secure data access across cloud storage and databases.
• Contributed to metadata management and data lineage tracking using Microsoft Fabric tools, improving visibility into data flows.
• Authored ad hoc SQL queries and Databricks notebooks to support reporting and operational analytics.
• Participated in data validation and anomaly detection in staging environments, helping reduce data quality issues before production deployment.
• Used tools like Azure Data Studio, SSMS, and Jupyter Notebooks for development, testing, and debugging.
• Practiced version control using Git, and collaborated in Agile teams using Jira and Confluence to manage tasks and documentation.
SKILLS
• Languages & Tools: SQL, T-SQL, Python, DAX, PySpark, Scala (basic), Bash, Hadoop, Snowflake, Databricks, AWS (basic), SSMS, Azure Data Studio, Jupyter Notebooks
• ETL & Orchestration: Azure Data Factory, Microsoft Fabric Dataflows, SSIS, Databricks, Apache Airflow (basic), Logic Apps, Databricks Workflows, Hadoop-based ETL pipelines
• Cloud & Storage: Azure (Data Lake, Blob Storage, SQL DB, Synapse Analytics, ADLS Gen2), Snowflake, AWS S3 (basic), OneLake
• Data Warehousing & Modeling: Star Schema, Snowflake Schema, Dimensional Modeling, Data Vault (intro), Normalization/Denormalization, Slowly Changing Dimensions (SCDs), Fact/Dimension table design
• BI & Reporting: Power BI (Modeling, DAX, Paginated Reports), Tableau, SSRS, Excel (PivotTables, Power Query), Google Sheets
• Data Quality & Governance: Data validation, anomaly detection, data lineage (Fabric tools), RBAC, AAD Integration, Metadata management
• Security & Access Management: Azure Active Directory, Role-Based Access Control (RBAC), OAuth, Key Vault Integration
• Big Data & Processing: Azure Databricks, Delta Lake, Microsoft Fabric Lakehouse, Spark SQL, Parallel data processing, Hadoop, Scala-based distributed processing
• Version Control & Workflow: Git, GitHub, TFS, Azure DevOps, GitLab, CI/CD (basic), Branching strategies
• Development & Project Methodologies: Agile/Scrum, DevOps Practices, Jira, Confluence, SDLC
CERTIFICATION
• Databricks Certified Data Engineer Associate
PUBLICATIONS AND RESEARCH PAPERS
• Optimizing Deep Learning Inference via Low-Level GPU Profiling and Kernel Debugging. This internal research report explores advanced techniques for enhancing deep learning inference through fine-grained GPU profiling and CUDA kernel-level debugging. The study presents a systematic approach to identifying and mitigating performance bottlenecks in GPU-intensive workloads: detailed performance analysis with nvprof, Nsight, and NCCL surfaced critical GPU-level inefficiencies, and iterative optimizations to CUDA kernel code yielded substantial latency reductions across inference tasks. The report delivers actionable insights and strategic guidelines for integrating low-level GPU enhancements into scalable machine learning production pipelines.