SRINIDHI GUTTA
Data Engineer
***************@*****.*** +1-972-***-****.
LinkedIn | Location: Seattle, WA 98119
SUMMARY:
Data Engineer with 5+ years of experience designing and optimizing data pipelines, data models, and cloud data architectures. Specialized in building ETL/ELT workflows, real-time streaming pipelines, and cloud data warehouses on Azure, Microsoft Fabric, Databricks, and Synapse. Skilled at data modeling, data governance, query optimization, and database performance tuning. Proven success in migrating on-premises systems to the cloud, enabling analytics, reporting, and machine learning solutions for industries including finance, healthcare, and payroll. Strong expertise in security and compliance (GDPR, HIPAA, SOC 2, PCI DSS) and in delivering actionable insights through Power BI and Tableau dashboards. Open to relocation and available to join immediately.
Professional Highlights:
• Data Engineer with expertise in ETL/ELT pipeline design, Batch and Real-Time Processing, and Streaming Analytics using Azure Data Factory, Databricks, PySpark, Spark SQL, and Apache Spark. Improved data freshness, reliability, and performance by 40% through large-scale data ingestion and transformation workflows.
• Designed event-driven and streaming architectures using Azure Event Hubs, Stream Analytics, Service Bus, Spark Streaming, and Auto Loader, supporting real-time decision-making for fraud detection, IoT telemetry, and e-commerce systems. Built resilient frameworks capable of processing millions of events per second.
• Developed and optimized Data Warehousing and Lakehouse solutions with Azure Synapse, Microsoft Fabric, ADLS, Delta Lake, and Unity Catalog, applying SQL optimization, partitioning, bucketing, and AQE to improve query performance by 50% and reduce compute costs.
• Architected Data Lakehouse and Medallion Architectures, integrating structured and unstructured data into unified layers. Designed Data Vault and Dimensional Models (Star, Snowflake) for BI workloads, enhancing analytical flexibility in Power BI and Azure Analysis Services.
• Implemented Metadata Management and Data Governance using Purview, Data Catalog, Collibra, and Alation, ensuring data lineage, schema evolution, and versioning for transparency and regulatory readiness.
• Built Data Quality and Validation pipelines with automated checks, profiling, and anomaly detection, improving accuracy, completeness, and trust in enterprise datasets.
• Enforced Data Security and Compliance with AAD, RBAC, Key Vault, IAM, and Azure Policy, ensuring adherence to GDPR, HIPAA, SOC 2, and PCI DSS through encryption and key rotation strategies.
• Automated infrastructure deployment and orchestration using Azure DevOps, Jenkins, Git, Terraform, Docker, and Kubernetes (AKS/EKS). Built CI/CD pipelines that reduced provisioning time by 50% and improved release consistency.
• Delivered end-to-end data solutions across Azure and AWS (S3, Redshift, Glue, Lambda, EMR, Kinesis, Athena, Step Functions) for data migration, scalability, and cost optimization, designing hybrid multi-cloud architectures.
• Collaborated with data scientists and analysts to operationalize ML workflows using MLflow, Databricks SQL, Azure ML, TensorFlow, Scikit-Learn, and PyTorch, integrating predictive models into data pipelines.
• Developed and managed secure REST APIs and data integration services supporting JSON, Parquet, ORC, and Avro formats, improving cross-platform data access and system interoperability.
• Implemented monitoring and observability frameworks with Azure Monitor, Log Analytics, Application Insights, AWS CloudWatch, Prometheus, and Grafana, enhancing performance visibility and anomaly detection.
• Optimized queries and performance using Spark SQL, Databricks SQL, and Synapse, applying indexing, caching, and partitioning to reduce execution times and enhance scalability.
• Experienced in Agile/Scrum, sprint planning, and cross-functional collaboration, translating business needs into secure, scalable, and high-performance data architectures with cost efficiency in mind.
• Strong foundation in data strategy, scalability engineering, and system optimization, with proven ability to build end-to-end cloud data platforms enabling actionable insights and AI-driven decision-making.
CERTIFICATIONS:
• Microsoft Certified: Fabric Data Engineer Associate – Microsoft, 2025
• Python for Data Science, AI & Development
RELEVANT WORK EXPERIENCE:
Microsoft Jan 2024 - Present
Role: Data Engineer
• Built end-to-end ETL/ELT pipelines in Azure Data Factory and Databricks (PySpark, Spark SQL), ingesting structured, semi-structured, and unstructured data into a Fabric Lakehouse (Medallion Architecture – Bronze, Silver, Gold).
• Migrated on-prem SQL Server and PostgreSQL datasets to Azure Synapse and Fabric Lakehouse using Change Data Capture (CDC), Auto Loader, and Dataflows, improving query performance by 50%.
• Designed streaming data pipelines with Event Hubs, Stream Analytics, Service Bus, and Spark Streaming, enabling real-time fraud detection, IoT telemetry, and order tracking with sub-second latency.
• Optimized Databricks workflows using partitioning, bucketing, caching, Z-ordering, and Adaptive Query Execution (AQE), reducing runtime by 40% and improving cluster efficiency.
• Applied Unity Catalog and Microsoft Purview for metadata management, schema evolution, access control, and data lineage, ensuring compliance with GDPR and HIPAA.
• Designed and optimized Delta Lake schemas with schema enforcement and ACID compliance, improving data quality, versioning, and auditability.
• Built Power BI dashboards integrated with Fabric Semantic Models and Synapse SQL pools, delivering real-time business intelligence for finance, healthcare, and operations.
• Automated CI/CD deployments with Azure DevOps, Git, and Terraform, reducing deployment effort by 60% and ensuring version-controlled, reproducible environments.
• Configured monitoring and logging using Fabric Monitoring Hub, Azure Monitor, Log Analytics, and Application Insights, improving anomaly detection and reducing downtime by 30%.
• Collaborated with cross-functional teams to enable end-to-end pipeline development, supporting both BI workloads and ML model training.
ADP Jul 2021 - Jul 2022
Role: Data Engineer
• Designed and automated payroll ETL/ELT pipelines with ADF, Databricks (Job Clusters, PySpark), and Delta Lake, processing 50M+ HR and payroll records across multiple formats (CSV, JSON, Parquet).
• Built batch and streaming workflows with Event Hubs, Spark Streaming, and Logic Apps to ensure timely payroll processing and compliance reporting.
• Developed dimensional data models (Star and Snowflake schemas) in Azure Synapse and cloud data warehouses, supporting 100+ BI dashboards and compliance reports.
• Authored and optimized SQL/T-SQL queries with indexing, partitioning, bucketing, and materialized views for payroll calculations and tax reporting, reducing execution time by 40%.
• Implemented data validation, anomaly detection, and late-arrival handling frameworks, improving payroll data quality by 30%.
• Strengthened data governance and lineage using Microsoft Purview, Unity Catalog, and Fabric Dataflows, ensuring compliance with GDPR, SOC 2, and HIPAA.
• Secured sensitive payroll data with RBAC, AAD, Key Vault, and encryption standards, ensuring strong access control.
• Delivered real-time payroll insights with Power BI dashboards connected to Synapse and Fabric semantic models.
• Automated payroll workflows with Databricks notebooks, Logic Apps, and CI/CD pipelines, reducing payroll cycle delays by 20%.
• Collaborated with Finance and HR teams in an Agile/Scrum environment to deliver comprehensive payroll analytics solutions.
VIATRIS Aug 2019 - Jun 2021
Role: Data Engineer
• Designed scalable batch and streaming pipelines in ADF, Event Hubs, and Databricks to ingest clinical trial and pharmaceutical manufacturing data into Synapse and Data Lake.
• Implemented event-driven architectures with Service Bus, Event Hubs, and Azure Functions, enabling near real-time adverse event monitoring and regulatory submissions.
• Developed ETL workflows in Databricks (PySpark, Spark SQL, Delta Lake) with schema enforcement and ACID compliance, reducing ETL runtime by 40% and ensuring data reliability.
• Applied Medallion architecture (Bronze, Silver, Gold layers) to standardize ingestion, transformation, and reporting pipelines for clinical and regulatory datasets.
• Enhanced metadata management and lineage with Purview and Unity Catalog, improving auditability and compliance with HIPAA, GDPR, and FDA 21 CFR Part 11.
• Designed data warehouses in Synapse and cloud-native platforms (Snowflake, Redshift) with partitioning, distribution strategies, and performance tuning.
• Implemented monitoring and observability with Azure Monitor, Application Insights, and CloudWatch, reducing downtime by 30%.
• Automated infrastructure provisioning with Terraform and Azure DevOps pipelines, creating scalable, version-controlled environments.
• Processed data across multiple formats (Avro, ORC, JSON, Parquet) for downstream BI and ML applications.
• Collaborated with compliance, regulatory, and analytics teams to deliver secure, high-performance, cloud-native data solutions.
TECHNICAL SKILLS:
CATEGORY: SKILLS
Programming Languages: Python, SQL (Advanced), PySpark, Scala, R, Java, C#, JavaScript, Unix Shell Scripting, Bash, Go
Cloud Platforms:
  Azure: Data Factory, Synapse Pipelines, Event Hubs, IoT Hub, Service Bus, Functions, Logic Apps, Databricks, Synapse Analytics, HDInsight, Stream Analytics, Microsoft Fabric (Lakehouse, Dataflows, Real-Time Analytics, Notebooks), Data Lake Storage (ADLS Gen2), Blob Storage, Cosmos DB, SQL Database, SQL Managed Instance, Delta Lake, Analysis Services, Fabric Semantic Models, Power BI, Purview, Active Directory (AAD), Key Vault, Policy, RBAC, DevOps, Terraform, Bicep, Monitor, Log Analytics, Application Insights
  AWS: S3, Redshift, Glue, Lambda, EMR, Kinesis, Athena, DynamoDB, Step Functions, CloudWatch, IAM
Data Warehousing & Modeling: Data Modeling (Star Schema, Snowflake Schema, Data Vault 2.0, Normalization), Data Warehousing, Azure Synapse Analytics, Microsoft Fabric Warehouse, Snowflake, AWS Redshift, Oracle Data Warehouse, Delta Lake, Lakehouse Architecture, Medallion Architecture, Unity Catalog, Metadata Management
Data Pipelines & Orchestration: Azure Data Factory, Databricks Workflows, Apache Spark, Apache Airflow, dbt, Talend, Informatica PowerCenter, Prefect, Apache NiFi, AWS Glue, Scheduling, Job Clusters, Interactive Clusters, Auto Loader, Orchestration, Data Versioning, Adaptive Query Execution, Z-Ordering
Data Architectures & Storage: Data Lake, Lakehouse, Data Mesh, Event-Driven Architecture, Streaming, Batch Processing, Parquet, ORC, Avro, JSON, Delta Lake, Azure Data Lake Storage, Azure Blob Storage, Microsoft Fabric Lakehouse, AWS S3, AWS Glue, Optimization, Partitioning, Bucketing
Data Governance & Security: Data Governance, Data Quality, Data Validation, Azure Purview, Microsoft Fabric Dataflows, RBAC, Azure Active Directory (AAD), Key Vault, IAM (AWS/Azure), KMS, Access Control, GDPR, HIPAA, PCI DSS, SOC 2, Data Encryption, Collibra, Alation, AWS Lake Formation, AWS Glue Catalog
Monitoring & Logging: Azure Monitor, Log Analytics, Application Insights, AWS CloudWatch, OpenSearch, Prometheus, Grafana
Visualization & Analytics: Power BI (Advanced DAX, Semantic Models), Tableau, Looker, AWS QuickSight, Analytical Skills, BI Reporting, Statistical Analysis, Performance Tuning and SQL Optimization
CI/CD & DevOps: Git, GitHub, GitLab, Jenkins, Azure DevOps, Terraform, Ansible, AWS CodePipeline, Docker, Kubernetes (AKS, EKS), Infrastructure as Code, CI/CD Automation
Machine Learning & Advanced Analytics: Azure Machine Learning, MLflow, TensorFlow, Scikit-Learn, PyTorch, Databricks SQL, Spark SQL, Spark Streaming
Query Optimization: SQL Optimization, Query Tuning, Performance Tuning (Synapse, Snowflake, Redshift), Partitioning, Indexing, Materialized Views, Adaptive Query Execution
SDLC & Testing: Agile/Scrum, Unit Testing, Integration Testing, Performance Testing, UAT
EDUCATION:
University of Texas at Arlington, Arlington, Texas Aug 2022 – May 2024
Master’s Degree in Computer Science