Data Engineer Cloud

Location:

Seattle, WA

Posted:

October 15, 2025


Resume:

SRINIDHI GUTTA

Data Engineer

***************@*****.*** | +1-972-***-****

LinkedIn | Location: Seattle, WA 98119

SUMMARY:

Data Engineer with 5+ years of experience designing and optimizing data pipelines, data models, and cloud data architectures. Specialized in building ETL/ELT workflows, real-time streaming pipelines, and cloud data warehouses on Azure, Microsoft Fabric, Databricks, and Synapse. Skilled in data modeling, data governance, query optimization, and database performance tuning. Proven success migrating on-premises systems to the cloud and enabling analytics, reporting, and machine learning solutions in the finance, healthcare, and payroll domains. Strong expertise in security and compliance (GDPR, HIPAA, SOC 2, PCI DSS) and in delivering actionable insights through Power BI and Tableau dashboards. Open to relocation and available to join immediately.

Professional Highlights:

• Data Engineer with expertise in ETL/ELT pipeline design, batch and real-time processing, and streaming analytics using Azure Data Factory, Databricks, PySpark, Spark SQL, and Apache Spark. Built large-scale data ingestion and transformation workflows that improved data freshness, reliability, and performance by 40%.

• Designed event-driven and streaming architectures using Azure Event Hubs, Stream Analytics, Service Bus, Spark Streaming, and Auto Loader, supporting real-time decision-making for fraud detection, IoT telemetry, and e-commerce systems. Built resilient frameworks capable of processing millions of events per second.

• Developed and optimized Data Warehousing and Lakehouse solutions with Azure Synapse, Microsoft Fabric, ADLS, Delta Lake, and Unity Catalog, applying SQL optimization, partitioning, bucketing, and AQE to improve query performance by 50% and reduce compute costs.

• Designed Data Lakehouse and Medallion architectures, integrating structured and unstructured data into unified layers. Built Data Vault and dimensional models (Star, Snowflake) for BI workloads, enhancing analytical flexibility in Power BI and Azure Analysis Services.

• Implemented Metadata Management and Data Governance using Purview, Data Catalog, Collibra, and Alation, ensuring data lineage, schema evolution, and versioning for transparency and regulatory readiness.

• Built Data Quality and Validation pipelines with automated checks, profiling, and anomaly detection, improving accuracy, completeness, and trust in enterprise datasets.

• Enforced Data Security and Compliance with AAD, RBAC, Key Vault, IAM, and Azure Policy, ensuring adherence to GDPR, HIPAA, SOC 2, and PCI DSS through encryption and key rotation strategies.

• Automated infrastructure deployment and orchestration using Azure DevOps, Jenkins, Git, Terraform, Docker, and Kubernetes (AKS/EKS). Built CI/CD pipelines that reduced provisioning time by 50% and improved release consistency.

• Delivered end-to-end data solutions and hybrid multi-cloud architectures across Azure and AWS (S3, Redshift, Glue, Lambda, EMR, Kinesis, Athena, Step Functions), supporting data migration, scalability, and cost optimization.

• Collaborated with data scientists and analysts to operationalize ML workflows using MLflow, Databricks SQL, Azure ML, TensorFlow, Scikit-Learn, and PyTorch, integrating predictive models into data pipelines.

• Developed and managed secure REST APIs and data integration services supporting JSON, Parquet, ORC, and Avro formats, improving cross-platform data access and system interoperability.

• Implemented monitoring and observability frameworks with Azure Monitor, Log Analytics, Application Insights, AWS CloudWatch, Prometheus, and Grafana, enhancing performance visibility and anomaly detection.

• Optimized queries and performance using Spark SQL, Databricks SQL, and Synapse, applying indexing, caching, and partitioning to reduce execution times and enhance scalability.

• Experienced in Agile/Scrum, sprint planning, and cross-functional collaboration, translating business needs into secure, scalable, and high-performance data architectures with cost efficiency in mind.

• Strong foundation in data strategy, scalability engineering, and system optimization, with proven ability to build end-to-end cloud data platforms enabling actionable insights and AI-driven decision-making.

CERTIFICATIONS:

• Microsoft Certified: Fabric Data Engineer Associate – Microsoft, 2025

• Python for Data Science, AI & Development

RELEVANT WORK EXPERIENCE:

Microsoft Jan 2024 - Present

Role: Data Engineer

• Built end-to-end ETL/ELT pipelines in Azure Data Factory and Databricks (PySpark, Spark SQL), ingesting structured, semi-structured, and unstructured data into a Fabric Lakehouse (Medallion Architecture – Bronze, Silver, Gold).

• Migrated on-prem SQL Server and PostgreSQL datasets to Azure Synapse and Fabric Lakehouse using Change Data Capture (CDC), Auto Loader, and Dataflows, improving query performance by 50%.

• Designed streaming data pipelines with Event Hubs, Stream Analytics, Service Bus, and Spark Streaming, enabling real-time fraud detection, IoT telemetry, and order tracking with sub-second latency.

• Optimized Databricks workflows using partitioning, bucketing, caching, Z-ordering, and Adaptive Query Execution (AQE), reducing runtime by 40% and improving cluster efficiency.

• Applied Unity Catalog and Microsoft Purview for metadata management, schema evolution, access control, and data lineage, ensuring compliance with GDPR and HIPAA.

• Designed and optimized Delta Lake schemas with schema enforcement and ACID compliance, improving data quality, versioning, and auditability.

• Built Power BI dashboards integrated with Fabric Semantic Models and Synapse SQL pools, delivering real-time business intelligence for finance, healthcare, and operations.

• Automated CI/CD deployments with Azure DevOps, Git, and Terraform, reducing deployment effort by 60% and ensuring version-controlled, reproducible environments.

• Configured monitoring and logging using Fabric Monitoring Hub, Azure Monitor, Log Analytics, and Application Insights, improving anomaly detection and reducing downtime by 30%.

• Collaborated with cross-functional teams to enable end-to-end pipeline development, supporting both BI workloads and ML model training.

ADP Jul 2021 - Jul 2022

Role: Data Engineer

• Designed and automated payroll ETL/ELT pipelines with ADF, Databricks (Job Clusters, PySpark), and Delta Lake, processing 50M+ HR and payroll records across multiple formats (CSV, JSON, Parquet).

• Built batch and streaming workflows with Event Hubs, Spark Streaming, and Logic Apps to ensure timely payroll processing and compliance reporting.

• Developed dimensional data models (Star and Snowflake schemas) in Azure Synapse and cloud data warehouses, supporting 100+ BI dashboards and compliance reports.

• Authored and optimized SQL/T-SQL queries with indexing, partitioning, bucketing, and materialized views for payroll calculations and tax reporting, reducing execution time by 40%.

• Implemented data validation, anomaly detection, and late-arrival handling frameworks, improving payroll data quality by 30%.

• Strengthened data governance and lineage using Microsoft Purview, Unity Catalog, and Fabric Dataflows, ensuring compliance with GDPR, SOC 2, and HIPAA.

• Secured sensitive payroll data with RBAC, AAD, Key Vault, and encryption standards, ensuring strong access control.

• Delivered real-time payroll insights with Power BI dashboards connected to Synapse and Fabric semantic models.

• Automated payroll workflows with Databricks notebooks, Logic Apps, and CI/CD pipelines, reducing payroll cycle delays by 20%.

• Collaborated with Finance and HR teams in an Agile/Scrum environment to deliver comprehensive payroll analytics solutions.

VIATRIS Aug 2019 - Jun 2021

Role: Data Engineer

• Designed scalable batch and streaming pipelines in ADF, Event Hubs, and Databricks to ingest clinical trial and pharmaceutical manufacturing data into Synapse and Data Lake.

• Implemented event-driven architectures with Service Bus, Event Hubs, and Azure Functions, enabling near real-time adverse event monitoring and regulatory submissions.

• Developed ETL workflows in Databricks (PySpark, Spark SQL, Delta Lake) with schema enforcement and ACID compliance, reducing ETL runtime by 40% and ensuring data reliability.

• Applied Medallion architecture (Bronze, Silver, Gold layers) to standardize ingestion, transformation, and reporting pipelines for clinical and regulatory datasets.

• Enhanced metadata management and lineage with Purview and Unity Catalog, improving auditability and compliance with HIPAA, GDPR, and FDA 21 CFR Part 11.

• Designed data warehouses in Synapse and cloud-native platforms (Snowflake, Redshift) with partitioning, distribution strategies, and performance tuning.

• Implemented monitoring and observability with Azure Monitor, Application Insights, and CloudWatch, reducing downtime by 30%.

• Automated infrastructure provisioning with Terraform and Azure DevOps pipelines, creating scalable, version-controlled environments.

• Processed data across multiple formats (Avro, ORC, JSON, Parquet) for downstream BI and ML applications.

• Collaborated with compliance, regulatory, and analytics teams to deliver secure, high-performance, cloud-native data solutions.

TECHNICAL SKILLS: