SATHISH LAVUDIYA
DATA ENGINEER
Dallas, Texas *5038 • +1-682-***-**** • **************@*****.***
Professional Summary
Detail-oriented Data Engineer with 4 years of experience turning raw data into trusted insights that support business decisions. Skilled at designing efficient data solutions that improve accuracy, speed, and reliability. Experienced in managing large datasets, streamlining data pipelines, and ensuring compliance with industry standards. Known for delivering measurable results in fast-paced environments and collaborating closely with cross-functional teams.
Skills
Programming: Python, SQL, Scala, Java
Cloud Platforms: Microsoft Azure, AWS, Google Cloud Platform (GCP)
Big Data Frameworks: Apache Spark, Apache Kafka, Hadoop
Data Warehousing: Azure Synapse, Amazon Redshift, Snowflake, Google BigQuery
ETL Tools: Azure Data Factory, Apache NiFi, Talend, Informatica, SSIS, Databricks
Data Lake/Data Storage: Azure Data Lake Storage (Gen2), Amazon S3, Delta Lake
Workflow Orchestration: Apache Airflow, Azure Data Factory
Containerization & DevOps: Docker, Kubernetes, Terraform, Azure DevOps
Streaming Technologies: Apache Kafka, Azure Event Hubs, Spark Structured Streaming
Data Modeling & Governance: Data Modeling, Common Data Model, RBAC, Purview, Unity Catalog
Business Intelligence & Visualization: Power BI, Tableau
Security: IAM, HIPAA Compliance, Azure Key Vault
API Development: Azure API Management, RESTful APIs
Machine Learning Integration: Scikit-learn, ML pipelines
Work Experience
Data Engineer, 12/2023 to Current
PwC - Remote
Architected a full-stack Azure Lakehouse using Data Factory, ADLS Gen2, and Delta Lake to integrate multi-source healthcare data (EMR systems, claims/CPT files, NPI/ICD APIs), delivering a unified analytics layer for RCM use cases.
Orchestrated metadata-driven ingestion pipelines in Azure Data Factory, enabling incremental and full loads with parallel execution, audit logging, and archival, improving reliability and cutting load time by 30%.
Designed a Medallion Architecture (Landing → Bronze → Silver → Gold) in Databricks (Spark/Python), standardizing 10+ datasets into Parquet/Delta formats with ACID compliance, resulting in 50% faster query performance.
Implemented the Common Data Model and SCD Type 2 in the Silver layer, creating surrogate keys to unify hospital IDs and maintaining complete historical records, ensuring 100% longitudinal accuracy.
Modeled the Gold layer with one fact (transactions) and six dimensions (patients, providers, encounters, claims, ICD, CPT), enabling KPI tracking for AR > 90 days, Days in AR, and Net Collection Rate.
Enhanced governance and security by applying Azure Key Vault, AAD app registrations, RBAC, and Databricks Unity Catalog, ensuring HIPAA compliance and enterprise-wide lineage visibility.
Data Engineer, 01/2022 to 07/2023
Spsoft Global
Developed real-time streaming pipelines using Apache Kafka and Azure Event Hubs to process 5M+ daily events, reducing order tracking latency from minutes to under 5 seconds across fulfillment centers.
Migrated 20TB+ of historical data from MySQL/PostgreSQL into Azure Synapse with ADF and CDC, boosting query performance by 50% while ensuring seamless batch-to-stream integration.
Optimized 30+ ETL workflows in Databricks (Spark), reducing processing latency by 40% and increasing SLA compliance by 25%.
Established a centralized Azure Data Lake (Gen2) governed by Purview, applying 100+ RBAC policies to secure 100TB+ of structured and semi-structured data for enterprise-wide access.
Deployed event-driven pipelines using Azure Functions and Durable Functions to power real-time fraud detection and inventory updates, achieving 99.9% accuracy.
Automated infrastructure and CI/CD pipelines with Terraform and Azure DevOps, reducing manual deployment by 70% and accelerating release cycles from weekly to daily.
Delivered 50+ secure APIs via Azure API Management and created 10+ Power BI dashboards, lowering SLA response time for customer inquiries by 40% and enabling real-time executive KPI tracking.
Enhanced data quality by applying systematic cleaning, validation, and transformation procedures, reducing downstream errors.
Scripted Python automations for repetitive tasks, increasing team productivity and minimizing manual mistakes.
Data Engineer Intern, 01/2021 to 12/2021
Spsoft Global
Maintained SSIS ETL workflows to automate ingestion from diverse sources into SQL Server Data Warehouse, boosting pipeline performance and ensuring integrity.
Authored SQL queries and stored procedures for analytics and reporting, improving data retrieval efficiency by 25%.
Designed Power BI dashboards for senior management, cutting data analysis time by 40%.
Automated validation checks with Python and SQL scripts, reducing inconsistencies across datasets.
Tuned slow-running SQL queries, improving execution times by 30%.
Constructed ETL jobs for structured data loading and transformation, ensuring reliable datasets for reporting.
Integrated business logic into SSRS reports to align with user needs and enhance usability.
Created interactive dashboards using Power BI and SQL, enabling KPI tracking and improving decision-making speed by 20%.
Education
Business Analytics, 05/2025
Trine University - Detroit, Michigan