N Prathyusha
Email: **************@*****.*** Phone number: +1-216-***-**** Location: Cleveland, OH
OBJECTIVE
Data Engineering experience in designing and implementing scalable Lakehouse, Data Vault, and DataMart architectures to support healthcare, compliance, and operational analytics use cases.
Developed Java-based ETL pipelines using Spring Batch for large-scale data transformation and validation.
Strong expertise in the Azure data platform including Azure Data Factory, Azure Databricks (PySpark), ADLS, Synapse/Fabric, and Delta Lake to build end-to-end ELT pipelines from raw ingestion to curated reporting layers (a representative PySpark/Delta sketch follows this summary).
Advanced proficiency in SQL and Python for complex data transformations, reconciliation, automation, SLA monitoring, and development of reliable data processing frameworks.
Hands-on experience with Snowflake and dbt for building analytics-ready datasets, implementing clustering/partitioning strategies, and optimizing query performance for BI workloads.
Deep knowledge of data modeling techniques including Star Schema, Snowflake Schema, and Data Vault 2.0 (Hubs, Links, Satellites) to enable historical tracking and governed analytics.
Experienced in implementing data quality, validation, and governance frameworks using schema enforcement, CDC, audit trails, and automated QA checks to ensure trusted data.
Enabled business intelligence and reporting teams by delivering clean, structured datasets used in Power BI, Tableau, Looker, and Excel dashboards.
Proficient in orchestration and DevOps practices using Apache Airflow, Control-M, CI/CD pipelines, Terraform, Git/GitHub, and monitoring/alerting mechanisms for pipeline reliability.
Exposure to AWS services such as S3, Lambda, and Glue, with experience modernizing legacy SSIS and Hadoop workflows into cloud-native ELT architectures.
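The summary above mentions end-to-end ELT from raw ingestion to curated Delta tables on Azure Databricks; the following is a minimal, illustrative PySpark sketch of that pattern, assuming a Databricks-style cluster with Delta Lake available. The ADLS paths, column names, and data-quality rules are hypothetical placeholders, not details from an actual engagement.

```python
# Minimal PySpark ELT sketch: raw JSON ingestion -> cleaned, partitioned Delta table.
# Assumes a Databricks-style cluster with Delta Lake available; all paths and
# column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw_to_curated_elt").getOrCreate()

RAW_PATH = "abfss://raw@exampleaccount.dfs.core.windows.net/claims/"          # hypothetical ADLS path
CURATED_PATH = "abfss://curated@exampleaccount.dfs.core.windows.net/claims/"  # hypothetical ADLS path

# Extract: read raw JSON files as they land in the raw zone.
raw_df = spark.read.json(RAW_PATH)

# Transform: enforce types, standardize columns, and drop obviously bad records.
curated_df = (
    raw_df
    .withColumn("claim_amount", F.col("claam_amount".replace("aa", "ai")).cast("decimal(18,2)"))
    .withColumn("service_date", F.to_date("service_date", "yyyy-MM-dd"))
    .withColumn("load_ts", F.current_timestamp())        # audit column
    .dropna(subset=["claim_id", "member_id"])            # basic data-quality gate
    .dropDuplicates(["claim_id"])
)

# Load: write a partitioned Delta table in the curated zone for BI consumption.
(
    curated_df.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("service_date")
    .save(CURATED_PATH)
)
```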
TECHNICAL SKILLS
Data Engineering & Processing: Azure Data Factory (ADF), Azure Databricks, Apache Spark, PySpark, Spark SQL, Delta Lake, dbt, Apache Airflow, ETL/ELT Development, Data Lakehouse Architecture, Web Scraping / Public Data Extraction, Azure Event Hubs, Kafka, Data Freshness Checks, Failure Recovery, Alerting, Query Tuning, Compute Scaling, Storage Optimization
Programming: Java (ETL & batch processing), Spring Batch (data transformation jobs), Python, SQL, Scala, Shell Scripting, Docker
Databases: SQL Server, PostgreSQL, Oracle, Snowflake, MySQL, BigQuery
Cloud Platforms: Azure (ADLS, Synapse, Databricks), AWS (S3, Lambda, Glue), Snowflake
Data Modeling & Quality: Star/Snowflake Schema Design, CDC Frameworks, Data Validation, Data Governance, Automated QA (Python + SQL), dbt Models, Data Contracts, Schema Enforcement
Analytics & Visualization: Power BI, Tableau, Looker, Excel
Testing & Automation: ETL Testing, Data Validation, Unit Testing (SQL/Python), JIRA, CI/CD for ADF & Databricks, Terraform
Tools & Version Control: Git/GitHub, Control-M, SSIS, REST APIs, Azure DevOps Pipelines
EXPERIENCE
Cortracker
Data Engineer January 2024 – Present
Built Apache Airflow DAGs for batch and near real-time data pipelines with SLA monitoring and alerting (see the DAG sketch after this role's bullets).
Worked closely with upstream Oracle systems to extract, transform, and prepare operational data for downstream analytics and reporting workflows.
Processed healthcare claims and eligibility datasets ensuring accurate reconciliation and compliance reporting.
Designed and built new end-to-end Snowflake pipelines using Snowpipe, Streams & Tasks for incremental loading, warehouse auto-scaling, and performance-optimized ELT processing.
Translated business reporting requirements into DataMart designs, STTM mappings, and analytics-ready SQL datasets used by compliance and operations teams.
Modeled enterprise data using Star Schema and Data Vault 2.0 (Hubs, Links, Satellites) to enable scalable historical tracking and governed analytics.
Maintained Java-based ETL pipelines using Spring Batch to transform and validate large operational datasets before loading into analytical platforms.
Introduced SQL and Python-based SLA monitoring checks that proactively detected pipeline failures and improved data availability.
Performed SQL-based data validation and reconciliation across Oracle and downstream analytical systems to ensure accuracy and consistency of transformed data.
Standardized ADLS and Fabric storage structures, naming conventions, and lifecycle policies for large-scale governed datasets.
Optimized large datasets using Parquet/Delta formats, partitioning strategies, and file-size tuning to improve BI query performance.
Delivered executive and operational Power BI dashboards using Direct Lake connectivity to the Fabric Lakehouse.
Supported UAT, created data validation test cases, and ensured reporting accuracy for business stakeholders.
Produced system flow diagrams, data mapping documents, and walkthrough sessions to help users understand dashboards and reports.
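As referenced in the first bullet of this role, below is a minimal, hypothetical sketch of an Airflow (2.4+) DAG with a per-task SLA and a failure-alert callback; the DAG id, task names, schedule, and notification logic are illustrative assumptions rather than the production pipeline.

```python
# Illustrative Airflow 2.4+ DAG with a per-task SLA and a failure-alert callback.
# DAG id, task names, schedule, and notification logic are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_failure(context):
    # Hypothetical alert hook; in practice this might post to email, Teams, or PagerDuty.
    print(f"Task {context['task_instance'].task_id} failed on {context['ds']}")


def extract_claims():
    print("extracting claims from the upstream Oracle source")  # placeholder step


def load_to_snowflake():
    print("loading curated data into Snowflake")  # placeholder step


default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),              # per-task SLA; misses are recorded by the scheduler
    "on_failure_callback": notify_failure,  # alert on any task failure
}

with DAG(
    dag_id="claims_batch_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="0 6 * * *",  # daily 06:00 batch run (illustrative)
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_claims", python_callable=extract_claims)
    load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)

    extract >> load
```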
Value Software Technologies Pvt Ltd, India
Data Engineer May 2021 – August 2022
Designed relational, dimensional, and Data Vault data models to support governed BI reporting across healthcare and banking-style datasets.
Built and supported batch-oriented ETL processes using Java and SQL, integrating Oracle source systems and preparing curated datasets for analytics platforms including Snowflake.
Created detailed STTM documents and performed SQL data analysis to ensure accurate integration across multiple healthcare systems.
Implemented ETL validation, audit logging, and error-handling mechanisms within batch processing workflows to support regulated reporting requirements (illustrated in the reconciliation sketch after this role).
Improved Snowflake and Azure query performance using clustering, partitioning, and columnar storage formats.
Defined reusable dataset templates and data quality rules adopted across reporting teams.
Assisted clinical and pharmacy teams by creating Excel/Access-based validation tools for operational data checks.
Conducted user training sessions explaining workflows, alerts, and compliance reporting processes.
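The validation and audit-logging work above can be illustrated with a hedged reconciliation sketch in Python. It assumes any two DB-API-compatible connections (for example cx_Oracle for the Oracle source and snowflake-connector-python for the target); the table names, helper functions, and logging destination are hypothetical, not the production framework.

```python
# Hedged sketch of a source-vs-target reconciliation check with audit logging.
# Works against any two DB-API 2.0 connections whose cursors support context managers
# (e.g. cx_Oracle, snowflake-connector-python); table names are trusted placeholders.
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("reconciliation")


def row_count(conn, table: str) -> int:
    """Return the row count of a table via a plain DB-API cursor."""
    with conn.cursor() as cur:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]


def reconcile(source_conn, target_conn, source_table: str, target_table: str) -> bool:
    """Compare row counts between a source and target table and write an audit record."""
    src = row_count(source_conn, source_table)
    tgt = row_count(target_conn, target_table)
    matched = src == tgt

    audit_log.info(
        "recon run=%s source=%s(%d) target=%s(%d) matched=%s",
        datetime.now(timezone.utc).isoformat(), source_table, src, target_table, tgt, matched,
    )
    if not matched:
        # In a real pipeline this might raise, alert, or quarantine the load.
        audit_log.error("Row-count mismatch: %d source vs %d target", src, tgt)
    return matched
```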
SMART BRIDGE, India
Data Engineer June 2020 – May 2021
Modernized legacy SSIS and Hadoop workflows by introducing Java- and SQL-based batch ETL processes, improving reliability and scalability of data transformations.
Automated manual reporting processes, reducing business effort by 50%.
Performed complex SQL transformations to cleanse, integrate, and align reporting datasets.
Implemented enterprise data validation frameworks and audit checks to ensure trusted analytics data (a minimal rule-based sketch follows this role).
Configured role-based access and security rules in healthcare reporting applications.
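A minimal sketch of the kind of rule-based validation framework described above, assuming pandas DataFrames as the check surface; the rule names, columns, and sample data are placeholders rather than the framework actually deployed.

```python
# Minimal, illustrative data-quality rule framework over a pandas DataFrame.
# Rule names and columns are hypothetical; a production version would persist results
# to an audit table and fail the load on critical-rule violations.
import pandas as pd


def check_not_null(df: pd.DataFrame, column: str) -> int:
    """Return the number of rows violating a NOT NULL expectation."""
    return int(df[column].isna().sum())


def check_unique_key(df: pd.DataFrame, key: str) -> int:
    """Return the number of duplicated key values."""
    return int(df[key].duplicated().sum())


def run_checks(df: pd.DataFrame) -> dict:
    """Apply each rule and collect violation counts for audit reporting."""
    results = {
        "member_id_not_null": check_not_null(df, "member_id"),
        "claim_id_unique": check_unique_key(df, "claim_id"),
    }
    results["passed"] = all(count == 0 for count in results.values())
    return results


if __name__ == "__main__":
    sample = pd.DataFrame({"claim_id": [1, 2, 2], "member_id": ["A", None, "C"]})
    print(run_checks(sample))  # {'member_id_not_null': 1, 'claim_id_unique': 1, 'passed': False}
```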
PUBLICATION
Published the project "Text Classification using Different Feature Extraction Techniques" as a research article in JETIR (Journal of Emerging Technologies and Innovative Research), Volume 9, Issue 5, May 2022.
Link: https://www.jetir.org/papers/JETIR2205C10.pdf
Explored methods to convert text into numerical features, including TF-IDF, Bag-of-Words, and word embeddings, to determine which yields the most effective classification results.
Built and compared multiple machine learning models, including SVM, Naïve Bayes, and Logistic Regression, to identify those with the highest accuracy and consistency (an illustrative comparison sketch follows this section).
Cleaned and prepared large sets of text data by removing noise, normalizing words, and applying NLP preprocessing techniques to enhance model reliability.
Evaluated each model using practical metrics like accuracy, precision, recall, and F1-score, making the results straightforward to interpret and compare.
Analyzed model results and summarized findings to explain which feature extraction techniques performed better and why, based on real evaluation metrics.
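The comparison described in this publication (TF-IDF vs. Bag-of-Words features across SVM, Naïve Bayes, and Logistic Regression) can be reproduced in outline with scikit-learn. The snippet below is an illustrative reconstruction of the approach, not the code published with the paper, and the tiny inline corpus and labels are placeholders.

```python
# Illustrative scikit-learn comparison of feature-extraction + classifier pairs,
# in the spirit of the published study; the toy corpus below is a placeholder.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = [
    "patient eligible for coverage", "claim denied missing code",
    "coverage approved for member", "denied due to incomplete claim",
    "member enrollment confirmed", "claim rejected invalid provider",
] * 5  # small repeated toy corpus so cross-validation has enough samples
labels = [1, 0, 1, 0, 1, 0] * 5  # 1 = approved-type text, 0 = denied-type text

vectorizers = {"bow": CountVectorizer(), "tfidf": TfidfVectorizer()}
models = {
    "nb": MultinomialNB(),
    "logreg": LogisticRegression(max_iter=1000),
    "svm": LinearSVC(),
}

# Score every (feature extraction, classifier) pair with 5-fold cross-validated F1.
for vec_name, vec in vectorizers.items():
    for model_name, model in models.items():
        pipe = Pipeline([("features", vec), ("clf", model)])
        scores = cross_val_score(pipe, texts, labels, cv=5, scoring="f1")
        print(f"{vec_name:5s} + {model_name:6s}: mean F1 = {scores.mean():.3f}")
```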