
Data Engineer Azure

Location: Cincinnati, OH
Salary: 80000
Posted: May 01, 2025


Resume:

Sneha Sameera Medepalli

Cincinnati, Ohio

+1-513-***-**** | ****************@*****.*** | LinkedIn

Profile Summary

• Experienced Cloud Data Engineer with 5+ years of experience designing scalable, secure, and efficient data platforms across AWS and Azure

• Specialized in building robust ETL/ELT pipelines, data lakes, and warehousing solutions using tools like Azure Data Factory, AWS Glue, and Google Dataflow

• Proficient in Python, PySpark, and SQL for data processing, transformation, and automation in large-scale distributed environments

• Hands-on experience with streaming technologies including Kafka, Kinesis, and Spark Streaming for real-time ingestion and analytics

• Strong background in healthcare, retail, and telecom domains, ensuring compliance with standards like HIPAA while delivering high-impact analytics solutions

• Implemented CI/CD and monitoring frameworks using tools such as Azure DevOps, CloudWatch, and CodePipeline, enabling faster, more reliable releases

• Collaborated cross-functionally with data scientists, business stakeholders, and engineering teams to align data solutions with organizational goals

• Agile practitioner with active participation in bi-weekly sprints, sprint planning, and retrospectives to drive continuous improvement in data delivery

Education

University of Cincinnati 2023–2024

Master’s in Information Technology, Cincinnati, Ohio, USA

Relevant Coursework

• Introduction to Algorithms

• Design and Analysis of Algorithms

• Database Organization

• Online Social Networks Analysis

• Data Preparation and Analysis

• Software Project Management

Certifications

Microsoft Certified: Azure Data Engineer Associate

Experience

Cleveland Clinic Apr 2024 – Present

Sr. Data Engineer Cleveland, Ohio

• Architected and deployed scalable ETL pipelines using Azure Data Factory and Azure Synapse Analytics to process over 10M daily clinical records, enabling real-time insights for physicians and research teams

• Engineered data lake solutions using Azure Data Lake Gen2 and Delta Lake, facilitating centralized, secure storage of electronic health records (EHR) across multiple departments

• Leveraged Azure Databricks and PySpark for data wrangling and transformation of high-volume FHIR and HL7 datasets, improving downstream analytics performance by 35%

• Collaborated closely with clinical data teams, bioinformaticians, and compliance officers to align data models with HIPAA and internal governance policies using Azure Purview

• Built CI/CD pipelines using Azure DevOps to automate deployment of data pipelines, unit tests, and lineage validation, reducing manual effort by 40%

• Developed and monitored data quality checks using Azure Monitor, Log Analytics, and custom alerts, ensuring integrity in patient-critical datasets

• Integrated Power BI dashboards with Azure Synapse to provide medical leadership with live operational insights, reducing report turnaround times from days to minutes

• Actively contributed to bi-weekly Agile sprints, participating in sprint planning, backlog grooming, and daily stand-ups to ensure timely and collaborative delivery of data engineering solutions

Walgreens (via NTT DATA) Apr 2022 – Jul 2023

AWS Data Engineer Bengaluru, India

• Designed and implemented end-to-end data pipelines using AWS Glue, Lambda, and Step Functions to ingest and transform retail pharmacy and supply chain data

• Utilized Amazon S3, Athena, and Redshift Spectrum for building a cost-effective, serverless analytics platform for inventory and sales forecasting

• Processed real-time transactional data using Kinesis Data Streams and stored it in Amazon Redshift, supporting dynamic inventory adjustments across 1,000+ retail locations

• Created ETL frameworks with PySpark and AWS Glue to handle large-scale prescription and patient data while maintaining HIPAA compliance

• Implemented CI/CD pipelines using AWS CodePipeline, CodeBuild, and CloudFormation for automated deployment and monitoring of data infrastructure

• Collaborated with supply chain analysts, clinical data teams, and business stakeholders to define data requirements and validate KPIs for patient-centric retail insights

• Performed data quality checks and anomaly detection using Amazon CloudWatch, SNS, and custom alerting scripts to ensure data integrity

• Participated in Agile ceremonies including sprint planning, retrospectives, and backlog grooming to drive timely delivery of cloud-based data solutions

Humana Mar 2020 – Mar 2022

Data Engineer Chennai, India

• Developed and maintained ETL workflows using Informatica PowerCenter and SQL Server Integration Services (SSIS) to extract, transform, and load healthcare claims and enrollment data

• Built robust data pipelines to process large volumes of HIPAA-compliant patient data, improving data availability for analytics and actuarial teams

• Created stored procedures and complex T-SQL queries to automate data validation and transformation for monthly healthcare reporting

• Collaborated with business analysts, compliance teams, and data stewards to ensure accuracy, privacy, and timeliness of sensitive member data

• Performance-tuned queries and ETL workflows, reducing end-to-end processing time by 25%

• Integrated data from multiple source systems including Oracle, Flat Files, and external vendor APIs into a centralized data warehouse

• Utilized Tableau to develop internal dashboards for clinical operations teams to track claim statuses, member visits, and provider efficiency

• Participated in Agile sprints, contributing to sprint reviews and retrospectives while aligning deliverables with HIPAA regulations and data security policies

Technical Skills

Data Storage: Data Lakes, Data Warehousing, SQL & NoSQL Databases, Cloud Storage (Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, Azure Cosmos DB)

Data Processing: ETL/ELT Development, Data Pipelines, Batch & Stream Processing, Data Transformation, Real-time Data Processing (Event Hubs, Stream Analytics)

Big Data Technologies: Hadoop, Spark, Distributed Computing, Large-scale Data Processing

Cloud Platforms: Azure, AWS, Google Cloud Platform (GCP)

Programming Languages: Python, SQL, Java, Scala, R

ETL/ELT Tools: Apache NiFi, Informatica, Talend, Apache Airflow, Azure Data Factory, AWS Glue, SSIS, SSAS, AWS Data Pipeline

Analytics & Reporting: Data Visualization, BI Tools (Tableau, Power BI), DAX, Descriptive & Predictive Analytics

Orchestration: Workflow Automation, Job Scheduling, Data Pipeline Orchestration

Machine Learning: Model Deployment, Feature Engineering, Integration with Data Pipelines

Networking: Data Integration, API Management, Cloud Networking, Data Migration

Data Warehousing: Dimensional Modeling (Star/Snowflake Schemas), Azure Synapse Analytics, OLAP, Data Marts
