
Data Engineer Azure

Location:
Cincinnati, OH, 45220
Salary:
99000
Posted:
October 24, 2025


Resume:

Pravalika S

Sr. Data Engineer

283-***-**** **************@*****.***

PROFESSIONAL SUMMARY:

• Data Engineer with 4+ years of experience in architecting and optimizing end-to-end data pipelines, ETL frameworks, and analytical platforms across AWS, Azure, and GCP environments for large-scale enterprise data solutions.

• Proficient in Azure Data Factory, AWS Glue, Databricks, and PySpark, automating ingestion and transformation from SQL Server, Oracle, and API sources, improving data processing efficiency and scalability by over 30%.

• Designed and implemented data lakes and warehouses using Snowflake, Azure Synapse Analytics, and Amazon Redshift, leveraging Star/Snowflake schemas and partitioning to enhance query performance and enable near real-time analytics.

• Skilled in containerized and serverless data orchestration using Kubernetes, Amazon EKS, Azure Functions, and Lambda, integrating Kafka, Kinesis, and Event Hubs for low-latency streaming and event-driven architectures.

• Experienced in data modeling, CI/CD, and governance with Terraform, Jenkins, and Azure DevOps, implementing CDC, Delta Lake, and cross-cloud synchronization strategies to ensure reliability, lineage, and secure data delivery.

• Implemented Change Data Capture (CDC) and Delta Lake frameworks for incremental and real-time data processing, ensuring accuracy, lineage tracking, and audit compliance across cloud data lakes (a minimal illustrative sketch follows this summary).

• Hands-on with Terraform, Jenkins, and Azure DevOps for CI/CD automation, infrastructure as code (IaC), and data pipeline monitoring — reducing deployment errors and improving system reliability.

• Partnered with data science and analytics teams to deliver ML-ready datasets and performance-optimized pipelines, accelerating model training and business reporting across finance, healthcare, and enterprise domains.
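
As a brief illustration of the CDC and Delta Lake pattern noted above, the following is a minimal PySpark sketch of an incremental upsert into a Delta table. The paths, table, and key column (claim_id) are hypothetical placeholders rather than details of any actual production pipeline.

from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Incremental batch of change records landed by an upstream ingestion step (hypothetical path).
changes = (
    spark.read.format("parquet")
    .load("/mnt/raw/claims_changes/")
    .withColumn("load_ts", F.current_timestamp())
)

# Target Delta table (hypothetical path and key column).
target = DeltaTable.forPath(spark, "/mnt/curated/claims")

# CDC-style upsert: update rows whose keys already exist, insert new ones.
(
    target.alias("t")
    .merge(changes.alias("s"), "t.claim_id = s.claim_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)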

TECHNICAL SKILLS:

Programming Languages: Python (Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn), PL/SQL, T-SQL, SnowSQL, MDX, PySpark
Cloud Platforms: Azure, AWS, GCP, Oracle Cloud, IBM Cloud
Azure Services: Azure Data Lake Storage, Azure SQL Database, Azure Synapse Analytics, Azure Data Factory (ADF), Azure Databricks, Azure DevOps, Azure Synapse Serverless SQL
AWS Services: AWS Glue, AWS S3, AWS Redshift, AWS Kinesis, AWS Athena, AWS Lambda, AWS SNS, AWS SQS, EMR, EC2, EKS, CloudFormation, CloudWatch
GCP Services: Google Cloud Dataflow, Google Cloud Composer, BigQuery
Big Data Technologies & Container Orchestration: Apache Spark, Apache Airflow, Hadoop, Hive, MapReduce, Kafka, Kubernetes, Amazon EKS
ETL & Integration: SSIS, Informatica, Azure Data Factory (ADF), RESTful APIs, Web Services, Data Pipelines
Data Modeling & Warehousing: Star Schema, Snowflake, OLAP, OLTP, Data Marts, ODS, Database Design, TFS, Windows Server
SQL & Database Tools: SQL Server (2008–2019), Oracle (9i–18c), PostgreSQL, DB2, Teradata, Index Tuning, Performance Tuning
API & Web Integration: API Integration, Data Services, Power BI REST API, XMLA, Microsoft Flow
Data & BI Skills: Power BI, SSAS, Tableau, KPI Reporting, Data Visualization, Big Data Analytics
Data Science & ML: Machine Learning (Supervised), Hypothesis Testing, Regression, ANOVA, SciPy, Apache Spark
Project & Team Skills: Agile, Scrum, Jira, Waterfall, Team Collaboration, Stakeholder Communication, Mentoring

PROFESSIONAL EXPERIENCE:

COMMONWEALTH CARE ALLIANCE, BOSTON, MA    Jan 2024 – Present
Role: Sr. Data Engineer

Responsibilities:

• Designed and engineered Azure Data Factory (ADF) pipelines integrating data from claims, pharmacy, and EHR systems into centralized data lakes, applying modular Python scripts and parameterized SQL logic to enable scalable ingestion and transformation processes.

• Developed and maintained ETL/ELT workflows across Azure SQL Database, Synapse Analytics, and Azure Data Lake Storage (ADLS) environments, ensuring consistency, reliability, and performance in healthcare analytics pipelines.

• Implemented dimensional data models following Kimball methodologies (Star and Snowflake schemas) to support reporting, predictive analytics, and downstream ML processes.

• Optimized data validation and exception-handling frameworks in Python, integrating data quality checks and compliance rules aligned with HIPAA and internal governance standards.

• Automated deployment workflows and infrastructure provisioning using Git, Azure DevOps, and Terraform, ensuring seamless CI/CD integration, version control, and environment consistency.

• Collaborated with product managers, clinical analysts, and business intelligence teams in Agile sprints to deliver data assets aligned with operational and regulatory objectives.

• Built Power BI dashboards to visualize pipeline health, data latency, and processing metrics, integrating alerting through Azure Monitor and Logic Apps for proactive issue resolution.

• Leveraged Databricks and PySpark for large-scale data transformation, aggregation, and performance optimization, integrating batch and streaming workloads with ADF pipelines.

• Integrated metadata-driven orchestration and logging mechanisms to enhance observability, traceability, and maintenance across production data pipelines.

• Participated in cross-platform data initiatives, coordinating integration efforts across AWS S3, Azure Synapse, and on-prem SQL Server, ensuring interoperability and consistency in hybrid environments.

Project Impact: Enhanced CCA's healthcare analytics infrastructure by improving data processing speed by 35%, reducing manual intervention in pipeline monitoring by 50%, and accelerating analytics delivery timelines by 25% through automation and optimized CI/CD workflows; an illustrative data-quality check is sketched below.
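
The Python-based validation and exception-handling work described in this role can be illustrated with a minimal PySpark sketch. The dataset path, column names, and rules below are hypothetical examples, not CCA's actual checks.

import logging
from pyspark.sql import SparkSession, functions as F

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dq_checks")

spark = SparkSession.builder.getOrCreate()

# Curated dataset to validate (hypothetical path and columns).
df = spark.read.format("delta").load("/mnt/curated/claims")

total = df.count()
null_members = df.filter(F.col("member_id").isNull()).count()
dup_claims = total - df.dropDuplicates(["claim_id"]).count()

log.info("rows=%d null_member_id=%d duplicate_claim_id=%d", total, null_members, dup_claims)

# Fail the step if critical rules are violated so the orchestrating pipeline (e.g., an ADF
# activity calling this job) surfaces the error instead of loading bad data downstream.
if null_members > 0 or dup_claims > 0:
    raise ValueError(
        f"Data quality check failed: {null_members} null member_id, {dup_claims} duplicate claim_id"
    )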

WELLS FARGO, CHARLOTTE, NC    Aug 2020 – July 2023

Role: Data Engineer

Responsibilities:

• Built and supported end-to-end data pipelines on AWS Glue, S3, Lambda, and Redshift, enabling secure ingestion, transformation, and delivery of financial, transactional, and risk data for reporting and analytics.

• Developed modular ETL/ELT workflows in Python and SQL, integrating data from multiple internal banking systems and third-party APIs into centralized data lakes, improving operational data flow and reducing manual intervention.

• Designed data models using Star and Snowflake schemas within Amazon Redshift and SQL Server, supporting data warehousing and analytical reporting initiatives across finance and risk management domains.

• Implemented Python-based validation, error handling, and audit logging, ensuring accuracy, lineage tracking, and regulatory compliance for financial datasets in alignment with SOX and GDPR guidelines.

• Automated data ingestion and transformation workflows using AWS Lambda, Step Functions, and Glue Crawlers, improving system reliability and ensuring daily data refreshes with minimal downtime.

• Collaborated closely with financial analysts, risk officers, and data architects in Agile environments to align data engineering deliverables with fraud detection, compliance, and operational reporting requirements.

• Participated in code reviews, documentation, and version control using Git, AWS CodePipeline, and Jenkins, maintaining consistency across dev, QA, and production environments.

• Leveraged Amazon Redshift Spectrum and AWS Athena for federated querying across data lake and warehouse environments, enabling cost-effective analytics without data movement and improving query flexibility for business reporting.

• Supported Power BI and Tableau reporting teams by preparing curated and aggregated datasets in Redshift and SQL Server, enabling timely delivery of dashboards for performance tracking, KPIs, and audit reporting.

• Worked with CloudWatch and AWS IAM for pipeline monitoring, access control, and proactive alerting to maintain data security and availability.

Project Impact: Strengthened Wells Fargo’s enterprise data ecosystem by implementing automated AWS-based pipelines that improved data reliability and reporting accuracy by 35%, enhanced compliance visibility, and reduced manual reconciliation and data preparation efforts by 40%; an illustrative federated-query sketch follows.
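
The federated querying with Athena described in this role can be illustrated with a minimal boto3 sketch. The region, database, table, and results bucket are hypothetical placeholders, not the actual Wells Fargo environment.

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")  # region is illustrative

query = """
    SELECT account_type, COUNT(*) AS txn_count, SUM(amount) AS total_amount
    FROM transactions_curated
    GROUP BY account_type
"""

# Submit the query against a hypothetical data-lake database and results bucket.
run = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "finance_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

# Poll until the query finishes, then fetch the first page of results.
query_id = run["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
else:
    raise RuntimeError(f"Athena query ended in state {state}")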

ACCENTURE, HYDERABAD, INDIA    Nov 2019 – May 2020
Role: Data Engineering Intern

Responsibilities:

• Assisted in building ETL pipelines using Python and Azure Data Factory, supporting data ingestion, transformation, and integration for client analytics projects.

• Supported data modeling and query optimization in Azure SQL and SQL Server, ensuring efficient data retrieval and reliability across development environments.

• Collaborated with data engineers and analysts in Agile sprints, contributing to user stories, testing, and documentation for pipeline deployment.

• Gained hands-on experience in Azure Data Lake, pipeline orchestration, and data quality validation, enhancing understanding of end-to-end cloud data workflows.

Tools Used: Azure Data Factory, Azure Data Lake, Azure SQL Database, Python, SQL Server, and Git

EDUCATION: Master's in Computer Science from the University of Dayton


