Reethika M
314-***-**** ***********@*****.***
Aurora, CO, USA
PROFESSIONAL SUMMARY
Azure Data Engineer with 4+ years of experience designing and implementing scalable data solutions using Python, SQL, and Power BI. Expert in building and maintaining robust ETL pipelines with Azure Data Factory and Azure Databricks, and in writing complex SQL queries, stored procedures, and performance-tuned code for large-scale datasets. Experienced in Python (Pandas, NumPy, PySpark) for data transformation, automation, and big data processing, with hands-on experience across Azure services including Data Lake Storage, Blob Storage, and Synapse Analytics for cloud-based data architecture. Skilled in developing interactive Power BI dashboards and reports using DAX and advanced data modeling techniques such as star and snowflake schemas. Strong understanding of data warehousing, integration, and governance, with a proven ability to implement data validation, quality checks, and monitoring frameworks, and to optimize pipelines and workflows for performance, scalability, and reliability in enterprise environments.
TECHNICAL SKILLS
• Cloud Platform: Microsoft Azure (Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Databricks, Blob Storage)
• Programming Languages: Python, SQL
• ETL Tools & Processes: Azure Data Factory, ETL Pipeline Development, Data Integration, Data Transformation
• Data Visualization: Power BI (DAX, Data Modeling, Dashboard Development)
• Databases: SQL Server, Azure SQL Database
• Big Data Technologies: Apache Spark (PySpark), Databricks
• Data Modeling: Star Schema, Snowflake Schema
• Version Control & Tools: Git, Azure DevOps
• Other Skills: Data Warehousing, Data Quality, Performance Tuning, Workflow Automation
EDUCATION
• Webster University, St. Louis, MO, USA 63119
Master of Science (MS) in Information Technology Management, March 2025
PROFESSIONAL EXPERIENCE
Client: Biogen, Cambridge, MA
Role: Data Engineer
Date: Apr 2024 – Present
• Designed and implemented scalable ETL pipelines using Azure Data Factory, processing 10M+ records daily with 30% improved efficiency.
• Optimized complex SQL queries and indexing strategies, reducing query execution time by 40% and improving overall database performance.
• Developed interactive Power BI dashboards with advanced DAX, enabling stakeholders to track KPIs and reducing reporting turnaround time by 35%.
• Built end-to-end data solutions using Azure Data Lake, Synapse Analytics, and Blob Storage, supporting high-volume data processing and analytics.
• Automated data ingestion and transformation workflows, reducing manual effort by 50% and improving pipeline reliability.
• Implemented data quality checks and validation rules within ETL processes, increasing data accuracy by 25%.
• Designed optimized data models (star schema) for reporting, improving Power BI report performance by 30%.
• Migrated on-premises data pipelines to Azure cloud, reducing infrastructure costs by 20% and improving scalability.
• Integrated data from APIs, SQL databases, and flat files into centralized systems, improving data availability by 40%.
• Monitored and troubleshot ETL pipelines with logging and alerting, reducing pipeline failures by 30% and ensuring high availability.
• Developed data processing scripts using Python and PySpark in Azure Databricks to handle large-scale data transformations.
• Built and automated ETL pipelines using Python, improving data ingestion efficiency and reducing manual effort.
• Utilized Python libraries such as Pandas and NumPy for data cleaning, validation, and preprocessing.
• Integrated Python scripts with Azure Data Factory for orchestrating end-to-end data workflows.
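The data-quality and validation work described above can be sketched in a few lines of Python. This is a minimal, illustrative example only; the column names and rules are hypothetical, not taken from any specific pipeline:

```python
import numpy as np
import pandas as pd

def validate_batch(df: pd.DataFrame, required_cols: list[str]) -> pd.DataFrame:
    """Illustrative validation step of the kind embedded in an ETL pipeline:
    enforce required columns, drop incomplete rows, and flag bad values."""
    # Rule 1: fail fast if a required column is missing from the batch
    missing = [c for c in required_cols if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Rule 2: drop rows with nulls in any required column
    clean = df.dropna(subset=required_cols).copy()
    # Rule 3: flag (rather than drop) out-of-range amounts for review
    clean["amount_valid"] = clean["amount"] >= 0
    return clean

batch = pd.DataFrame(
    {"id": [1, 2, None, 4], "amount": [10.0, -5.0, 3.0, np.nan]}
)
result = validate_batch(batch, required_cols=["id", "amount"])
```

In production, a function like this would typically run inside a Databricks notebook or an ADF-invoked script, with flagged rows routed to a quarantine table for review rather than silently discarded.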
Client: HSBC, Hyderabad, India
Role: Data Engineer
Date: Jan 2021 – Jan 2023
• Optimized ETL processes to reduce data load time and improve pipeline efficiency.
• Integrated multiple data sources including APIs, databases, and flat files into centralized data platforms.
• Ensured data quality and integrity by implementing validation rules and data cleansing techniques within ETL pipelines.
• Developed reusable data ingestion frameworks to standardize ETL development and reduce redundancy.
• Tuned SQL queries and indexing strategies to enhance database performance and reduce latency.
• Created Power BI reports with DAX calculations, KPIs, and drill-through capabilities for business insights.
• Managed incremental data loads and change data capture (CDC) processes for efficient data updates.
• Collaborated with cross-functional teams to gather requirements and deliver scalable data solutions.
• Implemented role-based access control (RBAC) and data security practices within Azure environments.
• Monitored and troubleshot data pipelines to ensure high availability and minimal downtime.
• Migrated on-premises data solutions to Azure cloud, improving scalability and reducing infrastructure costs.
• Documented data pipelines, workflows, and technical designs to support maintainability and knowledge sharing.
• Implemented data transformation and aggregation logic using PySpark, improving processing speed for big data workloads.
• Developed reusable Python modules and functions to standardize data engineering processes.
• Automated data quality checks and validation processes using Python, ensuring high data accuracy.
• Worked with REST APIs using Python to extract and load data into Azure data platforms.
• Optimized Python code for performance, reducing execution time of data pipelines.
• Used Python for log monitoring, error handling, and alerting within ETL workflows.
PROJECT
End-to-End Azure Data Engineering Pipeline
• Designed and implemented an end-to-end data pipeline using Azure Data Factory, Azure Data Lake Storage, and Azure Databricks to process both batch and near real-time data.
• Ingested data from multiple sources, including REST APIs, SQL databases, and streaming data, into Azure Data Lake (Bronze layer).
• Built scalable data transformation pipelines using PySpark in Databricks, implementing a Bronze/Silver/Gold (Medallion) architecture.
• Developed incremental data loading and change data capture (CDC) mechanisms to optimize data refresh cycles.
• Orchestrated workflows using Azure Data Factory with parameterized pipelines and dynamic triggers.
• Designed optimized data models (star schema) in the Gold layer for reporting and analytics.
• Integrated transformed data with Azure Synapse Analytics for high-performance querying.
• Created interactive dashboards in Power BI using DAX and advanced data modeling techniques.
• Implemented data quality checks, logging, and monitoring to ensure pipeline reliability and data accuracy.
• Optimized Spark jobs and SQL queries, improving processing performance by 30%.
• Used Git and Azure DevOps for version control and CI/CD deployment of data pipelines.
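One pattern from this project, incremental loading driven by a high-water mark on an update timestamp, can be sketched independently of Spark. The following is a minimal pandas illustration under assumed table layouts; the `id` and `updated_at` column names are hypothetical:

```python
import pandas as pd

def incremental_load(source: pd.DataFrame, target: pd.DataFrame,
                     watermark_col: str = "updated_at") -> pd.DataFrame:
    """Merge only rows changed since the target's high-water mark,
    mimicking a watermark-driven incremental (CDC-style) load."""
    # High-water mark: the latest timestamp already present in the target
    high_water = (target[watermark_col].max()
                  if not target.empty else pd.Timestamp.min)
    # Delta: only source rows newer than the high-water mark
    delta = source[source[watermark_col] > high_water]
    # Upsert: delta rows replace target rows that share the same key
    merged = (
        pd.concat([target, delta])
        .drop_duplicates(subset="id", keep="last")
        .sort_values("id")
        .reset_index(drop=True)
    )
    return merged

target = pd.DataFrame({"id": [1, 2],
                       "updated_at": pd.to_datetime(["2024-01-01", "2024-01-02"])})
source = pd.DataFrame({"id": [2, 3],
                       "updated_at": pd.to_datetime(["2024-01-05", "2024-01-03"])})
result = incremental_load(source, target)
```

In the actual pipeline this logic would be expressed as a PySpark merge in Databricks, with the watermark persisted between runs (for example in a control table read by a parameterized ADF pipeline).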