Pragati Singh
Azure Data Engineer
Ph: +44-787******* Email: ************.**@*****.***
SUMMARY
Azure Data Engineer with 5+ years of experience designing enterprise data platforms across BFSI, Payments, Retail Analytics, and Customer Insights domains.
Expert in building end-to-end Azure data pipelines using ADF, Databricks (PySpark/Spark SQL), ADLS Gen2, Synapse, and Snowflake.
Strong experience in hybrid data integration, migrating on-prem systems (Oracle, MySQL, Cassandra, Hadoop) to Azure cloud ecosystems.
Skilled in designing high-performance Spark workloads including partitioning, caching, bucketing, broadcast joins, and schema-driven transformations.
Hands-on expertise in Kafka, Spark Streaming, Airflow, and event-driven architectures used in financial, IoT, and customer analytics workloads.
Experienced in data modeling (Star/Snowflake schemas), SCD types, PolyBase, Parquet, Hive Metastore, and curated zone design for ML/BI readiness.
Proficient in CI/CD automation using Jenkins, Azure DevOps, Git, and test-driven ETL frameworks for production-grade deployments.
Strong exposure to GDPR-compliant pipelines, data masking, anonymization, metadata management, and access governance.
Delivered BI solutions using Power BI, SSRS, and Tableau, enabling operational KPIs, customer analytics, and performance dashboards.
Solid experience supporting batch + streaming production pipelines with troubleshooting, optimization, and real-time SLA monitoring.
Proven track record collaborating with cross-functional teams across engineering, marketing, legal, and product to deliver scalable, compliant data solutions.
Technical Skills:
Cloud & Data Platforms: Azure (ADF, ADLS Gen2, Databricks, Synapse, Functions, Logic Apps), Snowflake, Hadoop (HDFS, Hive), GCP (GCS)
ETL & Data Pipelines: ADF Pipelines, Airflow, NiFi, Kafka, Spark Streaming, Snowpipe, PolyBase, Batch/Streaming Orchestration
Big Data & Processing: PySpark, Spark SQL, Scala, Parquet, HiveQL, RDD/DataFrames, Partitioning, Bucketing, Caching
Programming: SQL, Python, Scala, Shell Scripting
Data Modeling & Warehousing: Star/Snowflake Schemas, SCD, Dimensional Modeling, Data Marts, Oracle/MySQL/MariaDB
Analytics & BI: Power BI, Tableau, SSRS, KPI Dashboards, Customer Behavior Analytics
DevOps & CI/CD: Jenkins, Git, Azure DevOps, Test-Driven ETL Deployment, Version Control
Security & Governance: Key Vault, GDPR Compliance, Data Masking, Anonymization, Access Control
Monitoring & Support: Azure Monitor, Pipeline SLA Tracking, Logging, Performance Tuning, Troubleshooting
Education:
Bachelor of Technology in Computer Engineering from Delhi Technological University, India
Master of Science in Data Science from University of Glasgow, Glasgow, UK
Professional Experience:
Role: Azure Data Engineer May 2025 – Present
Client: Finastra, London, UK
Responsibilities:
Integrated on-premises (MySQL, Cassandra) and cloud-based (Blob Storage, Azure SQL DB) data using Azure Data Factory, applying transformations and loading the data into Snowflake.
Created ETL transformations and validations using Spark SQL/Spark DataFrames with Azure Databricks and Azure Data Factory.
Collaborated with Azure Logic Apps administrators to monitor and resolve issues related to process automation and data processing pipelines.
Optimized code for Azure Functions to extract, transform, and load data from diverse sources, including databases, APIs, and file systems.
Worked on Microsoft Azure services such as HDInsight clusters, Blob Storage, Data Factory, and Logic Apps, and completed a POC on Azure Databricks.
Performed ETL using Azure Databricks and migrated on-premises Oracle ETL processes to Azure Synapse Analytics.
Worked on migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse.
Controlled and granted database access and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
Transferred data using Azure Synapse and PolyBase.
Deployed and optimized Python web applications through Azure DevOps CI/CD pipelines, freeing the team to focus on development.
Developed enterprise-level solutions using batch processing and streaming frameworks (Spark Streaming, Apache Kafka).
Designed and implemented robust data models and schemas to support efficient data storage, retrieval, and analysis using technologies like Apache Hive, Apache Parquet, or Snowflake.
Developed and maintained end-to-end data pipelines using Apache Spark, Apache Airflow, or Azure Data Factory, ensuring reliable and timely data processing and delivery.
Collaborated with DevOps engineers to establish automated CI/CD and test-driven development pipelines using Azure, aligning with client requirements.
Managed end-to-end operations of ETL data pipelines, ensuring scalability and smooth functioning.
Implemented optimized query techniques and indexing strategies to enhance data fetching efficiency.
Utilized SQL queries, including DDL, DML, and various database objects (indexes, triggers, views, stored procedures, functions, and packages) for data manipulation and retrieval.
Demonstrated proficiency in scripting languages like Python and Scala for efficient data processing.
Executed Hive scripts through Hive on Spark and Spark SQL to address diverse data processing needs.
Actively participated in Agile ceremonies, including daily stand-ups and internationally coordinated PI Planning, ensuring efficient project management and execution.
Environment: ADF, ADLS Gen2, Databricks, Synapse, Snowflake, Kafka, Spark Streaming, Azure Functions, Logic Apps, SQL, Python, Scala, Hive, Jenkins, Git.
Role: Data Engineer Sep 2024 – May 2025
Client: Western Union, Pune, India (Remote)
Responsibilities:
Enhanced Spark performance by optimizing data processing algorithms, leveraging techniques such as partitioning, caching, and broadcast variables.
Implemented efficient data integration solutions to seamlessly ingest and integrate data from diverse sources, including databases, APIs, and file systems, using tools like Apache Kafka, Apache NiFi, and Azure Data Factory.
Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
Collaborated with cross-functional teams to gather requirements, design data integration workflows, and implement scalable data solutions.
Provided production support and troubleshooting for data pipelines, identifying and resolving performance bottlenecks, data quality issues, and system failures.
Processed schema-oriented and non-schema-oriented data using Scala and Spark.
Created partitions and buckets based on state to support further processing with bucket-based Hive joins.
Created generic Hive UDFs to process business logic that varies by policy.
Worked with Data Lakes and big data ecosystems (Hadoop, Spark, Hortonworks, Cloudera).
Loaded and transformed large sets of structured, semi-structured, and unstructured data.
Wrote Hive queries for data analysis to meet business requirements, creating Hive tables and working on them with HiveQL to simulate MapReduce functionality.
Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
Worked on RDDs and DataFrames (Spark SQL) using PySpark to analyze and process data.
Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.
Used JIRA to manage issues and project workflow.
Worked on Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.
Used Git as a version control tool to maintain the code repository.
Environment: Azure, ADF, Databricks, Kafka, NiFi, PySpark, Scala, Hadoop, Hive, HBase, Spark SQL, Snowflake, Git, Jenkins, JIRA.
Role: Big Data Engineer June 2020 – Sept 2024
Client: Dunnhumby, Gurugram, India
Responsibilities:
Customer Behaviour Data Ingestion Pipeline
Designed and implemented an end-to-end data ingestion pipeline to process and analyze customer behavior data, enabling business teams to derive actionable insights from customer purchase journeys.
Ingested data into a data lake, applied PySpark-based transformations, and distributed enriched datasets to data science and client-facing teams, supporting over 15 active clients.
Improved data availability and time-to-insight by 30%, enabling real-time advanced analytics and personalized customer strategies.
Environment: Apache Spark (PySpark), Apache Airflow, HDFS, GCS, MariaDB, Python
GDPR-Compliant Data Processing Frameworks
Developed GDPR-compliant data processing frameworks to ensure secure, privacy-aware analytics across European markets.
Designed and implemented data anonymization, masking, and access control measures, maintaining compliance while enabling business insights.
Collaborated with legal, product, and engineering teams to translate regulatory requirements into automated, scalable data workflows.
Environment: Python, Spark (PySpark), SQL, Airflow.
Relaunch of Shopper Thoughts UK and ROI
Led the relaunch of Shopper Thoughts projects in the UK and ROI after a 1+ year hiatus due to DPI removal.
Coordinated with external stakeholders to gather new data format requirements and designed updated ETL scripts and processes.
Developed and deployed new pipelines feeding data directly to Tesco’s API, ensuring accurate, timely, and compliant data delivery.
Improved operational efficiency and restored actionable insights for marketing and analytics teams.
Environment: Python, Spark (PySpark), Airflow, APIs, SQL, GCS