
Azure Data Engineer with 5+ Years Experience

Location:
Coventry, West Midlands, CV3 1AD, United Kingdom
Salary:
45000
Posted:
December 12, 2025


Resume:

Shrinivas

Azure Data Engineer

Email: *****************@*****.***

Contact: +44-734*******

Professional Summary:

5+ years of experience in the IT industry, with perseverance and diligence towards attaining challenging goals and strong knowledge of Azure Data Platform services: Azure Data Lake (ADLS), Data Factory (ADF), Data Lake Analytics, Stream Analytics, Azure SQL DW, Databricks, NoSQL DB, SQL Server, Oracle, Data Warehouse, etc. Built multiple Data Lakes.

Experience in creating optimized data models and schemas tailored for scalability and efficiency in databases and data lakes.

Experience migrating on-premises SQL databases to Azure Data Lake, Synapse Analytics, Azure SQL Database, and Databricks with Delta Lake, leveraging ADF pipelines and Mapping Dataflows for seamless integration.

Knowledge of SDLC approaches with hands-on experience in Agile/Scrum for delivering software in a structured and collaborative manner.

Experienced in handling datasets to unify and integrate data from systems like Infor, Salesforce, NetSuite, Cin7, and Xero.

Experience in orchestration tools such as Azure Data Factory and Databricks workflows.

Experienced in implementing Infrastructure-as-Code (IaC) using Terraform to automate and standardize Azure cloud resource provisioning.

Utilized Azure Data Factory (ADF) to create an automated, near real-time data integration solution for moving data from Dataverse to Azure SQL, ensuring data is consistently up to date and ready for reporting.

Hands-on in writing scripts with Python API, PySpark API, and Spark API for large-scale data analysis, leveraging libraries like NumPy and PyTest and implementing Databricks Delta Live Tables and Unity Catalog for governance and reliability.
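
As an illustration of the kind of PySpark scripting and PyTest-style validation described above, a minimal sketch is shown below; the transaction columns and the aggregation rule are hypothetical.

    # Minimal PySpark sketch; the transaction columns and aggregation are illustrative only.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily_totals").getOrCreate()

    def daily_totals(df):
        # Sum transaction amounts per account per day.
        return (df.withColumn("txn_date", F.to_date("txn_timestamp"))
                  .groupBy("account_id", "txn_date")
                  .agg(F.sum("amount").alias("total_amount"),
                       F.count("*").alias("txn_count")))

    # PyTest-style unit test against a small in-memory DataFrame.
    def test_daily_totals():
        df = spark.createDataFrame(
            [("A1", "2024-01-01 10:00:00", 100.0),
             ("A1", "2024-01-01 12:00:00", 50.0)],
            ["account_id", "txn_timestamp", "amount"])
        row = daily_totals(df).collect()[0]
        assert row["total_amount"] == 150.0 and row["txn_count"] == 2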

Experience working with AWS S3, Glue, EMR, Lambda, Athena, IAM, CloudWatch, Kinesis for scalable data ingestion, ETL transformations, security, and analytics operations.

Designed and implemented a data migration pipeline from on-premises systems to Azure with Synapse Analytics.

Working knowledge of the Azure cloud platform (HDInsight, Data Lake, Databricks, Blob Storage, Data Factory, Synapse, SQL, SQL DB, DWH and Data Storage Explorer).

Experience building data pipelines using orchestration tools such as Airflow and Azure Data Factory.

Hands-on experience in data warehousing projects, including ETL design and development with SSIS and legacy migration exposure.

Experience working with Amazon Redshift for scalable analytical workloads, including data warehousing, SQL optimisation, and integration with S3-based pipelines.

End-to-end involvement in database and data warehouse platforms, particularly those leveraging MS SQL Server in combination with WhereScape RED.

Experience in metadata management solutions, including creating data dictionaries and data ingestion/quality rules, and in data warehouse migration projects developing and implementing data replication, data ingestion, and data transformation solutions using Azure Data Factory and Snowflake.

Gained experience in SQL across several dialects, commonly SQL Server, Azure SQL, Synapse, and Oracle.

Developed knowledge in designing and managing mappings, workflows, and sessions, as well as conducting database-level query testing, unit testing, and acceptance testing.

Implemented modern BI solutions using Power BI (Service, Desktop, DAX, Workspaces, and Gateways) along with SSIS, SSAS, and SSRS for dashboards, reporting, and advanced analytics.

Worked on data migration initiatives from Teradata and Oracle into SQL Server, creating automation scripts with UNIX shell scripting and SQL for Oracle/Teradata.

Good knowledge of Data Marts, OLAP, and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Schema modeling for fact and dimension tables) using Analysis Services.

Experience in DevOps practices with Jenkins for CI/CD, Git for source control (including Git Flow), and Atlassian tools (Jira, Bitbucket, Sourcetree) for project management and version control.

Built complex distributed systems involving large-scale data handling, metrics collection, data pipeline construction, and analytics.

Hands-on experience in creating and managing SQL Server database objects such as tables, constraints, indexes, views, indexed views, stored procedures, UDFs, and triggers.

Experience in leveraging Azure Data Factory (ADF) for orchestrating, scheduling, and automating complex data pipelines.

Skilled in configuring role-based access (RBAC), authentication, and encryption strategies to ensure secure data storage in Azure SQL Database and ADLS.

Skilled in leveraging Spark and various Hadoop ecosystem components to handle big data workloads for processing, storage, and analytical solutions.

Hands-on experience with Hive partitions and bucketing, along with designing optimized managed and external tables. Skilled in handling multiple file formats (Avro, Parquet, ORC, JSON, XML) and compression methods like Snappy and ZIP.
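
A minimal sketch of the partitioned and bucketed table design mentioned above, assuming a hypothetical events dataset, table name, and source path:

    # Illustrative PySpark write of a partitioned, bucketed, Snappy-compressed Parquet table.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("partitioned_write")
             .enableHiveSupport()
             .getOrCreate())

    events = spark.read.json("/raw/events")   # source path is a placeholder

    (events.write
        .partitionBy("event_date")             # Hive-style partition column
        .bucketBy(8, "customer_id")            # bucketing to speed up joins/scans
        .sortBy("customer_id")
        .format("parquet")
        .option("compression", "snappy")       # Snappy compression
        .mode("overwrite")
        .saveAsTable("analytics.events_curated"))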

Skilled in applying Scrum principles to translate requirements into well-defined epics and stories.

Worked with Agile methodologies, participating in sprint planning, stand-up meetings, and retrospectives to ensure effective testing within an iterative development environment.

Technical Skills:

Azure Cloud

Azure Data Factory (ADF), Azure Data Lake, Azure Databricks, Azure Synapse Analytics, Azure SQL Database, Azure Event Hubs, Azure Key Vault, Azure Stream Analytics, Azure Storage, Azure Analysis Services

ETL Tools

Azure Data Factory (ADF), SSIS, Data Integration with APIs (Python API, PySpark API)

Data Modeling

Dimensional Modeling, Star Schema, Snowflake Schema, Data Marts, OLAP, Kimball Methodology, Optimized Data Models & Schemas

BI Tools

Report Builder, Power BI

Spark Frameworks

Spark Core, Spark SQL

Tools

GIT, TFS, Azure DevOps

Languages

Python, SQL, PySpark, Pandas, Spark SQL, Hive, Scala

Databases

SQL Server, Azure SQL Database, PostgreSQL, Oracle, Netezza, DB2, Snowflake, Teradata, DynamoDB, Cosmos DB, MongoDB, Cassandra, HBase

Certification: Microsoft Certified: Fabric Data Engineer Associate

Professional Experience:

Company: WNS Global Services Client: Standard Chartered Bank

Role: Sr. Azure Data Engineer Duration: April 2024 – Present

Description: The project focuses on building a modern cloud-native Data Platform for Banking & Financial Services, enabling advanced analytics, regulatory reporting, and real-time risk monitoring. The solution leverages Azure Data Platform services (ADF, Databricks, Synapse, ADLS, Power BI) to design scalable ETL/ELT pipelines, integrate diverse structured & unstructured sources (transactional data, payments, market feeds), and implement real-time streaming ingestion using Kafka/Event Hub. The initiative also includes data modeling, data quality frameworks, CI/CD automation, and secure access control to support compliance (Basel, AML, and GDPR) and deliver self-service analytics for stakeholders.

Responsibilities:

Hands-on experience in the SDLC (Software Development Life Cycle), from initial requirements gathering all the way to deployment and ongoing maintenance.

Implemented Spark Context and Spark SQL to process and query structured and semi-structured data.

Designed and implemented data integration pipelines to ingest Oracle-based and other enterprise sources into Azure (ADLS, Azure SQL, and Synapse) for consolidated analytics.

Hands-on experience with modern data engineering tools, including Databricks Delta Lake, SQL, Spark, Dagster, dbt, Temporal, and Airflow.

Designed and deployed Infrastructure-as-Code (IaC) using Terraform to automate provisioning of Azure resources such as Storage Accounts, Data Factory, Virtual Networks, Key Vault, Databricks, and Synapse.

Implemented modular Terraform code for reusable, scalable, and consistent infrastructure deployments across multiple environments.

Developed CI/CD pipelines in Azure DevOps to automate Terraform plan, validate, and apply steps for seamless infrastructure releases.

Created Terraform templates for automated deployment of Azure Data Lake, ADF pipelines, Databricks clusters, Synapse workspaces, and networking components.

Utilized Pandas for data manipulation, NumPy for numerical operations, and Scikit-learn for building and implementing machine learning models.
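
A small sketch of that Pandas/NumPy/Scikit-learn workflow is shown below; the CSV path, feature columns, and label are assumptions made for illustration.

    # Illustrative only: the CSV path, feature columns, and label are assumptions.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("transactions.csv")
    df["log_amount"] = np.log1p(df["amount"])          # NumPy numerical transform

    X = df[["log_amount", "txn_count"]]
    y = df["is_fraud"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("holdout accuracy:", model.score(X_test, y_test))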

Led PoCs supporting fraud detection by integrating AWS S3 ingestion with Databricks back-testing models to evaluate anomaly-based ML scoring.

Built distributed ETL pipelines using AWS Glue + PySpark to prepare transaction datasets for real-time and batch fraud analytics.

Experience includes using PySpark to develop and fine-tune Spark applications, focusing on performance and scalability for large-scale data processing.

Integrated Kafka streaming into the AWS S3 raw zone, enabling secure high-volume storage for downstream model scoring.

Designed and implemented ETL pipelines from AWS S3 into Amazon Redshift to support scalable fraud analytics and reporting workloads.

Implemented distribution styles, sort keys, and vacuum operations in Redshift to improve query performance for high-volume transactional datasets.
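
A hedged sketch of the distribution style, sort key, and VACUUM work referenced above, using psycopg2 with placeholder connection details and a hypothetical fact table:

    # Sketch only: cluster endpoint, credentials, and table definition are placeholders.
    import psycopg2

    conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                            port=5439, dbname="analytics",
                            user="etl_user", password="***")
    conn.autocommit = True   # VACUUM cannot run inside a transaction block
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS txn_facts (
            account_id BIGINT,
            txn_date   DATE,
            amount     DECIMAL(18,2)
        )
        DISTSTYLE KEY
        DISTKEY (account_id)   -- co-locate rows joined on account_id
        SORTKEY (txn_date);    -- range-restricted scans on transaction date
    """)

    cur.execute("VACUUM txn_facts;")   # reclaim space and re-sort after heavy loads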

Integrated Kafka-to-S3-to-Redshift ingestion patterns for near real-time data availability.

Implemented CI/CD pipelines with Jenkins & GitHub for automated deployment of AWS ETL jobs and infrastructure changes.

Applied AWS IAM, server-side encryption & policies to ensure regulatory compliance for sensitive banking data.

Designed and developed data pipelines that combined Kafka and Spark to process high-velocity streaming data.
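
The sketch below illustrates one way such a Kafka-plus-Spark pipeline can be wired with Structured Streaming; the broker address, topic, schema, and output paths are assumptions.

    # Illustrative Structured Streaming job; endpoints, topic, and paths are placeholders.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("payments_stream").getOrCreate()

    schema = (StructType()
              .add("account_id", StringType())
              .add("amount", DoubleType()))

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "payments")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Land the parsed stream in a raw zone for downstream processing.
    (events.writeStream
           .format("parquet")
           .option("checkpointLocation", "/checkpoints/payments")
           .start("/raw/payments"))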

Created and optimized tables, views, stored procedures, and functions in Azure SQL Database to support reporting and analytics workloads.

Implemented CI/CD for Azure Data Factory and Databricks using Azure DevOps, ensuring automated deployments and version control of pipelines and notebooks.

Exposure to GCP (BigQuery, Pub/Sub) for comparative analysis, while primarily focused on Azure cloud data engineering.

Experience working with Azure Blob Storage and Azure Data Lake Storage, and skilled at loading data into Azure Synapse Analytics (SQL DW).

Utilized the Snowflake cloud data warehouse to integrate data from multiple source systems, including nested JSON files, into Snowflake tables.

Designed and executed data pipelines in Azure Data Factory by utilizing control flow features like Lookup, Until, Web, Wait, and If Condition to streamline complex integration tasks.

Built and optimized Databricks notebooks with SQL and Python, implementing automated workflows through Databricks Jobs to efficiently process large-scale datasets on a defined schedule.

Created and maintained optimal data pipeline architecture in Azure using Data Factory and Azure Databricks.

Designed and created tables in Azure SQL Database to support business reporting and data visualization needs.

Experience in connecting Power BI to Spark, enabling the creation of dynamic, interactive visualizations and reports from large datasets.

Worked with Spark RDDs and DataFrames to build efficient data pipelines, performing transformations, aggregations, joins, and feature engineering to process large-scale datasets.

Designed and maintained efficient data pipeline architectures in Microsoft Azure using Azure Data Factory and Azure Databricks for scalable cloud-based data processing.

Environment: Azure Data Factory, ADLS, SSMS, Informatica, MySQL, Tableau, Hive, Azure Storage, SQL Pool Server, SQL Database, SQL Elastic Pool, Virtual Machine SQL Server, Databricks, Spark, Power BI, Pipelines, Python, Azure Synapse, Azure Blob Storage, Data Lake, REST API, BigQuery, Kafka, ETL, S3, Glue, EMR, Lambda, Athena, CloudWatch, IAM, Amazon Redshift.

Company: Infosys Client: Citi Bank

Role: Azure Data Engineer Duration: July 2022 – April 2024

Description: Worked on building a scalable data integration and analytics platform for Citi Bank to consolidate enterprise banking data across multiple systems, including payments, loans, and customer transactions. The project focused on migrating legacy ETL workloads into Azure cloud services, enabling both batch and real-time processing with improved scalability, governance, and reporting capabilities.

Responsibilities:

Created several data frames and datasets using Spark SQL for the purpose of data preprocessing before modeling.

Experience in end-to-end ETL pipelines, specializing in using notebooks to build and optimize these processes, ensuring efficient data movement and reliable transformation logic.

Hands-on experience with key big data technologies, including the Hadoop ecosystem, Spark, Kafka, and others, for building and managing data pipelines.

Experience in designing and implementing pipelines and data infrastructure in cloud environments, with a solid understanding of trade-offs between managed and custom-built solutions, using Python and SQL.

Managed end-to-end ETL design covering source system identification, source-to-target design, data profiling, cleansing, quality validation, and documentation.

Automated event-driven data quality checks using AWS Lambda & CloudWatch.

Queried and validated structured/semi-structured data using AWS Athena, improving audit reporting speed.

Conducted Terraform code reviews, enforced best practices, and ensured compliance with organizational cloud governance.

Experience developing and deploying robust data pipelines in Azure Data Factory (ADF), including creating pipelines to extract, load, and transform (ELT) data from diverse source systems.

Used a variety of ADF activities to ensure the data was properly transformed and prepared for its destination.

Built and optimized Databricks Spark pipelines with PySpark to execute transformations and table-to-table operations across large datasets efficiently.

Designed and built ETL pipelines to move data into and out of a data warehouse, using a combination of Python for complex logic and transformations, and SnowSQL for data manipulation within the warehouse.

Implemented automated ingestion pipelines with Power Automate and Azure Logic Apps to seamlessly transfer structured and unstructured data from SharePoint into Azure Storage.

Developed and executed ETL processes to move data from multiple source systems into Azure Data Storage, using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL on the Azure Data Lake Analytics platform.

Implemented CI/CD methodologies to accelerate development workflows and ensure reliable deployments.

Experienced in RDBMS data transformation; by building a date-time logic framework with dynamic parameters, was able to streamline record processing, leading to improved efficiency and performance.

Extensively worked with Avro and Parquet files, converting data between formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark.
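
A brief sketch of the format handling described above; the input paths and the flattened JSON fields are hypothetical, and the Avro read assumes the spark-avro package is available on the cluster.

    # Illustrative format conversion; paths and nested field names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format_conversion").getOrCreate()

    # Avro -> Parquet
    avro_df = spark.read.format("avro").load("/raw/events_avro")
    avro_df.write.mode("overwrite").parquet("/curated/events_parquet")

    # Semi-structured JSON -> flattened Parquet
    json_df = spark.read.json("/raw/events_json")
    (json_df.select("event_id", "payload.customer_id", "payload.amount")
            .write.mode("overwrite")
            .parquet("/curated/events_flat"))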

Designed and implemented a streamlined data ingestion process using SSIS. This solution automates the secure transfer of files from FTP/SFTP sources, including decryption and transformation, and includes integrated error handling and alerting to ensure data integrity and reliability.

Collected and processed data to generate monthly reports, followed by visualization using Tableau and Python.

Environment: Azure Data Lake, ETL, Pipelines, Azure Logic Apps, SQL, SSIS, FTP/SFTP, PySpark, T-SQL, U-SQL, Azure Storage, Python, Databricks, Big Data, SnowSQL, Hadoop, Kafka, Notebooks, Agile, Scrum, RDBMS, CI/CD.

Company: IBN Technologies Client: HDFC Bank

Role: Data Engineer Duration: May 2020 – July 2022

Description: The project focused on building a modern cloud-native data platform on Azure to enable scalable ETL pipelines, real-time insights, and self-service analytics for enterprise operations. The solution integrated structured and unstructured data sources, established a centralized data warehouse, and implemented advanced data modeling and governance frameworks to support reporting, compliance, and business intelligence.

Responsibilities:

Designed, developed, and implemented modern data solutions on Azure PaaS to enable efficient and effective data visualization.

Built scalable ETL pipelines to ingest and process data from source systems into Azure Data Lake and other Azure Storage services using Azure Data Factory, Databricks, and Spark SQL.

Developed tailored UDFs using Python and PySpark, enabling the completion of unique business objectives.
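
As a small illustration of the tailored UDFs mentioned above, the account-masking rule below is a hypothetical business requirement used only for the sketch.

    # Illustrative PySpark UDF; the account-masking rule is an assumed requirement.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf_example").getOrCreate()

    @F.udf(returnType=StringType())
    def mask_account(acct):
        # Keep only the last four characters of an account number.
        if acct is None or len(acct) <= 4:
            return acct
        return "*" * (len(acct) - 4) + acct[-4:]

    df = spark.createDataFrame([("1234567890",)], ["account_no"])
    df.withColumn("masked", mask_account("account_no")).show()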

Actively monitored and resolved issues to ensure the high availability of ETL pipelines in Azure.

Developed and deployed Power BI dashboards to provide actionable intelligence for stakeholders, all while maintaining high data quality standards.

Built a scalable data warehouse on Azure leveraging Blob Storage, Data Lake, and Synapse to efficiently manage and store massive datasets.

Ingested and modelled enterprise banking datasets into Amazon Redshift to support downstream BI and analytics platforms.

Performed ingestion of enterprise datasets into AWS S3, enabling downstream processing in Databricks & Snowflake.

Created and maintained analytics dashboards in Metabase to provide actionable insights through data reporting and visualization.

Developed and automated cloud-based data pipelines using Apache Airflow, integrating object storage and Snowflake, aligning with modern Azure PaaS ETL practices.
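
A minimal Airflow sketch of that pattern appears below; the DAG id, schedule, stage, and table names are assumptions, and the Snowflake load is represented by a placeholder callable.

    # Illustrative Airflow DAG; names and the load logic are placeholders.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def load_to_snowflake(**context):
        # Placeholder for a COPY INTO from an object-storage-backed external
        # stage into a Snowflake target table.
        print("COPY INTO analytics.sales FROM @object_storage_stage")

    with DAG(
        dag_id="object_storage_to_snowflake",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)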

Experience in data mapping, which involves taking data from source systems, transforming it, and loading it into target datasets, with knowledge of updating corresponding information within OLAP databases.

Created interactive documents in Jupyter Notebook that merge runnable code, visualizations, equations, and explanatory content.

Utilized Python and SQL to integrate data from diverse sources such as APIs, files, and databases, enabling comprehensive analysis.
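
A short sketch of combining an API feed, a flat file, and a database table in Python; the URL, file name, connection string, and column names are all assumptions.

    # Illustrative integration of API, file, and database sources; all names are placeholders.
    import pandas as pd
    import requests
    from sqlalchemy import create_engine

    rates = pd.DataFrame(requests.get("https://api.example.com/rates").json())
    orders = pd.read_csv("orders.csv")

    engine = create_engine("mssql+pyodbc://user:***@server/db?driver=ODBC+Driver+17+for+SQL+Server")
    customers = pd.read_sql("SELECT customer_id, region FROM dbo.customers", engine)

    combined = (orders.merge(customers, on="customer_id")
                      .merge(rates, on="currency"))
    combined.to_sql("orders_enriched", engine, schema="dbo", if_exists="replace", index=False)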

Worked on custom Pig Loaders and storage classes to work with a variety of data formats such as JSON and XML.

Environment: Azure PaaS, Azure Data Factory, API, JSON, XML, Data Mapping, Jupyter Notebook, Databricks, Data Lake, Metabase, S3, Apache Airflow, SQL, Python, Snowflake, OLAP, Blob Storage, Power BI Dashboards, Synapse, Amazon Redshift.


