Laxminarayana Moru
Microsoft Azure and Fabric Certified Data Expert
Mobile: 945-***-****
Email: ******.****@*****.***
Professional Summary:
With over 13 years of experience in delivering data solutions—including design, architecture, and development—I have collaborated with clients such as MassDOT, Microsoft, A.J.G, Howden Insurance Group, Amazon, and Providence Healthcare.
I have deep expertise in Azure Synapse Analytics, Microsoft Fabric, Databricks, Apache Spark, Azure Data Storage, Azure Data Factory, Azure Key Vault, Data Build Tool (DBT), Azure SQL, Python, PySpark, Spark SQL, Data Warehousing, Apache Airflow, Snowflake, MS SQL Server, KQL, SSIS, and Git/GitHub with CI/CD. Additionally, I have strong experience in Dimension-Modeling using tools like ERWin, ER Studio, and Lucidchart. I also have extensive experience with various AWS services, including S3, Glue, QuickSight, Lambda, and Redshift.
Certifications:
DP-700: Microsoft Certified Fabric Data Engineer
DP-203: Microsoft Certified Azure Data Engineer
AZ-204: Microsoft Certified Azure Developer
Summary:
Extensive experience in architecting, designing, developing, and deploying end-to-end ETL and ELT solutions using Azure Synapse Analytics, Microsoft Fabric, Databricks, Apache Spark, Azure Data Storage, Azure Data Factory, Azure Key Vault, Data Build Tool (DBT), Azure SQL, Python, PySpark, Spark SQL, Data Warehousing, Apache Airflow, Snowflake, MS SQL Server, KQL, SSIS, and Dimension-Modeling. Additionally, proficient in AWS services such as S3, Glue, QuickSight, Lambda, and Redshift, as well as Git/GitHub with CI/CD for version control and automation.
Implemented a knowledge graph using the Neo4j graph database, processing unstructured data with open-source LLMs like Mistral, and leveraging prompt templates for enhanced data processing.
Implemented a mechanism to query the knowledge graph in the Neo4j graph database by generating Cypher queries from natural language, utilizing prompt templates and leveraging open-source LLMs like Mistral.
Developed vector embeddings with Mongo Atlas collections to enable semantic search, employing Retrieval-Augmented Generation (RAG) for custom LLMs.
Skilled in ingesting and integrating raw data from heterogeneous sources into the Enterprise Data Warehouse (EDW) using Azure Data Factory, SSIS, and AWS Glue.
Proficient in building dynamic Azure Data Factory pipelines by parameterizing Linked Services and Datasets and implementing dynamic auto-mapping of source and sink using metadata configured in a control database, enabling efficient and scalable data integration.
Experienced in leveraging Azure Data Factory and SSIS for data integration and orchestration, efficiently loading data from diverse source formats, including SQL Server, Azure SQL, ISAM files, MySQL, JSON, Parquet, XML, CSV, REST API, SFTP, Salesforce, Snowflake, and HTTP, ensuring seamless data flow and optimized processing.
Expertise in utilizing Azure Blob Storage and Azure Data Lake Store to efficiently store various data formats, ensuring low-cost, high-performance analytical processing.
Designed and implemented Medallion architecture with Lakehouse in Azure Databricks, leveraging Delta Format Tables, Auto Loader, Data Build Tool (DBT), Unity Catalog, Workflows, and Delta Live Tables. Applied various Spark optimization techniques, including Partitioning, Bucketing, Caching, AQE, Data-Skipping using Statistics and Bloom-Filter Index, Deletion Vector with OPTIMIZE command (including Z-Ordering), VACUUM command, and Liquid Clustering (a brief sketch follows this summary).
Extensive hands-on experience with Azure Databricks Notebooks, utilizing PySpark and Spark SQL to process and transform large datasets, and orchestrating workflows using Databricks Workflows and Azure Data Factory.
Expertise in Azure Synapse Analytics, a massively parallel processing data warehouse, leveraging PolyBase, COPY command, OPEN ROWSET function, native and Hadoop-type external tables, Serverless SQL pool for ad-hoc querying, Dedicated SQL pool for persisted data warehousing, Distribution Types, Spark pool, Notebooks, and data pipelines.
Strong expertise in Microsoft Fabric, particularly its Data Engineering and Real-Time Analytics capabilities. Proficient in working with Lakehouses, Warehouses, Data Flows, Spark, Notebooks, Data Pipelines, SQL, KQL, KQL Query Sets, Event Streams, Real-Time Dashboards, and Event Houses, ensuring seamless data processing and analytics within the Fabric ecosystem.
Proficient in using Azure Key Vault to securely store connection strings and secrets, enabling double encryption for Azure Synapse Analytics and Azure Data Factory.
Experienced in orchestrating data workflows using Apache Airflow with Python, leveraging core components such as the Scheduler, Web Server, Executor, Worker, Metadata Database, Logs, Hooks, Sensors, DAG, Task, Task Groups, and Operators.
Expertise in designing OLTP systems using Snowflake schema and OLAP systems using Star schema data models, utilizing tools such as ERWin, ER Studio, and Lucidchart.
Strong technical expertise in MS SQL Server (T-SQL), with a focus on performance tuning including query optimization, reporting, and troubleshooting.
Strong technical expertise in KQL (Kusto Query Language), along with Azure Synapse and Microsoft Fabric, for analyzing semi-structured and time-series data.
Designed and implemented a near real-time dashboard along with a data pipeline, leveraging AWS services such as QuickSight, S3, Glue, Lambda, Redshift, and Python.
Proficient in Python with libraries such as Pandas, NumPy, Matplotlib, LangChain, Hugging Face, and others.
Worked with Snowflake for the past two years to process EPIC EHR data, enabling the development of the Patient Experience (PEAP) and Registry Management System (RMS) models.
Strong expertise in designing and implementing ETL pipelines using Control Flow, Data Flow, Event Handlers, and Logging within SSIS packages.
Skilled in monitoring production SSIS packages and troubleshooting issues or failures.
Strong expertise in handling production issues, especially P0 (priority zero) issues.
Collaborated with Business Analysts and Product Owners to understand business requirements, recommend technical solutions, and document both functional requirements and technical designs/architectures.
Proficient in using Azure Git and GitHub for version control, code collaboration, and CI/CD integration. Experienced in Azure Boards, branching strategies, pull requests, and code reviews with Azure DevOps and GitHub.
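For illustration, a minimal sketch of the Delta Lake maintenance techniques listed above; the table and column names (silver.claims, policy_id, silver.dim_policy) are hypothetical, and a Databricks runtime with Delta Lake support is assumed.

```python
# Minimal sketch of Delta table optimization on Databricks (hypothetical table/column names).
# Assumes a SparkSession with Delta Lake support, e.g. the one provided by a Databricks cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Enable Adaptive Query Execution so joins/shuffles are re-planned at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Compact small files and co-locate rows on a frequently filtered column (Z-Ordering).
spark.sql("OPTIMIZE silver.claims ZORDER BY (policy_id)")

# Remove data files no longer referenced by the table, keeping 7 days (168 hours) of history.
spark.sql("VACUUM silver.claims RETAIN 168 HOURS")

# Cache a hot dimension table for repeated lookups within the job.
dim_policy = spark.table("silver.dim_policy").cache()
dim_policy.count()  # materialize the cache
```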
Work Experience:
Working as a Lead Data Engineer on a contract position for MassDOT (Massachusetts Department of Transportation), based in Boston, Massachusetts, USA, since February 2025.
Worked as Principal Data Engineer/Data Architect at Providence, Hyderabad from January 2023 to January 2025.
Worked as Data Engineer at Amazon, Hyderabad from September 2022 to December 2022.
Worked as Lead Data Engineer/Data Architect at Accenture (Earlier known as Pramati Technologies), Hyderabad from Jan 2014 to Aug 2022.
Worked as Software Engineer at MAQ Software, Hyderabad from Sept 2011 to Jan 2014.
Educational Qualification:
Master of Computer Applications (MCA) from Osmania University with 78% (2008-11).
Bachelor of Science from Kakatiya University with 83% (2005-08).
Board of Intermediate Education with 85% (2003-05).
SSC from K.H.S with 82% (2002-03).
Technical Skills:
Azure Data Factory, Apache Airflow, SSIS, and AWS Glue for Data Integration and Orchestration
Apache Spark with PySpark & Spark SQL, as well as T-SQL (MS SQL Server), for data processing and transformations.
Azure Synapse Analytics and Azure Databricks to build end-to-end analytics solutions
Azure Synapse Analytics with PolyBase, COPY command, OPEN ROWSET function, native and Hadoop-type external tables, Serverless SQL pool for ad-hoc querying, Dedicated SQL pool for persisted data warehousing, Distribution Types, Spark pool, Notebooks, and data pipelines
Medallion architecture (Bronze, Silver, and Gold layers) with Lakehouse in Azure Databricks, leveraging Delta Format Tables, Auto Loader, Data Build Tool (DBT), Unity Catalog, Workflows, and Delta Live Tables. Applied various Spark optimization techniques with Delta format tables, including Partitioning, Bucketing, Caching, AQE, Data-Skipping using Statistics and Bloom-Filter Index, Deletion Vector with OPTIMIZE command (including Z-Ordering), VACUUM command, and Liquid Clustering.
Azure SQL to support the OLTP applications.
Python with libraries such as Pandas, NumPy, Matplotlib, LangChain, Hugging Face, and others.
Azure Blob, Azure Data-Lake Gen2, and AWS S3 for cloud storage.
Microsoft Fabric with Lakehouses, Warehouses, Data Flows, Spark, Notebooks, Data Pipelines, SQL, KQL, KQL Query Sets, Event Streams, Real-Time Dashboards, and Event Houses.
Data-warehousing and Dimension-Modelling with Snowflake schema for OLTP and Star schema for OLAP.
AWS services like S3, Glue, Lambda, QuickSight, and Redshift.
Gen-AI features like Vector Embedding (Mongo Atlas) and Semantic Model (Knowledge Graph using Neo4j).
Azure Git/GitHub code repositories with CI/CD integration.
Project #12:
Feb 2025 to present
Project Title: MassDOT Enterprise
Payroll Company: SMD Technosol (Yerralpha LLC)
Role: Lead Data Engineer/ETL Development
Client: MassDOT, Boston, U.S.A
Environment: Azure SQL, SQL Server, Azure Data Factory, Python, Azure Functions, Azure Data-Lake Gen2, DBT, and Snowflake.
Description:
MassDOT Enterprise handles diverse data sources including Crash Reports, Driver’s Licenses, Commute Statistics, ServiceNow, Vehicle Registration, Vehicle Inspection, Driver Training Providers, Interpretive Services, RMV Appointments, and more. Analytics and insights are delivered through Power BI dashboards, with some dashboards being publicly accessible and others restricted. Initially, data ingestion from various source systems was implemented using Azure Data Factory (ADF), and processed data was modeled and stored in Azure SQL. The architecture has since been migrated to Snowflake, establishing it as the new data lake. This transition leveraged Azure Data Factory for orchestration and DBT (Data Build Tool) for data transformation and modeling.
Additionally, Azure Functions were utilized to implement custom logic in Python, for example converting state-based coordinate data into latitude and longitude values.
Responsibilities:
Designed and implemented data pipelines using Azure Data Factory for ingesting data into Azure Data Lake Gen2 from both file-based and REST API-based source systems.
Developed transformation workflows using DBT (Data Build Tool) with SQL and YAML configurations, leveraging key DBT concepts such as Models, Macros, Sources, Seeds, Snapshots, and Tests.
Designed and implemented Azure Functions using Python to handle custom logic, for example converting state-based coordinate data into latitude and longitude values (see the sketch after this responsibilities list).
Created and optimized complex SQL queries to process raw data and build a robust reporting layer for consumption in Power BI analytics dashboards.
Designed and implemented a dimensional data model in Snowflake aligned with business requirements to support analytics and reporting via Power BI dashboards.
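For illustration, a hedged sketch of the coordinate-conversion Azure Function described above. It assumes the Python v2 programming model and Massachusetts State Plane input (EPSG:26986); the route name and payload shape are hypothetical.

```python
# Hypothetical sketch of the coordinate-conversion Azure Function (Python v2 programming model).
# Assumes the source data is in the Massachusetts State Plane CRS (EPSG:26986); the actual
# CRS, route name, and payload shape are assumptions for illustration.
import json

import azure.functions as func
from pyproj import Transformer

app = func.FunctionApp()

# Reuse one transformer: state plane coordinates (meters) -> WGS84 latitude/longitude.
_to_wgs84 = Transformer.from_crs("EPSG:26986", "EPSG:4326", always_xy=True)


@app.route(route="convert_coordinates", auth_level=func.AuthLevel.FUNCTION)
def convert_coordinates(req: func.HttpRequest) -> func.HttpResponse:
    body = req.get_json()  # e.g. {"x": 236000.0, "y": 900000.0}
    lon, lat = _to_wgs84.transform(body["x"], body["y"])
    return func.HttpResponse(
        json.dumps({"latitude": lat, "longitude": lon}),
        mimetype="application/json",
    )
```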
Project #11:
Dec 2023 to Jan 2025
Project Title: RMS (Registry Management System)
Payroll Company: Providence India
Role: Principal Data Engineer/Data Architect
Client: Providence Healthcare, U.S.A
Environment: Azure SQL, SQL Server, Python, Neo4j graph database, Mongo Atlas Vector database, Airflow, Azure Data-Lake Gen2, Azure Databricks, DBT, Medallion architecture, Lakehouse, Apache Spark, Spark SQL, Pyspark, Unity Catalog, Auto Loader, SQL, and Snowflake
Description:
The Registry Management System (RMS) deals with various disease registries such as Stroke, Heart Failure, Cardiovascular Disease (CVD), and Neonatal Intensive Care Units (NICU). In the USA, healthcare facilities are required to submit patient health and treatment details to different regulatory bodies, such as GWTG, NCDR, and VON. We typically consume data from the EPIC EHR system and process it by loading it into UDM and registry-specific data models, both discrete and non-discrete, for review by an abstractor. The abstractor then submits the data to the regulatory body upon completion of the review process. Discrete fields come directly from the Clarity database, an intermediary layer between EPIC EHR and our system, while non-discrete fields are extracted from patient notes using open-source large language models (LLMs) like Mistral, employing prompt templates.
Responsibilities:
Analyzed EPIC EHR data and performed gap analysis to identify the essential fields for each registry.
Designed the Unified Data Model (UDM) dimension model to integrate all subject areas, including Medication, Lab Results, Procedures, Admissions, Discharges, Vaccinations, Patient Demographics, and Common Reference Data, aligning with regulatory bodies' common data points across registries.
Developed a NICU registry-specific dimension model, leveraging the UDM for shared data points while incorporating unique registry-specific fields.
Built robust and resilient data engineering pipelines with Apache Airflow to ingest the identified fields from the EPIC EHR system into Azure Data Lake Gen2 (a simplified DAG sketch follows this responsibilities list).
Designed and implemented a Medallion architecture in Azure Databricks, incorporating Notebooks, Spark, Delta format tables, Auto Loader, Data Build Tool (DBT), and Unity Catalog for data governance. This architecture refined and curated data at each layer (Bronze, Silver), built the final model at the Gold layer, and supported downstream models.
Mounted Azure Data Lake to the Databricks workspace for seamless ingestion of data into the Bronze layer.
Implemented transformations using Data Build Tool (DBT) with SQL and YAML configuration, leveraging DBT concepts such as Models, Macros, Sources, Seeds, Snapshots, and Tests.
Integrated Unity Catalog with Databricks workspace to ensure centralized, fine-grained data governance.
Implemented Notebooks with various transformations to process ingested large datasets from Azure Data Lake Gen2, refining data through different Medallion architecture layers.
Orchestrated Databricks Notebooks using Databricks Workflows.
Wrote complex SQL queries and stored procedures to apply business transformations on source data, refreshing the final data model in the EDW.
Processed unstructured patient notes using open-source LLMs like Mistral with prompt templates, extracting fields and loading the data into Neo4j graph database to build a knowledge graph.
Automated data accuracy checks against ground-truth data after development completion for each registry.
Led the data engineering team in both technical and functional activities, including design, development, and delivery.
Actively participated in all agile sprint ceremonies.
Managed a variety of production issues, implementing optimization techniques such as Partitioning, Bucketing, Caching, AQE (Adaptive Query Execution), Data Skipping using Statistics and Bloom-Filter Index, and Deletion Vector with the OPTIMIZE command (including Z-Ordering).
Utilized the VACUUM command and Liquid Clustering to ensure optimal performance.
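A simplified sketch of the kind of Airflow DAG used for the EPIC ingestion described above; the DAG id, schedule, and task bodies are placeholders rather than the production pipeline.

```python
# Simplified sketch of an ingestion DAG (task names, schedule, and helper functions are hypothetical).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_clarity_tables(**context):
    # Placeholder: pull the identified registry fields from the Clarity/EPIC source.
    ...


def land_to_adls_bronze(**context):
    # Placeholder: write the extracted files to Azure Data Lake Gen2 (Bronze zone).
    ...


with DAG(
    dag_id="rms_epic_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",   # nightly load
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(task_id="extract_clarity_tables", python_callable=extract_clarity_tables)
    land = PythonOperator(task_id="land_to_adls_bronze", python_callable=land_to_adls_bronze)

    extract >> land
```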
Project #10:
Jan 2023 to Nov 2023
Project Title: PEAP (Patient Experience Analytical Platform)
Payroll Company: Providence India
Role: Principal Data Engineer/Data Architect
Client: Providence Healthcare, U.S.A
Environment: Python, Azure Data Factory, Snowflake, Databricks, DBT, Unity Catalog, Spark SQL, Lakehouse with Medallion architecture, Neo4j graph database, Data-warehousing, Auto Loader, Dimensional Modelling, Apache Spark, Pyspark, Apache Airflow, SQL Server, and Azure Data Lake
Description:
PEAP is a patient experience system designed to capture feedback and insights about inpatients and outpatients. It derives the mandatory questionnaires governed by US healthcare regulations, recaptures the answers from patients, and runs analytics on top of the obtained results.
Responsibilities:
Analyzed requirements and designed a scalable Medallion architecture to support the timely and resilient processing of large datasets.
Designed the PEAP dimension model, incorporating survey information to be sent to patients and responses to be received.
Built robust and resilient data engineering pipelines using Apache Airflow to ingest patient experience-related fields from the EPIC EHR system into Azure Data Lake Gen2.
Designed and implemented a Medallion architecture with Lakehouse in Databricks, leveraging Notebooks, Spark, Delta format tables, Auto Loader, Data Build Tool (DBT), and Unity Catalog for data governance. This architecture refined and curated data at each layer (Bronze, Silver), built the final model at the Gold layer, and supported downstream models.
Implemented transformations using Data Build Tool (DBT) with SQL and YAML configuration, leveraging DBT concepts such as Models, Macros, Sources, Seeds, Snapshots, and Tests.
Integrated Unity Catalog with the Databricks workspace to ensure centralized and fine-grained data governance.
Processed unstructured survey responses using open-source LLMs like Mistral with prompt templates, extracting fields and loading the data into the Neo4j graph database to build a knowledge graph (a minimal loading sketch follows this responsibilities list).
Mounted Azure Data Lake to the Databricks workspace for seamless ingestion of data into the Bronze layer.
Implemented Notebooks with various transformations to process large datasets from Azure Data Lake Gen2, particularly handling complex transformations for processing XML files received as part of survey responses.
Generated CSV extracts using Notebooks and uploaded them to Azure Data Lake Gen2, ensuring that the data was available for the internal Providence team to share with the survey vendor, in compliance with organizational regulations.
Orchestrated Databricks Notebooks using Databricks Workflows.
Wrote complex SQL queries and stored procedures to apply business transformations on source data, refreshing the final data model in the EDW.
Designed and implemented an automated acknowledgement process in Python to handle data rejected by the survey vendor, providing insights into erroneous data to improve data quality.
Led the data engineering team in both technical and functional activities, including design, development, and delivery.
Actively participated in all agile sprint ceremonies.
Managed a variety of production issues, including ongoing sprint enhancements and the onboarding of new service lines such as ED, Hospitalist, and Hospital At Home.
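A minimal sketch of loading LLM-extracted fields into the Neo4j knowledge graph described above; the connection details, node labels, relationship, and properties are hypothetical.

```python
# Minimal sketch of loading LLM-extracted fields into Neo4j (URI, credentials,
# node labels, and properties are hypothetical).
from neo4j import GraphDatabase

URI = "neo4j+s://<host>"       # placeholder connection details
AUTH = ("neo4j", "<password>")

# Example record extracted from an unstructured survey response / patient note by the LLM.
extracted = {"patient_id": "P123", "topic": "Discharge Instructions", "sentiment": "negative"}

CYPHER = """
MERGE (p:Patient {id: $patient_id})
MERGE (t:Topic {name: $topic})
MERGE (p)-[:MENTIONED {sentiment: $sentiment}]->(t)
"""

driver = GraphDatabase.driver(URI, auth=AUTH)
with driver.session() as session:
    session.run(CYPHER, **extracted)
driver.close()
```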
Project #9:
Sep 2022 to Dec 2022
Project Title: Automated Inventory Management
Payroll Company: Amazon India
Role: Data Engineer
Client: Amazon, U.S.A
Environment: Spark, Glue, Redshift, S3, SQL, Lambda, Cradle & Datanet (internal tools), QuickSight, and Python.
Description:
Automated Inventory Management is part of SCOT (Supply Chain Optimization Technology) and is designed to identify defects across the four SCOT verticals (Forecasting, Sourcing, IPC, and FO) and fix them in an automated way to improve Amazon sales worldwide.
Responsibilities:
Analyzed requirements and designed efficient data pipelines.
Implemented a serverless AWS Lambda function using Python to connect to the Datanet and Cradle APIs, consuming data pipeline run logs and ingesting the data into AWS S3 buckets (see the sketch after this responsibilities list).
Utilized AWS Glue to refresh Redshift Spectrum external tables from AWS S3 buckets.
Wrote complex SQL queries to apply transformations on pipeline run log data, refreshing the final data model in Redshift.
Processed data from Redshift Spectrum tables to build the final model required for AWS QuickSight dashboards.
Designed and implemented QuickSight dashboards with various visualizations, including line charts, tabular reports, and pie charts, to provide insights into running pipelines and support on-call activities.
Provided proactive on-call support for data-related issues and pipeline monitoring.
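A hedged sketch of the log-ingestion Lambda described above; since Datanet and Cradle are internal Amazon tools, the endpoint, bucket name, and payload shown here are placeholders.

```python
# Hypothetical sketch of the log-ingestion Lambda (the Datanet/Cradle APIs are internal
# Amazon tools, so the endpoint and payload shown here are placeholders).
import json
from datetime import datetime, timezone

import boto3
import urllib3

s3 = boto3.client("s3")
http = urllib3.PoolManager()

BUCKET = "inventory-pipeline-run-logs"           # placeholder bucket name
LOGS_ENDPOINT = "https://internal.example/runs"  # placeholder for the internal API


def lambda_handler(event, context):
    # Pull the latest pipeline run logs from the internal API.
    response = http.request("GET", LOGS_ENDPOINT)
    run_logs = json.loads(response.data)

    # Partition the landing path by ingestion date for Glue / Redshift Spectrum.
    key = f"run_logs/dt={datetime.now(timezone.utc):%Y-%m-%d}/runs.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(run_logs))

    return {"statusCode": 200, "records": len(run_logs)}
```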
Project #8:
Feb 2022 to Aug 2022
Project Title: Complete Source Migration
Payroll Company: Accenture India
Client: Howden Insurance Group, U.K
Role: Lead Data Engineer/Data Architect
Environment: Azure Data Factory for Data Integration and Orchestration, Azure Data Lake, SQL Server, Python, Azure Databricks, PySpark, Spark SQL, Apache Spark, Lakehouse with Medallion architecture, Unity Catalog, Auto Loader, DBT, KQL, and Azure Synapse Analytics.
Description:
As part of Partial Migration, we migrated semi-processed data from the Accelerator 1.0 Staging area into the new Data Platform to meet the business timelines for delivering key dashboards against the new Data Platform. The next step was to switch to ingesting raw data directly from the source systems and processing it within the Data Platform itself, rather than relying on semi-processed data from Accelerator 1.0 Staging.
Responsibilities:
Designed source ingestion pipelines to support the data migration process.
Created metadata-driven, generic Azure Data Factory (ADF) pipelines to ingest data into Azure Data Lake Gen2, promoting re-usability across source formats and reducing development effort for onboarding new data sources by configuring metadata in a control database.
Designed and implemented the Bronze and Silver layers as part of the Medallion architecture with Databricks, including Auto Loader, processing data from Azure Data Lake, applying data validation, business transformations, and delta computations before loading into the Gold layer (an Auto Loader sketch follows this responsibilities list).
Developed dynamic Notebooks by configuring business rules in a control database to maximize re-usability and minimize development effort.
Orchestrated Databricks Notebooks using Databricks Workflows and Azure Data Factory.
Applied a variety of transformations using PySpark and Spark SQL, leveraging Spark and Delta table optimization techniques to enhance performance.
Wrote complex SQL queries and stored procedures to apply business transformations on source data, refreshing the final data model in the EDW.
Mounted Azure Data Lake to the Databricks workspace for seamless ingestion of data into the Bronze layer.
Implemented transformations using Data Build Tool (DBT) with SQL and YAML configuration, leveraging DBT concepts such as Models, Macros, Sources, Seeds, Snapshots, and Tests.
Integrated Databricks Unity Catalog for centralized data governance and managed access controls.
Refreshed the Azure Synapse Dedicated SQL pool after the Gold layer update to support downstream processes, including Power BI reports, Angular reports, ad-hoc querying, and data analysis by Business Analysts.
Created Native and Hadoop-type external tables to support ad-hoc and exploratory tasks for Data Analysts and Business Analysts.
Led a team of 5 developers, overseeing technical aspects of design and development, with hands-on involvement in ADF, Data Lake, PySpark, Spark SQL, Auto Loader, Unity Catalog, and Azure Synapse.
Analyzed data using SQL and KQL, utilizing Azure Synapse Analytics for efficient processing of time-series and semi-structured data, enabling real-time insights.
Conducted regular reviews to ensure implementations met business requirements and adhered to design and coding standards.
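An Auto Loader sketch for the Bronze-layer ingestion described above; the paths, file format, and target table name are placeholders, and spark is assumed to be the predefined SparkSession of a Databricks notebook.

```python
# Simplified Auto Loader sketch for the Bronze layer (paths, schema location, and table
# names are placeholders; assumes a Databricks notebook where `spark` is predefined and
# the ADLS Gen2 container is mounted).
raw_path = "/mnt/datalake/raw/policies/"             # mounted ADLS Gen2 landing path
bronze_table = "bronze.policies"

(spark.readStream
    .format("cloudFiles")                            # Databricks Auto Loader source
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/mnt/datalake/_schemas/policies")
    .option("header", "true")
    .load(raw_path)
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/datalake/_checkpoints/policies")
    .trigger(availableNow=True)                      # run as an incremental batch
    .toTable(bronze_table))
```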
Project #7:
August 2021 to Jan 2022
Project Title: BU Re-design
Payroll Company: Accenture India
Client: Howden Insurance Group, U.K
Role: Lead Data Engineer/Data Architect
Environment: Azure Data Factory for Data Integration and Orchestration, Azure SQL Server, Pyspark, Data-warehousing, Dimensional Modelling, Azure Data-Lake, Azure Synapse Analytics, Azure Databricks, Medallion architecture with Lakehouse, Spark SQL, and SQL Server.
Description:
Business Unit is a key classification of HIG's business, and customers are most interested in analyzing dashboard/report data from a Business Unit point of view. However, the legacy Business Unit design in Accelerator had become a burden for the business when handling scenarios such as splitting one business unit into two or merging multiple business units into one, and the ETL team also struggled with the fiscal-year roll-over work for Business Units, which required significant manual effort every year due to the legacy design. The business therefore decided to re-design the Business Unit module to simplify the work for everyone, including the Data Stewards, the ETL team, and the business team.
Responsibilities:
Designed a dimension model to streamline the Year-on-Year rollover for business unit structures, providing flexibility for data stewards to manage these structures with ease.
Created metadata-driven, generic Azure Data Factory (ADF) pipelines to ingest data from Azure SQL server into Azure Data Lake Gen2, promoting re-usability and reducing development effort for onboarding new business unit structures.
Mounted Azure Data Lake to the Databricks workspace for seamless ingestion of data into the Bronze layer.
Developed dynamic Notebooks by configuring transformation rules in a control database to maximize re-usability and minimize development effort (a simplified sketch follows this responsibilities list).
Orchestrated Databricks Notebooks using Databricks Workflows and Azure Data Factory.
Wrote complex SQL queries and stored procedures to apply business transformations on source data, refreshing the final data model in the EDW.
Applied a variety of transformations using PySpark and Spark SQL, leveraging Spark and Delta table optimization techniques to enhance performance.
Refreshed the Azure Synapse Dedicated SQL pool after the Gold layer update to support downstream processes, including Power BI reports, Angular reports, ad-hoc querying, and data analysis by Business Analysts.
Conducted regular reviews to ensure implementations met business requirements and adhered to design and coding standards.
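A simplified sketch of the control-table-driven transformation pattern described above; the control table layout, rule columns, and table names are assumptions for illustration.

```python
# Simplified sketch of a rule-driven notebook: transformation expressions are stored in a
# control table and applied dynamically (table and column names are hypothetical; assumes
# a Databricks notebook where `spark` is predefined).
from pyspark.sql import functions as F

# Control table rows look like: (target_column, sql_expression),
# e.g. ("bu_key", "upper(trim(bu_code))").
rules = spark.table("control.business_unit_rules").collect()

df = spark.table("silver.business_unit")
for rule in rules:
    df = df.withColumn(rule["target_column"], F.expr(rule["sql_expression"]))

df.write.mode("overwrite").saveAsTable("gold.dim_business_unit")
```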
Project #6:
Dec 2020 to August 2021
Project Title: DUAL
Payroll Company: Accenture India
Client: Howden Insurance Group, U.K
Role: Lead Data Engineer/Data Architect
Environment: Azure Data Factory for Data Integration and Orchestration, Data-warehousing, Dimensional Modelling, Azure Data Lake store, Python, PySpark with Databricks, Spark SQL, SQL Server, Lakehouse with Medallion architecture, Databricks, KQL, and Azure Synapse.
Description:
DUAL is the underwriting division of the Howden Insurance Group. It is the world's largest international underwriting agency and Lloyd's of London's largest international cover holder. Built dynamic pipelines using Azure Data Factory for Data Integration and Orchestration to load underwriting data from the on-premises SQL Server source through pre-staging in Azure Blob Storage and the Azure Data Lake staging layer into the Enterprise Data Warehouse (EDW). The data loaded into the EDW is used for reporting in dashboards.
Responsibilities:
Designed and implemented a scalable and efficient Medallion architecture with Lakehouse to support the DUAL project.
Designed a dimension-model with Star schema considering the functional requirements, data volume, and historical data capture requirements.
Designed and implemented metadata-driven, generic Azure Data Factory (ADF) pipelines to ingest data into Azure Data Lake Gen2, promoting re-usability across source formats and reducing development effort for onboarding new data sources by configuring metadata in a control database.
Designed and implemented the Bronze and Silver layers as part of the Medallion architecture with Databricks including processing data from Azure Data Lake, applying data validation, business transformations, and delta computations before loading into the Gold layer.
Developed dynamic Notebooks by configuring business rules in a control database to maximize re-usability and minimize development effort.
Applied a variety of transformations using PySpark and Spark SQL, leveraging Spark and Delta table optimization techniques to enhance performance.
Mounted Azure Data Lake to the Databricks workspace for seamless ingestion of data into the Bronze layer (a mount sketch follows this responsibilities list).
Refreshed the Azure Synapse Dedicated SQL pool after the Gold layer update to support downstream processes, including Power BI reports, Angular reports, ad-hoc querying, and data analysis by Business Analysts.
Wrote complex SQL queries and stored procedures to apply business transformations on source data, refreshing the final data model in the EDW.
Orchestrated Databricks Notebooks using Databricks Workflows and Azure Data Factory.
Created Native and Hadoop-type external tables to support ad-hoc and exploratory tasks for Data Analysts and Business Analysts.
Led a team of 3 developers, overseeing technical aspects of design and development, with hands-on involvement in ADF, Databricks, Data Lake, PySpark, Spark SQL, and Azure Synapse.
Analyzed data using SQL and KQL, utilizing Azure Synapse Analytics for efficient processing of time-series and semi-structured data, enabling real-time insights.
Conducted regular reviews to ensure implementations met business requirements and adhered to design and coding standards.
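A sketch of the ADLS Gen2 mount referenced above, using a service principal whose credentials are read from a Key Vault-backed secret scope; the scope, container, and storage account names are placeholders.

```python
# Sketch of mounting an ADLS Gen2 container to the Databricks workspace using a service
# principal (the secret scope, container, and account names are placeholders; `dbutils`
# is available only inside a Databricks notebook).
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("kv-scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("kv-scope", "sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://raw@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/datalake/raw",
    extra_configs=configs,
)
```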
Project #5:
Jan 2020 to Dec 2020
Project Title: Partial Migration
Payroll Company: Accenture India
Client: Howden Insurance Group, U.K
Role: Lead Data Engineer/Data Architect
Environment: Azure Data Factory, Azure Data Lake, Python, PySpark, Azure SQL, Spark SQL, SQL Server, KQL, Azure Synapse Analytics, Azure Databricks, and Lakehouse with Medallion Architecture.
Description:
Partial Migration involved migrating existing data from the on-premises system to a brand-new Data Platform built with the cloud technologies mentioned above. However, the migration scope of this project was partial due to the foreseen business requirement to deliver key dashboards/reports against the new Data Platform for migrated data. Hence the scope of the project was limited to semi