
Azure Data Warehouse

Location:
San Francisco, CA
Posted:
March 25, 2025


Resume:

Kezia Susmitha Natha

+1-346-***-**** ***************@*****.***

https://in.linkedin.com/in/kezia-susmitha-a2393373

OBJECTIVE

Azure Data Architect with over 13 years of experience designing, implementing, and managing data solutions in cloud environments. Proficient in data orchestration and ETL processes, designing and building databases and data warehouses, optimizing data loads, data modeling, and reporting. Seeking to leverage this expertise to optimize an organization's data strategy and foster innovation.

PROFESSIONAL SUMMARY

•Hands-on experience with Azure cloud technologies such as Azure Synapse Analytics, Azure Data Factory, Azure Databricks, Azure SQL Database, and Microsoft Fabric.

•Expertise in creating Synapse pipelines that facilitate data flow across the entire architecture.

•Well versed in optimizing data loads, setting up integration runtimes (IRs), and parallelizing loads to gain maximum throughput.

•Well versed in database design, data storage strategy, indexing, data retrieval, and query optimization techniques.

•Experienced in data warehouse design with star and snowflake schemas. Implemented appropriate distributions, indexes, and partitioning for faster data retrieval.

•Hands-on experience creating Synapse notebooks (PySpark) for data transformation and data quality checks to cleanse data and move it to the silver layer. Created reusable, scalable Python methods for quality checks such as primary key, column nullability, and data type validation, and for transformations such as data type conversions and delta table reload/overwrite.

•Expertise in Azure Data Factory for data orchestration in various integration projects.

•Proficient in using PySpark for large-scale data processing and transformation. Experienced in manipulating DataFrames and RDDs for efficient data handling.

•Familiar with optimizing Python/PySpark applications for performance and resource management. Able to run SQL queries on structured data using Spark SQL.

•Understanding of distributed computing concepts and cluster management in Spark.

•Developed Apache Spark applications using the RDD, Spark SQL, and DataFrame APIs.

•Data warehousing experience in business intelligence technologies and databases, with extensive knowledge of data analysis, T-SQL queries, ETL and ELT processes (SSIS), Reporting Services (SSRS, Power BI), and Analysis Services (SSAS) using SQL Server 2012/2014/2016 and SQL Server Agent.

•Built multidimensional cubes and measures using SSAS and queried the cubes with MDX. Built tabular models, Power BI data models, and DAX measures. Created analytical reports.

•Well versed in DevOps, CI/CD release pipelines, and Git for source control.

•Worked in an Agile project execution model.

TOOLS AND TECHNOLOGIES

Databases

Azure SQL Database, Azure SQL Data Warehouse, SQL Server, Oracle, PostgreSQL

Cloud Technologies

Azure Synapse Analytics, Azure SQL Database, Azure SQL Data Warehouse, Azure Data Lake Storage Gen2, Logic Apps, Microsoft Fabric, OneLake

Programming/ Scripting Languages

Python, Scala, SQL, Spark SQL

Big Data Ecosystem

PySpark, Spark with Scala, Databricks, Hive

Source Control

TFS, GitHub

EDUCATION

Computer Science & Engineering Aug 2008 - Apr 2012

Jawaharlal Nehru Technological University, Kakinada

Bachelor of Technology with a 7.7 GPA, specializing in Computer Science & Engineering

PROFESSIONAL EXPERIENCE

Azure Data Architect Jun 2023 - Present

Adventist Health, Roseville

CR Teck Consultancy

Client: Adventist Health – Adventist Health provides care through hospitals, clinics, home care, and hospice agencies across the U.S. West Coast and Hawaii. The project migrated the on-premises enterprise data warehouse, hosted in Oracle, to the Azure cloud.

Led the overall migration plan to move the existing EDW system to a modern data platform that serves current business needs and scales for future data science initiatives.

Implemented a medallion architecture that logically organizes data in a lakehouse, with the goal of incrementally and progressively improving the structure and quality of data as it flows through each layer of the architecture (Bronze, Silver, and Gold layer tables).

Drove design discussions and business user interactions. Conducted hand-holding sessions to help the client ramp up on the new technology.

Coordinated with the offshore team on knowledge transfer, laying out the roadmap, answering domain-related questions, and explaining the architecture.

Created Synapse pipelines that load data from different upstream sources such as Oracle, Epic SQL Server, XML/flat files, and APIs.

Created Synapse notebooks (PySpark) for data transformation and data quality checks to cleanse data and move it to the silver layer. Created reusable, scalable Python methods for quality checks such as primary key, column nullability, and data type validation, and for transformations such as data type conversions and delta table reloads, as sketched below.
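
As a rough illustration only, a minimal PySpark sketch of what such reusable quality-check helpers could look like; the table paths, column names, and helper names below are hypothetical, not the project's actual code.

```python
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def check_primary_key(df: DataFrame, key_cols: list) -> bool:
    """Return True when the key columns uniquely identify every row."""
    return df.count() == df.select(*key_cols).distinct().count()

def check_not_null(df: DataFrame, col: str) -> bool:
    """Return True when the column contains no NULL values."""
    return df.filter(F.col(col).isNull()).count() == 0

def check_data_type(df: DataFrame, col: str, expected_type: str) -> bool:
    """Return True when the column's Spark type matches the expected type name."""
    return dict(df.dtypes).get(col) == expected_type

# Hypothetical usage against a bronze-layer table before promoting it to silver.
bronze_df = spark.read.format("delta").load("abfss://bronze@datalake.dfs.core.windows.net/patients")
if check_primary_key(bronze_df, ["patient_id"]) and check_not_null(bronze_df, "patient_id"):
    bronze_df.write.format("delta").mode("overwrite").save(
        "abfss://silver@datalake.dfs.core.windows.net/patients")
```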

Created Synapse notebooks (PySpark) to apply business logic and build fact and dimension tables loaded into the gold/aggregated layer. Translated Informatica workflows into equivalent Spark SQL DataFrames, optimized the queries, and built the aggregated layer with PySpark queries (Informatica, Apache PySpark/Spark SQL).

Created a lake database that supports ACID properties on Spark 3.3. Used the Spark SQL MERGE operator to load FULL/INCR tables and created Delta tables on the saved Parquet files (see the sketch below).
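
A minimal sketch of an incremental load with the Delta Lake MERGE operator, assuming a silver-layer target table already exists; the table and column names are illustrative, not the project's actual objects.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical incremental (INCR) batch staged as a temp view.
incr_df = spark.read.format("delta").load("/lake/bronze/encounters_incr")
incr_df.createOrReplaceTempView("encounters_incr")

# Upsert the increment into the silver Delta table; a FULL load would simply
# overwrite the target instead of merging.
spark.sql("""
    MERGE INTO silver.encounters AS tgt
    USING encounters_incr AS src
      ON tgt.encounter_id = src.encounter_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```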

Used DataFrame transformation functions such as withColumn, dropDuplicates, subtract, filter, groupBy, sort, and window/partitionBy (illustrated below)
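
An illustrative combination of these DataFrame functions; the dataset path and column names are assumptions made for the example.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
claims = spark.read.format("delta").load("/lake/silver/claims")  # hypothetical path

# Keep only the latest version of each claim using a window partitioned by claim_id.
latest_version = Window.partitionBy("claim_id").orderBy(F.col("updated_at").desc())

deduped = (
    claims
    .dropDuplicates(["claim_id", "updated_at"])
    .withColumn("rn", F.row_number().over(latest_version))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Simple aggregate, sorted for downstream reporting.
totals = deduped.groupBy("member_id").agg(F.sum("amount").alias("total_amount")).sort("member_id")
```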

Used a Python queue object to parallelize data refreshes (see the sketch below)
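
A minimal sketch, assuming a simple worker-thread pattern, of how a Python queue.Queue can parallelize table refreshes; the refresh_table helper and the table list are hypothetical placeholders.

```python
import queue
import threading

tables_to_refresh = ["dim_patient", "dim_provider", "fact_encounter"]  # hypothetical list
work_queue: "queue.Queue[str]" = queue.Queue()
for table in tables_to_refresh:
    work_queue.put(table)

def refresh_table(table_name: str) -> None:
    # Placeholder for the real refresh logic (e.g. running a Spark job per table).
    print(f"refreshing {table_name}")

def worker() -> None:
    while True:
        try:
            table = work_queue.get_nowait()
        except queue.Empty:
            return
        try:
            refresh_table(table)
        finally:
            work_queue.task_done()

# Four worker threads drain the queue in parallel.
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
work_queue.join()
```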

Pushed analytical tables into the SQL dedicated pool via the Apache Spark SQL pool connector. Created objects with appropriate distributions (HASH/ROUND_ROBIN) and indexes to optimize data loads for Power BI reporting

Created a metadata-driven framework that loads data end to end and is robust and scalable. Used a lightweight Azure SQL Server database to maintain the configurations (see the sketch below).
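
A minimal sketch of the metadata-driven idea: read a configuration table from the control database and iterate over the entities it describes. The table name, columns, and connection details are assumptions, not the actual framework.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical config table in the Azure SQL control database, read over JDBC.
config = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://controldb.database.windows.net;database=etl_config")
    .option("dbtable", "dbo.load_config")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("user", "etl_user")
    .option("password", "<secret-from-key-vault>")
    .load()
)

# Each config row names a source path, a target table, and a load type (FULL/INCR).
for row in config.collect():
    src_df = spark.read.format("delta").load(row["source_path"])
    mode = "overwrite" if row["load_type"] == "FULL" else "append"
    src_df.write.format("delta").mode(mode).saveAsTable(row["target_table"])
```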

Built an error handling and alert notification mechanism using Synapse pipelines and Logic Apps

Built maintenance jobs to delete old versions of Delta tables using the Delta Lake time travel concept and the VACUUM command (see the sketch below)
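
A minimal sketch of such a maintenance job; the table path and retention window are illustrative only.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Time travel: inspect an older version of the table (e.g. for an audit before cleanup).
old_snapshot = spark.read.format("delta").option("versionAsOf", 0).load("/lake/silver/encounters")

# VACUUM removes data files no longer referenced by versions newer than 168 hours (7 days).
spark.sql("VACUUM delta.`/lake/silver/encounters` RETAIN 168 HOURS")
```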

Moved close to 8 TB of legacy data from the existing system. Used dynamically partitioned pipelines to achieve maximum throughput. Used CETAS queries on the serverless pool to load historical files with corrected data types

Well versed in the Synapse dedicated pool MPP architecture. Implemented PolyBase, external tables, and COPY INTO commands. Good knowledge of table distribution and indexing techniques.

Maintained role-based, environment-specific access levels for all team members via RBAC

Managed network peering for different environments located in separate virtual networks

Used GitHub as the source control system

Created CI/CD pipelines for deploying Azure Synapse, SQL Analysis Services, and databases. Set up a native deployment agent since Synapse resided in a private virtual network

Involved in a Microsoft Fabric POC demonstrating the OneLake architecture: the SQL warehouse endpoint brings T-SQL functionality to the data lake, and the new capacity units provide varied options for compute cluster configuration

Trained offshore developers to ramp up per project technical needs. Acted as a bridge between the client and the offshore team for business knowledge transfer

Built client trust through consistent effort and delivery.

Technical Lead Jul 2018 - Mar 2023

Winwire Technologies Pvt. Ltd. • India

Client: Elanco – MoLink - An American pharmaceutical company that produces medicines and vaccines for pets and livestock. A migration project that required populating an existing data model; the database contains research data that scientists used in experiments on new products with different molecular combinations.

•Designed and developed an end-to-end data pipeline that moves data from SQL Server on an AWS server to PostgreSQL.

•Created metadata-driven Data Factory pipelines that pull data from SQL Server and load it into ADLS Gen2

•Configured the data movement via the RDCI framework (a client layer requiring metadata service logging used by search engines). Moved the data across containers via API endpoints provided by the RDCI team, using bearer token authentication (see the sketch after this list).

•Created Databricks notebooks to validate the incoming data for quality and load it into a staging layer in PostgreSQL databases.

•Created stored procedures to populate data into existing data model.

•Handled historical and legacy data in the existing system.

•Created a cutover plan to switch the production endpoints, giving end users a zero-downtime experience

•Implemented a security model with proper access on all data objects.
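
Referring to the API-based data movement above, a minimal sketch of a bearer-authenticated call; the RDCI endpoint URL, payload fields, and token handling are hypothetical placeholders, not the real framework's API.

```python
import requests

# Hypothetical RDCI endpoint and token; the real service's URLs and payloads differ.
RDCI_BASE_URL = "https://rdci.example.com/api/v1"
token = "<bearer-token-from-auth-service>"

headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# Register a file movement between containers with the RDCI metadata service.
payload = {"source_container": "raw", "target_container": "curated", "file_name": "molecules.csv"}
response = requests.post(f"{RDCI_BASE_URL}/movements", json=payload, headers=headers, timeout=30)
response.raise_for_status()
print(response.json())
```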

Client: Elanco – Product 360

Winwire Technologies Pvt. Ltd. • India - This project integrates ERP, sales, and marketing systems data and builds a self-driven business model, depicting a 360-degree view of the product life cycle from manufacturing through delivery.

•Designed and developed Azure Data Factory pipelines to load data from SAP systems, Oracle, and API endpoints into Azure Synapse. Maintained a parent/child architecture and parameters for reusability.

•Developed Azure Databricks Scala notebooks for data transformations. Maintained raw, stage, and gold layers in the data lake. Performed data validation, quality checks, and data integrity checks while moving data across layers.

•Created managed tables in Delta Lake and maintained dimensions with an SCD Type 2 structure (see the sketch after this list). Created scheduled daily batch loads that refreshed Delta Lake tables while adhering to ACID properties, followed by a REFRESH of the table. Created fact tables by appending data into the Delta Lake layer.

•Used Spark SQL to implement complex business logic and transformations. Optimized joins and reduced shuffle by defining the correct partitions on the data

•Pushed data into the Azure SQL dedicated pool via the Delta Lake dataset.

•Created stored procedures to populate the facts and dimension data from delta lake to Azure SQL dedicated pool.

•Designed and developed a metadata-driven control framework that drives the data pipelines and notebooks. This plug-and-play framework accommodates new entity loads with ease, thereby reducing development effort by 60%.

•Created Azure Data Factory blob-driven triggers to accommodate ad hoc data loads from external users. Streamlined the process using Logic Apps that respond to events

•Developed CI/CD pipelines and releases to automate deployment

•Executed in an Agile development model. Used GitHub for source control and versioning
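
Referring to the SCD Type 2 dimension maintenance mentioned above, a minimal sketch of one common MERGE-plus-INSERT approach; the dimension name, keys, and columns are hypothetical, and PySpark is used here for consistency with the other sketches even though the project notebooks were written in Scala.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical staged changes for the product dimension.
spark.read.format("delta").load("/lake/stage/product_changes").createOrReplaceTempView("stg_product")

# Step 1: expire the current dimension rows whose tracked attributes changed.
spark.sql("""
    MERGE INTO gold.dim_product AS dim
    USING stg_product AS stg
      ON dim.product_id = stg.product_id AND dim.is_current = true
    WHEN MATCHED AND dim.product_name <> stg.product_name THEN
      UPDATE SET dim.is_current = false, dim.end_date = current_date()
""")

# Step 2: insert new versions (and brand-new products) as current rows.
spark.sql("""
    INSERT INTO gold.dim_product
    SELECT stg.product_id, stg.product_name, current_date() AS start_date,
           NULL AS end_date, true AS is_current
    FROM stg_product stg
    LEFT JOIN gold.dim_product dim
      ON dim.product_id = stg.product_id AND dim.is_current = true
    WHERE dim.product_id IS NULL
""")
```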

Client: Elanco Regulatory Data Warehouse

Winwire Technologies Pvt. Ltd. • India - This project integrates ERP, sales, and marketing systems data to build a self-driven business model: unifying data for the Regulatory Platform and Veeva migration and the Material domain, and creating a data model that supports insightful business reports.

•Designed and developed Azure Data Factory pipelines to load data from SAP systems, Oracle, and API endpoints into Azure Synapse. Maintained a parent/child architecture and parameters for reusability.

•Developed Azure Databricks notebooks, using Spark SQL and Scala, for data transformations.

•Created external tables in Azure Synapse Analytics with ADLS Parquet files as data sources.

•Designed the data model in Azure Synapse Analytics and created facts and dimensions with appropriate distributions and indexes.

•Configured all connections via Service principal or managed identity.

•Created paginated reports and dashboards for end-user as per business requirement.

•Developed CI/CD pipelines and releases to automate deployment.

•Executed in an Agile development model. Used GitHub for source control and versioning

Senior Consultant Nov 2014 - Jul 2018

Winwire Technologies Ltd. • India

Client: Austin Bridge & Road (Texas) – One of the nation's largest and most diversified construction companies, offering civil, commercial, and industrial construction services in the U.S. Worked on portfolios that provided construction project bid predictions and automated Profit & Loss accounting reports.

•Extracted public and private historical Texas R&T data covering the past 16 years from APIs and CSV files

•Cleansed and transformed the unstructured data to feed machine learning models

•Supported data scientists with data analysis and identifying influencing leading indicators.

•Created a descriptive analytical dashboard with complex measures and the machine learning model's prediction time series

•Analyzed the existing system, Financial Manager's Workbench (FMW), the web edition from RAC Software Inc. Deciphered the most complex database structures and objects used in rendering FMW reports

•Interacted with stakeholders from all three verticals (Industrial, Commercial & BR) and gathered information on various upstream data sets.

•Developed Azure Data Factory data pipelines to pull in various upstream data sets and automated them

•Built a reporting semantic layer on the OLTP system and Power BI data model reports that visualize finance measures sliced across various segments and allocation methods.

•Saved business users time and cost by creating an exact replica of the existing structure, allowing them to download and continue using the metrics as before with no or minimal impact or change.

•Used TFS as the source control system

Senior Consultant Nov 2014 - Jul 2018

Neudesic Technologies Pvt. Ltd. • India

Client: Microsoft GD – A project management tool covering all technical and non-technical projects under a vertical, targeted at project and delivery managers. It is a rewrite of the existing system with enhanced KPI factors that compare and convey the advantage of funding and executing a project. Responsibilities included:

•Analyzed the existing system and cross verified with the new system requirements

•Created SSIS packages to import all historical data from the existing system and migrated historical data from the legacy system

•Created SSIS packages to pull project data for ongoing projects and automated the process

•Deployed the projects in database catalog and scheduled jobs to refresh the data

•Created Power BI reports and scheduled their refreshes

•Created Data Factory (V1) pipelines to pull data from upstream sources, using stored procedure and copy activities

Software Engineer Dec 2011 - May 2014

MAQ Software Hyderabad Pvt. Ltd • India

Client: Microsoft IT – Designed and developed a data mart to support a partner dashboard detailing partner competencies, risks, expiration of their competencies, product licenses, and free training hours.

•Involved in requirement gathering and preparing design documents

•Created ETL packages that source data from OLTP (SQL queries), OLAP (MDX), files, websites, and Excel reports

•Used package elements such as containers (For Loop, Foreach Loop, Sequence), Execute Package, Execute Process, Execute SQL, and Script tasks; data flow transformations including Conditional Split, Copy Column, Data Conversion, Derived Column, Fuzzy Lookup, Lookup, Merge, Merge Join, Multicast, and Union All; and dynamic variables and project parameters

•Worked on project-level (.ispac) deployment in the SSISDB catalog

•Implemented security based on geographies

•Designed and created a system that automates the data approval process with data owners and pushes approved data from the staging environment to production

•Involved in writing MDX queries for SSRS and HTML reports

•Upgraded SSIS 2008 ETL packages to SSIS 2012 and implemented project-level parameters and environment variables

•Implemented columnstore indexes on the cube's dimension tables for performance improvement

•Handled support requests on report data and accessibility

AWARDS AND HONOURS

Delighted Customer 2014

Rookie of the Year 2019

Technology Leadership 2022

CERTIFICATIONS

•DP-700: Microsoft Certified: Fabric Data Engineer Associate

•DP-200: Implementing Azure Data Solution

•DP-201: Designing an Azure Data Solution

•MCSE in Querying Microsoft SQL Server 2012 (70-461)

•MCSE in Administering a Microsoft SQL Server 2012 Database (70-462)

•MCSE in Implementing Data Warehouses with Microsoft SQL Server 2012 (70-463)

•MCSE in Implementing Data Models and Reports w/MS SQL Server (70-466)


