Nikitha Pateel
**********.****@*****.*** (314-***-****)
https://www.linkedin.com/in/nikitha-pateel/
Technical Summary:
* ***** ** ********** ** Data Engineering, Analytics, Data Modeling, and Data Architecture, specializing in the design, development, and implementation of OLTP and OLAP systems. Strong knowledge of Cloud Technologies, including Azure and Snowflake.
Proficient in software programming, with expertise in designing and developing applications using Python, .NET, and SQL.
Extensive experience as an Azure Cloud Data Engineer, working with Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics (SQL Data Warehouse), Azure SQL Database, Azure DevOps, and CosmosDB.
Utilized Microsoft Viva Insights to analyze and visualize employee wellbeing, productivity, and engagement using organizational data, enabling data-driven decisions that improved focus, work performance, and overall employee experience. Created Power BI dashboards to provide actionable insights on business resilience, onboarding, development, and wellbeing.
Strong background in database migrations to Snowflake, ensuring smooth transitions and optimized performance.
Skilled in writing complex SQL queries with SnowSQL and performing data analysis using Spark SQL and Hive Query Language (HQL).
Experienced in developing Spark applications using Databricks for data extraction, transformation, and aggregation from multiple file formats to gain insights into customer usage patterns.
Expertise in creating ADF pipelines, managing role-based access to Azure Data Lake, and securing data access based on organizational policies.
Proficient in Hive query optimization through Partitioning and Bucketing techniques to improve query performance.
Extensive experience in database design and development, including creating schemas, performance tuning, and writing stored procedures, functions, triggers, indexes, and views using PL/SQL and T-SQL in SQL Server.
Expertise in Data Modeling, Data Mapping, and using SQL Server Management Studio (SSMS), SQL Server Integration Services (SSIS), and SQL Server Reporting Services (SSRS) for ETL processes and reporting.
Experienced in creating and scheduling SQL jobs to automate data processes and tasks.
Skilled in data visualization using tools like Grafana and Power BI to provide business insights through interactive dashboards.
Proficient in using SSIS to automate ETL tasks, including creating packages to transform data from various sources such as Flat Files, Excel, and MS SQL.
Experience in managing Jobs, Alerts, SQL Mail Agent, and scheduling DTS Packages, along with implementing backup and disaster recovery procedures.
Expertise in using ADO.NET objects such as SqlCommand, DataReader, DataSet, and DataAdapter to interface with databases in application development.
Strong experience working in an Agile/Scrum environment, delivering projects in a fast-paced setting.
Proficient in version control using Team Foundation Server (TFS) and GIT for source code management and collaboration.
Technical Skills:
Programming Languages: Python, C#, .NET (4.0/3.5/3.0)
Big Data Tools & Technologies: Hadoop, PySpark, Spark, Sqoop
Cloud Technologies: Microsoft Azure (Azure Data Factory, Azure Data Lake, Azure Synapse Analytics, Azure SQL Database)
Data Warehousing: Snowflake, Azure Data Warehouse (Azure DW)
Databases: Microsoft SQL Server (2014/2016), Azure SQL Database, Hive, Cosmos DB
ETL, Reporting & Visualization Tools: SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), Power BI, Tableau
Version Control Tools: Team Foundation Server (TFS 2013/2014/2015), GIT, BitBucket
Professional Experience:
MasterCard
St. Louis, MO Oct 2022 - Present
Role: Sr. Data Engineer
Responsibilities:
Assisted senior resources in the assessment, analysis, and implementation of Microsoft Graph Data Connect, collaborating with cross-functional teams (Platform Engineering, Operations, Data Science, Microsoft Support) to drive successful integration and delivery.
Leveraged Microsoft Viva Insights to analyze employee wellbeing, productivity, and engagement using organizational data, enabling actionable insights to improve work performance and mental health.
Utilized Viva Insights to create Power BI visualizations that empowered HR and management teams to track and optimize employee wellbeing, enhancing focus, engagement, and overall productivity across the organization.
Monitored and reported on employee wellbeing trends by extracting and analyzing data from Viva Insights, supporting leadership in developing strategies to enhance workplace culture and promote better work-life balance.
Led technical initiatives as the project team lead, ensuring timely and accurate execution of data engineering projects, and coordinating between various stakeholders for seamless solution delivery.
Engineered complex data transformations (joins, filters, aggregations) using PySpark, facilitating the seamless loading of critical data into Cosmos DB for real-time access by analysts (see the sketch after this section).
Designed and developed scalable ETL processes in Python to integrate Workday data into Hive and SQL databases, ensuring robust data pipelines for analytical consumption.
Implemented Benevity data integration, enabling the creation of a comprehensive volunteerism dashboard to track and report on key metrics.
Delivered a unified data repository for Talent Acquisition (TA) metrics, providing a single source of truth for key candidate-level data insights, enhancing reporting capabilities for TA teams.
Developed custom scripts to automate data purging, data load processes, and optimize operational efficiency across systems.
Designed and implemented ETL pipelines using Hadoop, PySpark, Python, Hive, and SQL to efficiently extract, transform, and load large datasets across distributed systems.
Optimized data processing workflows in Hive and Hadoop, leveraging PySpark for high-performance transformations and reducing ETL job runtime.
Developed custom Python scripts for automation, data cleansing, and quality checks, ensuring smooth operations and data integrity throughout the ETL process.
Collaborated with cross-functional teams to define data requirements and integrate data from various sources into the Hadoop ecosystem, enabling scalable analytics.
Automated ETL processes for Qualtrics by writing custom Bash scripts, significantly reducing manual intervention and improving efficiency.
Developed a data lineage solution that parses Workday logs to visualize data ingestion patterns, providing actionable insights into data flow and dependencies.
Built end-to-end data pipelines in Azure Data Factory (ADF) to extract, transform, and load (ETL) M365 data into CosmosDB, enabling improved analytics and data accessibility.
Implemented Shared Access Signature (SAS) authentication for secure and controlled access to data stored in Azure Data Lake.
Defined and enforced role-based access control (RBAC) and policies for Azure Data Lake, ensuring compliance with business and security requirements.
Developed ADF pipeline to efficiently extract HR data and load it into Cosmos DB, streamlining data transfer and reporting capabilities.
Encrypted Personally Identifiable Information (PII) using Fernet keys to maintain data privacy and meet regulatory requirements (see the Fernet sketch after this section).
Constructed organizational network analysis (ONA) using extracted MGDC data to analyze collaboration patterns between business units, improving cross-functional communication and decision-making.
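The PySpark work above follows a join-filter-aggregate pattern before landing data in Cosmos DB. Below is a minimal sketch of that pattern, assuming hypothetical Workday and Qualtrics datasets and the Azure Cosmos DB Spark connector; all paths, column names, and account settings are illustrative placeholders rather than the actual pipeline.

```python
# Minimal sketch of the join/filter/aggregate pattern; dataset paths,
# column names, and Cosmos DB settings are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hr-metrics").getOrCreate()

employees = spark.read.parquet("/data/workday/employees")  # hypothetical path
surveys = spark.read.parquet("/data/qualtrics/surveys")    # hypothetical path

# Join the sources, keep active employees, and aggregate per business unit.
metrics = (
    employees.join(surveys, on="employee_id", how="inner")
    .filter(F.col("status") == "active")
    .groupBy("business_unit")
    .agg(
        F.avg("engagement_score").alias("avg_engagement"),
        F.count("employee_id").alias("headcount"),
    )
)

# Writing to Cosmos DB assumes the Azure Cosmos DB Spark 3 connector is
# installed; the option keys below follow that connector's documentation.
(
    metrics.write.format("cosmos.oltp")
    .option("spark.cosmos.accountEndpoint", "https://<account>.documents.azure.com:443/")
    .option("spark.cosmos.accountKey", "<key>")
    .option("spark.cosmos.database", "analytics")
    .option("spark.cosmos.container", "hr_metrics")
    .mode("append")
    .save()
)
```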
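The PII encryption mentioned above relies on Fernet symmetric encryption from the Python cryptography library. A minimal sketch with a hypothetical record follows; in practice the key would come from a secret store such as Azure Key Vault rather than being generated inline.

```python
# Fernet-based PII encryption sketch; the record is hypothetical and key
# handling is simplified for illustration.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in production, load the key from a secret store
fernet = Fernet(key)

record = {"employee_id": "E1001", "email": "jane@example.com"}

# Fernet operates on bytes and returns a URL-safe base64 token.
token = fernet.encrypt(record["email"].encode("utf-8"))
record["email"] = token.decode("utf-8")

# Authorized consumers can recover the original value.
assert fernet.decrypt(token).decode("utf-8") == "jane@example.com"
```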
Veterans United Home Loans
Columbia, MO Oct 2020 – Sep 2022
Role: Data Engineer
Responsibilities:
Planned and executed comprehensive migration strategies utilizing Azure Data Factory (ADF) to orchestrate the seamless transfer of data from diverse sources to Snowflake, optimizing data workflows.
Designed and developed incremental and full load pipelines in ADF and implemented scheduling and monitoring systems to ensure continuous and efficient data flow.
Automated ETL job execution using a variety of ADF triggers (Event, Scheduled, and Tumbling), improving pipeline reliability and reducing manual intervention.
Configured Service Principal authentication for secure data access in Azure Data Lake, adhering to best practices for identity management and access control.
Created and managed roles in Azure Data Lake to ensure appropriate access levels, enhancing data security and compliance with organizational policies.
Mounted Azure Data Lake to DBFS using Service Principal credentials to securely access and manipulate data within Databricks Notebooks (see the mount sketch after this section).
Designed and executed data transformation plans for Snowflake integration, considering data compatibility, performance, and optimization for large-scale migration.
Implemented robust quality control and validation processes to ensure data accuracy, integrity, and consistency during the migration from source systems to Snowflake.
Redesigned Snowflake views to optimize query performance, reducing processing times and improving reporting efficiency.
Defined and managed roles and privileges in Snowflake to control access to critical database objects, ensuring secure and efficient data management.
Optimized Snowflake virtual warehouse sizing to align with workload types and maximize performance while minimizing costs (see the administration sketch after this section).
Developed Power BI dashboards to monitor ADF pipeline execution, providing real-time insights into data migration progress and conducting high-level testing on the migrated data.
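Mounting Azure Data Lake to DBFS with a service principal, as noted above, follows the standard Databricks OAuth mount pattern. A sketch under the assumption that it runs in a Databricks notebook (where dbutils is a builtin); every angle-bracket value is a placeholder.

```python
# Databricks notebook sketch: mount ADLS Gen2 to DBFS via a service principal.
# dbutils is provided by the Databricks runtime; it is not a pip package.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<secret-scope>", key="<service-credential-key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)
```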
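The Snowflake role and warehouse administration above can be scripted through snowflake-connector-python. An illustrative sketch with hypothetical account, role, warehouse, and schema names:

```python
# Snowflake administration sketch; all identifiers below are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    role="SECURITYADMIN",
)
cur = conn.cursor()

# Create a reporting role and grant it read access to a schema's views.
cur.execute("CREATE ROLE IF NOT EXISTS reporting_reader")
cur.execute("GRANT USAGE ON WAREHOUSE reporting_wh TO ROLE reporting_reader")
cur.execute("GRANT SELECT ON ALL VIEWS IN SCHEMA analytics.public TO ROLE reporting_reader")

# Right-size the virtual warehouse for the workload and cap idle spend.
cur.execute("ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 300")

cur.close()
conn.close()
```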
SCOR Global Life Insurance
Kansas City, MO May 2019 - May 2020
Role: Data Engineer
Responsibilities:
Enhanced the ETL process, using the Script Task to reduce execution time to 75% of its original duration.
Managed API rate limits through the Script Task.
Automated the process of providing customer lists to Google Ads, decreasing user error and increasing the cadence of fresh data.
Created Grafana Dashboard to monitor the performance of the processes.
Optimized Stored Procedures and long running queries by using indexing strategies and query-optimization techniques.
Extracted data from different sources and loaded it into SQL database.
Extracted and parsed JSON and XML data from APIs, transformed it per business requirements, and loaded it into database tables to make it available for reporting.
Used GraphQL queries to extract data from APIs (see the sketch after this section).
Managed the complete SSIS life cycle: creating SSIS packages, then building, deploying, and executing them in both Development and Production environments.
Developed and deployed SSIS packages and configuration files, and scheduled jobs to run the packages and generate data in CSV files.
Built a pipeline using Azure DevOps to deploy the .NET Core application to the server.
Created pipelines, data flows and complex data transformations and manipulations using ADF and Databricks.
Created Linked Services in Azure Data Factory v2 to land data in Azure from disparate source systems, using activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.
Maintained TFS Source Control server for versioning the Database Objects and production releases.
Scheduled Daily and Weekly Jobs and Alerting using SQL Server Agent.
Used Postman for visual display of the API query results.
Responsible for 24X7 Production support for critical processes.
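The GraphQL extraction above amounts to posting a query document to an HTTP endpoint and landing the parsed JSON in SQL Server. A minimal sketch using requests and pyodbc; the endpoint, query fields, DSN, and target table are all hypothetical.

```python
# GraphQL-to-SQL-Server ETL sketch; endpoint, fields, DSN, and table are
# hypothetical placeholders.
import requests
import pyodbc

GRAPHQL_URL = "https://api.example.com/graphql"

query = """
query {
  customers(first: 100) {
    id
    name
    email
  }
}
"""

resp = requests.post(GRAPHQL_URL, json={"query": query}, timeout=30)
resp.raise_for_status()
customers = resp.json()["data"]["customers"]

# Load the parsed rows into a staging table for reporting.
conn = pyodbc.connect("DSN=warehouse")
cur = conn.cursor()
cur.executemany(
    "INSERT INTO dbo.Customers (Id, Name, Email) VALUES (?, ?, ?)",
    [(c["id"], c["name"], c["email"]) for c in customers],
)
conn.commit()
cur.close()
conn.close()
```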
TATA Consultancy Services Jun 2015 – Jul 2018
Role: Software Engineer
Responsibilities:
Created several stored procedures and wrote complex SQL using CASE, HAVING, CONNECT BY, etc.
Created stored procedures, views, triggers, and complex T-SQL queries in SQL Server.
Generated reports using SSRS, achieving modularization and faster execution.
Created database objects (tables, views, clustered and non-clustered indexes, primary and foreign keys, unique/check constraints, functions, stored procedures, temporary tables, and common table expressions) and wrote SQL queries involving MERGE statements, joins, and sub-queries to keep the latest data available in tables.
Developed Power BI dashboards using DAX functions and Power Query to gain deeper insights into organizational data.
Documented and compared the performance of Power BI and Tableau, and created dashboards in both tools.
Designed and developed ETL workflow using SSIS packages.
Created database tables and wrote SQL queries and stored procedures for all transactions in and out of the database using SQL Server 2014; developed PL/SQL triggers and master tables for automatic creation of primary keys.
Developed stored procedures and views for data manipulation, and set up relations including indexing, constraints, and foreign keys in SQL Server.
Generated server-side SQL scripts for data manipulation and validation and materialized views for remote instances.
Used SQL Server 2014 to create database tables and stored procedures, used ADO.NET for communication between the application and the database, and enforced data integrity with SQL Server's built-in mechanisms (Default, Not Null, Check, Unique, Primary Key, and Foreign Key constraints); a sketch of the equivalent stored-procedure call pattern follows this section.
Used SQL Profiler to trace query activity and store the results in tables to analyze query performance.
Wrote SQL queries and cursors using embedded SQL and PL/SQL.
Performed Unit testing based on the requirements and development standards.
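The database access above was implemented with ADO.NET in C#; as a language-neutral illustration, the same parameterized stored-procedure call can be sketched in Python with pyodbc. The procedure name, columns, and connection string are hypothetical.

```python
# pyodbc sketch of a parameterized stored-procedure call against SQL Server;
# the procedure, columns, and connection string are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=Sales;Trusted_Connection=yes;"
)
cur = conn.cursor()

# ODBC call syntax for executing a stored procedure with bound parameters.
cur.execute("{CALL dbo.GetOrdersByCustomer (?, ?)}", ("C1001", 2014))
for row in cur.fetchall():
    print(row.OrderId, row.OrderDate)

cur.close()
conn.close()
```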
Projects:
Diabetic Retinopathy Detection (Python, HTML, CSS) – Developed an AI model using Convolutional Neural Networks to detect the stage of diabetic retinopathy.
Data Modeling with PostgreSQL and Apache Cassandra – Created database schemas and built an ETL pipeline (Python) to analyze data from a music streaming app.
Data Infrastructure on the AWS Cloud – Built an ETL pipeline to extract data from S3, stage it in Redshift, and transform it into analytical tables.
Big Data with Spark – Loaded data from S3, processed it into analytical tables using Spark, and deployed the process on an AWS cluster.
Healthcare Content Management System (AngularJS, HTML, CSS, jQuery, JavaScript) – Uploaded images to Firebase Storage and retrieved them to display content to users; displayed nearby hospitals using the Foursquare API.
Data Analytics Visualization on Twitter Data (Python, Spark SQL, Hadoop, Tableau) – Collected tweets focusing on technology in different domains using the Twitter Streaming API, developed and implemented analytical queries to explore the data using Spark SQL, and visualized the results in Tableau.
Beauty Salon – Developed an application in ASP.NET and MS SQL to schedule appointments at the required time slot with a chosen beautician.
MongoDB vs SQL – Performed an ETL process using Python and SSIS and compared the performance of MongoDB and SQL.
Education Details:
University of Missouri Kansas City August 2018 – May 2020
Master's in Computer Science, GPA: 3.85
VNR Vignan Jyothi Institute of Engineering and Technology July 2011 - May 2015
Bachelor's in Electronics and Communication, GPA: 3.5