
Azure Data Engineer

Location: Minneapolis, MN 55438
Posted: May 09, 2025


NIKITHA KANKANALAPALLI

Azure Data Engineer – Databricks | Spark | ADF | Snowflake | Power BI | Data Lake

************@*****.*** | 469-***-**** | LinkedIn: nikithakankanalapalli

Eden Prairie, MN

SUMMARY

With around 6 years of experience in IT, I specialize in Azure Data Engineering, Databricks, SQL development, and Business Intelligence. I have extensive expertise in designing, developing, and optimizing ETL/ELT pipelines using Azure Data Factory and Databricks, automating workflows that process and transform large datasets from diverse sources.

I have hands-on experience in data extraction (including schema definition, corrupt-record handling, and parallelized code), transformation, and loading (using user-defined functions and optimized joins). In production, I optimize and automate the Extract, Transform, and Load (ETL) processes.

My expertise includes Unified Data Analytics with Databricks, the Databricks workspace user interface, managing Databricks notebooks, and Delta Lake with Python and Spark SQL.

Designed Spark streaming pipelines integrating Azure Event Hub, merging batch and streaming functionality seamlessly.
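
A minimal sketch of such a pipeline, assuming the azure-event-hubs-spark connector is installed; the connection string, schema, and paths below are placeholders, not values from an actual deployment:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

# Hypothetical Event Hub connection string, encrypted as the connector expects.
conn = "Endpoint=sb://<namespace>.servicebus.windows.net/;EntityPath=<hub>;..."
eh_conf = {
    "eventhubs.connectionString":
        spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn),
}

payload_schema = StructType([
    StructField("meter_id", StringType()),
    StructField("reading", DoubleType()),
])

# Streaming side: Event Hub delivers messages in a binary `body` column.
stream_df = (
    spark.readStream.format("eventhubs").options(**eh_conf).load()
    .select(from_json(col("body").cast("string"), payload_schema).alias("m"))
    .select("m.*")
)

# Append the parsed stream to the same Delta table that batch loads target,
# so downstream consumers see batch and streaming data in one place.
(stream_df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/meter_readings")
    .outputMode("append")
    .start("/mnt/delta/meter_readings"))
```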

Wrote complex SQL queries to analyze trends, segment customer cohorts, and derive performance metrics.
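
For illustration, a hedged sketch of the kind of cohort query this refers to, run through Spark SQL in a notebook where `spark` is predefined; the `sales.orders` table and its columns are assumptions:

```python
# Customers are grouped by first-purchase month (the cohort), then counted
# in each later month of activity to show retention over time.
cohorts = spark.sql("""
    WITH firsts AS (
        SELECT customer_id,
               date_trunc('month', MIN(order_date)) AS cohort_month
        FROM sales.orders
        GROUP BY customer_id
    )
    SELECT f.cohort_month,
           date_trunc('month', o.order_date) AS activity_month,
           COUNT(DISTINCT o.customer_id)     AS active_customers
    FROM sales.orders o
    JOIN firsts f USING (customer_id)
    GROUP BY f.cohort_month, date_trunc('month', o.order_date)
    ORDER BY f.cohort_month, activity_month
""")
cohorts.show()
```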

Delivered compelling visual narratives using Power BI/Tableau, enabling C-level executives to act on trends and insights.

Collaborated with stakeholders to define business KPIs and deliver actionable insights for strategic decisions.

Performed EDA to uncover trends, outliers, and patterns in large datasets using Pandas and Matplotlib.
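
A minimal EDA sketch along these lines using pandas and Matplotlib; the file name and column names are hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("usage_data.csv", parse_dates=["event_date"])

df.info()             # column types and null counts
print(df.describe())  # summary statistics

# Flag outliers with a simple IQR rule on a numeric column.
q1, q3 = df["daily_usage"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["daily_usage"] < q1 - 1.5 * iqr) |
              (df["daily_usage"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers")

# Trend over time and overall distribution, side by side.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
df.groupby("event_date")["daily_usage"].mean().plot(ax=ax1, title="Daily trend")
df["daily_usage"].plot.hist(bins=50, ax=ax2, title="Distribution")
plt.tight_layout()
plt.show()
```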

I have orchestrated data movement and transformations within Azure Data Factory Pipelines.

I have worked on the development and productionization of multiple Delta Live Tables (DLTs).
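
A short sketch of what a Delta Live Tables definition can look like; it runs only inside a Databricks DLT pipeline, where `dlt` and `spark` are provided, and the landing path, table names, and expectation are illustrative assumptions:

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw events ingested from ADLS Gen2 (bronze).")
def events_bronze():
    return (
        spark.readStream.format("cloudFiles")     # Auto Loader
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/events/")                 # hypothetical landing path
    )

# Rows failing the expectation are dropped instead of failing the pipeline.
@dlt.table(comment="Cleaned events (silver).")
@dlt.expect_or_drop("valid_id", "event_id IS NOT NULL")
def events_silver():
    return dlt.read_stream("events_bronze").withColumn(
        "amount", col("amount").cast("double")
    )
```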

I have worked on streaming pipelines in Databricks using Azure Event Hub, handling live, complex JSON data processed via PySpark notebooks or Scala code, and have operationalized those Spark streaming jobs.

Configured Spark streaming for real-time data reception from Azure Event Hub, with Scala or PySpark code used to store the streaming data in Delta tables. The Data Lake served as the repository for processing various data types via Spark DataFrames.

Implemented data ingestion from sources such as HTTP endpoints, REST APIs, and Azure Blob Storage into Azure Data Lake Storage Gen2 (ADLS Gen2) using Azure Data Factory (ADF).

Proficient in writing complex SQL queries and developing complex business logic in SQL Server, with experience using SQL Server Integration Services (SSIS) to build data integration, workflow, and ETL solutions for data warehousing applications.

Implemented continuous monitoring of Spark clusters using Log Analytics, enhancing cluster stability.

I have worked extensively with various file formats such as delimited text, Parquet, JSON, and XML files, and I am skilled in using columnar file formats like RC, ORC, and Parquet.

My experience includes migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, and Azure SQL Database.

I possess end-to-end knowledge of Azure Synapse Analytics and its development lifecycle.

Proficient in using IDEs like Eclipse and IntelliJ for coding, debugging, and performance tuning of large-scale applications.

My knowledge encompasses Spark Streaming, Spark SQL, DataFrame API, Dataset API, and Spark RDD.

I am an expert in Power BI and Business Intelligence solutions, with a deep understanding of the product and its operational requirements.

I have experience in developing Spark applications using Spark-SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, enabling the analysis and transformation of data to uncover insights into customer usage patterns.

I am well-versed in various Azure services, including Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL databases, and SQL Server.

My extensive knowledge extends to MPP Systems like Azure Synapse Analytics, Databricks, and Hadoop.

I have extensive experience with T-SQL statements, joins, constraints, views, tables, and stored procedures.

I have developed visualization dashboards using Power BI, incorporating analysis views with drill-down options and various chart types like Bar, Line, Scatter, Donut, and Map.

I have developed analysis reports and visualizations using DAX functions, including table functions, aggregation functions, and time functions.

I am adept at writing Stored Procedures and Queries for fetching data for SSRS reports.

I possess knowledge of the CI/CD life cycle and relevant tools such as Git, GitHub, Docker, and Azure DevOps.

I have utilized Jira and Rally Kanban boards and Scrum methodologies, following an Agile approach.

SKILLS

Languages: Python, SQL, Scala, Java.

Big Data Services: Azure Databricks, Data Factory (ADF), Synapse Analytics, MS Fabric, Event Hub, Key Vault, Logic Apps, Functions.

Apache Tools: Kafka, Airflow, Spark.

Data Warehouses: Snowflake, Azure Synapse, Databricks.

Databases: MS SQL Server, MySQL, MongoDB, PostgreSQL.

Reporting Tools: Power BI, Tableau, SSRS, Excel.

ETL Tools: Informatica, SSIS.

Version Control: GitHub, Azure DevOps.

Cloud Services: Azure.

File Formats: JSON, Parquet, Avro, CSV, Text.

Others: Kubernetes, Docker, Jenkins, Project Management.

EXPERIENCE

Azure Data Engineer; Constellation Energy - Contract April 2023 – Present

Developed data pipelines, datasets, and optimized performance using Azure Databricks, Azure Data Lake Storage Gen2, Azure Event Hub services, and the Azure platform.

Designed Spark streaming pipelines integrating Azure Event Hub, merging batch and streaming functionality seamlessly.

Implemented data ingestion from sources such as HTTP endpoints and Azure Blob Storage into Azure Data Lake Storage Gen2 (ADLS Gen2) through Azure Data Factory (ADF).

Developed streaming pipelines in Databricks using Azure Event Hub, handling live, complex JSON data processed via PySpark notebooks or Scala code, and operationalized those Spark streaming jobs.

Utilized tools such as Azure SQL Server, Data Factory, and Databricks to construct end-to-end data pipelines for collecting, cleansing, and processing client data.

Conducted performance tuning on large datasets in Snowflake, optimizing query performance using partitioning, clustering, and caching techniques.

Constructed Directed Acyclic Graphs (DAGs) in Apache Airflow for scheduling ETL processes, integrating Apache Airflow components like Pool, Executors, and multi-node capability to enhance workflow efficiency.
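
A minimal Airflow DAG sketch of this pattern; the DAG id, schedule, pool name, and callables are illustrative assumptions rather than the actual production workflow:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_etl_step(**context):
    # Stand-in for the real extract/transform/load call.
    print("running ETL step for", context["ds"])

with DAG(
    dag_id="nightly_etl",
    start_date=datetime(2023, 4, 1),
    schedule_interval="0 2 * * *",   # nightly at 02:00
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(
        task_id="extract",
        python_callable=run_etl_step,
        pool="etl_pool",             # throttle concurrency via a named pool
    )
    transform = PythonOperator(task_id="transform", python_callable=run_etl_step)
    load = PythonOperator(task_id="load", python_callable=run_etl_step)

    extract >> transform >> load     # edges define the execution order
```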

Implemented continuous monitoring of the Spark cluster using Log Analytics, enhancing cluster stability.

Leveraged Azure Synapse for workload management, facilitating data delivery for analytics and business intelligence purposes.

Integrated ADF with Scala to perform intricate data conversions and manipulations.

Migrated on-premises data systems to Azure, leveraging Azure Data Factory, Azure SQL Database, and Azure Blob Storage for seamless data transition.

Orchestrated data movement and transformations within Azure Data Factory Pipelines.

Implemented a real-time analytics dashboard using Azure Stream Analytics and Power BI, providing stakeholders with up-to-date insights into business operations.

Spearheaded the development and productionization of multiple Delta Live Tables (DLTs).

Worked on enterprise-wide initiatives, specifically system integration, data migration, transformation, data warehouse builds, data mart builds, and data lake implementation and support.

Created various visualizations for the practice marketplace, including timeline slicers, hierarchy slicers, drill-down and drill-up, text filters, and word clouds in Power BI.

Wrote the required DAX queries to generate computed columns in Power BI.

Created role-based access in Power BI and provided access to dashboards in the Power BI Service based on requests from business analysts.

Experienced in writing time travel queries and recovering deleted or incorrect data using Delta Lake tables.
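
A hedged sketch of Delta Lake time travel used for this kind of recovery, assuming a Databricks notebook where `spark` is predefined; the table name and version numbers are placeholders:

```python
from delta.tables import DeltaTable

# Read the table as of an earlier version (a timestamp works too).
old_df = spark.sql("SELECT * FROM sales.orders VERSION AS OF 12")

# Diff against the current state to find rows lost by a bad delete.
current_df = spark.read.table("sales.orders")
lost_rows = old_df.subtract(current_df)

# Re-insert just the recovered rows...
lost_rows.write.format("delta").mode("append").saveAsTable("sales.orders")

# ...or roll the whole table back in one step.
DeltaTable.forName(spark, "sales.orders").restoreToVersion(12)
```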

Integrated Databricks with Synapse SQL Pools (Dedicated & Serverless) for optimized querying.

Ingested and transformed data from Oracle databases and third-party sources.

Automated ingestion of multiple Parquet files using control tables and Delta tables.
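
One possible shape of that control-table pattern, sketched below under stated assumptions: an ambient `spark` session and a Delta control table whose rows describe each Parquet source and its target; all names are illustrative.

```python
control = spark.read.table("etl.control_table").filter("processed = false")

for row in control.collect():
    # Load the next unprocessed batch of Parquet files.
    df = spark.read.parquet(row["source_path"])

    # Append into the Delta target named by the control row.
    df.write.format("delta").mode("append").saveAsTable(row["target_table"])

    # Mark the entry processed so reruns skip it (idempotent ingestion).
    spark.sql(
        "UPDATE etl.control_table SET processed = true "
        f"WHERE source_path = '{row['source_path']}'"
    )
```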

Orchestrated workflows using Azure Data Factory and Integration Runtime.

Ensured data security, governance, and compliance across pipelines.

Collaborated with data scientists, analysts, and engineers to deliver end-to-end solutions.

Worked with Azure Data Factory, Azure Databricks, Azure SQL, SQL Server, and Azure Data Lake.

Developed custom Python scripts and ETL (Extract, Transform, Load) pipelines to process and cleanse large volumes of data, ensuring data quality and accuracy.

Built pipelines to copy data from source to destination within Azure Data Factory (ADF V2).

Created and monitored triggers and activities in Azure Data Factory.

Created pipelines in ADF using Linked Services, Datasets, and Pipeline components to extract, transform, and load data from various sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse.

Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation.

Utilized Python’s multiprocessing and parallel processing capabilities to optimize data processing workflows, significantly improving pipeline performance.
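
A minimal sketch of this idea with Python's multiprocessing module; `clean_file` and the `landing/` directory are hypothetical stand-ins for the real cleansing step:

```python
import csv
from multiprocessing import Pool
from pathlib import Path

def clean_file(path: str) -> int:
    """Drop blank rows from one CSV; return the number of rows kept."""
    with open(path, newline="") as f:
        rows = [r for r in csv.reader(f) if any(field.strip() for field in r)]
    out = Path(path).with_suffix(".clean.csv")
    with open(out, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return len(rows)

if __name__ == "__main__":
    files = [str(p) for p in Path("landing/").glob("*.csv")]
    with Pool(processes=4) as pool:           # four worker processes
        counts = pool.map(clean_file, files)  # one file per task
    print(f"cleaned {len(files)} files, {sum(counts)} rows kept")
```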

Created and chained Databricks notebooks and managed Databricks clusters, including creating and scheduling notebook jobs.

Created Spark configurations for accessing Azure storage/containers.

Worked with file formats including Parquet, JSON, CSV, and Delta.

Applied aggregations, filters, joins, and window functions with proficiency.

Used Delta Lake features like schema enforcement and schema evolution for reliable data handling.
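
A brief sketch of both behaviors, assuming an ambient `spark` session; paths are placeholders:

```python
df_new = spark.read.json("/mnt/raw/orders_with_new_column/")

# Schema enforcement: this append would raise an AnalysisException if
# df_new carries columns the Delta table does not already have.
# df_new.write.format("delta").mode("append").save("/mnt/delta/orders")

# Schema evolution: mergeSchema explicitly allows the new columns to be
# added to the table schema during the write.
(df_new.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("/mnt/delta/orders"))
```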

Created Spark Jobs using a scheduler in Databricks as well as Azure Data Factory.

Built window functions and window definitions to handle late-arriving data from source systems.
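
A sketch of one common way to do this: rank records per key by event time and keep only the newest, so late duplicates fall away; table and column names are assumptions.

```python
from pyspark.sql import Window
from pyspark.sql.functions import col, row_number

# Rank records per business key by event time, newest first.
w = Window.partitionBy("order_id").orderBy(col("event_ts").desc())

latest = (
    spark.read.table("staging.orders")
    .withColumn("rn", row_number().over(w))
    .filter(col("rn") == 1)   # late duplicates rank 2+ and are dropped
    .drop("rn")
)
latest.write.format("delta").mode("overwrite").saveAsTable("curated.orders")
```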

Accountable for estimating cluster size, monitoring, and troubleshooting the Spark Databricks cluster.

Ensured data quality and data governance by developing and implementing data validation and verification procedures.

Participated in code reviews and provided constructive feedback to improve code quality and maintainability.

Azure Data Engineer; Cooper’s Hawk Winery and Restaurants - Contract Sep 2021 – Mar 2023

Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).

Data ingestion was directed to one or more Azure Services, including Azure Data Lake, Azure Storage, Azure SQL, and Azure DW, with subsequent data processing conducted in Azure Databricks.

Developed and maintained end-to-end operations of ETL data pipelines and worked with large datasets in Azure Data Factory.

Configured Azure Data Factory (ADF) to ingest data from diverse sources, both relational and non-relational databases, tailored to meet specific business functional requirements.

Deployed an ADF pipeline to Azure Data Factory’s Dev, Test, and Production environments, allowing users to execute it from anywhere.

Prepared metadata for each batch in Azure SQL Databases to trigger and generalize ADF pipelines for different data sources.

Scheduled ADF jobs and parameterized Azure components.

Performed performance tuning of SQL queries and stored procedures using SQL Profiler and Index Tuning Wizard.

Developed workflows that read data such as fixed-width and CSV files from ADF into Snowflake.

Collaborated with cross-functional teams to develop and maintain data pipelines and integration processes, resulting in a 25% increase in data processing efficiency in Snowflake.

Developed and implemented a data quality process that reduced data errors by 50% and improved overall data accuracy by 30% in Snowflake.

Created Delta Lake tables using Databricks notebook with ADLS as underlying storage system.

Developed Databricks notebooks in Spark SQL and Python to transform ingested data and feed it to the analytics system.

Developed PySpark programs using Databricks on Azure to consume business data and load it into Snowflake after performing transformation, standardization, and filtering of the data, as sketched below.
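
A hedged sketch of such a load; every connection value, path, and column here is a placeholder, and the "snowflake" source name assumes the Spark Snowflake connector bundled with Databricks:

```python
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<secret>",   # in practice, read from a Key Vault-backed scope
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

cleaned = (
    spark.read.format("delta").load("/mnt/delta/business_data")
    .filter("status = 'ACTIVE'")                    # filtering
    .withColumnRenamed("cust_nm", "customer_name")  # standardization
)

(cleaned.write.format("snowflake")
    .options(**sf_options)
    .option("dbtable", "BUSINESS_DATA")
    .mode("overwrite")
    .save())
```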

Leveraged various aggregation techniques offered by the Spark framework within the transformation layer, employing Apache Spark RDDs, Data Frame APIs, and Spark SQL.

Built data transformation logic using Azure Databricks and Spark for data processing and analytics.

Applied expertise in optimizing Spark applications, adjusting parameters such as batch interval time, level of parallelism, and memory allocation to enhance processing speed and efficiency.

Implemented migration of data from existing applications to Azure DW and Databricks through the creation of PySpark notebooks.

Designed and executed end-to-end data solutions encompassing storage, integration, processing, and visualization components within the Azure environment.

Managed batch processing of data sources utilizing Apache Spark.

Prepared comprehensive ETL design documents detailing database structure, Change Data Capture mechanisms, error handling procedures, and strategies for restart and data refresh.

Developed Power BI visualizations and dashboards to facilitate data analysis and interpretation.

Engaged in unit testing and resolution of various bottlenecks encountered throughout the data engineering process.

Demonstrated proficiency in applied statistics, exploratory data analysis (EDA), and visualization techniques using Power BI, Tableau, and Matplotlib.

Implemented strategies for different incremental data loads such as tumbling window, sliding window, high watermarks, etc.
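
As an illustration, a minimal high-watermark load, assuming an ambient `spark` session and a small Delta control table that stores one watermark per source table; all names are illustrative:

```python
from pyspark.sql.functions import max as spark_max

# Last successfully loaded point for this table.
last_wm = (spark.read.table("etl.watermarks")
           .filter("table_name = 'orders'")
           .first()["high_watermark"])

# Pull only rows modified since that point.
increment = spark.read.table("source.orders").filter(f"modified_at > '{last_wm}'")
increment.write.format("delta").mode("append").saveAsTable("curated.orders")

# Advance the watermark to the newest timestamp just loaded.
new_wm = increment.agg(spark_max("modified_at")).first()[0]
if new_wm is not None:
    spark.sql(
        f"UPDATE etl.watermarks SET high_watermark = '{new_wm}' "
        "WHERE table_name = 'orders'"
    )
```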

Built both data quality and data validation frameworks within the organization to provide high-quality curated data.

Built complex SQL queries using joins, aggregations, windowing functions, and Common Table Expressions, with attention to RDBMS schema design and performance optimization.

Led deep-dive discussions on indexing and partitioning for any given table based on cardinality.

Wrote and maintained GitLab configurations to support CI/CD pipelines.

Collaborated with the team to create branching strategies in Azure DevOps for development and releases to higher environments.

Experienced working with different message formats such as Parquet, Avro, and ORC, and handled different schemas registered for a given data table.

Integrated Databricks with Synapse SQL Pools (Dedicated & Serverless) for optimized querying.

Designed impactful reports and dashboards in Power BI and Tableau for executive management.

Managed BI functions, ensuring accurate and insightful data representation in Power BI reports.

Debugged, troubleshot, designed, and implemented solutions to complex technical issues in Spark streaming pipelines.

Developed and implemented security measures to protect pipelines from unauthorized access.

Handled day-to-day issues and fine-tuned applications for optimal performance.

Collaborated with team members and stakeholders in designing and developing the data environment.

Utilized Confluence for collaborative documentation and team knowledge sharing.

Data Engineer; RapidIT, Inc Aug 2019 – Jul 2021

Designed and developed robust data integration data flows and pipelines within Azure Data Factory by understanding the logical flow of data and leveraging ADF’s capabilities.

Developed an ADF pipeline to extract data from a legacy system and load it into a new data warehouse like Snowflake or Azure DW.

Developed PySpark applications using Spark-SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats to analyze and transform data to uncover insights into customer usage patterns.

Maintained and updated an ADF pipeline to reflect changes in the product catalog.

Worked on designing and developing data integration solutions using ETL tools such as Azure Data Factory, Informatica and/or SSIS.

Implemented ETL and data movement solutions using Azure Data Factory and SSIS, creating and running SSIS packages in ADF V2 on the Azure-SSIS Integration Runtime.

Designed and implemented migration strategies for traditional systems on Azure (lift-and-shift via Azure Migrate and other third-party tools).

Developed complex T-SQL code, optimized MySQL queries, and implemented SSIS for ETL processes.

Created packages to Extract, Transform and Load data using SQL Server Integration Services (SSIS) into local databases to facilitate reporting operations. Loaded the Fact and Dimension tables. Developed Stored Procedures to support ETLs and Reporting.

Used various Control Flow and Data Flow tasks to create SSIS packages, applying transformations for data conversion, sorting, and cleansing of data from different sources into company formats.

Ensured seamless data movement and orchestration between different sources and destinations.

Worked on creating dependencies between activities in Azure Data Factory.

Created stored procedures and scheduled them in the Azure environment.

Integrated Python applications with Azure services, including Azure Data Factory, Azure Databricks, and Azure SQL Data Warehouse, to orchestrate data workflows in the cloud.

Leveraged Python to interact with Azure Data Lake Storage and Azure Blob Storage, enabling efficient data extraction and storage operations.
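
A minimal sketch of this kind of interaction with the azure-storage-blob and azure-identity packages; the account URL, container, and blob names are placeholders, and DefaultAzureCredential assumes an identity is already configured:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://<account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
container = service.get_container_client("landing")

# Upload a local extract into the landing container.
with open("daily_extract.csv", "rb") as f:
    container.upload_blob(name="in/daily_extract.csv", data=f, overwrite=True)

# List and download whatever is waiting under the in/ prefix.
for blob in container.list_blobs(name_starts_with="in/"):
    data = container.download_blob(blob.name).readall()
    print(blob.name, len(data), "bytes")
```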

Successfully created linked services for both source and destination servers.

Moved data from Azure Data Lake to Azure SQL DB using pipelines and data flows.

Transferred data between SQL DB and ADLS using data flows.

Created automated workflows with the help of triggers.

Developed and maintained multiple Power BI dashboards/reports and content packs.

Created Power BI visualizations and dashboards per requirements.

Performed DB admin activities on the server, including backing up and restoring databases, granting user access privileges, and creating linked servers.

Performed conceptual, logical, and physical data modeling and contributed expertise in relational and dimensional data modeling.

Involved in understanding client requirements and preparing design documents.

Provided support for production applications by troubleshooting issues, developing, testing, and migrating databases.

Collaborated with DevOps Engineers to develop automated CI/CD pipelines for Data Factory pipeline deployment from Dev to Prod stages.

EDUCATION

Master’s: Applied Computer Science

Northwest Missouri State University, US.

Bachelor’s: Computer Science & Engineering

GITAM University – Visakhapatnam, India.


