Data Engineer

Location: Dallas, TX
Posted: July 29, 2025

Resume:

Neha Burri

Sr. Cloud Data Engineer

Phone: +1-972-***-****

Email: ***********@*****.***

LinkedIn: www.linkedin.com/in/nehaburri

PROFESSIONAL SUMMARY:

●Accomplished Data Engineer with 10 years of IT experience, adept at designing and deploying cloud ETL solutions on Microsoft Azure (Azure Data Factory, Azure Databricks, Azure Data Lake Storage) and AWS, using services such as AWS Glue, AWS Lambda, and Amazon Redshift to build scalable, secure, and efficient data pipelines.

●8 years of dedicated experience with Azure and AWS cloud solutions and big data technologies, designing and establishing optimal architectures for efficient data migration and processing.

●Expert in Data Warehousing, building efficient ETL pipelines, optimizing SSIS for seamless data integration and SSRS for comprehensive reporting solutions.

●Designed and built scalable data ingestion pipelines using Azure Data Factory and AWS Glue coupled with AWS Lambda, integrating data from diverse sources including relational databases, Azure Blob Storage, S3 buckets, and REST APIs to enable seamless and flexible ingestion.

●Vast expertise in working with Azure Data Lake Storage Gen 2 for efficient storage and retrieval of unstructured and semi-structured data.

●Secured sensitive cryptographic keys and secrets by leveraging Azure Key Vault, ensuring robust data protection and compliance in cloud-based environments.

●Developed distributed data transformation workflows using Azure Databricks notebooks with PySpark and SparkSQL, alongside AWS EMR with Apache Spark, enabling efficient processing of large-scale datasets for analytics and machine learning use cases.

●Implemented robust data quality checks and transformation logic using Azure Synapse pipelines and AWS Glue jobs leveraging PySpark, ensuring data accuracy and consistency across ingestion and transformation layers.

●Demonstrated expertise in deploying Azure Event Hubs for real-time streaming data ingestion.

●Developed end-to-end data workflows through Azure Logic Apps, Azure Functions, and serverless solutions.

●Leveraged Power Query to seamlessly transform and shape data within Microsoft Azure environments, ensuring efficient and streamlined data workflows.

●Worked with languages and frameworks such as Python, PySpark, PL/SQL, and Scala, enabling seamless integration of custom functionality into data pipelines.

●Adept at designing cloud-based data warehousing solutions on Snowflake, optimizing warehouse schemas, tables, and views for streamlined data storage and retrieval.

●Demonstrated expert-level proficiency in using SnowSQL to retrieve and manipulate large datasets in Snowflake data warehouses.

●Configured and implemented roles and access controls to ensure controlled access to various database objects within the Snowflake ecosystem.

●Built end-to-end, scalable data pipelines by integrating Azure Event Hubs with Azure Databricks and Kafka, orchestrated via Apache Airflow and monitored through Azure Monitor, ensuring real-time ingestion, processing, and alerting on high-volume data streams.

●Highly skilled in big data technologies including Hadoop, HDFS, MapReduce, Hive, and Spark for efficient ETL and real-time data processing.

●Proficient in designing scalable NoSQL solutions, efficiently supporting diverse data formats in cloud environments.

●Exceptional command over Kafka streaming technology and its distributed messaging capabilities for constructing resilient and high-performing data flows.

●Expertise in deploying Spark Streaming, including on AWS EMR, to build and streamline real-time data pipelines that process large volumes of data from a variety of sources.

●Integrated Apache Sqoop for seamless import and export of data between HDFS and Hive, and configured data workflows through Apache Oozie and Control-M for effective scheduling and management of Hadoop jobs.

●Expert in optimizing query performance in Hive and Spark by designing and implementing bucketing and partitioning strategies for efficient data retrieval and storage optimization (a minimal sketch follows this summary).

●Developed streamlined data ingestion and integration for large-scale big data ETL tasks using Apache Tez.

●Configured and implemented Zookeeper to ensure efficient coordination and synchronization of distributed data processing systems.

●Demonstrated expertise in implementing advanced serialization techniques to optimize data storage, transfer, and deserialization processes.

●Optimized performance tuning for OLAP/OLTP in Azure environments, enhancing query execution and data retrieval efficiency.

●Designed and deployed scalable data models in Azure Cosmos DB and MongoDB for high-throughput, low-latency applications, enabling efficient storage and querying of semi-structured and unstructured data in real-time analytics environments.

●Scheduled and monitored data workflows with Control-M and Apache Airflow for coordinated execution of complex tasks.

●Automated ETL pipelines using AWS Step Functions and CloudWatch Events, significantly reducing manual intervention and streamlining data delivery by over 40%, complementing Azure Logic Apps and Azure Functions for serverless workflow orchestration.

●Exceptional command in working with diverse file formats like Parquet, CSV, JSON, Avro and ORC for efficient storage and exchange within data pipelines.

●Facilitated adoption of DevOps best practices, implemented version control (Git, GitHub, Azure Repos), and supported the setup of automated CI/CD pipelines for faster software delivery across multiple development environments.

●Experienced in the Agile Scrum methodology, extensively participating in sprint planning, daily scrum updates, and retrospective meetings.
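
The following is a minimal, illustrative sketch of the Hive/Spark partitioning and bucketing pattern referenced in this summary. It is not code from any engagement listed here; the table, column, and path names are hypothetical placeholders.

# Hypothetical PySpark sketch: persist a partitioned, bucketed table so queries
# can prune partitions and joins on the bucket key shuffle less data.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("partitioning-bucketing-sketch")
    .enableHiveSupport()   # register the bucketed table in the Hive metastore
    .getOrCreate()
)

orders = spark.read.parquet("/data/raw/orders")   # hypothetical source path

(
    orders.write
    .mode("overwrite")
    .partitionBy("order_date")      # low-cardinality column -> directory pruning
    .bucketBy(32, "customer_id")    # high-cardinality join key -> fewer shuffles
    .sortBy("customer_id")
    .format("parquet")
    .saveAsTable("analytics.orders_bucketed")   # bucketing requires saveAsTable
)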

Technical Skills:

Azure Services: Azure Data Factory, Azure Databricks, ADLS Gen2, Azure Fabric, Azure Cosmos DB, Azure SQL Database, Azure Logic Apps, Azure Functions, Azure DevOps, Azure Key Vault, Azure HDInsight, Azure Event Hubs, Azure Monitor, Azure Service Bus.

AWS Services: S3, Redshift, EMR, SNS, Athena, Glue, CloudWatch, Kinesis, Lambda.

Big Data Technologies: Hadoop, MapReduce, Hive, Pig, Spark, Kafka, Oozie, Sqoop, ZooKeeper, Airflow, YARN, Flume, Impala, NiFi.

ETL / BI Tools: Informatica, SSIS, Tableau, Power BI, SSRS, QlikView.

Programming: Python, SQL, PL/SQL, Scala, Shell Scripting.

Databases (RDBMS/NoSQL): MS SQL Server, Azure SQL DB, Oracle, PostgreSQL, HANA, Cassandra, MongoDB, Azure Cosmos DB.

Data Warehouse: Snowflake, Azure Synapse Analytics, Amazon Redshift.

DWH Schemas: Star schema and Snowflake schema.

Version Control: Git, GitHub.

CI/CD: Azure DevOps, Jenkins.

SDLC: Agile, Scrum, Waterfall, Kanban.

Data Formats: CSV, Text, XML, JSON, Avro, Parquet.

Operating Systems: Linux, Windows, Ubuntu, Unix.

EDUCATION:

•Bachelor’s in Computer Science from JNTUH, 2014

WORK EXPERIENCE:

Role: Sr. Cloud Data Engineer Sept 2023 – Present

Client: Amgen, Tampa, USA

Responsibilities:

●Designed and implemented end-to-end data pipelines using Azure Data Factory to facilitate efficient data ingestion, transformation, and loading (ETL) from diverse data sources into the Azure Synapse data warehouse.

●Orchestrated robust data processing workflows using Azure Databricks and Apache Spark for large-scale data transformations and advanced analytics.

●Developed real-time data streaming capabilities into Synapse by seamlessly integrating Azure Event Hubs and Azure Functions, enabling prompt and reliable data ingestion.

●Deployed Azure Data Lake Storage as a reliable and scalable data lake solution, implementing efficient data partitioning and retention strategies to store and manage both raw and processed data effectively.

●Employed Azure Blob Storage for optimized data file storage and retrieval, implementing advanced techniques like compression and encryption to bolster data security and streamline storage costs.

●Optimized performance and reduced latency by tuning partitioning strategies, indexing policies, and throughput provisioning in Azure Cosmos DB.

●Integrated Azure Logic Apps seamlessly into the data workflows, ensuring comprehensive orchestration and triggering of complex data operations based on specific events, enhancing overall data pipeline efficiency.

●Enforced data governance and comprehensive data quality checks using Azure Data Factory and Synapse, guaranteeing the highest standards of data accuracy and consistency.

●Implemented robust data replication and synchronization strategies between Synapse and other data platforms leveraging Azure Data Factory and Change Data Capture techniques, ensuring data integrity and consistency across systems.

●Designed and implemented efficient data archiving and retention strategies utilizing Azure Blob Storage and leveraging Synapse Time Travel feature, ensuring optimal data management and regulatory compliance.

●Developed and deployed Azure Functions to handle critical data preprocessing, enrichment, and validation tasks within the data pipelines, elevating the overall data quality and reliability.

●Worked on Azure Machine Learning and Snowflake to architect and execute advanced analytics and machine learning workflows, enabling predictive analytics and data-driven insights.

●Developed custom monitoring and alerting solutions using Azure Monitor and Synapse Query Performance Monitoring (QPM), providing proactive identification and resolution of performance bottlenecks.

●Integrated Snowflake seamlessly with Power BI and Azure Analysis Services to deliver interactive dashboards and reports, empowering business users with self-service analytics capabilities.

●Optimized data pipelines and Spark jobs in Azure Databricks through advanced techniques such as Spark configuration tuning, data caching, and data partitioning, resulting in superior performance and efficiency (a brief sketch follows this list).

●Implemented comprehensive data cataloging and data lineage solutions using Azure Purview and Apache Atlas, enabling in-depth understanding and visualization of data assets and their interdependencies.

●Architected and optimized high-performing Snowflake schemas, tables, and views to accommodate complex analytical queries and reporting requirements, ensuring exceptional scalability and query performance.

●Collaborated closely with cross-functional teams including data scientists, data analysts, and business stakeholders, ensuring alignment with data requirements and delivering scalable and reliable data solutions.
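
A brief, hypothetical sketch of the Databricks tuning pattern mentioned above: caching a reused dataset and repartitioning before a date-partitioned Delta write. The storage account, container, and column names are placeholders, not details from this engagement.

# Hypothetical Databricks PySpark sketch; on Databricks the `spark` session
# already exists and getOrCreate() simply reuses it.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("databricks-tuning-sketch").getOrCreate()

raw = spark.read.parquet("abfss://raw@examplestore.dfs.core.windows.net/sales/")

# Cache the cleansed dataset because several downstream aggregations reuse it.
cleansed = (
    raw.dropDuplicates(["transaction_id"])
       .withColumn("sale_date", F.to_date("sale_timestamp"))
       .cache()
)

daily_totals = (
    cleansed.groupBy("sale_date", "region")
            .agg(F.sum("amount").alias("total_amount"))
)

# Repartition on the partition column to avoid many small files, then write
# date-partitioned Delta so downstream queries can prune by sale_date.
(
    daily_totals
    .repartition("sale_date")
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("sale_date")
    .save("abfss://curated@examplestore.dfs.core.windows.net/daily_totals/")
)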

Environment: Azure Data Factory, Azure Databricks, Azure Synapse data warehouse, Azure Event Hubs, Azure Cosmos DB, Azure Functions, Azure Data Lake Storage, Azure Blob Storage, Azure Logic Apps, Azure DevOps, Azure Monitor, Power BI, Azure Analysis Services, Azure Purview, Apache Atlas.

Role: Azure Data Engineer July 2021 – Aug 2023

Client: TIAA, NC, USA

Responsibilities:

●Designed and implemented scalable data ingestion pipelines in Azure Data Factory, efficiently ingesting data from diverse sources such as SQL databases, CSV files, and REST APIs.

●Orchestrated seamless migration from big data and SSIS/SSRS servers to Snowflake using Azure Data Factory and custom migration scripts.

●Developed robust data processing workflows leveraging Azure Databricks and Spark for distributed data processing and transformation tasks.

●Successfully migrated large-scale data sets from on-premises HDFS stores and MapReduce to Azure Cloud storage, using PolyBase and ADF.

●Deployed and implemented Azure Blob Storage effectively, optimizing data storage, accessibility, and retrieval for streamlined data engineering operations.

●Leveraged Snowflake to seamlessly integrate big data processing and transformations.

●Designed and orchestrated Snowpipe pipelines for continuous data ingestion, seamlessly loading data from Azure Data Lake Storage into the Snowflake data warehouse.

●Enhanced data processing by implementing performance tuning on OLAP/OLTP processes and utilizing Azure Key Vault for secure access key management.

●Developed and implemented SnowSQL scripts to interact with and manage the Snowflake data warehouse (see the sketch following this list).

●Incorporated Azure Data Lake Storage Gen2 into data workflows for streamlined handling of vast datasets from diverse sources, supporting a seamless migration from on-premises big data platforms to the Azure cloud.

●Automated data pipelines and workflows by configuring event-based triggers and scheduling mechanisms, streamlining data processing and delivery and resulting in a 48% reduction in manual intervention.

●Implemented comprehensive data lineage and metadata management solutions, ensuring end-to-end visibility and governance over data flow and transformations.

●Identified and resolved performance bottlenecks within data processing and storage layers, optimizing query execution and reducing data latency.

●Efficiently handled diverse big data file formats, including Parquet, Avro, and ORC, while optimizing storage and ensuring seamless processing of large datasets.

●Enforced advanced techniques such as partitioning, indexing, and caching in Azure services to enhance query performance and reduce processing time.

●Demonstrated proficiency in scripting languages like PySpark and Scala, facilitating effective data manipulation and seamless incorporation of customized data processing functionalities.

●Developed and fine-tuned high-performance Spark jobs to handle complex data transformations, aggregations, and machine learning tasks on large-scale datasets.

●Deployed Apache Airflow and Control-M for scheduling and managing data workflows.

●Created and updated Views, Stored Procedures, Triggers, User-Defined Functions, and Scripts using T-SQL, ensuring a robust and efficient database system.

●Designed and optimized Snowflake DW schemas, tables, and views for streamlined data storage and retrieval, tailored to accommodate advanced analytics and reporting needs.

●Executed HiveQL scripts through Hive on Spark and SparkSQL, ensuring data integrity during ETL tasks, facilitating a smooth migration to Azure Cloud.

●Proficiently worked within Agile methodologies, actively participating in daily stand-ups and coordinated planning sessions, contributing to streamlined project execution.
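
A minimal sketch of how the SnowSQL-style loads described above could be scripted from Python, assuming the snowflake-connector-python package; the account locator, stage, table, and credentials are hypothetical placeholders (in practice, credentials would come from Azure Key Vault).

# Hypothetical Python automation of a Snowflake COPY INTO load.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.east-us-2.azure",     # hypothetical account locator
    user="ETL_SVC",
    password="<retrieved-from-key-vault>",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # COPY INTO pulls Parquet files landed in ADLS (exposed via an external
    # stage) into a staging table, matching columns by name.
    cur.execute("""
        COPY INTO STAGING.CUSTOMER_EVENTS
        FROM @ADLS_CURATED_STAGE/customer_events/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    print(cur.fetchall())   # per-file load results returned by COPY INTO
finally:
    conn.close()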

Environment: Azure Data Factory, Azure Databricks, Snowflake, Azure Blob Storage, ADLS Gen2, Azure Logic Apps, Azure Functions, SparkSQL, SQL, T-SQL, Scala, PySpark, Airflow, HDFS, MapReduce, PolyBase, Spark, Hive, Kafka.

Role: Data Developer Oct 2019 – June 2021

Client: LA County, CA, USA

Responsibilities:

●Designed and built scalable data ingestion pipelines using AWS Glue and AWS Lambda, integrating data from diverse sources including relational databases, S3 buckets, and REST APIs (see the sketch following this list).

●Developed distributed data transformation workflows using AWS EMR with Apache Spark, enabling efficient processing of large-scale datasets for analytics and machine learning use cases.

●Implemented robust data quality checks and transformation logic using AWS Glue jobs and PySpark, ensuring data accuracy and consistency across ingestion layers.

●Leveraged Amazon Redshift for scalable data warehousing, optimizing data models and query performance to support fast, business-critical reporting and dashboarding.

●Automated ETL pipelines using AWS Step Functions and CloudWatch Events, significantly reducing manual intervention and streamlining data delivery by 40%+.

●Utilized Amazon Kinesis and Apache Spark Streaming for real-time data processing and analytics, improving time-to-insight and supporting streaming use cases such as log and event monitoring.

●Architected and deployed a cloud-native data warehouse solution using Snowflake on AWS, enabling scalable storage, processing, and advanced analytics for cross-functional teams.

●Fine-tuned Spark jobs on AWS EMR to perform complex aggregations, joins, and transformations on multi-terabyte datasets, reducing processing times and improving pipeline efficiency.

●Improved performance and scalability of data pipelines through best practices like data partitioning, indexing, and query caching in Snowflake and AWS Redshift.

●Developed end-to-end data workflows using Kafka, Spark, and Hive on EMR, enabling reliable and scalable batch and stream data processing pipelines.

●Implemented comprehensive data lineage and cataloging using AWS Glue Data Catalog and AWS Lake Formation, ensuring governance, discoverability, and traceability of data assets.

●Collaborated with data analysts and stakeholders to model data effectively within Redshift and Snowflake, aligning technical solutions with business intelligence needs.

●Conducted in-depth performance tuning and cost optimization of data infrastructure across EMR, Redshift, and S3, balancing scalability with budget efficiency.

●Built resilient batch and streaming ETL pipelines using a combination of Lambda, Kinesis, and Glue, supporting both scheduled and event-driven data workflows.

●Applied Agile best practices, contributing to sprint planning, retrospectives, and daily stand-ups to ensure timely delivery and continuous improvement of data engineering tasks.
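
A minimal sketch of the Lambda-plus-Glue ingestion trigger referenced above, assuming boto3 inside an AWS Lambda function; the Glue job name and bucket layout are hypothetical placeholders.

# Hypothetical event-driven trigger: an S3 object-created event starts a Glue job.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Each record corresponds to one newly landed object in the raw bucket.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Pass the landing location to the Glue job as job arguments so the
        # PySpark script can read exactly the file that arrived.
        response = glue.start_job_run(
            JobName="raw-to-curated-ingest",        # hypothetical Glue job
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        print(f"Started Glue run {response['JobRunId']} for s3://{bucket}/{key}")

    return {"status": "ok"}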

Environment: AWS Glue, AWS Lambda, AWS EMR, Amazon S3, Amazon Redshift, AWS Step Functions, Amazon Kinesis, Snowflake on AWS, AWS Lake Formation, AWS Glue Data Catalog, Apache Spark, PySpark, Scala, SparkSQL, Apache Hive, Apache Kafka, CloudWatch, SQL, Agile.

Role: Big Data Developer March 2018 – Sept 2019

Client: SoCalGas, CA, USA

Responsibilities:

●Designed data flows to import structured, unstructured, and semi-structured data from multiple sources such as Aster, Teradata, Vertica, and SAS into the Hadoop data lake using SOAP web services, file transfer protocols, Sqoop, MapReduce, Hive, and Pig.

●Performed crucial transformations, queried the loaded data using Hive and SparkSQL, and built reporting tables.

●Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements.

●Analyzed and transformed data from multiple file formats to uncover insights, performing data transformation and aggregation in Spark applications using PySpark and SparkSQL (see the sketch following this list).

●Worked with different file formats such as Text, Sequence, Avro, ORC, and Parquet, and compression libraries such as Snappy and Gzip, to identify the best compression and serialization format for each data type for efficient storage and processing.

●Utilized regular expressions in Snowflake for pattern matching and data extraction tasks.

●Automated data pipelines, ETL processes, and data transformations using Snowflake scripting.

●Managed and oversaw various types of Snowflake tables, including transient, temporary, and persistent tables, to meet specific data storage and processing requirements.

●Facilitated data consolidation, health-information management, and analytics-driven decision-making using Ab Initio for data integration and analysis.

●Implemented advanced partitioning techniques in Snowflake, significantly improving query performance and accelerating data retrieval.

●Defined comprehensive roles and access privileges in Snowflake to enforce rigorous data security and governance protocols.

●Developed and executed Snowflake scripting solutions to automate critical data pipelines, ETL processes, and data transformations.

●Wrote Hive and Pig scripts for data pre-processing, cleansing, and transformation, and extended the core functionality of Hive and Pig with custom Python, Scala, and Java UDFs (user-defined functions).

●Ingested structured, unstructured, and semi-structured data from multiple sources into the Hadoop distributed environment using Apache Sqoop, loading the data into Hive and HBase tables after preprocessing.

●Performed text cleansing by applying various transformations using Spark DataFrames and RDDs.

●Performed text mining and built predictive models in PySpark using NLTK and the Spark Machine Learning library (MLlib).

●Derived data insights and performed data reporting and visualization using tools such as Power BI.

●Automated the above data loading, cleansing, mining, and reporting jobs using Oozie.

●Extensively involved in the design phase and delivered design documents; gained hands-on experience across the Hadoop ecosystem, including HDFS, Hive, Sqoop, and Spark with Scala.

●Converted Pig, Hive, and SAS SQL queries into Spark transformations using Apache Spark in Scala.

●Performed incremental data loads into the Hadoop data lake based on input data frequency.

●Provided work updates and discussed issues faced during biweekly sprint meetings.
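
A minimal, hypothetical sketch of the PySpark/SparkSQL transformation pattern described above: querying a Sqoop-landed Hive table and persisting a partitioned, compressed ORC reporting table. Database, table, and column names are placeholders.

# Hypothetical PySpark/SparkSQL sketch over Hive-backed data.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-sparksql-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Aggregate usage readings per meter and day with plain SparkSQL.
daily_usage = spark.sql("""
    SELECT meter_id,
           to_date(reading_ts)  AS reading_date,
           SUM(usage_therms)    AS total_usage
    FROM   raw_db.meter_readings
    GROUP  BY meter_id, to_date(reading_ts)
""")

# Persist the reporting table as Snappy-compressed ORC, partitioned by date so
# downstream Hive queries can prune partitions.
(
    daily_usage.write
    .mode("overwrite")
    .partitionBy("reading_date")
    .option("compression", "snappy")
    .format("orc")
    .saveAsTable("reporting_db.daily_meter_usage")
)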

Environment: Hortonworks, Hadoop, MapReduce, Hive, Pig, Sqoop, Oozie, DB2, shell scripting, JSON, Teradata, SAS, Oracle 11g, PL/SQL, Aster, Spark, Scala, Python, UNIX, Power BI, Kerberos.

Role: Data Warehouse Developer June 2014 - April 2017

Client: Tiger Analytics, Chennai, India

Responsibilities:

●Actively participated in Agile Scrum Methodology, engaging in daily stand-up meetings. Proficiently utilized Visual SourceSafe for Visual Studio 2010 for version control and effectively managed project progress using Trello.

●Implemented advanced reporting functionalities in Power BI, including Drill-through and Drill-down reports with interactive Drop-down menus, data sorting capabilities, and subtotals for enhanced data analysis.

●Employed Data warehousing techniques to develop a comprehensive Data Mart, serving as a reliable data source for downstream reporting. Developed a User Access Tool empowering users to create ad-hoc reports and execute queries for in-depth analysis within the proposed Cube.

●Streamlined the deployment of SSIS Packages and optimized their execution through the creation of efficient job configurations.

●Demonstrated expertise in building diverse Cubes and Dimensions using different architectures and data sources for Business Intelligence. Proficiently utilized MDX Scripting to enhance Cube functionality and support advanced analytics.

●Automated report generation and Cube refresh processes by creating SSIS jobs, ensuring the timely and accurate delivery of critical information.

●Excelled in deploying SSIS Packages to production, leveraging various configuration options to export package properties and achieve environment independence.

●Utilized SQL Server Reporting Services (SSRS) to author, manage, and deliver comprehensive reports, both in print and interactive web-based formats.

●Developed robust stored procedures and triggers to enforce data consistency and integrity during data entry operations (see the sketch following this list).

●Leveraged the power of Snowflake to facilitate seamless data sharing, enabling quick and secure data exchange without the need for complex data pipelines.

●Mapped sources to targets using a variety of tools, including BusinessObjects Data Services (BODI), and designed and developed ETL code to load and transform source data from various formats into a SQL database.

●Worked extensively on different types of transformations like source qualifier, expression, filter, aggregator, rank, lookup, stored procedure, sequence generator and joiner.

●Created, launched, and scheduled tasks/sessions; configured email notifications; set up tasks to schedule loads at the required frequency using the PowerCenter Server Manager; and generated completion messages and status reports using Server Manager.

●Administered the Informatica server and ran sessions and batches.

●Developed shell scripts for automation of Informatica session loads.

●Involved in the performance tuning of Informatica servers.
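
A minimal sketch of how one of the data-consistency stored procedures mentioned above could be invoked programmatically, assuming pyodbc against SQL Server; the server, database, procedure, and parameter names are hypothetical placeholders.

# Hypothetical pyodbc call to a SQL Server stored procedure.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlprod01;DATABASE=SalesDW;Trusted_Connection=yes;"
)

try:
    cursor = conn.cursor()
    # The procedure validates and merges the day's staged rows into the fact
    # table, enforcing the integrity rules described above.
    cursor.execute("EXEC dbo.usp_LoadDailySalesFact @LoadDate = ?", "2016-07-01")
    conn.commit()
finally:
    conn.close()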

Environment: Windows Server, MS SQL Server 2014, SSIS, SSAS, SSRS, SQL Profiler, Power BI, PerformancePoint Server, MS Office, SharePoint, Unix shell scripting, Oracle 8.0, SQL, PL/SQL, Informatica 5.1, MS Excel.


