Venkata Naga Rupasri Darsi
Sr. Azure Data Engineer
Phone: +1-326-***-****
Email: *************@*****.***
PROFESSIONAL SUMMARY
Senior Azure Data Engineer with 10+ years of experience in architecting, developing, and optimizing large-scale data pipelines, ETL workflows, and analytical solutions across Healthcare, Banking, Telecom, and Government domains.
Expert in designing end-to-end data integration and transformation solutions using Azure Data Factory (ADF), Synapse Analytics, Databricks, and Azure SQL for cloud-based data modernization.
Strong expertise in SQL, PL/SQL, and advanced queries for data extraction, manipulation, and performance optimization across Oracle, SQL Server, PostgreSQL, MySQL, DB2, and Snowflake.
Skilled in data visualization and BI using Tableau, Power BI, Looker, Qlik, Spotfire, Grafana, and Plotly, creating interactive dashboards and executive reports to communicate insights to stakeholders.
Performed end-to-end onboarding and validation of Azure VM and hardware SKUs, including control-plane integration and lifecycle management.
Validated host configurations (BIOS/BMC firmware, drivers, platform settings) and guest configurations (OS images, VM sizing, feature compliance).
Drove SKU qualification, test planning, and execution across private/public previews and GA, supporting production-readiness gating and go/no-go decisions for SKU launches.
Used Azure DevOps (ADO) for work items, test plans, and defect tracking, collaborating with compute, network, storage, fabric, and capacity teams on hyperscale cloud hardware platforms.
Experienced in statistical techniques including hypothesis testing, regression analysis, correlation, clustering, A/B testing, ANOVA, and time-series forecasting to support data-driven strategies.
Hands-on experience in ETL development and data pipeline design using Informatica, Talend, SSIS, dbt, Airflow, AWS Glue, and Azure Data Factory, ensuring seamless data integration across enterprise systems.
Built scalable data pipelines using GCP BigQuery, Dataflow, and Pub/Sub.
Migrated data pipelines from Azure/AWS to GCP architectures.
Worked with Teradata for large-scale data warehousing and query optimization.
Designed a BigQuery-based data warehouse with optimized partitioning and clustering.
Designed pipelines integrating MongoDB/Cassandra for semi-structured data.
Adept at working with big data platforms including Hadoop, Hive, Spark, PySpark, and Databricks to analyze large-scale structured and unstructured datasets.
Knowledgeable in data governance, lineage, and quality frameworks such as Collibra, Unity Catalog, Purview, Great Expectations, and Apache Atlas, ensuring compliance with HIPAA, GDPR, and regulatory standards.
Skilled in real-time analytics using Kafka, AWS Kinesis, and Azure Event Hubs, enabling streaming dashboards and proactive business monitoring.
Proficient in Excel (Pivot Tables, Power Query, Macros, VBA) for advanced reporting and quick business analyses.
Recognized for bridging the gap between technical and business teams, translating raw data into actionable strategies that improve revenue growth, cost optimization, and operational efficiency.
Strong background in predictive analytics and forecasting using ARIMA, Prophet, and statistical modeling to support sales, finance, and supply chain planning.
Designed and developed data pipelines using Microsoft Fabric components including Dataflows Gen2, Fabric Pipelines, and Lakehouse architecture to support scalable analytics workloads.
Familiar with server hardware platforms (CPU, memory, NICs, GPUs, storage), virtualization, VM platform roles, and firmware/driver dependencies.
Knowledgeable in hardware platform engineering, server design, SKU definition workflows, and compute hardware engineering concepts for cloud-scale platforms.
Implemented metadata-driven, parameterized Fabric pipelines and Lakehouse/Warehouse solutions on OneLake, integrating data from APIs, relational databases, flat files, and streaming sources for centralized enterprise data management.
Developed semantic data models and ELT workflows in SQL, Python, and Spark to support Power BI self-service analytics, securing enterprise data assets with row-level security (RLS) and role-based access controls (RBAC).
Applied data governance, lineage tracking, and monitoring through Microsoft Purview and Fabric governance features, automating pipeline deployment with Git, Azure DevOps, and CI/CD while optimizing pipeline performance, query execution, and storage utilization.
Hands-on experience in analytics engineering using dbt Core/Cloud, implementing modular data models, testing frameworks, and automated documentation for scalable ELT pipelines.
Supported Agile development processes, including sprint planning, backlog grooming, and cross-team collaboration for data platform enhancements.
Adept at designing KPI dashboards and executive scorecards, enabling leadership to track business performance, market trends, and operational efficiency in real-time.
Skilled in data storytelling and presentation for C-level executives, simplifying complex analytics into visual insights and business recommendations.
Experienced in data security, masking, and role-based access management ensuring compliance while handling sensitive financial and healthcare data.
Demonstrated success in mentoring junior analysts, implementing best practices in SQL optimization, dashboard development, and advanced analytics workflows.
Proven ability to work in cross-functional teams, collaborating with data engineers, scientists, and business stakeholders to deliver scalable and impactful data solutions.
Strong focus on automation and efficiency, reducing manual reporting time by implementing self-service analytics solutions and automated ETL processes.
Recognized for driving data-driven digital transformation initiatives, helping organizations transition from descriptive analytics to predictive and prescriptive decision-making.
Cross-functional collaboration and effective communication.
Strong organizational skills with focus on deadlines and project goals.
Analytical problem-solving and mentorship in technical teams.
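The metadata-driven, parameterized ingestion pattern highlighted in this summary can be sketched in plain Python. The registry entries, table names, and loader function below are purely illustrative (not from any client project); a production pipeline would read this metadata from a control table and dispatch ADF or Fabric copy activities instead of local function calls.

```python
# Minimal sketch of a metadata-driven ingestion loop (hypothetical
# source registry; real pipelines dispatch ADF/Fabric activities).
from dataclasses import dataclass

@dataclass
class SourceConfig:
    name: str           # logical source name
    source_type: str    # "api", "file", or "database"
    target_table: str   # destination table in the lakehouse

# Control metadata drives which sources are ingested -- onboarding a new
# source means adding a row here, not writing a new pipeline.
REGISTRY = [
    SourceConfig("claims", "database", "bronze.claims"),
    SourceConfig("vendor_feed", "file", "bronze.vendor_feed"),
]

def ingest(cfg: SourceConfig) -> str:
    # Placeholder for a type-specific copy activity.
    return f"loaded {cfg.name} ({cfg.source_type}) -> {cfg.target_table}"

def run_all(registry):
    # One generic loop handles every registered source.
    return [ingest(cfg) for cfg in registry]

print(run_all(REGISTRY))
```

The payoff of this design is that the pipeline code stays fixed while the metadata grows, which is what makes the pattern reusable across workspaces.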
TECHNICAL SKILLS
Programming & ML: Python, R, PySpark, Scala, Java, SQL, PL/SQL, SAS, TensorFlow, PyTorch, Keras, Scikit-Learn, XGBoost, LightGBM, CatBoost, Hugging Face Transformers, NLTK, SpaCy
Data Analysis & Reporting: Excel (Pivot Tables, VLOOKUP, Power Query, Power Pivot, Macros), Google Analytics
Big Data & Cloud: Hadoop (HDFS, Hive, Pig, Sqoop, Oozie, MapReduce), Spark, Delta Lake, Databricks, Snowflake, AWS Redshift, Azure Synapse, GCP BigQuery, Azure Data Lake, AWS S3, GCP Storage, Teradata, Netezza
Streaming & Real-Time: Apache Kafka, Spark Structured Streaming, Flink, AWS Kinesis, Azure Event Hubs, Google Pub/Sub, RabbitMQ, IBM MQ, Confluent Kafka
Data Engineering & ETL: Informatica, Talend, SSIS, dbt, Airflow, AWS Glue, Azure Data Factory, GCP Dataflow, Matillion, Control-M, Logic Apps
Data Governance & Quality: Great Expectations, Collibra, Unity Catalog, Apache Atlas, Purview, Informatica DQ, Data Lineage Tools
Visualization & BI: Tableau, Power BI, Looker, Qlik Sense, QlikView, Spotfire, Grafana, Plotly, Matplotlib, Seaborn, D3.js
Databases & Querying: Oracle, SQL Server, PostgreSQL, MySQL, DB2, MongoDB, Cassandra, Cosmos DB, Amazon Aurora
DevOps & Infra: Git, GitHub Actions, Jenkins, Bitbucket, Terraform, Azure DevOps, Docker, Kubernetes, Helm, Ansible, dbt Core, dbt Cloud (data modeling, testing, documentation, lineage), CI/CD pipelines, Azure VM & Hardware SKU Onboarding, Host & Guest Configuration Validation (BIOS/BMC, firmware, drivers, VM sizing), Control-Plane Integration, Virtualization, Server Architecture (CPU, Memory, NICs, GPUs, Storage), Hardware Platform & Compute Lifecycle Management
PROFESSIONAL EXPERIENCE
Client: General Dynamics Information Technology (GDIT)
Role: Senior Azure Data Engineer Nov 2025 – Present
Responsibilities:
Designed and developed enterprise-scale ETL pipelines for ingesting structured and semi-structured data from multiple government data sources.
Built scalable data transformation workflows using Python, SQL, PySpark, and Spark for high-volume datasets.
Developed batch and real-time data pipelines using AWS Glue, Azure Data Factory, and Apache Airflow.
Implemented data cleansing, validation, and transformation logic to ensure high data quality and integrity.
Designed dimensional data models and optimized Snowflake/Redshift/Synapse data warehouses for analytics and reporting.
Integrated data from APIs, flat files, databases, and cloud storage into centralized data lake environments.
Automated ETL deployments using CI/CD pipelines with GitHub Actions, Jenkins, and Docker.
Participated in Agile ceremonies including daily stand-ups, sprint planning, and retrospectives using Jira and Azure DevOps.
Developed and managed scalable data transformation pipelines using dbt Core, implementing modular models, reusable macros, and source-to-target mappings aligned with analytics engineering best practices.
Implemented dbt testing frameworks (schema tests, data quality checks) and automated documentation to ensure data reliability and lineage tracking.
Integrated dbt workflows with CI/CD pipelines using Azure DevOps and GitHub Actions for automated deployment and version control.
Implemented data governance controls using Unity Catalog and role-based access policies.
Optimized ETL performance by tuning SQL queries, partitioning large datasets, and implementing parallel processing strategies.
Collaborated with data analysts and BI teams to support reporting through Power BI and Tableau dashboards.
Ensured compliance with federal data security and privacy standards.
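The partition-and-parallelize tuning approach described in these responsibilities can be sketched in plain Python. The chunking and worker counts below are illustrative; in practice this maps to Spark partitions or parallel copy activities in ADF rather than local threads.

```python
# Sketch of partition-then-parallelize ETL tuning (illustrative only).
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """Split rows into n roughly equal, contiguous chunks."""
    k, rem = divmod(len(rows), n)
    chunks, start = [], 0
    for i in range(n):
        size = k + (1 if i < rem else 0)  # spread the remainder evenly
        chunks.append(rows[start:start + size])
        start += size
    return chunks

def transform(chunk):
    # Stand-in for per-partition transformation work.
    return sum(x * 2 for x in chunk)

def run(rows, workers=4):
    # Each worker transforms one partition; results are combined at the end.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(transform, partition(rows, workers)))

print(run(list(range(10))))  # -> 90
```

The same shape (split by key range or date, process partitions concurrently, merge results) is what makes large-batch ETL jobs scale horizontally.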
Client: Goken America – Houston, TX
Role: Sr. Azure Data Engineer Feb 2024 – Oct 2025
Responsibilities:
Designed and deployed interactive dashboards in Tableau, Power BI, and Spotfire for predictive maintenance, asset health, and operational performance monitoring.
Conducted data extraction, cleaning, transformation, and integration of IoT sensor, ERP, and manufacturing datasets using SQL and ETL pipelines.
Built real-time anomaly detection dashboards powered by streaming data from Kafka and AWS Kinesis, enabling immediate response to equipment issues.
Developed Excel-based forecasting models for production scheduling, downtime planning, and resource optimization.
Partnered with engineering teams to visualize root-cause analyses, delivering actionable insights to reduce equipment failures.
Optimized high-volume sensor data ingestion pipelines using PySpark and Delta Lake for efficient storage and processing.
Delivered executive-level scorecards highlighting operational KPIs, cost savings, asset reliability, and predictive maintenance outcomes.
Automated report refresh cycles leveraging Power BI Service and Tableau Server, ensuring stakeholders always have updated insights.
Implemented data governance and access control frameworks using Unity Catalog for secure, compliant reporting.
Designed and implemented ELT workflows using dbt Core for transforming IoT and manufacturing datasets, improving data model consistency and reusability.
Built and maintained dbt models to support downstream reporting in Power BI and Tableau, ensuring standardized and governed data layers.
Supported real-time alerting and notification systems embedded in dashboards for proactive operational decision-making.
Designed scalable data pipelines for IoT telemetry and manufacturing data, improving performance and reducing latency.
Integrated multi-cloud datasets from AWS, Azure, and GCP to enable centralized analytics and reporting.
Ensured data accuracy, consistency, and lineage across dashboards and reporting platforms for audit-ready compliance.
Collaborated with cross-functional teams to translate operational challenges into actionable analytics solutions.
Broadened Azure infrastructure expertise by participating in VM sizing assessments, SKU onboarding validation, host/guest configuration checks, and control-plane integration for cloud-scale data workloads.
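The kind of logic behind the real-time anomaly detection dashboards above can be sketched as a rolling z-score check in plain Python. The window size, threshold, and sample readings are illustrative; the production versions consumed Kafka/Kinesis streams rather than in-memory lists.

```python
# Minimal rolling z-score anomaly check over a sensor reading sequence.
from collections import deque
from statistics import mean, pstdev

def detect_anomalies(readings, window=5, threshold=3.0):
    """Flag indices where a reading deviates from the rolling mean
    by more than `threshold` standard deviations."""
    recent = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            # Skip flat windows (sigma == 0) to avoid false positives.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        recent.append(value)
    return anomalies

# A flat signal with one spike at index 7:
print(detect_anomalies([10, 10, 11, 10, 10, 11, 10, 50, 10, 11]))  # -> [7]
```

Because only the last `window` readings are retained, the same check runs in constant memory per sensor, which is what makes it practical on high-volume telemetry.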
Client: Nam IT Solutions – New York, NY
Role: Sr. Azure Data Engineer Aug 2021 – Jan 2024
Responsibilities:
Designed and developed scalable data pipelines using Microsoft Fabric components including Dataflows Gen2, Fabric Pipelines, and Lakehouse architecture to support enterprise analytics and reporting.
Implemented metadata-driven and parameterized ETL/ELT workflows enabling reusable and dynamic data ingestion across multiple Fabric workspaces.
Built and maintained Fabric Lakehouse and Warehouse solutions leveraging OneLake storage to centralize structured and semi-structured enterprise datasets.
Developed and optimized SQL and Spark-based transformation processes to process large-scale operational and log data efficiently.
Designed Power BI semantic models integrated with Microsoft Fabric, enabling self-service analytics and reporting for business and IT operations teams.
Implemented row-level security (RLS) and role-based access control (RBAC) to ensure secure and compliant access to enterprise data assets.
Integrated data from multiple sources including APIs, relational databases, log systems, and streaming platforms into Fabric Lakehouse environments.
Built batch and streaming data pipelines using Azure Event Hubs, Kafka, and Spark, enabling near real-time monitoring and analytics.
Applied data governance, lineage tracking, and compliance controls using Azure Purview and Fabric governance features.
Optimized query performance and data partitioning strategies to improve data retrieval speed and analytical processing efficiency.
Implemented CI/CD pipelines using Azure DevOps and GitHub Actions to automate deployment and version control of Fabric data pipelines.
Collaborated with analytics and reporting teams to deliver Power BI dashboards and enterprise reporting solutions aligned with business requirements.
Monitored and troubleshot production data pipelines, resolving performance issues and ensuring reliable data processing workflows.
Supported Agile/Scrum development practices, participating in sprint planning, backlog grooming, and cross-functional collaboration.
Enhanced data quality and validation frameworks, ensuring accuracy, consistency, and reliability of datasets across Fabric environments.
Expanded Azure infrastructure expertise through host and guest configuration validation, firmware and driver checks, VM sizing, control-plane integration, and SKU qualification for production readiness.
Environment: Microsoft Fabric, Dataflows Gen2, Fabric Pipelines, Lakehouse, OneLake, Power BI, SQL, Python, Spark, Azure Data Factory, Synapse Analytics, Databricks, Event Hubs, Kafka, Azure Purview, Azure DevOps, GitHub Actions, Azure VM & SKU onboarding
Client: State of Nebraska – Lincoln, NE
Role: Data Engineer Aug 2019 – Jul 2021
Responsibilities:
Designed and implemented fraud detection dashboards for Medicaid claims and unemployment data, enabling timely identification of anomalies and compliance risks.
Built forecasting models and reports to predict state revenue, healthcare outcomes, and policy impacts, supporting data-driven decision-making.
Automated cross-agency ETL pipelines using Azure Data Factory and AWS Glue, improving data processing efficiency and reducing manual workload.
Delivered interactive Tableau dashboards for public health, taxation, and social services, enhancing accessibility of key metrics to stakeholders.
Ensured data security and compliance (HIPAA, PHI) through encryption, masking, and role-based access controls.
Implemented bias detection and fairness checks in healthcare and social services reporting dashboards to support equitable policy decisions.
Partnered with state policymakers to provide actionable insights for Medicaid optimization and public service improvements.
Migrated legacy on-premises workloads to Azure and AWS cloud environments, enabling scalable and modernized data operations.
Developed data quality and validation reports leveraging Collibra and Purview, ensuring accuracy and consistency across datasets.
Published daily, weekly, and monthly performance scorecards for state leadership, providing transparency into operational metrics.
Designed data models and pipelines to integrate structured and semi-structured data for analytics and reporting purposes.
Supported real-time monitoring of public datasets using Delta Lake and Snowflake for timely analytics and reporting.
Environment: SQL, Excel, Tableau, Power BI, Azure Data Factory, AWS Glue, Snowflake, Synapse, Delta Lake, Collibra, Purview
Client: T-Mobile – Bellevue, WA
Role: Data Analyst Oct 2017 – July 2019
Responsibilities:
Designed and deployed customer churn dashboards for executive leadership, enabling proactive retention strategies and reducing churn by identifying at-risk segments.
Conducted customer segmentation and behavioral analysis using SQL to optimize targeted marketing campaigns and improve upsell/cross-sell effectiveness.
Developed real-time call quality dashboards integrating Kafka and Spark Streaming with Tableau and Power BI, providing near-instant visibility into network performance issues.
Built NPS sentiment reporting dashboards to track customer satisfaction trends, identify pain points, and recommend service improvements.
Partnered with marketing, product, and operations teams to visualize campaign ROI, revenue trends, and identify high-impact business opportunities.
Delivered KPI dashboards covering billing, network usage, service quality, customer engagement, and retention metrics for data-driven decision-making.
Developed self-service BI frameworks empowering business users to generate ad-hoc and customized reports without dependency on IT.
Implemented data quality and validation frameworks using Great Expectations, ensuring accuracy, consistency, and reliability of reporting and analytics outputs.
Automated monthly and quarterly revenue, churn, and operational reports using SQL, Excel, and BI tools, reducing manual effort and improving timeliness.
Collaborated with executives and stakeholders to embed dashboards and analytics into strategic workflows, supporting operational and business decisions.
Conducted ad-hoc and predictive analyses to identify network optimization opportunities, improve marketing ROI, and inform pricing and retention strategies.
Developed trend analysis and forecasting reports to support capacity planning, campaign effectiveness measurement, and customer engagement initiatives.
Maintained documentation and data governance standards, ensuring auditability and compliance for customer and operational data.
Monitored data pipelines and ETL processes, identifying and resolving issues proactively to ensure uninterrupted reporting.
Trained and mentored junior analysts on SQL, Power BI, Tableau, and best practices for dashboarding and data visualization.
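The expectation-style validation pattern behind the data quality work above can be sketched in plain Python. The function names, column names, and sample rows are hypothetical stand-ins, not the actual Great Expectations API; the point is the shape of the checks: each expectation returns a pass/fail result plus the failing row indices.

```python
# Plain-Python sketch of expectation-style data quality checks
# (illustrative names; not the Great Expectations API).
def expect_not_null(rows, column):
    failures = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"check": f"{column} not null", "passed": not failures, "failures": failures}

def expect_between(rows, column, low, high):
    failures = [i for i, r in enumerate(rows)
                if r.get(column) is not None and not (low <= r[column] <= high)]
    return {"check": f"{column} in [{low}, {high}]", "passed": not failures, "failures": failures}

def validate(rows):
    # A validation suite is just an ordered list of expectations.
    return [
        expect_not_null(rows, "customer_id"),
        expect_between(rows, "monthly_bill", 0, 10_000),
    ]

sample = [
    {"customer_id": "a1", "monthly_bill": 79.99},
    {"customer_id": None, "monthly_bill": -5.00},
]
for result in validate(sample):
    print(result["check"], "->", "PASS" if result["passed"] else f"FAIL {result['failures']}")
```

Returning failing indices (rather than a bare boolean) is what lets a validation report pinpoint bad records instead of just failing the batch.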
Client: Google – Hyderabad, India
Role: Data Engineer Jun 2012 – Mar 2016
Responsibilities:
Analyzed petabyte-scale clickstream and search log datasets using Hadoop, Hive, and Pig to identify trends and optimize search engine relevance.
Built interactive dashboards in Tableau for ad relevance, personalization, and user engagement analytics across multiple Google products.
Designed and executed A/B and multivariate testing frameworks to optimize CTR, ad ranking, and personalized recommendation algorithms.
Conducted SQL-driven query intent classification and behavior analysis to support targeted ad campaigns and improve user experience.
Partnered with ML and data science teams to prepare clean, high-quality datasets for predictive modeling and algorithm training.
Developed automated ETL pipelines using Hive, BigQuery, and scripting, reducing manual reporting effort by 50% and improving data reliability.
Implemented performance monitoring dashboards for search and ad systems, tracking KPIs, anomalies, and engagement metrics in near real-time.
Delivered executive-level insights and reports to senior leadership, influencing product strategy, ad placement decisions, and UX enhancements.
Collaborated with UX and product teams to visualize user behavior trends and improve cross-platform product engagement.
Maintained data governance and quality checks, ensuring accuracy, consistency, and compliance with internal standards.
Streamlined ad campaign reporting processes, integrating multiple data sources for consolidated insights.
Provided training and mentorship to junior data engineers on data processing, Hive queries, BigQuery, and dashboard development.
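The statistical core of the A/B testing work above can be sketched as a two-proportion z-test on click-through rates. The click and impression counts below are made up for illustration; the production analyses ran on internal experiment tooling at far larger scale.

```python
# Two-proportion z-test for an A/B click-through-rate comparison.
from math import sqrt, erf

def ab_ztest(clicks_a, views_a, clicks_b, views_b):
    """Return (z statistic, two-sided p-value) for the difference in CTR."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    # Pooled proportion under the null hypothesis of equal CTR.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Variant B lifts CTR from 2.0% to 2.6% on 10k impressions each:
z, p = ab_ztest(clicks_a=200, views_a=10_000, clicks_b=260, views_b=10_000)
print(round(z, 2), round(p, 4))
```

With these illustrative numbers the lift is significant at conventional thresholds (p < 0.01), which is the go/no-go signal such frameworks feed back into ranking and personalization decisions.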