Nishith Gupta

Irving, TX | ***************@*****.*** | 816-***-**** | www.linkedin.com/in/nishith-guptha/

Summary

• Over 5 years of experience as a versatile Data Engineer and BI Developer, building scalable, modern data infrastructure across Azure and AWS ecosystems.

• Began career in SQL and data warehousing, progressing into cloud-native ETL, real-time streaming, and analytics engineering.

• Designed and deployed end-to-end data pipelines using Azure Data Factory, Databricks, AWS Glue, and Apache Airflow, handling batch and real-time processing for structured and semi-structured data.

• Built and optimized data warehouses in Snowflake, Redshift, and Synapse Analytics using star and snowflake schemas, materialized views, and clustered indexes.

• Developed CDC frameworks using Snowflake Streams and Tasks, and handled schema evolution with tools like Delta Lake and Unity Catalog.

• Delivered business intelligence solutions via Power BI, Tableau, and Amazon QuickSight, building interactive dashboards, KPI reports, semantic models, and implementing RLS for secure reporting.

• Enabled self-service BI across teams in sales, operations, and finance with reusable datasets, report templates, and governed data models.

• Integrated AI/ML capabilities into the data stack by implementing vector databases and embeddings using Faiss, SageMaker, and OpenSearch k-NN for similarity search and recommendation systems.

• Architected Microsoft Fabric workloads, ensuring smooth data flow between Synapse, Power BI, Real-Time Analytics, and governance platforms like Microsoft Purview and Entra ID.

• Developed metadata registries, tagging strategies, and lineage tracking to support discoverability, observability, and governance of enterprise data assets.

• Worked with real-time data pipelines using Kafka, Event Hubs, and Stream Analytics for anomaly detection, predictive analytics, and operational monitoring.

• Collaborated cross-functionally with product managers, data scientists, BI developers, and DevOps to deliver trusted, discoverable, and production-grade data products.

• Experience spans the entire data lifecycle: data ingestion, processing, modeling, warehousing, visualization, quality enforcement, and compliance.

• Familiar with agile and enterprise-level development, delivering reliable and scalable data systems that support business impact and long-term analytics strategy.

Skills

Cloud Platforms:

Azure (Data Factory, Synapse Analytics, Databricks, ADLS Gen2, Azure SQL, Cosmos DB, Blob Storage, Stream Analytics, Event Hub, Microsoft Fabric, Azure Monitor, Azure DevOps, Azure Kubernetes Service (AKS), Azure Resource Manager (ARM), Microsoft Entra ID, Microsoft Purview, Key Vault, Power BI)

AWS (S3, DynamoDB, Redshift, RDS, Aurora, Glue, EMR, Athena, Lambda, Step Functions, Kinesis Firehose, SNS, SQS, EC2, CloudFormation, CloudWatch, Route 53, SageMaker, API Gateway, AWS KMS)

Programming & Scripting:

Python (Pandas, NumPy, Seaborn, Matplotlib), PySpark, Scala, Java, SQL (T-SQL, PL/SQL, SnowSQL), Shell scripting, DAX, MDX

Big Data & Processing:

Apache Spark (PySpark, Spark SQL), Hadoop (HDFS, Hive, Pig, Sqoop, YARN, Zookeeper, Oozie), Kafka, Apache Flink, Apache Pulsar, Delta Lake, Delta Live Tables (DLT), Unity Catalog, HDInsight

ETL & Orchestration Tools:

Azure Data Factory, AWS Glue, Databricks, Apache Airflow, Informatica, SSIS, Talend, dbt, Azure Logic Apps, Snowpipe, CDC (Change Data Capture)

Databases & Storage:

Snowflake, Azure Synapse, Azure SQL Database, PostgreSQL, MySQL, MongoDB, Cassandra, Oracle (PL/SQL), SQL Server, Teradata, Redshift, DynamoDB, HBase, Cosmos DB

Data Modeling & Warehousing:

Dimensional modeling (Star/Snowflake schema), OLAP/OLTP systems, Medallion Architecture, SSAS, SSRS, SSIS, Semantic Models, Materialized Views, Data Marts

BI & Visualization:

Power BI (DAX, RLS/OLS, semantic models, KPIs, drill-through), Tableau, Amazon QuickSight, SSRS, New Relic, Splunk

Monitoring, Security & Governance:

Azure Monitor, AWS CloudWatch, Microsoft Sentinel, Azure Key Vault, AWS KMS, Microsoft Defender for Cloud, Microsoft Purview, IAM, Access Policies, Audit Logging, Compliance & Encryption

DevOps & CI/CD:

Terraform, ARM Templates, Bicep, Docker, Kubernetes, GitHub Actions, Jenkins, GitLab, Bitbucket, Azure DevOps, Shell Scripts

Workflow Automation & Scheduling:

Apache Airflow, Azure Logic Apps, AWS Step Functions, Apache Oozie, Cron Jobs, KNIME

Data Formats & Integration:

Avro, Parquet, ORC, JSON, XML, CSV, REST APIs, Snowflake Streams & Tasks, Real-Time Streaming (Kafka, Event Hubs), API Gateway

AI/ML & Analytics Support:

AWS SageMaker, Azure Machine Learning, Faiss, OpenSearch k-NN, Embeddings, Vector Databases, AI-ready data pipelines, Predictive Analytics, Real-Time Anomaly Detection

Education

University of Missouri-Kansas City, Master's in Computer Science, Jan 2023 – Jul 2024

• GPA: 3.7/4.0

Certifications

Microsoft Certified: Azure Data Engineer Associate (DP-203)

AWS Certified Data Analytics – Specialty

Microsoft Certified: Power BI Data Analyst Associate (PL-300)

Publications

Crop Recommendation System, DOI: 10.1109/OCIT53463.2021.00068

Experience

Azure Data Engineer / Fabric Administrator, McKesson, Irving, TX Nov 2024 – Present

• Built end-to-end scalable data pipelines using Azure Data Factory (ADF), Databricks, and Apache Airflow, supporting both batch and real-time ingestion from structured and semi-structured sources.

• Leveraged Delta Lake and Unity Catalog in Azure Databricks to manage schema evolution, enable ACID transactions, and maintain data lineage and access control.

• Implemented Microsoft Fabric workloads including Synapse Data Engineering, Real-Time Analytics, Data Warehouse, Data Science, and Power BI for unified data integration.

• Designed and maintained enterprise data warehouses in Azure Synapse and Snowflake, using star/snowflake schemas and optimizing with clustered columnstore indexes and distribution strategies.

• Developed CDC mechanisms using Snowflake Streams, Tasks, and Python in Databricks to enable real-time change tracking and historical analysis.
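
The bullet above names Snowflake Streams and Tasks for change capture; below is a minimal, hedged sketch of that wiring through the snowflake-connector-python driver (rather than a Databricks notebook). The connection placeholders, object names (etl_wh, raw.orders, analytics.orders_current), and MERGE columns are hypothetical, not the production framework.

import snowflake.connector

# Placeholder credentials; a real deployment would pull these from a vault.
conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="etl_wh", database="demo_db", schema="raw",
)
cur = conn.cursor()

# A stream records row-level changes (inserts/updates/deletes) on the table.
cur.execute("CREATE OR REPLACE STREAM orders_stream ON TABLE raw.orders")

# A task drains the stream on a schedule, but only when changes exist.
cur.execute("""
    CREATE OR REPLACE TASK merge_orders
      WAREHOUSE = etl_wh
      SCHEDULE  = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('RAW.ORDERS_STREAM')
    AS
      MERGE INTO analytics.orders_current t
      USING (SELECT * FROM raw.orders_stream
             WHERE METADATA$ACTION = 'INSERT') s
        ON t.order_id = s.order_id
      WHEN MATCHED THEN UPDATE SET t.status = s.status,
                                   t.updated_at = s.updated_at
      WHEN NOT MATCHED THEN INSERT (order_id, status, updated_at)
                            VALUES (s.order_id, s.status, s.updated_at)
""")
cur.execute("ALTER TASK merge_orders RESUME")  # tasks are created suspended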

• Built real-time streaming pipelines integrating Kafka, Azure Event Hubs, Stream Analytics, and Azure Machine Learning for predictive analytics and anomaly detection.
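
As a small, hedged illustration of the consuming side of such a pipeline, here is a toy anomaly check written against the kafka-python client; the topic, broker address, and static threshold are assumptions standing in for the Event Hubs and Azure Machine Learning pieces named above.

import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "sensor-readings",                   # hypothetical topic
    bootstrap_servers="localhost:9092",  # hypothetical broker
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

THRESHOLD = 100.0  # stand-in for a learned model's decision boundary

for msg in consumer:
    reading = msg.value
    if abs(reading.get("value", 0.0)) > THRESHOLD:
        # A real pipeline would route this to an alert sink or dashboard.
        print(f"anomaly detected: {reading}")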

• Automated CI/CD deployments using Azure DevOps, GitHub Actions, ARM templates, Terraform, and Bicep, enabling smooth and secure production releases.

• Collaborated with business stakeholders to gather and translate complex business requirements into scalable data solutions, ensuring alignment with organizational KPIs and analytics goals.

• Enabled data governance and security via Microsoft Entra ID, Purview, Defender for Cloud, and implemented RLS/OLS policies over Power BI semantic models.

• Integrated Snowflake with Power BI and Azure services for seamless reporting, utilizing features like Time Travel, Multi-Cluster Warehouses, and materialized views.

• Optimized Spark jobs and data transformation logic using PySpark, dbt, and Spark SQL in Databricks, enhancing scalability and performance for large-scale workloads.

• Designed Medallion architecture (bronze, silver, gold layers) on ADLS Gen2 to standardize data processing stages, ensuring quality and traceability.
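
A minimal sketch of one bronze-to-silver hop in that medallion layout, assuming a Databricks-style Spark session with Delta Lake available; the storage account, container, and column names below are hypothetical.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
base = "abfss://lake@mystorageacct.dfs.core.windows.net"  # placeholder ADLS Gen2 path

# Bronze: raw landing zone, schema-on-read, nothing dropped yet.
bronze = spark.read.json(f"{base}/bronze/orders/")

# Silver: typed, deduplicated, quality-filtered records.
silver = (
    bronze
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
    .filter(F.col("order_id").isNotNull())
    .dropDuplicates(["order_id"])
)

silver.write.format("delta").mode("overwrite").save(f"{base}/silver/orders/")

Keeping bronze untouched while cleansing into silver is what gives the layout its traceability: any gold-layer figure can be replayed from the unmodified landing data.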

• Built and monitored data quality, validation, and alerting workflows using Azure Monitor, Logic Apps, Sentinel, and custom Python scripts.

• Created secure secrets management and credential access using Azure Key Vault integrated into pipelines, CI/CD, and Kubernetes workloads.

• Resolved inconsistencies and missing values in critical business data through root cause analysis and collaborative troubleshooting with upstream data producers.

• Used Microsoft Power Apps to develop user-friendly interfaces for triggering data workflows and interacting with Fabric reports and data catalogs.

• Supported both business users and AI-driven systems with data APIs, metadata registries, and vector-ready database schemas for AI/ML workloads.

• Implemented data governance best practices using Microsoft Purview and Glue Data Catalog, including data classification, tagging, and audit logging.

• Worked with complex data formats such as Avro, Parquet, ORC, JSON, and XML while maintaining high-throughput processing and compression efficiency.

• Mentored junior engineers and collaborated with data scientists, analysts, and DevOps teams to deliver AI-ready, governed, and high-performance data infrastructure.

Data Engineer, UMKC – Kansas City, MO Aug 2023 – Jul 2024

• Designed and implemented large-scale data processing solutions using AWS services including EMR (Elastic MapReduce), S3, AWS Glue, Redshift, Lambda, Step Functions, and SageMaker.

• Worked on DevOps pipelines using GitLab and Jenkins for CI/CD automation, ensuring smooth deployment and testing.

• Optimized batch job scheduling and real-time data processing using Apache Airflow, AWS Lambda, and Step Functions, automating data migration and transformation workflows.
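
For illustration, a small nightly batch DAG in the Airflow 2.x TaskFlow style; the DAG id, schedule, and task bodies are placeholder assumptions rather than the actual migration workflows.

from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="0 2 * * *", start_date=datetime(2024, 1, 1), catchup=False)
def nightly_orders_load():

    @task
    def extract() -> str:
        # Placeholder: land a file and return its key.
        return "s3://demo-bucket/raw/orders/latest.parquet"

    @task
    def transform(key: str) -> str:
        # Placeholder: trigger a Glue/EMR job against the landed file.
        return key.replace("/raw/", "/curated/")

    @task
    def load(key: str) -> None:
        print(f"COPY {key} into the warehouse")  # placeholder

    load(transform(extract()))

nightly_orders_load()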

• Optimized data warehouse solutions in AWS Redshift, ensuring seamless data integration from multiple sources.

• Developed scalable data processing integrations using AWS Glue and EMR, ensuring high availability and fault tolerance.

• Designed and maintained relational database models using SQL Server, PostgreSQL, and Oracle, enforcing referential integrity and normalization standards for transactional and analytical workloads.

• Optimized large-scale data pipelines with PySpark/Scala and Spark SQL on AWS Glue and EMR, improving performance and processing efficiency.

• Implemented Snowflake data warehouse solutions on AWS, enhancing data storage and processing capabilities for improved performance and scalability.

• Integrated Snowflake with AWS services like AWS Glue, AWS Lambda, and AWS Redshift, enabling smooth data workflows for ingestion, transformation, and analytics.

• Explored and optimized Spark on AWS EMR, improving performance of existing algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.

• Estimated EMR cluster size, monitored, and troubleshot AWS Glue and Databricks clusters for optimized performance.

• Created Databricks notebooks using SQL, Python, and automated Databricks Jobs for data processing.

• Provisioned and configured high-concurrency Spark clusters on AWS EMR and Databricks, enhancing data preparation efficiency.

• Utilized AWS Redshift for advanced analytics and data warehousing, integrating structured and unstructured data for streamlined data processing.

• Created metadata documentation and data dictionaries to ensure business and technical users clearly understood dataset definitions, KPIs, and lineage.

• Performed SQL-based querying on large datasets in Redshift and Snowflake, enabling real-time analytics and business intelligence.

Data Engineer/BI Developer, OpenText – India Jul 2022 – Dec 2022

• Migrated extensive datasets from on-premises legacy systems to AWS using Amazon S3 and DynamoDB, enhancing data security, archival storage, and accessibility during the SDLC.

• Designed and implemented Redshift clusters integrated with Amazon QuickSight and Power BI, delivering interactive analytics dashboards, drill-through reports, and cross-functional KPI visualizations.

• Integrated Snowflake models and deployed Snowpipes for automated data loading from S3 to Snowflake, streamlining ingestion for BI tools like Power BI and Tableau.

• Built comprehensive semantic models in Power BI using DAX, supporting dynamic aggregations, calculated columns, and time intelligence metrics.

• Created reusable Power BI datasets and report templates for Finance, Sales, and Operations teams, supporting self-service BI across departments.

• Implemented Row-Level Security (RLS) in Power BI to restrict access based on user roles, improving compliance and enabling secure collaboration.

• Developed custom Power BI visuals and enhanced performance using techniques such as aggregations, incremental refresh, and DirectQuery where applicable.

• Transformed traditional workflows into serverless pipelines using AWS Lambda, SNS, SQS, and Step Functions, supporting real-time insights for analytics and automation systems.
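
A hedged sketch of one such serverless hop: a Lambda handler that receives an S3 object-created event and forwards a pointer message to SQS with boto3. The queue URL and bucket layout are hypothetical.

import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ingest-queue"  # placeholder

def handler(event, context):
    records = event.get("Records", [])
    for record in records:
        payload = {
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
            "size": record["s3"]["object"].get("size", 0),
        }
        # Hand the pointer (not the data) to the next stage of the pipeline.
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(payload))
    return {"forwarded": len(records)}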

• Utilized Amazon Athena for querying large datasets in S3 and developed Power BI dashboards based on Athena and Glue Data Catalog sources.

• Built robust ingestion pipelines using AWS Data Pipeline and API Gateway to unify batch and event-driven data flows into central stores like Redshift, S3, and DynamoDB.

• Created and maintained Power BI dashboards for anomaly detection, SLA tracking, and operational health, pulling from APIs, S3, Snowflake, and other structured sources.

• Enabled data enrichment and fast retrieval by integrating high-dimensional vector embeddings into OpenSearch k-NN and Faiss, powering AI-ready dashboards and recommendation engines.
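
To make the retrieval pattern concrete, a self-contained Faiss example that indexes synthetic embedding vectors and runs a nearest-neighbour query; the dimensionality and data are invented for illustration.

import numpy as np
import faiss

dim = 128
rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, dim)).astype("float32")

index = faiss.IndexFlatL2(dim)   # exact L2 search; IVF/HNSW variants scale further
index.add(vectors)

query = rng.standard_normal((1, dim)).astype("float32")
distances, ids = index.search(query, 5)
print(ids[0], distances[0])      # ids and distances of the 5 closest vectors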

• Applied SageMaker and Python (Pandas, NumPy, Seaborn) to generate embeddings for structured and unstructured data, integrating results into Power BI reports for classification and trend prediction.

• Developed and documented data models using Star and Snowflake schemas to support fast, scalable Power BI reporting and enable easy data slicing and drill-down.

• Used Power BI service to configure refresh schedules, alerts, and workspace access policies, ensuring consistent delivery of up-to-date insights.

• Conducted end-to-end data modeling including logical and physical schema designs, leveraging Star and Snowflake schemas for enterprise data warehouses.

• Improved PostgreSQL and Redshift query performance through schema optimization, partitioning, and vacuum management to reduce report load times in Power BI.

• Implemented encryption, monitoring, and compliance via AWS KMS, CloudTrail, and Shield to ensure secure access and protect sensitive analytics data.

• Used AWS Glue Data Catalog to track data lineage and enable metadata tagging, enhancing data discoverability within Power BI's external data source connectors.

• Developed automated data pipelines that ensured timely delivery of cleansed and enriched data to data marts, dashboards, and reporting layers.

• Leveraged CloudFormation and CloudWatch for provisioning, monitoring, and alerting across BI infrastructure, ensuring uptime and rapid issue resolution.

• Performed advanced data manipulation using Python (Pandas, NumPy) and SQL (window functions, CTEs, temp tables) to cleanse, transform, and join large datasets from diverse sources.
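
A toy, runnable illustration of the window-function and CTE style referenced above, using Python's built-in sqlite3 module (window functions need SQLite 3.25+); the table and rows are placeholders.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, month TEXT, revenue REAL);
    INSERT INTO sales VALUES
      ('East','2022-01',100),('East','2022-02',120),
      ('West','2022-01', 90),('West','2022-02',110);
""")

query = """
WITH monthly AS (SELECT region, month, revenue FROM sales)
SELECT region, month, revenue,
       SUM(revenue) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM monthly
ORDER BY region, month;
"""
for row in conn.execute(query):
    print(row)  # each region carries its own running revenue total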

• Partnered with cross-functional teams, including data science, DevOps, and business stakeholders, to develop end-to-end analytics solutions that integrate Power BI, AWS, and Snowflake.

• Provided user training and best practices for Power BI adoption, promoting data literacy and self-service analysis across business units.

Data Analyst / Data Warehouse Developer, Quantiphi – India Aug 2020 – Jul 2022

• Designed and implemented ETL data pipelines using SSIS, Informatica, Azure Data Factory, and Databricks to extract data from SQL Server, Snowflake, Redshift, PostgreSQL, and Oracle for downstream reporting and analysis.

• Developed dashboards and KPI reports in Power BI, Tableau, and Amazon QuickSight to support sales, marketing, and operations teams, enabling real-time decision-making.

• Built and optimized T-SQL, PL/SQL, and SnowSQL queries across Synapse, Snowflake, Teradata, and Athena, reducing query time by 40% through indexing and partitioning strategies.

• Modeled complex business data using Star and Snowflake schemas in Azure SQL Data Warehouse and Snowflake, enabling slice-and-dice reporting and historical trend analysis.

• Created SSAS cubes and implemented measures, KPIs, aggregations, and partitions for multidimensional reporting; used MDX for advanced calculations.

• Developed stored procedures, functions, views, and triggers to support efficient data transformation, integrity, and automation across the enterprise data warehouse.

• Integrated structured and semi-structured data from sources like ADLS, Cosmos DB, Hive, HDInsight, and S3 using formats such as JSON, XML, Avro, and Parquet.

• Automated recurring workflows using Azure Monitor, Logic Apps, AWS CloudWatch, and cron jobs to ensure on-time data delivery and system reliability.

• Performed exploratory data analysis (EDA) and statistical trend analysis using Python (Pandas, NumPy, Seaborn, Matplotlib) to drive business decisions and optimize campaigns.
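
A short EDA sketch in the spirit of that bullet; the input path and column names (date, converted, order_value) are hypothetical assumptions.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("campaign_results.csv")  # placeholder input file

print(df.shape)
print(df.isna().mean().sort_values(ascending=False).head())  # null rates
print(df.describe(include="all").T.head(10))                 # quick profile

# Trend view: weekly conversion rate (assumes 'date' and 'converted' columns).
weekly = (df.assign(week=pd.to_datetime(df["date"]).dt.to_period("W"))
            .groupby("week")["converted"].mean())
weekly.plot(kind="line", title="Weekly conversion rate")
plt.tight_layout()
plt.show()

sns.histplot(df["order_value"].dropna())  # distribution of a key metric
plt.show()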

• Developed OLAP cubes, materialized views, and aggregation tables to support drill-through and ad hoc analysis.

• Implemented Change Data Capture (CDC) for incremental load and historical tracking using SSIS and Snowflake Streams & Tasks.

• Conducted root cause analysis for inconsistent metrics and data quality issues using SQL and Python, resolving long-standing reporting gaps.

• Maintained access control and compliance by integrating Azure Key Vault, AWS KMS, and user-level permissions into data pipelines and reporting layers.

• Created automated reporting frameworks using KNIME and Logic Apps, reducing manual reporting tasks by 60%.

• Collaborated cross-functionally with BI developers, product managers, QA, and DevOps teams to gather requirements, run UAT cycles, and roll out dashboards.

• Built and maintained high-performing, production-ready data marts such as Consolidated Data Store, Actuarial Data Mart, and Reference Database to support analytical use cases.

• Scheduled batch and real-time jobs on Unix/Linux environments, and integrated Git for version control across workflows and SQL code.

• Used Erwin Data Modeler to create dimensional models and document metadata for governance and discovery.


