Sadhana Prathika Nellikanti
Senior Data Engineer | Data Infrastructure & Cloud Data Platform Specialist
Phone: +1-940-***-****
Email: ***************.************@*****.***
PROFESSIONAL SUMMARY
Accomplished Senior Data Engineer with 6+ years of experience designing and delivering distributed, scalable, and high-availability data platforms across enterprise, financial, and consulting domains. Specialized in cloud data infrastructure, data lakehouse modernization, and metadata-driven ETL/ELT pipelines leveraging Azure, Databricks, Apache Spark, Delta Lake, and Synapse. Proven expertise in building real-time streaming pipelines, orchestrating enterprise data integration, and developing robust transformation frameworks supporting analytics, BI, and regulatory reporting. Adept at implementing data governance, lineage, and observability solutions while optimizing big data workloads for performance and cost efficiency. Strong background in automation, CI/CD, and multi-cloud data engineering, with a consistent record of improving data reliability, accelerating delivery, and enabling scalable analytics across diverse business ecosystems.
CORE SKILLS & TECHNICAL EXPERTISE
Data Engineering & Big Data: Apache Spark, PySpark, Spark SQL, Databricks, Delta Lake, Hadoop, Hive, Kafka, Event Hub, Pub/Sub, Apache Airflow, Apache NiFi, Delta Live Tables, dbt (Data Build Tool), Fivetran, Great Expectations, Apache Deequ, Presto/Trino, HBase, Sqoop, Oozie
Cloud Platforms: Azure (ADF, ADLS Gen2, Synapse, Databricks, Azure SQL), GCP (BigQuery, Dataproc, Composer), AWS (S3, Glue basics), Azure Key Vault, Azure Functions, AWS Lambda, GCP Dataflow, Kubernetes (Basic)
ETL/ELT & Orchestration: Azure Data Factory, Airflow/Cloud Composer, AWS Glue, Modular ETL Frameworks, Streaming ETL, Batch Pipelines
Data Warehousing & Modeling: Synapse Analytics, Snowflake, BigQuery, Star & Snowflake Schema, Dimensional Modeling, OLAP Optimization
Programming: Python, SQL, Scala, REST API Integrations, Pandas, NumPy
DevOps & CI/CD: Git, GitHub, Azure DevOps, Jenkins, Terraform (fundamentals), Docker, GitHub Actions, YAML
Monitoring & Observability: Azure Monitor, Log Analytics, CloudWatch, Databricks Metrics
BI & Analytics: Power BI, Tableau, SSRS
PROFESSIONAL EXPERIENCE
Role: Senior Data Engineer
Thrivent Finance — Dallas, Texas | March 2024 – Present
Responsibilities:
• Designed and deployed scalable end-to-end ETL/ELT pipelines using Azure Data Factory, Databricks, and Synapse to process financial, policy, and claims data from diverse source systems.
• Architected and implemented a comprehensive medallion data lake structure (Raw → Bronze → Silver → Gold) using ADLS Gen2 and Delta Lake to support high-performance analytics and governance.
• Developed advanced PySpark frameworks for large-scale data cleansing, transformation, schema evolution, incremental processing, and data quality validation.
• Built and optimized real-time streaming pipelines using Event Hub and Structured Streaming for fraud detection, real-time reporting, and low-latency financial monitoring.
• Implemented Delta Lake ACID transactions, versioning, partitioning, and ZORDER optimization to enhance data reliability and query performance (illustrative sketch after this list).
• Created analytics-ready fact/dimension models, materialized views, and performance-optimized datasets in Synapse for business intelligence and CFO-level dashboards.
• Automated data quality checks, reconciliation logic, audit frameworks, and exception handling to ensure accuracy across multiple layers.
• Developed CI/CD pipelines using Azure DevOps for automated deployment of notebooks, SQL objects, and ADF pipelines across environments.
• Built monitoring dashboards and logging frameworks using Azure Monitor and Databricks, improving data pipeline stability and SLA compliance.
• Collaborated with finance SMEs, analysts, and governance teams to define KPIs, SLAs, documentation, and compliance standards.
• Implemented metadata-driven and parameterized ETL frameworks to standardize ingestion across environments.
• Designed distributed and fault-tolerant data pipelines ensuring autoscaling and high availability.
• Enabled advanced data lineage and governance through integration with Azure Purview / metadata catalogs.
• Optimized Lakehouse performance using AQE, ZORDER, optimized joins, and cluster autoscaling.
• Established automated orchestration workflows using Databricks Workflows and ADF dependencies.
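The following is a minimal PySpark sketch of the incremental Delta Lake merge and ZORDER optimization pattern referenced in the list above; the storage paths, table names, key column (claim_id), and watermark column (ingest_ts) are hypothetical placeholders rather than project details.

```python
# Illustrative sketch only: incremental Bronze -> Silver upsert with Delta Lake.
# Paths, table names, and columns (claim_id, ingest_ts) are hypothetical placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read only Bronze records newer than the last processed watermark
# (normally tracked in an audit/control table).
last_watermark = "2024-01-01T00:00:00"
bronze_df = (
    spark.read.format("delta")
    .load("abfss://bronze@<storageaccount>.dfs.core.windows.net/claims")
    .where(F.col("ingest_ts") > F.lit(last_watermark))
)

# Upsert the incremental batch into the Silver table on the business key.
silver = DeltaTable.forPath(
    spark, "abfss://silver@<storageaccount>.dfs.core.windows.net/claims"
)
(
    silver.alias("t")
    .merge(bronze_df.alias("s"), "t.claim_id = s.claim_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Compact small files and co-locate rows on a common filter column for faster reads.
spark.sql(
    "OPTIMIZE delta.`abfss://silver@<storageaccount>.dfs.core.windows.net/claims` "
    "ZORDER BY (claim_id)"
)
```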
Role: Data Engineer
PwC — Hyderabad, India | December 2020 – July 2023
Responsibilities:
• Designed and executed multi-cloud ETL/ELT solutions across Azure, AWS, and GCP to standardize enterprise data integration for multiple clients.
• Developed PySpark-based batch and streaming pipelines for high-volume data transformations, cleansing, enrichment, and business logic execution.
• Implemented lakehouse architectures using ADLS, S3, and GCP Storage with structured zone layers (Raw → Refined → Curated) ensuring scalability and governance.
• Engineered Delta Lake and Parquet data models with schema enforcement, ACID reliability, and optimized file structures.
• Built enterprise data warehouse models in Synapse, Snowflake, and BigQuery including facts, dimensions, materialized views, and performance-optimized SQL queries.
• Orchestrated complex workflows using ADF, AWS Glue, and Cloud Composer, incorporating reusable, parameterized, and modular ETL components.
• Implemented event-driven streaming pipelines using Kafka, Event Hub, and Pub/Sub for real-time analytics and operational reporting (see the sketch after this list).
• Developed CI/CD automation using Azure DevOps, GitHub Actions, and Jenkins for seamless code deployment across multi-environment pipelines.
• Conducted extensive performance tuning of Spark jobs, SQL queries, and MPP warehouse models, achieving up to 60% faster processing.
• Delivered Power BI and Tableau dashboards enabling regulatory reporting, compliance monitoring, and executive-level insights.
• Ensured strong data governance via RBAC, encryption, data masking, PII handling, and audit logging in line with PwC standards.
• Built reusable transformation modules and standardized pipeline components to accelerate multi-client delivery.
• Designed enterprise data integration frameworks supporting cross-cloud migration (Azure, AWS, GCP).
• Developed streaming-first architectures enabling near real-time data pipelines for operational analytics.
• Implemented advanced data validation rules using Great Expectations and custom Python frameworks.
• Drove cloud cost optimization through cluster tuning, job consolidation, and storage optimization.
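Below is a minimal sketch of the kind of event-driven Kafka-to-Delta streaming pipeline described in the list above; the broker address, topic name, JSON payload schema, and mount paths are hypothetical placeholders.

```python
# Illustrative sketch only: Kafka -> Delta streaming pipeline with Structured Streaming.
# Broker, topic, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.getOrCreate()

event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Subscribe to the source topic; Kafka delivers the payload as raw bytes.
raw_events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "latest")
    .load()
)

# Parse the JSON value into typed columns and bound state with a watermark.
parsed = (
    raw_events.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withWatermark("event_ts", "10 minutes")
)

# Append the parsed stream to a Delta table, with checkpointing so the job can recover.
query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/orders")
    .outputMode("append")
    .start("/mnt/refined/orders")
)
```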
Role: Data Engineer
Surifiy — India | February 2019 – November 2020
Responsibilities:
• Developed cloud-based ETL/ELT pipelines to integrate APIs, third-party applications, relational systems, and semi-structured datasets.
• Built and optimized PySpark pipelines for large-scale data normalization, transformation, aggregation, and enrichment.
• Designed data lake architectures using ADLS and S3 with standardized storage layers for efficient ingestion and processing.
• Created Delta Lake and Parquet datasets with partitioning, schema evolution, and ACID enforcement for reliable analytics.
• Implemented real-time data ingestion using Kafka/Event Hub and streaming pipelines for operational analytics.
• Built analytical data models using SQL Server, Snowflake, and BigQuery including optimized fact/dimension structures.
• Implemented data quality checks, error-handling, automated alerts, logging, and monitoring frameworks.
• Performed Spark and SQL performance tuning using caching, broadcast joins, file compaction, and cluster optimization.
• Delivered executive dashboards and curated datasets in Power BI/Tableau for business insights.
• Ensured compliance and security through role-based access, encryption, and secure data handling practices.
• Developed robust metadata-driven ingestion patterns improving scalability and maintainability (sketch after this list).
• Built automated monitoring and observability solutions enabling anomaly detection and SLA management.
• Designed lakehouse models with advanced schema evolution and ACID transaction handling.
• Performed workload optimization reducing pipeline runtime and cloud compute costs.
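A small sketch of the metadata-driven ingestion pattern referenced in the list above; the control-list entries, JDBC source, credentials, and raw-zone paths are hypothetical placeholders, and in practice the password would come from a secret store such as Key Vault rather than being hard-coded.

```python
# Illustrative sketch only: configuration-driven ingestion loop landing source tables
# in the raw zone. Config entries, JDBC source, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# In practice this metadata would live in a control table or JSON/YAML config file.
ingestion_config = [
    {"source_table": "sales.orders",    "target_path": "/mnt/raw/orders"},
    {"source_table": "sales.customers", "target_path": "/mnt/raw/customers"},
]

jdbc_url = "jdbc:sqlserver://source-db:1433;databaseName=sales"

for entry in ingestion_config:
    # Pull each configured source table over JDBC and land it as Parquet in the raw zone.
    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", entry["source_table"])
        .option("user", "etl_user")
        .option("password", "<secret-from-key-vault>")
        .load()
    )
    df.write.mode("overwrite").parquet(entry["target_path"])
```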
EDUCATION
Bachelor of Technology (B.Tech) – Computer Science
TOOLS & TECHNOLOGIES
ADF, Databricks, Synapse, Spark, Kafka, Event Hub, Snowflake, BigQuery, SQL Server, Terraform, Git, Azure DevOps, Jenkins, Delta Lake, Parquet, Avro, Power BI, Tableau, Cloud Composer