
Senior Data Engineer & Architect - Multi-Cloud Lakehouse Expert

Location:
Voorhees, NJ
Posted:
April 05, 2026


Yadaiah Goud Chinnaveeranna

Sr. Data Engineer / Data Architect · Multi-Cloud (Azure & AWS) · Lakehouse & Data Governance · 18+ Years
New Jersey, USA · +1-908-***-**** · ******@*****.*** · linkedin.com/in/yadaiah-goud-chinnaveeranna-89953060

PROFESSIONAL SUMMARY

I've spent 18 years in data - starting with enterprise integration and BizTalk, moving into cloud data warehousing, and spending the last six years deep in Databricks and the modern Lakehouse stack. Most of my recent work has been on AWS and Azure Databricks, writing PySpark and SQL every day across full Medallion pipelines. I enjoy the full lifecycle: designing the architecture, building the pipelines, tuning performance, and getting governance right. In the past year I've picked up Microsoft Fabric hands-on - OneLake, DirectLake, Eventstream, Fabric Pipelines - and done real implementation work with Microsoft Purview for enterprise governance. On the side I've been building with LangChain, FAISS, and the Claude API, mostly to understand where Agentic AI fits into data platform work.

DATABRICKS & DELTA LAKE - 6+ YEARS HANDS-ON

• Designed and built Medallion Architecture (Bronze / Silver / Gold) from scratch on both AWS and Azure - Auto Loader for ingestion, PySpark and Spark SQL for transformation, Gold-layer aggregations serving Power BI and Athena.

• Day-to-day work includes PySpark DataFrame API, window functions (ROW_NUMBER, RANK, LAG/LEAD), multi-table joins, UDFs, and broadcast joins across healthcare, pharma, education, and energy datasets.
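A minimal sketch of the "latest row per key" pattern referenced above, in pure Python rather than PySpark so it stands alone (record fields are hypothetical):

```python
from itertools import groupby
from operator import itemgetter

def latest_per_key(rows, key, order_by):
    """Keep the most recent row per key -- the pure-Python analogue of
    row_number().over(Window.partitionBy(key).orderBy(desc(order_by))) == 1."""
    rows = sorted(rows, key=itemgetter(key))
    out = []
    for _, grp in groupby(rows, key=itemgetter(key)):
        # max over the ISO-8601 timestamp string picks the latest version
        out.append(max(grp, key=itemgetter(order_by)))
    return out

records = [
    {"patient_id": 1, "updated_at": "2024-01-01", "status": "A"},
    {"patient_id": 1, "updated_at": "2024-03-01", "status": "B"},
    {"patient_id": 2, "updated_at": "2024-02-01", "status": "C"},
]
latest = latest_per_key(records, "patient_id", "updated_at")
```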

• Set up Unity Catalog on new and migrated workspaces - catalog/schema hierarchy, GRANT/REVOKE SQL, row-level filters, and column masking for PII protection.
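As a sketch of the Unity Catalog DDL this kind of setup generates (illustrative only; catalog, table, and mask-function names are hypothetical, and real workspaces run these via spark.sql):

```python
def uc_grants(catalog, schema, group, pii_columns):
    """Generate illustrative Unity Catalog GRANT and column-mask statements."""
    stmts = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`",
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO `{group}`",
    ]
    for col in pii_columns:
        # attach a masking function to each PII column
        stmts.append(
            f"ALTER TABLE {catalog}.{schema}.students "
            f"ALTER COLUMN {col} SET MASK {catalog}.{schema}.mask_pii"
        )
    return stmts

stmts = uc_grants("main", "gold", "analysts", ["ssn", "dob"])
```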

• Built Structured Streaming pipelines (readStream / writeStream / foreachBatch) consuming CDC events from Kinesis and Event Hubs into Delta Lake with exactly-once semantics.
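The exactly-once effect of a keyed foreachBatch MERGE can be sketched in plain Python - the real Databricks batch body calls DeltaTable.merge, but this stand-in shows why replaying a micro-batch after a failure leaves the target unchanged:

```python
def upsert_batch(target, batch, key="id"):
    """Idempotent MERGE-style upsert: replays of the same micro-batch
    overwrite the same keys instead of appending duplicates."""
    for row in batch:
        target[row[key]] = row
    return target

target = {}
batch = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
upsert_batch(target, batch)
upsert_batch(target, batch)  # replay after a checkpoint recovery: no duplicates
```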

• SCD Type 1 and Type 2 using Delta Lake MERGE INTO - full history with effective/expiry dates and is_current flag across master data domains.
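A simplified pure-Python version of the SCD Type 2 logic above - mirroring the Delta Lake MERGE INTO ... WHEN MATCHED / WHEN NOT MATCHED pattern (field names are hypothetical):

```python
from datetime import date

def scd2_apply(dim, incoming, key, today=None):
    """Minimal SCD Type 2: expire the current row when attributes change,
    then insert the new version with is_current=True."""
    today = today or date.today().isoformat()
    current = {r[key]: r for r in dim if r["is_current"]}
    for row in incoming:
        old = current.get(row[key])
        if old and old["attrs"] == row["attrs"]:
            continue  # unchanged record: nothing to do
        if old:
            old["is_current"] = False
            old["expiry_date"] = today
        dim.append({key: row[key], "attrs": row["attrs"],
                    "effective_date": today, "expiry_date": None,
                    "is_current": True})
    return dim

dim = []
scd2_apply(dim, [{"school_id": 7, "attrs": {"name": "Lincoln"}}], "school_id", "2024-01-01")
scd2_apply(dim, [{"school_id": 7, "attrs": {"name": "Lincoln HS"}}], "school_id", "2024-06-01")
```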

• Performance tuning via Spark UI: shuffle bottlenecks, data skew, OPTIMIZE/ZORDER, executor memory, join strategy selection.

• Built Delta Live Tables pipelines with @dlt.table definitions and data quality expectations at each Medallion layer to quarantine bad records before Gold.
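The quarantine pattern above can be sketched without the dlt module - a plain-Python stand-in for @dlt.expect_or_drop-style expectations (rules and fields are hypothetical):

```python
def apply_expectations(rows, expectations):
    """Route rows failing any expectation to a quarantine list
    instead of letting them reach the Gold layer."""
    good, quarantined = [], []
    for row in rows:
        if all(check(row) for check in expectations.values()):
            good.append(row)
        else:
            quarantined.append(row)
    return good, quarantined

expectations = {
    "valid_score": lambda r: 0 <= r["score"] <= 500,
    "has_school": lambda r: r.get("school_id") is not None,
}
good, bad = apply_expectations(
    [{"score": 250, "school_id": 1}, {"score": 900, "school_id": 2}],
    expectations,
)
```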

• Designed cross-cloud pipelines reading DynamoDB Streams via Kinesis in AWS Databricks, writing to Delta Lake on Azure - handling schema evolution and cross-cloud consistency.

AZURE DATA FACTORY

• Designed enterprise ADF pipelines for end-to-end ERP ingestion - metadata-driven, parameterized frameworks to avoid repeating pipeline work for every new source.

• ADF + Databricks integration: passing runtime parameters dynamically to notebooks and jobs for complex transformation workloads.

• Incremental load patterns using watermark tables, change tracking, and last-modified timestamps for efficient ERP extraction.
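The watermark pattern above reduces to building an extract query from the last successful high-water mark - a minimal sketch (table and column names are hypothetical; real pipelines read and update the watermark in a control table):

```python
def incremental_query(table, watermark_col, last_watermark):
    """Build the incremental-extract query issued against the ERP source,
    pulling only rows modified since the last successful load."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_col} > '{last_watermark}' "
        f"ORDER BY {watermark_col}"
    )

q = incremental_query("erp.sales_orders", "last_modified", "2024-05-01T00:00:00")
```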

• Managed Integration Runtimes (Azure IR and Self-Hosted IR) for on-premises ERP systems behind corporate firewalls.

• Error handling, retry logic, alerting, and monitoring via ADF Monitor and Azure Monitor.

DATABRICKS DEPLOYMENT - TERRAFORM & AZURE CI/CD

• IaC using Terraform (Databricks provider) to deploy notebooks, DLT pipelines, and Workflow Jobs - modular structure with independently versioned modules per artefact type.

• Remote state via S3 backend with DynamoDB state locking for safe concurrent deployments across dev, staging, and production.

• DLT pipeline modules provision Delta Live Tables with configurable catalog, target schema, IAM instance profiles, and S3/SQS or Kinesis source connectivity.

• Job module creates Databricks Workflow Jobs, wiring DLT pipeline IDs as task dependencies with scheduled execution, failure notifications, and RBAC permissions.

• Environment-aware defaults - production gets high-throughput streaming settings; non-production uses conservative resource allocation automatically.

• Azure CI/CD via ARM templates for ADF, Databricks, ADLS Gen2, Event Hubs, Service Bus, Logic Apps, Functions, and Key Vault. Notebooks deployed via Databricks CLI inside Azure Pipelines.

AGENTIC AI & RAG - HANDS-ON WORK

• Built a working RAG agent with LangChain - document loading, chunking, HuggingFace embeddings (all-MiniLM-L6-v2), FAISS vector store, and agent-driven retrieval. Runs against both Groq (Llama 3.1) and local Ollama (Llama 3.2) using the same codebase.
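The chunking step of that RAG pipeline can be sketched in a few lines - the same fixed-size-with-overlap idea LangChain's text splitters implement before embedding into FAISS (simplified; sizes are illustrative):

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into fixed-size chunks with overlap so context
    spanning a chunk boundary is not lost at retrieval time."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step back by the overlap each time
    return chunks

chunks = chunk_text("x" * 1200)
```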

• Implemented the ReAct agent loop using llm.bind_tools - the agent decides each turn whether to search the knowledge base, answer directly, or summarize, with full conversation history passed back each time.
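The ReAct loop reduces to: each turn the model returns either a tool call or a final answer, and tool results are fed back into the history. A minimal stand-in for the llm.bind_tools flow, with a stubbed model so it runs without any API (all names are hypothetical):

```python
def react_loop(llm, tools, question, max_turns=5):
    """Minimal ReAct-style loop: the model decides each turn whether to
    call a tool or answer directly, with full history passed back."""
    history = [("user", question)]
    for _ in range(max_turns):
        action = llm(history)  # {"tool": name, "input": ...} or {"answer": ...}
        if "answer" in action:
            return action["answer"]
        result = tools[action["tool"]](action["input"])
        history.append(("tool", result))
    return "max turns exceeded"

# Stubbed model: search the knowledge base once, then answer from the result.
def fake_llm(history):
    if history[-1][0] == "user":
        return {"tool": "search", "input": history[-1][1]}
    return {"answer": f"Based on: {history[-1][1]}"}

answer = react_loop(fake_llm, {"search": lambda q: "doc snippet"}, "what is X?")
```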

• MathSnake: personal side project combining a Snake game with an AI math tutor powered by the Claude API, aimed at Grades 4–7. Adjusts difficulty dynamically and explains answers in plain language - built mainly to get practical experience with the Claude API and prompt engineering.

TECHNICAL SKILLS

Primary Stack: Azure Databricks, AWS Databricks, Medallion Lakehouse, Unity Catalog, Delta Lake, PySpark, Python, SQL, Azure Synapse Analytics, Azure Data Factory, AWS Kinesis/S3/Glue/Athena, Microsoft Fabric, Microsoft Purview

Cloud - Azure: ADF, ADLS Gen2, Azure Databricks, Synapse Analytics, Event Hubs, Service Bus, Logic Apps, Azure Functions, API Management, Key Vault, Microsoft Fabric, Purview, Azure DevOps

Cloud - AWS: Databricks, S3, Glue (ETL & Catalog), Athena, Kinesis, DynamoDB, Lambda, Step Functions, EventBridge, QuickSight, CloudWatch, SNS, SES, IAM, EC2, VPC

Data Architecture: Medallion (Bronze/Silver/Gold), Lakehouse, Data Warehouse, Star/Snowflake Schema, ERP Data Modeling, MDM, SCD Type 1/2, CDC

Governance: Unity Catalog, Microsoft Purview, Row-Level Security, Column Masking, Dynamic Data Masking, ABAC, Azure AD, Databricks Secret Scopes

Streaming: Databricks Structured Streaming, AWS Kinesis, Azure Event Hubs, DynamoDB Streams, Delta Live Tables, Auto Loader

Scripting & IaC: PySpark, Python, Scala, Pandas, NumPy, PowerShell, YAML, Terraform, JSON

Agentic AI / LLM: LangChain, LangGraph, Agentic RAG, FAISS, ChromaDB, HuggingFace Embeddings, Groq API, Ollama, Claude API, ReAct pattern, Vector Search, Prompt Engineering

Reporting: Power BI, DirectLake, Synapse Serverless SQL, QuickSight

Integration: Microsoft BizTalk (2006 R2–2013 R2), Azure iPaaS, ESB Toolkit, BAM, BRE, WCF, REST API, GraphQL, OData

DevOps: Azure DevOps, GitHub, Jenkins, Ansible, Terraform, Docker, ARM Templates

Databases: SQL Server 2005–2019, Azure SQL, Oracle, DynamoDB

Formats: Parquet, Delta, JSON, CSV, XML, XSLT, EDI, Flat Files

CERTIFICATIONS

• DP-203 - Azure Data Engineer Associate

• Azure Solutions Architect Expert (AZ-300, AZ-301)

• MCTS: .NET Framework 2.0 - Web & Windows Applications

• 70-595: Developing Business Process and Integration Solutions - BizTalk Server 2010

WORK HISTORY

Organization | Role | Duration
IPivot | Sr. Data Engineer / Data Architect | Dec 2022 – Present
Maestro Technologies | Sr. Data Engineer / Data Architect | Dec 2021 – Dec 2022
Tech Mahindra Ltd | Technical Architect | Mar 2021 – Dec 2021
Tech Mahindra Ltd | Lead Associate | Oct 2014 – Mar 2021
Infosys Ltd | Technical Lead | Jan 2011 – Oct 2014
PMAM IT Services | Technology Specialist | Oct 2009 – Jan 2011
A-J Technologies | Sr. Software Engineer | Apr 2007 – Oct 2009
Innovative Software Solutions | Software Engineer | May 2006 – Mar 2007

PROJECT PROFILE

NAEP - National Assessment of Educational Progress
Client: ETS (Educational Testing Service), USA · IPivot · Team: 40 · Duration: Dec 2022 – Present
Stack: AWS Databricks, Delta Live Tables, Unity Catalog, S3, DynamoDB, Kinesis, Lambda, CloudWatch, EventBridge, Glue, Athena, QuickSight, SNS, Step Functions, Microsoft Fabric, Microsoft Purview

ETS administers national-level assessments across thousands of US schools. I designed and delivered the cloud data platform on AWS Databricks end-to-end, and later led the evaluation of Microsoft Fabric and Purview as additional governance and analytics layers.

• Designed the full data platform - ingestion through Gold-layer aggregations powering QuickSight dashboards and Athena ad-hoc reporting.

• Implemented Medallion Architecture using Delta Live Tables with @dlt.table and data quality expectations at each layer to catch bad records before Gold.

• Configured Unity Catalog - catalog/schema hierarchy, team-level GRANT/REVOKE, and column masking to protect student PII.

• Real-time pipeline consuming DynamoDB change events via Kinesis into Structured Streaming, landing in Delta Lake with exactly-once semantics.

• SCD Type 2 for school and student master data using Delta Lake MERGE INTO with effective_date, expiry_date, and is_current tracking.

• Performance tuning: Spark UI analysis, OPTIMIZE/ZORDER on hot Delta tables, Auto Loader checkpoint optimization.

• Led hands-on Microsoft Fabric evaluation - assessed OneLake, DirectLake mode for Power BI, Fabric Pipelines as an ADF migration path, and Eventstream vs. Kinesis + Structured Streaming.

• Implemented Microsoft Purview for cataloging, lineage tracking, and sensitivity classification, integrated with Unity Catalog governance.

Microsoft Fabric & Purview - Enterprise Analytics Modernization
Client: IPivot (Internal / ETS Advisory) · Team: 3 · Duration: 2024
Stack: Microsoft Fabric, OneLake, Fabric Lakehouse, Fabric Data Warehouse, DirectLake, Eventstream, Fabric Pipelines, Microsoft Purview, Power BI

Hands-on evaluation of Microsoft Fabric as a unified analytics platform - understanding where it fits alongside existing Databricks and Synapse deployments, and standing up Purview as the governance layer.

• Evaluated OneLake, Fabric Lakehouse, and Fabric Data Warehouse - tested interoperability with existing Delta Lake tables.

• Benchmarked DirectLake mode for Power BI against Import and DirectQuery - DirectLake showed material query performance improvements on large national assessment datasets.

• Explored Eventstream as a streaming alternative to Kinesis + Databricks Structured Streaming for near real-time reporting.

• Evaluated Fabric Pipelines as an ADF migration pathway for ERP ingestion workloads - documented trade-offs and effort estimates.

• Connected Purview to both Fabric and Databricks Unity Catalog for cross-platform lineage and sensitivity classification.

• Produced a practical trade-off analysis comparing Fabric vs. Synapse + Databricks with migration recommendations tied to the enterprise roadmap.

RedRiver & SourceCode - IT Services Integration

Client: Cerberus Technologies (Maestro), USA · Team: 4 · Duration: Dec 2021 – Dec 2022
Stack: Azure Data Factory, Azure Data Lake Storage, Docker, Azure Container Registry, REST APIs, GraphQL, PowerShell, Azure DevOps

Cerberus integrates newly acquired businesses into their IT ecosystem. The work here was mainly about pulling PO, Sales Order, and Inventory data from vendors - Cisco, Ingram, Synnex - into a unified platform.

• Metadata-driven ADF framework for vendor data ingestion - parameterized so new sources can be onboarded without writing new pipelines each time.

• Python REST API services connecting vendor systems, containerized with Docker and deployed through Azure Container Registry.

• SQL transformation scripts to reconcile RedRiver and SourceCode schemas into the Cerberus data model - preceded by data profiling to surface issues before building fixes.

• Azure DevOps CI/CD with environment-staged Docker deployments across dev, staging, and production.

CHNA Analytics & Next Best Action (NBA)

Client: GlaxoSmithKline (GSK), USA · Role: Lead Data Architect · Team: 6 · Duration: Aug 2019 – Dec 2021
Stack: Azure Databricks, ADF, SSIS, Logic Apps, ADLS Gen2, Service Bus, Azure Functions, Power BI, AWS S3, Azure SQL DW, GitHub, Azure DevOps, Google Analytics

GSK is a global pharma company. This project improved revenue management by integrating contract, revenue, and compliance data across manufacturers, wholesalers, trading partners, and pharmacies.

• PySpark ETL pipelines in Azure Databricks processing pharma contract, revenue, and compliance data from mixed file formats - incremental load via watermark timestamps.

• Data access managed through Databricks Table ACLs and Key Vault secret scopes - PII masking and hashing in PySpark notebooks for HIPAA compliance (Unity Catalog wasn't available at this point).

• Complex revenue and rebate calculations using PySpark window functions across manufacturers, wholesalers, and pharmacies, landing results in ADLS Gen2 and AWS S3.

• SCD Type 1 and Type 2 for trading partner master data using PySpark DataFrame operations on Hive-backed Parquet tables.

• Maintained SSIS packages bridging legacy on-premises flat-file ingestion with the cloud pipeline until full ADF migration was complete.

• Automated CI/CD via Azure DevOps; used Google Analytics data to support demand planning analytics.

Chevron Oil and Gas - Enterprise Integration

Client: Chevron, USA · Role: Technical Analyst / Data Architect · Team: 13 · Duration: Nov 2018 – Aug 2019
Stack: Azure Databricks, BizTalk Server, Logic Apps, Service Bus, Event Grid, ADF, Azure Functions, API Management, Jenkins, Ansible, Azure DevOps

• Built real-time streaming services for user session processing using PySpark and Spark SQL on Azure Databricks.

• Automated ETL processes - pipeline optimization and parallelization cut data processing time by roughly 40%.

• Implemented failover and disaster recovery for Chevron applications to maintain production availability.

• Integrated Databricks with SRA, SART, Credit, and ZEMA operational systems for unified analytics; BizTalk Server used for data flow orchestration.

EARLIER PROJECTS (2006 – 2018)

Logic Apps Lead - Survitec Group, UK May 2018 – Nov 2018

• Architected an Azure iPaaS integration framework using Logic Apps, Event Grid, Service Bus, Function Apps, and API Management for a global marine safety manufacturer.

Logic Apps Lead - WSS Warehouse Management, Singapore Nov 2017 – May 2018

• Led integration of IFS ERP with Dynamics 365 WMS via Azure Logic Apps, Service Bus, and API Management for global shipping operations.

BizTalk SME & Migration - BaneDanmark Railway, Denmark Feb 2017 – Oct 2017

• Led migration of 22 BizTalk integrations across 30 systems for Denmark's national railway timetabling platform from on-premises to PaaS.

BizTalk Tech Lead - Wilhelmsen Shipping Services, Norway Jul 2016 – Jan 2017

• Led BizTalk development for global shipping financial transactions, supply chain, and inventory across 2,200 ports in 125 countries.

EDUCATION

Degree | Specialization | University | Year
B.Tech | Electrical & Electronics (EEE) | JNTU | 2006
Intermediate | MPC | BOI | 2002
SSC | Regular | SSC Board | 2000

PERSONAL DETAILS

Date of Birth: October 18, 1983
Languages: English, Telugu, Hindi


