Uday B
972-***-**** *********@*****.***
SUMMARY
Software Engineer with 10+ years specializing in Data Engineering, AI/ML infrastructure, and large-scale distributed data systems, delivering cloud-native, fault-tolerant platforms across the full SDLC within Agile frameworks.
Deep expertise in the Software Development Life Cycle (SDLC) - requirements, design, coding, automated testing, CI/CD, deployment, and post-production observability - ensuring reliability, maintainability, and auditability of data platforms.
Designed and implemented multi-cloud lakehouse ecosystems across AWS (S3, Glue, Redshift, EMR, Athena, Kinesis, Lake Formation), GCP (BigQuery, Dataflow, Dataproc, Pub/Sub, Vertex AI, Dataform), and Azure (Synapse, ADF, Databricks, Event Hubs, Purview) using Delta Lake, Apache Iceberg, and Apache Hudi with ACID guarantees and cross-region replication.
Engineered petabyte-scale pipelines for batch and streaming workloads using Apache Spark (PySpark, Scala), Apache Flink, Ray Data, and Kafka Streams, orchestrated with Airflow, Prefect, and Dagster to achieve >5 TB/day throughput and exactly-once processing.
Built real-time ingestion and CDC frameworks with Kafka Connect, Debezium, AWS DMS, Fivetran, Airbyte, and StreamSets, streaming OLTP data from PostgreSQL, MySQL, and Oracle into Snowflake, BigQuery, and Synapse with schema evolution via Schema Registry and Protobuf/Avro contracts.
Automated ELT/ETL transformations through dbt (Core & Cloud), SQLMesh, and Dataform, embedding modular SQL, data testing, lineage graphs, and documentation synchronized with DataHub and Marquez/OpenLineage.
Applied advanced data-modeling techniques (Kimball, Data Vault 2.0, Data Mesh) to design domain-specific data marts and semantic layers optimized for analytics and AI.
Programmed data services and utilities in Python 3.11, SQL, Scala, Java, and Go, leveraging Pandas, Polars, NumPy, Dask, and PySpark, and exposing REST/gRPC APIs for cross-service data access.
Containerized workloads with Docker and orchestrated via Kubernetes (EKS, GKE, AKS) using Helm and Argo Workflows for scalable Spark, Flink, and ML jobs with autoscaling and cost-aware resource allocation.
Provisioned reproducible multi-cloud infrastructure using Terraform + Terragrunt, Crossplane, and Pulumi, establishing network peering, IAM federation, secret management (KMS, Key Vault, Secret Manager), and policy enforcement through OPA, Azure Policy, and GCP Org Policies.
Established CI/CD and GitOps pipelines with GitHub Actions, Jenkins, CircleCI, and Argo CD, automating testing, build, and deployment of data and ML pipelines.
Delivered real-time analytics with Kinesis Data Analytics (Apache Flink), Dataflow, and Materialize, enabling sub-second stream aggregations and reactive dashboards.
Engineered AI/ML feature pipelines and feature stores using Feast, Tecton, Vertex AI Feature Store, and SageMaker Feature Store, synchronizing offline Parquet datasets with online Redis-backed stores for inference consistency.
Implemented MLOps automation using Kubeflow Pipelines, Vertex AI Pipelines, SageMaker Pipelines, and MLflow, handling experiment tracking, model versioning, approval workflows, and model registry integrations.
Built vector-data and retrieval-augmented-generation (RAG) pipelines with LangChain, LlamaIndex, and OpenAI / Cohere / Anthropic embeddings, persisted in Pinecone, Weaviate, Milvus, Chroma, and Qdrant, and exposed through FastAPI + gRPC for semantic search and generative-AI inference.
Integrated data-versioning, lineage, and reproducibility using DVC, LakeFS, DataHub, Apache Atlas, and OpenLineage, linking datasets, pipeline runs, and ML artifacts for end-to-end traceability.
Enforced data quality and observability with Great Expectations, Soda Core, Monte Carlo, Bigeye, and Deequ, integrating validations in orchestration DAGs and surfacing metrics to Prometheus, Grafana, and Datadog.
Established comprehensive data governance and compliance using Collibra, Alation, AWS Lake Formation, Azure Purview, and GCP Data Catalog; implemented column-level security, tokenization, and masking with Apache Ranger and Macie / DLP API.
Designed and tuned warehouse and query performance using Snowflake micro-partitioning, BigQuery slot scheduling, Synapse materialized views, Z-ORDER clustering, and compression (ZSTD/Snappy), achieving 30-40% cost savings.
Delivered synthetic-data generation pipelines using Gretel.ai, Mostly AI, and Synthea, validating statistical parity and privacy guarantees for ML training and testing.
Implemented model- and data-drift monitoring via Evidently AI, Arize AI, WhyLabs, and Neptune.ai, triggering automated retraining and redeployment through Pub/Sub + Airflow sensors.
Created real-time dashboards and federated queries using Apache Superset, Grafana Loki, Trino / Starburst, DuckDB, and ClickHouse, enabling low-latency analytics across Snowflake, BigQuery, and Synapse.
Adopted Data Mesh and Data Fabric principles—domain-owned data products, federated governance, shared metadata graph, and automated lineage via OpenMetadata—to scale decentralized data ownership.
Led and mentored cross-functional teams in data-centric SDLC, unit/integration testing, code reviews, performance tuning, Agile ceremonies, and architecture reviews across multi-region deployments.
Delivered resilient, secure, cost-optimized, and AI-ready cross-cloud data ecosystems powering analytics, predictive models, and generative-AI applications with measurable improvements in data latency, reliability, and business value.
TECHNICAL SKILLS
Programming & Scripting
Python 3.11, SQL, Scala, Java, Go, Bash, REST/gRPC API development, Pandas, Polars, Dask, NumPy, PySpark
Data Processing & Pipelines
Apache Spark (Structured Streaming), Apache Flink, Ray Data, Beam, Airflow, Prefect, Dagster, dbt (Core & Cloud), SQLMesh, Dataform
Cloud Platforms (Multi-Cloud)
AWS: S3, Glue, Redshift, Kinesis, EMR, Athena, Lake Formation, Bedrock
GCP: BigQuery, Dataflow, Dataproc, Pub/Sub, Vertex AI, PaLM API
Azure: Synapse, ADF, Databricks, Event Hubs, Purview, Azure OpenAI
Data Storage, Lakes & Warehousing
Delta Lake, Apache Iceberg, Apache Hudi, Snowflake, BigQuery, Redshift, Synapse, DuckDB, ClickHouse, MongoDB, Cassandra
Streaming & Ingestion
Apache Kafka, Kafka Connect, Debezium, Kinesis Data Streams, Pulsar, Redpanda, Materialize, AWS DMS, Airbyte, Fivetran, StreamSets
AI/ML, MLOps & LLMOps
MLflow, Kubeflow, Vertex AI Pipelines, SageMaker Pipelines, Feast, Tecton, Weights & Biases, Evidently AI, Arize AI, WhyLabs, LangChain, LlamaIndex, Pinecone, Weaviate, Milvus, Qdrant, Chroma
Governance, Quality & Observability
Great Expectations, Soda Core, Monte Carlo, Bigeye, Deequ, DataHub, OpenMetadata, Collibra, Alation, Atlas, Ranger, Lake Formation, Purview, OpenLineage, Marquez
Infrastructure, DevOps & Automation
Terraform, Pulumi, Crossplane, Docker, Kubernetes (EKS/GKE/AKS), Helm, Argo CD, Jenkins, GitHub Actions, CircleCI, Prometheus, Grafana, Datadog, OpenTelemetry
Analytics, BI & Data Activation
Looker, Power BI, Tableau, Superset, Mode, Hex.tech, Trino/Starburst, Hightouch, Census, RudderStack
Emerging & Supporting Technologies
Data Mesh, Data Fabric, Data Contracts (OpenAPI/Avro/Protobuf), Synthetic Data (Gretel.ai, Mostly AI), Streaming Lakehouse, DuckDB for local analytics, Cost/FinOps (Kubecost, CloudZero)
PROFESSIONAL EXPERIENCE
PEAC Solutions, Mount Laurel, NJ
Software Engineer / Data Engineer, June 2023 – Present
Description: Driving PEAC Solutions’ digital transformation by architecting and delivering scalable data and AI platforms across Azure and Google Cloud. Spearheading the modernization of enterprise data ecosystems, integrating real-time data pipelines, lakehouse architectures, and MLOps frameworks that power advanced analytics, credit-risk modeling, and AI-driven customer insights. Leading cross-functional initiatives to unify data governance, improve data quality, and enable self-service analytics through Databricks, BigQuery, Synapse, and Vertex AI. Playing a pivotal role in transforming PEAC’s data operations into an AI-ready, real-time decision intelligence platform that enhances operational efficiency, compliance, and customer experience.
Roles & Responsibilities:
Architected and implemented a dual-cloud data platform spanning Azure Synapse, Azure Data Factory (ADF), Databricks, Azure Data Lake Storage Gen2 (ADLS2) and Google BigQuery, Dataflow (Apache Beam), Dataproc, and Pub/Sub, unified through Delta Lake and Apache Iceberg with ACID transactions, schema evolution, and cross-cloud replication via Storage Transfer Service.
Designed and developed end-to-end ETL/ELT pipelines using Apache Spark (PySpark/Scala) and Apache Flink for both batch and streaming workloads; orchestrated pipelines using Apache Airflow and Dagster, achieving 70% latency reduction and automated failure recovery.
Built data ingestion and change-data-capture (CDC) frameworks using Kafka Connect + Debezium, Pub/Sub, and Dataflow streaming pipelines (Apache Beam SDK) to replicate data from PostgreSQL, SQL Server, and Salesforce into BigQuery and Synapse with full schema and metadata propagation.
Designed and implemented data lakehouse zone architecture (raw, curated, and semantic layers) leveraging ADLS2, GCS, Databricks Delta, and BigQuery external tables for unified query access via Trino / Starburst.
Automated data transformation workflows using dbt Core + Cloud, integrated with GitHub Actions for CI/CD, and enforced SQL testing, lineage, and documentation synchronized with DataHub + OpenLineage metadata lineage services.
Built real-time feature-engineering pipelines and feature stores using Feast, Tecton, Vertex AI Feature Store, and Databricks Feature Store, creating online and offline feature parity for machine learning and model retraining workflows.
Implemented MLOps automation across Vertex AI Pipelines, Kubeflow, MLflow, and Azure Machine Learning, enabling experiment tracking, model versioning, hyperparameter tuning, and scheduled retraining triggered by data drift metrics.
Developed retrieval-augmented-generation (RAG) pipelines using LangChain + LlamaIndex + OpenAI/Cohere embeddings, persisted vector representations in Pinecone, Weaviate, and Chroma, and exposed them via FastAPI microservices running on GKE / AKS for semantic document search and internal AI assistants (see the semantic-search sketch at the end of this section).
Integrated data versioning and lineage tracking using DVC and LakeFS, linked to DataHub and Apache Atlas, ensuring reproducibility across dataset, model, and pipeline versions.
Enforced data quality and observability using Great Expectations, Soda Core, Monte Carlo, and Bigeye, embedding validation steps in Airflow DAGs (see the quality-gate sketch at the end of this section); published operational metrics (throughput, latency, error rate) to Prometheus + Grafana dashboards.
Designed and implemented data governance and compliance frameworks following GDPR and SOC2 principles using Azure Purview, GCP Data Catalog, and Collibra; configured Key Vault + KMS for encryption and RBAC / AAD IAM for access control.
Built real-time analytics APIs with Trino / Starburst on top of BigQuery and Synapse, exposing aggregated financial metrics and KPIs through FastAPI endpoints integrated into internal applications.
Tuned warehouse query and compute performance by optimizing BigQuery slots, Synapse materialized views, Delta Z-Ordering, and Databricks Photon execution, improving query performance by ~45% while lowering cost.
Deployed containerized Spark, Flink, and ML workloads using Docker, orchestrated with Kubernetes (GKE + AKS), managed by Helm and Argo Workflows for autoscaling, canary deployments, and rollback safety.
Automated infrastructure provisioning via Terraform + Pulumi, configuring VPC peering, private endpoints, service accounts, and identity federation across Azure and GCP to enable secure cross-cloud data flow.
Built CI/CD and GitOps pipelines using GitHub Actions and Argo CD, integrating code linting, data testing, and automatic deployment of dbt packages, MLflow artifacts, and Airflow DAGs across multiple environments.
Established comprehensive observability and alerting systems via Datadog, Stackdriver, and Azure Monitor, integrating metrics from Dataflow, Databricks, and Synapse with PagerDuty notifications for proactive issue response.
Partnered with data scientists to operationalize credit-risk scoring and fraud-detection ML models, implementing data drift and concept drift monitoring using Evidently AI, Arize AI, and WhyLabs for continuous model improvement.
Spearheaded Data Mesh adoption—developed domain-owned data products with published contracts (OpenAPI / Avro / Protobuf), SLA definitions, and federated governance via OpenMetadata + DataHub, improving cross-domain collaboration and self-service analytics.
Modernized legacy on-prem ETL pipelines to serverless cloud-native Dataflow and ADF solutions, reducing infrastructure cost by 42% and increasing deployment cadence from quarterly to weekly.
Led cost-optimization and FinOps initiatives with Kubecost + CloudZero, implementing workload right-sizing, autoscaling, and slot-sharing strategies to reduce compute spend across GCP and Azure by ~30%.
Mentored junior data engineers on data-centric SDLC, IaC practices, modular data-pipeline design, and advanced SQL/DBT transformation techniques; implemented code review standards and performance benchmarking.
Collaborated cross-functionally with InfoSec, AI/ML, and Data Science teams to ensure data privacy, lineage transparency, and reproducible experimentation pipelines compliant with enterprise risk controls.
Delivered a robust, scalable, and secure multi-cloud (Azure + GCP) data platform supporting advanced analytics, real-time decision systems, and AI-powered applications across credit, finance, and customer experience domains.
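Illustrative sketch (referenced in the data-quality bullet above): a minimal Airflow 2.x DAG with a data-quality gate of the kind embedded in these pipelines. The DAG id, task ids, and check logic are hypothetical placeholders, not production code; in practice the gate would execute a Great Expectations or Soda Core suite before downstream consumers run.

# Minimal Airflow 2.x sketch: a load step followed by a data-quality gate.
# DAG id, task ids, and the check logic are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_curated() -> int:
    # Stand-in for a Spark/dbt load step; the returned row count is pushed to XCom.
    return 1_000_000


def quality_gate(**context) -> None:
    # Stand-in for a Great Expectations / Soda Core suite: fail the task
    # (blocking downstream consumers) when a critical check does not pass.
    row_count = context["ti"].xcom_pull(task_ids="load_curated")
    if not row_count:
        raise ValueError("Data-quality gate failed: curated table is empty")


with DAG(
    dag_id="curated_pipeline_example",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                   # Airflow 2.4+ scheduling argument
    catchup=False,
) as dag:
    load = PythonOperator(task_id="load_curated", python_callable=load_curated)
    gate = PythonOperator(task_id="quality_gate", python_callable=quality_gate)
    load >> gate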
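Illustrative sketch (referenced in the RAG bullet above): a FastAPI semantic-search endpoint of the kind exposed over a vector store. The embeddings call and the index are replaced here with tiny in-memory stand-ins; in production these would be an OpenAI/Cohere embeddings client and a Pinecone/Weaviate/Chroma collection. All names and sample documents are hypothetical.

# FastAPI sketch of a semantic-search endpoint; embeddings and the vector
# index are toy in-memory stand-ins, and all names are hypothetical.
import math
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="semantic-search-demo")

# Hypothetical in-memory "index": doc_id -> (vector, snippet).
DOCS = {
    "doc-1": ([1.0, 0.0], "Lease agreement terms and payment schedule."),
    "doc-2": ([0.0, 1.0], "Credit-risk policy for small-business originations."),
}


def embed(text: str) -> List[float]:
    # Stand-in for an embeddings API call; returns a toy 2-d vector.
    return [float(len(text) % 7), float(len(text.split()))]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SearchRequest(BaseModel):
    query: str
    top_k: int = 3


@app.post("/search")
def search(req: SearchRequest):
    # Embed the query, rank documents by cosine similarity, return top hits.
    qvec = embed(req.query)
    ranked = sorted(
        ((doc_id, cosine(qvec, vec), snippet) for doc_id, (vec, snippet) in DOCS.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return [
        {"doc_id": d, "score": round(s, 3), "snippet": t}
        for d, s, t in ranked[: req.top_k]
    ]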
ByteXL, Hyderabad, India
Software Developer / Data Engineer, Nov 2018 – Dec 2021
Description: Contributed to the modernization of ByteXL’s data and analytics ecosystem by building scalable, hybrid-cloud data platforms across AWS and Azure. Developed and deployed real-time data pipelines, lakehouse architectures, and serverless ETL workflows integrating academic, engagement, and CRM data into unified repositories. Collaborated with cross-functional teams to enable AI-driven analytics, predictive modeling, and self-service BI, enhancing operational efficiency and data-driven decision making across the organization.
Roles & Responsibilities:
Designed and implemented scalable, fault-tolerant data pipelines in Python, SQL, and PySpark, automating the ingestion, transformation, and delivery of learning platform data using Apache Airflow and Azure Data Factory (ADF).
Built and optimized hybrid data architectures leveraging AWS (S3, Glue, Redshift, Lambda) and Azure (Synapse, Databricks, ADLS Gen2), integrating on-prem and cloud data into a unified analytics lakehouse supporting 10+ internal products.
Developed real-time streaming pipelines using Kafka, AWS Kinesis, and Azure Event Hubs, processing live engagement, course, and telemetry data to power dashboards and predictive models (see the streaming sketch at the end of this section).
Engineered ETL/ELT transformations in dbt Core and PySpark, implementing modular SQL logic, dependency management, schema evolution, and automated testing; reduced manual data-prep time by 60%.
Designed and maintained dimensional, Data Vault, and denormalized data models to support analytical workloads and reporting requirements, improving downstream query performance by 40%.
Built Python-based ETL applications using Pandas, NumPy, and PySpark, handling 500M+ records daily with multi-threaded optimization and efficient memory management.
Partnered with data scientists to curate and deliver ML-ready datasets for student retention, engagement, and recommendation models, improving model accuracy by 20%.
Integrated Great Expectations and custom validation layers into Airflow DAGs to enforce schema and data-quality checks, with automated alerts routed through CloudWatch and Azure Monitor.
Deployed serverless data transformations using AWS Lambda and Azure Functions, achieving near-zero maintenance overhead and faster event-driven processing.
Implemented infrastructure as code (IaC) with Terraform to provision AWS and Azure data services, IAM roles, VPC peering, and security policies, ensuring environment consistency and automated disaster recovery.
Built CI/CD pipelines for data workflows using Jenkins and GitHub Actions, automating testing, container builds, and deployment rollouts to multiple environments.
Tuned data warehouse performance through Redshift sort and distribution keys, Synapse partitioning, and query-optimization techniques, reducing query runtime by 45%.
Delivered interactive BI dashboards in Tableau, Power BI, and Amazon QuickSight, integrating live data connections to Redshift and Synapse for real-time executive reporting.
Applied enterprise security and compliance standards using AWS IAM, Azure Active Directory, Key Vault, and KMS, ensuring encryption-at-rest, fine-grained RBAC, and regulatory adherence.
Migrated on-prem ETL jobs to cloud-native orchestration on Airflow + ADF, reducing infrastructure management effort by 40% and improving job reliability.
Deployed observability and alerting frameworks using Prometheus + Grafana for pipeline performance metrics and latency monitoring.
Implemented data lineage and cataloging via Apache Atlas and Azure Purview, improving transparency and auditability of core business metrics.
Collaborated cross-functionally with analytics, DevOps, and product teams to align data architecture with application development, ensuring low-latency API data delivery and near real-time reporting.
Conducted cost and performance optimization initiatives using AWS Cost Explorer and Azure Cost Management, reducing compute and storage expenses by ~25%.
Mentored junior developers on data engineering design patterns, version control (Git), code reviews, and Agile practices, strengthening overall team capability and delivery quality.
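Illustrative sketch (referenced in the streaming bullet above): a minimal PySpark Structured Streaming job reading engagement events from Kafka and landing them as Parquet, assuming Spark 3.x with the spark-sql-kafka connector on the classpath. Broker address, topic, schema, and storage paths are hypothetical.

# PySpark Structured Streaming sketch: Kafka engagement events -> Parquet lake.
# Requires the spark-sql-kafka-0-10 package; all names and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("engagement-stream").getOrCreate()

event_schema = StructType([
    StructField("student_id", StringType()),
    StructField("course_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
    .option("subscribe", "engagement-events")            # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Decode the Kafka value payload from JSON into typed columns.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-lake/curated/engagement/")            # hypothetical path
    .option("checkpointLocation", "s3a://example-lake/_chk/engagement/")  # hypothetical path
    .outputMode("append")
    .start()
)
query.awaitTermination()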
Techno Machine Tools, Hyderabad, India
Junior Engineer / Data Analyst, June 2014 – Oct 2018
Description: Supported the digital transformation of manufacturing operations by developing data analytics, automation, and reporting solutions across production, quality, and supply-chain functions. Contributed to building data pipelines, dashboards, and statistical models that improved decision-making, predictive maintenance, and production efficiency. Enabled the transition from manual, spreadsheet-based workflows to automated analytics and cloud-enabled data systems on Azure and AWS, enhancing accuracy, transparency, and speed of business insights.
Roles & Responsibilities:
Developed and maintained data-driven applications and analytics dashboards for production, quality, and inventory teams using Python, SQL, and .NET, improving reporting speed and data accuracy across departments.
Designed and optimized ETL workflows to collect and process manufacturing data from PLCs, ERP systems, and IoT-enabled sensors into centralized SQL Server and Azure SQL databases.
Created interactive dashboards and visual reports in Power BI and Tableau, providing real-time insights into production efficiency, downtime, and resource utilization for operations leadership.
Built data-processing scripts in Python (Pandas, NumPy) to automate data cleaning, aggregation, and KPI computation, reducing manual data preparation time by 70% (see the KPI sketch at the end of this section).
Developed stored procedures, triggers, and views in SQL Server to streamline data extraction and integration across multiple production databases.
Assisted in the implementation of predictive maintenance analytics by preparing time-series datasets and statistical features for early fault detection and equipment health monitoring.
Supported the migration of legacy Excel-based reports to automated pipelines and BI dashboards, increasing reporting frequency from monthly to daily.
Worked with cross-functional teams to digitize quality control and material tracking workflows, integrating barcode and IoT data streams into operational analytics systems.
Implemented data validation, error-checking, and reconciliation processes, ensuring consistency between ERP data and shop-floor production logs.
Collaborated with IT and DevOps teams to deploy SQL Server Integration Services (SSIS) packages and Azure Data Factory (ADF) pipelines for secure, automated data transfer between on-prem and cloud systems.
Created ad hoc queries and data extracts to support decision making in procurement, production planning, and financial forecasting.
Developed internal automation scripts using VBA and Python to process invoices, sensor logs, and performance data, improving turnaround time by 40%.
Documented data flows, database schemas, and reporting logic as part of SDLC deliverables, ensuring transparency and maintainability of analytics systems.
Collaborated with business analysts to define KPIs, data definitions, and reporting standards, improving metric alignment across departments.
Participated in code reviews, Agile sprints, and unit testing, ensuring software reliability and continuous improvement of data-driven applications.
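Illustrative sketch (referenced in the KPI bullet above): a small pandas script of the kind used to clean shop-floor production logs and compute daily availability KPIs. The CSV source and column names are hypothetical.

# Pandas sketch: clean production logs and compute daily availability KPIs.
# The input file and column names are hypothetical examples.
import pandas as pd

logs = pd.read_csv("production_logs.csv", parse_dates=["timestamp"])

# Basic cleaning: drop duplicate sensor readings and rows missing a machine id.
logs = logs.drop_duplicates().dropna(subset=["machine_id"])

# Aggregate runtime, downtime, and output per machine per day.
daily = (
    logs.assign(date=logs["timestamp"].dt.date)
    .groupby(["machine_id", "date"], as_index=False)
    .agg(runtime_min=("runtime_min", "sum"),
         downtime_min=("downtime_min", "sum"),
         units_produced=("units", "sum"))
)

# Simple availability KPI used for downtime reporting.
daily["availability_pct"] = (
    daily["runtime_min"] / (daily["runtime_min"] + daily["downtime_min"]) * 100
).round(1)

daily.to_csv("daily_production_kpis.csv", index=False)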
EDUCATION
Master of Science in Information Systems Technologies, April 2023
Wilmington University, New Castle, Delaware