AI Architect and Cybersecurity

Location:

San Pablo, CA

Posted:

April 28, 2026


Resume:

Daemeon Reiydelle

Email: ********@*****.***

Phone: 415-***-****

Address: San Francisco, California (Berkeley area)

https://www.linkedin.com/in/daemeonreiydelle

I get called in when stuff just isn’t working as expected.

I am an Observability-driven Site Reliability Engineer. I work on large, complex HPC and AI-HPC clusters on prem (Dell, HP) and in the cloud (AWS, Azure, Google, even some OCP and Alibaba) as architect, implementer, and 3rd level troubleshooter. My work involves big data (petabyte, HPC and AI HPC data flows), HPC and AI enabled HPC, InfiniBand, HSE, from HP, Dell, Nvidia, converged infrastructures, etc.

I have been a developer, including the implementation of Hadoop at Yahoo, NAS/SAN virtualization at Network Appliance, part of the Kubernetes development team at Google (as the new layer on Borg and the Google Open Sourcing of same), SME for Nvidia Kubernetes HPC (Bright/Base cluster manager, BasePods & SuperPods), including installation validation (Dell), and performance optimization (Dell, Lucid, Bosch, Mercedes, and others). However, I no longer do pure dev work, more infra config/debug/optimization on prem and cloud: Nvidia stack, GCP Tensor stack, AWS, Terraform, K8S, MLFlow, Slurm/Airflow, ML inference optimization, no/low schema & vector databases, high performance AI networking, GPU optimization, AI Cybersec of prod systems, etc.

Certifications

Accenture Smart Lean (Value Stream Mapping & Process Flow)

Agile Certified Practitioner (DSDM)

Agile SAFe

AWS Certified Solutions Architect

Azure Certifications: Azure Trainer, Azure Architect, Azure Security Engineer, Azure DevOps, Azure AI, Data Engineer, SAP HANA on Azure

Checkpoint Firewall Engineer

Cisco Certified Network Engineer

Cisco UCS/vBlock (Cisco + NetApp) Certified Engineer

Dell Nvidia DGX (H100, A100) PowerEdge foundation, AI and Infra Associate, Dell Reference Design for AI

DevOps (OneOps, Azure DevOps, GCP DevOps)

GCP Certification: Certified GCP Architect

HortonWorks Certified HDP & HDF Architect, Admin

Kubernetes CKA

Microsoft Azure Partner Program Certified Solution Architect

NVidia Partner Certified Associate: AI/DC, Mellanox/InfiniBand, Accelerated Computing Fundamentals, HPC containers, Data Science Workflows, GenAI, GNN, RNN, Bright Cluster Manager, SuperPod(DGX)/BasePods, Accelerated Computing, BCM, CUDA

Network Appliance Certified Engineer

VMWare Certified Engineer

Citizenship: United States

Education: Bachelor of Science, CIS; Shaftesbury University, London, England, UK; 1991

Security Clearance: US Top Secret/EBI (expired)

Professional Experience

Pershing/Ringier JV November 2025 to January 2026

AI Ops Assessment

For a joint venture by an equity investments firm: support a Swiss partner brand management firm to provide AI enabled investment and guidance for luxury brand growth, rearchitecting a complex project into a deliverable one.

Assess AI cybersecurity risks (data security, IP protection, exfiltration detection, etc.) when using shared AI services, and identify issues with hallucination and data leakage in the pilot AI Digital Twin (AWS Ireland) and in the architected shared AI services. Support rearchitecture around common AI data pipelines and smaller scale digital assistants, improving AWS costs and GPU utilization, reducing hallucination, etc.

Lucid Motors May to October 2025

Data Engineering: AI DataOps Architect Contract

Develop AI Cybersecurity posture, develop AI COE and GRC (AI): collateral, checklists, etc. Leverage ISO, NIST, EU AIA, OWASP, GDPR, et al for GRC processes (EU, US, AME), AI COE (with AI GRC).

Implement AI Data Ops Center of Excellence: merging separate groups in Data Engineering into a cohesive AI Data COE. Transferring to Director of AI.

Set up Governance, Risk, & Compliance best practices and AI Cybersecurity standards (NIST AI, OWASP AI, ISO 23894, ISO 42001, EU AI Act) in AI project risk & cybersecurity assessments. Then work with C-level and directors to implement GRC processes. Focus on compliance and risk acceptance for ADAS 3 and ADAS 4 level private and for-hire vehicles. GRC, PII, Cybersec CI/CD, AI Training Data, SOX 2, etc.

Rearchitect telematics to leverage AI in lambda in vehicle event processing. Infinite (divide by zero ;{) improvement in telematics event delivery.

Modernize the architecture of various in-production and under-development AI applications’ data pipelines, with a focus on improved Observability. Fix the vehicle digital twin architecture and data flows (also fixing observability issues both for vehicle eventing and for the digital twin). Identify the need to migrate from Python 3.8 to 3.10, along with all the AI, DataOps, etc. libraries, as the infrastructure had not been updated in 2+ years. Resolve configuration issues with Trino containers/Helm charts and improve utilization and throughput (Karpenter changes) for the Trino, Spark, and AI pods. Increase AI Observability (CUDA/ELK) for the AI teams.

AI Governance & COE: Established AI Cybersecurity posture and DataOps Center of Excellence, merging disparate data engineering teams and improving GPU farm utilization from <10% to 40%+: data pipeline improvements and training/inference improvements, k8s scheduling improvements, and use of smaller models. Leverage ISO, NIST, EU AIA, OWASP, build out the AI CyberSec COE.

Implement AI Data Ops Center of Excellence: merging separate groups in Data Engineering into a cohesive AI Data COE. Transferring to Director of AI. Set up AI Cybersecurity standards (NIST AI, OWASP AI, ISO 23894, ISO 42001, EU AI Act) in AI project risk & cybersecurity assessments.

Developed a scalable Jupyter based LLM infrastructure by modernizing existing low-code environments to support advanced AI applications, handling petabytes of IoT eventing data.

Utilization improvements and scaleout of high-performance computing infrastructures for automotive manufacturing and vehicle telematics, leveraging NVIDIA SuperPods (DGX) and Oracle Cloud for GPUs. (Nvidia joint venture)

Optimized HPC and GPU enabled HPC AI clusters (AWS) to improve performance for predictive analytics, event detection, and real-time AI agents, Cadence Virtuoso.

Support 10+% improvements in AI data pipelines & throughput (AWS) for Siemens EDA, pilot bring-up of Dassault DigitalTwin (Catia), improvements to Siemens NX, Siemens MRP, SAP FI/CO, ServiceNow.

IOT, OT, ADAS: architect implementing best practices for AI enabled applications supporting the vehicles, ADAS 3 (Nvidia EGX/MGX DRIVE Hyperion w/Saudi KAUST, Mobileye), predictive maintenance, and supply chain AI for US and MEA manufacturing operations and on-road vehicles (OT, telemetry, analytics). Evangelize Observability to improve performance and scalability to focus investment in highest impact areas. AWS & OCI for high performance AI enabled computing infrastructures for automotive manufacturing, vehicle telematics, predictive analytics, and ADAS. Nvidia DRIVE AGX, RTI-DSS, simulation and training (K8S, Airflow, Nvidia DRIVE ADX ADAS 3 ADX, existing AV platform, Slurm).

Upgrade 10+ existing AI applications to Python3.10 from 3.8.

Kahn Ventures Oct 2024 to Mar 2025

Consulting AI Architect Consultant

Assess AI startup pitch decks, focusing on data and operational layers, including AIOps and DevOps, digital twin use cases, agentic and multi-agentic proposals.

Evaluate AI infrastructure strategies, cost optimizations, and security mitigations (EU AI, GDPR, PII, NIST/OWASP AI, etc.)

Assessed and advised on leveraging GPU-enabled infrastructure from providers like Lambda Labs and RunPod for scalable AI operations

TensorFlow, LangChain, MLFlow/ElasticSearch, Pandas, AI agents, OpenAI, HuggingFace, SAP HANA, Oracle ERP, ServiceNow, Docker, Kubernetes, Airflow, Slurm, Python, HPE for AI/Nvidia Lepton Pilot JV, SuperPods(DGX), Run:AI, Mission Control, JFrog, AWS (CodeWhisperer, Trainium, Bedrock, Q/Connect, Inferentia), GCP (Vertex, Gemini, AI HPC, Model/Garden Builder), Codeweaver, Linode, Banana, Lambda, RunPod, MLFlow for DevOps and baselined SRE/Observability.

Dell Professional Services Sept 2023 to Oct 2024

Chief AI Architect Consultant

AI Professional Service Practice: Develop the practice & execute: Develop processes, training, and customer collateral for AI Center of Excellence consulting (driven by Value Stream Mapping techniques from Accenture, Deloitte, Bain). Develop Unified Data architectures, explain and evangelize Bias and Variance processes, capabilities, readiness assessments, offerings; lead presales engagements (architect/Practice Lead), engage at CIO/CTO level, deliver (ongoing) several engagements: GenAI, Hybrid Cloud, Big Data for ML/GenAI. Develop the Digital Human/Social Human Capital models for understanding transformations and risk, including hybrid SAP BW/HANA, S4/HANA, Oracle Cloud Apps, ServiceNow.

Developed AI modernization practices with a heavy focus on infrastructure readiness assessments for GenAI and LLM applications, including a go-to-market strategy for Dell/NVIDIA BasePod/SuperPod (DGX).

Support Nvidia partnership, support in-flight and presales engagements (Dell A100/H100 DGX BasePod/Nvidia SuperPod).

Led presales engagements and technical implementations across multiple industries.

Developed reference implementations and white papers for Nvidia NIM and Hugging Face HUGS ecosystem (GenAI, Image Tagging, including SRE Observability best practices for NIM configurations, MLFlow/Elasticsearch/Logstash/Kibana enablement in NIM containers, etc.).

AI DevSecOps: AI Cybersecurity/Ethical/Reliability assessments: bringing my experience with actual client AI application issues, develop the assessments with Dell Delivery AI teams and execute (ISO 42001, OWASP AI, NIST AI V2, EU AI Act, GDPR, PII, HIPAA, etc.). Support emerging hardware platforms and readiness assessments: NVidia AI Enterprise, AMD MI300/300X. Full stack Kubernetes with Run:AI. Heavy focus on data (bias, distribution, normalization, synthetic data, etc.). Health care, banking, sports medicine. Customer demos (and support for other teams having performance issues in their demos): NVIDIA NVAIE, Triton Inference Server, NeMo, etc., and AMD's ROCm frameworks.

AI DevSecOps/Tanzu (VMWare Tanzu Labs): Tanzu for Nvidia AI DataOps NIMs: Tanzu Kubernetes Grid (TKG), tune client’s Spring apps, CI/CD security (Tanzu App Catalog), Tanzu AI Solutions (DevOps/MLOps), SRE/Observability improvements for clients with pre-configured K8S extensions to Tanzu Cloud Health.

Delivering training classes: (Internal) Nvidia Enterprise AI for Dell, Dell Validated Design: GenAI Clusters in the Data Center (Nvidia EAI on Dell); Dell Data Mesh (DataBricks) for semi- image- and un-structured data, labelling, tagging, etc. processes for GenAI, AI Data and ML data utilizations and gotchas, etc.; Dell Validated Design: Data Wrangling for GenAI; (Internal) special topics for DVD for Nvidia (Hugging Face: Mistral, Run:AI; RAG w/Feedback (RLHF vs. RLAIF), DPO, Imitation Learning with DVD); Converged Data Centers: GenAI in the converged, hyperconverged, and virtualized data center; Cybersecurity for Generative AI; Digital Human: Implications for GRC, cybersecurity, and business processes.

Dell AI Factory: R760xa, through XE9680A/Dell AI Factory with Nvidia, Nvidia BCM, Nvidia Unified Fabric Manager (UFM)/Adaptive Routing/UFM Subnet Manager, etc.; RHEL AI: OpenShift AI, IBM OS Granite LLM, Large-scale Alignment for chatBots (LAB), IBM/RHEL InstructLab; NVidia Omniverse OVX; PyTorch, Llama 3, Pinecone DB, Pandas, NumPy, NextData, DataBricks, NER/Topic extraction, SpaCy, BigPanda, HuggingFace, MLFlow, Kubeflow, OpenAI API, etc. Support clients with IoT data volumes for AI Predictive Analytics, event detection, AI optimizations: Smart Grid, Power distribution, solar/wind load optimizations; Real time human sports kinetics, etc. at petabyte scale. Installation/configuration/alerts setup in Splunk, DataDog, ELK.

Installation, tuning, and upgrades of client AI infrastructures: Databricks, Hugging Face, Nvidia NeMo, NIS, Nvidia Base cluster manager as part of hardware and solutions sales: petrochemical, pharma, banking, higher education, etc.: support AIOps teams porting applications (image, pharma, RAG). BCM installations and upgrades. 7+ clients. Work on various POCs (Emory SPARC medicine, University of Tennessee – student outcomes, City of Austin, George Bush Airport modernization, etc.).

MLOps/AIOps: Install, configure, support AI engineers: Dell Nvidia clusters (up to 200 H100 GPUs), Nvidia UFM (Unified Fabric Manager), InfiniBand setup and testing (5-20 Dell node clusters, XE9680s thru XE8640s) internally and at client sites (3rd level support), Kubernetes, Bright Cluster Manager, NeMo, Nvidia Inference Microservices/NIM, Ceph, Slurm. Architect data meshes (unstructured/semi-structured) for multiple clients (health care, medical schools, manufacturer, midsized airport (DHS pilot), etc.). Hybrid cloud and private cloud solutions (AWS, Azure, some GCP), master data management, end-to-end governance (including reputational, legal, financial risk). Work with implementation teams to migrate and add semantic content to multi-terabyte scale data (Hadoop, cell tower IoT events, images, kinetic sports medicine modeling), apply semantic chunking, optimize AI stacks through UX, reduce UX response times, improve AI tech stack utilization, etc. in Kubernetes (Nvidia Base, GKS, AKS, EKS), Slurm, Jupyter, etc. NiFi, Kafka, ADS, Databricks, Grafana, ServiceNow/SAP, etc.

Perform AI stack (and task) tuning for multiple clients’ deliveries: stabilize/tune scalable model pretraining workflows, debug/resolve bottlenecks as 3rd level Dell AI customer facing support. Architecture, implementation, and architectural issues with transformer networks such as Llama 2, Falcon, Mixtral, T5. GenAI emerging issues: e.g., data quality, bias, explainability, training & tuning, data tagging, chunking, model degeneration, hallucination, overreliance.

Dell/Nvidia Presales, POC, MVP: EU Truck manufacturer IoT, ADAS events (training, regression testing); various sports medicine data storage and AI training/tuning processes, Dell Data Mesh (DataBricks) setup and flows, NiFi/Kafka optimizations; Banking credit card data storage and ML training/tuning; Cell phone 5G MEC and 4G/5G IoT event data flows (ML Predictives at scale)

Support internal POCs and client engagements of data, AI, and Ops layers per Nvidia BasePod architecture standards (K8S, BCM, DGX, Slurm, BCM Data Mesh, etc.): installation, tuning, and upgrades of client AI infrastructures: Databricks/HuggingFace/OpenAI/Nvidia NeMo, NIS, Bright/Base cluster manager as part of hardware and solutions sales: petrochemical, pharma, banking, higher education, etc.: support AIOps teams porting applications (image, pharma, RAG). BCM installations and upgrades. 7+ clients. Work on various POCs with Databricks’ Mosaic AI team (Emory SPARC medicine, University of Tennessee – student outcomes).

Support presales and engagement POCs with Nvidia to showcase Nvidia BasePod architecture and solutions (Nvidia prebuilt solutions (NIMs), K8S/BCM, DGX, Slurm, BCM Data Mesh, etc.). Beta Nvidia DXC Accelerated Computing (BasePods). GRC AI data and ML/AI for hybrid cloud engagements.

Nvidia BasePods, Nvidia Superpods (DXC), Nvidia BCM, Nvidia Accelerated Computing, Lambda, RunPod, MLFlow (specific offering for MLOps Observability), DataBricks, Run:AI (pre-acquisition), DataBricks Mosaic & Data Mesh, BCM DataMesh, Base/Bright Cluster Manager/Kubernetes, OWASP, NIST, ISO, EUAIA, OpenAI, HuggingFace, Nvidia DGX/EGX/MGX.

Key Engagements

Bank (US, EU): Nvidia EAI/NIM setup, Ethical/Reliable AI Life Cycle assessment, AI Cybersecurity assessments

Bank (US/EU): Nvidia EAI/NIM setup, AI Cybersecurity assessments

Cellular provider (EU): Dell/Nvidia Clusters for GenAI in the 5G MEC.

Cloud provider (US) AI acquisition

Health Services: Medical School radiology AI, sports medicine AI, major US Health solutions & pharmacy, DOD health facility

Insurance (US): GenAI Cybersecurity assessments, Ethical/Reliable AI Governance in the life cycle assessments, etc.

Energy sector consulting firm: Support the buildout of client’s services modeling infrastructure to support 100 16 Nvidia GPU nodes: Nvidia BCM, Kubernetes, Slurm. Tuning, assist AIOps teams in migrating to current NeMo, etc.

Military medical facility: architect shared service model for advanced AI, Digital Assistant process enhancements.

Petroleum (US): GenAI Cybersecurity assessments (Dell/Nvidia AIE/NeMo/NIM setup)

Pharma (US): GenAI Digital Humans for health care professionals (AI Cybersecurity Assessment, Nvidia AIE/NeMo/NIM setup)

Pharma (US): GenAI for LIMS (AI Cybersecurity Assessment), AI Stack improvements

Truck manufacturer (EU) (AI Cybersecurity Assessment, Nvidia AIE/NeMo/NIM setup)

Anthropomorphics Inc. 2019 to 2023

AI SRE/Architect Employee

Provide expertise in SRE Optimization driven AI (GPU HPC, Kubernetes) Architecture, AI SRE, AI Cybersecurity assessments, AI SOWs/POCs/MVPs to a variety of consulting firms for their clients, supporting the major Python AI Stacks (TensorFlow, PyTorch, Keras, MLFlow, etc.)

Re-architected an open-source big data platform to a hybrid cloud SaaS model to support generative AI and LLM-specific data best practices

Architected and managed Kubernetes GPU clusters (100+ GPUs) to support large-scale LLM models for telcos, including 5G and IIoT use cases

Led efforts in network optimization using technologies like InfiniBand and RDMA to enhance GPU performance and reduce bottlenecks across distributed training systems

Partnered with Microsoft and OpenAI to enable LLM container deployments on Azure Edge AKS

Leveraged Kubernetes and cloud-native technologies (AWS, Azure, GCP) for provisioning, monitoring, and optimizing GPU clusters for predictive analytics and AI workloads.

Ligadata

Practice Modernization June to Sept 2023

Work with CIO to develop AI Center of Excellence. Taught management Value Stream Mapping techniques to add customer value. ML Predictive analytics extensions to BI, GenAI for problem resolution support for telcos in emerging markets: Work with CTO and team to rearchitect the existing open-source big data platform (Hadoop, Hive, Kafka) to hybrid Cloud SAAS (AWS, Azure, GCP) around GenAI Data specific MLOps best practices, unstructured and semi-structured big data, support Databricks partnership in AME. Identify opportunities around 5G IoT, continued support for 4G IoT, IIOS, MFG 4.0 leveraging current SAAS ML pipelines. Work with CEO and Heads of Sales teams on engagements in Dubai, UAE, Nigeria, Egypt. Extend existing predictive analytics and chat diagnostic RBESs to leverage generative AI over new data. Identify and resolve architectural and performance related issues across the client base (3rd level support). Hadoop optimization (HBase, Hive). Enhancements to ELK, Open Telemetry for observability improvements.

RFP responses and SOWs for AI modernization initiatives for telecoms in emerging markets, transitioning big data platforms to hybrid cloud SaaS. Incredible sales conversion rate improvements after sales teams began leveraging simple value stream mapping.

Worked with leadership teams and 2 key clients leveraging Value Stream Mapping to develop focused improvements to SAAS ML pipelines (Tensor, MLFlow) for 5G, IIoT, and Manufacturing 4.0.

Partnered with Microsoft/OpenAI to enable OpenAI container deployment on Azure Edge AKS, simple multi-agent collaboration (Cell comms predictive analytics)

Architected distributed AI/ML pipelines and managed GPU clusters (100+ GPUs) supporting large-scale ML models for telcos in emerging markets, including 5G and IIoT use cases.

Utilized Kubernetes and cloud-native technologies (AWS, Azure) for provisioning, monitoring, and optimizing GPU clusters for predictive analytics and AI workloads.

Led efforts in network optimization (VXLAN, fat-tree architecture) to enhance GPU performance and reduce network-related bottlenecks across distributed training systems.

Training sessions for sales and technical teams: hybrid cloud big data, best practices for migrations, etc.

Presales advisory for 3 Ligadata clients’ AI cloud modernizations.

GCP Edge, Azure Edge, AKS, GKS, Apache Kubernetes, HBase, HQL, Hadoop, BigData, BigQuery, AI Predictive Analytics, Generative AI/GenAI, RAG, LLM, Kubernetes, GPU scheduler, Nvidia Enterprise AI, Terraform, ETL via NiFi/Kafka, Elasticsearch

Google

Edge Compute SRE/Observability Architect Nov 2022 to June 2023

Supported AI and AIOps integration in Google Cloud for major telecom providers (Jio, Deutsche Telekom, Telus, TIM). Supported engagement management by teaching and using value stream mapping techniques for SOWs. Obtained various Google Cloud and Cloud Edge training and certifications.

Assisted in onboarding 5G Core Network Functions to Kubernetes (GDCE) for enhanced automation and scalability, leveraging multi-AI collaboration architectures for 5G digital twins.

Validated and stress-tested Google’s internally developed AI predictive analytics for telco analytics, integrating Vertex AI Cloud to Edge for Telco GPU enabled GKS clusters.

Technical integration engineer (DevOps/AIOps) supporting Google Cloud Telco (install, tune, optimize) in pilots for Ericsson, Nokia, Casa: onboarding of their 5G Core Network Functions to GDCE Kubernetes for various telco providers (Jio, Deutsche Telekom + T-Mobile, Telus, TIM (Italy, Brazil)) in 5G Core on Kubernetes (GCP Edge), supporting 5G RAN test integration and Spirent RAN load testing. Validating, load & stress testing (simulating 1-2Tb/sec of multi-node IoT eventing), CI/CD automation into Google Distributed Cloud Edge (GDCE) and hybrid cloud (GDCE/GKE). Provide security, k8s configurations for NF networking, Mellanox NICs, performance tuning, Helm chart support: Google Telco Solutions, Kubernetes, NF Networking, GCP, ELK, Grafana, integration with Open Telemetry.

Telco Analytics Solutions: support integration (testing) Google’s Vertex AI Cloud (GenAI RAG) to Edge for Telco RAN/MIMO and MEC optimization, training pipelines via DataProc & DataFlow on BigTables/DataProc (similar to HBase/Hadoop).

Support Cybersecurity assessments of Google Telco Solutions (Telco Edge): Google SAIF, OWASP AI, NIST AI, GDPR, EUAIA. Responsible for K8S and MEC components.

Western Digital/SanDisk

SRE/O April to Nov 2022

Lead SRE/DevSecOps/Observability: support the migration from Red Hat OpenShift (Kubernetes) to hybrid Google Anthos Kubernetes for VMware (On Prem) + AWS EKS, with continued support of a specialized OpenShift cluster running DPDK – Data Plane Development Kit for shared NVidia GPUs (and debugging out of vector issues due to pod affinity defaults for very large nodes). Improving AI driven manufacturing insights & decision making, support the stabilization and migration of additional applications from VMWare: application (re)architecture, technology: buildout of Center of Excellence for Public/Hybrid/Private cloud, extending Splunk reporting and dashboards for reactive reporting of unexpected events (including security).

AI Architect: work with several fab line teams to improve image and flow processing custom trained LLMs, Multi-AI collaboration for a fab line digital twin.

AIOps: Rearchitect virtualized (VMWare) GKE GPU HPC environment: set up MLFlow tracking to central tracking, identified causes of large model snapshot/folding timeouts, excessive data copying (shared storage/server, GCP/Azure cloud). Blue/green, canaries, etc. via MLFlow (and added model registry to check-ins) to improve throughput, worked with the Keras/MLFlow based teams to deliver 3x improvement in GPU utilization, 2x faster image recognition, observability improvements in client’s fabs. Reduced fab error rates by 3-7%, increased accuracy and utilization of cell test system by 4x, reduced expedited shipment delivery costs by 2x.

Technical architect/deployment support: supporting new and existing AIOps/IIoT/Digital Twin deployments to Fab data centers (GKE Anthos/VMware) worldwide; DataIQ integration; Cloudera (CDP/CDF) on Kubernetes. Architect new AI systems into Kubernetes, Rancher, Confluent Kafka, NiFi, DataIQ, CDP Spark, NVidia VMware nodes. Support Looker ETL and query performance improvements.

Performance/throughput improvements in Cadence (Airflow and Infra) flows, support beta Cadence AI Virtuoso efforts for AI enabled EDA (GPU enabled EDA). Bring up of Ansys, work with design engineers on performance and tuning.

Observability – Security in Depth: Improve security posture of global private clouds (China, Thailand, Israel, US, India) - Kubernetes active intrusion detection, enable TLS, mesh, etc. Improved Splunk performance to resolve issues with IIOT image processing AI (Spark NVidia Tensor, 1200 to 200 msec improvement, increased ML params). K8S Observability COE Chief Architect: DataIQ Integration, GCP, GCP Anthos Private Cloud for VMware, OpenShift Container Platform to Portworx, Anthos managed AWS EKS + GKS, AWS CI/CD Jenkins Pipelines, OpenShift/Anthos/Kubernetes/Container/Networking/NVidia/VMware vSphere. Hashicorp Vault, Active Intrusion Detection support, CI/CD code quality/Software Supply Chain Vulnerability testing, evaluate various security posture products (Qualys, BeyondTrust, Palo Alto Networks, Forcepoint, Proactive & reactive IDSs, etc.). AWS IAM, CloudWatch/Trails, Splunk, integration for EKS, Redshift, Cilium. External vendor (Workday, SAP, Oracle ERP) security and application integration technical architectures.

Deep dive into complex technical issues affecting stability, scalability, and security. Lead relationship (technical) between Western Digital and Portworx, Google Support, F5, Palo Alto Networks, Mandiant, AWS Cilium, Cloudnetics.

Own the technical relationships with AWS, Google, Portworx, RedHat (Rancher, OpenShift) for all operational, technical and architectural asks (FAAS, SAAS, some IAAS/PAAS).

For the test cell analytics AIOps (ML feature detection) team(s), resolved scalability and performance issues around applications running in the pods, integrating to Vertex AI; helped the team optimize containers and code; worked with GCloud to identify Kubelet configuration issues and Linux kernel (CNI, conntrack, & NAT) issues; improved training, Looker performance, etc.

Improve application responsiveness by enhancing Kubernetes operators for AIOps training, inference exception eventing, monitoring, and complex Spark/Kafka/NiFi ELT needs.

AIOps improvements for GKE (TensorFlow: TPUs & GPUs; VMWare/GKE, HP HPC), NiFi, Kafka, Hadoop/HBase, MongoDB, Elasticsearch for predictive analytics and event driven ML inference for various digital twin applications. Extensive buildout of ELK and Grafana.

11 Fabs (3 US, Israel, India, 2 Thailand, 2 China, 2 Japan) with collocated data centers, running 12 clusters of Anthos Kubernetes (GKE), ~2k K8S nodes, 1M pods, support subset of global business apps running in AWS and GCP, moving to Onprem Anthos, with global ML primarily in GCP, and general business in AWS: except for Fab based operations collocated due to 1-2 petabytes of daily data per fab, client is multicloud native (no data centers). VMware, Rancher, Portworx, EMC, NetApp. Spark, NiFi, Snowflake, DataBricks, DataIQ, Bitbucket, AWS EKS, GCP GKE, GKE Anthos VMWare, De-scheduler, Envoy, Prometheus, Splunk, CrowdStrike, Grafana, ELK, Goldilocks, Fairwinds, Java, Go, Python, TensorFlow, VMWare, GKS Tesla GPU pod scheduling, Aero, Bitbucket, Artifactory, Airflow, Kafka (Confluent), Bitnami, AWS Redshift, Snap Logic (ERTL), EMR, Jenkins, Spinnaker, MongoDB, PostgreSQL, MySQL, Elasticsearch, AWS Redshift, Cilium, Pega (Supply Chain Analytics), EKS.

Optimized virtualized HPC environments, improving GPU utilization by 3x and reducing fab error rates by 3-7%, EDA Slurm/Airflow completion times by 20%, etc.

Assisted team to enable SRE/Observability (Nvidia Cuda stack) and deployed AI-driven predictive analytics for manufacturing process optimization. Significant reduction in test cell loading.

Integrated AIOps solutions across global data centers, running large-scale Kubernetes clusters with multi-cloud AI workflows.

Pacific Coast Partners

Advisory Jan to Mar 2022

With VC round funding cutback strategies in place, perform assessments of key portfolio investments for minimum viable product next generation assurance

Guide Cybersecurity posture improvements to include OWASP 100, CISA KEV Catalog, 100% static and “significant” dynamic code coverage. SRE SLO/SLI/Observability, etc.

Strategies for incorporating FAAS/DBAS/SAAS to assure focused MVP pipelines: feature delivery, security in depth via DevSecOps, automated CI/CD with 100% code/vulnerability validation, Supply Chain BOM, Big Data alternatives (Presto vs. Snowflake), etc.

Microsoft Azure Professional Services

SME Oct 2019 to Nov 2020

Initially SME for Open-Source Big Data ecosystem (DSS, AI/ML) on Azure, part of team that implemented AKS, then supporting AKS engagements: Digital transformation of 5500 applications (75,000 systems) to Azure. AKS (Kubernetes) Subject Matter Expert team, heavy focus on securing Kubernetes/containers. Architect 400 remote clusters of Remote Kubernetes (5G MasterCore/EdgeCompute), K8S real time intrusion detection, SME supporting the migration to AKS (Terraform + Helm via DevOps pipelines – ML/DevOps) of
