Jacob Roger White
Irving, Texas, United States
+1-945-***-**** ******************@*****.*** https://www.linkedin.com/in/jacob-white-a446a431/
Summary
Lead AI/ML Engineer with 10+ years delivering ML/AI to production and 15+ years of software engineering across startups and Fortune 500 enterprises. Strong background in model lifecycle management, monitoring, retraining pipelines, and compliance-ready deployments across AWS, Azure, and GCP. Architects and operates agentic, tool-using AI platforms, including RAG/GraphRAG pipelines and multi-agent systems with evaluation-gated CI/CD, with deployments spanning conversational agents, document intelligence, and enterprise copilots. Deep focus on LLM evaluation, grounding, observability, latency, and cost control in regulated environments. Former customer-facing technical lead. Fluent in Python (FastAPI, LangChain, TensorFlow) and modern JavaScript stacks (Node.js, React, Next.js, Angular).
PROFESSIONAL EXPERIENCE
Wells Fargo Apr 2022 - Present
Lead AI/ML Engineer, Boston, Massachusetts
•Architected and led delivery of an enterprise agentic AI platform supporting RAG, GraphRAG, and multi-agent execution, implemented with LangGraph, AutoGen, LangChain, LlamaIndex, and MCP-compliant toolchains, reducing time-to-production by around 45% across internal AI use cases.
•Designed automated LLM evaluation and configuration search pipelines in Python that performed grid-search and ablation testing across retrieval strategies, chunking schemes, rerankers, embedding models, LLM backends, and agent topologies, optimizing for p95 latency, token cost, faithfulness, and task success rate.
•Partnered closely with compliance, risk, and governance teams to enforce audit-ready controls, documentation standards, and approval workflows.
•Developed production inference services using Python (FastAPI) and containerized deployments, ensuring scalability, logging, and observability.
•Led the design and delivery of distributed NLP and document intelligence pipelines leveraging Hugging Face Transformers and PySpark, enabling large-scale processing of unstructured financial documents and customer communications; improved document processing throughput by 40% and increased classification and routing accuracy by 25% across regulated enterprise workflows.
•Fine-tuned large language models for personalized communications, increasing open rates by 20% and reducing inference costs by 60% by transitioning from third-party APIs to in-house transformer deployments.
•Built a GraphRAG knowledge architecture that decomposed unstructured enterprise content into atomic propositions using LLM-based fact extraction, persisted them as property graphs, and queried them via hybrid vector and graph traversal, yielding 20-point improvements in grounding metrics.
•Engineered agent-to-agent (A2A) communication protocols with shared memory stores, event-based coordination, and tool arbitration, enabling autonomous collaboration between planning, retrieval, execution, and verification agents and increasing system throughput by around 40%.
•Designed and implemented structured LLM evaluation suites and automated benchmarking pipelines using Python, LangChain, LlamaIndex, and Hugging Face evaluation frameworks, supporting zero-shot, few-shot, and system-prompt configurations; integrated internal enterprise systems and workflow tools to enable scalable data enrichment, QA validation, and release gating for production AI systems.
•Implemented neuro-symbolic validation layers combining textual entailment models, satisfiability checks, and rule-based constraints to enforce grounding guarantees and reduce hallucinations by more than 80% compared to baseline RAG pipelines.
•Architected document intelligence systems combining vision-language models (VLMs), classical CV (layout detection, segmentation), OCR, and symbolic post-processing, achieving more than 90% reduction in manual document review and around 34% improvement in defect detection.
•Delivered enterprise-grade, compliance-aligned AI solutions using TensorFlow, PyTorch, and Hugging Face Transformers to support risk assessment, customer servicing, and decision-support use cases, improving operational efficiency and downstream decision quality by 35% while meeting internal security, governance, and audit requirements.
•Implemented automated hyperparameter optimization and cross-validation pipelines to standardize model selection against business-aligned performance metrics.
•Developed high-throughput ingestion and inference microservices using FastAPI, Java/Spring Boot, and gRPC, supporting asynchronous processing, idempotency, and horizontal scaling, compressing processing timelines from 7 months to under 30 days.
•Developed and operationalized classification and neural network models for churn prediction and demand generation, optimizing for recall and stability on imbalanced datasets.
•Implemented distributed data processing pipelines using PySpark, cloud-native batch/streaming services, and schema validation layers, supporting millions of records per day with fault tolerance and back-pressure handling.
•Contributed to the development of advanced analytics and ingestion functions for proprietary AI model training, improving predictive accuracy by 15% and reducing downstream maintenance costs by around 16%.
•Utilized Python, Go, Terraform, and cloud-native services to build scalable data collection and inference systems, supporting 24/7 production workloads with automated recovery and monitoring.
•Architected enterprise MLOps platforms using MLflow, Airflow, Kubeflow, managed model pipelines, and Terraform, enabling reproducible training, evaluation, deployment, and lineage tracking while reducing release failures by more than 50%.
•Led cloud-agnostic AI deployments across AWS, GCP, and Azure, leveraging managed GPU services, secure networking, and IAM to support OCR, NLP, predictive analytics, and agentic workflows, generating $10–12M in annualized business value.
•Integrated Azure Cognitive Services (OCR, document intelligence, vision, NLP, speech, and intent extraction) into enterprise workflows, reducing manual processing and accelerating document-based decisioning.
•Delivered production conversational and task-oriented AI agents with secure tool access and workflow automation, saving 40 hours per team per month, improving response times by 45%, and increasing task completion rates by 35%.
•Led development and production deployment of classical and ensemble ML models (tree-based, gradient boosting, neural networks, time-series forecasting) for pricing, churn, demand forecasting, and risk use cases, with automated hyperparameter optimization and Kubernetes-based serving.
•Spearheaded implementation of AI-driven recommendation systems, increasing user engagement metrics by 24% while aligning model behavior with human-centric success measures.
•Built hybrid search infrastructure integrating Pinecone, Weaviate, Azure AI Search, dense embeddings, keyword indexes, and rerankers, improving retrieval recall and grounding scores by 12–18 points across representative datasets.
•Designed and deployed production-grade model serving platforms using Kubernetes, TorchServe, NVIDIA Triton, and autoscaling policies, reducing development timelines by 2 months, increasing throughput by 40%, and cutting inference latency by 62%.
CoreLogic Apr 2019 - Apr 2022
Senior AI Engineer Irvine, CA
•Designed and deployed end-to-end RAG pipelines for document intelligence, integrating layout detection, OCR, PDF parsing, vector search, and reranking—improving document throughput by 30%.
•Implemented transfer learning and fine-tuning workflows for NLP and OCR models (BERT-family, transformer-based OCR) using LoRA-style PEFT techniques, improving text extraction accuracy by 14% and reducing missing-text errors by around 15% across domain-specific PDFs.
•Developed automated retraining pipelines using batch processing and scheduled workflows to maintain model performance over time.
•Ensured ML solutions met enterprise governance and security requirements, collaborating with platform and compliance stakeholders.
•Designed, trained, and deployed production regression, ensemble, and time-series forecasting models (linear models, tree-based ensembles, gradient boosting, ARIMA/ARIMAX) supporting pricing, forecasting, fraud, and risk analytics use cases.
•Fine-tuned early-generation GPT-based personalization models for outbound communications, increasing email open rates by 21%, response rates by 3.8%, and conversions by 2.5%; later reduced inference costs by 60% by training and deploying an in-house transformer model.
•Implemented human-in-the-loop reinforcement learning and feedback-driven fine-tuning pipelines to refine translation and text normalization models, improving translation accuracy by around 26% in production evaluations.
•Designed and deployed containerized ML inference services using Docker, Kubernetes, TorchServe, and Vertex AI, supporting both online and batch prediction workloads with autoscaling and SLA guarantees.
•Built and productionized classification and neural network models (Random Forest, boosted trees, ANN) for churn prediction, demand generation, and customer segmentation, achieving recall up to 98.5% while controlling overfitting through cross-validation, resampling, and rigorous evaluation.
•Designed and implemented large-scale NLP evaluation and benchmarking pipelines using custom Python frameworks, FAISS-based similarity search, and traditional IR metrics (precision/recall, MRR), enabling systematic comparison of retrieval, summarization, and grounding strategies across millions of documents.
•Designed and implemented Kubernetes-based MLOps platforms using Metaflow, Airflow (MWAA), and SageMaker Pipelines, reducing model deployment time by 70% and enabling continuous training and validation.
•Led enterprise CI/CD modernization initiatives, migrating ML systems to AWS SageMaker and introducing governance, monitoring, and observability standards that improved production reliability and auditability.
•Owned a mission-critical multiclass classification system (scikit-learn), replacing a vendor model with an in-house solution incorporating data validation, automated retraining, and API deployment, reducing iteration cycles from over 2 weeks to 1 day while eliminating licensing costs.
•Developed a benchmarking framework and Python library for a proprietary change-point detection algorithm, improving interpretability and establishing internal performance baselines.
•Built high-throughput API services using Python, FastAPI, Java/Spring Boot, and gRPC, handling 100K+ daily requests with low latency and high availability.
•Engineered event-driven and streaming architectures using Kafka, Go microservices, and PostgreSQL, increasing concurrent processing capacity by more than 50% and improving fault tolerance.
•Designed ML workflows for fraud, risk, and forecasting use cases using AWS Step Functions, Spark/Scala batch scoring, and Python inference layers, supporting over 65K daily predictions with 99.9% uptime.
•Provisioned and managed cloud infrastructure using Terraform across AWS and GCP (VPCs, compute, Dataproc, Redis, Pub/Sub, storage, IAM, autoscaling), supporting secure, highly available ML platforms.
•Built AI-powered internal web applications integrating transformer-based NLP models and early-generation language APIs to automate analytics, document scoring, and decision-support workflows for internal stakeholders.
•Developed prompt- and template-based inference strategies for transformer language models, improving output consistency, scoring robustness, and evaluation reliability by approximately 40% across internal benchmarks.
•Collaborated cross-functionally with data science, product, and platform teams to deliver production-grade ML systems aligned with security, compliance, and scalability requirements.
ALSAC St. Jude Apr 2018 - Apr 2019
Software Engineer Greater Memphis Area
•Contributed to the architecture and development of an enterprise e-commerce platform supporting millions of users, focusing on frontend performance, backend services, and multi-tenant application patterns.
•Played a key role in modernizing a legacy ASP.NET MVC application, helping introduce early Blazor-based SPA patterns (including experimental WebAssembly builds), reducing page load times by 40%.
•Participated in a multi-phase migration from .NET Framework 4.8 to .NET Core, laying the groundwork for later upgrades to .NET 5+ while maintaining production uptime, refactoring substantial portions of the codebase, and maintaining production stability.
•Refactored and optimized long-running background workloads, assisting in migration from Azure Function Apps to Linux VM–based services, contributing to an overall 80% reduction in hosting costs.
•Developed and maintained cloud-native services on Microsoft Azure, working with App Services, Function Apps, Elastic SQL Pools, Storage Accounts, and networking components to ensure scalability and reliability.
•Built internal tooling using Azure SDK for .NET to automate provisioning and configuration of single-tenant client deployments, supporting dozens of customer environments.
•Collaborated with product, sales, and solutions teams to translate enterprise customer requirements into technical designs, supporting the successful onboarding of large and multinational clients.
•Implemented and maintained CI/CD pipelines using Azure DevOps, TeamCity, and Octopus Deploy, reducing manual deployment effort and improving release consistency.
•Actively participated in code reviews, testing, and release processes, contributing to improved system stability and a measurable reduction in post-release defects.
•Worked across frontend and backend layers, delivering features in C#, .NET Core, SQL Server, JavaScript, and React, while adhering to enterprise security and performance standards.
TCI Technology Consulting Inc Aug 2017 - Mar 2018
Consultant Louisville, Kentucky
•Designed and delivered PHP/Laravel–based e-commerce and analytics platforms with ad tracking, A/B testing, subscription billing, and member portals, reliably supporting thousands of concurrent users across multi-tenant environments.
•Engineered secure ETL workflows using PHP and Python to aggregate marketing and transaction data from Google Ads, Facebook Ads, Stripe, and PayPal, enabling faster campaign analytics and reducing decision turnaround time by 52%.
•Led frontend development within a Scrum team while supporting additional teams on UI architecture, building single-page applications using React (class-based components), Redux, JavaScript (ES6), and REST APIs, improving cross-team delivery consistency.
•Designed and implemented data-driven dashboards providing campaign KPIs, enrichment progress, and operational insights, reducing manual reporting effort by over 32% for sales and marketing stakeholders.
•Contributed to a renewable-energy development data platform used by project developers and investors, delivering a project management UI using React, Redux, and Webpack, improving page load times and UI responsiveness.
•Delivered enterprise Java web applications across multiple client engagements using Java EE, Spring MVC, Spring Core, JDBC, JSP/Servlets, and JBoss EAP, supporting production systems with thousands of daily active users.
•Served as technical lead on a major Java EE application migration from Oracle WebLogic to JBoss, reducing annual application server licensing costs by 66% and improving application startup times by 28%.
•Conducted large-scale application security audits across legacy Java codebases, identifying and remediating over 320 vulnerabilities, including XSS, CSRF, SQL injection, insecure cookie handling, null dereferencing, and information leakage.
•Implemented Spring Security–based authentication and authorization, hardened session management, and secure input validation, reducing post-remediation security findings by 74.8% in follow-up scans.
•Refactored monolithic Java services into clean, layered Spring architectures, improving code maintainability and reducing average defect resolution time by 36%.
•Optimized JDBC data access layers and SQL queries, tuning connection pooling and query execution plans to improve backend response times by 40% under production load.
•Led remediation of static analysis findings (e.g., Fortify / SonarQube-style scans), enabling applications to pass third-party penetration testing and meet client security compliance requirements.
•Standardized build and deployment workflows using Maven and JBoss CLI, reducing manual deployment errors by 55% and improving consistency across development, QA, and production environments.
•Collaborated directly with client architects, QA teams, and security stakeholders to deliver fixes under aggressive consulting timelines, achieving 100% on-time delivery across assigned projects.
•Authored technical documentation and conducted handoff sessions for client engineering teams, reducing post-engagement support requests by 44%.
Global Business Solutions, Inc. (GBSI) Oct 2015 - Aug 2017
Senior Application Developer (.NET / Web) Greater Memphis Area
•Designed and developed enterprise web applications using ASP.NET MVC 5, implementing Repository, Unit of Work, and Dependency Injection patterns to improve maintainability, testability, and separation of concerns across the codebase.
•Built end-to-end application features from the ground up, owning controller logic, Razor views, service layers, and data-access components, contributing to the successful launch of a core business application.
•Enhanced and maintained multiple production web applications, delivering new features and resolving high-priority defects, improving system stability and reducing recurring production issues by 30%.
•Developed responsive UI components using HTML5, JavaScript (ES5/ES6), jQuery, and Razor, ensuring cross-browser compatibility and consistent user experience across enterprise clients.
•Implemented robust data-access layers using T-SQL and SQL Server, optimizing queries and stored procedures to reduce page load times and improve transactional performance by 25%.
•Established and expanded unit test coverage using .NET testing frameworks, improving regression detection and reducing post-release defects by 20%.
•Built automated end-to-end test suites with Selenium, validating critical user workflows and reducing manual QA effort while increasing release confidence.
•Collaborated closely with business analysts and stakeholders to translate requirements into technical designs, delivering features on schedule while meeting functional and non-functional requirements.
•Participated in source control, branching, and release workflows using Team Foundation Server (TFS), supporting parallel development across multiple applications and environments.
•Acted as a full-stack contributor across UI, backend, and database layers, accelerating feature delivery and reducing handoff friction between frontend and backend teams.
Engility/TASC Jan 2014 - Jun 2015
Software Engineer NC
Worked for the Defense Information Systems Agency (DISA) as an onsite contractor, using C++/CLI, ASP.NET, HTML, JavaScript, and WCF. Worked on a team that developed a web application to facilitate testing.
Department of Defense – US Army Electronic Proving Ground Jul 2009 - Jan 2014
Pathways Program Washington, DC
Started programming in VB.NET, helping write code to interface with spectrum analyzers; learned a great deal about RF during this time. The team regularly went into the field to track down signals, and I helped write code to interface with a TDOA multilateration system.
After a few years, I moved to mobile programming (Android), where I created and maintained several apps. Finally, I maintained custom Android ROMs based on the Android Open Source Project (AOSP) for several devices and AOSP versions.
Education
The University of Texas at Dallas 2022 - 2026
Doctorate, Artificial Intelligence
Georgia Institute of Technology 2016 - 2020
Master’s Degree, Computer Science
University of Arizona 2011 - 2015
Bachelor of Science, Mathematics and Computer Science
Cochise College 2005 - 2011
Associate’s Degree, Mathematics
Skills
Artificial Intelligence & Machine Learning: TensorFlow, PyTorch, Hugging Face Transformers, LlamaIndex, Semantic Kernel, CrewAI, Scikit-learn, NLP, Generative AI (GenAI), Retrieval-Augmented Generation (RAG), LangChain, Computer Vision, BERT, ONNX Runtime, Triton Inference Server, AutoML, TPOT, Benchmarking Frameworks, TTFT, SuperAnnotate, MLflow
Agentic & Multi-Agent Systems: Agentic AI, Multi-Agent Workflow Design, Autonomous Data Orchestration, Agentic Observability Systems, LLM Feedback Loops, Self-Healing Pipelines, AutoGen
MLOps & Model Management: Databricks, PySpark, MLflow, Model Serving, Monitoring & Observability (Prometheus, Grafana, ELK), Logging & Alerting Systems, CI/CD for ML, Infrastructure as Code, Cloud Platforms (AWS, GCP, Azure), Serverless Architectures, API Gateways
Cloud & DevOps: AWS (SageMaker, Bedrock, ECS, Lambda, CloudWatch), GCP (Vertex AI, Cloud Run, BigQuery), Azure (Cognitive Services, Functions), Docker, Kubernetes, Helm, GitHub Actions, GitOps, Cloud-Native Deployment, Performance Optimization
Software Engineering & Programming: Python, Go, C++, Node.js, TypeScript, Flask, FastAPI, Express, RESTful APIs, GraphQL, High-Concurrency Backends, Distributed Systems, Microservices Architecture
Data Engineering & Storage: ETL Pipelines, Data Warehousing, Schema Design, PostgreSQL, MongoDB, Redis, BigQuery, Vector Databases (Weaviate, Milvus), ElasticSearch, OpenSearch, Hybrid Search, Keyword Search, Data Lineage, Data Quality Pipelines, Secure Data Handling & Encryption
Frontend Engineering: React, Redux, Angular, Next.js, TypeScript, Material UI, Tailwind CSS, SPA Development, Context API, Data-Driven Dashboards
Security & Compliance: HIPAA Compliance, Identity & Access Management, Secure API Development, Vulnerability Assessment, Penetration Testing, Cloud Security Controls, Encryption Standards
Certifications
Computer Vision
Deep Reinforcement Learning