
Senior Python AI/ML Engineer (LLMs, MLOps, Cloud)

Location:
Houston, TX
Posted:
January 06, 2026


Resume:

Deepika Gude

***************@*****.***

PROFESSIONAL SUMMARY:

Innovative and results-driven Senior Python AI/ML Engineer with 3+ years of experience in designing, deploying, and scaling AI-powered solutions across healthcare, enterprise search, and cloud computing domains. Proven expertise in LLM-based applications (RAG, summarization, chatbots), the MLOps lifecycle, and GPU-accelerated deep learning. Adept at bridging research and production using tools like TensorFlow, PyTorch, Hugging Face, ONNX, and LangChain. Cloud-native practitioner with experience on AWS, GCP, and Azure, delivering robust, real-time systems that reduce latency, improve inference, and drive impact at scale.

TECHNICAL SKILLS:

LLMs & NLP: GPT-4, LLaMA, Claude, BERT, ColBERT, Hugging Face, LangChain, OpenAI, Tokenizers

ML/DL Frameworks: TensorFlow, PyTorch, Scikit-learn, Keras, FastText, ONNX, TensorRT

MLOps & DevOps: MLflow, Airflow, Jenkins, ArgoCD, GitHub Actions, Docker, Kubernetes, Helm

Cloud Platforms: AWS (SageMaker, Lambda, S3, DynamoDB, EKS), GCP (Vertex AI, BigQuery), Azure (OpenAI)

Data Engineering: PySpark, Apache Beam, Dask, Pandas, Kafka, AWS Glue, Redshift, Feature Store

Infra & Orchestration: Terraform, OpenShift, NVIDIA MIG, Triton Inference Server

APIs & Backend: FastAPI, Flask, Django, REST, GraphQL, JWT, OAuth2

Databases: PostgreSQL, MySQL, MongoDB, Redis, Pinecone, FAISS, Weaviate, ChromaDB

Accelerators: CUDA, cuDNN, Jetson Nano, DeepStream SDK

PROFESSIONAL EXPERIENCE:

Role: AI/ML Engineer, Jan 2025 – Present

Perplexity AI, USA

Optimized and productionized RAG pipelines combining BM25, FAISS (HNSW), and ColBERT for LLM retrieval (illustrative retrieval sketch at the end of this role).

Built adaptive document chunking using sliding window + heuristics to enhance LLM context retention.

Fine-tuned LLMs and improved prompt engineering, integrating self-reflection for accuracy boosts.

Automated data pipelines for training via Pandas + Python; deployed via SageMaker, Lambda, Redis.

Developed APIs for LLM serving via FastAPI; ensured RBAC + secure access across teams.

Enabled cloud-native scaling with Kubernetes, DynamoDB, S3; cut inference latency by 40%.

Established MLOps best practices: CI/CD with GitHub Actions, monitoring with Prometheus + Grafana.

Led multi-team collaboration via GitHub (GitFlow), boosting delivery velocity and testing coverage.
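
A minimal sketch of the hybrid BM25 + FAISS retrieval step described in the first bullet of this role, assuming the rank_bm25, faiss-cpu, and sentence-transformers packages; the corpus, embedding model, and the score-fusion weight alpha are illustrative choices, not details from this resume.

```python
# Hybrid lexical + dense retrieval sketch (illustrative, not production code).
# Assumes: pip install rank_bm25 faiss-cpu sentence-transformers
import numpy as np
import faiss
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "Patient discharge summaries are stored in S3.",
    "SageMaker endpoints serve the fine-tuned LLM.",
    "Redis caches recent retrieval results for low latency.",
]

# Lexical index: BM25 over whitespace tokens.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Dense index: FAISS HNSW over sentence embeddings (model choice is a placeholder).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(corpus, normalize_embeddings=True).astype("float32")
index = faiss.IndexHNSWFlat(embeddings.shape[1], 32)  # 32 = HNSW neighbors per node
index.add(embeddings)

def hybrid_search(query: str, k: int = 3, alpha: float = 0.5):
    """Blend normalized BM25 and dense scores; alpha is an assumed fusion weight."""
    lexical = np.array(bm25.get_scores(query.lower().split()))
    lexical = lexical / (lexical.max() + 1e-9)

    q = encoder.encode([query], normalize_embeddings=True).astype("float32")
    dists, ids = index.search(q, len(corpus))
    dense = np.zeros(len(corpus))
    dense[ids[0]] = 1.0 - dists[0] / (dists[0].max() + 1e-9)  # smaller L2 distance = closer

    fused = alpha * lexical + (1 - alpha) * dense
    return [corpus[i] for i in fused.argsort()[::-1][:k]]

print(hybrid_search("Which service hosts the fine-tuned model?"))
```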

Role: AI/ML Engineer, Oct 2023 – Dec 2024

NVIDIA, USA

Built high-performance AI models for real-time vision and search using PyTorch + CUDA.

Integrated Databricks + MLflow for ML lifecycle: training, tuning, monitoring, retraining.

Designed scalable AI on GPU clusters (DGX, EKS), deployed with Helm + Terraform automation.

Served models with Triton Inference Server across cloud/edge; reduced infra cost by 30% (illustrative inference-client sketch at the end of this role).

Used Jetson Nano, TensorRT, DeepStream SDK for autonomous/edge AI.

Created RAG workflows with Redis Vector Search, ChromaDB, Pinecone for enterprise Q&A bots.

Implemented MLOps pipelines with ArgoCD, AWS CodePipeline, improving deployment speed by 45%.
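
A minimal sketch of calling a model hosted on Triton Inference Server over HTTP, as referenced in the Triton bullet above, assuming the tritonclient package; the server URL, model name, and tensor names ("INPUT__0", "OUTPUT__0") are hypothetical placeholders, not values from this resume.

```python
# Illustrative Triton Inference Server HTTP client.
# Assumes: pip install tritonclient[http] numpy
import numpy as np
import tritonclient.http as httpclient

TRITON_URL = "localhost:8000"   # hypothetical endpoint
MODEL_NAME = "resnet_trt"       # hypothetical model name

client = httpclient.InferenceServerClient(url=TRITON_URL)

# Dummy image batch matching an assumed NCHW model input shape.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)

inputs = [httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")]
inputs[0].set_data_from_numpy(batch)
outputs = [httpclient.InferRequestedOutput("OUTPUT__0")]

# Single synchronous request; batching and scheduling happen server-side in Triton.
response = client.infer(model_name=MODEL_NAME, inputs=inputs, outputs=outputs)
logits = response.as_numpy("OUTPUT__0")
print("Predicted class:", int(np.argmax(logits)))
```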

Role: Python Engineer, June 2021 – Aug 2023

Cognizant, India

Developed scalable APIs with Flask & GraphQL; reduced latency 25% via query optimization & caching.

Built ETL pipelines (Airflow + Pandas) for data ingestion & transformation.

Integrated PostgreSQL + MySQL with SQLAlchemy for transactional AI app data stores.

Engineered containerized deployments (Docker + Kubernetes) with AWS Lambda for backend workloads.

Designed and built modular backend services using Python and Flask, enabling secure data flow and real-time orchestration across systems with clean, maintainable code practices.

Developed and consumed RESTful APIs with token-based authentication, pagination, input validation, and robust error handling, delivering well-documented and secure endpoints.

Implemented secure API communication using OAuth2 and JWT, including custom Python middleware for token validation, ensuring compliance with internal security standards.

Analyzed DNS and network telemetry data using Pandas and JSON, extracting insights to identify latency issues and optimize routing behavior during testing.

Deployed services on AWS EC2 and used Amazon S3 for configuration/log storage, supporting rollback and behavior analysis as part of cloud-native deployment practices.

Automated CI/CD with Jenkins & GitHub Actions; enabled real-time testing & Docker-based rollout.

Implemented event-driven systems with Kafka, Celery, RabbitMQ for async data handling & scaling.
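
A minimal sketch of the kind of asynchronous task handling described in the last bullet, assuming Celery with a RabbitMQ broker; the broker URL, task name, and payload are illustrative assumptions rather than details from this resume.

```python
# Illustrative Celery worker for async data handling (file assumed to be ingest.py).
# Assumes: pip install celery, plus a RabbitMQ broker reachable at the URL below.
from celery import Celery

app = Celery(
    "ingest",
    broker="amqp://guest:guest@localhost:5672//",  # hypothetical RabbitMQ broker
    backend="rpc://",
)

@app.task(bind=True, max_retries=3, default_retry_delay=5)
def transform_record(self, record: dict) -> dict:
    """Normalize one ingested record; retried automatically on transient failures."""
    try:
        return {"id": record["id"], "value": float(record["value"])}
    except (KeyError, ValueError) as exc:
        raise self.retry(exc=exc)

# Producer side (e.g. from a Flask endpoint):
#   transform_record.delay({"id": "42", "value": "3.14"})
# Worker side:
#   celery -A ingest worker --loglevel=info
```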

CAREER HIGHLIGHTS:

LLM Optimization & Deployment: Optimized RAG pipelines using BM25, FAISS, and ColBERT for efficient LLM retrieval at Perplexity AI, boosting factual accuracy and cutting inference latency by 40%.

Scalable MLOps Systems: Designed and deployed end-to-end MLOps pipelines using CI/CD (GitHub Actions, ArgoCD), SageMaker, and Kubernetes, enabling fast, reliable model rollout across cloud environments.

High-Performance Inference at Scale: Built production-grade transformer inference pipelines using PyTorch, TorchScript, and FBGEMM, achieving sub-60ms latency on mobile and edge platforms at Meta/NVIDIA.

Secure Backend Engineering: Developed robust, modular backend APIs using Python, Flask, and OAuth2/JWT, supporting real-time orchestration and secure communication between cloud-native microservices at Cognizant.

Real-Time Data Analytics: Engineered ETL pipelines and telemetry analysis systems using Apache Spark, Pandas, and Airflow, improving model training workflows and decision support across high-volume datasets.

Responsible AI & Compliance: Embedded SHAP, LIME, and Fairlearn into ML systems to ensure compliance with GDPR, KYC, and AML, integrating ethical AI practices into production models.
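
A minimal sketch of the explainability and fairness checks mentioned in the last highlight, assuming scikit-learn, shap, and fairlearn; the dataset, model, and sensitive attribute are synthetic stand-ins, not artifacts from the work described.

```python
# Illustrative SHAP attribution + Fairlearn disaggregated metric.
# Assumes: pip install scikit-learn shap fairlearn numpy
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame

rng = np.random.default_rng(0)

# Synthetic tabular data; `group` plays the role of a sensitive attribute (e.g. group A/B).
X = rng.normal(size=(500, 5))
group = rng.integers(0, 2, size=500)
y = (X[:, 0] + 0.5 * group + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
pred = model.predict(X)

# SHAP: per-feature attribution for tree models.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:50])
print("SHAP values shape:", np.shape(shap_values))

# Fairlearn: accuracy disaggregated by sensitive group.
frame = MetricFrame(metrics=accuracy_score, y_true=y, y_pred=pred, sensitive_features=group)
print("Accuracy by group:\n", frame.by_group)
```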

EDUCATION:

Master of Science, University of Arkansas at Little Rock, Little Rock, AR - 2025

Bachelor of Technology, NRI Institute of Technology, Andhra Pradesh, India - 2022


