Lavanya Shankar
******************@*****.*** +1-704-***-**** LinkedIn GitHub Research Portfolio
Education
Johns Hopkins University Baltimore, MD
Master of Science in Engineering in Data Science CGPA: 3.9/4 Aug 2023 – May 2025
Visvesvaraya Technological University Bengaluru, India
Bachelor of Engineering in Computer Science CGPA: 3.6/4 Aug 2016 – Aug 2020
Technical Skills
Programming: Python, Java, JavaScript, Ansible, SQL, HTML/CSS, Spring Boot, C/C++
DevOps: AWS, Docker, Kubernetes, Helm, Terraform, Firebase, Swagger, Postman, Jenkins, Apache Spark, CI/CD
Machine Learning: PyTorch, TensorFlow, Hugging Face, NumPy, Matplotlib, vLLM, Triton, Pandas, JAX, GPU
Professional Experience
Meta Oct 2025 – Present
Linguist Engineer 2 Burlingame, CA
• Developed a hybrid data- and rule-driven text normalization system for Meta Ray-Ban Smart Glasses (Telugu TTS) using a lightweight CNN + BiLSTM model with action masking and dropout regularization, improving accuracy by 38%
• Engineered production inference pipeline with beam-search pruning and TorchScript C++ deployment, reducing on-device TTS latency by 25% (595ms → 446ms) and memory by 9% (32.9MB → 30.1MB)
Scale AI Jan 2025 – Sep 2025
Gen AI Intern San Francisco, CA
• Developed production-grade MCP (Model Context Protocol) servers enabling LLM agent tool calling for e-commerce (Amazon) and local business (Yelp), supporting 11 real-world tools and saving customers 4 minutes per query
• Collaborated with 3 cross-functional teams to design an RLHF-based LLM evaluation pipeline tracking 14 error categories across 4 dimensions, achieving 91% tool selection accuracy, 86% parameter accuracy, and a 98% satisfactory-summary rate
Center for Language and Speech Processing Jan 2024 – Dec 2024
Machine Learning Engineer Baltimore, MD
• Preprocessed 28 hours of audio in PyTorch via silence removal and segmentation, reducing memory usage by 20%
• Optimized embedding extraction by 40% (109 hrs → 1 hr) using CUDA-accelerated distributed batch processing, and improved classification accuracy by 15% through a Zipformer + BiLSTM model with hyperparameter tuning
OpenText Aug 2020 – Aug 2023
Software Engineer Bengaluru, India
• Built distributed data pipeline to visualize 50K+ events from microservices; employed Filebeat for log shipping, Kafka for ingestion, Spark for real-time processing, and Elasticsearch/Kibana for scalable storage and visualization
• Led Password Management SaaS team in building 20+ REST APIs on Linux (SLES and RHEL) using Spring Boot with OAuth2 authentication for password-change and policy-enforcement functionality, documented with Swagger
• Implemented Jenkins CI/CD pipeline to validate customer releases via deployment testing (AWS, Docker, Kubernetes, Helm), security scanning (Trivy), API/UI testing (JavaScript), and reporting, eliminating manual validation effort
Publications
Parseltongue package: Engineered voice-driven Python programming toolkit for hands-free coding, integrating Dragon NaturallySpeaking with custom grammars; supported Python 3.x commands with 71% speed improvement (PyPI)
ACL 2025: Created spoken language translation systems for 10 language pairs leveraging SeamlessM4T, Whisper, and Whisper+NLLB models; applied Minimum Bayes Risk (MBR) ensembling to enhance translation accuracy (ACL Paper)
ACL 2025: Generated educational material for 4 low-resource Indigenous languages employing chain-of-thought reasoning, POS tagging, and ensemble learning; improved accuracy by 10% over previous benchmarks (ACL Paper)
Projects
Aragorn - RAG Chatbot for Argumentative Reasoning Link
• Built a multi-agent system with the GPT API, LangChain, BM25, and FAISS to debate human users; applied prompt engineering techniques (one-shot, few-shot, and role-playing personas), boosting conversational quality by 20%
News Article Recommendation System Link
• Implemented a Selenium data collection pipeline gathering 40K+ news articles and a FastAPI backend serving content-based recommendations via TF-IDF and GloVe; deployed on Google Cloud Run for serverless hosting
Cognitive Decline Prediction with Behavioral Risk Analysis Link
• Leveraged Spark and Databricks to analyze BRFSS data (445K rows, 320 columns), applying feature selection using Cramér’s V and ANOVA, imputation, and SMOTE for class imbalance, yielding 91% accuracy with XGBoost