Data Scientist - NLP & Retrieval Engineering

Location:

Binghamton, NY

Posted:

June 29, 2026

Resume:

AISHWARYA MANDYA YOGANANDA

New York +1-607-***-**** *************@*****.*** LinkedIn GitHub Portfolio

SUMMARY

Data Scientist with 3 years of experience, gained through internships and personal projects, in NLP, model fine-tuning, and retrieval pipeline evaluation using Python, PyTorch, and Hugging Face. Focused on rigorous experimentation, domain-specific training, and production deployment.

EDUCATION

Binghamton University, State University of New York Aug 2024 – May 2026

Master of Science in Computer Science

Visvesvaraya Technological University Sep 2020 – May 2024

Bachelor of Engineering in Information Science and Engineering

EXPERIENCE

Machine Learning Software Engineer Manifoldz, NY, USA May 2026 – Present

• Developed core logic modules and documented 100% of the codebase for the MSC data transformation and reporting tool, enabling the analytics team to run and maintain workflows independently and reducing onboarding time by 40%.

• Analyzed MSC report outputs, identified 15+ software defects and data inconsistencies, and logged structured incidents with reproducible findings, achieving 100% issue traceability and accelerating defect investigation and resolution.

• Led cross-functional QA/QC across MSC workflows, validated outputs against defined business rules, uncovered 20+ reporting errors, and created a five-step remediation roadmap that reduced recurring reporting rework by 30%.

Machine Learning Software Engineering Intern Manifoldz, NY, USA Sep 2025 – Apr 2026

• Fine-tuned Whisper with Hugging Face Transformers on legal-audio datasets, applying audio preprocessing, feature engineering, and multi-split evaluation to reduce transcription error rates by 43.3%.

• Optimized real-time inference pipelines using Python, REST APIs, and weather integrations, improving data flow, request handling, and API execution to reduce end-to-end prediction latency by 30%.

• Retrained TensorFlow and NLP models using curated domain data, feature selection, hyperparameter tuning, and structured evaluation, increasing prediction accuracy by 18% and improving alert reliability by 20%.

• Built a real-time ML prediction system using Laravel, APIs, and inference workflows, automating model execution, response tracking, and performance monitoring to process 1K+ daily predictions with 35% greater operational efficiency.

Research Assistant Binghamton University, Binghamton, NY Jan 2025 – Apr 2025

• Designed a secure real-time pipeline using C, RTOS, and AES-128 encryption to collect, validate, preprocess, encrypt, and transmit streaming sensor data with 99.9% end-to-end transmission reliability.

• Optimized real-time streaming through profiling, performance analysis, and pipeline tuning, resolving runtime bottlenecks and reducing processing latency by 40% while sustaining 100 FPS throughput and 95% system uptime.

Data Scientist Intern BrainOVision Solutions, Bangalore, India Apr 2022 – May 2022

• Optimized data workflows using Python, Pandas, NumPy, and SQL to clean, transform, validate, and structure large datasets, improving processing efficiency and reducing data-quality inconsistencies by 30% across projects.

• Developed an NLP sentiment analysis model using NLTK, Python, and BI tools, automating text classification, feature extraction, and dashboard reporting to reduce manual analysis and reporting time by 50%.

PROJECTS

Marine ASR Fine-tuning Python, Whisper, Hugging Face, PyTorch

• Fine-tuned Whisper on marine and legal audio, reducing Word Error Rate (WER) by 43.3% compared with the base model.

• Applied audio segmentation, noise filtering, and augmentation; evaluated performance using WER, MER, and CER across test splits.

Quantamind Legal AI Agent Python, Mistral 7B, LoRA, FAISS, Hugging Face

• Benchmarked LoRA fine-tuning (92.5% accuracy) against prompt engineering (85.3%) on the CUAD legal contract dataset, a 7.2 percentage-point improvement.

• Built an offline RAG pipeline using FAISS and Mistral 7B for private contract clause retrieval, semantic search, and structured extraction.

Audit Readiness AI Python, LangChain, OpenAI API, RAG, FastAPI

• Reduced audit preparation time by 60% using RAG over SOC 2 documentation for automated evidence retrieval and compliance analysis.

• Benchmarked TF-IDF, SentenceTransformers, and hybrid retrieval on precision, recall, and latency; hybrid retrieval improved precision by 12% over TF-IDF while maintaining sub-250ms query latency for compliance searches.

TECHNICAL SKILLS

Programming & Data Analysis: Python, SQL, R, Pandas, NumPy, Matplotlib, Seaborn, Tableau, Power BI

Machine Learning: Scikit-learn, TensorFlow, PyTorch, XGBoost, Feature Engineering, Model Evaluation, Hyperparameter Tuning

Statistical Techniques: Hypothesis Testing, A/B Testing, Regression, Classification, Time-Series Analysis

Data Engineering & Databases: PostgreSQL, MySQL, MongoDB, Spark, PySpark, ETL Pipelines

Tools & Platforms: Jupyter Notebook, Git, Docker, AWS (S3, EC2), GCP
