Anuhya Samudrala
***.******@*****.*** LinkedIn GitHub 980-***-****
ABOUT
Results-driven Data Scientist specializing in AI/ML Engineering with 8+ years of experience delivering enterprise AI solutions across government, finance, and automation domains. Proven track record of architecting production-grade systems including LLM-powered chatbots and fraud detection pipelines, generating $6M+ in cost savings and efficiency gains. Expert in leading cross-functional teams and deploying scalable, ethical AI using cloud platforms and LLMOps best practices.
WORK EXPERIENCE
AI Engineer – Honeywell (UNCC Research Collaboration) Jan 2024 – Present
Customer Support Chatbot (Multi-modal RAG with GenAI)
• Architected production-ready RAG pipeline using OpenAI and Claude APIs with Pinecone vector database for Honeywell's automation, resolving 8,000+ monthly queries and reducing unresolved cases by 70%
• Configured persistent memory with PostgreSQL and hybrid search retrieval via LangChain, enabling seamless multi-turn conversations for technical support across 15+ global manufacturing sites
• Engineered conversational application with Python-based classification models and secure API development, boosting accuracy 40% using advanced prompting (CoT, HyDE) techniques
• Implemented LLM evaluation using LangGraph for structured flow control, with confidence thresholds, human-in-the-loop review, and RAGAS metrics (0.87 relevance, 0.83 faithfulness)
• Deployed and optimized GenAI pipelines using AWS Lambda, ECS, API Gateway, S3, and SageMaker, integrating with CloudWatch and IAM for secure, scalable LLM workloads in production
• Leveraged vibe coding tools like Cursor for test case generation and Copilot for boilerplate generation, enabling faster, iterative GenAI development with streamlined engineering practices
• Translated complex business requirements into scalable GenAI solutions by collaborating with 12+ SMEs and Product Owners, driving stakeholder alignment and delivery across a $2.1M project scope
Agentic AI Coding Agent
• Built a multi-agent system using CrewAI to automate development tasks including code summarization, classification, and version-aware alerting, improving efficiency by 25%
• Designed specialized agent roles for planning, coding, and review, enabling collaborative generation and refinement of software components that simulate real-world engineering workflows
• Benchmarked enterprise and open-source LLMs (OpenAI, Anthropic, Mistral, Llama, Gemini) across latency, cost-per-token, and accuracy metrics to validate model assumptions for production deployment decisions
Data Scientist – NTT Data, India Dec 2021 – Sep 2023
Legal Dialog System (Multilingual NLP)
• Built OCR-based Named Entity Recognition (NER) pipeline with Tesseract, spaCy, and regex to extract structured entities (names, dates, IDs) from certificates, achieving a 40% improvement in accuracy
• Spearheaded a hybrid legal chatbot using rule-based NLP and transformer models to classify certificate types and automate validation across 15+ departments, reducing compliance costs by 30% ($850K annually)
• Fine-tuned CAMeLBERT (Hugging Face) using PyTorch and TensorFlow (Keras) for text classification and field extraction in multilingual legal documents, focusing on Arabic language processing
• Collaborated with technical leads, business users, and policy teams to ensure alignment with government regulations, maintaining 95%+ stakeholder satisfaction over multiple release cycles
Document Processing Automation
• Designed scalable ML pipelines using Apache Spark and MLflow to process 12M+ certificates annually, improving validation accuracy by 35% using ensemble methods with Random Forest and Logistic Regression
• Orchestrated end-to-end workflows using Airflow on Azure, managing scheduling, data cleaning, feature engineering, model selection, hyperparameter tuning, and drift analysis across 20+ departments
• Developed pre-screening and validation workflows using rule-based logic and predictive binary/multiclass classification models to detect forgery and duplicates, preventing $1.2M in fraud losses annually
• Conducted data analytics and EDA using Tableau and PCA to reduce 3,000+ features, and applied LIME for explainable AI, improving transparency on rejected applications and reducing escalation time by 5 days
• Containerized ML models with Docker and deployed via Flask APIs on Kubernetes, integrating CI/CD workflows (GitLab, Jenkins) and real-time Kafka ingestion to maintain 99.1% uptime
• Implemented data lake architecture using Hadoop ecosystem (HDFS, Hive, HBase) to store and process 50TB+ of legal documents, enabling efficient batch and real-time analytics
• Mentored 3 junior data scientists on ML practices and deployment strategies, improving productivity by 20%
Data Scientist – Infosys, India Dec 2018 – Nov 2021
Real-time Fraud Detection Systems
• Built and fine-tuned ML models for imbalanced datasets using SMOTE, ADASYN, and cost-sensitive learning, improving fraud recall by 42% while maintaining precision above 90%
• Applied unsupervised and semi-supervised techniques (Isolation Forest, Autoencoders, One-Class SVM, K-means) to detect anomalies in 8TB+ datasets, identifying new fraud patterns with minimal false positives
• Created fraud intelligence dashboards using Power BI, Tableau, and exploratory visualizations in Matplotlib and Seaborn, enabling real-time alerting and saving 10,000+ analyst hours annually
• Deployed real-time ETL pipelines using Kafka and Spark for fraud detection, scaling from synthetic-data PoC to production system with 24/7 monitoring and processing 100M+ transactions daily
• Optimized financial strategy and policy simulations using ML models, securing $450K in cost savings and deferring $1.47M in payouts, earning recognition from senior leadership
• Collaborated with Business Analysts to define success metrics (F1, ROC-AUC, Precision/Recall), ensuring technical outputs aligned with business KPIs and regulatory requirements
Software Developer – Infosys, India Jun 2017 – Nov 2018
• Created automated travel reimbursement system using Python for scraping and analyzing data with Word Clouds and Cosine/Jaccard similarity algorithms, saving $250K annually in audit costs for HR operations
• Promoted within 10 months for leading employee retention analysis improvements, increasing workflow efficiency by 35% through predictive analytics implementation
• Monitored ETL pipelines integrating data from diverse sources (SQL, XML, flat files, Excel), enabling accurate reporting across HR, Finance, and Operations dashboards
• Contributed to data science initiatives by prototyping classification algorithms in Scikit-learn and TensorFlow, establishing foundation for advanced analytics capabilities
• Participated in Agile workflows using JIRA for task tracking and sprint planning, conducted code reviews, and implemented scalable backend solutions along with unit tests for early-stage code validation
AWARDS & RECOGNITION
• LabLab.AI Hackathon Winner: Built a RAG-based CrewAI app with Streamlit for surgical patient support
• Devpost Hackathon Finalist: Created an AI-powered credit analysis tool with real-time risk monitoring
• CAIR Bronze Winner: Developed stock market forecasting tool placing 3rd among 600+ competing teams
• Employee of the Year, NTT Data: Honored for driving innovative AI solutions in legal services automation
• Infosys Rockstar Award: Recognized for building an end-to-end fraud detection system, saving $4.2M
• Employee of the Month (2X), Infosys: Awarded twice for outstanding performance in ML pipeline optimization and stakeholder alignment on mission-critical projects
LEADERSHIP & VOLUNTEERING
• President, TRIVENI (UNCC Indian Student Org): Led 25-member team supporting 500+ Indian students with onboarding, housing coordination, and airport transportation services
• Open-Source Contributor, Rocket.Chat: Resolved critical bugs, refactored legacy modules, and mentored 10+ new contributors through GitHub collaboration
• Technical Volunteer, AI & Data Science Symposium: Served as reviewer and technical coordinator for university research symposium with 300+ attendees
• Python Instructor: Taught 80+ high school students in underserved communities, focusing on Python fundamentals, logic building, and problem-solving; helped several transition into CS education
CERTIFICATIONS
• Machine Learning Specialization by Stanford: Completed Andrew Ng's comprehensive course covering supervised learning, model evaluation, and regularization techniques
• Deep Learning Specialization by Stanford: Gained hands-on experience with neural networks, CNNs, RNNs, and sequence models using TensorFlow
• Prompt Engineering for Developers: Learned to craft effective prompts for coding and problem-solving
• Generative AI with LLMs: Explored LLM architecture, fine-tuning, embeddings, and real-world applications
EDUCATION
Master’s in Computer Science, Concentration in AI Jan 2024 - May 2025 University of North Carolina, Charlotte GPA: 4.0
Bachelor’s, Electronics and Communication Engineering Aug 2013 - May 2017 RVR & JC, India GPA: 3.92
Relevant Coursework: Advanced NLP, Reinforcement Learning, Generative AI, Data Mining
SKILLS
AI & Machine Learning: PyTorch, TensorFlow, scikit-learn, Keras, NumPy, Pandas, RAG, Transformers
Large Language Models: GPT, Llama, Claude, Ollama, Gemini, Hugging Face, Fine-Tuning, LoRA, QLoRA
LLMOps & Vector DB: MLflow, Weights & Biases, ChromaDB, FAISS, Pinecone, Weaviate, LangSmith
Deep Learning: ANN, LSTM, Autoencoders, GANs, Attention Mechanisms, Tokenizers, ResNet, U-Net
Cloud/DevOps: AWS (EC2, S3, SageMaker), Azure ML, GCP, Docker, Kubernetes, CI/CD, Jenkins
Data & Analytics: R, EDA, Classification, Regression, Clustering, Seaborn, MATLAB, Tableau, Power BI
Databases/Backend: MongoDB, PostgreSQL, MySQL, FastAPI, Flask, React, Streamlit, Nginx
Advanced AI Skills: A2A (Agent-to-Agent), MCP (Model Context Protocol), Multi-Agent Systems