Dhruv Miyani
+1-617-***-**** | ******.*@************.*** | github.com/DhruvMiyani | linkedin.com/in/dhruvmiyani | Website
EDUCATION
SKILLS
Programming Languages: Python, C, Java, Dart, SQL, JavaScript, SAP-ABAP
Libraries & Platforms: PyTorch, Pandas, NLTK, Scikit-learn, Hugging Face
Large Language Models: Fine-Tuning, Prompt Engineering, RAG, RLHF (Feedback), Scaling Laws, LangChain
Cloud & Database: Google Cloud Platform, Amazon Web Services, MySQL, PostgreSQL, Firebase
Data Visualization: Tableau, PowerBI, Seaborn, ggplot2, Flourish, Plotly, Altair, D3
Natural Language Processing: Text Classification, BERT - Transformers, Text Generation, Language Modelling
Concepts & Methodologies: Big Data Processing, CNN, RNN, Transformer Architecture, JUnit, MVC
PROFESSIONAL EXPERIENCE
ML Research Assistant, Northeastern University Sep 23 - Present
GCP, NLP, Pandas, Fuzzy Matching, Big Data Processing, SQL, Machine Learning, ETL, Technical Writing
• Developed and automated data pipelines in Python and GCP for analysing forced labor risks in global supply chains, saving 200+ hours of manual work and enhancing scalability and resilience.
• Utilized SQL and Fuzzy Logic for in-depth analysis and data handling within large datasets (1M+ records), improving decision-making processes and data storage optimization.
• Employed advanced heuristics and NLP techniques, including Doc2Vec, to efficiently process and analyze extensive datasets and to apply joins on large non-standardized datasets, streamlining model development.
Teaching Assistant, Khoury College of Computer Sciences Jul 23 - Present
Matplotlib, D3, Vega-Altair, Tableau, JavaScript, Technical Troubleshooting
• Engaged 150+ undergraduates in 3 classes with an emphasis on Information Presentation and Visualization, honing their skills in technical communication and complex concept understanding.
• Hosted 13+ practical data visualization sessions using Matplotlib, D3, Vega-Altair, and Tableau, equipping students with hands-on experience in industry-standard tools.
• Mentored students on data visualization projects and provided personalized technical support, fostering a rapid learning environment and demonstrating strong interpersonal and troubleshooting skills.
Junior Data Scientist, veloc, Surat Dec 21 - Dec 22
TF-IDF, Word2Vec, Doc2Vec, MVP, Named Entity Recognition
• Refined customer review datasets with text normalization and vectorization techniques, including stop word removal, tokenization, lemmatization, and regex, followed by NER, TF-IDF, Word2Vec, and Doc2Vec. Leveraged KMeans and LDA to extract and define key topic clusters, resulting in a 25% enhancement in content interpretation accuracy.
• Integrated MVP architectural patterns into the development process, leading to a 20% improvement in code reusability, and synthesized complex data insights into a Plotly Dash web application, effectively communicating 6 critical customer pain points to stakeholders and influencing strategic decisions.
ACADEMIC PROJECTS
RAG-Based Clinical Language Model: Performance Study Feb 2024 – Present
LLMs, GenAI, Fine-Tuning, BERT, RNNs, LSTM, RAG, OpenAI API, LlamaIndex, LangChain, RAGAS
• Conducted a performance study comparing Retrieval-Augmented Generation (RAG) with GPT-3.5 and a fine-tuned ClinicalBERT model on Nosocomial Risk Datasets.
• Demonstrated through empirical research the subtle yet significant edge of RAG-integrated LLMs over fine-tuned specialized models in clinical domain-specific tasks.
• Utilized LlamaIndex to query clinical records, enhancing the efficiency and accuracy of data retrieval for model training, and evaluated the results using RAGAS score.
• Strategized the expansion of the model’s capabilities to incorporate multimodal data inputs, paving the way for integrating medical imaging with electronic health records using LLaVA.
Multi-class Hate Speech Detection in Gujarati Social Media Text Jan 22 - Jul 22
KNN, Naïve Bayes, TF-IDF, Stochastic Gradient Descent
• Developed and implemented a 6-class hate speech detection system for Gujarati-language social media text
• Gathered data from 3+ social media platforms, performed advanced tokenization, and applied TF-IDF vectorization
• Experimented on the dataset with logistic regression, KNN, Naïve Bayes, and Stochastic Gradient Descent (SGD)
• Created a labelled dataset of 5,500 user comments annotated into 6 subcategories of hate speech, and achieved a maximum accuracy of 0.6798 on the test dataset with the best-performing classifier
Northeastern University - Khoury College of Computer Sciences, Boston, MA Jan 23 - Present
Master of Science in Artificial Intelligence, GPA: 3.8/4.0
Courses: Algorithms, Practical Neural Networks, Large Language Models, Computation and Visualization
P P Savani University, Surat, India Jul 19 – April 22
Bachelor of Science in Information Technology, GPA: 3.9/4.0
Courses: Object Oriented Programming, Database System, Web Technology, Discrete Math & Linear Algebra, Calculus, Statistics