Data Visualization Big

Location:
Boston, MA
Posted:
June 07, 2024

Resume:

Dhruv Miyani

+1-617-***-**** ******.*@************.*** github.com/DhruvMiyani linkedin.com/in/dhruvmiyani Website

SKILLS

Programming Languages: Python, C, Java, Dart, SQL, JavaScript, SAP-ABAP
Libraries & Platforms: PyTorch, Pandas, NLTK, Scikit-learn, Hugging Face
Large Language Models: Fine-Tuning, Prompt Engineering, RAG, RLHF (Feedback), Scaling Laws, LangChain
Cloud & Database: Google Cloud Platform, Amazon Web Services, MySQL, PostgreSQL, Firebase
Data Visualization: Tableau, PowerBI, Seaborn, ggplot2, Flourish, Plotly, Altair, D3
Natural Language Processing: Text Classification, BERT - Transformers, Text Generation, Language Modelling
Concepts & Methodologies: Big Data Processing, CNN, RNN, Transformer architecture, JUnit, MVC

PROFESSIONAL EXPERIENCE

ML Research Assistant, Northeastern University Sep 23 - Present
GCP, NLP, Pandas, Fuzzy Matching, Big Data Processing, SQL, Machine Learning, ETL, Technical Writing

• Developed and automated data pipelines in Python on GCP for analysing forced labor risks in global supply chains, saving 200+ hours of manual work and improving scalability and resilience.

• Utilized SQL and Fuzzy Logic for in-depth analysis and data handling within large datasets (1M+ records), improving decision-making processes and data storage optimization.

• Employed advanced heuristics and NLP techniques, including Doc2Vec, to efficiently process and analyze extensive datasets and to apply joins on large non-standardized datasets, streamlining model development.

Teaching Assistant, Khoury College of Computer Sciences Jul 23 - Present
Matplotlib, D3, Vega-Altair, Tableau, JavaScript, Technical Troubleshooting

• Engaged 150+ undergraduates in 3 classes with an emphasis on Information Presentation and Visualization, honing their skills in technical communication and complex concept understanding.

• Hosted 13+ practical data visualization sessions using Matplotlib, D3, Vega-Altair, and Tableau, equipping students with hands-on experience in industry-standard tools.

• Mentored students on data visualization projects and provided personalized technical support, fostering a rapid learning environment and demonstrating strong interpersonal and troubleshooting skills.

Junior Data Scientist, veloc Surat Dec 21 - Dec 22
TF-IDF, Word2Vec, Doc2Vec, MVP, Named Entity Recognition

• Refined customer review datasets with text normalization and vectorization techniques, including stop word removal, tokenization, lemmatization, and regex, followed by NER, TF-IDF, Word2Vec, and Doc2Vec. Leveraged KMeans and LDA to extract and define key topic clusters, resulting in a 25% enhancement in content interpretation accuracy.

• Innovated the development process by integrating MVP architectural patterns, leading to a 20% improvement in code reusability, and synthesized complex data insights into a Plotly-Dash Web application, effectively communicating 6 critical customer pain points to stakeholders and influencing strategic decisions.

ACADEMIC PROJECTS

RAG-Based Clinical Language Model: Performance Study Feb 2024 - Present
LLMs, GenAI, Fine-Tuning, BERT, RNNs, LSTM, RAG, OpenAI API, LlamaIndex, LangChain, RAGAS

• Conducted a performance study comparing Retrieval-Augmented Generation (RAG) with GPT-3.5 and a fine-tuned ClinicalBERT model on Nosocomial Risk Datasets.

• Demonstrated through empirical research the subtle yet significant edge of RAG-integrated LLMs over fine-tuned specialized models in clinical domain-specific tasks.

• Utilized LlamaIndex to query clinical records, enhancing the efficiency and accuracy of data retrieval for model training, and evaluated the results using RAGAS score.

• Strategized the expansion of the model’s capabilities to incorporate multimodal data inputs, paving the way for integrating medical imaging with electronic health records using LLaVA.

Multi-Class Hate Speech Detection in Gujarati Social Media Text Jan 22 - Jul 22
KNN, Naïve Bayes, TF-IDF, Stochastic Gradient Descent

• Developed and implemented a 6-class hate speech detection system for Gujarati-language social media text.

• Gathered data from 3+ social media platforms, performed advanced tokenization, and applied TF-IDF vectorization.

• Experimented on the dataset with logistic regression, KNN, Naïve Bayes, and Stochastic Gradient Descent (SGD).

• Created a labelled dataset of 5500 user comments, annotated into 6 subcategories of hate speech, and achieved a maximum accuracy of 0.6798 on the test dataset using the best-performing classifier.

EDUCATION

Northeastern University - Khoury College of Computer Sciences, Boston, MA Jan 23 - Present
Master of Science in Artificial Intelligence GPA: 3.8/4.0
Courses: Algorithms, Practical Neural Networks, Large Language Models, Computation and Visualization

P P Savani University, Surat, India Jul 19 - April 22
Bachelor of Science in Information Technology GPA: 3.9/4.0
Courses: Object Oriented Programming, Database System, Web Technology, Discrete Math & Linear Algebra, Calculus, Statistics


