Why I am a perfect fit for this job:
**+ years of experience developing and deploying novel ML/DL methods for real-world problems (albeit in the medical domain)
8+ years of team leadership and management experience, including recruiting, engaging, and developing staff with advanced degrees.
7+ years of experience delivering end-to-end ML solutions (from requirements gathering to deployment) at enterprise scale (thousands of API calls per second)
10+ years of experience developing NLP models.
Thorough understanding of, and strong fluency in, the theory underlying both classical ML and DL
Experience with machine learning algorithms, including deep neural networks and supervised and semi-supervised methods, with strong knowledge of their mathematical underpinnings
An interdisciplinary communicator, able to explain technical ML concepts to the C-suite and to convey company goals to ML researchers.
What I have done so far:
Fine-tuned and deployed an end-to-end LLM to extract information such as doctor name, DOB, medications, diagnoses, procedures, and lab values from EHR PDFs.
Developed and deployed a time-series model, with a team of 4 data scientists, to predict the onset of Clostridium difficile infection. Our current model has an F1 score of 0.7 at a 4-day lead time.
Developed and deployed an ML model pipeline to update existing drug-mapping software that maps local drug names and dosages to an internal standard. The model is an ensemble of LSTMs and SVMs.
Deployed a model to predict whether a scientific article describes an adverse drug reaction that must be reported to the FDA. The model used transformer-based NLP architectures (e.g., BERT, XLNet).
Deployed a model to automate the mapping of incoming lab names to LOINC. The model combines character-based and word-based sequential DL models (GRU/LSTM).
Developed and deployed a hierarchical model to predict the label of a fax document. When I left the company, it was predicting labels with above 90% accuracy and, once fully deployed, was expected to save athenahealth millions of dollars in BPO costs.
Developed several intelligent string-matching algorithms. Use cases include matching an incoming compendium of medical terms to a global list of medical terms. This had been a manual process; automating it frees FTE time that can be diverted to other useful work.
Developed an ensemble machine-learning model to predict patients' no-show probability from their medical information and history.
Where have I worked so far:
Data and Analytics Lead, Takeda Pharma [2022 – 2023]
Led a team of data scientists leveraging the secondary use of data for descriptive, predictive, and prescriptive analytics. Delivered innovative visualization, analysis, integration, and engineering solutions for diverse therapeutic area units (TAUs) within Takeda.
Used real-world datasets such as Optum, MarketScan, and Flatiron to extract actionable insights, including analyses of patient response to medications.
Headed the design, development, and support of software that predicts patient enrollment for all ongoing drug trials.
Lead Data Scientist and Senior Manager, Wolters Kluwer Health [2017 – 2022]
Led a team of 8 data scientists and data engineers on multiple projects (described above) involving text classification, named entity extraction, document clustering, and disease-onset prediction.
Lead Member of Technical Staff, athenahealth [2015 – 2017]
Led a team of 4 data scientists to extract entities (e.g., name, ZIP code) from fax documents (https://hbr.org/2018/03/how-ai-is-taking-the-scut-work-out-of-health-care).
Deployed a model to predict the label for a fax document.
Designed several algorithms for intelligent string matching in the healthcare domain.
Developed machine learning (ML) models to analyze patterns in patient data and predict no-show rates.
Brainstormed with Biz Dev to develop and plan analytics projects in response to business needs.
Worked with the Data Engineering team on ML modeling specs and implementation details.
Built prototype ML models.
Analyzed datasets to understand patient behavior.
Researched new and better ML algorithms to optimize performance.
Instructor in Neurobiology, Harvard Medical School [2012 – 2017]
Mentored over 5 Ph.D. students
Taught a course on advanced techniques for analysis of patterns in data.
Wrote scripts (Linux/MATLAB and C) to analyze fMRI data using supervised and unsupervised machine-learning methods (SVM, fuzzy K-means clustering, and Gaussian mixture models).
Developed an automatic in-cage touchscreen system for efficiently training animals to recognize and respond to shapes and images representing different reward amounts.
What languages can I code in:
Programming languages: C (>10 years), C++ (>5 years), Java (<2 years)
Scripting languages: MATLAB (>10 years), Bash (>6 years), Python (>8 years)
Packages: NumPy, SciPy, scikit-learn, pandas (>6 years)
Others: DirectX (>4 years), SQL (>1 year)
What are my other achievements:
Mentored 5 graduate students and multiple undergraduate students during my time at Harvard.
Developed an automatic in-cage touchscreen system for efficiently training animals to recognize and respond to shapes and images representing different reward amounts. The code was written in MATLAB.
Developed a non-invasive method for fMRI in alert monkeys. Most laboratories that scan alert monkeys use a surgically implanted head post to keep the animal’s head still during scanning. I designed a helmet that holds a monkey’s head still during MRI scanning using a chinstrap and gentle vacuum.
Developed software to display, control, and analyze fMRI and neurophysiology data in real time (DirectX 11 for display and C++ for control). The software uses object-oriented design for real-time processing and analysis.
Developed MATLAB scripts to merge CT scan data with fMRI and MRI data to localize regions of interest in the brain.
As part of my Ph.D. thesis, I implemented a biologically plausible convolutional neural network for tracking moving objects. I coded it in C and used MATLAB to display the results.
As part of my master's thesis, I used Parallel Virtual Machine (PVM) software to parallelize video encoding of satellite data.
What’s my skill set:
Python: pandas and scikit-learn (for my current job)
Deep learning expertise using Keras and PyTorch
Algorithm Development in C++ and C (for my display software)
Natural Language Processing (ChatGPT-like large language models, BERT-like transformers, TF-IDF, word and document vectors, conditional random fields, NER, hierarchical clustering)
Supervised Machine Learning (LDA, SVM, Gaussian mixture models, and random forests for fMRI data analysis)
Unsupervised Machine Learning (clustering techniques like K-means, fuzzy K-means, and DBSCAN; dimensionality reduction techniques like Isomap, ICA, and PCA)
Neural Networks (convolutional neural networks for my Ph.D. thesis)
Parallel Processing (used Parallel Virtual Machine for data parallelization as part of my master's thesis)
What’s my educational background:
Research Fellow and Instructor in Neurobiology, Harvard Medical School, Boston, MA [2008 – 2015]
Published 7 papers in major journals in 6 years, including one in Neuron and two in Nature Neuroscience.
Ph.D. in Cognitive and Neural Systems, Boston University, Boston, MA [2002 – 2008]
GPA 3.8 (among the top 3% of the class)
Master of Engineering in System Science and Automation, IISc, Bangalore, India [1997 – 1999]
Among the top 3% of the class.
Bachelor's in Electronics Engineering, Sri Venkateswara University, Tirupati, India [1992 – 1996]
Ranked first among 536 students in the university.
Can I work in the US:
US Citizen