Professional Summary
**+ Years Research, ** Years Data Science
Accomplished Data Scientist with 10 years of experience who leverages a deep understanding of machine learning, statistical, and mathematical techniques to improve business performance and extract maximum value across several key domains.
●Experience in the application of Naïve Bayes, Analysis, Neural Networks/Deep Neural Networks, and Random Forest machine learning techniques.
●Uses advanced statistical and predictive modeling techniques to build, maintain, and improve real-time decision systems.
●Creative thinking/strong ability to devise and propose innovative ways to look at problems by using business acumen, mathematical solutions, data models, and statistical analysis.
●Works with advanced analytical teams to design, build, validate, and refresh data models.
●In-depth knowledge of statistical procedures that are applied in both Supervised and Unsupervised Machine Learning problems.
●Applies machine learning techniques to marketing and merchandising ideas.
●Ability to quickly gain an understanding of niche subject matter domains, and design and implement effective novel solutions to be used by other subject matter experts.
●Experience implementing industry standard analytics methods within specific domains and applying data science techniques to expand these methods, for example, using Natural Language Processing methods to aid in normalizing vendor names, implementing clustering algorithms, and deriving novel metrics.
Technical Skills:
Analytic Development
Python, R, IDL, SAS, SQL
Python Packages
NumPy, Pandas, SciPy, TensorFlow, PyTorch, Matplotlib, Seaborn
Machine Learning
Natural Language Processing & Understanding, Image Recognition and Detection, Forecasting
Artificial Intelligence
Text understanding, classification, Pattern Recognition
Deep Learning
Data Mining, Machine Learning Algorithms, Neural Networks, TensorFlow, Keras.
Analysis Methods
Advanced Data Modelling, Forecasting, Statistical, Sentiment, Stochastic, Bayesian analysis, Regression analysis, Linear models, Multivariate analysis, Sampling methods
Analysis Techniques
Classification and Regression Trees (CART), Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, ANN, Regression, Naïve Bayes.
Data Modeling
Bayesian Analysis, Statistical Inference, Predictive Modelling, Linear Modelling, Probabilistic Modelling, Time-Series Analysis.
Applied Data Science
Natural Language Processing, Machine Learning, classification, Social Analytics
IDE
Jupyter Notebook, Spyder, RStudio, Google Colab
Version Control
GitHub
Soft Skills
Excellent communication and presentation skills; ability to work well with stakeholders to discern needs accurately; leadership, mentoring, and coaching
Data Scientist
Kaiser Permanente, 2019-Present
Irvine, California
The initial focus of the team was to build a model to identify and predict possible cancer remission based on patient and tumor information. However, once the Covid-19 pandemic began, the team’s focus was redirected to building a classifier capable of separating cases of Covid-19 from other respiratory issues. To accomplish this, a combination of CT scan images and demographic data was used to achieve 80% accuracy in the early stages of the pandemic.
●Produced cancer diagnoses based on patient demographics and tumor size, shape, and location data.
●Medical professionals then used the predictions to recommend and optimize patient treatment plans.
●Data pulled from an internal SAS database.
●CT scan image data analyzed using Convolutional Neural Networks (CNNs).
●Dataset consisted of an even split of 743 COVID and non-COVID images from the medical office.
●Images standardized into 64x64 flattened matrices with a 70%/30% Train/Test split.
●Generated visualizations from SAS data using R: https://preethamvignesh57.wixsite.com/mysite/covid-19
●Generated Tableau visualization of Covid-19 Data: https://preethamvignesh57.wixsite.com/mysite/tableau
Senior Data Scientist
NASA, 2015-2019
Irvine, California
Heavy rainfall prediction is a major problem for the meteorological department as it is closely associated with the economy and daily life. For this project, I developed several time-series models to predict rainfall across several distinct regions using a large amount of historical data from 1901 to 2015. These predictions were also used to generate advance warnings for natural disasters such as floods and droughts across the globe.
●Used Python and IDL to retrieve the historical data in Hierarchical Data Format (HDF5) and clean it prior to model implementation and training.
●Wrote functions to perform pre-processing to impute the missing values using a linear interpolation technique.
●Normalized features in the data to reduce noise and maximize the signal-to-noise ratio.
●Reduced features using Principal Component Analysis (PCA) to lower the dimensionality of the data.
●Data stationarity validated with the Dickey-Fuller test.
●Multiple Linear Regression and ARIMA models were used for rainfall prediction.
●Strong seasonality favored ARIMA performance.
●Final model achieved a prediction accuracy of 80.67% and an F-measure of 0.88.
●Demonstrated that ML techniques can predict long-term future trends from historical datasets.
●Comprehensive reports and documentation written in LaTeX and subsequently presented to stakeholders.
Data Scientist
Bank of Canada, 2012-2015
Halifax, Nova Scotia
For this project, a request evaluation system was developed to ingest customer requests from a diverse set of digital and handwritten sources, which were then filtered based on urgency and forwarded to the appropriate department. Handwritten sources were pre-processed using Optical Character Recognition (OCR) techniques and then handed to the classification model in a hierarchical approach. The relevant department was identified using metadata such as form and request type, determined by the source.
●Bayesian and KNN techniques along with Tesseract were compared for OCR applications based on model accuracy and speed.
●Models achieved 91%, 96%, and 97% accuracy respectively.
●OCR model performance evaluated on textual and MNIST datasets.
●Tesseract provided the most consistent OCR results and was used in the productionized solution.
●Sorting subsequently done through the training and testing of an artificial neural network.
●Data Cleaning, Imputation, Tokenizing: used Python libraries (pandas, NLTK, NumPy, Keras) to clean and prepare the data for analysis.
●Urgency identified using a Natural Language Processing-based classifier.
●Final Bidirectional LSTM model achieved 85% test accuracy in classifying requests as urgent vs. non-urgent.
●Production model deployed to a Flask API for use by the business.
Data Analyst, ML SME
Nippon News Network, 2008-2012
Tokyo, Japan
Predicted air temperatures are a key factor not only for individual consumers of news, but especially for agricultural, ecological, environmental, and industrial sectors. Thus, to gain a competitive edge over other networks, a model to better estimate air temperatures was developed. To accomplish this, machine learning tools were implemented over the entire weather prediction workflow. The resulting estimates of temperature for monthly, daily, and hourly forecasts showed on average a 4.6% improvement over the previous standard.
●Attributes considered included max temperature, min temperature, humidity, pressure, and wind speed.
●Post-processing data analysis of completed self-consistent calculations done with Python and IDL.
●Several models were constructed to compare performance and computational cost and select the one which best met the needs of the client.
●Used a variety of Linear Regression, KNN Regression, and Support Vector Machine (SVM) techniques.
●SVM demonstrated the best and most robust performance.
●Model performance was measured by mean absolute error (MAE) and root-mean-squared error (RMSE).
●In addition to these error measures, percentage errors were also calculated during evaluation in the forecast domain.
●Finalized model was then handed over to Android and iOS app developers along with web developers to create a user front-end.
Education
Ph.D. in Physics
Indian MST Radar Data analysis
Sri Venkateswara University
Tirupati, Andhra Pradesh, India
Master of Science in Physics
Sri Venkateswara University
Tirupati, Andhra Pradesh, India
Bachelor of Science in Mathematics, Physics, and Statistics
Sri Venkateswara University
Tirupati, Andhra Pradesh, India