Data Analyst

Location:

Irvine, CA

Posted:

February 22, 2021


Resume:

Professional Summary

**+ Years Research, ** Years Data Science

Accomplished Data Scientist with 10 years of experience who leverages a deep understanding of machine learning, statistical, and mathematical techniques to propel business performance and extract maximum value across several key domains.

●Experience in the application of Naïve Bayes, Analysis, Neural Networks/Deep Neural Networks, and Random Forest machine learning techniques.

●Uses advanced statistical and predictive modeling techniques to build, maintain, and improve real-time decision systems.

●Creative thinking/strong ability to devise and propose innovative ways to look at problems by using business acumen, mathematical solutions, data models, and statistical analysis.

●Works with advanced analytical teams to design, build, validate, and refresh data models.

●In-depth knowledge of statistical procedures that are applied in both Supervised and Unsupervised Machine Learning problems.

●Applies machine learning techniques to marketing and merchandising ideas.

●Ability to quickly gain an understanding of niche subject matter domains, and design and implement effective novel solutions to be used by other subject matter experts.

●Experience implementing industry standard analytics methods within specific domains and applying data science techniques to expand these methods, for example, using Natural Language Processing methods to aid in normalizing vendor names, implementing clustering algorithms, and deriving novel metrics.

Technical Skills:

Analytic Development

Python, R, IDL, SAS, SQL

Python Packages

NumPy, Pandas, SciPy, TensorFlow, PyTorch, Matplotlib, Seaborn

Machine Learning

Natural Language Processing & Understanding, Image Recognition and Detection, Forecasting

Artificial Intelligence

Text understanding, classification, Pattern Recognition

Deep Learning

Data Mining, Machine Learning Algorithms, Neural Networks, TensorFlow, Keras.

Analysis Methods

Advanced Data Modelling, Forecasting, Statistical, Sentiment, Stochastic, Bayesian analysis, Regression analysis, Linear models, Multivariate analysis, Sampling methods

Analysis Techniques

Classification and Regression Trees (CART), Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, ANN, Regression, Naïve Bayes.

Data Modeling

Bayesian Analysis, Statistical Inference, Predictive Modelling, Linear Modelling, Probabilistic Modelling, Time-Series Analysis.

Applied Data Science

Natural Language Processing, Machine Learning, classification, Social Analytics

IDE

Jupyter Notebook, Spyder, RStudio, Google Colab

Version Control

GitHub

Soft Skills

Excellent communication and presentation skills; ability to work well with stakeholders to discern needs accurately; leadership, mentoring, and coaching

Data Scientist

Kaiser Permanente, 2019-Present

Irvine, California

The initial focus of the team was to build a model to identify and predict possible cancer remission based on patient and tumor information. However, once the Covid-19 pandemic began, the team’s focus was redirected to building a classifier capable of separating cases of Covid-19 from other respiratory issues. To accomplish this, a combination of CT scan images and demographic data was used, achieving a novel 80% accuracy in the early stages of the pandemic.

●Produced cancer diagnoses based on patient demographic and tumor size, shape, and location data.

●Predictions were then used to recommend and optimize patient treatment plans by medical professionals.

●Data pulled from an internal SAS database.

●CT scan image data analyzed using Convolutional Neural Networks (CNNs).

●Dataset consisted of an even split of 743 COVID and non-COVID images from the medical office.

●Images standardized into 64x64 flattened matrices with a 70%/30% Train/Test split.

●Generated Visualizations from SAS data using R: https://preethamvignesh57.wixsite.com/mysite/covid-19

●Generated Tableau visualization of Covid-19 Data: https://preethamvignesh57.wixsite.com/mysite/tableau
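The image preparation described above (flattening 64x64 images and a 70%/30% train/test split) can be sketched roughly as follows. This is an illustration only, not the project code: the function names, the fixed seed, and the blank synthetic images are assumptions, and resizing real CT scans to 64x64 would need an image library that is not shown here.

```python
import random

def flatten(image):
    """Flatten a 2-D list of pixel rows into one feature vector."""
    return [pixel for row in image for pixel in row]

def train_test_split(samples, labels, train_fraction=0.7, seed=0):
    """Shuffle paired samples and labels, then split by train_fraction."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    cut = int(round(train_fraction * len(indices)))
    train = [(samples[i], labels[i]) for i in indices[:cut]]
    test = [(samples[i], labels[i]) for i in indices[cut:]]
    return train, test

# Synthetic stand-in data (assumption): ten blank 64x64 images, alternating labels.
images = [[[0.0] * 64 for _ in range(64)] for _ in range(10)]
labels = [i % 2 for i in range(10)]

vectors = [flatten(img) for img in images]
train, test = train_test_split(vectors, labels)
print(len(vectors[0]), len(train), len(test))
```

Shuffling before the cut keeps the class mix in the train and test sets roughly similar; a stratified split would be a stricter alternative.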

Senior Data Scientist

NASA, 2015-2019

Irvine, California

Heavy rainfall prediction is a major problem for the meteorological department as it is closely associated with the economy and daily life. For this project, I developed several time-series models to predict rainfall across several distinct regions using a large amount of historical data from 1901 to 2015. These predictions were also used to generate advance warnings for natural disasters like floods and drought across the globe.

●Used Python and IDL to retrieve historical data stored in Hierarchical Data Format (HDF5) and clean it prior to model implementation and training.

●Wrote functions to perform pre-processing to impute the missing values using a linear interpolation technique.

●Normalized features in the data to reduce noise and maximize the signal-to-noise ratio.

●Applied feature reduction using Principal Component Analysis (PCA) to reduce the dimensionality of the data.

●Data stationarity validated with the Dickey-Fuller test.

●Multiple Linear Regression and ARIMA models were used for rainfall prediction.

●Strong seasonality favored ARIMA performance.

●Final prediction accuracy was 80.67%, with an F-measure of 0.88 used to assess model effectiveness.

●Demonstrated that ML techniques can predict future long-term trends from historical datasets.

●Comprehensive reports and documentation written in LaTeX and subsequently presented to stakeholders.
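The imputation and normalization steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the sample rainfall values are hypothetical, gaps at the ends of a series are filled with the nearest observed value, and z-score scaling is assumed because the resume does not name the normalization method.

```python
import math

def interpolate_missing(series):
    """Fill None gaps by linear interpolation between known neighbours.

    Gaps at the start or end are filled with the nearest known value.
    """
    values = list(series)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for i in range(len(values)):
        if values[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            values[i] = values[right]
        elif right is None:
            values[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            values[i] = values[left] + weight * (values[right] - values[left])
    return values

def zscore(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical rainfall readings (mm) with missing observations.
rainfall_mm = [10.0, None, 30.0, None, None, 60.0]
filled = interpolate_missing(rainfall_mm)
normalized = zscore(filled)
print(filled)
```

Interpolating before normalizing means the mean and standard deviation are computed over a complete series rather than only the observed points.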

Data Scientist

Bank of Canada, 2012 - 2015

Halifax, Nova Scotia

For this project, a request evaluation system was developed to ingest customer requests from a diverse set of digital and handwritten sources, which were then filtered based on urgency and forwarded to the appropriate department. Handwritten sources were pre-processed using Optical Character Recognition (OCR) techniques and then handed to the classification model in a hierarchical approach. The relevant department was identified using metadata such as form and request type determined by the source.

●Bayesian and KNN techniques along with Tesseract were compared for OCR applications based on model accuracy and speed.

●Models achieved 91%, 96%, and 97% accuracy respectively.

●OCR model performance evaluated on textual and MNIST datasets.

●Tesseract provided the most consistent OCR results and was used in the productionized solution.

●Sorting subsequently done through the training and testing of an artificial neural network.

●Data Cleaning, Imputation, Tokenizing: used Python libraries (pandas, NLTK, NumPy, Keras) to clean and prepare the data for analysis.

●Urgency identified using a Natural Language Processing-based classifier.

●Final Bidirectional LSTM model achieved 85% test accuracy for identifying urgent vs. non-urgent.

●Production model deployed to a Flask API for use by the business.
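As a rough illustration of the KNN approach compared above for character recognition, the sketch below classifies tiny flattened bitmaps by majority vote among the nearest training examples. It is not the project code: the 3x3 glyphs, the choice of k=3, and the squared Euclidean distance are all assumptions made for the example.

```python
from collections import Counter

def knn_predict(train_vectors, train_labels, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, query)), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy 3x3 bitmaps flattened row by row (assumption): bars for "1", rings for "0".
train_vectors = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 1],
]
train_labels = ["1", "1", "1", "0", "0", "0"]

noisy_one = [0, 1, 0, 0, 1, 0, 1, 1, 0]
noisy_zero = [1, 1, 1, 1, 0, 1, 0, 1, 1]
print(knn_predict(train_vectors, train_labels, noisy_one))
print(knn_predict(train_vectors, train_labels, noisy_zero))
```

KNN needs no training step, but every prediction scans the whole training set, which is one reason speed was worth comparing alongside accuracy.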

Data Analyst, ML SME

Nippon News Network, 2008-2012

Tokyo, Japan

Predicted air temperatures are a key factor not only for individual consumers of news, but especially for agricultural, ecological, environmental, and industrial sectors. Thus, in order to gain a competitive edge over other networks, a model to better estimate air temperatures was developed. To accomplish this, machine learning tools were implemented over the entire workflow of weather prediction. The resulting estimates of temperature for monthly, daily, and hourly forecasts showed on average a 4.6% improvement over the previous standard.

●Attributes considered included max temperature, min temperature, humidity, pressure, and wind speed.

●Post-processing data analysis of completed self-consistent calculations done with Python and IDL.

●Several models were constructed to compare performance and computational cost and select the one which best met the needs of the client.

●Used a variety of Linear Regression, KNN Regression, and Support Vector Machine (SVM) techniques.

●SVM demonstrated the best and most robust performance.

●Model performance was evaluated using mean absolute error (MAE) and root-mean-squared error (RMSE).

●In addition to these error measures, percentage errors were calculated during evaluation in the forecast domain.

●Finalized model was then handed over to Android and iOS app developers along with web developers to create a user front-end.
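The error measures named above can be computed as in the sketch below. The function names and temperature values are made up for illustration, and mean absolute percentage error (MAPE) is assumed as the percentage measure, since the resume does not say which one was used.

```python
import math

def _check(actual, predicted):
    """Validate that both series are non-empty and the same length."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root-mean-squared error."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error; undefined when an actual value is zero."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined for zero actual values")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical observed and forecast air temperatures (degrees Celsius).
observed = [20.0, 22.0, 25.0, 18.0]
forecast = [21.0, 21.0, 27.0, 18.0]
print(mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a sense of whether errors are uniform or dominated by a few bad forecasts.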

Education

Ph.D. in Physics

Indian MST Radar Data analysis

Sri Venkateswara University

Tirupati, Andhra Pradesh, India

Master of Science in Physics

Sri Venkateswara University

Tirupati, Andhra Pradesh, India

Bachelor of Science in Mathematics, Physics, and Statistics

Sri Venkateswara University

Tirupati, Andhra Pradesh, India
