Data Scientist/Analyst


Posted: May 11, 2017

Resume:

OBJECTIVE: With * years of experience in data analysis for academic research, I am seeking a data scientist or analyst position in which I can apply my problem-solving skills, knowledge of mathematics and statistics, and experience with scientific computing to real-world problems.

TECHNICAL EXPERTISE

Languages: Python (Jupyter Notebook, scikit-learn, pandas, NumPy, Seaborn, SQLAlchemy, BeautifulSoup), R, MATLAB, SQL; familiar with big-data analysis using Hadoop and MapReduce and with Git/GitHub version control

Data Visualization: Matplotlib, ggplot2, Tableau, Bokeh

Algorithms: Machine learning, predictive modeling, linear/logistic/lasso regression, decision trees, random forests, support vector machines, natural language processing

Skills and Proficiencies: Data science, advanced mathematics, statistics, scientific computing, data presentation, academic research

Software: Origin, Solid Edge, Mathematica, LaTeX

EDUCATION

PhD in Physics Aug 2008 - May 2014

Tulane University, New Orleans, LA

MS in Physics Aug 2008 - May 2010

Tulane University, New Orleans, LA

M.Sc. in Physics Aug 2003 - May 2006

Tribhuvan University, Kathmandu, Nepal

B.Sc. in Physics Aug 2000 - May 2003

Tribhuvan University, Kathmandu, Nepal

EXPERIENCE

POSTDOCTORAL RESEARCH SCIENTIST June 2014 - Feb 2017 National Institute of Standards and Technology (NIST), Gaithersburg, MD

Data Collection: Collected data from photodiode detectors; temperature, electric, and Hall-probe sensors; and electron microscopes (image data), about 5 MB per day across various measurements.

Data Cleaning: Removed outliers, applied median filters, and averaged to smooth the data (using R and MATLAB).

Data Analysis: Extracted physical observables by fitting various models, and performed uncertainty analysis and calculations daily (using Python, R, MATLAB, and Origin).

Data Visualization: Created scatter, line, and contour plots and bar charts daily to explore relationships between variables (500 graphs per week using Origin, Python, and MATLAB).

Project Outcomes: Published one first-authored paper in a peer-reviewed journal and presented at one scientific conference.
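A minimal Python sketch of the cleaning steps listed above (outlier removal followed by a median filter). It is not the code used at NIST, which the resume says was written in R and MATLAB; the outlier rule (drop points more than three scaled median absolute deviations from the median), the window size, the function names `remove_outliers` and `median_filter`, and the sample signal are all assumptions for illustration.

```python
from statistics import median


def remove_outliers(values, threshold=3.0):
    """Drop points farther than `threshold` scaled MADs from the median (assumed rule)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)  # no spread to judge against; keep everything
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation for Gaussian noise
    return [v for v in values if abs(v - med) / scale <= threshold]


def median_filter(values, window=3):
    """Replace each point by the median of a centered window; edges use a shorter window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


signal = [1.0, 1.1, 0.9, 25.0, 1.0, 1.2, 0.8, 1.1]  # hypothetical sensor trace with one spike
cleaned = remove_outliers(signal)
smoothed = median_filter(cleaned, window=3)
print(cleaned)  # [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.1]
print(smoothed)
```

Running the outlier step before the filter keeps a single large spike from leaking into its neighbors' window medians.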

RESEARCH ASSISTANT Aug 2008 - May 2014

Tulane University (Graduate Student)

Data Collection: Collected data from X-ray detectors and electric, magnetic, and temperature sensors (about 3-12 MB per day across various measurements).

Data Cleaning: Removed outliers, applied median filters, and averaged to smooth the data (using MATLAB).

Data Analysis: Developed models of disorder-induced quantum interference effects, extracted physical observables by fitting these models, and performed uncertainty analysis and calculations daily (using MATLAB and Origin).

Data Visualization: Created scatter, line, and contour plots and bar charts daily to explore relationships between variables (using MATLAB and Origin).

Project Outcomes: Published 3 first-authored and 8 co-authored papers in peer-reviewed journals and presented at about a dozen scientific conferences.

INDEPENDENT PROJECTS

Natural Language Processing Project in Python: Spam/Ham Classification 02/01/2017 - 02/10/2017

Using labeled ham and spam examples from a data set of over 5,000 SMS messages, trained a naive Bayes classifier to discriminate between ham and spam automatically.

Used the trained model to classify arbitrary unlabeled messages as ham or spam.

The Python code for this project can be viewed at:

https://github.com/Punam-Silwal/Python/blob/master/Natural_Language_Processing.ipynb
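A hedged sketch of the naive Bayes approach described above, written from scratch with the Python standard library rather than taken from the linked notebook. It assumes a multinomial model over lowercase whitespace-separated words with Laplace (add-one) smoothing; the function names `train_naive_bayes` and `nb_classify` and the four training messages are hypothetical stand-ins for the 5,000-message SMS data set.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(messages, labels):
    """Return per-class log-priors, per-class word counts, and the training vocabulary."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    log_priors = {label: math.log(n / len(labels)) for label, n in class_counts.items()}
    return log_priors, word_counts, vocab


def nb_classify(text, log_priors, word_counts, vocab, alpha=1.0):
    """Return the label with the highest log posterior, using add-`alpha` smoothing."""
    best_label, best_score = None, -math.inf
    for label, log_prior in log_priors.items():
        total = sum(word_counts[label].values())
        score = log_prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                count = word_counts[label][word]
                score += math.log((count + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data standing in for the labeled SMS collection
messages = ["win a free prize now", "free cash claim your prize",
            "are we meeting for lunch", "see you at lunch tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(messages, labels)
print(nb_classify("claim your free prize", *model))  # spam
print(nb_classify("see you at lunch", *model))       # ham
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long messages.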

Machine Learning Algorithms in Python Using scikit-learn: 12/15/2016 - 12/30/2016

Compared several machine learning algorithms to identify which is best suited to classifying unknown iris flowers in the Iris data set.

Based on accuracy score, K-nearest neighbors (k = 16; 98.35%) and Linear Discriminant Analysis (98.3%) outperformed Logistic Regression (95%), Naive Bayes (93.3%), Decision Tree Classifier (95%), and Support Vector Machine (95%), so both K-nearest neighbors and Linear Discriminant Analysis are suitable algorithms for this classification problem.

The Python code for this project can be viewed at:

https://github.com/Punam-Silwal/Python/blob/master/Project_1_Iris_Data_set.ipynb

Machine Learning Project: Franchise Profit Prediction by a Linear Regression Model 05/01/2016 - 05/15/2016

Project description

• Developed a linear regression model to predict the profit of a restaurant franchise's new outlet, given profit data for the chain's existing outlets in various cities and the populations of those cities.

• Determined the model parameters by applying gradient descent to the data set to minimize the regression cost function.

• Predicted the new outlet's profit from the population of the candidate city using the fitted model.

• The programming code and data set for this project can be viewed at:

https://github.com/Punam-Silwal/Project_1_Linear_Regression
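The fitting procedure above can be sketched as batch gradient descent on the squared-error cost J(θ0, θ1) = (1/2m) Σ (θ0 + θ1·x - y)². This is an illustrative Python version, not the code in the linked repository; the learning rate, iteration count, function names, and the five data points (chosen to lie exactly on y = 2x + 1 so convergence is easy to check) are assumptions.

```python
def compute_cost(xs, ys, theta0, theta1):
    """Squared-error cost J = (1 / 2m) * sum((theta0 + theta1 * x - y) ** 2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Fit theta0 (intercept) and theta1 (slope) by batch gradient descent on J."""
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously from the same gradient
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical data: city population (x) and outlet profit (y), lying exactly on y = 2x + 1
populations = [1.0, 2.0, 3.0, 4.0, 5.0]
profits = [3.0, 5.0, 7.0, 9.0, 11.0]
theta0, theta1 = gradient_descent(populations, profits)
print(round(theta0, 4), round(theta1, 4))  # 1.0 2.0
print(round(theta0 + theta1 * 6.0, 4))     # predicted profit for a city with x = 6: 13.0
```

The learning rate has to stay below 2 divided by the largest eigenvalue of the cost's Hessian for the iteration to converge; with unscaled real-world populations a smaller rate or feature normalization would typically be needed.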

Machine Learning Project: Product Qualification Prediction by a Logistic Regression Model 05/16/2016 - 05/30/2016

Project description

• Developed a logistic regression model to predict whether a factory's microchips pass quality assurance testing, given test results for past microchips.

• Applied gradient descent in MATLAB to compute the model parameters that minimize the cost function on the training data set.

• Added regularization terms to reduce the risk of overfitting, which increased prediction accuracy on the cross-validation data set by 5% compared with the unregularized model.

• Programming code and data set for this project can be viewed at:

https://github.com/Punam-Silwal/Project_2_Logistic_Regression
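A minimal sketch of L2-regularized logistic regression trained by gradient descent, in plain Python rather than the MATLAB used for the project. It assumes the common penalty (λ/2m)·Σθj² applied to every weight except the bias; the learning rate, λ = 1, the function names, and the six two-feature "test score" rows are hypothetical, so it does not reproduce the 5% accuracy gain reported above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lr_score(row, theta):
    """Linear score theta0 + theta1*x1 + ... + thetan*xn."""
    return theta[0] + sum(t * x for t, x in zip(theta[1:], row))


def train_logistic(X, y, lam=1.0, alpha=0.1, iterations=2000):
    """Gradient descent on the L2-regularized log loss; theta[0] (bias) is not penalized."""
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(iterations):
        errors = [sigmoid(lr_score(row, theta)) - label for row, label in zip(X, y)]
        grad = [sum(errors) / m]
        for j in range(n):
            g = sum(e * row[j] for e, row in zip(errors, X)) / m
            grad.append(g + (lam / m) * theta[j + 1])  # gradient of the L2 penalty
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


def regularized_cost(X, y, theta, lam):
    m = len(X)
    loss = 0.0
    for row, label in zip(X, y):
        p = sigmoid(lr_score(row, theta))
        loss += -label * math.log(p) - (1 - label) * math.log(1 - p)
    return loss / m + (lam / (2 * m)) * sum(t * t for t in theta[1:])


def lr_predict(row, theta):
    return 1 if sigmoid(lr_score(row, theta)) >= 0.5 else 0


# Hypothetical microchip data: two normalized test scores per chip, 1 = passed QA
X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
theta = train_logistic(X, y, lam=1.0)
print([lr_predict(row, theta) for row in X])  # [0, 0, 0, 1, 1, 1]
print(regularized_cost(X, y, theta, lam=1.0))
```

Leaving the bias out of the penalty lets the decision boundary shift freely while the penalty only shrinks the feature weights, which is what limits overfitting.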

Machine Learning Project: Spam Classification Using Support Vector Machines (SVM) 06/01/2016 - 06/15/2016

Project description

• Trained an SVM model to classify emails as spam or non-spam, using 4,000 labeled spam and non-spam emails as the training data set.

• The trained SVM classifier achieved a training accuracy of 99.8% and a test accuracy of 98.5%.

• The programming code and data set for this project can be viewed at:

https://github.com/Punam-Silwal/Project_4_Spam_Classification_by_SVM
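The project above does not say which SVM solver was used. As an illustration only, this sketch trains a linear SVM with Pegasos (stochastic sub-gradient descent on the L2-regularized hinge loss), a swapped-in technique chosen because it fits in a few lines of standard-library Python. Folding the bias into the weight vector through a constant feature, λ = 0.01, the epoch count, the function names `train_pegasos` and `svm_predict`, and the six 2-D points are all assumptions; real spam filtering would typically first map each email to a word-feature vector.

```python
import random


def train_pegasos(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM via Pegasos: stochastic sub-gradient descent on the regularized hinge loss.

    X: list of feature lists; y: labels in {+1, -1}.
    A constant 1.0 is appended to every sample so the last weight acts as a (regularized) bias.
    """
    data = [list(row) + [1.0] for row in X]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if margin < 1.0:  # hinge loss is active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def svm_predict(w, row):
    score = sum(wj * xj for wj, xj in zip(w, list(row) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data (+1 = spam, -1 = non-spam)
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_pegasos(X, y)
print([svm_predict(w, row) for row in X])  # [1, 1, 1, -1, -1, -1]
```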
