
Computer Science Data

San Ramon, California, United States
October 07, 2016


Simi YARI (Somaye)

Ph.D. in Computer Science, Data Scientist/Machine Learning Eng.


Python: Pandas, NumPy, scikit-learn, SciPy, Seaborn, BeautifulSoup, GraphLab, NLTK, Boto, StatsModels, PySpark, PyMongo, Theano, TensorFlow

Spark (MLlib, GraphX, SQL, Streaming), C++/C, MATLAB, LaTeX, Git, Flask (Familiar), Java (Familiar)

Databases: SQL, NoSQL (MongoDB)

Technologies: MapReduce, AWS EC2 & S3, Docker (Kitematic), VM (Vagrant), Hadoop

Machine Learning: Linear regression, Logistic regression, Neural networks, Unsupervised learning, Clustering, Regularization, Support vector machines, Recommendation systems, Decision trees, Random forests, AdaBoost, KNN, XGBoost

Deep Learning, Natural Language Processing

Probability Theory and Statistics, A/B Testing, Web Scraping.

Fault-Tolerance techniques, Erasure Coding for Big Data

Signal Processing and Information Theory

Algorithms, Data Structures

Linear Algebra, Graph Theory, Algebra, Number Theory, Combinatorial Analysis

Experience

Galvanize Feb 2016-Current

Data Scientist in Residence San Francisco, CA

Lecturing, teaching and mentoring master's students at Galvanize University in a two-month data engineering course covering Advanced SQL, Linux, AWS, Docker, Advanced Spark, NoSQL and MapReduce

Completed Galvanize data science immersive program with project-based curriculum that focused on data processing, machine learning and data visualization

Patent search tool:

Worked with 300 GB of XML patent data obtained from the USPTO and built a MongoDB database. Using Spark, Python and NLP on AWS (EC2), created a patent search tool that can be used to invalidate new claims.

Loan predictor: Applied NLP analysis and used Random Forest, AdaBoost and regression models to find the most important factors that affect loan interest.

Scraped/processed NYT articles and used K-Means clustering to discover underlying themes/topics.

Built movie recommender using matrix factorization on ratings dataset to predict new movie ratings.

Classified credit card fraud using a Random Forest Classifier that focused on optimizing recall.

Used MongoDB to accept JSON pings and output new Fraud predictions on a web-app interface.

Predicted user churn at a ride-sharing company using a Gradient Boosted classifier.

Western Digital (WD) Oct 2013-June 2015

Staff Engineer Irvine, CA

Failure/error analysis using sum-product networks (deep learning) - Fault-Tolerance techniques: Designed advanced algorithms to improve fault-tolerance of big data analytics jobs (Patent disclosure submitted). Analyzed failures in data written in flash memories. The algorithms (sum-product and message passing algorithms) were developed using C++ and Python for several projects.

Oregon State University July 2011-July 2012

Research Scholar Corvallis, OR

Failure/error analysis to improve frame error rate of flash memories carried out using mathematical and statistical methods, C++, MATLAB and Python.

University of Bergen Oct 2009-Dec 2012

Research Fellow Bergen, Norway

Designed and developed sum-product neural networks that generate codes to correct unbalanced errors in flash memories using C++, Python and MATLAB.

Contact

(541) ***-****

San Francisco, CA



Ph.D. Computer Science, GPA: 3.8/4, University of Bergen, Norway, 2012.

Research Scholar, Computer Science, Oregon State University, Corvallis, Oregon, 2011

M.S. Applied Mathematics, GPA: 18.8/20, Isfahan University of Technology, Iran

B.S. Mathematics, GPA: 17.2/20, Isfahan University of Technology, Iran



Some codes correcting unbalanced errors of limited magnitude for flash memories

S Yari, T Kløve, B Bose

IEEE Transactions on Information Theory 59 (11)

Published 8 papers (4 in IEEE) with more than 70 citations. A list of published articles is available on Google Scholar.

Honors and Awards


Patent disclosure, lead inventor: "Generalized LRC structure for archival storage", WD, 2015

Western Digital Award for an innovative concept on erasure coding for big data

Ph.D. Research Fellowship, University of Bergen, Norway

Best MS Thesis Award among all Master’s programs at Isfahan University of Technology


Ranked first among all MS students of the Department of Applied Mathematics, Isfahan University of Technology
