Zhang, Ru
*************@*****.***
To Hiring Manager,
I am interested in applying for the position of data scientist at Incorporan Inc. in Montreal. The enclosed resume lists my relevant work experience, selected projects and research publications. I would like to work with statistical models and machine learning algorithms for challenging industrial applications. If you have a moment, I would be delighted to ask you some questions about the types of projects your company works on.
Please feel free to contact me by email or telephone if you have any questions about my resume. I am also available for a face-to-face interview in Montreal. I look forward to your response.
Sincerely,
Ru Zhang
Zhang, Ru Email : *************@*****.***
Mobile : 613-***-**** Address: 115 Wright Crescent, Kingston, ON, Canada
Education
Queen’s University Kingston, ON
Ph.D. in Statistics; GPA: 4.17/4.30 Sep. 2014 - Present
Nankai University Tianjin, China
M.S. in Statistics Sep. 2010 - June 2013
Dongbei University of Finance and Economics Dalian, China
B.S. in Applied Mathematics Sep. 2006 - June 2010
Experience
Queen’s University Kingston, ON
Teaching & Research Assistant Sep. 2014 - Present
Research Assistant: Ph.D. research in statistics focusing on modeling, analysis and the inverse problem of dynamic computer experiments with large-scale data sets.
Teaching Assistant: Tutorials, assignment marking and term marking for statistical and mathematical courses.
Annoroad Gene Technology Beijing, China
Data Scientist Mar. 2014 - Sep. 2014
R & D: Machine learning algorithms for preimplantation genetic screening for Down syndrome.
China First Heavy Industries Tianjin, China
Software Engineer June 2013 - Mar. 2014
C/C++ development: numerical libraries for solving differential equations in the process control system of hot rolling machinery.
Projects
Sequential Design for the Inverse Problem in Dynamic Computer Experiments Sep. 2017 - Feb. 2018
Description: a new sequential design algorithm and expected improvement criterion for solving inverse problems in dynamic computer experiments, which achieves significantly higher accuracy than existing alternatives. The convergence of this algorithm is proved using the theory of reproducing kernel Hilbert spaces (RKHS).
Keywords: expected improvement, sequential design, RKHS
Languages: R and C
GitHub Repo: https://github.com/heavenmarshal/l2inv
Corporación Favorita Grocery Sales Forecasting Dec. 2017 - Jan. 2018
Description: Kaggle challenge of forecasting the sales of 4000+ grocery items in 54 chain stores from Aug. 16, 2017 to Aug. 31, 2017. The submitted solution consists of 80+ handcrafted features including moving averages, periodic means and standard deviations, and promotion and holiday information. An ensemble of gradient boosting models built with LightGBM and xgboost is applied.
Result: silver medal (top 3%, ranked 43 of 1675 teams)
Keywords: machine learning, sales forecasting, time series
Languages and Tools: R, C, data.table, dplyr, LightGBM, SQLite and xgboost
Local Approximate SVD-based Gaussian Process (GP) Models Mar. 2016 - Aug. 2017
Description: a new algorithm for selecting neighborhood sets and fitting local SVD-based GP models to address the infeasibility of the classic GP on large-scale dynamic experiments. The algorithm is parallelized via the R package "parallel" and OpenMP. A stable version named "DynamicGP" has been published on CRAN.
Keywords: Gaussian processes, nearest neighborhood, parallel programming
Languages: R, C and C++
GitHub Repo: https://github.com/heavenmarshal/lasvdgp
Quora Question Pairs May 2017 - June 2017
Description: Kaggle challenge of identifying identical questions posted on Quora in order to merge similar entries. The training set consists of 400K+ question pairs, labeled 1 (identical) or 0 (different). The submitted solution uses 40+ handcrafted features plus 200 automatically selected bag-of-words features. Classification is performed by gradient boosting models (xgboost).
Result: bronze medal (top 6%, ranked 168 of 3307 teams)
Keywords: classification, natural language processing
Languages and Tools: Python, NLTK, scikit-learn, Stanford CoreNLP and xgboost
Field-aware Factorization Model Training on Hadoop HDFS May. 2017
Description: modified the libffm data I/O interface to read training data directly from, and write predictions directly to, Hadoop HDFS, which accelerates the model training workflow.
Languages and Tools: C, Hadoop and HDFS
Research Items
Zhang, R., Lin, C. D., Ranjan, P. (2018) "Local Gaussian Process Model for Large-scale Dynamic Computer Experiments", accepted by Journal of Computational and Graphical Statistics.
Zhang, R., Lin, C. D., Ranjan, P. (2018) "A Sequential Design Approach for the Inverse Problem in Dynamic Computer Experiments", manuscript.
Skills
Programming Languages: R, C/C++, Python
Operating System: Linux
Machine Learning Libraries: xgboost, LightGBM, NLTK, scikit-learn
Numerical Libraries: BLAS, LAPACK, Eigen
Data Wrangling: data.table, dplyr, pandas, SQLite, MySQL
Text Editing: Emacs, LaTeX
Parallel Programming: OpenMP, R package "parallel"
Version Control: git
Links
My GitHub: https://github.com/heavenmarshal
My Kaggle Profile: https://www.kaggle.com/heavenmarshal