Data Scientist

Location:
Houston, TX
Posted:
February 13, 2017

Resume:

Shujiao Huang, PhD

+1-240-***-**** j *******.*@*****.***

SUMMARY

Statistics PhD with skills and experience in theoretical model building, proof, and testing using both simulated and real data. Strong understanding of statistical methodology, data analysis, machine learning, and statistical consulting.

EDUCATION

PhD in Mathematics, 3.90/4.00, University of Houston, TX Aug 2014 - Dec 2016

MSc in Statistics, 3.91/4.00, Georgia Southern University, GA Aug 2012 - Jul 2014 Meritorious Graduate Student, Apr 2014

BSc in Statistics, 91.69/100, Beijing Institute of Technology, Beijing Aug 2008 - Jun 2012 Premium Student, Aug 2010

BSc in Economics, Peking University, Beijing Aug 2010 - Jul 2012

RELEVANT EXPERIENCE

Graduate Assistant, University of Houston, TX Aug 2014 - Dec 2016

Addressed the identifiability problem in the correlated multivariate linear model by developing a two-stage smoothing semi-parametric regression method, which gives unique and better estimates than regular parametric and nonparametric models.

Proved the large-sample convergence theorem for the smoothing cohort model estimates and tested it against simulated and real mortality/incidence rate data.

Consulted on statistical methods with life science researchers through regular meetings and discussions, collaborated on a research paper, and built and tested models.

Identified 5 significant genes out of 24,000 by applying variable selection methods and the Cox proportional hazards model to genomic/microarray data.

Conducted data transformation, normalization, and visualization with R and fed that data to SAS for a supervised hierarchical clustering analysis.

Graduate Assistant, Georgia Southern University, GA Aug 2012 - Jul 2014

Proposed novel distributions with enhanced capability to fit diverse lifetime data.

Improved maximum likelihood estimation efficiency 14-fold using algorithms that automatically initialize the parameter values, calculate the model statistics, and evaluate the fit, reducing the time required from 7 months to 2 weeks.

Examined the performance of various newly proposed distributions and evaluated root mean square errors by Monte Carlo simulation.

Completed model evaluation on real data by hypothesis testing and compared survival and hazard curves with fitted curves.

Data Mapping Assistant, Amazon Inc., Beijing Apr 2012 - Jul 2012

Matched and compared data sets with given comparison guidelines to analyze potential and existing competitors.

Conducted competitor analysis and prioritized mappings for Amazon’s pricing decision.

Worked closely with the retail team to identify mapping issues and reviewed mapping guidelines.

Communicated with the global team for troubleshooting.

COMPUTER SKILLS

Proficient with R, SAS, MATLAB, Python, SQL, LaTeX, C, SPSS, etc. Comfortable with Mac OS, Windows, and Unix operating systems.

SELECTED PRESENTATIONS

(Invited) Huang, S., and Fu, W., Alternative Approach to the Identifiability Problem. ICSA Applied Statistics Symposium, Atlanta, GA Jun 2016

Huang, S., and Fu, W., The Selection of the Constraint for Smoothing Cohort Model. Joint Statistical Meeting, Chicago, IL Aug 2016

Huang, S., and Oluyede, B. O., Exponentiated Kumaraswamy-Dagum-Weibull and Related Distribution. Graduate Seminar, Georgia Southern University, GA May 2013

SELECTED PUBLICATIONS

– Generalized Distributions - to fit real-world data, such as product survival time, price, hospital remission rate, etc., for reliability and survival analysis.

Huang, S., and Oluyede, B. O., Exponentiated Kumaraswamy-Dagum Distribution with Applications to Income and Lifetime Data, Journal of Statistical Distributions and Applications, 1(1), 1-20, 2014.

Oluyede, B. O., Huang, S., and Pararai, M., A New Class of Generalized Dagum Distribution with Applications to Income and Lifetime Data, Journal of Statistical and Econometric Methods, 3(2), 125-151, 2014.

Oluyede, B. O., and Huang, S., Estimation in the Exponentiated Kumaraswamy Dagum Distribution with Censored Samples, Electronic Journal of Applied Statistical Sciences, 8(1), 122-135, 2015.

Oluyede, B. O., Huang, S., and Yang, T., A New Class of Generalized Modified Weibull Distribution with Applications, Austrian Journal of Statistics, 44, 45-68, 2015.

Oluyede, B. O., Mutiso, F., and Huang, S., The Log Generalized Lindley-Weibull Distribution with Application, Journal of Data Science, 13(2), 281-310, 2015.

Oluyede, B. O., Elbatal, I., and Huang, S., Beta Linear Failure Rate Geometric Distribution with Application, Journal of Data Science, 14, 317-346, 2016.

Oluyede, B. O., Motsewabagale, G., Huang, S., Warahena-Liyanage, G., and Pararai, M., The Dagum-Poisson Distribution: Model, Properties and Application, Electronic Journal of Applied Statistical Analysis, 9(1), 169-197, 2016.

Huang, S., and Oluyede, B. O., The McDonald Log-logistic Distribution with Applications to Lifetime and Pricing Data, Journal of Probability and Statistical Science, 14(2), 123-139, 2016.

Oluyede, B. O., Huang, S., Basele, G., and Makubate, B., A New Class of Generalized Log-logistic Weibull Distribution: Theory, Properties and Applications, Journal of Probability and Statistical Science, 14(2), 171-201, Aug 2016.

Oluyede, B. O., Foya, S., Warahena-Liyanage, G., and Huang, S., The Log-logistic Weibull Distribution with Applications to Lifetime Data, Austrian Journal of Statistics, 45, 43-69, 2016.

Oluyede, B. O., Shi, X., Odubote, O., and Huang, S., Weighted Generalizations of the Generalized Rayleigh Distribution with Applications, Journal of Probability and Statistical Sciences, (in press), 2016.

Mutiso, F., Huang, S., and Oluyede, B. O., Theoretical Properties of the Generalized Lindley-Weibull Distribution with Application, Journal of Data Science, (in press), 2014.

– Microarray Data Analysis - to identify significant genes that have a stronger relationship with DNA repair, in order to predict clinical outcome of cancer patients.

Huang, S., Tu, W., Ju, Z., Poage, G., Brewster, A., Lin, S., Mills, G., Wang, H., and Peng, G., A five-gene signature inferred from transcriptome profiling of homologous recombination-mediated DNA repair predicts clinical outcome of patients with cancer, (In press - Jacobs Journal of Biomarkers).

– Semi-parametric Linear Regression Model - to address the identifiability problem in correlated variables. Variance study for parameter constraints on analysis of variance (ANOVA) model.

Huang, S., and Fu, W., A Smoothing Model and Its Asymptotics, (in process), 2016.

Gao, K., Huang, S., and Fu, W., Parameter Constraints and Impact on Parameter Estimation in ANOVA and Age-period-cohort Models, (in process), 2016.