Baoqiang Cao
Research Associate
The Institute for Computational Engineering and Sciences
University of Texas at Austin, Austin, TX 78712
Email: *********@*****.***
URL: http://baoqiang.org/
Phone: 512-***-****
OBJECTIVE:
I am looking for a position with strong emphasis on developing machine learning algorithms to recognize and predict patterns.
EMPLOYMENT AUTHORIZATION:
I am a permanent resident.
DATA MINING AND STATISTICAL MODELING SKILLS:
Skills summary:
● Supervised learning: k-NN, hidden Markov models, linear regression, artificial neural networks
● Unsupervised learning: k-means, hierarchical clustering
Projects that I was/am fully in charge of:
Designed a convex optimization approach with linear inequality constraints to predict protein-RNA binding
Formulated the problem and applied a solver to train the model
Developed web crawling scripts to collect RNA-protein binding records from PubMed
Parsed HTML files to extract the reference links
Edited the links and analyzed the papers they point to
Developed and analyzed protein sequence evolutionary networks
Built directed, weighted flow networks for protein evolution
Analyzed network properties, for example the largest connected component
Analyzed the dynamics of the networks and predicted the evolutionary direction
Developed statistical learning models to predict classification of residues in membrane proteins
Trained and evaluated, using cross-validation, the following models to predict transmembrane domains: neural networks, support vector machine classification, and linear discriminant analysis
Developed models to predict relative lipid accessibility
Used support vector machine regression to learn and predict relative lipid accessibility, cast as a regression problem
Analyzed data and built models to cluster genes from various databases
Combined Bayesian clustering and k-NN clustering
Clustered genes that are co-expressed with highly altered DNA copy numbers in breast cancer patients
Developed a hidden Markov model to predict nucleosome positions in various genomes
Co-designed a public Linux server that applies the method to predict the topology of membrane proteins
MINNOU (http://minnou.cchmc.org)
COMPUTER LANGUAGES AND SYSTEMS:
C/C++:
Worked on several large-scale simulation projects, for example:
Built a directed and weighted graph to understand protein networks
Built and trained a hidden Markov model to predict nucleosome positions in genomes
PERL:
Developed modules to do sophisticated machine learning and pattern recognition
Prepared data for neural network, SVM classification, linear regression, and hidden Markov models; trained each model with cross-validation and parsed the results
Co-developed an online server (http://minnou.cchmc.org)
Built the final predictor from the trained neural network models and integrated it with an online interface
Developed code to parse and test web-based applications
Routinely develop code for parsing different datasets according to customized criteria
R:
Developed packages for statistical analysis and machine learning on massive or web-based data
Used machine learning functions in R: support vector machines, neural networks, k-NN, k-means, and linear regression
Contributed to open source R project (“bio3d”)
Modified one tiny function in bio3d
Mixture Perl+R:
Developed programs that parse data from text files or online sources and call the R interface for statistics, or that use R to collect and parse data and Perl to do the modeling
FORTRAN:
Developed several programs to model thermodynamic properties of materials
Mostly FORTRAN 77
Matlab:
Developed code to solve convex optimization problems
Linux/Unix:
Daily use
EDUCATION:
● Ph.D., major in Physics, University of Cincinnati, Cincinnati, OH (10/2001 -- 08/2006)
● M.S., major in Theoretical Condensed Matter Physics, Nanjing University (thesis) and Northwest University (certificate), China (02/1998 -- 07/2000)
● B.S., major in Physics, Northwest University, Xi’an, China (09/1994 -- 02/1998)
EXPERIENCE:
● 04/2009-present, Research Associate, University of Texas at Austin
● 12/2007-03/2009, Postdoctoral Research Fellow, University of Texas at Austin
● 09/2006-12/2007, Postdoctoral Research Associate, University of Nebraska-Lincoln
● 07/2005-08/2006, Research Assistant, Cincinnati Children’s Hospital Medical Center in conjunction with University of Cincinnati
● 08/2003-09/2003, Research Assistant, Oak Ridge National Lab in conjunction with University of Cincinnati
● 10/2001-06/2005, Teaching/Research Assistant, University of Cincinnati, Ohio
● 07/2000-10/2001, Lecturer, Northwest University, Xi’an, China
● 02/1998-07/2000, Research Assistant, Nanjing University, Nanjing, China
PUBLICATIONS:
● Jiawei Ling, C Fang, Y Xu, G Zhuang, Baoqiang Cao, “Evaluation of the fidelity of multiple displacement amplification from small number of cells”, Zhonghua Yi Xue Yi Chuan Xue Za Zhi. 2010 Feb 10;27(1):42-6 (in Chinese).
● Baoqiang Cao and Ron Elber, “Computational exploration of the network of sequence flow between protein structures”, Proteins: Structure, Function, and Bioinformatics, Vol. 78 Issue 4, (p 985-1003), 2010.
● Baoqiang Cao, Michael Wagner, and Jaroslaw Meller, “Lipid Accessibility Prediction in Membrane Proteins”, in submission.
● Brinda Kizhakke Vallat, Jaroslaw Pillardy, Peter Májek, Jaroslaw Meller, Thomas Blom, Baoqiang Cao, Ron Elber, “Building and assessing atomic models of proteins from structural templates: Learning and benchmarks”, Proteins: Structure, Function, and Bioinformatics, Vol. 76, Issue 4 (p 930-945), 2009
● Jiawei Ling, Guanglun Zhuang, Barbra Tazon-Vega, Chenhui Zhang, Baoqiang Cao, Zev Rosenwaks, Kangpu Xu, “Evaluation of genome coverage and fidelity of multiple displacement amplification from single cells by SNP Array”, Molecular Human Reproduction, Vol.15, No.11 pp. 739–747, 2009.
● Baoqiang Cao, Aleksey Porollo, Rafal Adamczak, Mark Jarrell, and Jaroslaw Meller, "Enhanced Recognition of Protein Transmembrane domains with Prediction-based Structural Profiles", Bioinformatics 2006 22(3):303-309.
● Baoqiang Cao, Changde Gong, Jun Li and Yongjun Liu, “Doping Dependence of In-plane resistivity and Hall Effect in Cuprate Superconductors”, Phys. Rev. B 62, 15237 (2000).
BOOK CHAPTER & CONFERENCE PAPER:
● Baoqiang Cao, Mario Medvedovic, and Jaroslaw Meller, "Prediction of Transmembrane Domains and Pore-facing Residues in Beta-barrel Membrane Proteins", Applications of Statistical and Machine Learning Methods in Bioinformatics; Series: Advances in Computational and Systems Biology, Vol 1 (eds. Meller J and Nowak W), Peter Lang Publishing Group (2007).
SERVICE: (MANUSCRIPT REVIEW)
Bioinformatics