Computer Science Data

San Francisco, California, United States
January 27, 2018

Victor V. Solovyev, Ph.D.

Curriculum Vitae and List of Publications


**** *********** **., #***, *** Francisco, CA 94133, tel.650-***-****

Decades of academic and industrial research and development in computer science, data analytics, computational biology and genomics. Leading author of popular genome analysis and annotation pipelines as well as pipelines for analysis of next generation sequencing data. Hands-on experience in data analysis software development, including application of machine learning, data analytics and deep learning approaches with Python libraries: NumPy, SciPy, Pandas, scikit-learn, Keras, TensorFlow, Theano, Apriori. Programming in Java, Python, C/C++, Objective-C, Fortran, R. Using MIT StarCluster and Amazon cloud (AWS). Working knowledge of mobile application programming for Android (Java) and iPhone (Objective-C).

Full list of publications and their >19,000 citations on Google Scholar (h-index: 43).

Work Experience:

2015–current Chief Scientific Officer, Softberry Inc., USA

§ Leading research-oriented software development teams focused on bio-medical data analysis using AWS cloud or computer clusters.

§ Applying convolutional neural networks and other machine learning approaches for genome functional patterns identification

§ Building pipelines for next generation sequencing data analysis to discover novel gene isoforms, genetic variations, variation in expression level and biomarkers useful for disease detection and classification, patient stratification and treatment response prediction.

2013–2015 Professor of Computer Science, Computer, Electrical and Mathematical Sciences and Engineering Division, KAUST, KSA

§ Applying machine learning approaches for extracting significant features important for modeling, design and engineering of genes and pathways, biomarkers discovery, biofuel production.

§ Building software for genome and protein pathway annotation, modeling genetic networks, studying genome functional regions and compiling databases of genomic information.

§ Developing cluster and cloud computing applications for high-throughput NGS data analysis.

§ Teaching (postgraduate courses): Introduction to Computational biology and Algorithms in Bioinformatics

2003–2012 Professor of Computer Science, Department of Computer Science, Royal Holloway, University of London

§ Statistical analysis of genome, transcriptome and proteome data

§ Developing databases of genomic information

§ Building software pipelines to support next generation sequencing technologies and developing new algorithms for gene finding, promoter prediction, SNP detection, estimation of SNP effects and selection of disease-specific SNP subsets.

§ Teaching (undergraduate and postgraduate courses): Neural Networks, Software Engineering, Biomedical Informatics, Bioinformatics, Computational Biology

2003–2003 Genome Annotation Group Leader, Joint Genome Institute, Lawrence Berkeley National Lab, USA

§ Leading a group of researchers, biologists and software developers to build pipelines for identification of genes and other genome functional elements in genomic sequences

§ Applying computational tools for annotation of new genomes.

1999–2003 Director of Bioinformatics, EOS Biotechnology, South San Francisco

§ Managing bioinformaticians and programmers to create a system for selecting genes and microarray probes for Affymetrix chip design

§ Analysis of gene expression data to identify drug target candidates.

1997–1999 Computational Genomics Group Leader, Bioinformatics Division, The Sanger Centre

§ Leading a group of researchers to develop gene identification algorithms

§ Developing databases to support sequencing and analysis of the human genome.

1995–1997 Computational scientist, Department of Computational Biology, Amgen Inc., Thousand Oaks

§ Developing pipelines for analysis of EST and protein sequences to select potential drug target candidate proteins

1992–1995 Assistant professor/instructor, Department of Cell Biology, Baylor College of Medicine, Houston

1991–1992 Visiting scientist, Supercomputer Computations Research Institute, Florida State University, Tallahassee

1985–1992 Head of computer analysis of biopolymers group/research scientist at the Institute of Cytology and Genetics, Novosibirsk

Education Background:

• PhD, Genetics, Institute of Cytology and Genetics, Novosibirsk, Russia

“Computer analysis of biopolymers”

• BSc, Physics, Novosibirsk State University, Russia

Editor of Mathematical Biosciences journal (2008–2015).

Programming Skills: C/C++, Objective-C, Java, Python, Fortran, R, SQL, HTML

Other interests: developing cryptography and information security software; development of computer/mobile phone games.

Led the development of many widely used bioinformatics applications, comprising more than a hundred algorithms implemented in pipelines, data viewers, machine learning and statistical analysis packages. In 2017 alone these software applications were used in more than 2000 research publications (according to Google Scholar). The Fgenesh program alone has been used or cited in ~4000 scientific publications.

Participated in the organization of many international conferences, including "Networks and data mining" (School of Advanced Sciences), Luchon, France, July 2015. Chairman of the Bioinformatics section of the 6th Annual World DNA & Genome Day 2015 (WDD-2015). Program committee member: Computational Systems Bioinformatics International Conference, Stanford, USA (2005–2010); the First International Conference on Advances in Bioinformatics and Applications (BIOINFORMATICS 2010-2011, Mexico/Italy); International Conference on Intelligent Systems for Molecular Biology (ISMB 2006, Brazil); DIMACS Mini-Workshop on Gene-Finding and Gene Structure Prediction (1995, USA).

Bibliography: author/co-author of more than 100 research publications.

Selected publications:

Umarov R.K., Solovyev V.V. (2017) Prediction of Prokaryotic and Eukaryotic Promoters Using Convolutional Deep Learning Neural Networks. PLoS ONE.

Shahmuradov I., Umarov R., Solovyev V. (2017) TSSPlant: a new tool for prediction of plant Pol II promoters. Nucleic Acids Research, doi: 10.1093/nar/gkw1353

Boudellioua I, Saidi R, Hoehndorf R, Martin MJ, Solovyev V (2016) Prediction of Metabolic Pathway Involvement in Prokaryotic UniProtKB Data by Association Rule Mining. PLoS ONE 11(7): e0158896. doi:10.1371/journal.pone.0158896

Mansueto et al. (2016) Rice SNP-seek database update: new SNPs, indels, and queries. Nucl. Acids Res., doi: 10.1093/nar/gkw1135

Shahmuradov I., Solovyev V. (2015) Nsite, NsiteH and NsiteM Computer Tools for Studying Transcription Regulatory Elements. Bioinformatics, doi: 10.1093.

Allam A., Kalnis P., Solovyev V. (2015) Karect: Accurate error correction of sequencing reads based on multiple alignment. Bioinformatics, 31(21):3421-3428.

Abdel-Haleem et al. (2015) Genome Sequence of a Multidrug-Resistant Strain of Stenotrophomonas maltophilia with Carbapenem Resistance. Genome Announc. 3(5). pii: e01166-15. doi: 10.1128/genomeA.01166-15.

Moroz et al. (2014) The ctenophore genome and the evolutionary origins of neural systems. Nature 510(7503): 109-114.

Elsik et al. (2014) Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics 15:86, 1-29.

Earl D. et al. (2014) Alignathon: a competitive assessment of whole-genome alignment methods. Genome Res. 24(12):2077-2089.

Steijger et al. (2013) Assessment of transcript reconstruction methods for RNA-seq. Nature Methods 10, 1177–1184

Engström et al. (2013) Systematic evaluation of spliced alignment programs for RNA-seq data. Nature Methods 10, 1185–1191.

Solovyev V, Salamov A. (2011) Automatic Annotation of Microbial Genomes and Metagenomic Sequences. In Metagenomics and its Applications in Agriculture, Biomedicine and Environmental Studies (Ed. R.W. Li), Nova Science Publishers, p. 61-78.

Earl D. et al. (2011) Assemblathon 1: A competitive assessment of de novo short read assembly methods. Genome Res. published online September 16.

Solovyev V., Tatarinova T. (2011) Towards the integration of genomics, epidemiological and clinical data. Genome Medicine, 3, 48, 1-3.

Solovyev VV, Shahmuradov IA, Salamov AA. (2010) Identification of promoter regions and regulatory sites. Methods Mol Biol. 674, 57-83.

The Nasonia Genome Working Group, et al. (2010) Functional and Evolutionary Insights from the Genomes of Three Parasitoid Nasonia Species. Science 327, 343-348.

International Aphid Genomics Consortium. (2010) Genome sequence of the pea aphid Acyrthosiphon pisum. PLoS Biol. 8(2):e1000313.

The Bovine Genome Sequencing and Analysis Consortium. (2009) The Genome sequence of taurine cattle: A window to ruminant biology and evolution. Science, 324, 522-528.

Michino M. et al. (2009) Community-wide assessment of GPCR structure modelling and ligand docking. Nature Reviews Drug Discovery 8, 455-463.

Coghlan A. et al. (2008) nGASP - the nematode genome annotation assessment project. BMC Bioinformatics 9:549.

Richards S. et al. (2008) The genome of the model beetle and pest Tribolium castaneum. Nature 452 (7190), 949-955.

Velasco R. et al. (2007) A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety. PLoS ONE 2(12): e1326.

Solovyev V.V. (2007) Statistical approaches in Eukaryotic gene prediction. In Handbook of Statistical Genetics (eds. Balding D., Cannings C., Bishop M.), Wiley-Interscience; 3rd edition, 1616 p.

Sodergren et al. (2006) The genome of the sea urchin Strongylocentrotus purpuratus. Science, 314(5801), 941-952.

Weinstock et al. (2006) Insights into social insects from the genome of the honey bee Apis mellifera. Nature 443 (7114), 931-949.

Solovyev V, Kosarev P, Seledsov I, Vorobyev D. (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7, Suppl 1, p. 10.1-10.12.

Bajic VB, Brent MR, Brown RH, Frankish A, Harrow J, Ohler U, Solovyev VV, Tan SL. (2006) Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment. Genome Biol. 7, Suppl 1, p. 3.1-3.13.

Shahmuradov I., Solovyev V., Gammerman A.J. (2005) Plant promoter prediction with confidence estimation. Nucleic Acids Research 33(3):1069-1076.

Collins et al. (2004) Finishing the euchromatic sequence of the human genome. Nature 431 (7011), 931-945.

Grimwood J, Gordon LA, Olsen A, .., Salamov A., Solovyev V., Lukas S. (2004) The DNA sequence and biology of human chromosome 19. Nature, 428 (6982), 529-535.

Gene W. Tyson, Jarrod Chapman, Philip Hugenholtz, Eric E. Allen, Rachna J. Ram, Paul M. Richardson, Victor V. Solovyev, Edward M. Rubin, Daniel S. Rokhsar, Jillian F. Banfield (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37-43.

Michael Brudno, Alexander Poliakov, Asaf Salamov, Gregory M. Cooper, Arend Sidow, Edward M. Rubin, Victor Solovyev, Serafim Batzoglou, Inna Dubchak (2004) Automated Whole-Genome Multiple Alignment of Rat, Mouse, and Human. Genome Research, 14(4): 685-692.

Shahmuradov IA, Akbarova YY, Solovyev VV, Aliyev JA. (2003) Abundance of plastid DNA insertions in nuclear genomes of rice and Arabidopsis. Plant Mol Biol. 52(5): 923-934.

Hild M, Beckmann B, Haas SA, Koch B, Solovyev V, Busold C, Fellenberg K, Boutros M, Vingron M, Sauer F, Hoheisel JD, Paro R. (2003) An integrated gene annotation and transcriptional profiling approach towards the full gene content of the Drosophila genome. Genome Biol. 25 (1), R3.

Gordon L, Chervonenkis AY, Gammerman AJ, Shahmuradov IA, Solovyev VV. (2003) Sequence alignment kernel for recognition of promoter regions. Bioinformatics. 19(15):1964-1971.

Solovyev VV, Shahmuradov IA. (2003) PromH: Promoters identification using orthologous genomic sequences. Nucleic Acids Res. 31(13):3540-3545.

Shahmuradov IA, Gammerman AJ, Hancock JM, Bramley PM, Solovyev VV. (2003) PlantProm: a database of plant promoter sequences. Nucleic Acids Res. 31(1): 114-117.

Solovyev V.V. (2002) Finding genes by computer: probabilistic and discriminative approaches. In Current Topics in Computational Biology (eds. T. Jiang, T. Smith, Y. Xu, M. Zhang), The MIT Press, p. 365-401.

Solovyev V.V., Shindyalov I.N. (2002) Properties and Prediction of Protein Secondary Structure. In Current Topics in Computational Biology (eds. T. Jiang, T. Smith, Y. Xu, M. Zhang), The MIT Press, p. 201-248.

Solovyev V.V. (2002) Structure, Properties and Computer Identification of Eukaryotic genes. In Bioinformatics from Genomes to Drugs. V.1. Basic Technologies. (ed. Lengauer T.), p. 59-111.

Burset M., Seledtsov I., Solovyev V. (2001) SpliceDB: database of canonical and non-canonical mammalian splice sites. Nucleic Acids Res., 29(1), 255-259.

Salamov A., Solovyev V. (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10(4), 516-522.

Burset M., Seledtsov I., Solovyev V. (2000) Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res., 28(21), 4364-4375.

Filippov V., Solovyev V., Filippova M., Gill S. (2000) A novel type of RNase III family proteins in eukaryotes. Gene, 245(1), 213-221.

Kolchanov N., Ponomarenko M., Frolov A., Ananko E., Kolpakov F, Ignatieva EV, Podkolodnaya OA, Goryachkovskaya TN, Stepanenko IL, Merkulova TI, Babenko VV, Ponomarenko YV, Kochetov AV, Podkolodny NL, Vorobiev DV, Lavryushev SV, Grigorovich DA, Kondrakhin YV, Milanesi L, Wingender E, Solovyev V, Overton GC (1999) Integrated databases and computer systems for studying eukaryotic gene expression. Bioinformatics, 15(7-8): 669-86.

Solovyev V.V., Salamov A.A. (1999) INFOGENE: a database of known gene structures and predicted genes and proteins in sequences of genome sequencing projects. Nucl. Acids Res., 27, 248-250.

Alexandrov N.N., Solovyev V.V. (1998) Statistical significance of ungapped sequence alignment. In Pacific Symposium on Biocomputing'98 (eds. Altman R., Dunker K., Hunter L., Klein T.), p. 463-472.

Agulnik A.I., Bishop C.E., Lerner J.L., Agulnik S.I., Solovyev V.V. (1997) Analysis of mutation rate in the SMCY/SMCX genes shows that mammalian evolution is male driven. Mammalian Genome 8, 134-138.

Salamov A.A., Solovyev V.V. (1997) Protein secondary structure prediction using local alignments. J. Mol.Biol, 268,1, 31-36.

Salamov A.A., Solovyev V.V. (1997) Recognition of 3'-end cleavage and polyadenylation region of human mRNA precursors. CABIOS 13, 1, 23-28.

Kelly R.L., Solovyeva I., Lyman L.M., Richman R., Solovyev V.V., Kuroda M. (1995) Expression of MSL-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila. Cell, 81,867-877.

Salamov A.A., Solovyev V.V. (1995) Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. J. Mol. Biol. 247, 1, 11-15.

Solovyev V.V., Salamov A.A. (1994) Predicting α-helix and β-strand segments of globular proteins. CABIOS 10, 6, 661-669.

Solovyev V.V., Salamov A.A., Lawrence C.B. (1994) Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucl. Acids Res. 22, 24, 5156-5163.

Lawrence C.B., Solovyev V.V. (1994) Assignment of Position-Specific Error Probability to primary DNA Sequence Data. Nucl. Acids Res. 22, 7, 1272-1280.

Solovyev V.V. (1993) Fractal graphical representation and analysis of DNA and Protein sequences. BioSystems, 30, 137-160.

Solovyev V.V., Makarova K.S. (1993) A novel method of protein sequences classification based on oligopeptide frequency analysis and its application to search for functional sites and to domain localization. CABIOS 9, 17-24.

Solovyev V.V., Korolev S.V., Lim H.A. (1993) A new approach for the classification of functional regions of DNA sequences based on fractal representation. Int. J. Genomic. Res. 1, 2, 108-127.

Solovyev V., Seledtsov I. (1993) A new approach to the phylogenetic trees construction based on analysis of relatively conservative regions of nucleotide and amino acid sequences. International Journal of Genome Research 1 (3), 177-185.

Makarova K.S., Mazin A.V., Wolf Y.I., Solovyev V.V. (1992) "DIROM" - an experimental design interactive system for directed mutagenesis and nucleic acids engineering. CABIOS 8, 425-431.

Solovyev V.V., Salikhova A.K., Lim H.A. (1992) 3D-structure calculation of α-helical domains of protein molecules using the quasispherical primary approximations. In Biomedical Modeling and Simulation (J. Eisenfeld, D. Levin, M. Witten Eds.) Elsevier Science Publishers B.V. (North-Holland), p. 201-211.

Seledtsov I.A., Solovyev V.V., Merkulova T.I. (1991) New elements of glucocorticoid-receptor binding sites of hormone-regulated genes. Biochim. Biophys. Acta 1089, 367-376.

Rogozin I.B., Solovyev V.V., Kolchanov N.A. (1991) Somatic hypermutagenesis in immunoglobulin genes. Correlation between somatic mutations and repeats. Somatic mutation properties and clonal selection. Biochim. Biophys. Acta 1089, 175-182.

Vershinin A.V., Salina E.A., Solovyev V.V., Timofeyeva L.L. (1990) Genomic organization, evolution, and structural peculiarities of highly repetitive DNA of Hordeum vulgare. Genome 33, 441-449.

Kholodilov N., Bolshakov ., Blinov V., Solovyov V., Zhimulev I. (1988) Intercalary heterochromatin in Drosophila. Chromosoma 97 (3), 247-253.

Kolchanov N., Solovyov V., Rogozin I. (1987) Peculiarities of immunoglobulin gene structures as a basis for somatic mutation emergence. FEBS Letters 214 (1), 87-91.

Merkulova TI, VV Solov'ev, NA Kolchanov, SY Plisov, RI Salganik (1986) Identification of DNA sequences specific for 5'-flanking regions of glucocorticoid-regulated genes using computer analysis. Bulletin of Experimental Biology and Medicine 101 (4), 502-504.

Solovyev V.V., Zharkikh A.A., Kolchanov N.A. (1984) The template RNAs of RNA polymerases can have compact secondary structure formed by long double helices with partial violation of the complementarities. FEBS Lett. 165, 72-78.

Solovyev V.V., Kolchanov N.A. (1984) A simple method for calculation of low energy packing of α-helices. A threshold Approximation. 1. The use of the method to estimate the effects of amino acid substitutions, deletions, and insertions in globins. J. Theor. Biol. 110, 67-91.

Zharkikh A.A., Solovyev V.V., Kolchanov N.A. (1984) Conformational change in the Globins Family during Evolution 1. Analysis of the Evolutionary Role of Insertions and Deletions. J. Mol. Evol. 21, 42-53.

Kolchanov N.A., Solovyev V.V., Zharkikh A.A. (1983) Effect of mutation, deletion and insertion of single amino acids on the three-dimensional structure of globins. FEBS Lett. 161, 65-70.
