LONGLONG YANG
*******@*****.***
https://www.linkedin.com/in/longlong-yang-64676454/
SPECIAL SKILLS
Computer Science: Extensive experience in Java, C++, Perl, PHP, Python, R, R Markdown, R Shiny, JavaScript, SQL, HTML(5), CSS, AJAX, XML, JAXB, JSON, Ant, Axis, JSP and servlets, JSF, Spring, Spring Boot, Struts, Hibernate, MyBatis, JDBC, ODBC; Apache Tomcat, IIS, JBoss, IBM WebSphere, Docker; AWS, Azure; web services (SOAP), microservices (RESTful); NetBeans, Eclipse, RStudio, Anaconda3; Git, Bitbucket, Subversion and jCVS; JIRA, Jenkins, KNIME; Linux, Unix and Windows.
Database development and administration (Oracle, MySQL, MS-SQL), data warehouse, DataMart, Data Lake.
Computational Biology & Bioinformatics: Managing, archiving, and analyzing large-scale data from microarray gene expression, Next Generation Sequencing (NGS), cDNA screens and proteomics using commercial, open-source or self-developed software tools and packages.
Text analytics: Natural Language Processing (NLP), sentiment analysis, machine learning, data mining, web crawling, entity extraction, document summarization and categorization.
EDUCATION
M.S. Master of Computer Science, North Carolina State University, Raleigh, May 2001
Ph.D. Molecular Evolution, North Carolina State University, Raleigh, December 2000
B.S. Plant Protection, Shanxi Agricultural University, Taigu, Shanxi, China, June 1983
EXPERIENCE
2/2021-7/2024. Data Scientist / Senior App. Development Engineer, Division of Translational Toxicology (DTT), National Institute of Environmental Health Sciences (NIEHS), contracted at ASRC Federal Data Solutions – Civilian & Health Operating Group, Research Data Management and Reporting (RDMR).
• Create the Data Mart from the CEBS data warehouse: data modeling, design of the physical model from the logical model, data transformation and validation.
• Develop R scripts for batch processing of CEBS Data Mart and submit calculated stats results to the data lake (AWS) for web applications.
• Develop R scripts for parametric and nonparametric statistics of data from different sources and studies in Excel files, integrated as components in KNIME pipelines.
• Refactor Java application for statistical analysis of genotoxicity studies with bug fixes and significant improvement in flexibility and performance.
• Refactor, modularize and re-use existing R and Python scripts for statistical analysis, graphic visualization, reporting and outlier detection for various types of data from the CEBS data warehouse. Design new Oracle database tables and implement analysis pipelines with improved performance and flexibility.
• Develop “TGx-HDACi and TGx-DDI Biomarkers for Classification of Toxicants” web application (UI) using PHP Laravel and R scripts to analyze gene expression data for class predictions of chemicals with biomarkers derived using the nearest shrunken centroid (NSC) method (a linear machine learning algorithm).
• Manage projects and tasks with JIRA; set up projects and run applications in Jenkins.
11/2018-2/2021. Java Technology Lead, Conduent Business Services LLC., InfoSys Limited.
• Support Conduent healthcare applications, UI, Web Services and Oracle databases with Spring Core and Spring Boot, Hibernate, Struts, JSP, JavaScript, jQuery, Angular.
• Develop, improve, enhance, debug, deploy and test web applications on DEV and UAT environments and Azure production.
• Fix vulnerabilities found in OWASP web application security testing, such as Cross-Site Scripting (XSS) and SQL Injection.
• Develop microservices in Power BI applications.
2/2014-11/2018. Senior Programmer/Analyst, National Toxicology Program (NTP), National Institute of Environmental Health Sciences (NIEHS), contracted at DS Technologies, Inc.
• Develop applications with Java, Python, R and Spotfire for integration, management, analysis and visualization of microarray gene expression, high throughput transcriptome resequencing, histopathology and hematology data in CEBS (Chemical Effects in Biological Systems), a public Oracle database repository.
• Created the Python library cebsPy with statistical tests/methods used for analysis of CEBS data; implemented microservices and RESTful APIs using Python and Django.
• Develop the JSF web application for benchmark dose response models and histopathology studies.
• Developed the first PHP web application for biomarker classification based on gene expression data of test chemicals.
• Work in a team to develop a command-line application for generating PDF reports of more than 60 report types using Java, Hibernate, MyBatis and iText.
12/2003-2/2014, Senior Bioinformatics Research Associate, Center for Genomic Biology and Bioinformatics, the Hamner Institutes for Health Sciences.
• Software Development: Java Swing applications include BMDExpress (published) for the benchmark dose analysis of dose-response or time-course microarray gene expression data with ANOVA, GO and pathway enrichment tests; NetAtlas (published) as a Cytoscape plugin for examining signaling networks based on tissue gene expression data stored in MySQL database; SPRinGS for signaling pathway reconstruction in genome screens; GEAX for gene expression analysis cross (X) species; PowerRMA for very large sets (several thousands) of Affymetrix microarray data normalization from CEL files and basic statistics computation; Hockey Stick threshold model (original model in R) for dose-response data.
• Web services (XML) for network/pathway modeling and simulation database (Oracle), Entrez Gene, Gene Ontology (GO), and Microarray gene annotation (MySQL).
• Database Implementation, Development and Management: GeneNet as a back-end database (Oracle) for GeneSpring; Entrez Gene, Gene Ontology (GO) and other publicly available interaction network/pathway databases (BIND, IntAct) with local MySQL, using Perl scripts for automatic updating; new MySQL databases for cDNA and siRNA screen data, siRNA designation, Affymetrix chip annotation and LIMS back-end databases.
• Data Analysis: Microarray gene expression, NGS (Next Generation Sequencing), ToxCast assays and simulation, high-throughput cDNA/siRNA screens, proteomics, cell imaging and cytometry data from toxicogenomics and cancer biology research; commercial, open-source, and self-developed software and tools such as GeneSpring, PathwayAssist, BMDExpress, Partek, Ingenuity Pathway Analysis (IPA), Cytoscape, Bioconductor R packages, SAMtools, Bowtie, JMP (SAS) and Perl scripts.
5/2002-12/2003, Research Associate, Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University.
• Perform phylogenomics/comparative genomics analysis of approximately 70 pathogen and bacteria species/strains.
• Develop phylogenetic web service and client-side GUI in Java, SOAP, XML and MySQL databases.
• Develop a phylogenetic component for the informatics system of pathogens, Pathogen Portal or "PathPort", for rapid detection, identification, and forensic attribution of high-priority pathogens, whether causing diseases (impacting human or agricultural productivity) or potentially used as biological weapons.
• Develop Perl scripts and a Java application to integrate a pipeline for automatic analysis of bacterial genomes, including BLAST search, database search (hmmpfam and Wise2), alignment of homologous sequence groups, and construction of phylogenetic trees.
10/2001-5/2002, Data Analyst/Scientific Curator, Computational Biology Resource and Mouse Genome Informatics, the Jackson Laboratory.
• Nucleotide sequence and microarray gene expression data analysis, bioinformatics software evaluation. Accessing Celera Discovery System, Sequencher, gene prediction, cluster analysis for RIKEN clones (mouse) and gene annotation of Mouse Genome Database (MGD).
ACADEMIC HONOURS and AWARDS
The outstanding published paper in 2007, “Advancing the Science of Risk Assessment. A Method to Integrate Benchmark Dose Estimates with Genomic Data to Assess the Functional Effects of Chemical Exposure” Toxicological Sciences. 98(1):240-248. Presented by the Risk Assessment Specialty Section at the Annual Meeting of the Society of Toxicology, Seattle, Washington, March 2008.
The Board of Publications Award for the Best Paper in Toxicological Sciences. “Temporal Concordance Between Apical and Transcriptional Points of Departure for Chemical Risk Assessment”. Toxicological Sciences, 2013, 134(1): 180-194. SOT 54th Annual Meeting, San Diego, CA, March 2015.
SELECTED PUBLICATIONS
Phillips JR, et al. 2019. BMDExpress 2: enhanced transcriptomic dose-response analysis workflow. Bioinformatics. 35(10): 1780–1782.
Jackson MA, Yang L, Lea I, Rashid A, Kuo B, Williams A, Lyn Yauk C, Fostel J. 2017. The TGx-28.65 biomarker online application for analysis of transcriptomics data to identify DNA damage-inducing chemicals in human cell cultures. Environ Mol Mutagen. 58(7):529-535.
Thomas RS, et al. 2013. Temporal Concordance Between Apical and Transcriptional Points of Departure for Chemical Risk Assessment. Toxicol. Sci. 134(1): 180-194.
Tappenden, D.M., H. J. Hwang, L. Yang, R. S. Thomas, and J. J. LaPres, 2013. The Aryl-hydrocarbon Receptor Protein Interaction Network (AHR-PIN) as Identified by Tandem Affinity Purification (TAP) and Mass Spectrometry. Journal of Toxicology.
Thomas RS, Wesselkamper SC, Wang NC, Zhao QJ, Petersen DD, Lambert JC, Cote I, Yang L, Healy E, Black MB, Clewell HJ 3rd, Allen BC, Andersen ME. 2013. Temporal concordance between apical and transcriptional points of departure for chemical risk assessment. Toxicol. Sci. 2013 Jul; 134(1):180-94
Thomas RS, Clewell HJ 3rd, Allen BC, Yang L, Healy E, Andersen ME. 2012. Integrating pathway-based transcriptomic data into quantitative chemical risk assessment: A five chemical case study. Mutat Res. 2012 Aug 15; 746(2):135-43.
Woods CG, Fu J, Xue P, Hou Y, Pluta LJ, Yang L, Zhang Q, Thomas RS, Andersen ME, Pi J. 2009. Dose-dependent transitions in Nrf2-mediated adaptive response and related stress responses to hypochlorous acid in mouse macrophages. Toxicol Appl Pharmacol. 238(1): 27-36.
Yang, L., John R. Walker, John B. Hogenesch and Russell S. Thomas. 2008. NetAtlas: A Cytoscape plugin to examine signaling networks based on tissue gene expression. In Silico Biol. 8(1):47-52
Yang, L., Bruce C Allen and Russell S Thomas. 2007. BMDExpress: a software tool for the benchmark dose analyses of genomic data. BMC Genomics 2007, 8:387
Halsey, T.A., L. Yang, J. R Walker, J. B Hogenesch and R. S Thomas. 2007. A functional map of NF-κB signaling identifies novel modulators and multiple system controls. Genome Biol. 8(6):R104.
Thomas, R.S., B.C. Allen, A. Nong, L. Yang, E. Bermudez, H.J. Clewell III, and M.E. Andersen. 2007. A Method to Integrate Benchmark Dose Estimates with Genomic Data to Assess the Functional Effects of Chemical Exposure. Toxicol. Sci. 98(1): 240-248.
Thomas RS, Pluta L, Yang L, Halsey TA. 2007. Application of genomic biomarkers to predict increased lung tumor incidence in 2-year rodent cancer bioassays. Toxicol Sci. 97(1): 55-64.
Thomas, R. S., T. M. O'Connell, L. Pluta*, R. D. Wolfinger, L. Yang* and T. J. Page. 2007. A Comparison of Transcriptomic and Metabonomic Technologies for Identifying Biomarkers Predictive of Two-Year Rodent Cancer Bioassays. Toxicol. Sci. 96(1): 40-46.
Page, T. J., D. Sikder, L. Yang, L. Pluta, R. D. Wolfinger, T. Kodadek and R. S. Thomas. 2006. Genome-wide analysis of human HSF1 signaling reveals a transcriptional program linked to cellular adaptation and survival. Mol. BioSyst. (2), 627-639.
Baldarelli, R. M., et al. 2003. Connecting Sequence and Biology in the Laboratory Mouse. Genome Research 13: 1505-1519
Okazaki, Y. et al. 2002. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420: 563-573 (among 136 co-authors, annotated ~3,500 clones)
Winterton, S., L. Yang, B.M. Wiegmann, and D.K. Yeates. 2001. Phylogenetic revision of Agapophytinae subf.n. (Diptera: Therevidae) based on molecular and morphological evidence. Systematic Entomology 26: 173-211
Yang, L. 2000. Molecular Phylogenetics of the Therevidae and their position among the families of the Asiloidea (Insecta: Diptera). Ph.D. dissertation. North Carolina State University. pp. ix + 106. figures, tables.
Yang, L., B. M. Wiegmann, D. K. Yeates, and M. E. Irwin. 2000. Higher-level phylogeny of the Therevidae (Diptera: Insecta) based on 28S ribosomal and elongation factor – 1 alpha gene sequences. Molecular Phylogenetics and Evolution 15(3): 440-451