CHENGYI ZHENG
US Permanent Resident
Indianapolis, IN, 46280
Google phone #: 818-***-**** abmo7r@r.postjobfree.com
Career Summary
A versatile, self-driven person who has successfully delivered value by using
computer skills to analyze and discover information
Love of innovation, a can-do attitude, and enthusiasm for learning new
topics
More than 10 years of experience in the healthcare industry
Worked in health insurance, hospital, medical device and pharmaceutical
companies
Experienced with insurance/billing, clinical, and drug discovery data
Specialized in using text analytics skills to analyze unstructured data
More than 10 years of experience in Computer Science
. Pattern recognition and classification, machine learning, data mining and
text mining
. Analysis of large amounts of complex data, including audio and text data
. Natural language processing (NLP), language modeling (LM), data
clustering
Computer Skills
Programming: C, C++, SAS, SQL, Java, JCL, R, Matlab, XML, VB, PHP, Lex,
YACC
Script Language: Perl, Python, Tcl/Tk, Shell, AWK, Sed
Database: Oracle, SAS, Teradata, MySQL, DB2
Environment: AIX, Cygwin, Mainframe, Solaris, Linux, UNIX, Windows
Others: CVS, Emacs, GCC, Purify, Make, Mantis, NSIS, TestLink, SVN, VS,
Apache, Tomcat
Text and Data Mining Skills
Text mining: Linguamatics, SAS Text Miner, Oracle Text, UIMA, ONHLP, GATE
Data mining: SAS Enterprise Miner, JMP, R, RapidMiner, Weka
Ontologies or taxonomies: MeSH, SNOMED CT, MedDRA, Entrez Gene, ChEBI,
UMLS, NCI, GO, ICD-9, CPT
Other tools: QUOSA, Spotfire, Pipeline Pilot, Denodo, Omniviz, Cytoscape
Databases
Clinical and claim database: Kaiser Permanente, EPIC, GPRD, i3, Thomson
Marketscan
Medline/PubMed, EMBASE, Ovid, PatBase, Factiva, SiteTrove
Ensembl, Biobase, Thomson Pharma, DrugBank, OMIM
Epidemiology: AERS, SIDER, CDC, SEER, EPI
Education
Ph.D. Computer Science and Engineering
Oregon Graduate Institute of Oregon Health & Science University,
Portland, OR, 2004
M.S. Computer Science
Fudan University, Shanghai, China, 1998
B.S. Instrument Engineering
Shanghai JiaoTong University, Shanghai, China, 1995
Employment
Research Scientist, Biomedical Informatics, Eli Lilly and Company,
Indianapolis, IN
11/2008 - Current
Initiated and delivered text mining / information retrieval solutions to
identify relevant documents, eliminating the need to read 113,000 papers
(about 5 working years of literature review time)
Ranked 1st in InnoCentive@Lilly, a companywide (45,000 employees)
competition to solve challenging problems (similar to
http://www.innocentive.com)
Performed text mining on literature searches and data mining on electronic
health record databases to identify the target population for early- and
late-phase studies
Studied drug safety signal detection / predictive algorithms using
decision trees, random forests, neural networks, support vector machines
(SVM), and other clustering and regression methods (JMP and R)
Led projects on target identification using the literature-based open
discovery principle
Worked with scientists from early stage discovery to late stage
production, such as drug disposition, translational medicine, BioTDR,
health outcome and patient safety
Conducted systematic reviews and meta-analyses on several projects
Served as administrator and heavy user of Linguamatics I2E, in charge of
its installation, indexing, and maintenance, and answered internal users'
questions
Data Consultant, Biostatistician Group, Department of Research and
Evaluation, Kaiser Permanente, Pasadena, CA
04/2008 - 11/2008
Created a pathology Oracle database with daily ETL from the Teradata data
warehouse
Used data mining methods to classify and predict cancer sites, types and
grades
Used NLP and rule-based information extraction to identify, extract,
summarize and translate free-text data from cardiology reports, achieving
95% accuracy, which helped identify patients for beta-blocker treatment
Performed daily ETL from the Clarity database, which has over 3 billion
clinical reports, using SAS and Oracle
Senior Software Engineer, Karl Storz Endoscopy, Goleta, CA
07/2004 - 03/2008
Developed a voice recognition system for an integrated surgical suite
Performed human-computer interface (GUI) design, implementation and
usability tests
Developed the audio front end and feature extraction module
Performed software documentation, testing and validation of software to
FDA specifications
Built multilingual acoustic and language models
Consultant, VOX Technologies, Beaverton, OR
04/2003 - 12/2003
Developed a low-resource SDK designed for small embedded devices. Wrote
APIs for a client-server dialog system under a distributed SR
framework
Implemented a fixed-point library for speed improvement on embedded
devices
Rewrote a Tcl-based neural network core in C
Graduate Research Assistant, Oregon Health and Science University, OR
09/1998 - 03/2003
2nd place in the 2003 NIST Language Recognition Evaluation
3rd place in the 2001 DARPA Speech In Noisy Environments competition on
accuracy and 1st place in speed
Designed and implemented a large-scale statistical speech recognition
and understanding system using HMMs, decision trees, class-based language
models, adaptation, and search, written in C, Perl and other tools. The
system runs on a distributed Linux cluster using parallel programs.
Intern, RadiSys Corporation, Hillsboro, OR
02/2004 - 03/2004
Performed software and system integration testing for a carrier-grade
Linux-based computer cluster (blade) server system mainly using Perl
Intern, Intel China Research Center, Beijing, China
07/1999 - 09/1999
Participated in the Intel Integrated Performance Primitives (IPP)
library project
Publications
1. C. Zheng and Y. Yan, Fusion Based Speech Segmentation in DARPA SPINE2
Task. In ICASSP 2004, Montreal, Canada, 2004.
2. Y. Yan, C. Zheng, et al., A Dynamic Cross-Reference Pruning Strategy
For Multiple Feature Fusion at Decoder Run Time. In EuroSpeech 2003,
Geneva, Switzerland, 2003.
3. C. Zheng and Y. Yan, Run Time Information Fusion in Speech Recognition,
in ICSLP 2002, Denver, 2002.
4. Y. Yan, C. Liu and C. Zheng, A Multiple Feature Front-End Approach to
Speech in Noise. In International Conference on Signal & Image
Processing, 2002.
5. C. Zheng and Y. Yan, Efficiently Using Speaker Adaptation Data, in ICSLP
2000, Beijing, China, 2000.
6. C. Zheng and Y. Yan, Improving Speaking Adaptation by Adjusting The
Adaptation Data Set, in ISPACS 2000, Hawaii, 2000.
7. Z. Xu, C. Zheng, Z. Ye, M. Xie, Complex-Valued Multistate Bidirectional
Associative Memory, Acta Electronica Sinica, vol 27, 1999.
8. C. Zheng, X. Liu, Z. Li, A Chinese Speech Database for Network Service,
in ORIENTAL COCOSDA Workshop 1998, Tsukuba, Japan, 1998.
9. C. Zheng and Z. Xu, Sign Language Recognition System Using Image
Processing, Computer Engineering & Application, Special Edition, 1998.
Patent
US Patent 7620553 - Simultaneous support of isolated and connected phrase
command recognition in automatic speech recognition systems
Patent Applications
200-***-****2 Device control system employing extensible markup
language for defining information resources
200-***-****8 System and method for hazard mitigation in voice-driven
control applications
200-***-****0 Audio, visual and device data capturing system with
real-time speech recognition command and control system
200-***-****1 Speech recognition system with user profiles management
component