Data Software Engineer

Location:
Indianapolis, IN, 46280
Posted:
June 05, 2010

Resume:

CHENGYI ZHENG

US Permanent Resident

**** ****** ***** **, #*

Indianapolis, IN, 46280

Google phone #: 818-***-**** abmo7r@r.postjobfree.com

Career Summary

A versatile, self-driven professional who has delivered value by using computer skills to analyze data and discover information

Love of innovation, a can-do attitude, and enthusiasm for learning new

topics

More than 10 years of experience in the healthcare industry

Worked in health insurance, hospital, medical device and pharmaceutical

companies

Experienced with insurance/billing, clinical, and drug discovery data

Specialized in using text analytics skills to analyze unstructured data

More than 10 years of experience in Computer Science

- Pattern recognition and classification, machine learning, data mining and text mining

- Analysis of large amounts of complex data, including audio and text data

- Natural language processing (NLP), language modeling (LM), data clustering

Computer Skills

Programming: C, C++, SAS, SQL, Java, JCL, R, Matlab, XML, VB, PHP, Lex,

YACC

Scripting Languages: Perl, Python, Tcl/Tk, Shell, AWK, Sed

Database: Oracle, SAS, Teradata, MySQL, DB2

Environment: AIX, Cygwin, Mainframe, Solaris, Linux, UNIX, Windows

Others: CVS, Emacs, GCC, Purify, Make, Mantis, NSIS, TestLink, SVN, VS,

Apache, Tomcat

Text and Data Mining Skills

Text mining: Linguamatics, SAS Text Miner, Oracle Text, UIMA, ONHLP, GATE

Data mining: SAS Enterprise Miner, JMP, R, RapidMiner, Weka

Ontologies or taxonomies: MeSH, SNOMED CT, MedDRA, Entrez Gene, ChEBI, UMLS, NCI, GO, ICD-9, CPT

Other tools: QUOSA, Spotfire, Pipeline Pilot, Denodo, Omniviz, Cytoscape

Databases

Clinical and claim database: Kaiser Permanente, EPIC, GPRD, i3, Thomson

Marketscan

Medline/PubMed, EMBASE, Ovid, PatBase, Factiva, SiteTrove

Ensembl, Biobase, Thomson Pharma, DrugBank, OMIM

Epidemiology: AERS, SIDER, CDC, SEER, EPI

Education

Ph.D. Computer Science and Engineering

Oregon Graduate Institute of Oregon Health & Science University,

Portland, OR, 2004

M.S. Computer Science

Fudan University, Shanghai, China, 1998

B.S. Instrument Engineering

Shanghai JiaoTong University, Shanghai, China, 1995

Employment

Research Scientist, Biomedical Informatics, Eli Lilly and Company,

Indianapolis, IN

11/2008 - Current

Initiated and delivered text mining / information retrieval solutions to identify relevant documents, reducing the reading load by 113,000 papers (about 5 working years of literature review time)

Ranked 1st in InnoCentive@Lilly, a companywide (45,000 employees) competition to solve challenging problems (similar to http://www.innocentive.com)

Performed text mining on literature searches and data mining on electronic health record databases to identify target populations for early- and late-phase studies

Studied drug safety signal detection and predictive algorithms using decision trees, random forests, neural networks, support vector machines (SVM), and other clustering and regression methods (JMP and R)

Led projects on target identification using the literature-based open discovery principle

Worked with scientists from early stage discovery to late stage

production, such as drug disposition, translational medicine, BioTDR,

health outcome and patient safety

Conducted systematic reviews and meta-analyses on several projects

Served as administrator and heavy user of Linguamatics I2E, in charge of its installation, indexing, and maintenance, and answered internal users' questions

Data Consultant, Biostatistician Group, Department of Research and

Evaluation, Kaiser Permanente, Pasadena, CA

04/2008 - 11/2008

Created a pathology Oracle database with daily ETL from a Teradata data warehouse

Used data mining methods to classify and predict cancer sites, types and grades

Used NLP and rule-based information extraction to identify, extract, summarize and translate free-text data from cardiology reports, achieving 95% accuracy, which helped identify patients for beta-blocker treatment

Ran daily ETL from the Clarity database, which has over 3 billion clinical reports, using SAS and Oracle

Senior Software Engineer, Karl Storz Endoscopy, Goleta, CA

07/2004 - 03/2008

Developed a voice recognition system for an integrated surgical suite

Performed human computer interface (GUI) design, implementation and

usability tests

Developed the audio front end and feature extraction module

Performed software documentation, testing and validation of software to

FDA specifications

Built multilingual acoustic and language models

Consultant, VOX Technologies, Beaverton, OR

04/2003 - 12/2003

Developed a low-resource SDK designed for small embedded devices. Wrote APIs for a client-server dialog system under a distributed speech recognition (SR) framework

Implemented a fixed-point library for speed improvement on embedded

devices

Rewrote a Tcl-based neural network core in C

Graduate Research Assistant, Oregon Health and Science University, OR

09/1998 - 03/2003

2nd place in the 2003 NIST Language Recognition Evaluation

3rd place in the 2001 DARPA Speech In Noisy Environments competition on

accuracy and 1st place in speed

Designed and implemented a large-scale statistical speech recognition and understanding system (HMMs, decision trees, class-based language models, adaptation, search, etc.) using C, Perl and other tools. The system runs on a distributed Linux cluster using parallel programs.

Intern, RadiSys Corporation, Hillsboro, OR

02/2004 - 03/2004

Performed software and system integration testing for a carrier-grade

Linux-based computer cluster (blade) server system mainly using Perl

Intern, Intel China Research Center, Beijing, China

07/1999 - 09/1999

Participated in the Intel Integrated Performance Primitives (IPP)

library project

Publications

1. C. Zheng and Y. Yan, Fusion Based Speech Segmentation in DARPA SPINE2

Task. In ICASSP 2004, Montreal, Canada, 2004.

2. Y. Yan, C. Zheng, et al., A Dynamic Cross-Reference Pruning Strategy

For Multiple Feature Fusion at Decoder Run Time. In EuroSpeech 2003,

Geneva, Switzerland, 2003.

3. C. Zheng and Y. Yan, Run Time Information Fusion in Speech Recognition,

in ICSLP 2002, Denver, 2002.

4. Y. Yan, C. Liu and C. Zheng, A Multiple Feature Front-End Approach to

Speech in Noise. In International Conference on Signal & Image

Processing, 2002.

5. C. Zheng and Y. Yan, Efficiently Using Speaker Adaptation Data, in ICSLP

2000, Beijing, China, 2000.

6. C. Zheng and Y. Yan, Improving Speaker Adaptation by Adjusting The

Adaptation Data Set, in ISPACS 2000, Hawaii, 2000.

7. Z. Xu, C. Zheng, Z. Ye, M. Xie, Complex-Valued Multistate Bidirectional

Associative Memory, Acta Electronica Sinica, vol 27, 1999.

8. C. Zheng, X. Liu, Z. Li, A Chinese Speech Database for Network Service,

in ORIENTAL COCOSDA Workshop 1998, Tsukuba, Japan, 1998.

9. C. Zheng and Z. Xu, Sign Language Recognition System Using Image

Processing, Computer Engineering & Application, Special Edition, 1998.

Patent

US Patent 7620553 - Simultaneous support of isolated and connected phrase

command recognition in automatic speech recognition systems

Patent Applications

200-***-****2 Device control system employing extensible markup language for defining information resources

200-***-****8 System and method for hazard mitigation in voice-driven control applications

200-***-****0 Audio, Visual and device data capturing system with real-time speech recognition command and control system

200-***-****1 Speech recognition system with user profiles management component


