Lili Cheng
Email: ****.********@*****.*** Phone: 914-***-****
Background: 5+ years of experience in data analysis, statistical
techniques and software tools (SAS, R, Matlab, Minitab), time series
analysis and forecasting models, and Data Mining and Machine Learning
methodologies. 5+ years of extensive software development experience in
Java, C++, SQL, and Visual C#. 5+ years of experience in UNIX/LINUX
system administration and shell scripting. Data analysis and solutions
development for several industries, including Finance, Energy,
Genomics, and Image Recognition. Working experience with big data
platforms and analysis (Hadoop, MapReduce).
Data Analysis Skills
. 5+ years of experience with statistical and mathematical methodologies
such as time series analysis, regression, and demand forecasting,
applied in domains including Computational Biology and Finance.
. Extensive data analysis using statistical tools including SAS, R,
Matlab, Minitab, Treeview, and Cluster.
. Extensive knowledge and real-world experience in applying Machine
Learning and Data Mining approaches to data analysis.
Software Development Skills
. 5+ years of experience in software architecture and application
development in Java, C++, SQL, Visual C#.Net, Visual C++, and Visual
Basic.
. 5+ years of experience in UNIX/LINUX system administration, shell
scripting, and software development.
. Significant experience developing solutions for domains including
Image Recognition, Finance, and Computational Biology, using various
Artificial Intelligence, data mining, statistics, and machine learning
methods.
. Working experience with big data platforms and programming (Hadoop,
MapReduce).
Experience
Data Scientist, Novisync Solutions Jan.2013-present
. Big Data Search Optimization
- Developed Hadoop-based web and text analytics optimization for a big
data application. Designed and adapted the PageRank algorithm with
optimizations. Experience with configuration and management of
Hadoop and HDFS clusters and with MapReduce-based algorithm
development in a UNIX/LINUX environment.
Data Researcher, Dept. of Statistics and Mathematics, Univ. of
Massachusetts Amherst Sep.2006-May.2008
. Gene Data Analysis using Hierarchical Clustering
- Experience in genomics with efficient analysis of large-scale gene
data sets. Goal was to find similar DNA micro-arrays in a very
large gene data set with high efficiency. A hierarchical clustering
algorithm (agglomerative method) was employed after experimenting
with several clustering and pattern matching algorithms. Also
experimented with several distance metrics and linkage methods to
find the most efficient implementation of the clustering algorithm.
Implemented with R, Treeview, and Cluster in a Windows environment.
. Bankruptcy Prediction using Measurement Error Analysis
- Experience with risk assessment using financial data. Goal was to
predict the probability that a company goes bankrupt based on its
cash-flow and debt history. Learning algorithms were used to
extract patterns from a large set of labeled historical financial
data of companies. Measurement error (M.E.) correction and
bootstrapping analysis were then used to identify patterns in the
financial data of the target company. Designed and implemented in
SAS in a UNIX environment.
. Human Face Recognition using Support Vector Machine Algorithm,
Microsoft Research in China Oct.2004-Apr.2005
- Significant experience with digital image processing for face
recognition. Goal was to recognize specific human faces in live
video streams from airport civilian cameras. Various digital image
processing methods were used to prepare and enhance the raw
images. An SVM-based algorithm was employed in the recognition
process to handle the high feature dimensionality of human face
images. Implemented in C# with several image processing libraries
in a Windows environment.
Data Research Analyst, Hai Dian Hospital Nov.2002-Mar.2003
. Sediment Detection in Medical Images using Edge Detection and Pattern
Recognition Algorithm
- Goal was to automatically detect and count sediments of specific
shapes in urine sample images. Various image enhancement, edge
detection, and pattern recognition algorithms were employed.
Implemented in Visual C++ in a Windows environment.
Data Analyst, Huaneng Power Plant Oct.2000-May.2001
. Prediction of Price on Electrical Power Market using Time Series
Analysis
- Goal was to predict electricity prices and analyze real-time power
generation cost to help Huaneng Power Plant, a major energy
company in China, bid in the electrical power market. The ARIMA
time series model was used for electricity price prediction. A
detailed model of the entire power generation process was built,
and live data from the enterprise database was streamed into the
model to calculate generation cost. Finally, the results were
aggregated in a cost-profit analysis and bidding assistance
system. Implemented in Visual Basic, C, and SQL. Worked with a
large-scale enterprise database in SQL Server 2000.
Education
University of Massachusetts Amherst, MA
Master of Science in Mathematics and Statistics (G.P.A: 3.9/4.0)
- Thesis: Hierarchical Cluster Analysis in Micro-array
Experiments
Beihang University Beijing, China
Master of Engineering in Pattern Recognition and Artificial
Intelligence
- Thesis: Human Face Recognition Based on Support Vector Machine
Harbin Institute of Technology Harbin, China
Bachelor of Engineering in Computer Science (Distinguished Graduate)