Jian Sun
SAS Programmer/Data Analyst
Summary:
Over 7 years of experience with statistical software packages (SAS BASE/STAT/MACRO/SQL/ODS,
R, Python, SPSS), plus solid skills in MS Access and Excel.
Proficient in Data Analysis/Data Management/Database Design/Business Analysis.
Certified in SAS Advanced and Base programming.
Proficient in SAS 9.2/9.3, the SAS Business Intelligence (BI) platform with Enterprise Guide, and the BI
platform with Ab Initio.
Experience working with SAS on different platforms such as UNIX and Windows.
Skilled in developing SAS programs for data requirement analysis and data mapping in the ETL
(Extract, Transform, Load) process.
Strong skills in developing SAS programs for Analysis of Variance (ANOVA), Linear
Regression, Logistic Regression, Multivariate Analysis, and Statistical Modeling.
Experience in pulling data from data warehouses (Teradata, Oracle, and SQL Server).
Proven skills in data cleaning, data validation, data integrity, ad-hoc reporting, and coding with SAS
in various environments.
Experience in Hadoop (MapReduce and HDFS) and Online Analytical Processing (OLAP: SQL
constructs with CUBE and ROLLUP).
Solid knowledge of Stochastic Processes, Time Series, Game Theory, and Operations Research.
Expertise in Machine Learning (Supervised and Unsupervised) and Data Mining.
Experience in Microsoft Office tools such as MS Access, MS SharePoint, MS Word, MS PowerPoint, and MS
Excel.
Experienced in quantitative and qualitative research.
Knowledge of Medicaid and Medicare, Public Health, and Clinical Trials (good understanding of CDISC,
SDTM, and ADaM standards).
Responsible for database recovery, leading software testing, and troubleshooting.
Proven ability to establish effective task priorities as a team player with a results-oriented attitude.
Excellent communication and interpersonal skills.
Education:
University of California, Berkeley
B.A. in Statistics
B.S. in Mathematics
Certifications:
SAS Certified Advanced Programmer
SAS Certified Base Programmer
Technical Skills:
R, SAS (BASE/STAT/GRAPH/SQL/MACRO/ODS/ETL/ACCESS), SAS Enterprise Guide, Python,
SPSS, MATLAB, VBA, UNIX, Outlook, Prezi, Teradata, SQL Server, Oracle, MS Office, Data Mining,
Machine Learning, Predictive Modeling, Stochastic Processes, Intermediate Accounting, Operations
Research, Game Theory, Time Series, Mathematical Analysis, Financial Analysis
Professional Accomplishments
InVentiv Health Clinical, Princeton, NJ Sep 2013 to Present
SAS Programmer
Responsibilities:
Used SQL Assistant to query and analyze clinical data from Teradata DB
Performed data extraction by converting MS Word documents and MS Excel tables into SAS data
sets.
Utilized SAS/Base (Proc Means, Proc Freq, Proc Sort) to clean and validate large datasets, and used
PROC SQL to extract columns and to join tables
Used the Proc SQL Pass-Through Facility to create and update Teradata tables
Conducted logistic regression analysis using Proc Logistic in SAS to identify the factors that
affect mortality
Generated residual diagnostics to calculate p-values for outlier tests, and produced t-tests based on
classification variables
Ran Proc ANOVA to test the equality of mean systolic and diastolic
values across the categories
Wrote a SAS macro to create a table that displays the median systolic and diastolic measurements for
each quartile of women’s body mass index (BMI)
Used the Output Delivery System (ODS) to write an analytical report, directing SAS output to
an HTML file that includes statistical results tables, an analysis summary, and data interpretation
Presented the HTML report to clients by using Prezi
Environment and Skill: SAS/BASE, SAS/MACRO, SAS/SQL, SAS/STAT, SAS/GRAPH, SAS/ODS,
MS Excel, Prezi, Teradata DB.
Merkle, Columbia, MD Mar 2012 to Aug 2013
SAS Programmer for Clinical Trial Statistical Learning
Responsibilities:
Used Proc SQL to extract clinical data from Teradata DB, and used SQL Assistant to query and
analyze the data
Performed data cleansing in the extraction and loading phases of ETL with Ab Initio
Performed data validation using Proc Freq, Proc Means, and Proc Report
Performed Principal Component Analysis (PCA) in SPSS to reduce the number of predictor
variables
Performed cluster analysis with K-means and K-medoids to minimize within-cluster dissimilarity based on data
attributes
Used the pseudo-F statistic and Cubic Clustering Criterion (CCC) to assess the tightness of clusters and to
measure cluster deviation
Built a sparse logistic-regression classifier (Lasso and Ridge Regression) in SAS to distinguish significant
factors from non-significant factors for patients
Compared three classification methods (CART, bagging, and random forests) with the sparse
logistic-regression classifier to determine the optimal method
Applied One-vs-All (OVA) and All-vs-All (AVA) classification to Support Vector Machine (SVM) and
Generalized Boosted Model (GBM) classifiers, and compared them to select the best-performing method
Designed and developed code with SAS/EG to support ad-hoc requests
Used SAS/ODS to produce HTML reports and presented data visualization and analysis results in a
poster
Developed a rule-based model and collaborated with pharmacists to discuss risk factors
Environment and Skill: SAS/BASE, SAS/SQL, SAS/STAT, SAS/ODS, R, SPSS, MS Excel, Machine
Learning, Teradata DB, Teradata SQL Assistant, Clinical Trial
Keas, San Francisco, CA Oct 2010 to Feb 2012
Data Analyst/R Programmer
Responsibilities:
Wrote SQL queries to pull clinical data from Oracle using Oracle SQL Developer
Converted Oracle data tables into MS Excel format and read them into R
Set up a UNIX environment to clean the data according to the specifications
Estimated by MLE the parameters governing family extinction and the maximum number of generations of a family
Generated Poisson simulations and probability distributions by the Monte Carlo method in R
Wrote UNIX shell scripts to parse XML-formatted data and extract family name
information
Created data frames and family trees to compare summary statistics across parameters
Used R to summarize and plot the data to test whether it exhibits a
time series trend
Extracted and manipulated data, and fitted a regression model for the number of generations
Created tables to report frequencies of peaks within a calendar year, and fitted harmonic trends to
find the cycle in the maximum number of generations of a family
Built a GARCH model to predict the conditional standard deviation of the future number of
generations of a family
Fitted an autoregressive integrated moving average (ARIMA) model to forecast the future generation trend
Presented results through graphs and a formatted HTML table report to conclude the simulation
Environment and Skill: R, UNIX, XML, HTML, MS Excel, Oracle, Oracle SQL Developer,
Mathematical Statistics, Time Series, Clinical Trial
K.L.D. Ltd, Yantai, China Apr 2009 to Sep 2010
SAS/SQL Programmer
Responsibilities:
Wrote SQL to extract data from different relational database management systems, used
SQL Assistant to query and analyze Oracle data, and wrote Excel macros to automate
repetitive tasks
Collected data and built an Enhanced Entity-Relationship (EER) diagram for an animal shelter
database
Built a relational database model, wrote structured queries in SQL, and designed the database in
Microsoft Access
Developed queries on the existing Oracle databases to fulfill ad-hoc requests
Performed normalization analysis to identify the primary-key and non-key attributes
Composed ad-hoc reports and documented SQL code files for supervisor’s review
Environment and Skill: SQL, MS Access, MS Visio, MS Excel, Data Modeling, EER diagram, Operations
Research, Oracle
HOYN Capital, Beijing, China Aug 2007 to Mar 2009
Financial Project Coordinator/SAS Analyst
Responsibilities:
Extracted data from a financial database using SQL Assistant and cleaned the data in Excel
Loaded datasets into SAS and used Proc Means and Proc Univariate to validate the data
Estimated each product’s future sales, profits, and cash flows throughout its five-year life cycle, and
analyzed 10-K reports by calculating financial ratios in Excel and MATLAB
Calculated internal rate of return and net present value to determine how many shares of common
stock must be issued
Applied financial analysis to show how large a mortgage the prospective company could afford and
how much total interest would be paid over its term
Performed financial calculations to determine the effective annual yield to maturity on each of the
bonds
Determined the true economic break-even point based on sales volume
Helped the company decide whether the product was worth purchasing and introducing
Environment and Skill: MS Excel, MS Word, SAS, SQL Assistant, MATLAB, Financial Analysis,
Intermediate Accounting