GOODNESS-OF-FIT TESTING FOR ACCIDENT MODELS WITH LOW
MEANS
* ******** *********, ******* ************** Institute, Montana State University,
* *******, **, ***, *-mail: *****.**@***.*******.***

Yunlong Zhang
Associate Professor, Zachry Department of Civil Engineering, Texas A&M University,
College Station, TX, USA, e-mail: ******@*****.****.***

Dominique Lord
Associate Professor, Zachry Department of Civil Engineering, Texas A&M University,
College Station, TX, USA, e-mail: *-****@****.***

Submitted to the 3rd International Conference on Road Safety and Simulation,
September 14-16, 2011, Indianapolis, USA

ABSTRACT

The modeling of relationships between motor vehicle crashes and underlying factors has been
investigated for more than three decades. Recently, many highway safety studies have
documented the use of Poisson regression models, negative binomial (NB) regression models, or
both. Pearson's $X^2$ and the scaled deviance ($G^2$) are two common test statistics that have been
proposed as measures of goodness-of-fit (GOF) for Poisson or NB models. Unfortunately,
transportation safety analysts often deal with crash data that are characterized by low sample
mean values. Under such conditions, the traditional test statistics may not perform very well.

This study has two objectives. The first objective is to examine the accuracy and reliability of
traditional test statistics for the GOF of accident models subjected to low sample means. The
second objective is to identify a superior test statistic for evaluating the GOF of accident
prediction models. For Poisson models, this paper proposes a better yet easy-to-use test statistic
that can be applied for almost all sample mean values, except when the mean value is extremely
low, for which no traditional test statistic can be accurate. For Poisson-Gamma models, this
study demonstrates that traditional test statistics are neither accurate nor robust. A more complex
method (grouped $G^2$) proposed in a previous study is recommended. Guidance on the use of the
grouped $G^2$ method is further provided. Examples using observed data illustrate the
performance of different test statistics and support the findings of this study.

Keywords: crash data, generalized linear model, goodness-of-fit, power-divergence.

INTRODUCTION

The modeling of relationships between motor vehicle crashes and underlying factors, such as
traffic volume and highway geometric features, has been investigated for more than three
decades. The statistical models (sometimes referred to as crash prediction models) from which
these relationships are developed can be used for various purposes, including predicting crashes
on transportation facilities and determining which variables significantly influence crashes.
Recently, many highway safety studies have documented the use of Poisson regression models
(Joshua and Garber, 1990; Miaou et al., 1992; Ivan and Bernardo, 2000; Lord and Bonneson,
2007), negative binomial (NB) regression models (Miaou and Lum, 1993; Poch and Mannering,
1996; Miaou and Lord, 2003; Maycock and Hall, 1984; Lord et al., 2005) or both (Miaou, 1994;
Maher and Summersgill, 1996). With the Poisson or Poisson-Gamma (or NB) models, the
relationships between motor vehicle crashes and explanatory variables can then be developed by
means of the Generalized Linear Model (GLM) framework.

Pearson's $X^2$ and the scaled deviance ($G^2$) are two common test statistics that have been
proposed as measures of GOF for Poisson or NB models (Maher and Summersgill, 1996).
Statistical software (e.g., SAS) also uses these two statistics for assessing the GOF of a GLM
(SAS Institute Inc., 1999). Unfortunately, transportation safety analysts often deal with crash
data that are characterized by low sample mean values. Under such conditions, the traditional test
statistics may not perform very well. This has been referred to in the highway safety literature as
the low mean problem (LMP). The study by Sukhatme (1938) concluded that, for samples from
a Poisson distribution with mean as low as one, Pearson's $X^2$ test for goodness of fit is not
good. In the field of traffic safety, this issue was first raised by Maycock and Hall (1984) and
further discussed by Maher and Summersgill (1996), Fridstrom et al. (1995), and Agrawal and
Lord (2006). Wood (2002) proposed a more complex technique, the grouped $G^2$ method, to
solve this problem. The grouped $G^2$ method is based on the knowledge that through grouping,
the data become approximately normally distributed and the test statistics follow a $\chi^2$
distribution. Some issues regarding this method are discussed in the third section. It should be
noted that the comparison of different models can be achieved by means of Akaike's Information
Criterion (AIC) (Akaike, 1974) or the Bayesian Information Criterion (BIC) (Schwarz, 1978).
However, similar to the previous studies (Maher and Summersgill, 1996; Wood, 2002; Agrawal
and Lord, 2006), this research intends to study statistics for the GOF of a given model (either
a Poisson model or an NB model); thus, we mainly focus on the statistics $X^2$, $G^2$,
and the proposed statistic (Power-Divergence).

This study expands on the work of Wood (2002) and has two objectives. The first objective is to
examine the accuracy and reliability of traditional test statistics for the GOF of GLMs subjected
to low sample means. The second objective is to identify a superior test statistic for
evaluating the GOF of crash prediction models. The study is accomplished by first theoretically
deriving the problems related to these traditional tests. Observed data are then used to
demonstrate the problems noted in the first part of the paper.

This paper is divided into five sections. The second section describes the characteristics of
Poisson and NB models used in traffic crash modeling. The third section provides an analysis
and comparison of different GOF test statistics for the Poisson and NB models. Observed crash
data are used for this analysis. In the fourth section, several important issues related to the GOF
test statistics are discussed. The last section summarizes the key findings of this study.

STATISTICAL MODELS

GLMs represent a class of fixed-effect regression models for dependent variables (McCullagh
and Nelder, 1989), such as crash counts in traffic accident models. Common GLMs include
linear regression, logistic regression, and Poisson regression. Given the characteristics of motor
vehicle collisions (i.e., random, discrete, and non-negative independent events), stochastic
modeling methods need to be used over deterministic methods. The two most common stochastic
modeling methods utilized for analyzing motor vehicle crashes are the Poisson and the NB
regression models. For these models, the relationship between traffic accidents and explanatory
variables is established through a loglinear function (i.e., canonical link or linear predictor). For
example, to establish the crash-flow relationship at intersections, the fitted model can follow the
form $\mu = \beta_0 F_1^{\beta_1} F_2^{\beta_2}$, where $\mu$ is the estimated number of crashes,
$F_1$ and $F_2$ are the entering AADTs (Average Annual Daily Traffic) for major and minor
approaches, and $\beta_0$, $\beta_1$, $\beta_2$ are the estimated coefficients. This fitted model
can thus be used for predicting crashes for different flow values.

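The crash-flow functional form described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `predict_crashes`, the coefficient values, and the AADT values are hypothetical and are not taken from the fitted models in this paper.

```python
def predict_crashes(f1, f2, beta0, beta1, beta2):
    """Return mu = beta0 * F1**beta1 * F2**beta2 for entering flows F1 and F2."""
    return beta0 * (f1 ** beta1) * (f2 ** beta2)


# Hypothetical coefficients and AADT values, chosen only to illustrate the form.
beta0, beta1, beta2 = 1.0e-4, 0.8, 0.6
mu_base = predict_crashes(10000, 2000, beta0, beta1, beta2)
mu_double = predict_crashes(20000, 2000, beta0, beta1, beta2)

# Doubling the major-approach flow scales the prediction by 2**beta1.
print(mu_base, mu_double / mu_base)
```

Because the form is log-linear, $\log \mu = \log \beta_0 + \beta_1 \log F_1 + \beta_2 \log F_2$, so scaling one flow by a constant multiplies the prediction by that constant raised to the corresponding coefficient.
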
Poisson Regression Model

The Poisson regression model aims at modeling a crash count variable $Y$, which follows a
Poisson distribution with a parameter (or mean) $\mu$. The probability that the number of crashes
takes the value $y_i$ on the ith entity is
$P(Y_i = y_i) = f_{Y_i}(y_i; \mu_i) = \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}$, $i = 1, 2, \ldots, n$.
For a Poisson distribution, the variance is equal to the mean.

The systematic portion of the model involves the explanatory variables $x_1, x_2, \ldots, x_k$, such as
traffic volumes, highway geometrics, v/c (volume/capacity) ratios and so on. The model is then
established through a linear predictor $\eta$. This predictor is usually a linear function of the
logarithm of the explanatory variables in traffic crash models:
$\eta = \beta_0 + \sum_{i=1}^{k} \beta_i x_i$, where $\beta_i$ is the
Poisson regression coefficient for the ith explanatory variable $x_i$. The coefficients are estimated
based on observed data. Finally, the model is estimated through a logarithm link function
$\eta_j = g(\mu_j) = \log(\mu_j)$ (Myers et al., 2002).

Negative Binomial Regression Models

Although Poisson regression models are rather simple, crash data often exhibit overdispersion,
meaning that the variance is greater than the mean. The NB regression models are thus used for
modeling such data. The NB regression models have the same forms of linear predictor and
logarithm link function as the Poisson regression models, except that the response variable $Y$
follows an NB distribution, in which the probability mass function (pmf) is defined as follows:

$$P(Y_i = y_i) = f_{Y_i}(y_i; \phi, \mu_i) = \frac{\Gamma(y_i + \phi)}{\Gamma(\phi)\,\Gamma(y_i + 1)} \left(\frac{\phi}{\mu_i + \phi}\right)^{\phi} \left(\frac{\mu_i}{\mu_i + \phi}\right)^{y_i}$$

where $\Gamma$ is the Gamma function, and $\phi$ is the inverse dispersion parameter. The relationship
between the variance and the mean of the NB distribution is $Var(Y_i) = \mu_i + \mu_i^2/\phi$. The
inverse dispersion parameter is usually assumed to be fixed and can be estimated from observed
data using the method of moments or the (bootstrapped) maximum likelihood (Anscombe, 1949;
Fisher, 1941; Zhang et al., 2007). However, recent research has shown that the inverse
dispersion parameter may be related to the explanatory variables (Miaou and Lord, 2003; Mitra
and Washington, 2007).

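As a quick numerical check of the NB mean-variance relationship stated above, the following sketch evaluates the pmf in log space and sums it directly. The helper names `nb_pmf` and `nb_moments`, the truncation point `y_max`, and the parameter values are illustrative assumptions, not part of the original study.

```python
import math


def nb_pmf(y, mu, phi):
    """NB probability of count y with mean mu and inverse dispersion phi."""
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (mu + phi))
             + y * math.log(mu / (mu + phi)))
    return math.exp(log_p)


def nb_moments(mu, phi, y_max=2000):
    """Mean and variance by direct summation over y = 0..y_max (truncated sum)."""
    probs = [nb_pmf(y, mu, phi) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Illustrative parameter values only.
mean, var = nb_moments(mu=1.5, phi=2.0)
print(mean, var, 1.5 + 1.5 ** 2 / 2.0)  # numerical variance next to mu + mu^2 / phi
```

As $\phi$ grows, the term $\mu^2/\phi$ vanishes and the variance approaches the Poisson value $\mu$.
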
GOODNESS-OF-FIT TEST STATISTICS

GOF tests use the properties of a hypothesized distribution to assess whether or not observed
data are generated from a given distribution (Read and Cressie, 1988). The most well-known
GOF test statistics are Pearson's $X^2$ and the scaled deviance ($G^2$). Pearson's $X^2$ is generally
calculated as follows: $X^2 = \sum_{i=1}^{n} \left[\frac{y_i - \mu_i}{\sigma_i}\right]^2$, where
$y_i$ is the observed data, $\mu_i$ is the true mean from the model, and $\sigma_i$ is the error
and is usually represented by the standard deviation of $y_i$.
The scaled deviance is calculated as twice the difference between the log-likelihood under the
maximum model and the log-likelihood under the reduced (or unsaturated) model:
$G^2 = 2(\log L_{max} - \log L_{red})$ (Wood, 2002).

Previous research has shown that neither Pearson's $X^2$ nor $G^2$ is $\chi^2$ distributed
under low sample mean conditions (Maycock and Hall, 1984; Maher and Summersgill, 1996;
Wood, 2002; Fridstrom et al., 1995; Agrawal and Lord, 2006). To solve this problem, Maher and
Summersgill (1996) proposed a test statistic ($G^2 / E(G^2)$) for GOF tests. Wood (2002) showed
that this test still failed with low sample mean values. Wood (2002) then suggested a grouped
$G^2$ test statistic for solving this problem. The development of the grouped $G^2$ is based on the
knowledge that by increasing the mean value, the data are approximately normally distributed
and the statistics follow a $\chi^2$ distribution. This method first determines an appropriate group
size $r$, which is the minimum grouping size. The raw data are then grouped so that each
observation is in a group of size at least as large as $r$. Additional details about the other steps
can be found in Wood (2002).

Some issues with the method proposed by Wood (2002) need to be addressed, however. First,
the grouping size may vary from group to group above a
minimum grouping size, which is determined by the sample mean of a Poisson model or the
critical mean values in an NB model, as defined in Wood (2002). Thus, it is possible that changing
grouping sizes while maintaining the same minimum grouping size may lead to different testing
results. Second, through grouping, the sample size will be smaller, and that may become an issue
especially when the grouping size is not small. Thus, as commented by Wood, a compromise has
to be made between strong grouping (which ensures that the Chi-square assumption for the
distribution of the test statistic holds) and weak grouping (which allows testing against a richer
alternative hypothesis). Finally, the grouped $G^2$, which includes five steps, is not a simple
procedure for practitioners or average transportation safety analysts who frequently analyze
crash data.

To summarize, several GOF test statistics have been proposed to evaluate the fit of models, but
their performance and complexity vary greatly. Therefore, simple but accurate and reliable
alternative test statistics are highly desirable to account for the LMP commonly observed in
crash studies.

In Wood's study (2002), a simple criterion to assess whether or not a test statistic is appropriate
for testing the GOF of regression models is to examine the test statistic's performance for a
single distribution (Poisson or NB) with known parameters. For this criterion, the grouped $G^2$
method was developed to improve the normality of observations and allow the mean and
variance of the $G^2$ statistic (for low mean values) to be close to 1 and 2 ($\chi^2_1$ distributed),
respectively. Similarly, in this study, we examine the mean and variance of different statistics
under a single distribution context to judge their appropriateness for the GOF of GLMs.

Test Statistics for Poisson Models

Characteristics of Statistical Tests

The most common test statistics are Pearson's $X^2$ and the scaled deviance ($G^2$). For a Poisson
model, the variance is equal to the mean and Pearson's $X^2$ is presented below:

$$X^2(\mu; n) = \sum_{i=1}^{n} \left[\frac{y_i - \mu_i}{\sigma_i}\right]^2 = \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{\mu_i} \qquad (1)$$

The scaled deviance for a Poisson model is (Maher and Summersgill, 1996)

$$G^2(\mu; n) = 2 \sum_{i=1}^{n} \left[ y_i \log\frac{y_i}{\mu_i} - (y_i - \mu_i) \right] \qquad (2)$$

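Equations 1 and 2 translate directly into code. The sketch below is a minimal illustration, assuming the usual convention that $y \log(y/\mu)$ is taken as 0 when $y = 0$; the function names and the small data vectors are hypothetical.

```python
import math


def pearson_x2(y, mu):
    """Equation 1: sum of (y_i - mu_i)^2 / mu_i."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def scaled_deviance(y, mu):
    """Equation 2: 2 * sum of [y_i * log(y_i / mu_i) - (y_i - mu_i)]."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y * log(y / mu) tends to 0 as y tends to 0, so zero counts contribute 0 here.
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


y = [0, 1, 4]           # hypothetical observed counts
mu = [1.0, 1.0, 1.02]   # hypothetical fitted means
print(pearson_x2(y, mu), scaled_deviance(y, mu))
```

Both functions take the fitted means as given; the degrees of freedom used for the test still have to account for the number of estimated coefficients.
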
In this paper, we investigate other test statistics for the GOF test of the Poisson model, especially
when it is characterized by low sample mean values. This research draws from some other work
in the statistical literature.

Cressie and Read (1984 & 1988) incorporated Pearson's $X^2$ and $G^2$ statistics into a family
of Power-Divergence statistics ($PD_\lambda$, $\lambda \in R$). In this family, each member
$PD_\lambda$ is the sum of deviance between the observed and expected counts:

$$PD_\lambda = \sum_i a_\lambda(y_i, \mu_i) = \frac{2}{\lambda(\lambda + 1)} \sum_{i=1}^{n} \left[ y_i \left( \left(\frac{y_i}{\mu_i}\right)^{\lambda} - 1 \right) \right] \qquad (3)$$

where $a_\lambda$ denotes the distance function. Different values of $\lambda$ lead to different GOF
statistics (Cressie and Read, 1984 & 1988; Baggerly, 1998), such as Pearson's $X^2$ statistic
when $\lambda = 1$, the Freeman-Tukey statistic
$F^2 = PD_{\lambda=-1/2} = 4 \sum_{i=1}^{n} (\sqrt{y_i} - \sqrt{\mu_i})^2$ when $\lambda = -1/2$
(Freeman and Tukey, 1950), and the Neyman-modified $X^2$ statistic
$NM^2 = PD_{\lambda=-2} = \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{y_i}$ when $\lambda = -2$
(Neyman, 1949). The Power-Divergence statistic can also be written as (Cressie and Read, 1989)

$$PD_\lambda = \frac{2}{\lambda(\lambda + 1)} \sum_{i=1}^{n} \left[ y_i \left( \left(\frac{y_i}{\mu_i}\right)^{\lambda} - 1 \right) - \lambda (y_i - \mu_i) \right] \qquad (4)$$

Hence, when $\lambda \to 0$, the power divergence leads to the $G^2$ statistic (Cressie and Read, 1989).

Cressie and Read (1988) recommended $\lambda = 2/3$, with which the statistic $PD_{\lambda=2/3}$ will be
approximated by the $\chi^2$ distribution in many situations and give the most reasonable power for
GOF tests. When $\lambda = 2/3$, the Power-Divergence test statistic becomes
$PD_{\lambda=2/3} = \sum_{i=1}^{n} \left[ \frac{9 y_i}{5} \left( \left(\frac{y_i}{\mu_i}\right)^{2/3} - 1 \right) - \frac{6}{5} (y_i - \mu_i) \right]$,
as derived from Equation 4.

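The Power-Divergence family of Equation 4 can be sketched as a single function of $\lambda$. The code below is an illustration under stated assumptions: the name `power_divergence` is hypothetical, the $\lambda = 0$ case is handled through its $G^2$ limit, and values $\lambda \le -1$ are rejected because the zero-count convention used here only holds for $\lambda > -1$.

```python
import math


def power_divergence(y, mu, lam=2.0 / 3.0):
    """Equation 4 for lam > -1; lam = 0 returns the G^2 limit."""
    if lam <= -1:
        raise ValueError("this sketch only supports lam > -1")
    total = 0.0
    for yi, mi in zip(y, mu):
        if lam == 0:
            log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
            total += 2.0 * (log_term - (yi - mi))
        else:
            # y * ((y / mu)**lam - 1) tends to 0 as y tends to 0 when lam > -1.
            power = yi * ((yi / mi) ** lam - 1.0) if yi > 0 else 0.0
            total += 2.0 / (lam * (lam + 1.0)) * (power - lam * (yi - mi))
    return total


y = [0, 1, 4, 2]             # hypothetical observed counts
mu = [1.0, 1.0, 1.02, 2.5]   # hypothetical fitted means
for lam in (1.0, 2.0 / 3.0, 0.0, -0.5):
    print(lam, power_divergence(y, mu, lam))
```

With `lam=1.0` the function reproduces Pearson's $X^2$, with `lam=-0.5` the Freeman-Tukey statistic, and with the default `lam=2/3` the statistic recommended by Cressie and Read.
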
GOF tests using different statistics rest on the assumption that each component of the statistic
follows an approximate $\chi^2_1$ distribution, which has a mean of 1 and a variance of 2. Thus, to
evaluate a test statistic for GOF tests, we can investigate how well its components follow a
$\chi^2$ distribution. With this criterion, different test statistics can be compared and evaluated.

Pearson's $X^2$, the $G^2$, Power-Divergence with $\lambda = 2/3$ ($PD_{\lambda=2/3}$), and the
Freeman-Tukey statistic $F^2 = 4 \sum_{i=1}^{n} (\sqrt{y_i} - \sqrt{\mu_i})^2$ (Freeman and Tukey, 1950)
are used for the examination of the fit of $\chi^2$ distributions. In the case that crash data have
zero counts at some locations, the Neyman-modified $X^2$ goes to infinity and is therefore excluded
from the comparison analysis.

Figure 1 shows the mean and variance of the components of those four statistics, for Poisson
means less than 10. The following equations show the calculations of the mean and variance of
Pearson's $X^2$ statistic, given a known Poisson mean value $\mu$:

$$E(X^2) = \sum_{k=0}^{\infty} X^2 f_Y(k; \mu) \qquad (5)$$

$$V(X^2) = \sum_{k=0}^{\infty} [X^2 - E(X^2)]^2 f_Y(k; \mu) \qquad (6)$$

where $f_Y(k; \mu)$ is the pmf of the Poisson distribution, $k$ is the number of occurrences of an
event, and $X^2$ is evaluated at $y = k$. The mean and variance of the other statistics over
different $\mu$ values can be calculated in the same way.

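The component moments of Equations 5 and 6 can be evaluated numerically by truncating the infinite sum. The sketch below assumes a truncation point `k_max` that is ample for means up to about 10; the function names are hypothetical. It can be used to check the relations reported in the text, namely $E(X^2) = 1$ and $V(X^2) = 2 + 1/\mu$.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of k events, computed in log space."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def x2_component(k, mu):
    return (k - mu) ** 2 / mu


def g2_component(k, mu):
    return 2.0 * ((k * math.log(k / mu) if k > 0 else 0.0) - (k - mu))


def pd_component(k, mu, lam=2.0 / 3.0):
    power = k * ((k / mu) ** lam - 1.0)
    return 2.0 / (lam * (lam + 1.0)) * (power - lam * (k - mu))


def component_moments(component, mu, k_max=200):
    """Equations 5 and 6 with the infinite sum truncated at k_max."""
    probs = [poisson_pmf(k, mu) for k in range(k_max + 1)]
    values = [component(k, mu) for k in range(k_max + 1)]
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, var


for name, fn in (("X2", x2_component), ("G2", g2_component), ("PD", pd_component)):
    print(name, component_moments(fn, 0.97))
```

Sweeping `mu` over a grid and plotting the two returned values gives curves of the kind summarized in Figures 1 and 2.
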
The comparisons are first conducted for $\mu$ values varying from 1 to 10. They are shown in
Figure 1. From this figure, Pearson's $X^2$ has a mean value ($E(X^2)$) of 1 for all $\mu$ values, but its
variance ($V(X^2)$) is greater than 2. As $\mu$ decreases, the variance increases. Thus, for low
$\mu$ conditions, Pearson's $X^2$ is not reliable and, as a result, tends to overestimate GOF values. In
fact, $V(X^2)$ is equal to $2 + 1/\mu$, and this has also been described in the study by Wood (2002).
The mean of the scaled deviance ($E(G^2)$) is slightly larger than 1 (when $\mu > 1$) and moves toward
1 as $\mu$ rises; the variance ($V(G^2)$) increases from less than 1 to around 2.4 and then decreases
toward 2. The Freeman-Tukey statistic does not have a good fit of $\chi^2$ distributions even when
$\mu = 10$. The mean and the variance of the $PD_{\lambda=2/3}$ statistic, however, are rather close
to 1 and 2, respectively. The components of the $PD_{\lambda=2/3}$ statistic fit $\chi^2$ distributions
almost perfectly as long as $\mu > 1$. Therefore, $PD_{\lambda=2/3}$ is recommended for GOF tests
for $\mu \in [1, 10]$.

Figure 2 shows the comparison of the mean and variance of $X^2$, $G^2$, and $PD_{\lambda=2/3}$ for
$0.1 < \mu < 1$. It can be observed that $E(G^2)$ varies from 0.47 to 1.15, while $E(PD)$ increases
from 0.7 to 0.98. Overall, $E(PD)$ is more stable based on the rate of increase and is much closer
to 1.0 than $E(G^2)$. For $\mu > 0.3$, the difference between $E(PD)$ and $E(X^2)$, which is exactly 1,
is very small and negligible. $V(G^2)$ is always less than 2 and even less than 1 given $\mu < 0.7$;
$V(PD)$ has the same tendency as $V(X^2)$, but is more stable and gets close to 2.0 even when $\mu$ is
as small as 0.3, while $V(X^2)$ stays above 3.0 for $\mu < 1$. It can also be seen that $V(PD)$ performs
like a compromise between $V(X^2)$ and $V(G^2)$. From the above comparisons, for $\mu \in [0.3, 1]$,
the components of $PD_{\lambda=2/3}$ are approximately $\chi^2$ distributed and $PD_{\lambda=2/3}$
performs better than the other statistics. For $\mu < 0.3$, no statistic is reliable for GOF tests,
and practitioners may consider turning to the more complicated grouped $G^2$ method.

Based on Figures 1 and 2, $PD_{\lambda=2/3}$ is better than the other statistics and its components
generally fit $\chi^2$ distributions well for $\mu > 0.3$. Pearson's $X^2$ is slightly better than
$G^2$ for $\mu > 3$, but even when $\mu = 10$, Pearson's $X^2$ and the $G^2$ are not satisfactory, with
means and variances of ($E(X^2) = 1.00$, $V(X^2) = 2.10$) and ($E(G^2) = 1.02$, $V(G^2) = 2.09$),
respectively.

Ye, Zhang, and Lord 8
[Figure 1 image: four panels, a) Chi-Square ($E(X^2)$, $V(X^2)$), b) Scaled Deviance ($E(G^2)$, $V(G^2)$),
c) Freeman-Tukey ($E(F^2)$, $V(F^2)$), d) Power-Divergence ($E(PD)$, $V(PD)$); x-axis: Poisson mean
(0 to 10); y-axis: expectation and variance (0 to 8).]

Figure 1 Mean and variance of the components of different test statistics for $0 < \mu < 10$

[Figure 2 image: two panels, a) Comparison of Variance ($V(X^2)$, $V(G^2)$, $V(PD)$; labeled values 5.33,
3.00, 2.58, 1.99, 1.36, 0.66), b) Comparison of Expectation ($E(X^2)$, $E(G^2)$, $E(PD)$; labeled values
1.15, 1.00, 0.98, 0.87, 0.84); x-axis: Poisson mean (0.0 to 1.0).]

Figure 2 Mean and variance of components of different test statistics for $0.1 < \mu < 1$

Example Applications with Observed Data

To show how different GOF test statistics affect the fit of Poisson models, two examples using
observed crash data are provided. It is worth noting that the core of the study is to statistically
investigate the performance of different test statistics for GOF under low mean conditions. The
following data are used as examples to help support the findings from the statistical investigation,
not to serve as an alternative approach to investigating their performance.

For the first example, the data were collected at 59 four-legged unsignalized intersections in
1991 in Toronto, Ontario (Lord, 2000). The dataset includes the number of crashes and entering
AADT for the major and minor approaches at each site. Both Poisson and NB GLMs were used
for modeling this dataset, but the NB model converged to Poisson, with an inverse dispersion
parameter that tended towards infinity (Lord and Bonneson, 2007). The mean of this dataset is
0.97. The variance is roughly the same as the mean. Thus, a Poisson GLM could be used for
modeling this dataset. The functional form $\mu = \beta_0 F_1^{\beta_1} F_2^{\beta_2}$ is used for
the prediction of the number of crashes. As stated by Lord (2006), it is the most common
functional form used by transportation safety analysts for modeling crash data at intersections.
The outputs of the fitted model are shown in Table 1. It can be seen that all coefficients are
significant even at the significance level of 0.01.

Table 1 Modeling outputs of the Poisson model
Coefficients    Est. Value    Std. Error    z value    Pr(>|z|)
$\beta_0$       2.3439E-06    4.2895        -3.022     0.0025
$\beta_1$       0.8175        0.3145        2.599      0.0093
$\beta_2$       0.6348        0.2349        2.7303     0.0069

Pearson's $X^2$, $G^2$, $PD_{\lambda=2/3}$, and $F^2$ are used for the GOF test of this Poisson model. The
results of the GOF tests are summarized in Table 2. The $PD_{\lambda=2/3}$ statistic has a lower GOF value
and correspondingly a higher p-value than Pearson's $X^2$ statistic. The GOF value of $G^2$ is
higher than that of Pearson's $X^2$. The $F^2$ statistic has the lowest p-value. To explain their
differences, Table 2 also lists the mean and variance of those test statistics given the Poisson
mean $\mu = 0.97$. The mean and variance of the distribution of the test statistics can also be seen
from Figure 1 or Figure 2. It is clear that the components of the $PD_{\lambda=2/3}$ statistic are rather
close to a $\chi^2$ distribution. For Pearson's $X^2$ statistic, $E(X^2) = 1$ and $V(X^2) = 3.03$. The
variance $V(X^2)$ is larger than 2 and may have overestimated the GOF value given $E(X^2) = 1$. For
the $G^2$ statistic, although $V(G^2) = 1.23 < 2$, the mean $E(G^2) = 1.14$ is higher than 1 and can
also result in overestimations of GOF values. Similarly, the $F^2$ statistic will also overestimate
GOF values.

Table 2 Results of GOF tests for the Poisson model
Statistics            $X^2$    $PD$     $G^2$    $F^2$
GOF value             52.71    51.76    59.85    93.41
Degrees of Freedom    56       56       56       56
p-value               0.60     0.64     0.34     0.00
Expectation*          1.00     0.98     1.14     1.81
Variance**            3.03     1.99     1.23     3.20
*: The means of test statistics when the Poisson mean is 0.97.
**: The variances of test statistics when the Poisson mean is 0.97.

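The p-values in a table of this kind come from the upper tail of a $\chi^2$ distribution with the stated degrees of freedom. A minimal sketch is given below, assuming an even number of degrees of freedom so that the tail has a closed form (for $df = 2m$ it equals the probability that a Poisson variable with mean $x/2$ is at most $m - 1$). The function name `chi2_sf_even_df` is hypothetical; a statistical library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df > x), valid for positive even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = math.exp(-half)   # j = 0 term of the Poisson sum
    total = term
    for j in range(1, df // 2):
        term *= half / j     # builds (x/2)^j / j! iteratively
        total += term
    return total


# GOF values and degrees of freedom as reported for the first example.
for name, gof in (("X2", 52.71), ("PD", 51.76), ("G2", 59.85), ("F2", 93.41)):
    print(name, round(chi2_sf_even_df(gof, 56), 2))
```

The printed values can be compared against the p-value row of the table; the same function applies to the second example, where the degrees of freedom are also even.
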
For the second example, the data were collected at 88 frontage road segments in the State of
Texas (Lord and Bonneson, 2007). The dataset includes the number of serious injury crashes
(KAB, i.e., K = fatal, Injury Type A = incapacitating, and Injury Type B = non-incapacitating),
segment length, and AADT. The mean of this dataset is 1.386 and the variance is 1.642. Both
Poisson and NB GLMs were used for modeling this dataset, but the NB model converged to
Poisson, with an inverse dispersion parameter that tended towards infinity (Lord and Bonneson,
2007). The functional form $\mu = \beta_0 \cdot L \cdot F^{\beta_1}$ was used for the prediction of
the number of crashes, where $L$ represents the segment length and $F$ is the AADT. The modeling
results are shown in Table 3. It can be seen that both coefficients are significant at the
significance level of 0.01.

Table 3 Modeling outputs of the Poisson model
Coefficients    Est. Value    Std. Error    z value    Pr(>|z|)
$\beta_0$       0.01536       0.8374        -4.987     6.14e-07
$\beta_1$       0.5874        0.1195        4.916      8.82e-07

Again, Pearson's $X^2$, $G^2$, $PD_{\lambda=2/3}$, and $F^2$ are used for the GOF test of this Poisson
model. As can be seen from Table 4, the GOF testing results are consistent with those of the first
example and do not warrant further discussion.

Table 4 Results of GOF tests for the Poisson model
Statistics            $X^2$     $PD$      $G^2$     $F^2$
GOF value             104.84    103.01    116.08    168.87
Degrees of Freedom    86        86        86        86
p-value               0.08      0.10      0.02      0.00
Expectation*          1.00      0.99      1.14      1.75
Variance**            2.81      1.99      1.70      4.76
*: The means of test statistics when the Poisson mean is 1.386.
**: The variances of test statistics when the Poisson mean is 1.386.

Test Statistics for Negative Binomial Models

Characteristics of Statistical Tests

For NB distributions, the variance can be calculated as $Var(Y_i) = \mu_i + \mu_i^2/\phi$. Thus,
Pearson's $X^2$ statistic becomes
$X^2(\mu; n) = \sum_{i=1}^{n} \left[\frac{y_i - \mu_i}{\sigma_i}\right]^2 = \sum_{i=1}^{n} \frac{[y_i - \mu_i]^2}{\mu_i + \mu_i^2/\phi}$.
Based on the definition of the scaled deviance (Wood, 2002), the $G^2$ statistic for an NB model is
calculated by
$G^2(\mu; \phi; n) = 2 \sum_{i=1}^{n} \left[ \phi \log\left(\frac{\mu_i + \phi}{y_i + \phi}\right) + y_i \log\left(\frac{y_i (\mu_i + \phi)}{\mu_i (y_i + \phi)}\right) \right]$.

To show the accuracy and reliability of Pearson's $X^2$ and $G^2$ statistics for GOF tests, the
components of these statistics are examined again, using the same kind of calculations shown in
Equations 5 and 6, in which $f_Y$ is now the pmf of the NB distribution. Note that the
Power-Divergence statistics were not used as test statistics in this study for the NB distribution, since
they do not exist in the statistical literature. The mean and variance of Pearson's $X^2$ and $G^2$
statistics with different parameter settings are shown in Figure 3. The NB mean $\mu$ varies from
0 to 10; the inverse dispersion parameters $\phi$ are 1, 3 and 5, respectively. It can be observed that
$\phi$ has a great effect on the distributions of those two statistics. For Pearson's $X^2$ statistic,
the smaller the inverse dispersion parameter, the larger the $V(X^2)$ value, given a known NB mean
value. The components of Pearson's $X^2$ statistic do not fit $\chi^2$ distributions, as $V(X^2)$ is
generally much larger than 2 for low $\mu$ values. $V(X^2)$ is still larger than 3 even when $\mu = 10$
and $\phi = 5$. Therefore, Pearson's $X^2$ statistic will underestimate the degree of fit (p-value) and
tend to reject fitted models more easily in practice. For the $G^2$ statistic, $V(G^2)$ may increase or
decrease drastically for $\mu < 1$, then gradually stabilizes depending on $\phi$. When $\mu$ is as
high as 10 and $\phi = 1$, the variance $V(G^2)$ is still not quite stable.

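The same component-moment calculation can be sketched for the NB case by swapping the Poisson pmf for the NB pmf. The code below is illustrative only: the function names and the truncation point `y_max` are assumptions, and the $G^2$ component uses the standard NB deviance term for a fixed $\phi$.

```python
import math


def nb_pmf(y, mu, phi):
    """NB probability of count y with mean mu and inverse dispersion phi."""
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (mu + phi))
             + y * math.log(mu / (mu + phi)))
    return math.exp(log_p)


def nb_x2_component(y, mu, phi):
    return (y - mu) ** 2 / (mu + mu ** 2 / phi)


def nb_g2_component(y, mu, phi):
    second = y * math.log(y * (mu + phi) / (mu * (y + phi))) if y > 0 else 0.0
    return 2.0 * (phi * math.log((mu + phi) / (y + phi)) + second)


def nb_component_moments(component, mu, phi, y_max=5000):
    """Mean and variance of one component, with the sum truncated at y_max."""
    probs = [nb_pmf(y, mu, phi) for y in range(y_max + 1)]
    values = [component(y, mu, phi) for y in range(y_max + 1)]
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, var


# Parameter settings of the kind discussed in the text.
for phi in (1.0, 3.0, 5.0):
    print(phi, nb_component_moments(nb_x2_component, 10.0, phi),
          nb_component_moments(nb_g2_component, 10.0, phi))
```

For the $X^2$ component the variance also has a closed form that follows from the NB kurtosis, $V(X^2) = 2 + 6/\phi + \phi/(\mu(\mu + \phi))$, which reduces to the Poisson value $2 + 1/\mu$ as $\phi \to \infty$ and can serve as a check on the summation.
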
[Figure 3 image: four panels, a) $V(X^2)$, b) $V(G^2)$, c) $E(X^2)$, d) $E(G^2)$; x-axis: Negative Binomial
mean (0 to 10); one curve each for $\phi = 1$, $\phi = 3$, and $\phi = 5$.]

Figure 3 Mean and variance of components of the $X^2$ and $G^2$ statistics

As $\phi$ increases, the $\mu$ value at which $V(G^2)$ becomes stable decreases. For example, when
$\phi = 3$, $V(G^2)$ becomes stable when $\mu$ is around 4, and when $\phi = 5$, $V(G^2)$ will be relatively
stable when $\mu$ is around 3. $E(G^2)$ is generally greater than 1 for $\mu > 1$ and less than 1
for $0 < \mu < 1$. [Important note: the inverse dispersion parameter is assumed to be properly
estimated. As discussed by Lord (2006), the inverse dispersion parameter can become
misestimated as the sample mean values decrease and the sample size becomes small.]

Overall, neither Pearson's $X^2$ nor the $G^2$ statistic is accurate and reliable for the GOF test
of NB models with low sample means, especially when the crash data are highly overdispersed
($\phi$ is small). As a result, the authors recommend the use of the grouped $G^2$ method for the GOF
test of NB models. An example is given below to show the differences between GOF test
statistics for NB models.

Example Applications with Observed Data

An annual crash-flow dataset was collected from 255 signalized 3-legged intersections in
Toronto, Ontario (Lord, 2000). This dataset includes the number of serious injury crashes and
entering AADTs for the major and minor approaches at each intersection. The crash counts are
overdispersed with a mean of 1.43 and a variance of 3.49. An NB regression model was thus used
for the modeling of this dataset. The functional form $\mu = \beta_0 F_1^{\beta_1} F_2^{\beta_2}$ was
again used for the prediction of the number of crashes. The results of the fitted model are
summarized in Table 5. All the coefficients are significant at the significance level of 0.01. The
inverse dispersion parameter was estimated to be 2.76.

Table 5 Modeling outputs of the Negative Binomial model
Coefficients    Est. Value    Std. Error    z value    Pr(>|z|)
$\beta_0$       7.988E-07     2.0122        -6.978     3.00E-12
$\beta_1$       1.0241        0.1951        5.249      1.53E-07
$\beta_2$       0.4868        0.0821        5.926      3.10E-09

Pearson's $X^2$, $G^2$ and the grouped $G^2$ were used for the GOF test of the NB model.
According to the grouping rules in Wood (2002), the minimum grouping size for the dataset is
equal to 2, and the expression for calculating the grouped $G^2$ is
$Grouped\_G^2 = 2 \sum_{i=1}^{n} r_i \left[ \phi \log\left(\frac{\mu_i + \phi}{y_i + \phi}\right) + y_i \log\left(\frac{y_i (\mu_i + \phi)}{\mu_i (y_i + \phi)}\right) \right]$,
where $r_i$ is the grouping size for the ith group.

The results of GOF tests are summarized in Table 6. The degrees of freedom are 252 for
Pearson's $X^2$ and $G^2$, and 125 for the grouped $G^2$. All three test statistics accepted the fitted
model at the significance level of 0.05. The grouped $G^2$ statistic and Pearson's $X^2$ statistic
have the highest and lowest p-values, respectively. The table also shows the expectations and
variances of the components of Pearson's $X^2$ and $G^2$ statistics, given the known parameters
($\mu$ and $\phi$). It can be seen that $E(X^2)$ is 1 and $V(X^2)$ is 4.63. $V(X^2)$ is much larger than 2
and this has caused the overestimation of GOF values. Thus, the p-value (0.09) of the $X^2$ is lower
than the actual value. $E(G^2)$ and $V(G^2)$ with low NB mean values are shown in Figure 4 for
$\phi = 2.76$. When $\mu$ is around 1.43, $E(G^2)$ is higher than 1, which may have resulted in the
overestimation of GOF values and underestimation of the power of fit; $V(G^2)$ is very unstable for
low $\mu$ values. The p-value of the grouped $G^2$ statistic is slightly higher than that of $G^2$. This is
expected since the $G^2$ statistic has underestimated the true p-value. Thus, this example shows
that the grouped $G^2$, although more complicated than the traditional methods, provides better
results for the GOF test of NB models.

Table 6 Results of GOF tests for the Negative Binomial model
Statistics            $X^2$     $G^2$     Grouped $G^2$
GOF value             282.20    269.80    136.46
Degrees of Freedom    252       252       125
p-value               0.09      0.21      0.23
Expectation*          1         1.12      N/A
Variance**            4.63      1.42      N/A
*: The means of test statistics when the NB mean is 1.43 and the inverse dispersion parameter is 2.756.
**: The variances of test statistics when the NB mean is 1.43 and the inverse dispersion parameter is 2.756.

2.5
2
E(G2) and V(G2)
1.421
1.5
E(G2)
V(G2)
1
1 .1167
0.5
0
0 1 2 3 4 5
Negative Binomial Mean
427
Figure 4 E(G2) and V(G2) versus NB mean with 2.756
428
429
430
15
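The expectations and variances reported in Table 6 and plotted in Figure 4 can be approximated by summing over the NB probability mass function. The following sketch assumes the usual NB parameterization with mean $\mu$ and variance $\mu + \mu^2/\phi$, a Pearson component of $(y - \mu)^2 / (\mu + \mu^2/\phi)$, and the NB deviance component for $G^2$; these definitions are assumptions here because the paper states them in an earlier section. All function names are hypothetical.

```python
import math

def nb_pmf(y, mu, phi):
    """NB probability of count y, with mean mu and inverse dispersion phi."""
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (phi + mu))
             + y * math.log(mu / (phi + mu)))
    return math.exp(log_p)

def x2_component(y, mu, phi):
    """Pearson component, assuming NB variance mu + mu^2 / phi."""
    return (y - mu) ** 2 / (mu + mu ** 2 / phi)

def g2_component(y, mu, phi):
    """NB deviance component; y * log(...) is taken as 0 when y == 0."""
    first = phi * math.log((phi + mu) / (phi + y))
    second = y * math.log(y * (phi + mu) / (mu * (phi + y))) if y > 0 else 0.0
    return 2.0 * (first + second)

def component_moments(component, mu, phi, y_max=500):
    """Mean and variance of a component, by summation over y = 0..y_max."""
    probs = [nb_pmf(y, mu, phi) for y in range(y_max + 1)]
    vals = [component(y, mu, phi) for y in range(y_max + 1)]
    mean = sum(p * v for p, v in zip(probs, vals))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, vals))
    return mean, var

# Parameters from the footnotes of Table 6.
e_x2, v_x2 = component_moments(x2_component, mu=1.43, phi=2.756)
e_g2, v_g2 = component_moments(g2_component, mu=1.43, phi=2.756)
print(f"E(X2) = {e_x2:.3f}, V(X2) = {v_x2:.3f}")
print(f"E(G2) = {e_g2:.3f}, V(G2) = {v_g2:.3f}")
```

Under these assumptions the computed values should land close to the entries of Table 6. Calling `component_moments` over a range of `mu` values gives curves of the kind shown in Figure 4.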
DISCUSSION

The results of this study show that Pearson's $X^2$ statistic tends to overestimate GOF values for low mean values, since $V(X^2)$ is larger than 2. This is because the components (i.e., $(y_i - \mu_i)^2 / \mu_i$ for Poisson models) are inflated when the predicted values ($\mu_i$) are low. For instance, with the observed crash dataset in the first case, the Poisson model predicted 1.02 crashes per year for one of the intersections. However, 4 crashes were observed at that intersection. The contribution to $X^2$ would be $(4 - 1.02)^2 / 1.02 = 8.71$, which is much larger than the nominal value. This phenomenon explains why $V(X^2) > 2$ for low mean values.

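The arithmetic of this example, and the inflation of $V(X^2)$ at low means, can be reproduced with a short sketch. For a Poisson variable the variance of the Pearson component has the standard closed form $2 + 1/\mu$, which the code checks by direct summation over the Poisson probability mass function. The function names are hypothetical, and the mean values in the loop other than 1.02 are arbitrary illustrations.

```python
import math

def pearson_component(y, mu):
    """Contribution of one observation to Pearson's X^2 (Poisson model)."""
    return (y - mu) ** 2 / mu

def poisson_x2_variance(mu, y_max=200):
    """Variance of (Y - mu)^2 / mu for Y ~ Poisson(mu), by direct summation."""
    probs = [math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
             for y in range(y_max + 1)]
    vals = [pearson_component(y, mu) for y in range(y_max + 1)]
    mean = sum(p * v for p, v in zip(probs, vals))
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, vals))

# Intersection from the text: 4 crashes observed, 1.02 predicted.
contribution = pearson_component(4, 1.02)
print(f"contribution to X2: {contribution:.2f}")

# Numerical variance next to the closed form 2 + 1/mu.
for mu in (0.3, 1.02, 5.0):
    print(mu, poisson_x2_variance(mu), 2 + 1 / mu)
```

The closed form makes the pattern explicit: the variance approaches 2 only as the mean grows, and it increases without bound as the mean approaches zero.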
Undoubtedly, for Poisson regression models, the Power-Divergence statistic ($PD_{2/3}$) follows an approximate $\chi^2$ distribution and is the best test statistic for measuring the GOF of these models. This statistic performs better than the other three statistics for almost all mean values, except when the mean is very low. However, when the mean is very small, no test statistic can provide accurate and stable GOF test results. This statistic is preferred to Pearson's $X^2$ statistic
for all cases. For mean values greater than 1, the variance of the $PD_{2/3}$ statistic ($V(PD)$) acts as a compromise between $V(X^2)$ and $V(G^2)$, which contributes to more accurate and stable GOF tests.

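For readers who wish to experiment with the statistic, the sketch below computes a power-divergence statistic of the Cressie-Read family with index $\lambda = 2/3$. The paper's own definition of $PD_{2/3}$ is given in an earlier section and is not reproduced here, so the general form used below, including the $-\lambda(y_i - \mu_i)$ term that allows observed and fitted totals to differ, is an assumption. The function name and the data are hypothetical.

```python
import math

def power_divergence(y, mu, lam=2.0 / 3.0):
    """Cressie-Read power-divergence statistic for counts y and fitted means mu.

    PD = 2 / (lam (lam + 1)) * sum[ y ((y / mu)^lam - 1) - lam (y - mu) ]
    lam = 1 gives Pearson's X^2; lam -> 0 gives the Poisson deviance G^2.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        core = yi * ((yi / mi) ** lam - 1.0) if yi > 0 else 0.0
        total += core - lam * (yi - mi)
    return 2.0 * total / (lam * (lam + 1.0))

# Hypothetical observed counts and fitted Poisson means.
y = [4, 0, 1, 2]
mu = [1.02, 0.8, 1.5, 2.1]

pd_23 = power_divergence(y, mu)            # index 2/3
x2 = power_divergence(y, mu, lam=1.0)      # coincides with Pearson's X^2
g2 = power_divergence(y, mu, lam=1e-6)     # numerically close to G^2
print(pd_23, x2, g2)
```

Because a single index controls the family, the same function can be used to compare $PD_{2/3}$ with $X^2$ and $G^2$ on identical data.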
From Figures 1 and 3, it is also observed that the performance of Pearson's $X^2$ and $G^2$ becomes worse as overdispersion increases. The Poisson model is a special case of the NB model in which the inverse dispersion parameter is infinite. Therefore, the estimation of the inverse dispersion parameter from observed data will affect the results of GOF tests. It should be noted that the traditional estimators of the inverse dispersion parameter do not yield accurate and stable estimates under low mean conditions, as described above (Lord, 2006). For NB models, neither Pearson's $X^2$ nor $G^2$ gives accurate GOF test results, especially under low sample mean conditions. Under such conditions, the grouped $G^2$ method is recommended, as it will provide better results for GOF tests of NB models.

The results of this study provide guidance on the use of the grouped $G^2$ method. Based on the curves of $G^2$ illustrated in Figure 1, it is found that the $G^2$ method or the grouped $G^2$ method is an appropriate test statistic only when the grouped mean is 1.5 or higher. Theoretically, the grouped $G^2$ method can be used for samples with extremely low means (e.g., less than 0.3). However, when grouping a sample with a low mean value to achieve a grouped mean of 1.5 or higher, the grouped sample size will be significantly reduced, which may lead to issues associated with small samples. For NB regression models, the problem becomes more complex, as the minimum grouped mean is determined by the inverse dispersion parameter $\phi$. The recommended minimum means (or group means) for different inverse dispersion parameters are shown in Figure 5. The minimum mean decreases when $\phi$ increases. For $\phi$ less than 1, the minimum mean increases sharply with decreasing $\phi$. Thus, when using the grouped $G^2$ method, the grouped mean should meet the requirements presented in this figure. As the inverse dispersion parameter increases towards infinity (the Poisson model), the recommended minimum mean decreases slowly to approximately 1.5.

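The trade-off between the grouped mean and the grouped sample size can be illustrated with a rough calculation. The sketch below is not the grouping procedure of Wood (2002). It assumes equal-sized groups, a common mean for all observations, and that the grouped mean equals the grouping size multiplied by the per-observation mean. The default threshold of 1.5 is the Poisson value quoted above; for NB models the threshold would have to be read from Figure 5. The function names and sample values are hypothetical.

```python
import math

def min_group_size(sample_mean, min_grouped_mean=1.5):
    """Smallest equal group size whose grouped mean reaches the threshold."""
    return max(1, math.ceil(min_grouped_mean / sample_mean))

def grouped_sample_size(n_obs, sample_mean, min_grouped_mean=1.5):
    """Number of complete groups left after grouping n_obs observations."""
    return n_obs // min_group_size(sample_mean, min_grouped_mean)

# Hypothetical sample of 600 sites at several per-site means.
for mean in (0.25, 0.4, 2.0):
    r = min_group_size(mean)
    print(f"mean {mean}: group size {r}, "
          f"grouped sample size {grouped_sample_size(600, mean)}")
```

Under these assumptions, a sample with a mean of 0.25 needs groups of six observations, so the grouped sample is one sixth of the original size, which is the small-sample concern raised above.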
Figure 5 Recommended Minimum Means versus Inverse Dispersion Parameter of the NB model

CONCLUSIONS AND FUTURE WORK

The Poisson and NB regression models are the two most commonly used types of models for analyzing traffic crashes. These models help establish the relationship between traffic crashes (response variable) and traffic flow, highway geometrics, and other explanatory variables. To evaluate their statistical performance, GOF tests need to be used. Crash data are often characterized by low sample mean values, and it has been found that traditional GOF statistics do not perform very well under these conditions. Consequently, there was a need to determine
whether