Test Data

Location:

Bozeman, MT

Posted:

January 11, 2013


GOODNESS-OF-FIT TESTING FOR ACCIDENT MODELS WITH LOW MEANS

* ****** **

* ******** *********, ******* ************** Institute, Montana State University,

* *******, **, ***, *-mail: *****.**@***.*******.***

Yunlong Zhang
Associate Professor, Zachry Department of Civil Engineering, Texas A&M University,
College Station, TX, USA, e-mail: ******@*****.****.***

Dominique Lord
Associate Professor, Zachry Department of Civil Engineering, Texas A&M University,
College Station, TX, USA, e-mail: *-****@****.***

Submitted to the 3rd International Conference on Road Safety and Simulation, September 14-16, 2011, Indianapolis, USA

ABSTRACT

The modeling of relationships between motor vehicle crashes and underlying factors has been investigated for more than three decades. Recently, many highway safety studies have documented the use of Poisson regression models, negative binomial (NB) regression models, or both. Pearson's $X^2$ and the scaled deviance ($G^2$) are two common test statistics that have been proposed as measures of goodness-of-fit (GOF) for Poisson or NB models. Unfortunately, transportation safety analysts often deal with crash data that are characterized by low sample mean values. Under such conditions, the traditional test statistics may not perform very well.

This study has two objectives. The first is to examine the accuracy and reliability of traditional test statistics for the GOF of accident models subjected to low sample means. The second is to identify a superior test statistic for evaluating the GOF of accident prediction models. For Poisson models, this paper proposes a better yet easy-to-use test statistic that can be applied for almost all sample mean values, except when the mean value is extremely low, for which no traditional test statistic can be accurate. For Poisson-Gamma models, this study demonstrates that traditional test statistics are neither accurate nor robust. A more complex method (grouped $G^2$) proposed in a previous study is recommended. Guidance on the use of the grouped $G^2$ method is further provided. Examples using observed data illustrate the performance of different test statistics and support the findings of this study.

Keywords: crash data, generalized linear model, goodness-of-fit, power-divergence.

INTRODUCTION

The modeling of relationships between motor vehicle crashes and underlying factors, such as traffic volume and highway geometric features, has been investigated for more than three decades. The statistical models (sometimes referred to as crash prediction models) from which these relationships are developed can be used for various purposes, including predicting crashes on transportation facilities and determining which variables significantly influence crashes. Recently, many highway safety studies have documented the use of Poisson regression models (Joshua and Garber, 1990; Miaou et al., 1992; Ivan and Bernardo, 2000; Lord and Bonneson, 2007), negative binomial (NB) regression models (Miaou and Lum, 1993; Poch and Mannering, 1996; Miaou and Lord, 2003; Maycock and Hall, 1984; Lord et al., 2005), or both (Miaou, 1994; Maher and Summersgill, 1996). With the Poisson or Poisson-Gamma (or NB) models, the relationships between motor vehicle crashes and explanatory variables can then be developed by means of the Generalized Linear Model (GLM) framework.

Pearson's $X^2$ and the scaled deviance ($G^2$) are two common test statistics that have been proposed as measures of GOF for Poisson or NB models (Maher and Summersgill, 1996). Statistical software (e.g., SAS) also uses these two statistics for assessing the GOF of a GLM (SAS Institute Inc., 1999). Unfortunately, transportation safety analysts often deal with crash data that are subjected to low sample mean values. Under such conditions, the traditional test statistics may not perform very well. This has been referred to in the highway safety literature as the low mean problem (LMP). The study by Sukhatme (1938) concluded that, for samples from a Poisson distribution with a mean as low as one, Pearson's $X^2$ test for goodness of fit is not good. In the field of traffic safety, this issue was first raised by Maycock and Hall (1984) and further discussed by Maher and Summersgill (1996), Fridstrom et al. (1995), and Agrawal and Lord (2006). Wood (2002) proposed a more complex technique, the grouped $G^2$ method, to solve this problem. The grouped $G^2$ method is based on the knowledge that, through grouping, the data become approximately normally distributed and the test statistics follow a $\chi^2$ distribution. Some issues regarding this method are discussed in the third section. It should be noted that the comparison of different models can be achieved by means of Akaike's Information Criterion (AIC) (Akaike, 1974) or the Bayesian Information Criterion (BIC) (Schwarz, 1978). However, similar to previous studies (Maher and Summersgill, 1996; Wood, 2002; Agrawal and Lord, 2006), this research studies statistics for the GOF of a given model (either a Poisson model or an NB model); thus, the focus is on $X^2$, $G^2$ and the proposed statistic (Power-Divergence).

This study expands on the work of Wood (2002) and has two objectives. The first objective is to examine the accuracy and reliability of traditional test statistics for the GOF of GLMs subjected to low sample means. The second objective is to identify a superior test statistic for evaluating the GOF of crash prediction models. The study first derives theoretically the problems associated with the traditional tests. Observed data are then used to demonstrate the problems noted in the first part of the paper.

This paper is divided into five sections. The second section describes the characteristics of Poisson and NB models used in traffic crash modeling. The third section provides an analysis and comparison of different GOF test statistics for the Poisson and NB models. Observed crash data are used for this analysis. In the fourth section, several important issues related to the GOF test statistics are discussed. The last section summarizes the key findings of this study.

STATISTICAL MODELS

GLMs represent a class of fixed-effect regression models for dependent variables (McCullagh and Nelder, 1989), such as crash counts in traffic accident models. Common GLMs include linear regression, logistic regression, and Poisson regression. Given the characteristics of motor vehicle collisions (i.e., random, discrete, and non-negative independent events), stochastic modeling methods need to be used over deterministic methods. The two most common stochastic modeling methods utilized for analyzing motor vehicle crashes are the Poisson and the NB regression models. For these models, the relationship between traffic accidents and explanatory variables is established through a loglinear function (i.e., canonical link or linear predictor). For example, to establish the crash-flow relationship at intersections, the fitted model can follow the form $\mu = \beta_0 F_1^{\beta_1} F_2^{\beta_2}$, where $\mu$ is the estimated number of crashes, $F_1$ and $F_2$ are the entering AADTs (Average Annual Daily Traffic) for the major and minor approaches, and $\beta_0$, $\beta_1$, $\beta_2$ are the estimated coefficients. This fitted model can thus be used for predicting crashes for different flow values.

Poisson Regression Model

The Poisson regression model aims at modeling a crash count variable $Y$, which follows a Poisson distribution with a parameter (or mean) $\mu$. The probability that the number of crashes takes the value $y_i$ on the $i$th entity is

$$P(Y_i = y_i) = f_{Y_i}(y_i; \mu_i) = \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}, \quad i = 1, 2, \ldots, n.$$

For a Poisson distribution, the variance is equal to the mean.

The systematic portion of the model involves the explanatory variables $x_1, x_2, \ldots, x_m$, such as traffic volumes, highway geometrics, v/c (volume/capacity) ratios and so on. The model is then established through a linear predictor $\eta$. This predictor is usually a linear function of the logarithm of the explanatory variables in traffic crash models: $\eta = \beta_0 + \sum_{i=1}^{m} \beta_i x_i$, where $\beta_i$ is the Poisson regression coefficient for the $i$th explanatory variable $x_i$. The coefficients are estimated based on observed data. Finally, the model is estimated through a logarithm link function $\eta_j = g(\mu_j) = \log(\mu_j)$ (Myers et al., 2002).
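As a minimal sketch of the two ingredients just described, the snippet below evaluates the Poisson pmf and inverts the log link for a single site. The coefficient and covariate values are hypothetical and chosen only for illustration; the helper names are not from the paper.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson variable with mean mu (computed on the log scale)."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def poisson_mean(beta0, betas, xs):
    """Invert the log link: mu = exp(beta0 + sum(beta_i * x_i))."""
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))
    return math.exp(eta)

# Hypothetical coefficients and (log-transformed) covariates, illustration only.
mu = poisson_mean(-1.0, [0.5, 0.2], [1.0, 2.0])   # linear predictor eta = -0.1

# Numerical check that the variance equals the mean (sum truncated at y = 59).
ys = range(0, 60)
mean = sum(y * poisson_pmf(y, mu) for y in ys)
var = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in ys)
print(round(mu, 4), round(mean, 4), round(var, 4))
```

The truncation point of the sum is arbitrary; for low means the neglected tail probability is far below floating-point precision.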

Negative Binomial Regression Models

Although Poisson regression models are rather simple, crash data often exhibit overdispersion, meaning that the variance is greater than the mean. The NB regression models are thus used for modeling such data. The NB regression models have the same forms of linear predictor and logarithm link function as the Poisson regression models, except that the response variable $Y$ follows an NB distribution, in which the probability mass function (pmf) is defined as follows:

$$P(Y_i = y_i) = f_{Y_i}(y_i; \phi, \mu_i) = \frac{\Gamma(y_i + \phi)}{\Gamma(\phi)\,\Gamma(y_i + 1)} \left(\frac{\phi}{\mu_i + \phi}\right)^{\phi} \left(\frac{\mu_i}{\mu_i + \phi}\right)^{y_i}$$

where $\Gamma(\cdot)$ is the Gamma function and $\phi$ is the inverse dispersion parameter. The relationship between the variance and the mean of the NB distribution is $Var(Y_i) = \mu_i + \mu_i^2/\phi$. The inverse dispersion parameter is usually assumed to be fixed and can be estimated from observed data using the method of moments or the (bootstrapped) maximum likelihood (Anscombe, 1949; Fisher, 1941; Zhang et al., 2007). However, recent research has shown that the inverse dispersion parameter may be related to the explanatory variables (Miaou and Lord, 2003; Mitra and Washington, 2007).
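The mean-variance relationship of the NB distribution can be checked numerically from the pmf. The sketch below assumes the standard mean/inverse-dispersion parameterization of the NB pmf and uses the mean (1.43) and inverse dispersion (2.76) reported for the NB example in this paper; the function name and truncation point are illustrative choices.

```python
import math

def nb_pmf(y, mu, phi):
    """NB probability of count y given mean mu and inverse dispersion phi."""
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (mu + phi))
             + y * math.log(mu / (mu + phi)))
    return math.exp(log_p)

mu, phi = 1.43, 2.76          # values quoted for the NB example dataset
ys = range(0, 400)            # truncation point chosen generously

total = sum(nb_pmf(y, mu, phi) for y in ys)
mean = sum(y * nb_pmf(y, mu, phi) for y in ys)
var = sum((y - mean) ** 2 * nb_pmf(y, mu, phi) for y in ys)

# The last two printed numbers should agree: Var(Y) = mu + mu^2 / phi.
print(round(total, 6), round(mean, 4), round(var, 4), round(mu + mu ** 2 / phi, 4))
```

As $\phi$ grows, the term $\mu^2/\phi$ vanishes and the variance approaches the Poisson value $\mu$.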

GOODNESS-OF-FIT TEST STATISTICS

GOF tests use the properties of a hypothesized distribution to assess whether or not observed data are generated from a given distribution (Read and Cressie, 1988). The most well-known GOF test statistics are Pearson's $X^2$ and the scaled deviance ($G^2$). Pearson's $X^2$ is generally calculated as follows:

$$X^2 = \sum_{i=1}^{n} \left[\frac{y_i - \mu_i}{\sigma_i}\right]^2,$$

where $y_i$ is the observed data, $\mu_i$ is the true mean from the model, and $\sigma_i$ is the error, usually represented by the standard deviation of $y_i$. The scaled deviance is calculated as twice the difference between the log-likelihood under the maximum model and the log-likelihood under the reduced (or unsaturated) model: $G^2 = 2(\log L_{max} - \log L_{red})$ (Wood, 2002).

Previous research has shown that both Pearson's $X^2$ and $G^2$ statistics are not $\chi^2$ distributed under low sample mean conditions (Maycock and Hall, 1984; Maher and Summersgill, 1996; Wood, 2002; Fridstrom et al., 1995; Agrawal and Lord, 2006). To solve this problem, Maher and Summersgill (1996) proposed a test statistic ($G^2 / E(G^2)$) for GOF tests. Wood (2002) showed that this test still failed with low sample mean values. Wood (2002) then suggested a grouped $G^2$ test statistic for solving this problem. The development of the grouped $G^2$ is based on the knowledge that, by increasing the mean value, the data are approximately normally distributed and the statistics follow a $\chi^2$ distribution. This method first determines an appropriate group size $r$, which is the minimum grouping size. The raw data are then grouped so that each observation is in a group of size at least as large as $r$. Additional details about the other steps can be found in Wood (2002).

There are, however, some issues with the method proposed by Wood (2002) that need to be addressed. First, the grouping size may vary from group to group above a minimum grouping size, which is determined by the sample mean of a Poisson model or the critical mean values in an NB model, as defined in Wood (2002). Thus, it is possible that changing grouping sizes while maintaining the same minimum grouping size may lead to different testing results. Second, through grouping, the sample size becomes smaller, which may become an issue especially when the grouping size is not small. Thus, as commented by Wood, a compromise has to be made between strong grouping (which ensures that the Chi-square assumption for the distribution of the test statistic holds) and weak grouping (which allows testing against a richer alternative hypothesis). Finally, the grouped $G^2$, which includes five steps, is not a simple procedure for practitioners or average transportation safety analysts who frequently analyze crash data.

To summarize, several GOF test statistics have been proposed to evaluate the fit of models, but their performance and complexity vary greatly. Therefore, simple but accurate and reliable alternative test statistics are highly desirable to account for the LMP commonly observed in crash studies.

In Wood's study (2002), a simple criterion to assess whether or not a test statistic is appropriate for testing the GOF of regression models is to examine the test statistic's performance for a single distribution (Poisson or NB) with known parameters. For this criterion, the grouped $G^2$ method was developed to improve the normality of observations and allow the mean and variance of the $G^2$ statistic (for low mean values) to be close to 1 and 2 ($\chi^2_1$ distributed), respectively. Similarly, in this study, we examine the mean and variance of different statistics under a single distribution context to judge their appropriateness for the GOF of GLMs.

Test Statistics for Poisson Models

Characteristics of Statistical Tests

The most common test statistics are Pearson's $X^2$ and the scaled deviance ($G^2$). For a Poisson model, the variance is equal to the mean and Pearson's $X^2$ is presented below:

$$X^2(\mu; n) = \sum_{i=1}^{n} \left[\frac{y_i - \mu_i}{\sigma_i}\right]^2 = \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{\mu_i} \tag{1}$$

The scaled deviance for a Poisson model is (Maher and Summersgill, 1996)

$$G^2(\mu; n) = 2 \sum_{i=1}^{n} y_i \log\left(\frac{y_i}{\mu_i}\right) \tag{2}$$

In this paper, we investigate other test statistics for the GOF test of the Poisson model, especially when it is characterized by low sample mean values. This research draws from some other work in the statistical literature.

Cressie and Read (1984 & 1988) incorporated Pearson's $X^2$ and $G^2$ statistics into a family of Power-Divergence statistics ($PD_\lambda$, $\lambda \in R$). In this family, each member $PD_\lambda$ is the sum of deviances between the observed and expected counts:

$$PD_\lambda = \sum_{i=1}^{n} a_\lambda(y_i, \mu_i) = \frac{2}{\lambda(\lambda + 1)} \sum_{i=1}^{n} y_i \left[\left(\frac{y_i}{\mu_i}\right)^{\lambda} - 1\right], \tag{3}$$

where $a_\lambda$ denotes the distance function. Different values of $\lambda$ lead to different GOF statistics (Cressie and Read, 1984 & 1988; Baggerly, 1998), such as Pearson's $X^2$ statistic when $\lambda = 1$, the Freeman-Tukey statistic $F^2 = PD_{-1/2} = 4 \sum_{i=1}^{n} (\sqrt{y_i} - \sqrt{\mu_i})^2$ when $\lambda = -1/2$ (Freeman and Tukey, 1950), and the Neyman-modified statistic $NM^2 = PD_{-2} = \sum_{i=1}^{n} (y_i - \mu_i)^2 / y_i$ when $\lambda = -2$ (Neyman, 1949). The Power-Divergence statistic can also be written as (Cressie and Read, 1989)

$$PD_\lambda = \sum_{i=1}^{n} \left\{ \frac{2 y_i}{\lambda(\lambda + 1)} \left[\left(\frac{y_i}{\mu_i}\right)^{\lambda} - 1\right] - \frac{2}{\lambda + 1} (y_i - \mu_i) \right\}. \tag{4}$$

Hence, when $\lambda \to 0$, the power divergence leads to the $G^2$ statistic (Cressie and Read, 1989).

Cressie and Read (1988) recommended $\lambda = 2/3$, with which the statistic $PD_{2/3}$ is well approximated by the $\chi^2$ distribution in many situations and gives the most reasonable power for GOF tests. When $\lambda = 2/3$, the Power-Divergence test statistic becomes

$$PD_{2/3} = \sum_{i=1}^{n} \left\{ \frac{9 y_i}{5} \left[\left(\frac{y_i}{\mu_i}\right)^{2/3} - 1\right] - \frac{6}{5} (y_i - \mu_i) \right\},$$

as derived from Equation 4.
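The special cases of the Power-Divergence family can be checked numerically. The sketch below implements the per-observation term of Equation 4 for $\lambda > -1$ (so that zero counts are handled), treats $\lambda = 0$ as the deviance limit, and compares the result against the direct Pearson and Freeman-Tukey formulas. The counts and fitted means are hypothetical, and the function names are illustrative.

```python
import math

def pd_component(y, mu, lam):
    """One observation's contribution to PD_lambda (Equation 4 form), lam > -1."""
    if lam <= -1:
        raise ValueError("this sketch only covers lam > -1")
    if lam == 0:
        # Limit lam -> 0: the Poisson deviance term 2[y log(y/mu) - (y - mu)].
        core = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (core - (y - mu))
    # y * ((y/mu)^lam - 1) vanishes at y = 0 whenever lam > -1.
    core = y * ((y / mu) ** lam - 1.0) if y > 0 else 0.0
    return 2.0 / (lam * (lam + 1.0)) * (core - lam * (y - mu))

def pd_statistic(ys, mus, lam):
    """Sum of the per-observation contributions."""
    return sum(pd_component(y, m, lam) for y, m in zip(ys, mus))

# Hypothetical observed counts and fitted means, for illustration only.
ys = [0, 1, 4, 2, 0, 3]
mus = [0.8, 1.2, 2.5, 1.9, 0.6, 2.0]

pearson = sum((y - m) ** 2 / m for y, m in zip(ys, mus))
freeman_tukey = sum(4 * (math.sqrt(y) - math.sqrt(m)) ** 2 for y, m in zip(ys, mus))

print(pd_statistic(ys, mus, 1.0), pearson)            # lam = 1   -> Pearson
print(pd_statistic(ys, mus, -0.5), freeman_tukey)     # lam = -1/2 -> Freeman-Tukey
print(pd_statistic(ys, mus, 1e-6), pd_statistic(ys, mus, 0.0))  # lam -> 0
print(pd_statistic(ys, mus, 2.0 / 3.0))               # recommended lam = 2/3
```

Restricting the sketch to $\lambda > -1$ is a design choice: it avoids the infinite Neyman-modified term at zero counts, which is the reason that statistic is excluded from the comparison in this paper.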

GOF tests using different statistics rest on the assumption that the components of the statistics follow an approximate $\chi^2$ distribution with one degree of freedom, which has a mean of 1 and a variance of 2. Thus, to evaluate a test statistic for GOF tests, we can investigate how well its components follow a $\chi^2$ distribution. With this criterion, different test statistics can be compared and evaluated.

Pearson's $X^2$, $G^2$, Power-Divergence with $\lambda = 2/3$ ($PD_{2/3}$), and the Freeman-Tukey statistic $F^2 = 4 \sum_{i=1}^{n} (\sqrt{y_i} - \sqrt{\mu_i})^2$ (Freeman and Tukey, 1950) are used for the examination of the fit of distributions. Because crash data have zero counts at some locations, the Neyman-modified $X^2$ goes to infinity and is therefore excluded from the comparison analysis. Figure 1 shows the mean and variance of the components of those four statistics for Poisson means less than 10. The following equations show the calculations of the mean and variance of the Pearson's $X^2$ statistic, given a known Poisson mean value $\mu$:

$$E(X^2) = \sum_{k=0}^{\infty} X^2(k)\, f_Y(k; \mu) \tag{5}$$

$$V(X^2) = \sum_{k=0}^{\infty} [X^2(k) - E(X^2)]^2 f_Y(k; \mu) \tag{6}$$

where $f_Y(k; \mu)$ is the pmf of the Poisson distribution, $k$ is the number of occurrences of an event, and $X^2(k) = (k - \mu)^2/\mu$ is the component evaluated at $k$. The mean and variance of the other statistics over different $\mu$ values can be calculated in the same way.
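Equations 5 and 6 translate directly into a truncated sum over the Poisson pmf. The sketch below applies that calculation to the components of the four statistics at $\mu = 1$. It assumes the $G^2$ component is the $\lambda \to 0$ limit of Equation 4 (that is, it includes the $-(k - \mu)$ term), and the truncation point `k_max` is an arbitrary choice; neither detail is stated explicitly in the paper.

```python
import math

def poisson_pmf(k, mu):
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def x2(k, mu):
    """Pearson component."""
    return (k - mu) ** 2 / mu

def g2(k, mu):
    """Deviance component (assumed: lambda -> 0 limit of Equation 4)."""
    core = k * math.log(k / mu) if k > 0 else 0.0
    return 2.0 * (core - (k - mu))

def pd23(k, mu):
    """Power-divergence component with lambda = 2/3."""
    core = k * ((k / mu) ** (2.0 / 3.0) - 1.0)
    return 1.8 * core - 1.2 * (k - mu)

def f2(k, mu):
    """Freeman-Tukey component."""
    return 4.0 * (math.sqrt(k) - math.sqrt(mu)) ** 2

def moments(component, mu, k_max=200):
    """Mean and variance of a component under Poisson(mu), as in Equations 5-6."""
    probs = [poisson_pmf(k, mu) for k in range(k_max + 1)]
    vals = [component(k, mu) for k in range(k_max + 1)]
    mean = sum(p * v for p, v in zip(probs, vals))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, vals))
    return mean, var

for name, comp in [("X2", x2), ("G2", g2), ("PD", pd23), ("F2", f2)]:
    e, v = moments(comp, 1.0)
    print(f"{name}: E = {e:.3f}, V = {v:.3f}")
```

Looping `moments` over a grid of $\mu$ values gives curves of the kind summarized in Figures 1 and 2; a statistic whose components stay near a mean of 1 and a variance of 2 is the better-behaved one under this criterion.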

The comparisons are first conducted for $\mu$ values varying from 1 to 10. They are shown in Figure 1. From this figure, Pearson's $X^2$ has a mean value (E(X2)) of 1 for all $\mu$ values, but its variance (V(X2)) is greater than 2. As $\mu$ decreases, the variance increases. Thus, for low $\mu$ conditions, Pearson's $X^2$ is not reliable and, as a result, tends to overestimate GOF values. In fact, V(X2) is equal to $2 + 1/\mu$, and this has also been described in the study by Wood (2002). The mean of the scaled deviance (E(G2)) is slightly larger than 1 (when $\mu > 1$) and moves toward 1 as $\mu$ rises; the variance (V(G2)) increases from less than 1 to around 2.4 and then decreases toward 2. The Freeman-Tukey statistic does not fit $\chi^2$ distributions well even when $\mu = 10$. The mean and the variance of the $PD_{2/3}$ statistic, however, are rather close to 1 and 2, respectively. The components of the $PD_{2/3}$ statistic fit $\chi^2$ distributions almost perfectly as long as $\mu > 1$. Therefore, $PD_{2/3}$ is recommended for GOF tests for $\mu \in [1, 10]$.
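The closed form for V(X2) follows from the central moments of the Poisson distribution. A short derivation is sketched here, using the standard result that the fourth central moment of a Poisson variable is $\mu + 3\mu^2$:

```latex
\begin{aligned}
E\!\left[\frac{(Y-\mu)^2}{\mu}\right] &= \frac{\operatorname{Var}(Y)}{\mu} = 1, \\
E\!\left[\frac{(Y-\mu)^4}{\mu^2}\right] &= \frac{\mu + 3\mu^2}{\mu^2} = 3 + \frac{1}{\mu}, \\
V(X^2) &= \left(3 + \frac{1}{\mu}\right) - 1^2 = 2 + \frac{1}{\mu}.
\end{aligned}
```

For example, $\mu = 0.97$ gives $2 + 1/0.97 \approx 3.03$, which agrees with the variance reported for $X^2$ in Table 2, and $\mu = 10$ gives 2.10.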

Figure 2 shows the comparison of the mean and variance of $X^2$, $G^2$, and $PD_{2/3}$ for $0.1 \le \mu \le 1$. It can be observed that E(G2) varies from 0.47 to 1.15, while E(PD) increases from 0.7 to 0.98. Overall, E(PD) is more stable based on the rate of increase and is much closer to 1.0 than E(G2). For $\mu \ge 0.3$, the difference between E(PD) and E(X2), which is exactly 1, is very small and negligible. V(G2) is always less than 2 and even less than 1 given $\mu < 0.7$; V(PD) has the same tendency as V(X2), but is more stable and gets close to 2.0 even when $\mu$ is as small as 0.3, while V(X2) stays at or above 3.0 for $\mu \le 1$. It can also be seen that V(PD) performs like a compromise between V(X2) and V(G2). From the above comparisons, for $\mu \in [0.3, 1]$, the components of $PD_{2/3}$ are approximately $\chi^2$ distributed and $PD_{2/3}$ performs better than the other statistics. For $\mu < 0.3$, no statistic is reliable for GOF tests, and practitioners may consider turning to the more complicated grouped $G^2$ method.

Based on Figures 1 and 2, $PD_{2/3}$ is better than the other statistics and its components generally fit $\chi^2$ distributions well for $\mu \ge 0.3$. Pearson's $X^2$ is slightly better than $G^2$ for $\mu > 3$, but even when $\mu = 10$, Pearson's $X^2$ and $G^2$ are not satisfactory, with means and variances of (E(X2) = 1.00, V(X2) = 2.10) and (E(G2) = 1.02, V(G2) = 2.09), respectively.

Ye, Zhang, and Lord 8

[Figure 1 image not reproduced. Four panels plot expectation and variance against the Poisson mean (0 to 10): a) Chi-Square, with E(X2) and V(X2); b) Scaled Deviance, with E(G2) and V(G2); c) Freeman-Tukey, with E(F2) and V(F2); d) Power-Divergence, with E(PD) and V(PD).]

Figure 1 Mean and variance of the components of different test statistics for $\mu$ between 0 and 10

[Figure 2 image not reproduced. Two panels plot against the Poisson mean (0.0 to 1.0): a) Comparison of Variance, with V(X2), V(G2) and V(PD); b) Comparison of Expectation, with E(X2), E(G2) and E(PD).]

Figure 2 Mean and variance of components of different test statistics for $\mu$ between 0.1 and 1

Example Applications with Observed Data

To show how different GOF test statistics affect the fit of Poisson models, two examples using observed crash data are provided. It is worth noting that the core of the study is to statistically investigate the performance of different test statistics for GOF under low mean conditions. The following data are used as examples to help support the findings from the statistical investigation, not to serve as an alternative approach for investigating their performance.

For the first example, the data were collected at 59 four-legged unsignalized intersections in 1991 in Toronto, Ontario (Lord, 2000). The dataset includes the number of crashes and entering AADT for the major and minor approaches at each site. Both Poisson and NB GLMs were used for modeling this dataset, but the NB model converged to Poisson, with an inverse dispersion parameter that tended towards infinity (Lord and Bonneson, 2007). The mean of this dataset is 0.97. The variance is roughly the same as the mean. Thus, a Poisson GLM could be used for modeling this dataset. The functional form $\mu = \beta_0 F_1^{\beta_1} F_2^{\beta_2}$ is used for the prediction of the number of crashes. As stated by Lord (2006), it is the most common functional form used by transportation safety analysts for modeling crash data at intersections. The outputs of the fitted model are shown in Table 1. It can be seen that all coefficients are significant even at the significance level of 0.01.

Table 1 Modeling outputs of the Poisson model

Coefficients   Est. Value    Std. Error   z value   Pr(>|z|)
$\beta_0$      2.3439E-06    4.2895       -3.022    0.0025
$\beta_1$      0.8175        0.3145       2.599     0.0093
$\beta_2$      0.6348        0.2349       2.7303    0.0069
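To illustrate how the fitted model is used for prediction, the sketch below evaluates the functional form with the estimates listed in Table 1. The AADT values are hypothetical (they are not taken from the Toronto dataset) and the function name is illustrative.

```python
import math

def predicted_crashes(beta0, beta1, beta2, f1, f2):
    """Crash-flow model: mu = beta0 * F1**beta1 * F2**beta2."""
    return beta0 * f1 ** beta1 * f2 ** beta2

# Coefficient estimates from Table 1.
BETA0, BETA1, BETA2 = 2.3439e-06, 0.8175, 0.6348

# Hypothetical entering AADTs for the major and minor approaches.
mu = predicted_crashes(BETA0, BETA1, BETA2, f1=20000, f2=2000)

# Equivalent log-linear form: log(mu) = log(beta0) + beta1*log(F1) + beta2*log(F2).
log_mu = math.log(BETA0) + BETA1 * math.log(20000) + BETA2 * math.log(2000)
print(round(mu, 3), round(math.exp(log_mu), 3))
```

Because the model is log-linear, each exponent acts as an elasticity: under these estimates, doubling the major-approach flow multiplies the predicted crashes by $2^{0.8175}$.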

Pearson's $X^2$, $G^2$, $PD_{2/3}$, and $F^2$ are used for the GOF test of this Poisson model. The results of the GOF tests are summarized in Table 2. The $PD_{2/3}$ statistic has a lower GOF value and correspondingly a higher p-value than the Pearson's $X^2$ statistic. The GOF value of $G^2$ is higher than that of Pearson's $X^2$. The $F^2$ statistic has the lowest p-value. To explain their differences, Table 2 also lists the mean and variance of those test statistics given the Poisson mean $\mu = 0.97$. The mean and variance of the distribution of the test statistics can also be seen from Figure 1 or Figure 2. It is clear that the components of the $PD_{2/3}$ statistic are rather close to a $\chi^2$ distribution. For the Pearson's $X^2$ statistic, $E(X^2) = 1$ and $V(X^2) = 3.03$. The variance V(X2) is larger than 2 and may have overestimated the GOF value given $E(X^2) = 1$. For the $G^2$ statistic, although $V(G^2) = 1.23 < 2$, the mean $E(G^2) = 1.14$ is higher than 1 and can also result in overestimations of GOF values. Similarly, the $F^2$ statistic will also overestimate GOF values.

Table 2 Results of GOF tests for the Poisson model

Statistics            $X^2$    $PD_{2/3}$   $G^2$    $F^2$
GOF value             52.71    51.76        59.85    93.41
Degrees of Freedom    56       56           56       56
p-value               0.60     0.64         0.34     0.00
Expectation*          1.00     0.98         1.14     1.81
Variance**            3.03     1.99         1.23     3.20

*: The means of test statistics when the Poisson mean is 0.97.
**: The variances of test statistics when the Poisson mean is 0.97.
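The p-values in Table 2 are upper-tail $\chi^2$ probabilities of the GOF values at 56 degrees of freedom. The sketch below recomputes them with a self-contained implementation of the regularized incomplete gamma function (series plus continued fraction), so that no external library is needed; in practice a statistics package would normally be used instead. The assignment of the four GOF values to statistics follows the order implied by the surrounding text, which is an assumption about the table layout.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) via the regularized incomplete gamma."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of the lower ratio P(a, z).
        term = total = 1.0 / a
        n = a
        while abs(term) > abs(total) * 1e-15:
            n += 1.0
            term *= z / n
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the upper ratio Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(log_prefactor)

# GOF values from Table 2 (column assignment assumed from the text), df = 56.
table2 = {"X2": 52.71, "PD": 51.76, "G2": 59.85, "F2": 93.41}
p_values = {name: chi2_sf(value, 56) for name, value in table2.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.2f}")
```

The same function can be applied to the other GOF tables in this paper by changing the degrees of freedom.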

For the second example, the data were collected at 88 frontage road segments in the State of Texas (Lord and Bonneson, 2007). The dataset includes the number of serious injury crashes (KAB, i.e., K = fatal, A = incapacitating injury, and B = non-incapacitating injury), segment length, and AADT. The mean of this dataset is 1.386 and the variance is 1.642. Both Poisson and NB GLMs were used for modeling this dataset, but the NB model converged to Poisson, with an inverse dispersion parameter that tended towards infinity (Lord and Bonneson, 2007). The functional form $\mu = \beta_0 \cdot L \cdot F^{\beta_1}$ was used for the prediction of the number of crashes, where $L$ represents the segment length and $F$ is the AADT. The modeling results are shown in Table 3. It can be seen that both coefficients are significant at the significance level of 0.01.

Table 3 Modeling outputs of the Poisson model

Coefficients   Est. Value   Std. Error   z value   Pr(>|z|)
$\beta_0$      0.01536      0.8374       -4.987    6.14e-07
$\beta_1$      0.5874       0.1195       4.916     8.82e-07

Again, Pearson's $X^2$, $G^2$, $PD_{2/3}$, and $F^2$ are used for the GOF test of this Poisson model. As can be seen from Table 4, the GOF testing results are consistent with those of the first example and do not warrant further discussion.

Table 4 Results of GOF tests for the Poisson model

Statistics            $X^2$     $PD_{2/3}$   $G^2$     $F^2$
GOF value             104.84    103.01       116.08    168.87
Degrees of Freedom    86        86           86        86
p-value               0.08      0.10         0.02      0.00
Expectation*          1.00      0.99         1.14      1.75
Variance**            2.81      1.99         1.70      4.76

*: The means of test statistics when the Poisson mean is 1.386.
**: The variances of test statistics when the Poisson mean is 1.386.

Test Statistics for Negative Binomial Models

Characteristics of Statistical Tests

For NB distributions, the variance can be calculated as $Var(Y_i) = \mu_i + \mu_i^2/\phi$. Thus, the Pearson's $X^2$ statistic becomes

$$X^2(\mu, \phi; n) = \sum_{i=1}^{n} \left[\frac{y_i - \mu_i}{\sigma_i}\right]^2 = \sum_{i=1}^{n} \frac{[y_i - \mu_i]^2}{\mu_i + \mu_i^2/\phi}.$$

Based on the definition of the scaled deviance (Wood, 2002), the $G^2$ statistic for an NB model is calculated by

$$G^2(\mu; \phi; n) = 2 \sum_{i=1}^{n} \left[ y_i \log\left(\frac{y_i}{\mu_i}\right) - (y_i + \phi) \log\left(\frac{y_i + \phi}{\mu_i + \phi}\right) \right].$$

To show the accuracy and reliability of the Pearson's $X^2$ and $G^2$ statistics for GOF tests, the components of these statistics are examined again, using the same kind of calculations shown in Equations 5 and 6, in which $f_Y$ is now the pmf of the NB distribution. Note that the Power-Divergence statistics were not used as test statistics in this study for the NB distribution, since they do not exist for this distribution in the statistical literature. The mean and variance of the Pearson's $X^2$ and $G^2$ statistics with different parameter settings are shown in Figure 3. The NB mean $\mu$ varies from 0 to 10; the inverse dispersion parameters $\phi$ are 1, 3 and 5, respectively. It can be observed that $\phi$ has a great effect on the distributions of those two statistics. For the Pearson's $X^2$ statistic, the smaller the inverse dispersion parameter, the larger the V(X2) value, given a known NB mean value. The components of the Pearson's $X^2$ statistic do not fit $\chi^2$ distributions, as V(X2) is generally much larger than 2 for low $\mu$ values. V(X2) is still larger than 3 even when $\mu = 10$ and $\phi = 5$. Therefore, the Pearson's $X^2$ statistic will underestimate the degree of fit (p-value) and tend to reject fitted models more easily in practice. For the $G^2$ statistic, V(G2) may increase or decrease drastically for $\mu < 1$, then gradually stabilizes depending on $\phi$. When $\mu$ is as high as 10 and $\phi = 1$, the variance V(G2) is still not quite stable.
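The same mean-and-variance calculation can be sketched for the NB case by swapping in the NB pmf and the NB forms of the two components. The example evaluates it at the parameters reported for the NB example dataset in this paper (NB mean 1.43, inverse dispersion 2.756); the truncation point `k_max` and the function names are illustrative choices.

```python
import math

def nb_pmf(k, mu, phi):
    """NB pmf with mean mu and inverse dispersion phi."""
    return math.exp(math.lgamma(k + phi) - math.lgamma(phi) - math.lgamma(k + 1)
                    + phi * math.log(phi / (mu + phi))
                    + k * math.log(mu / (mu + phi)))

def x2_nb(k, mu, phi):
    """Pearson component with the NB variance mu + mu^2/phi."""
    return (k - mu) ** 2 / (mu + mu ** 2 / phi)

def g2_nb(k, mu, phi):
    """NB scaled-deviance component."""
    first = k * math.log(k / mu) if k > 0 else 0.0
    second = (k + phi) * math.log((k + phi) / (mu + phi))
    return 2.0 * (first - second)

def moments(component, mu, phi, k_max=2000):
    """Mean and variance of a component under NB(mu, phi), truncated at k_max."""
    probs = [nb_pmf(k, mu, phi) for k in range(k_max + 1)]
    vals = [component(k, mu, phi) for k in range(k_max + 1)]
    mean = sum(p * v for p, v in zip(probs, vals))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, vals))
    return mean, var

mu, phi = 1.43, 2.756
e_x2, v_x2 = moments(x2_nb, mu, phi)
e_g2, v_g2 = moments(g2_nb, mu, phi)
print(f"E(X2) = {e_x2:.2f}, V(X2) = {v_x2:.2f}")
print(f"E(G2) = {e_g2:.2f}, V(G2) = {v_g2:.2f}")
```

Evaluating `moments` over a grid of $\mu$ values for $\phi$ = 1, 3 and 5 gives curves of the kind described for Figure 3.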

[Figure 3 image not reproduced. Four panels plot against the Negative Binomial mean (0 to 10), each with curves for phi = 1, phi = 3 and phi = 5: a) V(X2); b) V(G2); c) E(X2); d) E(G2).]

Figure 3 Mean and variance of components of the $X^2$ and $G^2$ statistics

As $\phi$ increases, the $\mu$ value at which V(G2) becomes stable decreases. For example, when $\phi = 3$, V(G2) becomes stable when $\mu$ is around 4, and when $\phi = 5$, V(G2) will be relatively stable when $\mu$ is around 3. E(G2) is generally greater than 1 for $\mu > 1$ and less than 1 for $0 < \mu < 1$. [Important note: the inverse dispersion parameter is assumed to be properly estimated. As discussed by Lord (2006), the inverse dispersion parameter can become misestimated as the sample mean values decrease and the sample size becomes small.]

Overall, neither the Pearson's $X^2$ nor the $G^2$ statistic is accurate and reliable for the GOF test of NB models with low sample means, especially when the crash data are highly overdispersed ($\phi$ is small). As a result, the authors recommend the use of the grouped $G^2$ method for the GOF test of NB models. An example is given below to show the differences between GOF test statistics for NB models.

Example Applications with Observed Data

An annual crash-flow dataset was collected from 255 signalized 3-legged intersections in Toronto, Ontario (Lord, 2000). This dataset includes the number of serious injury crashes and entering AADTs for the major and minor approaches at each intersection. The crash counts are overdispersed, with a mean of 1.43 and a variance of 3.49. An NB regression model was thus used for the modeling of this dataset. The functional form $\mu = \beta_0 F_1^{\beta_1} F_2^{\beta_2}$ was again used for the prediction of the number of crashes. The results of the fitted model are summarized in Table 5. All the coefficients are significant at the significance level of 0.01. The inverse dispersion parameter was estimated to be 2.76.

Table 5 Modeling outputs of the Negative Binomial model

Coefficients   Est. Value   Std. Error   z value   Pr(>|z|)
$\beta_0$      7.988E-07    2.0122       -6.978    3.00E-12
$\beta_1$      1.0241       0.1951       5.249     1.53E-07
$\beta_2$      0.4868       0.0821       5.926     3.10E-09

Pearson's $X^2$, $G^2$ and the grouped $G^2$ were used for evaluating the GOF of the NB model. According to the grouping rules in Wood (2002), the minimum grouping size for the dataset is equal to 2, and the expression for calculating the grouped $G^2$ is

$$Grouped\_G^2 = 2 \sum_{i=1}^{n} r_i \left[ y_i \log\left(\frac{y_i}{\mu_i}\right) - (y_i + \phi) \log\left(\frac{y_i + \phi}{\mu_i + \phi}\right) \right],$$

where $r_i$ is the grouping size for the $i$th group.
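One way to read the factor $r_i$ in this expression is sketched below. It rests on assumptions that are not stated in the text: that the $r$ observations in a group are independent, share a common mean $\mu$ and inverse dispersion $\phi$, and that $y_i$ in the grouped formula denotes the group mean $\bar{y} = S/r$ of the group total $S$. Under those assumptions $S$ is NB with mean $r\mu$ and inverse dispersion $r\phi$, and its deviance term is

```latex
\begin{aligned}
2\left[ S \log\frac{S}{r\mu} - (S + r\phi)\log\frac{S + r\phi}{r\mu + r\phi} \right]
&= 2\left[ r\bar{y} \log\frac{\bar{y}}{\mu}
      - r(\bar{y} + \phi)\log\frac{\bar{y} + \phi}{\mu + \phi} \right] \\
&= 2\, r \left[ \bar{y} \log\frac{\bar{y}}{\mu}
      - (\bar{y} + \phi)\log\frac{\bar{y} + \phi}{\mu + \phi} \right].
\end{aligned}
```

This matches the form of each summand in the grouped expression; the exact grouping procedure is the one defined in Wood (2002).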

The results of the GOF tests are summarized in Table 6. The degrees of freedom are 252 for Pearson's $X^2$ and $G^2$, and 125 for the grouped $G^2$. All three test statistics accepted the fitted model at the significance level of 0.05. The grouped $G^2$ statistic and the Pearson's $X^2$ statistic have the highest and lowest p-values, respectively. The table also shows the expectations and variances of the components of the Pearson's $X^2$ and $G^2$ statistics, given the known parameters ($\mu$ and $\phi$). It can be seen that E(X2) is 1 and V(X2) is 4.63. V(X2) is much larger than 2, and this has caused the overestimation of GOF values. Thus, the p-value (0.09) of $X^2$ is lower than the actual value. E(G2) and V(G2) with low NB mean values are shown in Figure 4 for $\phi = 2.76$. When $\mu$ is around 1.43, E(G2) is higher than 1, which may have resulted in the overestimation of GOF values and underestimation of the degree of fit (p-value); V(G2) is very unstable for low $\mu$ values. The p-value of the grouped $G^2$ statistic is slightly higher than that of $G^2$. This is expected since the $G^2$ statistic has underestimated the true p-value. Thus, this example shows that the grouped $G^2$, although more complicated than the traditional methods, provides better results for the GOF test of NB models.

Table 6 Results of GOF tests for the Negative Binomial model

Statistics            $X^2$     $G^2$     Grouped_$G^2$
GOF value             282.20    269.80    136.46
Degrees of Freedom    252       252       125
p-value               0.09      0.21      0.23
Expectation*          1         1.12      N/A
Variance**            4.63      1.42      N/A

* The means of the test statistics when the NB mean is 1.43 and the inverse dispersion parameter is 2.756.
** The variances of the test statistics when the NB mean is 1.43 and the inverse dispersion parameter is 2.756.
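The p-values in Table 6 follow from comparing each GOF value with a $\chi^2$ distribution on the stated degrees of freedom. The sketch below uses the Wilson-Hilferty cube-root normal approximation so that it runs with the Python standard library only; this approximation is a substitute chosen here for illustration and is not necessarily how the paper's p-values were computed. An exact tail probability is available from `scipy.stats.chi2.sf` when SciPy is installed.

```python
import math

def chi2_sf_wilson_hilferty(stat, df):
    """Approximate upper-tail chi-square probability (Wilson-Hilferty)."""
    shift = 2.0 / (9.0 * df)
    z = ((stat / df) ** (1.0 / 3.0) - (1.0 - shift)) / math.sqrt(shift)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal upper tail

# GOF values and degrees of freedom as reported in Table 6.
table6 = {
    "X2": (282.20, 252),
    "G2": (269.80, 252),
    "Grouped_G2": (136.46, 125),
}

p_values = {name: chi2_sf_wilson_hilferty(stat, df)
            for name, (stat, df) in table6.items()}
for name, p in p_values.items():
    print(f"{name}: approximate p-value = {p:.3f}")
```

Under this approximation, the three values should land close to the tabulated 0.09, 0.21 and 0.23, all above the 0.05 significance level.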

[Figure 4: plot of $E(G^2)$ and $V(G^2)$ against the negative binomial mean (horizontal axis from 0 to 5), with the values $E(G^2) = 1.1167$ and $V(G^2) = 1.421$ annotated.]

Figure 4 $E(G^2)$ and $V(G^2)$ versus NB mean with $\phi$ = 2.756
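The expectations and variances of the $X^2$ and $G^2$ components can be approximated by summing over the NB probability mass function. The sketch below assumes the standard NB parameterization with mean $\mu$ and variance $\mu + \mu^2/\phi$, and the usual NB unit deviance for the $G^2$ component; the function names and the truncation point `y_max` are illustrative choices rather than the paper's own implementation.

```python
import math

def nb_pmf(y, mu, phi):
    """NB probability of count y with mean mu and inverse dispersion phi."""
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (phi + mu))
             + y * math.log(mu / (phi + mu)))
    return math.exp(log_p)

def pearson_component(y, mu, phi):
    """Single-observation contribution to Pearson's X^2."""
    return (y - mu) ** 2 / (mu + mu ** 2 / phi)

def deviance_component(y, mu, phi):
    """Single-observation contribution to G^2 (NB unit deviance)."""
    first = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (first - (y + phi) * math.log((y + phi) / (mu + phi)))

def mean_and_variance(component, mu, phi, y_max=500):
    """Moments of a component under NB(mu, phi), truncated at y_max."""
    mean = 0.0
    second = 0.0
    for y in range(y_max + 1):
        p = nb_pmf(y, mu, phi)
        value = component(y, mu, phi)
        mean += p * value
        second += p * value ** 2
    return mean, second - mean ** 2

mu, phi = 1.43, 2.756  # values quoted in the notes to Table 6
e_x2, v_x2 = mean_and_variance(pearson_component, mu, phi)
e_g2, v_g2 = mean_and_variance(deviance_component, mu, phi)
print(f"E(X2) = {e_x2:.3f}, V(X2) = {v_x2:.3f}")
print(f"E(G2) = {e_g2:.3f}, V(G2) = {v_g2:.3f}")
```

Under these assumptions, the results should be close to the tabulated values: an expectation of 1 and a variance near 4.63 for $X^2$, and an expectation near 1.12 and a variance near 1.42 for $G^2$.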

DISCUSSION

The results of this study show that Pearson's $X^2$ statistic tends to overestimate GOF values for low $\mu$ values, since $V(X^2)$ is larger than 2. This is because the components (i.e., $(y_i - \mu_i)^2/\mu_i$ for Poisson models) will be inflated when the predicted values ($\mu_i$) are low. For instance, with the observed crash dataset in the first case, the Poisson model predicted 1.02 crashes per year for one of the intersections. However, 4 crashes were observed at that intersection. The contribution to $X^2$ would be $(4 - 1.02)^2/1.02 = 8.71$, which is larger than the nominal value. This phenomenon explains why $V(X^2) > 2$ for low $\mu$ values.
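The arithmetic in this example, and the behavior of $V(X^2)$ for a Poisson model, can be checked numerically. The sketch below computes the single-site contribution and then sums over the Poisson probability mass function to obtain the variance of one $X^2$ component; for a Poisson variable this variance equals $2 + 1/\mu$, which exceeds 2 and grows as $\mu$ falls. The function names and the truncation point `y_max` are illustrative.

```python
import math

# Single-site contribution from the example in the text.
observed, predicted = 4, 1.02
contribution = (observed - predicted) ** 2 / predicted
print(f"Contribution to X2: {contribution:.2f}")

def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def pearson_component_variance(mu, y_max=200):
    """Variance of (y - mu)^2 / mu under Poisson(mu), truncated at y_max."""
    mean = 0.0
    second = 0.0
    for y in range(y_max + 1):
        p = poisson_pmf(y, mu)
        value = (y - mu) ** 2 / mu
        mean += p * value
        second += p * value ** 2
    return second - mean ** 2

variance_low = pearson_component_variance(predicted)
variance_high = pearson_component_variance(10.0)
print(f"V(X2) component at mu = 1.02: {variance_low:.3f}")
print(f"V(X2) component at mu = 10:   {variance_high:.3f}")
```

The comparison between the two means illustrates why the nominal variance of 2 is a poor guide at low $\mu$ and a reasonable one at higher $\mu$.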

Undoubtedly, for Poisson regression models, the Power-Divergence statistic ($PD_{\lambda=2/3}$) follows an approximate $\chi^2$ distribution and is the best test statistic for measuring the GOF for these models. This statistic performs better than the other three statistics for almost all $\mu$ values, except when $\mu$ is very low. However, when $\mu$ is very small, no test statistic can provide accurate and stable results of GOF tests. This statistic is preferred to Pearson's $X^2$ statistic

for all cases. For 1, the variance of the $PD_{\lambda=2/3}$ statistic ($V(PD)$) behaves like a compromise between $V(X^2)$ and $V(G^2)$, and contributes to more accurate and stable GOF tests.

From Figures 1 and 3, it is also observed that the performance of Pearson's $X^2$ and $G^2$ becomes worse as overdispersion increases. The Poisson model is a special case of the NB model, in which the inverse dispersion parameter is infinite. Therefore, the estimation of the inverse dispersion parameter from observed data will affect the results of GOF tests. It should be noted that the traditional estimators of the inverse dispersion parameter do not give accurate and stable estimates under low mean conditions, as described above (Lord, 2006). For NB models, neither Pearson's $X^2$ nor $G^2$ gives accurate GOF test results, especially under low sample mean conditions. Under such conditions, the grouped $G^2$ method is recommended, as it will provide better results for GOF tests of NB models.

The results of this study provide guidance on the use of the grouped $G^2$ method. Based on the curves of $G^2$ illustrated in Figure 1, it is found that the $G^2$ method or the grouped $G^2$ method is an appropriate test statistic only when the grouped mean is 1.5 or higher. Theoretically, the grouped $G^2$ method can be used for samples with extremely low means (e.g., less than 0.3). However, when grouping a sample with a low mean value to achieve a grouped mean of 1.5 or higher, the grouped sample size will be significantly reduced, which may lead to issues associated with small samples. For NB regression models, the problem becomes more complex, as the minimum grouped mean is determined by the inverse dispersion parameter $\phi$. The recommended minimum means (or group means) for different inverse dispersion parameters are shown in Figure 5. The minimum mean decreases when $\phi$ increases. For $\phi$ less than 1, the minimum mean increases sharply with a decreasing $\phi$. Thus, when using the grouped $G^2$ method, the grouped mean should meet the requirements presented in this figure. As the inverse dispersion parameter increases toward infinity (Poisson model), the recommended minimum mean decreases slowly to approximately 1.5.

Figure 5 Recommended Minimum Means versus Inverse Dispersion Parameter of the NB model

CONCLUSIONS AND FUTURE WORK

The Poisson and NB regression models are the two most commonly used types of models for analyzing traffic crashes. These models help establish the relationship between traffic crashes (response variable) and traffic flow, highway geometrics, and other explanatory variables. To evaluate their statistical performance, GOF tests need to be used. Crash data are often characterized by low sample mean values, and traditional GOF statistics have been found not to perform very well under these conditions. Consequently, there was a need to determine whether
