Introduction to ZIM

library(ZIM)
data(syph)
count <- syph$a33
count.lag1 <- bshift(count > 0)
trend <- 1:length(count) / 1000
plot(ts(count), xlab = "Week", ylab = "Number of Syphilis Cases", 
     main = "Maryland (2007 - 2010)", type = "o", pch = 20, las = 1)

table(count)
## count
##  0  1  2  3  4  5  6  7  8  9 10 11 12 15 
## 59 10 14 24 26 26 18 14  3  7  4  2  1  1

Observation-Driven Models

ZIP Autoregression

We first fit a ZIP autoregression with an AR(1) correlation structure. The linear trend is included in both the log-linear and logistic parts of the model.

m1 <- zim(count ~ count.lag1 + trend | trend)
m1
## 
## Call:
## zim(formula = count ~ count.lag1 + trend | trend)
## 
## Coefficients (log-linear): 
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.48942    0.11995 12.4175  < 2e-16 ***
## count.lag1   0.22111    0.10072  2.1954  0.02813 *  
## trend       -1.01004    0.66687 -1.5146  0.12987    
## 
## Coefficients (logistic): 
##             Estimate Std. Error z value  Pr(>|z|)    
## (Intercept) -1.93321    0.37196 -5.1974 2.021e-07 ***
## trend        8.60517    2.80827  3.0642  0.002182 ** 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Test for overdispersion (H0: ZIP vs. H1: ZINB) 
## score.test: 2.6031 
## p.value: 0.0046196 
## 
## Criteria for assessing goodness of fit 
## loglik: -454.3903 
## aic: 918.7806 
## bic: 935.4683 
## tic: 920.7761 
## 
## Number of EM-NR iterations: 11 
## Maximum absolute gradient: 7.81597e-14

The EM-NR algorithm is used as the default algorithm in the zim function. The score test for overdispersion suggests that the ZINB model could provide a better fit to the syphilis data (p = 0.0046).

ZINB Autoregression

As suggested by the score test, we next fit a ZINB autoregression, with all the other components remaining the same as in the ZIP autoregression.

m2 <- zim(count ~ count.lag1 + trend | trend, dist = "zinb")
m2
## 
## Call:
## zim(formula = count ~ count.lag1 + trend | trend, dist = "zinb")
## 
## Coefficients (log-linear): 
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.47240    0.13873 10.6132  < 2e-16 ***
## count.lag1   0.23164    0.11522  2.0105  0.04438 *  
## trend       -1.00364    0.77154 -1.3008  0.19332    
## 
## Coefficients (logistic): 
##             Estimate Std. Error z value  Pr(>|z|)    
## (Intercept) -1.97940    0.38563 -5.1329 2.853e-07 ***
## trend        8.71684    2.88697  3.0194  0.002533 ** 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Dispersion parameter for negative binomial taken to be 15.4711) 
## 
## Criteria for assessing goodness of fit 
## loglik: -451.7464 
## aic: 915.4927 
## bic: 935.5179 
## tic: 915.974 
## 
## Number of EM-NR iterations: 11 
## Maximum absolute gradient: 5.08779e-08

The AIC and TIC suggest a marginal improvement when the ZINB autoregression is used. However, the BIC values for the ZIP and ZINB autoregressions are not distinguishable. This should not be surprising as BIC tends to penalize more for complexity.

Parameter-Driven Models

Dynamic ZIP Model

We now fit a dynamic ZIP model to the syphilis data. The trend is included as a deterministic covariate in the log-linear model. The zero-inflation parameter is assumed to be constant over time.

set.seed(123)
system.time(m3 <- dzim(count ~ trend, dist = "zip", N = 200, R = 200, niter = 100))
##    user  system elapsed 
## 377.758   0.176 377.977
m3
## 
## Call:
## dzim(formula = count ~ trend, dist = "zip", N = 200, R = 200,     niter = 100)
## 
## (Zero-inflation parameter taken to be 0.2679) 
## 
## Coefficients (log-linear): 
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.681576   0.088425 19.0170  < 2e-16 ***
## trend       -1.646273   0.741270 -2.2209  0.02636 *  
## 
## Coefficients (autoregressive): 
##      Estimate Std. Error z value Pr(>|z|)
## ar1 -0.073878   0.406740 -0.1816   0.8559
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Standard deviation parameter taken to be 0.2504) 
## 
## Criteria for assessing goodness of fit 
## loglik: -460.6898 
## aic: 931.3796 
## bic: 948.0913
dzim.plot(m3)

Dynamic ZINB Model

We next fit a dynamic ZINB model to see whether a need remains for the NB dispersion parameter.

set.seed(123)
system.time(m4 <- dzim(count ~ trend, dist = "zinb", N = 200, R = 200, niter = 100))
##    user  system elapsed 
## 389.125   0.047 389.199
m4
## 
## Call:
## dzim(formula = count ~ trend, dist = "zinb", N = 200, R = 200,     niter = 100)
## 
## (Zero-inflation parameter taken to be 0.2694) 
## 
## (Dispersion parameter for negative binomial taken to be 41.5243) 
## 
## Coefficients (log-linear): 
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.684280   0.081973 20.5468  < 2e-16 ***
## trend       -1.494516   0.732988 -2.0389  0.04146 *  
## 
## Coefficients (autoregressive): 
##     Estimate Std. Error z value Pr(>|z|)
## ar1 -0.08078    2.27932 -0.0354   0.9717
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Standard deviation parameter taken to be 0.204) 
## 
## Criteria for assessing goodness of fit 
## loglik: -460.4108 
## aic: 932.8217 
## bic: 952.8757
dzim.plot(m4)