****************************************** *** FITTING MODELS WITH OVERDISPERSION *** *** ("extra-Poisson" variation) *** ****************************************** * These models are useful for parasites * * distributions * ****************************************** ***************************************************************************** * Adapted from: MJ Moroney 'Facts From Figures' * * (London: Penguin Books, 1951; 2nd ed 1953) * * * * In Poisson models E(x)=V(x)=lambda (only one parameter) * * Probability function: * * P(x)=e^(-lambda)*(lambda^x)/x! * * * * In over-dispersed distributions V(X)>E(x) * * Negative binomial law is a prototype of overdispersed distributions * * * * In Negative Binomial models one extra parameter (k) is needed: * * Parameter Estimated by: * * E(x)=lambda ----------------> sample mean * * V(x)=lambda*(1+lambda/k) ---> sample variance * * k = EČ(x)/(V(x)-E(x)) -> meanČ/(variance-mean) ('moments method') * * Comment: Maximum Likelihood method is better, but far more complicated! * * (see ParaDis Web page & program) * * Probability function: * * P(x)=[(k/(k+lambda))^k]*[(lambda/(k+lambda))^x]*GAMMA(x+k)/(x!*GAMMA(k)) * ***************************************************************************** * Example dataset (from ParaDis program manual): * (http://www.bondy.ird.fr/~pichon/paradis/parad2.html). DATA LIST FREE/ number freq (2 F8.0). BEGIN DATA 0 70 1 38 2 17 3 10 4 9 5 3 6 2 7 1 END DATA. VARIABLE LABEL number 'Acariens/feuille'. * Add mean, variance & N to the dataset & compute Fisher Dispersion Test *. CACHE. EXECUTE. MATRIX. PRINT /TITLE='DATA & STATISTICS'. GET x/VAR=number. GET freq/VAR=freq. COMPUTE ndata=NROW(x). COMPUTE nt=MSUM(freq). COMPUTE mean=MSUM(freq&*x)/nt. COMPUTE variance=(MSUM(freq&*(x&**2))-nt*mean&**2)/(nt-1). COMPUTE k=(mean&**2)/(variance-mean). PRINT {x,freq} /FORMAT='F8.0' /TITLE='Sample data' /CLABELS='X','Freq'. PRINT nt /FORMAT='F8.0' /TITLE='Total sample size' /RLABEL='Nt'. PRINT {mean,variance,(variance/mean),k} /FORMAT='F8.2' /TITLE='Statistics' /CLABELS='Mean','Variance','DP','K(*)'. PRINT /TITLE='(*) Computed by moments method.'. PRINT /TITLE='FISHER DISPERSION TEST'. COMPUTE fd=(MSUM(freq&*(x-mean)&**2))/mean. COMPUTE fdsig=1-CHICDF(fd,MSUM(freq)-1). PRINT {fd,fdsig} /FORMAT='F8.4' /TITLE='Test statistics (df=Nt-1)' /CLABELS='Chi^2','Sig.'. DO IF fdsig LE 0.05. - PRINT /TITLE='Overdispersion exists! Extra-Poisson variation is present.'. END IF. COMPUTE means=MAKE(ndata,1,mean). COMPUTE n=MAKE(ndata,1,nt). COMPUTE kp=MAKE(ndata,1,k). COMPUTE namevec={'number','n','mean','k'}. SAVE {x,n,means,kp} /OUTFILE='c:\temp\temp.sav' /NAMES=namevec. END MATRIX. MATCH FILES /FILE=* /FILE='C:\Temp\temp.sav' /BY number. EXECUTE. * Calculate expected frequencies under NB assumption * (For Chi-square, 3 df are lost: N, mean & variance are used) *. COMPUTE expect=n*((mean/(k+mean))**number)*((k/(k+mean))**k)*EXP(LNGAMMA(number+k))/(EXP(LNGAMMA(number+1))*EXP(LNGAMMA(k))). COMPUTE sumexp=expect. * Trick to make the sum of expected frequencies equal to N replacing p(last) by p(x GE last) *. DO IF \$casenum GT 1. - COMPUTE sumexp=sumexp+LAG(sumexp). END IF. SORT CASES BY number(D). DO IF \$casenum EQ 1. - COMPUTE expect=expect+(n-sumexp). END IF. SORT CASES BY number(A). * List&Graph Observed vs Expected *. VAR LABEL freq 'Observed' /expect 'Expected'. REPORT /FORMAT=LIST /TITLE='Observed & expected frequencies (assuming over-dispersion)' /VAR=number freq expect. GRAPH /BAR(GROUPED)=VALUE(freq expect) BY number /TITLE='Observed & expected(*) frequencies' /Footnote='(*) Assuming over-dispersion'. * Check for very low Expected frequencies & collapse cells to avoid them *. COMPUTE id=\$casenum. DO IF (expect LT 2). - COMPUTE id=LAG(id). END IF. AGGREGATE /OUTFILE=* /BREAK=id /observed = SUM(freq) /expected = SUM(expect). STRING groups (A2). COMPUTE groups = STRING(id,F2.0) . * Goodness of fit Chi-square test *. MATRIX. PRINT /TITLE='CHI-SQUARE GOODNESS-OF-FIT TEST'. GET groups/VAR=groups. GET obs/VAR=observed. GET expect/VAR=expected. PRINT {obs,expect,(obs-expect);MSUM(obs),MSUM(expect),0} /FORMAT='F8.2' /TITLE='Frequencies (after collapsing categories)' /CLABELS='Observed','Expected','Residual' /RNAMES=groups. COMPUTE k=NROW(obs). PRINT {k-3} /FORMAT='F8.0' /TITLE='Degrees of freedom (k-3)'. COMPUTE chi2=MSUM((obs-expect)&**2/expect). COMPUTE chisig=1-CHICDF(chi2,k-3). PRINT {chi2,chisig} /FORMAT='F8.4' /TITLE='Test statistics' /CLABELS='Chi^2','Sig.'. COMPUTE minexp=CMIN(expect). COMPUTE flag=0. LOOP i=1 TO k. - DO IF expect(i) LT 5. - COMPUTE flag=flag+1. - END IF. END LOOP. COMPUTE pflag=100*flag/k. DO IF flag GT 0. - PRINT pflag /FORMAT='F8.1' /TITLE='WARNING: Some cells with expected frequencies less than 5.' /RLABEL='cells(%)='. - PRINT minexp /FORMAT='F8.1' /TITLE='The minimum expected cell frequency is:' /RLABEL='Exp='. END IF. END MATRIX. ********************** * END OF SYNTAX.