**** GOODNESS OF FIT TEST FOR POISSON DISTRIBUTIONS **** * Example dataset *. DATA LIST FREE/ NUMBER(f8.0). BEGIN DATA 0 2 0 0 2 2 0 0 1 1 3 0 0 1 0 0 1 0 1 4 0 0 1 4 2 0 0 1 0 0 2 2 1 1 0 6 0 5 1 3 0 1 0 1 8 END DATA. VAR LABEL number 'Nr of Ixodes trianguliceps/mouse'. * One sample K-S (low sensitivity if Lilliefors correction not applied manually) *. NPAR TESTS /K-S(POISSON)= number /MISSING ANALYSIS. * Notice the gap between 6 & 8; it must be filled to compute expected values correctly (this part is only for didactic purposes, can be removed from final code) *. FREQUENCIES VARIABLES=number /ORDER= ANALYSIS . *** Filling gaps in sequence *** * First: aggregate the dataset *. AGGREGATE /OUTFILE=* /BREAK=number /obs=N. EXEC. * Create auxiliary file with every possible value from 0 to max *. PRESERVE. SET ERRORS=NONE RESULTS=NONE. MATRIX. GET number /VAR=number. COMPUTE max=CMAX(number). RELEASE number. COMPUTE number=MAKE(max+1,1,1). COMPUTE wgt=MAKE(max+1,1,0.0001). LOOP i=1 TO max+1. - COMPUTE number(i)=i-1. END LOOP. COMPUTE namevec={"number","wgt"}. SAVE {number,wgt} /OUTFILE='c:\\temp\\temp.sav' /NAMES=namevec. END MATRIX. * Match it with working file to fill the gaps *. MATCH FILES /FILE=* /FILE='C:\\Temp\\temp.sav' /BY NUMBER. EXECUTE. IF MISSING(obs) obs=wgt. MATCH FILES /FILE=* /DROP=wgt. EXECUTE. WEIGHT BY obs. RESTORE. * Now the gaps are filled *. FREQUENCIES VARIABLES=number /ORDER= ANALYSIS . *** Goodness of fit Chi-square test (more sensitive than K-S) *** * Calculate mean & N *. COMPUTE const = 1 . EXECUTE . AGGREGATE /OUTFILE='C:\\temp\\aggr.sav' /BREAK=const /mean = MEAN(number) /N=N. WEIGHT OFF. PRESERVE. SET ERRORS=NONE. MATCH FILES /FILE=* /FILE='C:\\temp\\aggr.sav' /BY const. EXECUTE. RESTORE. DO IF MISSING(mean). - COMPUTE mean=LAG(mean). - COMPUTE n=LAG(n). END IF. EXECUTE. * Compute expected frequencies *. DO IF $casenum=1. - COMPUTE expect=n*CDF.POISSON(number,mean) . END IF. DO IF $casenum GT 1. - COMPUTE expect=n*CDF.POISSON(number,mean)-n*CDF.POISSON(LAG(number),mean) . END IF. SORT CASES BY number(D). DO IF $casenum=1. - COMPUTE expect=n*(1-CDF.POISSON(number-1,mean)). END IF. EXEC. SORT CASES BY number(A). * Graphical comparison before collapsing categories, although not part of the test, it's useful for visual cheking of departures from Poisson fit; must be done before collapsing categories *. GRAPH /BAR(GROUPED)=VALUE( obs expect ) BY number . * Check for Expected LT 1 & collapse cells if necessary (can be modified to collapse any Exp frequency below 5) *. COMPUTE id=$casenum. DO IF expect LT 1. - COMPUTE id=LAG(id). END IF. EXECUTE. AGGREGATE /OUTFILE=* /BREAK=id /observed = SUM(obs) /expected = SUM(expect). * Test statistic & significance *. STRING groups (A2). COMPUTE groups = STRING(id,F2.0) . EXECUTE . MATRIX. GET groups/VAR=groups. GET obs/VAR=observed. GET expect/VAR=expected. PRINT {obs,expect;MSUM(obs),MSUM(expect)} /FORMAT='F8.2' /TITLE='Observed & expected frequencies' /CLABELS='Obs','Exp' /RNAMES=groups. COMPUTE k=NROW(obs). PRINT {k-2} /FORMAT='F8.0' /TITLE='Degrees of freedom (k-2)'. COMPUTE chi2=MSUM((obs-expect)&**2/expect). COMPUTE chisig=1-CHICDF(chi2,k-2). PRINT {chi2,chisig} /FORMAT='F8.4' /TITLE='GOODNESS OF FIT TEST' /CLABELS='Chi^2','Sig.'. COMPUTE minexp=CMIN(expect). COMPUTE flag=0. LOOP i=1 TO k. - DO IF expect(i) LT 5. - COMPUTE flag=flag+1. - END IF. END LOOP. COMPUTE pflag=100*flag/k. DO IF flag GT 0. - PRINT pflag /FORMAT='F8.1' /TITLE='WARNING: EXPECTED<5 IN AT LEAST ONE CELL' /RLABEL='%cells='. - PRINT minexp /FORMAT='F8.1' /TITLE='Minimum expected count is:' /RLABEL='Exp='. END IF. END MATRIX.