Solution ID: 100000322
Question Subtype: Statistical Distributions
Title:
Generating multivariate hypergeometric random variables in SPSS
Description:
Q.
How can I use SPSS to generate variables with a multivariate
hypergeometric distribution for a specified number of cases?
A.
If you draw n observations without replacement from a population
with k classes of objects, where k>2, the counts of objects sampled
from the k classes jointly follow a multivariate hypergeometric
distribution. The following macro generates cases whose variables
have such a distribution. You supply the macro with the number of
cases to be generated (ncases), the number of classes of objects
(classes), the number of objects to sample (or 'draw') for each case
(samsize), and the population size of each class (popc), given as a
bracketed list with one value per class.
The algorithm is similar to the direct, or ball-in-urn, method for
generating a multinomial distribution [see Johnson, N. L., Kotz, S.,
& Balakrishnan, N. (1997). Discrete Multivariate Distributions.
Wiley].
1. The population sizes for each class (pop1 to pop(k)) are
initialized from the respective values of popc, and the sample sizes
for all classes (sam1 to sam(k)) are initialized as 0. The total
population size is calculated as the sum of the population sizes for
the classes, i.e., the sum of pop1 to pop(k), and stored as poptot.
2. For each of the samsize sample units to be drawn:
(i). A discrete uniform random number from 1 to poptot is drawn
and stored as y.
(ii). For each of the k classes, the variable psum is calculated as
the cumulative sum of the class populations up to and including that
class. If y is less than or equal to psum but greater than the value
of psum for the previously considered class, the observation is
considered a draw from the current class. The sample size for that
class is incremented by 1 and its population size is decreased by 1,
as is poptot. [Note that psum is not decremented after a match, so a
single y cannot match the ranges of two adjacent classes.]
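As a worked illustration of step 2 (a sketch added for clarity, not part of the original solution): write N_i for the current value of pop(i) and N for the current value of poptot. These symbols, and the value y = 63 used below, are chosen here purely for illustration. Because y is uniform on 1 to N and class i owns exactly N_i of those values, a single draw selects class i with probability N_i / N. The table traces one draw using the populations from the first example call (50, 30, 20).

```latex
\[
  P(\text{draw falls in class } i) = \frac{N_i}{N},
  \qquad N = \sum_{j=1}^{k} N_j .
\]

% One draw with (N_1, N_2, N_3) = (50, 30, 20), N = 100, and y = 63.
\[
\begin{array}{clll}
  \text{class} & \mathit{psum} & \text{matching range for } y & \text{match?} \\
  1 & 50  & 1 \le y \le 50   & \text{no} \\
  2 & 80  & 51 \le y \le 80  & \text{yes} \\
  3 & 100 & 81 \le y \le 100 & \text{no}
\end{array}
\]

% State after this draw: sam2 = 1, pop2 = 29, poptot = 99.
% The next y is therefore drawn uniformly from 1 to 99.
```

Class 3 is still tested against the range 81 to 100 even though pop2 has already dropped to 29. This is because psum keeps the value 80 that was accumulated before the decrement.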
* macro to generate a multivariate hypergeometric distribution.
* First example call has 3 classes with pop sizes of 50, 30, & 20.
* 25 items are sampled without replacement and
* sam1 to sam3 hold the respective counts.
* 200 cases are generated.
* Second example call has 4 classes with pop sizes of 20, 10, 30, & 20.
* 30 items are sampled without replacement and
* sam1 to sam4 hold the respective counts.
* 300 cases are generated .
* .
*************************************************************.
define mvhypgen
   (ncases  = !tokens(1)
   /classes = !tokens(1)
   /samsize = !tokens(1)
   /popc    = !enclose('[',']') ).
new file.
input program .
loop id = 1 to !ncases .
vector pop sam (!classes , F8).
+  do repeat popn = pop1 to !concat('pop',!classes)
             /samn = sam1 to !concat('sam',!classes)
             /pc = !popc .
+     compute popn = pc.
+     compute samn = 0.
+  end repeat.
+  compute poptot = sum(pop1 to !concat('pop',!classes)).
+  loop #j = 1 to !samsize .
+     compute y = trunc(uniform(poptot)) + 1.
+     compute psum = 0.
+     loop #k = 1 to !classes .
+        compute psum = psum + pop(#k).
+        do if (y le psum and y gt (psum - pop(#k))).
+           compute sam(#k) = sam(#k) + 1.
+           compute pop(#k) = pop(#k) - 1.
+           compute poptot = poptot - 1.
+        end if.
+     end loop.
+  end loop.
+  end case.
end loop.
end file.
end input program.
execute.
!enddefine .
mvhypgen ncases = 200 classes = 3 samsize = 25
   popc = [ 50 30 20 ] .
mvhypgen ncases = 300 classes = 4 samsize = 30
   popc = [ 20 10 30 20 ] .
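One possible way to sanity-check the generated data is sketched below. It is not part of the original solution, and the variable name samtot and the choice of procedures are assumptions made here. It assumes the second example call has just run, so the active file holds sam1 to sam4. Every case should have counts that sum to samsize (30). For a multivariate hypergeometric distribution, the mean of each count is samsize times that class's share of the total population. With populations 20, 10, 30 and 20 (total 80), the means should be about 7.5, 3.75, 11.25 and 7.5.

```spss
* Every case should have drawn exactly samsize = 30 objects.
compute samtot = sum(sam1 to sam4).
frequencies variables = samtot.
* Observed means should be near 7.5, 3.75, 11.25 and 7.5.
descriptives variables = sam1 to sam4
   /statistics = mean stddev min max.
```

The frequency table for samtot should show the single value 30. The maximum of each count cannot exceed that class's population size, so sam2, for example, should never be greater than 10.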