Solution ID: 100000320
Product: SPSS Base
Version:
O/S:
Question Type: Statistics
Question Subtype: Statistical Distributions
Title:
Generating multinomial random variables in SPSS
Description:
Q.
How can I use SPSS to generate variables with a multinomial
distribution for a specified number of cases?
A.
If you draw n observations with replacement from a population
with k classes of objects, where k>2, the k numbers of objects
sampled from the respective classes have a multinomial distribution.
The following macro generates the cases and variables with such a
distribution. You supply the macro with the number of cases to be
generated (ncases), the number of classes of objects (classes),
the number of objects to sample (or 'draw') for each case (samsize),
and the population proportions or sizes for each of these classes
(prop).
If you enter population sizes for each class, rather than proportions,
you should indicate this by specifying "norp = 1" on the macro call.
(If you enter sizes but neglect to specify that norp = 1, the macro
will "infer" that sizes were entered if the sum(prop1 to prop(k)) > 1.001
and rescale prop1 to prop(k) to proportions. The extra .001 in the above
threshold is to allow for any numerical imprecision in the summing of
prop values.
The algorithm employed is the directed, or ball-in-urn method, as
described in Johnson, Kotz, & Balakrishnan
[(1997). "Discrete Multivariate Distributions", Wiley. pp. 68-69].
As noted by those authors, there are faster algorithms available to
generate multinomial variables. However, those methods do not appear
to lend themselves to efficient use within the structure of SPSS data
files. The algorithm is summarized as follows.
1. The population proportions for each class (pop1 to pop(k)) are
initialized from the respective values of prop and the sample sizes
for all classes (sam1 to sam(k)) are initiialized as 0. The total
sample size is calculated as the sum of the prop values for the
classes, i.e, the sum of pop1 to pop(k), and stored as poptot. If
proportions were correctly entered for the prop parameter, poptot will
equal 1. If counts were entered for prop, poptot > 1.001, and the
values of pop1 to pop(k) are rescaled by division over poptot.
2. For each of the samsize observations to be drawn for each case:
(i). A uniform(0,1) random number is drawn and stored as Y.
(ii). For each of the k classes, the variable psum is calculated as
the sum of class population proportions considered to that point. If Y
is less than or equal to psum but greater than psum for the previously-
considered classes, the observation is considered a draw from the
current class. The sample size for that class is therefore incremented
by 1 * macro to generate a multinomial distribution.
* First example call has 3 classes with pop proportions of .5, .3, & .2.
* 125 items are sampled with replacement and
* sam1 to sam3 hold the respective counts.
* 200 cases are generated.
* Second example call has 4 classes with pop sizes of 20, 10, 30, & 20.
* 30 items are sampled with replacement and
* sam1 to sam4 hold the respective counts.
* norp = 1 in the macro call indicates that population sizes,
* rather than proportions, are input to the prop parameter.
* 300 cases are generated .
* Third example call is like the second, except that norp has not been set.
* The macro infers that pop sizes were entered because the sum of
* of the prop elements is greater than 1.001 .
* .
*************************************************************.
define multnomg
(ncases = !tokens(1)
/classes = !tokens(1)
/samsize = !tokens(1)
/norp= !default (0) !tokens(1)
/prop = !enclose('[',']') ).
new file.
input program .
loop id = 1 to !ncases .
+ compute np = !norp .
vector pop (!classes , F8.5) sam (!classes , F8) .
+ do repeat popn = pop1 to !concat('pop',!classes)
/samn = sam1 to !concat('sam',!classes)
/pc = !prop .
+ compute popn = pc.
+ compute samn = 0.
+ end repeat.
+ compute poptot = sum(pop1 to !concat('pop',!classes)).
+ do if (np = 1 or poptot > 1.001).
+ do repeat propn = pop1 to !concat('pop',!classes).
compute propn = propn/poptot.
+ end repeat.
+ end if.
+ loop #j = 1 to !samsize .
+ compute y = uniform(1) .
+ compute psum = 0.
+ loop #k = 1 to !classes .
+ compute psum = psum + pop(#k).
+ if (y le psum and y gt (psum - pop(#k)))
sam(#k) = sam(#k) + 1.
+ end loop.
+ end loop.
+ end case.
end loop.
end file.
end input program.
execute.
!enddefine .
multnomg ncases = 200 classes = 3 samsize = 125
prop = [ 0.5 0.3 0.2 ] .
multnomg ncases = 300 classes = 4 samsize = 30 norp = 1
prop = [ 20 10 30 20 ] .
multnomg ncases = 300 classes = 4 samsize = 30
prop = [ 20 10 30 20 ] .