1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
Solution ID:	 	100000320	
Product:	 	SPSS Base 	
Version:	 		
O/S:	 		
Question Type:	 	Statistics	
Question Subtype:	 	Statistical Distributions	
Title:
Generating multinomial random variables in SPSS 
Description:
Q. 
How can I use SPSS to generate variables with a multinomial 
distribution for a specified number of cases? 
A. 
If you draw n observations with replacement from a population 
with k classes of objects, where k>2, the k numbers of objects 
sampled from the respective classes have a multinomial distribution. 
The following macro generates the cases and variables with such a 
distribution. You supply the macro with the number of cases to be 
generated (ncases), the number of classes of objects (classes), 
the number of objects to sample (or 'draw') for each case (samsize), 
and the population proportions or sizes for each of these classes 
(prop). 
If you enter population sizes for each class, rather than proportions, 
you should indicate this by specifying "norp = 1" on the macro call. 
(If you enter sizes but neglect to specify that norp = 1, the macro 
will "infer" that sizes were entered if the sum(prop1 to prop(k)) > 1.001 
and rescale prop1 to prop(k) to proportions. The extra .001 in the above 
threshold is to allow for any numerical imprecision in the summing of 
prop values. 
The algorithm employed is the directed, or ball-in-urn method, as 
described in Johnson, Kotz, & Balakrishnan 
[(1997). "Discrete Multivariate Distributions", Wiley. pp. 68-69]. 
As noted by those authors, there are faster algorithms available to 
generate multinomial variables. However, those methods do not appear 
to lend themselves to efficient use within the structure of SPSS data 
files. The algorithm is summarized as follows. 
1. The population proportions for each class (pop1 to pop(k)) are 
initialized from the respective values of prop and the sample sizes 
for all classes (sam1 to sam(k)) are initiialized as 0. The total 
sample size is calculated as the sum of the prop values for the 
classes, i.e, the sum of pop1 to pop(k), and stored as poptot. If 
proportions were correctly entered for the prop parameter, poptot will 
equal 1. If counts were entered for prop, poptot > 1.001, and the 
values of pop1 to pop(k) are rescaled by division over poptot. 
2. For each of the samsize observations to be drawn for each case: 
(i). A uniform(0,1) random number is drawn and stored as Y. 
(ii). For each of the k classes, the variable psum is calculated as 
the sum of class population proportions considered to that point. If Y 
is less than or equal to psum but greater than psum for the previously- 
considered classes, the observation is considered a draw from the 
current class. The sample size for that class is therefore incremented 
by 1 * macro to generate a multinomial distribution. 
* First example call has 3 classes with pop proportions of .5, .3, & .2. 
* 125 items are sampled with replacement and 
* sam1 to sam3 hold the respective counts. 
* 200 cases are generated. 
* Second example call has 4 classes with pop sizes of 20, 10, 30, & 20. 
* 30 items are sampled with replacement and 
* sam1 to sam4 hold the respective counts. 
* norp = 1 in the macro call indicates that population sizes, 
* rather than proportions, are input to the prop parameter. 
* 300 cases are generated . 
* Third example call is like the second, except that norp has not been set. 
* The macro infers that pop sizes were entered because the sum of 
* of the prop elements is greater than 1.001 . 
* . 
*************************************************************. 
define multnomg 
(ncases = !tokens(1) 
/classes = !tokens(1) 
/samsize = !tokens(1) 
/norp= !default (0) !tokens(1) 
/prop = !enclose('[',']') ). 
new file. 
input program . 
loop id = 1 to !ncases . 
+ compute np = !norp . 
vector pop (!classes , F8.5) sam (!classes , F8) . 
+ do repeat popn = pop1 to !concat('pop',!classes) 
/samn = sam1 to !concat('sam',!classes) 
/pc = !prop . 
+ compute popn = pc. 
+ compute samn = 0. 
+ end repeat. 
+ compute poptot = sum(pop1 to !concat('pop',!classes)). 
+ do if (np = 1 or poptot > 1.001). 
+ do repeat propn = pop1 to !concat('pop',!classes). 
compute propn = propn/poptot. 
+ end repeat. 
+ end if. 
+ loop #j = 1 to !samsize . 
+ compute y = uniform(1) . 
+ compute psum = 0. 
+ loop #k = 1 to !classes . 
+ compute psum = psum + pop(#k). 
+ if (y le psum and y gt (psum - pop(#k))) 
sam(#k) = sam(#k) + 1. 
+ end loop. 
+ end loop. 
+ end case. 
end loop. 
end file. 
end input program. 
execute. 
!enddefine . 
multnomg ncases = 200 classes = 3 samsize = 125 
prop = [ 0.5 0.3 0.2 ] . 
multnomg ncases = 300 classes = 4 samsize = 30 norp = 1 
prop = [ 20 10 30 20 ] . 
multnomg ncases = 300 classes = 4 samsize = 30 
prop = [ 20 10 30 20 ] .