Solution ID: 100000322
Question Subtype: Statistical Distributions

Title:
Generating multivariate hypergeometric random variables in SPSS 
Description:
Q. 
How can I use SPSS to generate variables with a multivariate 
hypergeometric distribution for a specified number of cases? 
A. 
If you draw n observations without replacement from a population 
with k classes of objects, where k>2, the k numbers of objects 
sampled from the respective classes have a multivariate hypergeometric 
distribution. The following macro generates the cases and variables 
with such a distribution. You supply the macro with the number of 
cases to be generated (ncases), the number of classes of objects 
(classes), the number of objects to sample (or 'draw') for each case 
(samsize), and the population sizes for each of these classes (popc). 
The algorithm is similar to the direct, or ball-in-urn, method for 
generating a multinomial distribution [see Johnson, N. L., Kotz, S., 
& Balakrishnan, N. (1997). Discrete Multivariate Distributions. 
Wiley]. 
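
For reference, if class i holds N_i objects, N is the sum of the N_i, 
and the counts x_1 to x_k add up to n, then the probability of drawing 
exactly those counts is [C(N_1,x_1) * ... * C(N_k,x_k)] / C(N,n), 
where C(a,b) is the binomial coefficient. The short Python sketch below 
(Python, not SPSS syntax; the function name mvhyp_pmf is made up for 
illustration) evaluates that formula. It may be handy for checking 
generated data against theory. 

```python
from math import comb


def mvhyp_pmf(counts, pop_sizes):
    """Probability of drawing exactly `counts` objects from classes of
    sizes `pop_sizes` when sum(counts) objects are sampled without
    replacement. (Illustrative helper, not part of the SPSS macro.)"""
    if len(counts) != len(pop_sizes):
        raise ValueError("counts and pop_sizes must have the same length")
    numerator = 1
    for x, size in zip(counts, pop_sizes):
        if x < 0 or x > size:
            return 0.0  # impossible outcome: more drawn than available
        numerator *= comb(size, x)
    return numerator / comb(sum(pop_sizes), sum(counts))


# Example: 3 classes of sizes 50, 30, 20 and a sample of 25 objects.
p = mvhyp_pmf([12, 8, 5], [50, 30, 20])
```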

1. The population sizes for each class (pop1 to pop(k)) are 
initialized from the respective values of popc, and the sample counts 
for all classes (sam1 to sam(k)) are initialized to 0. The total 
population size is calculated as the sum of the population sizes for 
the classes, i.e., the sum of pop1 to pop(k), and stored as poptot. 

2. For each of the samsize sample units to be drawn: 
(i) A discrete uniform random number from 1 to poptot is drawn 
and stored as y. 
(ii) The k classes are considered in order, and the variable psum 
holds the sum of the class populations considered to that point. If y 
is less than or equal to psum but greater than the value of psum for 
the previously considered classes, the observation is counted as a 
draw from the current class. The sample count for that class is 
incremented by 1 and its population size is decreased by 1, as is 
poptot. [Note that psum is not decremented after a match, so there is 
no danger of a single y matching the ranges of 2 adjacent classes.] 
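
The two steps above can be sketched outside SPSS as follows. This is a 
Python illustration only, not the macro itself. The names mvhyp_draw 
and mvhypgen_py and the seed argument are assumptions made for the 
example. Unlike the macro, the sketch stops scanning classes as soon as 
y falls within the running sum, which has the same effect as the 
two-sided range check, and it rejects a sample size larger than the 
population. 

```python
import random


def mvhyp_draw(pop_sizes, sample_size, rng=random):
    """Return one vector of class counts drawn without replacement."""
    pop = list(pop_sizes)      # remaining objects per class (pop1..popk)
    sam = [0] * len(pop)       # objects drawn per class (sam1..samk)
    poptot = sum(pop)          # total objects still in the urn
    if sample_size > poptot:
        raise ValueError("sample_size exceeds the total population")
    for _ in range(sample_size):
        y = rng.randint(1, poptot)   # discrete uniform on 1..poptot
        psum = 0
        for k, size in enumerate(pop):
            psum += size             # cumulative population so far
            if y <= psum:            # y falls in class k's range
                sam[k] += 1
                pop[k] -= 1
                poptot -= 1
                break                # one draw matches one class only
    return sam


def mvhypgen_py(ncases, pop_sizes, sample_size, seed=None):
    """Generate `ncases` independent count vectors (one per case)."""
    rng = random.Random(seed)
    return [mvhyp_draw(pop_sizes, sample_size, rng) for _ in range(ncases)]


# Mirrors the first example call: 200 cases, classes of 50, 30 and 20.
cases = mvhypgen_py(200, [50, 30, 20], 25, seed=12345)
```

Each element of cases plays the role of one generated case, with its 
entries corresponding to sam1 to sam3. 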
* macro to generate a multivariate hypergeometric distribution. 
* First example call has 3 classes with pop sizes of 50, 30, & 20. 
* 25 items are sampled without replacement and 
* sam1 to sam3 hold the respective counts. 
* 200 cases are generated. 
* Second example call has 4 classes with pop sizes of 20, 10, 30, & 20. 
* 30 items are sampled without replacement and 
* sam1 to sam4 hold the respective counts. 
* 300 cases are generated . 
* . 

*************************************************************. 
define mvhypgen 
   (ncases  = !tokens(1) 
   /classes = !tokens(1) 
   /samsize = !tokens(1) 
   /popc    = !enclose('[',']') ). 
new file. 
input program. 
loop id = 1 to !ncases. 
vector pop sam (!classes, F8). 
+  do repeat popn = pop1 to !concat('pop',!classes) 
             /samn = sam1 to !concat('sam',!classes) 
             /pc   = !popc. 
+    compute popn = pc. 
+    compute samn = 0. 
+  end repeat. 
+  compute poptot = sum(pop1 to !concat('pop',!classes)). 
+  loop #j = 1 to !samsize. 
+    compute y = trunc(uniform(poptot)) + 1. 
+    compute psum = 0. 
+    loop #k = 1 to !classes. 
+      compute psum = psum + pop(#k). 
+      do if (y le psum and y gt (psum - pop(#k))). 
+        compute sam(#k) = sam(#k) + 1. 
+        compute pop(#k) = pop(#k) - 1. 
+        compute poptot = poptot - 1. 
+      end if. 
+    end loop. 
+  end loop. 
+  end case. 
end loop. 
end file. 
end input program. 
execute. 
!enddefine. 

mvhypgen ncases = 200 classes = 3 samsize = 25 
   popc = [ 50 30 20 ]. 
mvhypgen ncases = 300 classes = 4 samsize = 30 
   popc = [ 20 10 30 20 ].
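

One way to sanity-check the generated counts is to compare their 
observed means and variances (for instance from a DESCRIPTIVES run on 
sam1 to sam(k) after each call) with the theoretical values. For class 
i, the mean is n*N_i/N and the variance is 
n*(N_i/N)*(1 - N_i/N)*(N - n)/(N - 1). The Python sketch below (not 
SPSS syntax; mvhyp_moments is a hypothetical helper name) computes 
these for the two example calls. Because the macro begins with NEW 
FILE, each call replaces the previous data, so the check would have to 
be run after each call in turn. 

```python
def mvhyp_moments(pop_sizes, sample_size):
    """Theoretical mean and variance of each class count when
    `sample_size` objects are drawn without replacement."""
    N = sum(pop_sizes)
    n = sample_size
    means = [n * Ni / N for Ni in pop_sizes]
    variances = [
        n * (Ni / N) * (1 - Ni / N) * (N - n) / (N - 1)
        for Ni in pop_sizes
    ]
    return means, variances


# The two example calls from the macro listing.
examples = [([50, 30, 20], 25), ([20, 10, 30, 20], 30)]
for pops, n in examples:
    means, variances = mvhyp_moments(pops, n)
    print("popc =", pops, "samsize =", n)
    print("  means:    ", means)
    print("  variances:", [round(v, 4) for v in variances])
```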