SPSS AnswerNet: Result 

Solution ID: 100000760
Product: SPSS Base
Version:
O/S:
Question Type: Syntax/Batch/Scripting
Question Subtype: Data Transformations
Title:
Complex matching/sampling without replacement 
Description:
Q. 
I have a file containing three types of subjects. One 
group of subjects is of primary interest in my inquiry. 
A second group consists of subjects who are in the same 
family as the primary subjects. A third group consists of 
persons who are not related to the primary subjects. I need 
to match the first group of subjects with members of the 
first two groups by age and gender. In doing so, I need to 
ensure both that no person is matched with someone from his 
or her own family and that no one is sampled more than once 
from the file. I am not sure how to proceed. 
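A. 
The task is nontrivial, but it can be done as follows. The 
key is to attach to each primary case the IDs of all cases 
that share its AGE and GENDER values. One of these IDs is 
then drawn at random; the draw is rejected if it is the case 
itself, its family member (TWIN), or an ID that has already 
been used. An accepted ID is removed from further 
consideration, so no one is sampled more than once, and a 
primary case that has already been taken as someone else's 
match is skipped. The data from each matched pair are 
written to a new data set called 'YOKED.SAV'. All unmatched 
primary cases are written to another system file called 
'UNPAIRED.SAV'. The SPSS program below can be adapted by 
changing the constant 20 (in VECTOR ID_(20) and in the 
ID_1 TO ID_20 and ID_X01 TO ID_X20 lists) to the maximum 
number of cases in any GENDER/AGE combination. Because TWIN 
holds only one other family member, the program assumes 
that each family has at most two members. 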
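For readers who want to check the matching logic outside SPSS, here is a minimal Python sketch of the same idea. It is not part of the SPSS job, and the function name match_primaries and its dictionary keys are hypothetical. It differs from the SPSS program in two deliberate ways: it treats every case from the same FAMILY as ineligible (the SPSS code excludes only the single TWIN), and it draws with Python's random module instead of UNIFORM, so it will not reproduce the listing at the end. The max_tries parameter plays the role of the cap on the SPSS LOOP.

```python
import random
from collections import defaultdict


def match_primaries(cases, max_tries=40, seed=None):
    """Pair each primary case with a random same-AGE/GENDER case from another family.

    Hypothetical illustration of the SPSS logic, not the original program.
    cases: list of dicts with keys "oldseq", "family", "age", "gender", "primary"
    (primary == 1 marks a primary subject, as in the SPSS data).
    Returns (yoked, unpaired): yoked is a list of (primary_id, match_id) tuples;
    unpaired lists the IDs of primary cases for which no match was found.
    """
    rng = random.Random(seed)
    family_of = {c["oldseq"]: c["family"] for c in cases}

    # Collect the IDs in each AGE/GENDER stratum (the ID_X vector in SPSS).
    strata = defaultdict(list)
    for c in cases:
        strata[(c["age"], c["gender"])].append(c["oldseq"])

    available = set(family_of)  # IDs not yet used in any pair
    yoked, unpaired = [], []
    for c in sorted(cases, key=lambda r: (r["age"], r["gender"], r["oldseq"])):
        me = c["oldseq"]
        # Skip non-primary cases and primaries already taken as someone's match.
        if c["primary"] != 1 or me not in available:
            continue
        pool = strata[(c["age"], c["gender"])]
        match = None
        for _ in range(max_tries):
            cand = rng.choice(pool)  # random draw from the stratum, like UNIFORM in SPSS
            # Reject self, same-family cases, and IDs already used.
            if cand != me and cand in available and family_of[cand] != c["family"]:
                match = cand
                break
        if match is None:
            unpaired.append(me)
        else:
            available -= {me, match}  # sampling without replacement
            yoked.append((me, match))
    return yoked, unpaired
```

Like the SPSS version, it searches by random draws with a fixed number of retries, so a primary case can occasionally end up unpaired even though an eligible partner exists; drawing directly from the list of eligible IDs would avoid that, at the cost of departing further from the original program.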
data list FREE/ FAMILY AGE GENDER Z PRIMARY. 
BEGIN DATA 
1 1 1 2 1 1 2 1 4 2 2 3 2 2 1 2 4 2 3 2 3 3 2 3 1 3 2 1 2 2 
4 4 2 3 1 4 3 2 1 2 5 1 1 3 1 5 2 2 2 2 6 3 1 2 1 6 4 1 2 2 
7 2 2 2 1 7 3 2 1 2 8 2 1 2 1 8 3 2 1 2 9 1 1 1 1 9 2 2 3 2 
10 3 2 9 1 10 2 1 1 2 11 2 1 2 1 11 2 1 2 2 12 3 1 2 1 12 3 2 1 2 
13 1 1 3 1 14 2 2 1 1 15 1 2 2 1 16 2 1 5 1 17 4 2 2 1 18 3 2 1 1 
19 4 1 5 1 20 4 1 1 1 21 1 2 5 1 22 3 2 4 1 23 4 1 5 1 24 2 2 8 1 
25 4 1 2 1 26 3 2 1 1 27 2 1 5 1 28 4 1 1 1 29 3 2 3 1 30 2 2 6 1 
31 4 1 4 1 32 3 1 2 1 33 1 2 2 1 34 3 1 1 1 35 4 2 1 1 36 2 1 4 1 
END DATA . 
COMPUTE OLDSEQ=$CASENUM. 
SAVE OUTFILE 'RAWDATA.SAV' . 

* For each primary case, store the ID of its family member in TWIN *.
* Then attach to every case the IDs of all cases with matching AGE/GENDER *.
SORT CASES BY FAMILY (A) PRIMARY (D). 
IF FAMILY=LAG(FAMILY) TWIN=LAG(OLDSEQ). 

* Number the cases within each AGE/GENDER stratum (STCNT) *.
SORT CASES BY AGE GENDER . 
IF (MISSING(LAG(OLDSEQ))) STCNT=1. 
IF (AGE=LAG(AGE) AND GENDER=LAG(GENDER)) STCNT=LAG(STCNT) +1. 
IF (MISSING(STCNT)) STCNT=1. 

* Spread ID values within AGE/GENDER into ID_X01 TO ID_X20 *.
* Keep only primary subjects and attach the stratum IDs to them *.
VECTOR ID_(20). 
COMPUTE ID_(STCNT)=OLDSEQ. 
AGGREGATE OUTFILE 'F:\TEMP\TMP.SAV' 
/ PRESORTED/ BREAK=AGE GENDER 
/ TWIN=FIRST(TWIN) / ID_X01 TO ID_X20 = MAX(ID_1 TO ID_20). 
SELECT IF PRIMARY=1. 
MATCH FILES FILE=* / TABLE='F:\TEMP\TMP.SAV' / BY AGE GENDER . 

* Test if the current case is in the same stratum as the previous case *.
* If so, inherit the available-ID flags of the previous case *.
COUNT MAXVALID=ID_X01 TO ID_X20 (LO THRU HI). 
DO IF AGE=LAG(AGE) AND GENDER=LAG(GENDER) . 
+ DO REPEAT ID=ID_X01 TO ID_X20 . 
+COMPUTE ID = LAG(ID) . 
+ END REPEAT. 
END IF. 

* Traverse the vector of available ID flags *. 
VECTOR ID_X=ID_X01 TO ID_X20 . 

* Process the case only if its own ID is still available, *.
* that is, it has not already been taken as the match of another case *.
DO IF NOT MISSING (ID_X(STCNT) ). 

* Initialize status flags *. 
+ DO REPEAT INIT=TAKEN FOUND NTRYS . 
+ COMPUTE INIT=0. 
+ END REPEAT. 

* Search the vector, copy into YOKE and destroy originals *.
* Give up after 40 tries and discard any rejected candidate *.
+ LOOP.
+ COMPUTE NTRYS=NTRYS+1.
+ COMPUTE WHICH=TRUNC(UNIFORM(MAXVALID))+1.
+ IF WHICH <> STCNT YOKE=ID_X(WHICH).
+ DO IF (NOT (ANY(YOKE,TWIN,OLDSEQ,$SYSMIS))) .
+ COMPUTE ID_X(WHICH)=$SYSMIS .
+ COMPUTE ID_X(STCNT)=$SYSMIS .
+ COMPUTE FOUND=1.
+ END IF .
+ END LOOP IF FOUND OR NTRYS >= 40.
+ IF (NOT FOUND) YOKE=$SYSMIS.
ELSE. 
COMPUTE TAKEN=1. 
END IF. 

* Partition data set and associate data with ID numbers * . 
TEMPORARY . 
SELECT IF (MISSING (YOKE) AND NOT(TAKEN) ). 
SAVE OUTFILE 'UNPAIRED.SAV' . 
SELECT IF (NOT (TAKEN) AND NOT MISSING (YOKE) ). 
SAVE OUTFILE 'TMP' / KEEP OLDSEQ YOKE . 
GET FILE 'TMP'. 
COMPUTE GROUP=$CASENUM . 

* Write one record per member of each pair, tagged by GROUP *.
VECTOR V = OLDSEQ TO YOKE . 
LOOP X=1 TO 2. 
COMPUTE OLDSEQ=V(X). 
XSAVE OUTFILE 'TMP2' / KEEP GROUP X OLDSEQ . 
END LOOP. 
EXECUTE. 

* Attach the original data (in this case Z) to each record *.
GET FILE 'TMP2'. 
SORT CASES BY OLDSEQ. 
MATCH FILES FILE = * / TABLE = 'RAWDATA.SAV' / BY OLDSEQ . 
SORT CASES BY GROUP. 
******************************************************** . 
* Spread paired sets of variables: * . 
* To carry more than one variable, generalize: * .
* HINT! * . 
* VECTOR ID(2). * . 
* DO REPEAT VAR = Y Z . * . 
* VECTOR VAR(2) . * . 
* COMPUTE VAR(X) = VAR . * . 
* END REPEAT . * . 
* AGGREGATE ...../ Y1 Y2 Z1 Z2 = MAX(Y1 Y2 Z1 Z2).* . 
******************************************************** . 
VECTOR Z(2) / ID(2). 
COMPUTE Z(X) = Z. 
COMPUTE ID(X)=OLDSEQ. 
AGGREGATE OUTFILE * / PRESORTED / BREAK=GROUP GENDER AGE 
/ Z1 TO Z2 = MAX(Z1 TO Z2) / ID1 TO ID2 = MAX(ID1 TO ID2). 
SAVE OUTFILE 'YOKED.SAV'. 
LIST. 

* The resulting file ! * . 
GROUP  GENDER  AGE  Z1  Z2  ID1  ID2
    1       1    1   2   3    1    9
    2       1    1   1   3   17   25
    3       2    1   2   5   27   33
    4       1    2   2   4   15   48
    5       1    2   2   1   21   20
    6       1    2   5   4   28    2
    7       1    2   5   2   39    6
    8       2    2   2   2   13   10
    9       2    2   1   3   26   18
   10       2    2   8   6   36   42
   11       1    3   2   2   11   23
   12       1    3   2   1   44   46
   13       2    3   2   3    3    5
   14       2    3   9   3   19   41
   15       2    3   1   1   30    8
   16       2    3   4   1   34   38
   17       1    4   5   2   31   12
   18       1    4   1   5   32   35
   19       1    4   2   1   37   40
   20       2    4   3   1    7   47
   21       2    4   2   3   29    4
Number of cases read: 21    Number of cases listed: 21
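The last part of the program (the XSAVE loop, the MATCH FILES with RAWDATA.SAV, and the AGGREGATE) turns each pair into one wide record. As a hedged illustration, the hypothetical Python function spread_pairs below mirrors those three steps on plain dictionaries. It assumes the pairs are given in the order they were saved to 'TMP' and that the raw data are keyed by OLDSEQ; it is a sketch of the reshaping, not a replacement for the SPSS syntax.

```python
def spread_pairs(pairs, raw):
    """Hypothetical mirror of the final SPSS reshaping steps.

    pairs: list of (oldseq, yoke) tuples, in the order saved to 'TMP'.
    raw: dict mapping oldseq -> {"age": ..., "gender": ..., "z": ...}.
    Returns one tuple per pair: (group, gender, age, z1, z2, id1, id2).
    """
    # Step 1: one long record per pair member (the LOOP X=1 TO 2 / XSAVE step).
    long_rows = []
    for group, pair in enumerate(pairs, start=1):
        for x, oldseq in enumerate(pair, start=1):
            long_rows.append({"group": group, "x": x, "oldseq": oldseq})

    # Step 2: attach the original data by OLDSEQ (MATCH FILES ... /TABLE).
    for r in long_rows:
        r.update(raw[r["oldseq"]])

    # Step 3: spread Z and the ID into Z1/Z2 and ID1/ID2 per GROUP (the AGGREGATE).
    wide = {}
    for r in sorted(long_rows, key=lambda r: (r["group"], r["x"])):
        row = wide.setdefault(r["group"], {"gender": r["gender"], "age": r["age"]})
        row[f"z{r['x']}"] = r["z"]
        row[f"id{r['x']}"] = r["oldseq"]

    return [
        (g, w["gender"], w["age"], w["z1"], w["z2"], w["id1"], w["id2"])
        for g, w in sorted(wide.items())
    ]
```

For example, passing the pairs (1, 9) and (17, 25) together with their AGE, GENDER and Z values from the data reproduces the first two rows of the listing above.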