Chris,
  Here are two ways of approaching the problem.  They are fundamentally
different in their approaches.  I am sure there are other ways as
well, but these come immediately to mind.
Regards, David Marso
SPSS Consulting Services
--------
**************************************************
* A method which compares each variable to each
* preceding variable, clobbers the duplicates
* (sets them to system-missing) and then tallies
* the surviving instances
**************************************************.
data list free/ id  prog1 prog2 prog3 prog4 prog5.
begin data
001    345     345     876     509     345
002     .      220     220      .      350
end data.


* Copy the array * .
VECTOR P=PROG1 TO PROG5 / #TMP(5).
LOOP #I=1 to 5.
+  compute #TMP(#I)=P(#I).
END LOOP.

* Compare to preceding variables *.
LOOP #I=2 to 5.
+  LOOP #J=1 TO #I-1.
+    IF #TMP(#I)=#TMP(#J) #TMP(#I)=$SYSMIS.
+  END LOOP IF MISSING(#TMP(#I)).
END LOOP.
COMPUTE N=NVALID(#TMP1 TO #TMP5).
EXECUTE.
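* Hand trace on the sample data (a worked check, not program output) *.
* Case 001: #TMP starts as 345 345 876 509 345 and #TMP2 and #TMP5
* both equal #TMP1, so they become $SYSMIS and 345 876 509 remain (N=3) *.
* Case 002: #TMP starts as . 220 220 . 350 and #TMP3 equals #TMP2,
* so it becomes $SYSMIS and 220 350 remain (N=2) *.
* A comparison with a missing value is itself missing, so the IF is
* skipped and missing entries are never treated as duplicates *.
* LIST is added here only to display the counts for checking *.
LIST VARIABLES=ID N.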

**************************************************
* A method which restructures the data file into *
* one case per registration and then aggregates  *.
**************************************************.
data list free/ id  prog1 prog2 prog3 prog4 prog5.
begin data
001    345     345     876     509     345
002     .      220     220      .      350
end data.
*save file for later merge *.
SAVE OUTFILE 'TMP'.
VECTOR PROG=PROG1 TO PROG5.
LOOP P=1 TO 5.
+  COMPUTE PROGRAM=PROG(P).
+  DO IF NOT (MISSING(PROGRAM)).
+    XSAVE OUTFILE 'PROG' / KEEP ID PROGRAM.
+  END IF.
END LOOP.
EXECUTE.
GET FILE 'PROG' .
AGGREGATE OUTFILE * / BREAK ID PROGRAM / N=N.
AGGREGATE OUTFILE * / BREAK ID / N=N.
MATCH FILES FILE 'TMP' / FILE * / BY ID.
EXECUTE.
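* Hand trace on the sample data (a worked check, not program output) *.
* PROG holds one case per non-missing registration: ID 001 has 345 345
* 876 509 345 and ID 002 has 220 220 350 *.
* The first AGGREGATE leaves one case per ID and PROGRAM, that is
* three cases for ID 001 (345 509 876) and two for ID 002 (220 350) *.
* The second AGGREGATE counts those cases per ID, giving N=3 and N=2 *.
* Assumption to verify on real data: an ID whose five programs are all
* missing has no case in PROG, so N is system-missing after the match
* rather than 0 *.
* LIST is added here only to display the counts for checking *.
LIST VARIABLES=ID N.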

Chris Conway wrote:
>
> I'm posting this for a colleague:
>
> I'm working on a student registration data file.  Each unique student
> record has information on up to five different registrations.  For each
> of the five registrations, I have a variable indicating what program the
> student was registered in.
>
> The record layout therefore is:
>
> ID   prog1 prog2 prog3 prog4 prog5
>
> ID and Prog1 - prog5 are numeric.  There are missing data fields. What I
> want to do is determine how many unique programs the student has
> registered in?
>
> For example, for the following cases, the answers are "3 unique
> programs" and "2 unique programs" respectively:
>
> 001    345     345     876     509     345
> 002            220     220             350
> (id)  (prog1) (prog2) (prog3) (prog4) (prog5)
>
> Any help would be appreciated.
>
> Thanks, Chris Conway