*(Q) When comparing two groups (treated and untreated) it is useful to
adjust for confounding differences between the groups. Maybe, for instance,
one treatment receives "harder patients" than the other. One way of doing so
is to create what is called "propensity scores." Essentially the idea is
that we compare those who are similar to each other (=have similar
propensity scores). One way of creating these propensity scores is to use
logistic regression. I have done all this. The three key columns are then:
A: The column which says whether a patient has received the treatment (0 or 1)
B: A column with a propensity score (which says how likely it is that a
person was in the group receiving treatment given certain other values:
sex, gender, history, i.e. the values used in the logistic regression)
C: A column with the result of the treatment (e.g. absolute or percentage
improvement)

* Now, the question is not about the theory or about statistics; it is simply
this:
I want to create a fourth column of "control cases." The values in this
fourth column should be the improvement for the person who has the closest
propensity score (is most similar) to the treated person (for each row with
a treated person).

*(A)Posted to SPSSX-L by rlevesque@videotron.ca on 2001/11/07.
* http://www.spsstools.net

* The solution assumes that the number of cases receiving the treatment is known.
* This restriction could be removed if necessary.

* Create a data file for illustration purposes.

INPUT PROGRAM.
SET SEED=2365847.
LOOP caseid=1 TO 20.
COMPUTE treatm=TRUNC(UNIFORM(1)+.5).
COMPUTE propen=UNIFORM(100).
COMPUTE improv=UNIFORM(100).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
SORT CASES BY treatm(D) propen.
COMPUTE idx=$CASENUM.
SAVE OUTFILE='c:\temp\mydata.sav'.
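* Optional sanity check (not in the original answer): show how many cases
* have treatm=1. The nbtreat value passed to the macro call further down
* must equal this count. The value 7 used there comes from this seed,
* and a different seed or random number generator may give another count.
FREQUENCIES VARIABLES=treatm.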

* Erase the previous temporary results file, if any.
ERASE FILE='c:\temp\results.sav'.
* Create an empty data file (the variables of mydata.sav plus key) to receive results.
COMPUTE key=1.
SELECT IF (1=0).
SAVE OUTFILE='c:\temp\results.sav'.

********************************************.
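* Define a macro that, for each treated case (idx=1 to nbtreat), finds the
* untreated case whose propensity score is closest and appends that case to
* results.sav with key=idx. Controls are matched with replacement: the same
* untreated case can be chosen for several treated cases.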
********************************************.

SET MPRINT=no.
*////////////////////////////////.
DEFINE !match (nbtreat=!TOKENS(1))
!DO !cnt=1 !TO !nbtreat
GET FILE='c:\temp\mydata.sav'.
SELECT IF idx=!cnt OR treatm=0.

DO IF $CASENUM=1.
COMPUTE #target=propen.
ELSE.
COMPUTE delta=propen-#target.
END IF.
EXECUTE.
SELECT IF ~MISSING(delta).
COMPUTE delta=ABS(delta).

SORT CASES BY delta.
SELECT IF $CASENUM=1.
COMPUTE key=!cnt.
ADD FILES FILE=*
 /FILE='c:\temp\results.sav'.
SAVE OUTFILE='c:\temp\results.sav'.
!DOEND
!ENDDEFINE.
*////////////////////////////////.

SET MPRINT=yes.

**************************.
* Call macro (we know that there are 7 treatment cases).
**************************.
!match nbtreat=7.

* Sort results file to allow matching.

GET FILE='c:\temp\results.sav'.
SORT CASES BY key.
SAVE OUTFILE='c:\temp\results.sav'.

* Match each treated case with the most similar untreated case.

GET FILE='c:\temp\mydata.sav'.
MATCH FILES /FILE=*
 /FILE='c:\temp\results.sav'
 /RENAME (idx = d0) caseid=caseid2 improv=improv2 propen=propen2
  treatm=treatm2 key=idx
 /BY idx
 /DROP= d0.
EXECUTE.
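* Optional check (not in the original answer): list the matched pairs.
* For each treated case, caseid2, propen2 and improv2 describe the untreated
* case with the closest propensity score (improv2 is the requested control
* column), and delta is the absolute difference between the two scores.
* These columns are system-missing for untreated cases.
LIST VARIABLES=idx caseid treatm propen improv caseid2 propen2 improv2 delta.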

* That's it!.