*(Q) Received by email on 2002/05/09.
I have reaction time data for manipulated visual comparisons. Of course it is 
highly positively skewed, since participants do 2 hours of responses to what 
takes on average 2 to 3 seconds. It seems there are many cases where they 
got bored or spaced out for 20 seconds or so, or moved around in their 
chair. There are also several cases where their reaction time was less than 
100 milliseconds (college kids are fast, but not that fast). Anyhow, let me 
get to the point. I want to replace outliers with the mean reaction time, 
BUT based on the mean reaction time for that participant as well as on two 
independent variables. Let me sketch an example data file....

ID sides change RT
1  4     7      750
1  4     10     1000
1  5     7      850
1  4     10     22000
2  4     7      14750
2  4     10     1000
2  5     7      50
2  4     10     900
...

ID = participant ID
sides = number of sides of the shape
change = percent difference in shape
RT = latency
outliers = RT < 600 or RT > 10000

Basically, find each outlier and identify which participant it belongs to, how 
many sides, and what percent change. Then replace that outlier with the 
participant's mean reaction time over their other responses to cases with 
that number of sides and that percent change.

*(A) By Ray on 2002/05/09.


DATA LIST LIST /ID sides change RT.
BEGIN DATA
1  4     7      750
1  4     10     1000
1  5     7      850
1  4     10     22000
2  4     7      14750
2  4     10     1000
2  5     7      50
2  4     10     900
END DATA.
LIST.

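* Save the full data set, sorted by the matching keys.
SORT CASES BY id sides change.
SAVE OUTFILE='c:\temp\original data.sav'.

* Keep only in-range RTs and compute the mean for each combination of id, sides and change.
SELECT IF RANGE(rt,600,10000).
AGGREGATE
  /OUTFILE='c:\temp\aggr.sav'
  /BREAK=id sides change
  /mean_rt = MEAN(rt).

* Attach the cell means to the full data set and replace the outliers.
MATCH FILES /FILE='c:\temp\original data.sav'
  /TABLE='c:\temp\aggr.sav'
  /BY=id sides change.
COMPUTE newrt=rt.
IF ~RANGE(rt,600,10000) newrt=mean_rt.
EXECUTE.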

* Note that 2 cases end up with missing values because there are no in-range cases to 
	compare them with. You could leave these cases as missing, keep their 
	current values, or apply the overall mean to them. Many options exist.

* It would also be possible (but this is more complicated) to use a case which 
	is 'similar'. For an example, see syntax number 12 in
	http://pages.infinit.net/rlevesqu/SampleSyntax.htm#RandomSampling.