* SPSS PROCEDURE FOR CALCULATING White's Standard Errors for Large, Intermediate and Small Samples.



*(i)  HC0: This is the original White (1980) procedure applicable when sample sizes are large (n > 500).

* 1st step: Open up your data file and save it under a new name since the following procedure will alter it.
* 2nd step: Run your OLS regression and save the UNSTANDARDISED residuals as RES_1.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT mp_pc
  /METHOD=ENTER xp_pc   gdp_pc
  /SAVE RESID(RES_1) .

* 3rd step: Create a variable called ESQ = square of those residuals.

COMPUTE ESQ = RES_1 * RES_1.
EXECUTE.

* 4th step: create a variable called CONSTANT = constant of value 1 for all observations in the sample.

FILTER OFF.
USE ALL.
EXECUTE .
COMPUTE CONSTANT = 1.
EXECUTE.

* 5th step: Filter out missing values before entering matrix syntax mode.

FILTER OFF.
USE ALL.
SELECT IF(MISSING(ESQ) = 0).
EXECUTE .

* 6th step: Tell the matrix routine to get your variables, then use matrix syntax to calculate White's standard errors for large samples.
* You need to enter the names of the Y and X variables from your regression here.
* Note that the only thing you need to do is alter the variable names in lines 2 and 3 below (the GET Y and GET X commands) so that they match those of your regression.
MATRIX.
GET Y / VARIABLES = mp_pc.   
GET X / VARIABLES = CONSTANT, xp_pc, gdp_pc  
/ NAMES = XTITLES.
GET RESIDUAL / VARIABLES = RES_1.
GET ESQ / VARIABLES = ESQ.
COMPUTE XRTITLES = TRANSPOS(XTITLES).
COMPUTE N = NROW(ESQ).
COMPUTE K = NCOL(X).
COMPUTE O = MDIAG(ESQ).
COMPUTE WHITEV = (INV(TRANSPOS(X) * X)) *TRANSPOS(X)* O * X*INV(TRANSPOS(X) * X).
COMPUTE WDIAG = DIAG(WHITEV).
COMPUTE WHITE_SE = SQRT(WDIAG).
PRINT WHITE_SE 
  / FORMAT = "E13"
  / TITLE = "White's (Large Sample) Corrected Standard Errors"
  / RNAMES = XRTITLES.
COMPUTE B = (INV(TRANSPOS(X) * X)) * (TRANSPOS(X) * Y).
PRINT B
/ FORMAT = "E13"
/TITLE = "OLS Coefficients"
/  RNAMES = XRTITLES.
COMPUTE WT_VAL = B / WHITE_SE.
PRINT WT_VAL
/ FORMAT = "E13"
/ TITLE = "t-values based on Whites (large sample) corrected SEs"
/  RNAMES = XRTITLES.
COMPUTE SIG_WT = 2*(1- TCDF(ABS(WT_VAL), N-K)) .
PRINT SIG_WT
/ FORMAT = "E13"
/ TITLE = "Prob(t < tc) based on Whites (large n) SEs"
/  RNAMES = XRTITLES.
COMPUTE SIGMASQ = (TRANSPOS(RESIDUAL)*RESIDUAL)/(N-K).
COMPUTE SE_SQ = SIGMASQ*INV(TRANSPOS(X)*X).
COMPUTE SESQ_ABS = ABS(SE_SQ).
COMPUTE SE = SQRT(DIAG(SESQ_ABS)).
PRINT SE
  / FORMAT = "E13"
  / TITLE = "OLS Standard Errors"
  / RNAMES = XRTITLES.
COMPUTE OLST_VAL = B / SE.
PRINT OLST_VAL
/ FORMAT = "E13"
/ TITLE = "OLS t-values"
/  RNAMES = XRTITLES.
COMPUTE SIG_OLST = 2*(1- TCDF(ABS(OLST_VAL), N-K)) .
PRINT SIG_OLST
/ FORMAT = "E13"
/ TITLE = "Prob(t < tc) based on OLS SEs"
/  RNAMES = XRTITLES.
COMPUTE WESTIM = {B, SE, WHITE_SE, WT_VAL, SIG_WT}.
PRINT WESTIM 
/ FORMAT = "E13"
/ RNAMES = XRTITLES
/ CLABELS = B, SE, WHITE_SE, WT_VAL, SIG_WT.
END MATRIX. 

* Notes:.
* - Don't save your data file under the same name, since the above procedure removes all observations with missing values from the data.
* - If you already have a variable called RES_1, you will need to delete or rename it before you run the syntax. This means that if you run the procedure on several regressions, you will need to delete the newly created RES_1 and ESQ variables after each run.
* - The output uses scientific notation, so 20.7 is written as 2.07E+01 and 0.00043 is written as 4.3E-04.
* - The last table just collects together the results of five of the other tables.
* - "WT_VAL" is an abbreviation for "White's t-values" and "SIG_WT" is the significance level of these t-values.

* Example of White's Standard Errors:.
* If we run the matrix syntax on our earlier regression of floor area on age of dwelling, bedrooms and bathrooms, we get the following output.

*Run MATRIX procedure:.

*White's (Large Sample) Corrected Standard Errors.
*CONSTANT  4.043030E-02.
*AGE_DWEL  1.715285E-04.
*BATHROOM  2.735781E-02.
*BEDROOMS  1.284207E-02.

*OLS Coefficients.
*CONSTANT  3.536550E+00.
*AGE_DWEL  1.584464E-03.
*BATHROOM  2.258710E-01.
*BEDROOMS  2.721069E-01.

*t-values based on Whites (large sample) corrected SEs.
*CONSTANT  8.747276E+01.
*AGE_DWEL  9.237322E+00.
*BATHROOM  8.256180E+00.
*BEDROOMS  2.118870E+01.

*Prob(t < tc) based on Whites (large n) SEs.
*CONSTANT  0.000000E+00.
*AGE_DWEL  0.000000E+00.
*BATHROOM  2.220446E-16.
*BEDROOMS  0.000000E+00.

*OLS Standard Errors.
*CONSTANT  3.514394E-02.
*AGE_DWEL  1.640008E-04.
*BATHROOM  2.500197E-02.
*BEDROOMS  1.155493E-02.

*OLS t-values.
*CONSTANT  1.006304E+02.
*AGE_DWEL  9.661319E+00.
*BATHROOM  9.034130E+00.
*BEDROOMS  2.354899E+01.

*Prob(t < tc) based on OLS SEs.
*CONSTANT  0.000000E+00.
*AGE_DWEL  0.000000E+00.
*BATHROOM  0.000000E+00.
*BEDROOMS  0.000000E+00.

*WESTIM.
*                     B            SE      WHITE_SE        WT_VAL        SIG_WT.
*CONSTANT  3.536550E+00  3.514394E-02  4.043030E-02  8.747276E+01  0.000000E+00.
*AGE_DWEL  1.584464E-03  1.640008E-04  1.715285E-04  9.237322E+00  0.000000E+00.
*BATHROOM  2.258710E-01  2.500197E-02  2.735781E-02  8.256180E+00  2.220446E-16.
*BEDROOMS  2.721069E-01  1.155493E-02  1.284207E-02  2.118870E+01  0.000000E+00.
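
* A possible lower-memory variant of the HC0 calculation (a sketch, not part of the original procedure).
* MDIAG(ESQ) builds an n x n matrix, whereas the same middle term can be formed by weighting each row of X by its squared residual.
* The names XE, XOX, XXINV, HC0V and HC0_SE are illustrative, and CONSTANT, ESQ and the regressors are assumed to exist as created in steps 1 to 5.
MATRIX.
GET X / VARIABLES = CONSTANT, xp_pc, gdp_pc
/ NAMES = XTITLES.
GET ESQ / VARIABLES = ESQ.
COMPUTE XRTITLES = TRANSPOS(XTITLES).
COMPUTE K = NCOL(X).
COMPUTE XE = X &* (ESQ * MAKE(1, K, 1)).
COMPUTE XOX = TRANSPOS(XE) * X.
COMPUTE XXINV = INV(TRANSPOS(X) * X).
COMPUTE HC0V = XXINV * XOX * XXINV.
COMPUTE HC0_SE = SQRT(DIAG(HC0V)).
PRINT HC0_SE
  / FORMAT = "E13"
  / TITLE = "HC0 Standard Errors (computed without the n x n matrix)"
  / RNAMES = XRTITLES.
END MATRIX.
* The standard errors should agree with WHITE_SE from the main routine up to rounding, since only the order of the arithmetic differs.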




*(ii) HC2 and HC3: Matrix Procedure for Corrected Standard Errors when the sample is small (n < 500).

*When the sample size is small, it has been found that White's standard errors are not reliable.
*MacKinnon and White (1985) proposed three alternative estimators to be used when the sample size is small.
*Long and Ervin (1999) found that the third of these, which they call HC3, is the most reliable.
*But unless your computer has a great deal of RAM, you may run into difficulties if your sample size is greater than 250.
*As a result, I would recommend the following:.

*n < 250:        use HC3 irrespective of whether your tests for heteroscedasticity prove positive (Long and Ervin found that the tests are not very powerful in small samples).
*250 < n < 500:  use HC2 since this is more reliable than HC0 (HC0 = White's original SE as computed above).
*n > 500:        use either HC2 or HC0.
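
* One way to see which of the three cases applies is to count the cases left after step 5 (a suggestion, not part of the original procedure).
* DESCRIPTIVES reports N for ESQ, which should equal the n used by the MATRIX routines.
DESCRIPTIVES VARIABLES = ESQ
  /STATISTICS = MEAN.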

*Syntax for computing HC2 is presented below.  Follow the first 5 steps as before, and then run the following:


*HC2.
MATRIX.
GET Y / VARIABLES = flarea_l.   
GET X / VARIABLES = CONSTANT, age_dwel, bathroom, bedrooms
/ NAMES = XTITLES.
GET RESIDUAL / VARIABLES = RES_1.
GET ESQ / VARIABLES = ESQ.
COMPUTE XRTITLES = TRANSPOS(XTITLES).
COMPUTE N = NROW(ESQ).
COMPUTE K = NCOL(X).
COMPUTE O = MDIAG(ESQ).
/*Computing HC2*/.
COMPUTE XX = TRANSPOS(X) * X.
COMPUTE XX_1 = INV(XX).
COMPUTE X_1 = TRANSPOS(X).
COMPUTE H = X*XX_1*X_1.
COMPUTE H_MONE =  h * -1.
COMPUTE ONE_H = H_MONE + 1.
COMPUTE O_HC2 = O &/ ONE_H.
COMPUTE HC2_a = XX_1 * X_1 *O_HC2.
COMPUTE HC2 = HC2_a * X*XX_1.
COMPUTE HC2DIAG = DIAG(HC2).
COMPUTE HC2_SE = SQRT(HC2DIAG).
PRINT HC2_SE 
  / FORMAT = "E13"
  / TITLE = "HC2 Small Sample Corrected Standard Errors"
  / RNAMES = XRTITLES.
COMPUTE B = XX_1 * X_1 * Y.
PRINT B
/ FORMAT = "E13"
/TITLE = "OLS Coefficients"
/  RNAMES = XRTITLES.
COMPUTE HC2_TVAL = B / HC2_SE.
PRINT HC2_TVAL
/ FORMAT = "E13"
/ TITLE = "t-values based on HC2 corrected SEs"
/  RNAMES = XRTITLES.
COMPUTE SIG_HC2T = 2*(1- TCDF(ABS(HC2_TVAL), N-K)) .
PRINT SIG_HC2T
/ FORMAT = "E13"
/ TITLE = "Prob(t < tc) based on HC2 SEs"
/  RNAMES = XRTITLES.
END MATRIX.

*Sample output from this syntax is as follows:.

*HC2 Small Sample Corrected Standard Errors.
*CONSTANT  4.077517E-02.
*AGE_DWEL  1.726199E-04.
*BATHROOM  2.761153E-02.
*BEDROOMS  1.293651E-02.

*OLS Coefficients.
*CONSTANT  3.536550E+00.
*AGE_DWEL  1.584464E-03.
*BATHROOM  2.258710E-01.
*BEDROOMS  2.721069E-01.

*t-values based on HC2 corrected SEs.
*CONSTANT  8.673291E+01.
*AGE_DWEL  9.178915E+00.
*BATHROOM  8.180314E+00.
*BEDROOMS  2.103402E+01.

*Prob(t < tc) based on HC2 SEs.
*CONSTANT  0.000000E+00.
*AGE_DWEL  0.000000E+00.
*BATHROOM  1.998401E-15.
*BEDROOMS  0.000000E+00.

*For HC3, you need to make sure that your sample is not too large, otherwise the computer may crash.
*You can temporarily draw a random sub-sample by using the TEMPORARY and SAMPLE p commands, where p is the proportion of the sample to keep (e.g. if p = 0.4, you have selected approximately 40% of your sample for the following operations).


*HC3.
/*when Computing HC3 make sure n is < 250 (e.g. use TEMPORARY. SAMPLE 0.4.) */.
TEMPORARY.
SAMPLE 0.4.
MATRIX.
GET Y / VARIABLES = flarea_l.   
GET X / VARIABLES = CONSTANT, age_dwel, bathroom, bedrooms
/ NAMES = XTITLES.
GET RESIDUAL / VARIABLES = RES_1.
GET ESQ / VARIABLES = ESQ.
COMPUTE XRTITLES = TRANSPOS(XTITLES).
COMPUTE N = NROW(ESQ).
COMPUTE K = NCOL(X).
COMPUTE O = MDIAG(ESQ).
COMPUTE XX = TRANSPOS(X) * X.
COMPUTE XX_1 = INV(XX).
COMPUTE X_1 = TRANSPOS(X).
COMPUTE H = X*XX_1*X_1.
COMPUTE H_MONE =  h * -1.
COMPUTE ONE_H = H_MONE + 1.
/*Computing HC3*/.
COMPUTE  ONE_H_SQ = ONE_H &** 2.
COMPUTE O_HC3 = O &/ ONE_H_SQ.
COMPUTE HC3_a = XX_1 * X_1 *O_HC3.
COMPUTE HC3 = HC3_a * X*XX_1.
COMPUTE HC3DIAG = DIAG(HC3).
COMPUTE HC3_SE = SQRT(HC3DIAG).
COMPUTE B = XX_1 * X_1 * Y.
PRINT B
/ FORMAT = "E13"
/TITLE = "OLS Coefficients".
PRINT HC3_SE  
  / FORMAT = "E13"
  / TITLE = "HC3 Small Sample Corrected Standard Errors"
  / RNAMES = XRTITLES.
COMPUTE HC3_TVAL = B / HC3_SE.
PRINT HC3_TVAL
/ FORMAT = "E13"
/ TITLE = "t-values based on HC3 corrected SEs"
/  RNAMES = XRTITLES.
COMPUTE SIG_HC3T = 2*(1- TCDF(ABS(HC3_TVAL), N-K)) .
PRINT SIG_HC3T
/ FORMAT = "E13"
/ TITLE = "Prob(t < tc) based on HC3 SEs"
/  RNAMES = XRTITLES.
END MATRIX.

*Sample output from the above syntax is as follows:.

*OLS Coefficients.
*  3.530325E+00.
*  1.546620E-03.
*  2.213146E-01.
*  2.745376E-01.

*HC3 Small Sample Corrected Standard Errors.
*CONSTANT  4.518059E-02.
*AGE_DWEL  1.884062E-04.
*BATHROOM  3.106637E-02.
*BEDROOMS  1.489705E-02.

*t-values based on HC3 corrected SEs.
*CONSTANT  7.813809E+01.
*AGE_DWEL  8.208966E+00.
*BATHROOM  7.123928E+00.
*BEDROOMS  1.842899E+01.

*Prob(t < tc) based on HC3 SEs.
*CONSTANT  0.000000E+00.
*AGE_DWEL  2.220446E-15.
*BATHROOM  4.005019E-12.
*BEDROOMS  0.000000E+00.
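
* A possible lower-memory variant of HC2 and HC3 (a sketch under stated assumptions, not part of the original procedure).
* Only the diagonal of the hat matrix is needed, and it can be obtained as the row sums of (X * INV(TRANSPOS(X) * X)) &* X, so no n x n matrix is formed.
* This may remove the need for TEMPORARY and SAMPLE on larger files, though that has not been tested here.
* The names XXINV, HII, ONE_HII, W2, W3, V2, V3 and HC_SE are illustrative, and steps 1 to 5 are assumed to have been run first.
MATRIX.
GET X / VARIABLES = CONSTANT, age_dwel, bathroom, bedrooms
/ NAMES = XTITLES.
GET ESQ / VARIABLES = ESQ.
COMPUTE XRTITLES = TRANSPOS(XTITLES).
COMPUTE K = NCOL(X).
COMPUTE XXINV = INV(TRANSPOS(X) * X).
COMPUTE HII = RSUM((X * XXINV) &* X).
COMPUTE ONE_HII = (HII * -1) + 1.
COMPUTE W2 = ESQ &/ ONE_HII.
COMPUTE W3 = ESQ &/ (ONE_HII &** 2).
COMPUTE V2 = XXINV * TRANSPOS(X &* (W2 * MAKE(1, K, 1))) * X * XXINV.
COMPUTE V3 = XXINV * TRANSPOS(X &* (W3 * MAKE(1, K, 1))) * X * XXINV.
COMPUTE HC_SE = {SQRT(DIAG(V2)), SQRT(DIAG(V3))}.
PRINT HC_SE
  / FORMAT = "E13"
  / TITLE = "HC2 and HC3 Standard Errors (leverage-vector version)"
  / RNAMES = XRTITLES
  / CLABELS = HC2_SE, HC3_SE.
END MATRIX.
* Run on the same cases, the two columns should match HC2_SE and HC3_SE from the routines above up to rounding.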



*References:.
*White, H. (1980) "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity", Econometrica, 48, 817-838.
*MacKinnon, J. G. and H. White (1985) "Some Heteroskedasticity Consistent Covariance Matrix Estimators with Improved Finite Sample Properties", Journal of Econometrics, 29, 305-325.
*Long, J. S. and L. H. Ervin (1999) "Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model", Mimeo, Indiana University. http://www.indiana.edu/~jsl650/files/hccm/99TAS.pdf.