1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386
*(Q) I was wondering if anyone had used SPSS to do any kind of piecewise regression. I have some data which are linearly related however they contain discontinuities (jumps), which I would like to include in the model. * (A) Posted to SPSSX-L on 2001/10/03 by David Matheson (SPSS Technical Support) I've pasted a pair of Solutions below from the SPSS AnswerNet. The first deals with piecewise regression when the knot points are known. The second solution deals with the situation where the knot points are estimated from the data. (However, my understanding is that you can't estimate both a knot point and a jump parameter at that knot. See the Seber reference in the second solution). Solution 100000148 : Spline regression (a.k.a. piecewise polynomials) Q. How can I test a regression model where I expect the slope to shift up or down, or even change sign, at a particular value of the predictor? What if I expect all values of the response variable to be increased by some constant if the predictor exceeds a certain value? A. Regression models in which the function changes at one or more points along the range of the predictor are called splines, or piecewise polynomials, and the location of these shifts are called knots. If the knots are fixed by the analyst, then splines can be fitted quite easily with the REGRESSION procedure. The properties of these splines are described below, followed by a description of the process of fitting them with SPSS. A spline model is hypothesized when the analyst expects that the relationship between the predictor and the response variable is altered at some value or values along the range of the predictor. The shift at the knot points could involve a change in the form of the relationship, such as a shift from a linear to a quadratic relationship, the addition or subtraction of a constant to all predicted response values to the right of the knot, or simply a change in the slope, acceleration, etc. of the regression function. Suppose we have a predictor XA and a response variable Y, where we expect the regression function to change if XA is greater than some fixed value K1. Suppose we have also computed the squared and cubed values of XA in XA2 and XA3, allowing us to test a cubic model. The spline is fit by creating an additional set of predictor variables, XB0 to XB3, which are all equal to 0 if XA <= K1. These new variables are used to fit the change in the intercept, linear, quadratic, and cubic terms, respectively, of the model for XA > K1. If XA > K1, then: XB0=1 ; XB1=(XA-K1) ; XB2=(XA-K1)**2 ; XB3=(XA-K1)**3 . Including XB0 in the model allows for a discontinuity, or jump point, in the regression function. Excluding XB0 means that the regression line may change direction at the knot, but the two pieces of the line will be joined at the knot. Note that some texts define the XB0 dummy variable as 1 if XA is greater than OR EQUAL TO K1, whereas others define it as 1 for XA > K1, as shown above. Hopefully, the process that is believed to underlie the shift in slope will provide guidance as to whether the function is right-continuous (XB0=1 if XA=K1) or left-continuous (XB0=0 if XA=K1) at the knot. For all the higher-order XB terms, the value of the XB variable will be 0 at K1 in either case. If we wanted to test the model that the regression line was continuous at K1 and linear on both sides of K1, but that the slope changed at K1, then XA and XB1 would be included as predictors. The t-test for the significance of the XB1 coefficient, in the presence of XA, would indicate whether there was a significant change in slope at K1. The sign and magnitude of the XB1 coefficient would indicate the direction and magnitude of the change. In choosing a spline model, there is a tradeoff between the smoothness of the function at the knots and obtaining a good fit to the data. For a regression function of degree R, maximum smoothness is obtained by fixing all derivatives up to R-1 to be equal for the two pieces. This constraint is achieved by only adding the Rth degree variable from the XB set. In the above example, Y would be predicted by XA, XA2, XA3, and XB3. Only the cubic term is altered after the knot. If this model fits the data or theory poorly, the analyst may want to relax the continuity constraints by entering lower order (XA-K1) terms (i.e. XB variables) into the model. An excellent introduction to defining and testing spline models is available in a paper by Patricia Smith ("Splines as Useful and Convenient Statistical Tools", in American Statistician, 1979). Smith includes a discussion of the constraints required when the function after the knot point is of a lower order than the function before the knot point. Keep in mind that as the number of knots increases, as well as the number of parameters for each piece between the knots, multicollinearity can quickly become a problem. Four examples of spline model definition and testing are provided below. In each example, assume that Y and XA are observed variables that exist in the active file. The regression commands are rather sparse in the example, i.e. without residual plots or other diagnostic information, in order to focus on the aspects that are specific to spline fitting. (However, the use of plots, collinearity and influence diagnostics should be routinely used with REGRESSION to check model fit, adherence to assumptions, and the effect of outliers.) EXAMPLE 1 In the first example, the linear model has 2 knots, at XA=15 and XA=25, and is continuous at both knot points. COMPUTE xb1 = xa - 15. COMPUTE xc1 = xa - 25. RECODE xb1 xc1 (lo thru 0 = 0). REGRESSION VARIABLES = y xa xb1 xc1 /STATISTICS = DEFAULT /DEP = y /METHOD ENTER xa /METHOD ENTER xb1 /METHOD ENTER xc1. XB1 and XC1 are entered separately to test whether slope does change at the first knot without considering the second knot, by testing whether the coefficient for XB1 = 0 in the presence of XA. The XC1 coefficient is then tested in the presence of XA and XB1. EXAMPLE 2 In the second example, there is one knot at XA=15. The spline is continuous at the knot, but the order of the polynomial changes. Before the knot, the relationship is linear; after the knot, quadratic. XB1 and XB2 are created to model this effect. Note that the quadratic effect for the XB piece is tested in the context of the linear term for that piece of the spline. Testing higher-order terms in the presence of all lower-order terms is standard practice in fitting polynomials. However, this practice may be contrasted with that in Example 4, where the higher-order terms for the XB piece are entered first. In the latter case, it is the continuity restrictions that are being tested sequentially, with each new XB term improving the fit of the regression function and reducing its smoothness. COMPUTE xb1 = xa - 15. RECODE xb1 (LO THRU 0 = 0). COMPUTE xb2 = xb1*xb1. REGRESSION VARIABLES = y xa xb1 xb2 /STATISTICS = DEFAULT CHA /DEP = y /METHOD ENTER xa /METHOD ENTER xb1 /METHOD ENTER xb2. EXAMPLE 3 In the third example, the spline is discontinuous at the knot, XA=15. The model is that the function is quadratic both before and after the knot. The function is right continuous at the knot, i.e. if XA=15 then XB0=1 (the function 'jumps' at 15, not after 15). COMPUTE XA2 = XA*XA. COMPUTE XB0 = (XA GE 15). COMPUTE XB1 = XB0 * (XA - 15). COMPUTE XB2 = XB1 * XB1. REGRESSION VARIABLES = y xa xa2 xb0 xb1 xb2 /STATISTICS = DEFAULT CHA /DEP = y /METHOD ENTER xa xa2 /METHOD ENTER xb0 xb1 xb2. Here, the model with a single quadratic function is tested first. The need for the spline is then tested by testing the XB variables as a block . If the change in R-squared is not significant, it would seem that a single function describes the relationship of X to Y along the full range of X in the data. EXAMPLE 4 In the fourth example, there is one knot at XA=15. A cubic spline is fitted with the constraint that all second-order and lower-order derivatives are equal on both sides of the knot, which is equivalent to saying that only the cubic term of the function is adjusted after the knot. The continuity constraints are gradually relaxed through a series of METHOD subcommands that introduce the lower-order XB terms into the model. Note that lower-order terms are present for the full range of X through the XA linear and quadratic terms, even when the lower-order XB terms are absent from the model. COMPUTE XA2 = XA*XA. COMPUTE XA3 = XA2*XA. COMPUTE XB0 = (XA GE 15). COMPUTE XB1 = XB0 * (XA - 15). COMPUTE XB2 = XB1 * XB1. COMPUTE XB3 = XB2 * XB1. REGRESSION VARIABLES = y xa xa2 xa3 xb0 xb1 xb2 xb3 /STATISTICS = DEFAULT CHA /DEP = y /METHOD ENTER XA XA2 XA3 /METHOD ENTER XB3 /METHOD ENTER XB1 /METHOD ENTER XB0. ******************************************. Solution 100009298 : Spline Regression with Estimated Knots in SPSS Q. I want to fit a regression model where I expect the slope to shift up or down, or even change sign, at a particular value of the predictor. Also, I want that shift point to be estimated as a parameter in the model. Can this model be fitted in SPSS? A. Regression models in which the function changes at one or more points along the range of the predictor are called splines, or piecewise polynomials, and the location of these shifts are called knots. If the knots are fixed by the analyst, then splines can be fitted quite easily with the SPSS REGRESSION procedure. The properties of these splines are described in Solution 100000148, with by a description of the process of fitting them with REGRESSION. If the knots are to be estimated from the data, i.e. if the knots are variable, then you can fit many such models with the SPSS NLR and CNLR procedures, as described below.These procedures perform Nonlinear Regression and Constrained Nonlinear Regression, respectively. One property of NLR which allows us to fit variable-knot models is the facility to express multiplicative relationships between parameters. The MODEL PROGRAM command from the first example, a 3-phase linear spline model, illustrates this point. MODEL PROGRAM ba0=3 ba1=3 bb1=-4 bc1=2 knot1=15 knot2=25. COMPUTE predex1 = ba0 + ba1*xa + bb1*(xa-knot1)*(xa GE knot1) + bc1*(xa-knot2)*(xa GE knot2). NLR y1 with xa / pred = predex1 /save pred resid (residex1). The parameters ba0 and ba1 are the intercept and the linear coefficient, respectively, for the predictor, XA. The parameter bb1 is the second-phase adjustment to the slope. It is applied to (xa - knot1), where knot1 is the first knot. By multiplying this model term by the logical expression (xa GE knot1), we limit the influence of bb1 to values of XA at or above the first knot). The parameter bc1 is a further adjustment to the slope in the third phase. Its relationship to knot2 is similar to the bb1-knot1 relationship. By naming knot1 and knot2 in the MODEL PROGRAM statement, we declare them as parameters to be estimated. The numbers assigned to each of the parameters are starting values for the iterative maximum likelihood estimation process used by NLR and CNLR. However, one constraint must be placed on variable-knot models which is not required in fixed-knot models. The model must be continuous at the knots. There cannot be a "jump", or step, in the model at the knot. This constraint simply requires the absence of a phase 2 intercept, or step parameter, to be enforced in either NLR or CNLR. Also, the examples below are restricted to models where the number of knots is fixed by the analyst, although their location is estimated from the data. Finally, if you have more than one knot in the model, you may wish to constrain 'later' knots to have higher values than 'earlier' knots. The assignment of higher starting values for 'later' knots than 'earlier' knots may be sufficient for this purpose, but in a poorly-identified model, the additional constraint may be helpful. The CNLR procedure must be used for this purpose, with the constraints expressed in the /BOUNDS subcommand. Such a constraint is illustrated in Example 1, which has 2 knot points. See Chapter 9 of Seber, G.A.F., & Wild, C.J. (1989). Nonlinear regression. New York: Wiley, for a discussion of spline regression models with fixed or variable knots. Three examples of analyses of variable-knot spline regression models are shown below. An input program is presented that constructs a data set that allows you to run these commands in SPSS. The predictor in each example is the variable XA. There are 3 replicates at each value of XA. The dependent variables are Y1, Y2, and Y3 for examples 1, 2, and 3, respectively. They are generated with SPSS commands just before their respective analyses. The examples are somewhat contrived and we may have cheated slightly by using parameter starting values that equal the data-generating values for those parameters. However, these examples are merely illustrative of the process of using these commands for spline models. ******************************************. SET SEED = 2000000. title 'test spline regression with estimated knots, good start values'. INPUT PROGRAM. LOOP xa = 1 TO 35. LOOP rep = 1 TO 3. LEAVE xa. END case. END LOOP. END LOOP. END file. END INPUT PROGRAM. EXECUTE. * EXAMPLE 1 In the first example, the linear model has 2 knots, with starting values at XA=15 and XA=25, and is continuous at both knot points. * y1 is dep var for example 1, with knots at 15 and 25, linear . COMPUTE y1=3 + 3*xa + normal(2). IF (xa gt 15) y1=y1 - 4*(xa-15). IF (xa gt 25) y1=y1 + 2*(xa-25). PLOT /PLOT=y1 with xa. * Constraints not placed on knot values. MODEL PROGRAM ba0=3 ba1=3 bb1=-4 bc1=2 knot1=15 knot2=25. COMPUTE predex1 = ba0 + ba1*xa + bb1*(xa-knot1)*(xa GE knot1) + bc1*(xa-knot2)*(xa GE knot2). NLR y1 / PRED = predex1 /SAVE pred resid (residex1). * Diagnostic plots of residuals, predicted values . PLOT /PLOT=residex1 y1 with predex1 /PLOT=predex1 with xa. * Note that in SPSS versions prior to 5.0, you must include the * predictor(s) in the NLR and CNLR commands, as in: * NLR y1 WITH xa / PRED = predex1 /SAVE pred resid (residex1). * Including the predictors list returns a warning in later SPSS versions, * but the command will still run. * Example 1 with KNOT2 constrained to be greater than KNOT1 . MODEL PROGRAM ba0=3 ba1=3 bb1=-4 bc1=2 knot1=15 knot2=25. COMPUTE predex1 = ba0 + ba1*xa + bb1*(xa-knot1)*(xa ge knot1) + bc1*(xa-knot2)*(xa ge knot2). CNLR y1 / PRED = predex1 /SAVE pred resid (residex1) / BOUNDS knot2 - knot1 > 0 . PLOT /PLOT=residex1 y1 WITH predex1 /PLOT=predex1 WITH xa. * EXAMPLE 2. * In the second example, there is one knot at XA=15. The spline is * continuous at the knot, but the order of the polynomial changes. * Before the knot, the relationship is linear; after the knot, quadratic. * Y2 is dep var for example 2, knot at 15, linear,quadratic. COMPUTE y2 = 4 + 2*xa + normal(.5). IF (xa gt 15) y2=y2 + (xa - 15) - .5*(xa-15)*(xa-15). PLOT /PLOT=y2 with xa. * In this model, we compute an intermediate term, pc2, to * stand for the expression (xa GE knot1). This term is just * used to avoid repetitive typing of the logical expression. * Note that knot1 is still the parameter to be estimated. MODEL PROGRAM ba0=4 ba1=2 bb1=1 bb2=-0.5 knot1=15 . COMPUTE pc2 = (xa GE knot1). COMPUTE predex2 = ba0 + ba1*xa + bb1*(xa-knot1)*pc2 + bb2*((xa-knot1)**2)*pc2. NLR y2 / PRED = predex2 /SAVE pred resid (residex2). PLOT /PLOT=residex2 y2 with predex2 /PLOT=predex2 with xa. * EXAMPLE 3. * In the third example, there is one knot at XA=15. A cubic spline is * fitted with the constraint that all second-order and lower-order * derivatives are equal on both sides of the knot, which is equivalent to * saying that only the cubic term of the function is adjusted after the * knot. The continuity constraints are gradually relaxed in subsequent NLR * runs that introduce the lower-order XB terms into the * model. If you examine the correlations among the parameter estimates, you * will see that the latter 2 models are poorly identified as many of these * correlations are very high. In the final model, some of the parameter estimates * are perfectly correlated. These models are shown here as illustrations * of gradually loosening constraints in the model. COMPUTE y3 = 6 + 3.5*xa - 1.5*xa*xa + .5*xa*xa*xa + normal(.5). IF (xa gt 15) y3 = y3 - 2*(xa-15)*(xa-15)*(xa-15). PLOT /PLOT=y3 with xa. MODEL PROGRAM ba0=6 ba1=3.5 ba2=-1.5 ba3=0.5 bb3=-2 knot1=15 . COMPUTE pc2 = (xa ge knot1). COMPUTE predex3 = ba0 + ba1*xa + ba2*xa*xa + ba3*xa*xa*xa + bb3*((xa-knot1)**3)*pc2. NLR y3 / PRED = predex3 /SAVE pred resid (residex3). PLOT /PLOT=residex3 y3 with predex3 /PLOT=predex3 with xa. * example 3 but with a quadratic term added to 2nd piece . MODEL PROGRAM ba0=6 ba1=3.5 ba2=-1.5 ba3=0.5 bb2=0 bb3=-2 knot1=15 . COMPUTE pc2 = (xa ge knot1). COMPUTE pred = ba0 + ba1*xa + ba2*xa*xa + ba3*xa*xa*xa + bb2*((xa-knot1)**2)*pc2 + bb3*((xa-knot1)**3)*pc2. NLR y3. * example 3 but with a quadratic & linear term added to 2nd piece . MODEL PROGRAM ba0=6 ba1=3.5 ba2=-1.5 ba3=0.5 bb1=0 bb2=0 bb3=-2 knot1=15 . COMPUTE pc2 = (xa ge knot1). COMPUTE pred = ba0 + ba1*xa + ba2*xa*xa + ba3*xa*xa*xa + bb1*(xa-knot1)*pc2 + bb2*((xa-knot1)**2)*pc2 + bb3*((xa-knot1)**3)*pc2. NLR y3.