Multiple Linear Regression
Find Scores That
- Contribute to violation of assumptions
- Are suspect because they are far removed from the centroid (multidimensional mean)
- Have undue influence on the solution
Outliers Among the Predictors
Leverage, h_i, or Hat Diagonal
The larger this statistic, the greater the distance between the case and the centroid of the predictors. Leverage ranges from 1/N to 1 and averages p/N, so investigate cases with h_i greater than 2p/N, where p is the number of parameters in the model, including the intercept.
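A minimal sketch of the leverage screen, assuming numpy and a small hypothetical data set (not the sperm-count data). The helper name `leverage` is illustrative. It computes the diagonal of the hat matrix H = X(X'X)^-1 X' and flags cases whose leverage exceeds 2p/N, twice the mean leverage p/N.

```python
import numpy as np

def leverage(X):
    """Hat diagonals h_i for a design matrix X (N x p, first column all 1s)."""
    # H = X (X'X)^-1 X'; only its diagonal is needed, so avoid forming H.
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# Hypothetical data: two predictors, N = 8; the last case is extreme on x2.
x1 = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
x2 = np.array([1, 2, 1, 3, 2, 1, 2, 15], dtype=float)
X = np.column_stack([np.ones_like(x1), x1, x2])
N, p = X.shape                              # p = 3 parameters, including the intercept

h = leverage(X)
cutoff = 2 * p / N                          # twice the mean leverage p/N
flagged = np.flatnonzero(h > cutoff) + 1    # 1-based case numbers to investigate
print(np.round(h, 3), round(cutoff, 3), flagged)
```

Because the leverages always sum to p, a case far from the others in predictor space pulls a large share of that total. Here the eighth case is flagged.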
Distance from the Regression Surface
Standardized Residual (aka Studentized Residual): the difference between actual Y and predicted Y, divided by its estimated standard error, s√(1 − h_i).
Rstudent (aka Studentized Deleted Residual): the same, except that for each case the regression surface (and s) is the one obtained when that case is removed.
Investigate cases with absolute values greater than 2.
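The two residual statistics can be sketched as follows, assuming numpy and hypothetical data in which the last case is an outlier. The helper name `residual_diagnostics` is illustrative. The deleted mean square error is obtained from the full fit with the identity SSE_(i) = SSE − e_i²/(1 − h_i), so no refitting is needed.

```python
import numpy as np

def residual_diagnostics(X, y):
    """Return (standardized residuals, Rstudent) for an OLS fit of y on X."""
    N, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b                                   # raw residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverages
    s2 = e @ e / (N - p)                            # MSE from the full fit
    student = e / np.sqrt(s2 * (1 - h))             # standardized (studentized) residual
    # MSE with case i deleted, computed without refitting the model
    s2_del = (e @ e - e**2 / (1 - h)) / (N - p - 1)
    rstudent = e / np.sqrt(s2_del * (1 - h))        # studentized deleted residual
    return student, rstudent

# Hypothetical data: nearly linear except for the last case.
x = np.arange(1.0, 9.0)
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 30.0])
X = np.column_stack([np.ones_like(x), x])
student, rstudent = residual_diagnostics(X, y)
print(np.round(student, 2))
print(np.round(rstudent, 2))
```

For the last case both statistics exceed 2. Rstudent is far larger because, once that case is removed, the remaining points fit a line almost perfectly. This is the same pattern described below for Case 11.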
Influence on the Solution
Cook’s D: how much the regression surface would change if this case were removed. Investigate cases with D > 1.
Dfbetas: how much one parameter (slope or intercept) would change, in standard-error units, if this case were removed. Investigate cases with absolute values > 2.
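A minimal sketch of both influence statistics, assuming numpy and hypothetical data. The helper name `influence` is illustrative. Each case is dropped in turn and the model is refit, which mirrors the definitions directly. Cook's D is the squared change in all fitted values scaled by p times the MSE. DFBETAS is each coefficient's change divided by its standard error computed with the deleted MSE.

```python
import numpy as np

def influence(X, y):
    """Cook's D and DFBETAS for each case, by refitting without that case."""
    N, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (N - p)                            # full-data MSE
    c_diag = np.diag(np.linalg.inv(X.T @ X))        # diagonal of (X'X)^-1
    cooks_d = np.empty(N)
    dfbetas = np.empty((N, p))
    for i in range(N):
        keep = np.arange(N) != i
        b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        e_i = y[keep] - X[keep] @ b_i
        s2_i = e_i @ e_i / (N - 1 - p)              # MSE with case i deleted
        diff = X @ (b - b_i)                        # change in every fitted value
        cooks_d[i] = diff @ diff / (p * s2)
        dfbetas[i] = (b - b_i) / np.sqrt(s2_i * c_diag)
    return cooks_d, dfbetas

# Hypothetical data: nearly linear except for the last case.
x = np.arange(1.0, 9.0)
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 30.0])
X = np.column_stack([np.ones_like(x), x])
cooks_d, dfbetas = influence(X, y)
print(np.round(cooks_d, 2))
print(np.round(dfbetas, 2))
```

Refitting N times is fine for small samples. Closed-form versions exist, for example D_i = (r_i²/p)·h_i/(1 − h_i) with r_i the standardized residual, and these are what regression software typically uses.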
```sas
data regdiag;
  input SpermCount Together LastEjac @@;   * variable order assumed;
  SR_LastEjac = sqrt(1 + LastEjac);        * square root to reduce skewness;
cards;
*<data here>;
proc univariate plot;
  var SpermCount--SR_LastEjac;
proc reg;
  model SpermCount = Together SR_LastEjac / influence r;
run;
*<nonsignificant results>;
data culled;
  set regdiag;
  if SpermCount < 700;                      * drop the one outlying case;
proc reg;
  model SpermCount = Together SR_LastEjac / influence r;
  *<significant results>;
  title 'One Outlier Culled';
run;
```
Y = sperm count (SpermCount)
X1 = % of time recently spent with mate (Together)
X2 = time since last ejaculation (LastEjac)
Investigate cases with leverage values greater than 2p/N = 2(3)/11 ≈ .55 (p = 3: the intercept and two slopes; N = 11). Case 5 is above this cutoff. It is a univariate outlier on the LastEjac variable. Further investigation indicates the case is valid, so we retain it.
Case 11 has large residuals, so it should be investigated. Notice that Rstudent is much larger than the standardized residual. This indicates that removing this case has a large effect on the solution.
Case 11 also has a high value of Cook’s D. It has a high Dfbeta for the time-since-last-ejaculation predictor, even after I transformed that variable to reduce skewness. Upon investigation, it was found that this subject did not follow the instructions for gathering the data, so his scores were deleted.
Absolute values of g1 (skewness) in excess of 1 indicate problems with the normality assumption. Dealing with outliers might resolve this problem, or it might not. If problems remain after dealing with outliers, consider monotonic transformations.
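A minimal stdlib sketch of the skewness statistic, using hypothetical scores. g1 is computed from moments as m3/m2^1.5. Many statistical packages instead report a bias-adjusted version, G1 = g1·√(n(n − 1))/(n − 2). Which one your software prints is worth checking before you apply the |g1| > 1 rule.

```python
import math

def skewness_g1(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (moments divided by n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skewness_adjusted(values):
    """Bias-adjusted G1 = g1 * sqrt(n(n-1)) / (n-2)."""
    n = len(values)
    return skewness_g1(values) * math.sqrt(n * (n - 1)) / (n - 2)

scores = [0, 0, 0, 10]          # hypothetical: one score far above the rest
g1 = skewness_g1(scores)
print(round(g1, 4), abs(g1) > 1)
print(round(skewness_adjusted(scores), 4))
```

In small samples the two versions can differ noticeably. For these four scores, g1 ≈ 1.15 and G1 = 2.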
For positive skewness, select a transformation that preserves order but reduces large scores more than small scores:
- Root transformations, such as the square root
- Log transformations
- The negative reciprocal transformation
- The rank transformation
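The transformations for positive skewness can be compared directly. This is a stdlib sketch on a hypothetical, strictly positive sample; the helper names are illustrative. Log and reciprocal transformations need scores above zero, which is why the SAS program above takes sqrt(1 + LastEjac) rather than transforming the raw score. Every transformation keeps the scores in the same order, and skewness falls as the transformation gets stronger. For this sample the negative reciprocal overcorrects into negative skewness.

```python
import math

def skewness_g1(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def ranks(values):
    """Average ranks (1-based); tied scores share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

y = [1, 2, 4, 8, 16, 32]        # hypothetical, positively skewed, all > 0
transforms = {
    "raw": y,
    "sqrt": [math.sqrt(v) for v in y],
    "log": [math.log(v) for v in y],
    "neg_reciprocal": [-1 / v for v in y],   # the minus sign keeps the original order
    "rank": ranks(y),
}
for name, t in transforms.items():
    print(f"{name:15s} g1 = {skewness_g1(t):+.3f}")
```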
For negative skewness, consider:
- Reflecting the scores to produce equivalent positive skewness and then applying a transformation that reduces positive skewness (reflection reverses the order of the scores)
- Y^exp, where exp > 1; for example, square the scores
- The exponential transformation, the opposite of a log transformation
- The rank transformation
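Here is a stdlib sketch of two options for negative skewness, using a hypothetical sample. Reflecting about max + 1 keeps every reflected score at 1 or above, so the log can be applied. Negating afterward puts high scores back on top. Squaring, a power above 1, works without reflection, although for this sample it only partly removes the skew.

```python
import math

def skewness_g1(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

y = [1, 17, 25, 29, 31, 32]       # hypothetical, negatively skewed
k = max(y) + 1                    # reflect about max + 1 so reflected scores are >= 1
reflected = [k - v for v in y]    # positively skewed, but in reversed order
logged = [math.log(v) for v in reflected]
restored = [-v for v in logged]   # negate so high scores are high again
squared = [v ** 2 for v in y]     # alternative: Y^exp with exp = 2

for name, t in [("raw", y), ("reflect+log", restored), ("squared", squared)]:
    print(f"{name:12s} g1 = {skewness_g1(t):+.3f}")
```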
Reverse-transform your results to get back to the original unit of measure, for example, from square-root pounds to pounds. Report the untransformed statistics, the transformed statistics, and the back-transformed statistics.
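Back-transformation can be illustrated with hypothetical weights. A mean computed on the square-root scale, once squared, is generally not the same as the mean of the raw scores. Likewise, a mean computed on the log scale back-transforms to the geometric mean. This is one reason to report all three sets of statistics.

```python
import math

pounds = [100, 144, 196]                          # hypothetical weights in pounds
roots = [math.sqrt(v) for v in pounds]            # square-root pounds: 10, 12, 14
mean_root = sum(roots) / len(roots)               # 12.0 square-root pounds
back = mean_root ** 2                             # back-transformed: 144.0 pounds
raw_mean = sum(pounds) / len(pounds)              # untransformed mean: about 146.67 pounds
print(mean_root, back, round(raw_mean, 2))

# With a log transformation, the back-transformed mean is the geometric mean.
values = [1, 10, 100]
geo = math.exp(sum(math.log(v) for v in values) / len(values))
print(round(geo, 6))
```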
Some people may find your use of a transformation suspicious. Consider using an analysis that does not require normally distributed data.
Nonparametric and Resampling Statistics
Practical Advice on Transformations: read this document.
Plots of Residuals
These can also be useful, but it takes some practice to get good at detecting problems from such plots.
Plot the residuals versus predicted Y.
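This numpy sketch shows what a residual-versus-predicted plot reveals when a straight line is fit to a curved relationship. The data are hypothetical, and `fit` is an illustrative helper. Residuals from the linear fit are positive at both ends and negative in the middle, a U shape. Adding the square of the predictor to the model removes that pattern.

```python
import numpy as np

# Hypothetical data with a curved (quadratic) relation between X and Y.
x = np.arange(1.0, 11.0)
y = 5 + 0.5 * x**2

def fit(X, y):
    """OLS fit; returns predicted values and residuals."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    predicted = X @ b
    return predicted, y - predicted

ones = np.ones_like(x)
pred_lin, resid_lin = fit(np.column_stack([ones, x]), y)           # straight line only
pred_quad, resid_quad = fit(np.column_stack([ones, x, x**2]), y)   # adds the squared predictor

# Columns: predicted Y, residual. Read top to bottom, the residuals trace a U.
for p_hat, r in zip(pred_lin, resid_lin):
    print(f"{p_hat:7.2f} {r:+6.2f}")
```

To see the pattern as a plot, pass `pred_lin` and `resid_lin` to any scatter-plot routine, such as matplotlib's `plt.scatter`.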
Trying Squaring One Predictor
Residuals not Normal and Variance not Constant
Regression Diagnostics with SPSS
Producing and Interpreting Residuals Plots