Published on 12th November 2019

Multiple Linear Regression

Regression Diagnostics

Find Scores That

Contribute to violation of assumptions.
Are suspect because they are far removed from the centroid (multidimensional mean).
Have undue influence on the solution.

Outliers Among the Predictors

Leverage, h_i, or Hat Diagonal
The larger this statistic, the greater the distance between the data point and the centroid in p-dimensional space.
Investigate cases with h_i greater than 2p/N, where p is the number of parameters in the model, including the intercept.
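As a rough illustration of the leverage statistic, the sketch below computes hat diagonals as h_i = x_i'(X'X)^-1 x_i and applies a 2p/N cutoff, the form that matches the 2(3)/11 arithmetic used in the example later in these slides. The predictor values and the helper names `solve` and `leverages` are made up for illustration; SAS reports the same quantity through its own routines.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(predictors):
    """Hat diagonals for a model with an intercept plus the given predictors."""
    X = [[1.0] + list(row) for row in predictors]  # add the intercept column
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    # h_i = x_i' (X'X)^-1 x_i, computed by solving (X'X) z = x_i for each case.
    return [sum(xi * zi for xi, zi in zip(x, solve(xtx, x))) for x in X]


# Hypothetical data: two predictors, seven cases; the last case is far from the rest.
data = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (20, 7)]
h = leverages(data)
p, n = 3, len(data)          # p counts the intercept and both slopes
cutoff = 2 * p / n
flagged = [case + 1 for case, hi in enumerate(h) if hi > cutoff]
print([round(hi, 3) for hi in h], "cutoff:", round(cutoff, 3), "flagged:", flagged)
```

The hat diagonals always sum to p, which is a handy check that the design matrix was set up as intended.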

Distance from the Regression Surface

Standardized Residual (aka Studentized Residual)
The difference between actual Y and predicted Y, divided by an appropriate standard error.
Rstudent (aka Studentized Deleted Residual)
The same, except that for each case the regression surface is the one obtained when that individual case is removed.
Investigate if the absolute value is greater than 2.
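The two residuals are linked by a standard identity, so Rstudent can be obtained without refitting the model once per case. The sketch below assumes the standardized residual r, the sample size N, and the number of parameters p (including the intercept) are already known; the function name is hypothetical.

```python
import math


def rstudent_from_standardized(r, n, p):
    """Studentized deleted residual from the standardized residual r.

    Uses the identity t = r * sqrt((n - p - 1) / (n - p - r**2)),
    where n is the number of cases and p the number of parameters.
    """
    return r * math.sqrt((n - p - 1) / (n - p - r ** 2))


# With 11 cases and 3 parameters, as in the example used in these slides:
for r in (0.5, 1.0, 2.0):
    print(r, round(rstudent_from_standardized(r, n=11, p=3), 3))
```

When |r| is below 1 the deleted residual is slightly smaller than the standardized one; above 1 it grows faster, which is why a case with a large residual shows an even larger Rstudent.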

Influence on the Solution

Cook’s D: how much the regression surface would change if this case were removed.
Investigate cases with D > 1.
Dfbetas: how much one parameter (slope or intercept) would change if this case were removed.
Investigate cases with absolute values > 2.
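Cook's D combines the two earlier ideas: distance from the regression surface and leverage. A minimal sketch, assuming the standardized residual and the hat diagonal for a case are already available (the function name and the input values are made up):

```python
def cooks_d(std_resid, leverage, p):
    """Cook's D from a standardized residual, a hat diagonal, and p parameters.

    D = (r**2 / p) * h / (1 - h)
    """
    return (std_resid ** 2 / p) * leverage / (1.0 - leverage)


# Same residual, different leverage (hypothetical values, p = 3 parameters).
print(round(cooks_d(2.0, 0.10, 3), 3))  # low leverage: small D
print(round(cooks_d(2.0, 0.50, 3), 3))  # high leverage: D exceeds 1
```

A large residual alone does not guarantee a large D; the case must also have enough leverage to pull the surface toward itself.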

SAS Code

* The INPUT statement was garbled in the source; the variable order shown is assumed;
data regdiag;
  input SpermCount Together LastEjac @@;
  SR_LastEjac = sqrt(1 + LastEjac);
cards;
*<data here>;
proc univariate plot;
  var SpermCount -- SR_LastEjac;
proc reg;
  model SpermCount = Together SR_LastEjac / influence r;
run;
*<nonsignificant results>;
data culled;
  set regdiag;
  if SpermCount < 700;
proc reg;
  model SpermCount = Together SR_LastEjac / influence r;
  *<significant results>;
  title 'One Outlier Culled';
run;

Simple Example

Y = sperm count
X1 = % time recently spent with mate
X2 = time since last ejaculation

Leverage

Investigate cases with values greater than 2(3)/11 = .55.
Case 5 is above this cutoff.
It is a univariate outlier on the LastEjac variable.
Further investigation indicates the case is valid, so we retain it.

Residuals

Case 11 has large residuals and should be investigated.
Notice that Rstudent is much larger than the standardized residual.
This indicates that removing this case has a large effect on the solution.

Influence

Case 11 has a high value of Cook’s D.
It has a high Dfbeta for the time since last ejaculation predictor, even after I transformed that variable to reduce skewness.
Upon investigation, it was found that this subject did not follow the instructions for gathering the data. His scores were deleted.

Skewness

Absolute values of g1 in excess of 1 indicate problems with normality assumptions.
Dealing with outliers might resolve this problem, or might not.
If problems remain after dealing with outliers, consider monotonic transformations.
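For readers who want to check the skewness statistic by hand, the sketch below assumes g1 is the bias-adjusted sample skewness, n / ((n - 1)(n - 2)) times the sum of cubed z-scores, which is the form statistical packages commonly report. The data are made up.

```python
import math


def g1(scores):
    """Bias-adjusted sample skewness (assumed definition of g1)."""
    n = len(scores)
    mean = sum(scores) / n
    # Sample standard deviation with n - 1 in the denominator.
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in scores)


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 20]   # one very large score
print(round(g1(symmetric), 3), round(g1(right_skewed), 3))
```

The single large score in the second list is enough to push |g1| past 1, which illustrates why outliers are dealt with before a transformation is considered.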

Positive Skewness

Select a transformation that preserves order but reduces large scores more than small scores.
Root transformations, such as square root.
Log transformations.
Negative reciprocal transformation.
Rank transformation.
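The four options can be compared side by side on a small set of made-up, strictly positive scores. This is only a sketch: the log and reciprocal require positive values, and the simple ranking here does not handle ties.

```python
import math

scores = [1, 2, 4, 8, 100]  # hypothetical, positively skewed

transforms = {
    "square root": math.sqrt,
    "log": math.log,
    "negative reciprocal": lambda y: -1.0 / y,
}
transformed = {name: [f(y) for y in scores] for name, f in transforms.items()}

# Rank transformation: 1 for the smallest score, N for the largest (no ties here).
order = sorted(range(len(scores)), key=lambda i: scores[i])
ranks = [0] * len(scores)
for rank, i in enumerate(order, start=1):
    ranks[i] = rank
transformed["rank"] = ranks

for name, values in transformed.items():
    print(name, [round(v, 3) for v in values])
```

In every case the order of the scores is unchanged, but the gap between the largest score and the rest shrinks, most gently for the square root and most severely for the negative reciprocal and the ranks.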

Negative Skewness

Reflect to produce equivalent positive skewness and then apply a transformation that reduces positive skewness.
Y^exp, where exp > 1: for example, square the scores.
Exponential transformation: the opposite of a log transformation.
Rank transformation.
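Reflection can be sketched as subtracting each score from a constant. Using the maximum plus 1 as that constant is an assumption made here so that the smallest reflected score is 1; the scores are made up.

```python
import math

scores = [1, 90, 95, 98, 100]  # hypothetical, negatively skewed

# Reflect: the long tail now points toward the high end.
k = max(scores) + 1            # assumed reflection constant
reflected = [k - y for y in scores]

# Then apply a transformation that reduces positive skewness.
sqrt_reflected = [math.sqrt(r) for r in reflected]

# Alternative without reflection: raise the scores to a power greater than 1.
squared = [y ** 2 for y in scores]

print(reflected)
print([round(v, 3) for v in sqrt_reflected])
print(squared)
```

Reflection reverses the order of the scores, so the sign of any slope or correlation involving the reflected variable flips and should be interpreted accordingly.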

Back Transformation

Reverse transform your results to get back to the original unit of measure, for example, from square root pounds to pounds.
Report the untransformed statistics, the transformed statistics, and the back-transformed statistics.
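A small worked example of the three sets of statistics, using made-up weights in pounds and a square root transformation:

```python
import math

pounds = [1, 4, 9, 16, 100]  # hypothetical scores

untransformed_mean = sum(pounds) / len(pounds)        # in pounds
roots = [math.sqrt(y) for y in pounds]
transformed_mean = sum(roots) / len(roots)            # in square root pounds
back_transformed_mean = transformed_mean ** 2         # back in pounds

print(untransformed_mean, transformed_mean, back_transformed_mean)
```

The back-transformed mean is not the same as the untransformed mean, because the transformation reduced the pull of the largest score. That difference is one reason to report all three.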

Mistrust

Some persons may find your use of transformation suspicious.
Consider using an analysis that does not require normally distributed data.
Nonparametric and Resampling Statistics
Practical Advice on Transformations: read this document.

Plots of Residuals

These can also be useful, but it takes some practice to get good at detecting problems from such plots.
Plot the residual versus predicted Y.

Heteroscedasticity

Try Squaring One Predictor

Residuals not Normal and Variance not Constant

SPSS

Regression Diagnostics with SPSS
Producing and Interpreting Residuals Plots
