RA tutorial

Additional Material from Professor Zhao

Error correction and material supplement in the textbook

Error 1: Translocation of X and Y in the plot

reg.data <- read.table("regression.txt",header=T)
attach(reg.data)
names(reg.data)
## [1] "growth" "tannin"
model <- lm(growth~tannin)

Pay attention that growth is the response variable and tannin is the explanatory variable. The right plot is for correction.

par(mfrow=c(1,2))

plot(reg.data,main="Explanatory against Response (wrong plot) ")
abline(model,col="blue")
plot(reg.data$tannin,reg.data$growth, main="Response against Explanatory (right plot)")
abline(model,col="blue")

Error 2: PCT in the part of Jackknife stands for PCI

Additionally, Jackknife deals with the skewness issue in the bootstrap distribution. For the percentile confidence interval (PCI), the endpoints for PCI is adjusted to right if it is positively shewed. Vice versa. The endpoints for PCI is adjusted to left if it is negatively shewed.

Supplement materials and websites for further study

Supplement 1: use ggplot2 package to automatically generate the 95% confident interval for prediction

library(ggplot2)

ggplot(reg.data, aes(tannin, growth)) + 
  geom_point() + 
  geom_smooth(method = 'lm')
## `geom_smooth()` using formula 'y ~ x'

Supplement 2: Combine multiple figures using par function and assign letters for figure caption.

regdat <- read.table("regdat.txt",header=T)
attach(regdat)
model <- lm(response~explanatory)
par(mfrow = c(2,3))
plot(model, which = 1:6, caption = LETTERS[1:6])

Additional websites for further learning and understanding on the topics mentioned.

How to interpret the diagnostic plots for regression analysis

Available at: https://data.library.virginia.edu/diagnostic-plots/

Standardized residuals

\[R_{st} = Res / sd(Res)\]

Available at:

https://www.isixsigma.com/dictionary/standardized-residual/

http://www.r-tutor.com/elementary-statistics/simple-linear-regression/standardized-residual

Leverage

An observation is considered to have high leverage if it has a value (or values) for the predictor variables that are much more extreme compared to the rest of the observations in the dataset.

Available at: https://www.statology.org/leverage-in-r/

Cook’s distance

it measures how much all of the fitted values in the model change when a data point is deleted.

Available at:

https://www.statology.org/how-to-identify-influential-data-points-using-cooks-distance/

https://www.mathworks.com/help/stats/cooks-distance.html

https://www.statology.org/cooks-distance-python/

Excercises for Regression Analysis

Excercise 1

trees is a built-in dataset in R. Try to build up possible multi-factor regressions of Volume against Girth and Height, and determine the best one through regression analysis. Choose methods we have learned today, whicht you think fit in this case most.

Solution for Excercise 1

Read the data first.

trees <- trees

Find the plot out scatter plots of Volume against Girth and Height against Height. Try to find out the correlations separately.

par(mfrow=c(1,2))
plot(Volume~Girth,data = trees)
plot(Volume~Height,data = trees)

We can notice that the correlation between Volume and Girth is likely to be straight fitting, while the correlation between Volume and Height is difficult to determine.

Let’s call the Volume as y, the Girth as x1 and the Height as x2.

y <- trees$Volume
x1 <- trees$Girth
x2 <- trees$Height

We first try \(Y = β~0~ + β~1~X~1~ + β~2~X~2~ + β~3~X~2~^2^ + β~4~X~1~X~2~\)

lm1 <- lm(y ~ x1*x2 + I(x2^2))
summary(lm1)
## 
## Call:
## lm(formula = y ~ x1 * x2 + I(x2^2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.8437 -1.0913  0.3604  1.5254  4.5031 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 31.193874  62.958066   0.495  0.62443    
## x1          -6.708175   2.335613  -2.872  0.00801 ** 
## x2          -0.123276   1.814453  -0.068  0.94635    
## I(x2^2)     -0.008744   0.013313  -0.657  0.51710    
## x1:x2        0.145602   0.029747   4.895 4.44e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.738 on 26 degrees of freedom
## Multiple R-squared:  0.976,  Adjusted R-squared:  0.9723 
## F-statistic: 263.9 on 4 and 26 DF,  p-value: < 2.2e-16

We can see although this model passes the t-test and F-test, the p-value of some of the parameters are bigger than 0.05, which means these parameters probably do not exist. Let’s use the step function to optimize the model.

tstep <- step(lm1)
## Start:  AIC=66.98
## y ~ x1 * x2 + I(x2^2)
## 
##           Df Sum of Sq    RSS    AIC
## - I(x2^2)  1     3.233 198.08 65.495
## <none>                 194.85 66.985
## - x1:x2    1   179.545 374.39 85.231
## 
## Step:  AIC=65.49
## y ~ x1 + x2 + x1:x2
## 
##         Df Sum of Sq    RSS    AIC
## <none>               198.08 65.495
## - x1:x2  1    223.84 421.92 86.936

It shows that after deleting the β3X22 the AIC decreased most, so the new model should be \(Y = β~0~ + β~1~X~1~ + β~2~X~2~ + β~3~X~1~X~2~\)

lm2 <- lm(y ~ x1 + x2 + x1:x2)
summary(lm2)
## 
## Call:
## lm(formula = y ~ x1 + x2 + x1:x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.5821 -1.0673  0.3026  1.5641  4.6649 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 69.39632   23.83575   2.911  0.00713 ** 
## x1          -5.85585    1.92134  -3.048  0.00511 ** 
## x2          -1.29708    0.30984  -4.186  0.00027 ***
## x1:x2        0.13465    0.02438   5.524 7.48e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.709 on 27 degrees of freedom
## Multiple R-squared:  0.9756, Adjusted R-squared:  0.9728 
## F-statistic: 359.3 on 3 and 27 DF,  p-value: < 2.2e-16

We can see this model passes the t-test and F-test, and the p-value of all the parameters are smaller than 0.05. Next we plot it out to further check the model.

par(mfrow=c(2,2))
plot(lm2)

According to the Q-Q plot, we can see quite number of points do not fall on the line, which means this model is still not good enough. For the next step, I choose to build a new model with log. The new model will be like \(Y = β~0~ + β~1~log(X~1~) + β~2~log(X~2~) + β~3~log(X~1~X~2~)\)

lm3 <- lm(log(y)~log(x1)*log(x2))
summary(lm3)
## 
## Call:
## lm(formula = log(y) ~ log(x1) * log(x2))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.165941 -0.048613  0.006384  0.062204  0.132295 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)      -3.6869     7.6996  -0.479    0.636
## log(x1)           0.7942     3.0910   0.257    0.799
## log(x2)           0.4377     1.7788   0.246    0.808
## log(x1):log(x2)   0.2740     0.7124   0.385    0.704
## 
## Residual standard error: 0.08265 on 27 degrees of freedom
## Multiple R-squared:  0.9778, Adjusted R-squared:  0.9753 
## F-statistic: 396.4 on 3 and 27 DF,  p-value: < 2.2e-16

We can see that though the residual standard error looks very good and the t-test and F-test are passed, the significance of all parameters are not significant. Still use step to this model to improve it.

tstep <- step(lm3)
## Start:  AIC=-150.85
## log(y) ~ log(x1) * log(x2)
## 
##                   Df Sum of Sq     RSS     AIC
## - log(x1):log(x2)  1 0.0010105 0.18546 -152.69
## <none>                         0.18445 -150.85
## 
## Step:  AIC=-152.69
## log(y) ~ log(x1) + log(x2)
## 
##           Df Sum of Sq    RSS      AIC
## <none>                 0.1855 -152.685
## - log(x2)  1    0.1978 0.3832 -132.185
## - log(x1)  1    4.6275 4.8130  -53.743

It shows that after deleting β3log(X1X2) the AIC decreases the most. As the result the model will be change to \(Y = β~0~ + β~1~log(X~1~)+ β~2~log(X~2~)\)

lm4 <- lm(log(y)~log(x1)+log(x2))
summary(lm4)
## 
## Call:
## lm(formula = log(y) ~ log(x1) + log(x2))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.168561 -0.048488  0.002431  0.063637  0.129223 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.63162    0.79979  -8.292 5.06e-09 ***
## log(x1)      1.98265    0.07501  26.432  < 2e-16 ***
## log(x2)      1.11712    0.20444   5.464 7.81e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08139 on 28 degrees of freedom
## Multiple R-squared:  0.9777, Adjusted R-squared:  0.9761 
## F-statistic: 613.2 on 2 and 28 DF,  p-value: < 2.2e-16

It passes the t-test and F-test, also all the parameters have p-values smaller than 0.05, so the optimization is successful. Next plot it to see if it is good enough.

par(mfrow=c(2,2))
plot(lm4)

All the plots are acceptable for a good regression model, so the possible regression model in this exercise should be \(Y = β~0~ +β~1~log(X~1~) + β~2~log(X~2~)\)

Exercise 2

Use the ozone.data to perform a regression diagnosis using the strategy we have learned in this lesson. We want to figure out the relationship between the response variable ozone level with the three explanatory variables, namely radiation, temperature, and wind.

To give some insights for regression analysis, you can first have a general overview of the data using gam that we have learned in Week 8 and then use the diagnostic methods such as bootstrap, jackknife, diagnostic plot, and robust regression that we newly learned in this lesson. The outcome of this exercise is to generate a satisfying multiple regression model.

Solution for Exercise 2

ozone.pollution <- read.table("ozone.data.txt",header=T)
attach(ozone.pollution)
names(ozone.pollution)
## [1] "rad"   "temp"  "wind"  "ozone"
summary(ozone.pollution)
##       rad             temp            wind            ozone      
##  Min.   :  7.0   Min.   :57.00   Min.   : 2.300   Min.   :  1.0  
##  1st Qu.:113.5   1st Qu.:71.00   1st Qu.: 7.400   1st Qu.: 18.0  
##  Median :207.0   Median :79.00   Median : 9.700   Median : 31.0  
##  Mean   :184.8   Mean   :77.79   Mean   : 9.939   Mean   : 42.1  
##  3rd Qu.:255.5   3rd Qu.:84.50   3rd Qu.:11.500   3rd Qu.: 62.0  
##  Max.   :334.0   Max.   :97.00   Max.   :20.700   Max.   :168.0

Question: How is ozone concentration related to wind speed, air temperature and the intensity of solar radiation?

  1. Check the relationship between the variables by correlation.

We use pairs function to look at all the correlation with pair-wise scatter plots between each variable pair.

pairs(ozone.pollution,panel=panel.smooth)

Alternatively, ggparis function can also generate similar result, which also contains the density plots of the continuous variables and the correlation coefficient between each variable pair.

library(ggplot2)
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
ggpairs(ozone.pollution)

Besides, if you are interested specifically in the correlation coefficientcorrplot function can be applied to generate a visualized correlation matrix.

library(corrplot)
## corrplot 0.92 loaded
cor <- cor(ozone.pollution)
corrplot(cor,method="shade",type="full",addCoef.col = "black")

As a result, there is a strong negative correlation between ozone and wind (-0.61), a strong positive correlation between ozone and temp (0.7) and between wind and temp (0.5).

As learned in Week 8, we can build the non-parametric smoothers in a generalized additive model GAM like this:

library(mgcv)
## 载入需要的程辑包:nlme
## This is mgcv 1.8-39. For overview type 'help("mgcv-package")'.
library(nlme)
par(mfrow=c(2,2))
model <- gam(ozone.pollution$ozone~s(rad)+s(temp)+s(wind))
plot(model)

  1. Multiple linear regression diagnosis

We start with the most complicated model: this includes curvature terms for each variable, all three two-way interactions and a three-way interaction:

attach(ozone.pollution)
## The following objects are masked from ozone.pollution (pos = 7):
## 
##     ozone, rad, temp, wind
w2 <- wind^2
t2 <- temp^2
r2 <- rad^2
tw <- temp*wind
wr <- wind*rad
tr <- temp*rad
wtr <- wind*temp*rad

model1 <- lm(ozone.pollution$ozone~rad+temp+wind+t2+w2+r2+wr+tr+tw+wtr)
summary(model1)
## 
## Call:
## lm(formula = ozone.pollution$ozone ~ rad + temp + wind + t2 + 
##     w2 + r2 + wr + tr + tw + wtr)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -38.894 -11.205  -2.736   8.809  70.551 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.683e+02  2.073e+02   2.741  0.00725 ** 
## rad         -3.117e-01  5.585e-01  -0.558  0.57799    
## temp        -1.076e+01  4.303e+00  -2.501  0.01401 *  
## wind        -3.237e+01  1.173e+01  -2.760  0.00687 ** 
## t2           5.833e-02  2.396e-02   2.435  0.01668 *  
## w2           6.106e-01  1.469e-01   4.157 6.81e-05 ***
## r2          -3.619e-04  2.573e-04  -1.407  0.16265    
## wr           2.054e-02  4.892e-02   0.420  0.67552    
## tr           8.403e-03  7.512e-03   1.119  0.26602    
## tw           2.377e-01  1.367e-01   1.739  0.08519 .  
## wtr         -4.324e-04  6.595e-04  -0.656  0.51358    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.82 on 100 degrees of freedom
## Multiple R-squared:  0.7394, Adjusted R-squared:  0.7133 
## F-statistic: 28.37 on 10 and 100 DF,  p-value: < 2.2e-16

#conduct the ‘p values on deletion’

model2 <- update(model1,~.-wtr)
summary(model2)
## 
## Call:
## lm(formula = ozone.pollution$ozone ~ rad + temp + wind + t2 + 
##     w2 + r2 + wr + tr + tw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.611 -11.455  -2.901   8.548  70.325 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.245e+02  1.957e+02   2.680   0.0086 ** 
## rad          2.628e-02  2.142e-01   0.123   0.9026    
## temp        -1.021e+01  4.209e+00  -2.427   0.0170 *  
## wind        -2.802e+01  9.645e+00  -2.906   0.0045 ** 
## t2           5.953e-02  2.382e-02   2.499   0.0141 *  
## w2           6.173e-01  1.461e-01   4.225 5.25e-05 ***
## r2          -3.388e-04  2.541e-04  -1.333   0.1855    
## wr          -1.127e-02  6.277e-03  -1.795   0.0756 .  
## tr           3.750e-03  2.459e-03   1.525   0.1303    
## tw           1.734e-01  9.497e-02   1.825   0.0709 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.77 on 101 degrees of freedom
## Multiple R-squared:  0.7383, Adjusted R-squared:  0.715 
## F-statistic: 31.66 on 9 and 101 DF,  p-value: < 2.2e-16
model3 <- update(model2,~.-r2)
summary(model3)
## 
## Call:
## lm(formula = ozone.pollution$ozone ~ rad + temp + wind + t2 + 
##     w2 + wr + tr + tw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.188 -11.387  -1.500   8.752  71.289 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 486.346603 194.333075   2.503  0.01392 *  
## rad          -0.043163   0.208535  -0.207  0.83644    
## temp         -9.446780   4.185240  -2.257  0.02613 *  
## wind        -26.471461   9.610816  -2.754  0.00697 ** 
## t2            0.056966   0.023835   2.390  0.01868 *  
## w2            0.599709   0.146069   4.106 8.14e-05 ***
## wr           -0.011359   0.006300  -1.803  0.07435 .  
## tr            0.003160   0.002428   1.302  0.19600    
## tw            0.157637   0.094595   1.666  0.09869 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.83 on 102 degrees of freedom
## Multiple R-squared:  0.7337, Adjusted R-squared:  0.7128 
## F-statistic: 35.12 on 8 and 102 DF,  p-value: < 2.2e-16
model4 <- update(model3,~.-tr)
summary(model4)
## 
## Call:
## lm(formula = ozone.pollution$ozone ~ rad + temp + wind + t2 + 
##     w2 + wr + tw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.379 -11.375  -2.217   8.921  71.247 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 514.401470 193.783580   2.655  0.00920 ** 
## rad           0.212945   0.069283   3.074  0.00271 ** 
## temp        -10.654041   4.094889  -2.602  0.01064 *  
## wind        -27.391965   9.616998  -2.848  0.00531 ** 
## t2            0.067805   0.022408   3.026  0.00313 ** 
## w2            0.619396   0.145773   4.249 4.72e-05 ***
## wr           -0.013561   0.006089  -2.227  0.02813 *  
## tw            0.169674   0.094458   1.796  0.07538 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.89 on 103 degrees of freedom
## Multiple R-squared:  0.7292, Adjusted R-squared:  0.7108 
## F-statistic: 39.63 on 7 and 103 DF,  p-value: < 2.2e-16
model5 <- update(model4,~.-tw)
summary(model5)
## 
## Call:
## lm(formula = ozone.pollution$ozone ~ rad + temp + wind + t2 + 
##     w2 + wr)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.478 -10.735  -2.437   9.685  77.543 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 223.573855 107.618223   2.077 0.040221 *  
## rad           0.173431   0.066398   2.612 0.010333 *  
## temp         -5.197139   2.775039  -1.873 0.063902 .  
## wind        -10.816032   2.736757  -3.952 0.000141 ***
## t2            0.043640   0.018112   2.410 0.017731 *  
## w2            0.430059   0.101767   4.226 5.12e-05 ***
## wr           -0.009819   0.005783  -1.698 0.092507 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.08 on 104 degrees of freedom
## Multiple R-squared:  0.7208, Adjusted R-squared:  0.7047 
## F-statistic: 44.74 on 6 and 104 DF,  p-value: < 2.2e-16
model6 <- update(model5,~.-wr)
summary(model6)
## 
## Call:
## lm(formula = ozone.pollution$ozone ~ rad + temp + wind + t2 + 
##     w2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.044 -10.796  -4.138   8.131  80.098 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 291.16758  100.87723   2.886  0.00473 ** 
## rad           0.06586    0.02005   3.285  0.00139 ** 
## temp         -6.33955    2.71627  -2.334  0.02150 *  
## wind        -13.39674    2.29623  -5.834 6.05e-08 ***
## t2            0.05102    0.01774   2.876  0.00488 ** 
## w2            0.46464    0.10060   4.619 1.10e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.25 on 105 degrees of freedom
## Multiple R-squared:  0.713,  Adjusted R-squared:  0.6994 
## F-statistic: 52.18 on 5 and 105 DF,  p-value: < 2.2e-16

Here is a summary table Table 1 to illustrate the model elements, significance score, and the adjusted R-squared value of each model for simplicity.

\[Table~1~Summary~for~Model~1~to~Model~6\]
Model_Name Model_element Significance_score Adj.R.squared
md 1 rad, temp, wind, t2, w2, r2, wr, tr, tw, wtr (Intercept) **, temp *, wind **, t2 *, w2 ***, tw · 0.7133
md 2 rad, temp, wind, t2, w2, r2, wr, tr, tw (Intercept) **, temp *, wind **, t2 *, w2 *** , wr ·, tw · 0.7150
md 3 rad, temp, wind, t2, w2, wr, tr, tw (Intercept) **, temp *, wind **, t2 *, w2 ***, wr ·, tw · 0.7128
md 4 rad, temp, wind, t2, w2, wr, tw (Intercept) *, temp *, wind **, t2 *, w2 ***, wr ·, tw · 0.7108
md 5 rad, temp, wind, t2, w2, wr (Intercept) **, rad **, temp *, wind **, t2 **, w2 ***, wr *, tw · 0.7047
md 6 rad, temp, wind, t2, w2 (Intercept) **, rad **, temp *, wind ***, t2 *, w2 *** 0.6994

At this moment, model 6 is the best one. So we call the regression diagnosis plots to conduct the model check

par(mfrow=c(2,3))
plot(model6,which = 1:6, caption = LETTERS[1:6])

As a result, some regression diagnosis should be performed. First, the plot of Residuals v.s. Fitted (Figure A) looks not good since some patterns is shown. As the fitted value (y hat) becomes smaller, the residuals (y-y hat) starts to aggregate, which violates the assumption of “constancy of variance”. Second, the QQ plot (Figure B) shows that the line is not straight but a semi-S curve, which violates the assumption of “normality of error”. The Cook’s distance (Figure D) reminds us to pay attention to the influential point 77th and outlier point of 85th and 23th.

Let us try transforming the response variable with log to let data distribution scatter.

model7 <- lm(log(ozone.pollution$ozone) ~ rad+temp+wind+t2+w2+r2+wr+tr+tw+wtr)
summary(model7)
## 
## Call:
## lm(formula = log(ozone.pollution$ozone) ~ rad + temp + wind + 
##     t2 + w2 + r2 + wr + tr + tw + wtr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.91943 -0.24169 -0.01742  0.28213  1.11802 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  2.803e+00  5.676e+00   0.494   0.6225  
## rad          2.771e-02  1.529e-02   1.812   0.0729 .
## temp        -3.018e-02  1.178e-01  -0.256   0.7983  
## wind        -9.812e-02  3.211e-01  -0.306   0.7605  
## t2           6.034e-04  6.559e-04   0.920   0.3598  
## w2           8.732e-03  4.021e-03   2.172   0.0322 *
## r2          -1.489e-05  7.043e-06  -2.114   0.0370 *
## wr          -2.001e-03  1.339e-03  -1.494   0.1382  
## tr          -2.507e-04  2.056e-04  -1.219   0.2256  
## tw          -1.985e-03  3.742e-03  -0.530   0.5971  
## wtr          2.535e-05  1.805e-05   1.404   0.1634  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4877 on 100 degrees of freedom
## Multiple R-squared:  0.7116, Adjusted R-squared:  0.6827 
## F-statistic: 24.67 on 10 and 100 DF,  p-value: < 2.2e-16
model8 <- update(model7,~.-wtr)
summary(model8)
## 
## Call:
## lm(formula = log(ozone.pollution$ozone) ~ rad + temp + wind + 
##     t2 + w2 + r2 + wr + tr + tw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.99582 -0.24838 -0.04271  0.32080  1.07835 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  5.373e+00  5.398e+00   0.995   0.3219  
## rad          7.896e-03  5.908e-03   1.336   0.1844  
## temp        -6.230e-02  1.161e-01  -0.537   0.5927  
## wind        -3.531e-01  2.660e-01  -1.327   0.1874  
## t2           5.332e-04  6.571e-04   0.811   0.4191  
## w2           8.340e-03  4.030e-03   2.069   0.0411 *
## r2          -1.624e-05  7.010e-06  -2.317   0.0225 *
## wr          -1.368e-04  1.731e-04  -0.790   0.4313  
## tr           2.195e-05  6.783e-05   0.324   0.7469  
## tw           1.784e-03  2.620e-03   0.681   0.4975  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4901 on 101 degrees of freedom
## Multiple R-squared:  0.7059, Adjusted R-squared:  0.6797 
## F-statistic: 26.93 on 9 and 101 DF,  p-value: < 2.2e-16
model9 <- update(model8,~.-tr)
summary(model9)
## 
## Call:
## lm(formula = log(ozone.pollution$ozone) ~ rad + temp + wind + 
##     t2 + w2 + r2 + wr + tw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.96263 -0.24298 -0.04081  0.31953  1.09081 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  5.516e+00  5.357e+00   1.030  0.30558   
## rad          9.533e-03  3.036e-03   3.140  0.00221 **
## temp        -6.949e-02  1.134e-01  -0.613  0.54157   
## wind        -3.574e-01  2.645e-01  -1.351  0.17966   
## t2           6.029e-04  6.180e-04   0.976  0.33160   
## w2           8.451e-03  3.998e-03   2.114  0.03697 * 
## r2          -1.584e-05  6.865e-06  -2.307  0.02310 * 
## wr          -1.517e-04  1.662e-04  -0.913  0.36341   
## tw           1.846e-03  2.601e-03   0.710  0.47956   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4879 on 102 degrees of freedom
## Multiple R-squared:  0.7056, Adjusted R-squared:  0.6825 
## F-statistic: 30.56 on 8 and 102 DF,  p-value: < 2.2e-16
model10 <- update(model9,~.-tw)
summary(model10)
## 
## Call:
## lm(formula = log(ozone.pollution$ozone) ~ rad + temp + wind + 
##     t2 + w2 + r2 + wr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.89186 -0.26391 -0.03075  0.33076  1.09627 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  2.326e+00  2.907e+00   0.800  0.42544   
## rad          8.875e-03  2.884e-03   3.077  0.00268 **
## temp        -9.290e-03  7.515e-02  -0.124  0.90185   
## wind        -1.772e-01  7.366e-02  -2.405  0.01795 * 
## t2           3.360e-04  4.892e-04   0.687  0.49375   
## w2           6.389e-03  2.739e-03   2.333  0.02162 * 
## r2          -1.515e-05  6.781e-06  -2.235  0.02761 * 
## wr          -1.112e-04  1.557e-04  -0.714  0.47676   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4867 on 103 degrees of freedom
## Multiple R-squared:  0.7041, Adjusted R-squared:  0.684 
## F-statistic: 35.02 on 7 and 103 DF,  p-value: < 2.2e-16
model11 <- update(model10,~.-t2)
summary(model11)
## 
## Call:
## lm(formula = log(ozone.pollution$ozone) ~ rad + temp + wind + 
##     w2 + r2 + wr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82031 -0.25479 -0.02779  0.33595  1.15024 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.989e-01  7.571e-01   0.527  0.59936    
## rad          8.996e-03  2.871e-03   3.133  0.00225 ** 
## temp         4.214e-02  6.246e-03   6.746 8.79e-10 ***
## wind        -1.816e-01  7.320e-02  -2.481  0.01472 *  
## w2           6.758e-03  2.679e-03   2.523  0.01316 *  
## r2          -1.477e-05  6.740e-06  -2.191  0.03071 *  
## wr          -1.368e-04  1.507e-04  -0.908  0.36615    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4855 on 104 degrees of freedom
## Multiple R-squared:  0.7028, Adjusted R-squared:  0.6856 
## F-statistic: 40.98 on 6 and 104 DF,  p-value: < 2.2e-16
model12 <- update(model11,~.-wr)
summary(model12)
## 
## Call:
## lm(formula = log(ozone.pollution$ozone) ~ rad + temp + wind + 
##     w2 + r2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.85551 -0.25578  0.00248  0.31349  1.16251 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.724e-01  6.350e-01   1.216 0.226543    
## rad          7.466e-03  2.323e-03   3.215 0.001736 ** 
## temp         4.193e-02  6.237e-03   6.723 9.52e-10 ***
## wind        -2.211e-01  5.874e-02  -3.765 0.000275 ***
## w2           7.390e-03  2.585e-03   2.859 0.005126 ** 
## r2          -1.470e-05  6.734e-06  -2.183 0.031246 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4851 on 105 degrees of freedom
## Multiple R-squared:  0.7004, Adjusted R-squared:  0.6861 
## F-statistic:  49.1 on 5 and 105 DF,  p-value: < 2.2e-16

Here is a summary table Table 2 to illustrate the model elements, significance score, and the adjusted R-squared value of each model for simplicity.

\[Table~2~Summary~for~Model~7~to~Model~12\]
Model_Name Model_element Significance_score Adj.R.squared
md 7 rad, temp, wind, t2, w2, r2, wr, tr, tw, wtr temp ·, w2 *, r2 * 0.6827
md 8 rad, temp, wind, t2, w2, r2, wr, tr, tw w2 *, r2 * 0.6797
md 9 rad, temp, wind, t2, w2, r2, wr, tw rad **, w2 *, r2 * 0.6825
md 10 rad, temp, wind, t2, w2, r2 wr rad **, wind *, w2 *, r2 * 0.6840
md 11 rad, temp, wind, r2, w2, wr rad **, temp ***, wind *, w2 *, r2 * 0.6856
md 12 rad, temp, wind, r2, w2 rad **, temp ***, wind ***, w2 **, r2 * 0.6861
par(mfrow=c(2,3))
plot(model12,which = 1:6, caption = LETTERS[1:6])

Pay attention that sometimes the step function is not reliable but some kind arbitrary when performing the model reduction based only on the AIC minimization. We try to test on the model1 and model 7, which evidence is summarized in the Table 3.

step(model1)
## Start:  AIC=649.8
## ozone.pollution$ozone ~ rad + temp + wind + t2 + w2 + r2 + wr + 
##     tr + tw + wtr
## 
##        Df Sum of Sq   RSS    AIC
## - wr    1      55.9 31798 648.00
## - rad   1      98.9 31841 648.15
## - wtr   1     136.4 31879 648.28
## - tr    1     397.1 32139 649.18
## <none>              31742 649.80
## - r2    1     628.0 32370 649.98
## - tw    1     959.4 32702 651.11
## - t2    1    1881.5 33624 654.19
## - temp  1    1985.4 33728 654.54
## - wind  1    2418.2 34160 655.95
## - w2    1    5486.1 37228 665.50
## 
## Step:  AIC=648
## ozone.pollution$ozone ~ rad + temp + wind + t2 + w2 + r2 + tr + 
##     tw + wtr
## 
##        Df Sum of Sq   RSS    AIC
## - rad   1      74.7 31873 646.26
## <none>              31798 648.00
## - r2    1     588.7 32387 648.03
## - wtr   1    1097.6 32896 649.76
## - tw    1    1246.7 33045 650.27
## - tr    1    1626.1 33424 651.53
## - temp  1    1929.5 33728 652.54
## - t2    1    1956.6 33755 652.63
## - wind  1    2878.3 34676 655.62
## - w2    1    5624.8 37423 664.08
## 
## Step:  AIC=646.26
## ozone.pollution$ozone ~ temp + wind + t2 + w2 + r2 + tr + tw + 
##     wtr
## 
##        Df Sum of Sq   RSS    AIC
## <none>              31873 646.26
## - r2    1     780.0 32653 646.94
## - wtr   1    1341.7 33215 648.83
## - tw    1    1372.8 33246 648.94
## - temp  1    2282.2 34155 651.93
## - t2    1    2565.4 34438 652.85
## - wind  1    3036.1 34909 654.36
## - tr    1    3295.2 35168 655.18
## - w2    1    5861.6 37734 663.00
## 
## Call:
## lm(formula = ozone.pollution$ozone ~ temp + wind + t2 + w2 + 
##     r2 + tr + tw + wtr)
## 
## Coefficients:
## (Intercept)         temp         wind           t2           w2           r2  
##   5.569e+02   -1.098e+01   -3.029e+01    6.337e-02    6.244e-01   -3.826e-04  
##          tr           tw          wtr  
##   4.468e-03    2.060e-01   -1.685e-04
model1_step_op <- lm(ozone.pollution$ozone~temp+wind+t2+w2+r2+tr+tw+wtr)
summary(model1_step_op)
## 
## Call:
## lm(formula = ozone.pollution$ozone ~ temp + wind + t2 + w2 + 
##     r2 + tr + tw + wtr)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.008 -11.439  -3.136   8.385  70.125 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.569e+02  1.934e+02   2.879  0.00486 ** 
## temp        -1.098e+01  4.062e+00  -2.703  0.00806 ** 
## wind        -3.029e+01  9.718e+00  -3.117  0.00237 ** 
## t2           6.337e-02  2.212e-02   2.865  0.00506 ** 
## w2           6.244e-01  1.442e-01   4.331 3.48e-05 ***
## r2          -3.826e-04  2.422e-04  -1.580  0.11722    
## tr           4.468e-03  1.376e-03   3.247  0.00158 ** 
## tw           2.060e-01  9.829e-02   2.096  0.03856 *  
## wtr         -1.685e-04  8.132e-05  -2.072  0.04078 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.68 on 102 degrees of freedom
## Multiple R-squared:  0.7383, Adjusted R-squared:  0.7178 
## F-statistic: 35.97 on 8 and 102 DF,  p-value: < 2.2e-16
step(model7)
## Start:  AIC=-148.98
## log(ozone.pollution$ozone) ~ rad + temp + wind + t2 + w2 + r2 + 
##     wr + tr + tw + wtr
## 
##        Df Sum of Sq    RSS     AIC
## - temp  1   0.01562 23.803 -150.91
## - wind  1   0.02221 23.809 -150.88
## - tw    1   0.06690 23.854 -150.67
## - t2    1   0.20130 23.988 -150.05
## - tr    1   0.35368 24.141 -149.34
## <none>              23.787 -148.98
## - wtr   1   0.46883 24.256 -148.82
## - wr    1   0.53121 24.318 -148.53
## - rad   1   0.78131 24.568 -147.40
## - r2    1   1.06316 24.850 -146.13
## - w2    1   1.12186 24.909 -145.87
## 
## Step:  AIC=-150.91
## log(ozone.pollution$ozone) ~ rad + wind + t2 + w2 + r2 + wr + 
##     tr + tw + wtr
## 
##        Df Sum of Sq    RSS     AIC
## - wind  1   0.00766 23.810 -152.88
## - tw    1   0.19954 24.002 -151.98
## - tr    1   0.37762 24.180 -151.16
## <none>              23.803 -150.91
## - wtr   1   0.52236 24.325 -150.50
## - wr    1   0.58077 24.383 -150.24
## - rad   1   0.80362 24.606 -149.22
## - t2    1   0.88137 24.684 -148.88
## - r2    1   1.04911 24.852 -148.12
## - w2    1   1.25401 25.057 -147.21
## 
## Step:  AIC=-152.88
## log(ozone.pollution$ozone) ~ rad + t2 + w2 + r2 + wr + tr + tw + 
##     wtr
## 
##        Df Sum of Sq    RSS     AIC
## <none>              23.810 -152.88
## - tr    1    0.6953 24.506 -151.68
## - wtr   1    0.9420 24.752 -150.57
## - wr    1    1.0036 24.814 -150.29
## - r2    1    1.0415 24.852 -150.12
## - rad   1    1.3407 25.151 -148.79
## - w2    1    1.6988 25.509 -147.22
## - tw    1    1.9332 25.744 -146.21
## - t2    1    4.0503 27.860 -137.44
## 
## Call:
## lm(formula = log(ozone.pollution$ozone) ~ rad + t2 + w2 + r2 + 
##     wr + tr + tw + wtr)
## 
## Coefficients:
## (Intercept)          rad           t2           w2           r2           wr  
##   1.159e+00    2.959e-02    4.821e-04    7.906e-03   -1.448e-05   -2.199e-03  
##          tr           tw          wtr  
##  -2.792e-04   -3.080e-03    2.816e-05
model7_step_op <- lm(log(ozone.pollution$ozone) ~ rad + t2 + w2 + r2 + wr + tr + tw + wtr)
summary(model7_step_op)
## 
## Call:
## lm(formula = log(ozone.pollution$ozone) ~ rad + t2 + w2 + r2 + 
##     wr + tr + tw + wtr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.88083 -0.25112 -0.01786  0.28580  1.12695 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.159e+00  5.547e-01   2.090  0.03915 *  
## rad          2.959e-02  1.235e-02   2.397  0.01837 *  
## t2           4.821e-04  1.157e-04   4.165 6.52e-05 ***
## w2           7.906e-03  2.931e-03   2.698  0.00817 ** 
## r2          -1.448e-05  6.857e-06  -2.112  0.03710 *  
## wr          -2.199e-03  1.060e-03  -2.073  0.04065 *  
## tr          -2.792e-04  1.617e-04  -1.726  0.08740 .  
## tw          -3.080e-03  1.070e-03  -2.878  0.00488 ** 
## wtr          2.816e-05  1.402e-05   2.009  0.04719 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4831 on 102 degrees of freedom
## Multiple R-squared:  0.7113, Adjusted R-squared:  0.6886 
## F-statistic: 31.41 on 8 and 102 DF,  p-value: < 2.2e-16
\[Table~3~Comparison~of~optimal~model~reduction~on~Model1~and~Model7\]
Model_Name Model_element Significance_score Adj.R.squared
omd 1 rad, temp, wind, t2, w2 (Intercept) **, rad **, temp *, wind ***, t2 *, w2 *** 0.6994
omd 1-auto temp, wind, t2, w2, r2, tr, tw, wtr (Intercept) **, temp **, wind **, t2 **, w2 ***, tr **, tw *, wtr * 0.7178
omd 7 rad, temp, wind, t2, w2, r2 wr rad **, temp ***, wind ***, w2 **, r2 * 0.6861
omd 7-auto rad, t2, w2, r2, wr, tr, tw, wtr (Intercept) *, rad *, t2 ***, w2 **, r2 *, wr *, tr ·, tw **, wtr * 0.6886

It is found that for omd1-auto, the step function produces a model without the main explanatory variables of rad and remains some two-way interactions and a three-way interaction. For omd7-auto, the step function produces a model without the main explanatory variables of temp, wind and remains some two-way interactions. Therefore, we should be cautious not to blindly use the step function.

In addition, when we are choosing the optimal model by model reduction, normally the result obtained from the criteria of determination can not satisfy the significance and adjusted R squared at the same time. So we should balance between these 2 criteria of determination based on what we need in certain context.

  1. Evaluation of coefficient estimates by Bootstrap and Jackknife

Bootstrapping

Since we can not directly plot a 4D scatter plot for the multiple linear regression, so calculation both the confidence interval and prediction interval seems challenging for us to perform. However, we can still use the bootstrap and jackknife technique learnt in the week-9-lecture to evaluate.

Remember the re-sample and re-arrange technique introduced in the lecture, these methods does not seems to be useful in this context since the size of computing randomness in the light of the response variable (Y) and so so many explanatory variable (multiple Xs) is unrealistic and computationally intensive.

ozone <-  structure(ozone.pollution)
lines <- nrow(ozone)
lines
## [1] 111

To solve the problem, we randomly pick up same number of target row data (in this case: 111) to build the multiple linear model for many many times by relicate function (in this case: 5000), which means that the selected target can be repetitive and be considered more than 1 time. For example, assuming that we want to select 5 target row data, the sample function is applied to roll the dice 5 times, which reports “1,3,1,4,2”. Therefore, we choose these row data to build the random model via bootstrapping.

In our case, the bootstrapping results for 5000 times is stored in the reps objects and we use the boxplot function to visualize. If you are interested in the distribution of coefficients estimates as mentioned in the lecture, histograms for each individual coefficients estimates is plotted as follows.

reps <- replicate(5000, lm(log(ozone) ~ rad + temp + wind + w2 + r2, 
                           data=ozone[sample(lines, replace = TRUE),])$coefficients)

Jackknife

We can also perform the Jackknife on the model by deleting one data row at a time to evaluate. To implement, the for loop is used to run 111 times and store the information of coefficient estimates into an empty data frame called jack.reg. The following codes also demonstrate the nested for loop that can generate the same result.

jack.reg <- data.frame(intercept=0,rad=0,temp=0,wind=0,w2=0,r2=0)

for (i in 1:111) {
  model <-  lm(log(ozone) ~ rad + temp + wind +I(wind^2) + I(rad^2),  
               data=ozone.pollution[-i,])
  jack.reg[i,1] <-coef(model)[1]
  jack.reg[i,2] <-coef(model)[2]
  jack.reg[i,3] <-coef(model)[3]
  jack.reg[i,4] <-coef(model)[4]
  jack.reg[i,5] <-coef(model)[5]
  jack.reg[i,6] <-coef(model)[6]
}

for (i in 1:111) {
  for(j in 1:6){
    model <-  lm(log(ozone) ~ rad + temp + wind +I(wind^2) + I(rad^2),  
                 data=ozone.pollution[-i,])
    jack.reg[i,j]<-coef(model)[j]
  }
}
summary(jack.reg)
##    intercept           rad                temp              wind        
##  Min.   :0.5529   Min.   :0.005630   Min.   :0.03908   Min.   :-0.2478  
##  1st Qu.:0.7515   1st Qu.:0.007399   1st Qu.:0.04171   1st Qu.:-0.2226  
##  Median :0.7717   Median :0.007467   Median :0.04191   Median :-0.2211  
##  Mean   :0.7724   Mean   :0.007465   Mean   :0.04193   Mean   :-0.2211  
##  3rd Qu.:0.7917   3rd Qu.:0.007551   3rd Qu.:0.04209   3rd Qu.:-0.2197  
##  Max.   :1.2001   Max.   :0.008531   Max.   :0.04454   Max.   :-0.1967  
##        w2                 r2            
##  Min.   :0.006378   Min.   :-1.730e-05  
##  1st Qu.:0.007316   1st Qu.:-1.496e-05  
##  Median :0.007391   Median :-1.471e-05  
##  Mean   :0.007389   Mean   :-1.470e-05  
##  3rd Qu.:0.007448   3rd Qu.:-1.449e-05  
##  Max.   :0.008460   Max.   :-1.013e-05

Then, the result of jackknife is visualized using boxplot and histogram, which show the distribution of the coefficients of each explanatory variables.

As a result, the histogrames show that the model is greatly influenced by the omission of one certain point, which is extracted based on its Cook’s Distance. We should find it!Pay attention that this time the Cook’s Distance is at the 9th column of the list called influence.measures(model12)$infmat.

model12 <- lm(log(ozone.pollution$ozone) ~ rad + temp + wind + w2+r2)
which(influence.measures(model12)$infmat[,9]
== max(influence.measures(model12)$infmat[,9]))
## 17 
## 17

Then a new model model12_new is built for comparison of model performance.

model12_new <- lm(log(ozone) ~ rad + temp + wind + I(wind^2) + I(rad^2),
                  data=ozone.pollution[-17,])
summary(model12_new)
## 
## Call:
## lm(formula = log(ozone) ~ rad + temp + wind + I(wind^2) + I(rad^2), 
##     data = ozone.pollution[-17, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0339 -0.2767 -0.0042  0.3597  1.0927 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.200e+00  5.946e-01   2.019   0.0461 *  
## rad          5.630e-03  2.186e-03   2.576   0.0114 *  
## temp         3.908e-02  5.797e-03   6.741 9.02e-10 ***
## wind        -2.213e-01  5.425e-02  -4.079 8.87e-05 ***
## I(wind^2)    7.032e-03  2.389e-03   2.944   0.0040 ** 
## I(rad^2)    -1.013e-05  6.307e-06  -1.606   0.1112    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.448 on 104 degrees of freedom
## Multiple R-squared:  0.7047, Adjusted R-squared:  0.6905 
## F-statistic: 49.64 on 5 and 104 DF,  p-value: < 2.2e-16

Table 4 summarize the comparison between model12 and model12_new.

\[Table~4~Comparison~of~model12~and~model12~new\]
Model_Name Model_element Significance_score Adj.R.squared
md 12 rad, temp, wind, w2, r2 rad **, temp ***, wind ***, w2 **, r2 * 0.6861
md 12 new same except the 17th row Intercept *, rad *, temp ***, wind ***, w2 ** 0.6905

After removing the 17th row data, it seems that the model has improved based on the increased Adjusted R squared value. We can also plot the diagnosis plot for model12_newto compare it with the one for model 12.

Before regression diagnosis: model12

par(mfrow=c(2,3))
plot(model12,which = 1:6, caption = LETTERS[1:6])

After regression diagnosis: model12_new

par(mfrow=c(2,3))
plot(model12_new, which = 1:6, caption = LETTERS[1:6])

We can also conduct a pair-wise comparison to effectively visualize the difference between these two models.

par(mfcol=c(2,2))
  for(i in 2){
    plot(model12,which = 1:i, caption = LETTERS[1:i])
    plot(model12_new, which = 1:i, caption = LETTERS[1:i])
  }

par(mfcol=c(2,2))
  for(i in 4){
    plot(model12,which = 3:i, caption = LETTERS[3:i])
    plot(model12_new, which = 3:i, caption = LETTERS[3:i])
  }

par(mfcol=c(2,2))
  for(i in 6){
    plot(model12,which = 5:i, caption = LETTERS[5:i])
    plot(model12_new, which = 5:i, caption = LETTERS[5:i])
  }