Download the data set CO2 from the week6 folder. It can be analyzed by the asymptotic regression model with an offset. Use proper self-start function to conduct the regression and work out its R^2. Plot your predicted non-linear regression graph.
Input the data and draw the scatter plot.
CO2 <- read.table('CO2.txt',header=T)
attach(CO2)
names(CO2)
## [1] "Plant" "Type" "Treatment" "conc" "uptake"
plot(CO2$uptake~CO2$conc)
Use nls & SSasympOff functions to build the model.
model1 <- nls(uptake~SSasympOff(conc,a,b,c))
summary(model1)
##
## Formula: uptake ~ SSasympOff(conc, a, b, c)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 38.1398 0.9164 41.620 1.99e-06 ***
## b -4.3806 0.2042 -21.457 2.79e-05 ***
## c 51.2232 11.6698 4.389 0.0118 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.663 on 4 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 1.665e-06Draw the fitting line.
plot(CO2$uptake~CO2$conc)
xv <- seq(95,1000,5)
yv <- predict(model1,list(conc=xv))
lines(xv,yv)
Calculate the R2.
sse <- as.vector((summary(model1)[[3]])^2*4)
null <- lm(CO2$uptake~1)
sst <- as.vector(unlist(summary.aov(null)[[1]][2]))
R_sq = (sst-sse)/sst
R_sq
## [1] 0.9726904In conclusion, the p-values of three parameters are both less than 0.05, which are significant. The R2 is 0.9726904 means this predicted equation can explain 97.27% of the total variation in the uptake. It is a fairly good regression.
Use R to find the data set Puromycin. It has 23 rows and 3 columns of the reaction velocity versus substrate concentration in an enzymatic reaction involving untreated cells or cells treated with Puromycin. In this Exercise, you only need those data untreated, so reserve untreated data and perform the non-linear regression.
- Draw the scatter plot first.
- Determine which model you should use.
- Apply self-starting function.
- Calculate the R^2^.
There may be several models fit for this data set, please follow above steps and make your judgement which model is more suitable.
Input the data and draw the scatter plot.
puroUntrt <- data.frame(Puromycin[Puromycin$state=='untreated',])
plot(puroUntrt$rate~puroUntrt$conc)
Use nls & self-startong functions to build the model. Draw the fitting line. Calculate the R2.
2.1 Michaelis-Menten model
model2 <- nls(rate~SSmicmen(conc,a,b),data=puroUntrt)
summary(model2)
##
## Formula: rate ~ SSmicmen(conc, a, b)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 1.603e+02 6.480e+00 24.734 1.38e-09 ***
## b 4.771e-02 7.782e-03 6.131 0.000173 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.773 on 9 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 3.942e-06
plot(puroUntrt$rate~puroUntrt$conc)
xv <- seq(0.02,1.1,0.01)
yv <- predict(model2,list(conc=xv))
lines(xv,yv,col='green')
sse <- as.vector((summary(model2)[[3]])^2*9)
null <- lm(puroUntrt$rate~1)
sst <- as.vector(unlist(summary.aov(null)[[1]][2]))
R_sq = (sst-sse)/sst
R_sq
## [1] 0.9355724
2.2 SSasymp function
model3 <- nls(rate~SSasymp(conc,a,b,c),data=puroUntrt)
summary(model3)
##
## Formula: rate ~ SSasymp(conc, a, b, c)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 155.2648 5.0991 30.449 1.47e-09 ***
## b 47.4279 7.0375 6.739 0.000147 ***
## c 1.9050 0.1781 10.694 5.13e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.711 on 8 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 3.506e-06
plot(puroUntrt$rate~puroUntrt$conc)
xv <- seq(0.02,1.1,0.01)
yv <- predict(model3,list(conc=xv))
lines(xv,yv,col='blue',lty=2)
sse <- as.vector((summary(model3)[[3]])^2*8)
null <- lm(puroUntrt$rate~1)
sst <- as.vector(unlist(summary.aov(null)[[1]][2]))
R_sq = (sst-sse)/sst
R_sq
## [1] 0.9643497
2.3 SSasympOrig function
model4 <- nls(rate~SSasympOrig(conc,a,b),data=puroUntrt)
summary(model4)
##
## Formula: rate ~ SSasympOrig(conc, a, b)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 144.6649 7.5583 19.14 1.34e-08 ***
## b 2.7009 0.1784 15.14 1.04e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.4 on 9 degrees of freedom
##
## Number of iterations to convergence: 9
## Achieved convergence tolerance: 4.275e-06
plot(puroUntrt$rate~puroUntrt$conc)
xv <- seq(0.02,1.1,0.01)
yv <- predict(model4,list(conc=xv))
lines(xv,yv,col='red')
sse <- as.vector((summary(model4)[[3]])^2*9)
null <- lm(puroUntrt$rate~1)
sst <- as.vector(unlist(summary.aov(null)[[1]][2]))
R_sq = (sst-sse)/sst
R_sq
## [1] 0.8400122
2.4 SSasympOff function
model5 <- nls(rate~SSasympOff(conc,a,b,c),data=puroUntrt)
summary(model5)
##
## Formula: rate ~ SSasympOff(conc, a, b, c)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 155.26476 5.09912 30.449 1.47e-09 ***
## b 1.90499 0.17813 10.694 5.13e-06 ***
## c -0.05425 0.01699 -3.193 0.0128 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.711 on 8 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 3.511e-06
plot(puroUntrt$rate~puroUntrt$conc)
xv <- seq(0.02,1.1,0.01)
yv <- predict(model5,list(conc=xv))
lines(xv,yv,col='orange')
sse <- as.vector((summary(model5)[[3]])^2*8)
null <- lm(puroUntrt$rate~1)
sst <- as.vector(unlist(summary.aov(null)[[1]][2]))
R_sq = (sst-sse)/sst
R_sq
## [1] 0.9643497Compare these four models.
plot(puroUntrt$rate~puroUntrt$conc)
model2 <- nls(rate~SSmicmen(conc,a,b),data=puroUntrt)
summary(model2)
##
## Formula: rate ~ SSmicmen(conc, a, b)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 1.603e+02 6.480e+00 24.734 1.38e-09 ***
## b 4.771e-02 7.782e-03 6.131 0.000173 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.773 on 9 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 3.942e-06
xv <- seq(0.02,1.1,0.01)
yv <- predict(model2,list(conc=xv))
lines(xv,yv,col='green')
model3 <- nls(rate~SSasymp(conc,a,b,c),data=puroUntrt)
summary(model3)
##
## Formula: rate ~ SSasymp(conc, a, b, c)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 155.2648 5.0991 30.449 1.47e-09 ***
## b 47.4279 7.0375 6.739 0.000147 ***
## c 1.9050 0.1781 10.694 5.13e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.711 on 8 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 3.506e-06
xv <- seq(0.02,1.1,0.01)
yv <- predict(model3,list(conc=xv))
lines(xv,yv,col='blue',lty=2)
model4 <- nls(rate~SSasympOrig(conc,a,b),data=puroUntrt)
summary(model4)
##
## Formula: rate ~ SSasympOrig(conc, a, b)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 144.6649 7.5583 19.14 1.34e-08 ***
## b 2.7009 0.1784 15.14 1.04e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.4 on 9 degrees of freedom
##
## Number of iterations to convergence: 9
## Achieved convergence tolerance: 4.275e-06
xv <- seq(0.02,1.1,0.01)
yv <- predict(model4,list(conc=xv))
lines(xv,yv,col='red')
model5 <- nls(rate~SSasympOff(conc,a,b,c),data=puroUntrt)
summary(model5)
##
## Formula: rate ~ SSasympOff(conc, a, b, c)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 155.26476 5.09912 30.449 1.47e-09 ***
## b 1.90499 0.17813 10.694 5.13e-06 ***
## c -0.05425 0.01699 -3.193 0.0128 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.711 on 8 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 3.511e-06
xv <- seq(0.02,1.1,0.01)
yv <- predict(model5,list(conc=xv))
lines(xv,yv,col='orange')
legend("bottomright", title="Drug Type", c("model2 SSmicmen R-sq=93.56%","model3 SSasymp R-sq=96.43%", "model4 SSasympOrig R-sq=84.00%","model5 SSasympOff R-sq=96.43%"), lty=c(1,2,1,1), col=c("green", "blue", "red", "orange"))
In conclusion, the model 3 and model 5 have significant estimated parameter values and highest R2, these two model seems to be better to describe the data set puroUntrt. However, it is important to bind with the specific situation, your purpose and some other factors to consider which is better model. As Dr. Zhao mentioned, in biology, the Michaelis-Menten model makes sense in the enzymatic reaction, whereas other models are perhaps meaningless. So, the data background and the object to which the regression model is applied can also help you determine which model you will use. But purely from the data, in this exercise, the model 3 and model 5 are better.