Exercise 1

Download the data set CO2 from the week6 folder. It can be analyzed by the asymptotic regression model with an offset. Use proper self-start function to conduct the regression and work out its R^2. Plot your predicted non-linear regression graph.

  1. Input the data and draw the scatter plot.

    CO2 <- read.table('CO2.txt',header=T)
    attach(CO2)
    names(CO2)
    ## [1] "Plant"     "Type"      "Treatment" "conc"      "uptake"
    plot(CO2$uptake~CO2$conc)

  2. Use nls & SSasympOff functions to build the model.

    model1 <- nls(uptake~SSasympOff(conc,a,b,c))
    summary(model1)
    ## 
    ## Formula: uptake ~ SSasympOff(conc, a, b, c)
    ## 
    ## Parameters:
    ##   Estimate Std. Error t value Pr(>|t|)    
    ## a  38.1398     0.9164  41.620 1.99e-06 ***
    ## b  -4.3806     0.2042 -21.457 2.79e-05 ***
    ## c  51.2232    11.6698   4.389   0.0118 *  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 1.663 on 4 degrees of freedom
    ## 
    ## Number of iterations to convergence: 0 
    ## Achieved convergence tolerance: 1.665e-06
  3. Draw the fitting line.

    plot(CO2$uptake~CO2$conc)
    xv <- seq(95,1000,5)
    yv <- predict(model1,list(conc=xv))
    lines(xv,yv)

  4. Calculate the R2.

    sse <- as.vector((summary(model1)[[3]])^2*4)
    null <- lm(CO2$uptake~1)
    sst <- as.vector(unlist(summary.aov(null)[[1]][2]))
    R_sq = (sst-sse)/sst
    R_sq
    ## [1] 0.9726904

In conclusion, the p-values of three parameters are both less than 0.05, which are significant. The R2 is 0.9726904 means this predicted equation can explain 97.27% of the total variation in the uptake. It is a fairly good regression.

Exercise 2

Use R to find the data set Puromycin. It has 23 rows and 3 columns of the reaction velocity versus substrate concentration in an enzymatic reaction involving untreated cells or cells treated with Puromycin. In this Exercise, you only need those data untreated, so reserve untreated data and perform the non-linear regression.

- Draw the scatter plot first.

- Determine which model you should use.

- Apply self-starting function.

- Calculate the R^2^.

There may be several models fit for this data set, please follow above steps and make your judgement which model is more suitable.

  1. Input the data and draw the scatter plot.

    puroUntrt <- data.frame(Puromycin[Puromycin$state=='untreated',])
    plot(puroUntrt$rate~puroUntrt$conc)

  2. Use nls & self-startong functions to build the model. Draw the fitting line. Calculate the R2.

    2.1 Michaelis-Menten model

    model2 <- nls(rate~SSmicmen(conc,a,b),data=puroUntrt)
    summary(model2)
    ## 
    ## Formula: rate ~ SSmicmen(conc, a, b)
    ## 
    ## Parameters:
    ##    Estimate Std. Error t value Pr(>|t|)    
    ## a 1.603e+02  6.480e+00  24.734 1.38e-09 ***
    ## b 4.771e-02  7.782e-03   6.131 0.000173 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 9.773 on 9 degrees of freedom
    ## 
    ## Number of iterations to convergence: 5 
    ## Achieved convergence tolerance: 3.942e-06
    plot(puroUntrt$rate~puroUntrt$conc)
    xv <- seq(0.02,1.1,0.01)
    yv <- predict(model2,list(conc=xv))
    lines(xv,yv,col='green')

    sse <- as.vector((summary(model2)[[3]])^2*9)
    null <- lm(puroUntrt$rate~1)
    sst <- as.vector(unlist(summary.aov(null)[[1]][2]))
    R_sq = (sst-sse)/sst
    R_sq
    ## [1] 0.9355724

    2.2 SSasymp function

    model3 <- nls(rate~SSasymp(conc,a,b,c),data=puroUntrt)
    summary(model3)
    ## 
    ## Formula: rate ~ SSasymp(conc, a, b, c)
    ## 
    ## Parameters:
    ##   Estimate Std. Error t value Pr(>|t|)    
    ## a 155.2648     5.0991  30.449 1.47e-09 ***
    ## b  47.4279     7.0375   6.739 0.000147 ***
    ## c   1.9050     0.1781  10.694 5.13e-06 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 7.711 on 8 degrees of freedom
    ## 
    ## Number of iterations to convergence: 5 
    ## Achieved convergence tolerance: 3.506e-06
    plot(puroUntrt$rate~puroUntrt$conc)
    xv <- seq(0.02,1.1,0.01)
    yv <- predict(model3,list(conc=xv))
    lines(xv,yv,col='blue',lty=2)

    sse <- as.vector((summary(model3)[[3]])^2*8)
    null <- lm(puroUntrt$rate~1)
    sst <- as.vector(unlist(summary.aov(null)[[1]][2]))
    R_sq = (sst-sse)/sst
    R_sq
    ## [1] 0.9643497

    2.3 SSasympOrig function

    model4 <- nls(rate~SSasympOrig(conc,a,b),data=puroUntrt)
    summary(model4)
    ## 
    ## Formula: rate ~ SSasympOrig(conc, a, b)
    ## 
    ## Parameters:
    ##   Estimate Std. Error t value Pr(>|t|)    
    ## a 144.6649     7.5583   19.14 1.34e-08 ***
    ## b   2.7009     0.1784   15.14 1.04e-07 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 15.4 on 9 degrees of freedom
    ## 
    ## Number of iterations to convergence: 9 
    ## Achieved convergence tolerance: 4.275e-06
    plot(puroUntrt$rate~puroUntrt$conc)
    xv <- seq(0.02,1.1,0.01)
    yv <- predict(model4,list(conc=xv))
    lines(xv,yv,col='red')

    sse <- as.vector((summary(model4)[[3]])^2*9)
    null <- lm(puroUntrt$rate~1)
    sst <- as.vector(unlist(summary.aov(null)[[1]][2]))
    R_sq = (sst-sse)/sst
    R_sq
    ## [1] 0.8400122

    2.4 SSasympOff function

    model5 <- nls(rate~SSasympOff(conc,a,b,c),data=puroUntrt)
    summary(model5)
    ## 
    ## Formula: rate ~ SSasympOff(conc, a, b, c)
    ## 
    ## Parameters:
    ##    Estimate Std. Error t value Pr(>|t|)    
    ## a 155.26476    5.09912  30.449 1.47e-09 ***
    ## b   1.90499    0.17813  10.694 5.13e-06 ***
    ## c  -0.05425    0.01699  -3.193   0.0128 *  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 7.711 on 8 degrees of freedom
    ## 
    ## Number of iterations to convergence: 5 
    ## Achieved convergence tolerance: 3.511e-06
    plot(puroUntrt$rate~puroUntrt$conc)
    xv <- seq(0.02,1.1,0.01)
    yv <- predict(model5,list(conc=xv))
    lines(xv,yv,col='orange')

    sse <- as.vector((summary(model5)[[3]])^2*8)
    null <- lm(puroUntrt$rate~1)
    sst <- as.vector(unlist(summary.aov(null)[[1]][2]))
    R_sq = (sst-sse)/sst
    R_sq
    ## [1] 0.9643497

Compare these four models.

plot(puroUntrt$rate~puroUntrt$conc)

model2 <- nls(rate~SSmicmen(conc,a,b),data=puroUntrt)
summary(model2)
## 
## Formula: rate ~ SSmicmen(conc, a, b)
## 
## Parameters:
##    Estimate Std. Error t value Pr(>|t|)    
## a 1.603e+02  6.480e+00  24.734 1.38e-09 ***
## b 4.771e-02  7.782e-03   6.131 0.000173 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.773 on 9 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 3.942e-06
xv <- seq(0.02,1.1,0.01)
yv <- predict(model2,list(conc=xv))
lines(xv,yv,col='green')

model3 <- nls(rate~SSasymp(conc,a,b,c),data=puroUntrt)
summary(model3)
## 
## Formula: rate ~ SSasymp(conc, a, b, c)
## 
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)    
## a 155.2648     5.0991  30.449 1.47e-09 ***
## b  47.4279     7.0375   6.739 0.000147 ***
## c   1.9050     0.1781  10.694 5.13e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.711 on 8 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 3.506e-06
xv <- seq(0.02,1.1,0.01)
yv <- predict(model3,list(conc=xv))
lines(xv,yv,col='blue',lty=2)

model4 <- nls(rate~SSasympOrig(conc,a,b),data=puroUntrt)
summary(model4)
## 
## Formula: rate ~ SSasympOrig(conc, a, b)
## 
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)    
## a 144.6649     7.5583   19.14 1.34e-08 ***
## b   2.7009     0.1784   15.14 1.04e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.4 on 9 degrees of freedom
## 
## Number of iterations to convergence: 9 
## Achieved convergence tolerance: 4.275e-06
xv <- seq(0.02,1.1,0.01)
yv <- predict(model4,list(conc=xv))
lines(xv,yv,col='red')

model5 <- nls(rate~SSasympOff(conc,a,b,c),data=puroUntrt)
summary(model5)
## 
## Formula: rate ~ SSasympOff(conc, a, b, c)
## 
## Parameters:
##    Estimate Std. Error t value Pr(>|t|)    
## a 155.26476    5.09912  30.449 1.47e-09 ***
## b   1.90499    0.17813  10.694 5.13e-06 ***
## c  -0.05425    0.01699  -3.193   0.0128 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.711 on 8 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 3.511e-06
xv <- seq(0.02,1.1,0.01)
yv <- predict(model5,list(conc=xv))
lines(xv,yv,col='orange')

legend("bottomright", title="Drug Type", c("model2 SSmicmen   R-sq=93.56%","model3 SSasymp    R-sq=96.43%", "model4 SSasympOrig   R-sq=84.00%","model5 SSasympOff   R-sq=96.43%"), lty=c(1,2,1,1), col=c("green", "blue", "red", "orange"))

In conclusion, the model 3 and model 5 have significant estimated parameter values and highest R2, these two model seems to be better to describe the data set puroUntrt. However, it is important to bind with the specific situation, your purpose and some other factors to consider which is better model. As Dr. Zhao mentioned, in biology, the Michaelis-Menten model makes sense in the enzymatic reaction, whereas other models are perhaps meaningless. So, the data background and the object to which the regression model is applied can also help you determine which model you will use. But purely from the data, in this exercise, the model 3 and model 5 are better.