
OpenR: R & Statistics / Question about assignment


Luning Yang
Posts: 6

06 April 2022, 10:38 PM

Dear classmates and Professor Zhao:

May I ask whether an interaction can occur between two numerical predictor variables? Could you give me some ideas? In our previous learning we have always discussed interactions between a numerical independent variable and a categorical independent variable, such as Root and Graze in week 3. But what about a case where, for example, we want to predict a person's longevity from their age and annual income? Assuming that age and annual income really do interact, how should we deal with that? Thank you for your help.


Huanzhi Gong
Posts: 11

06 April 2022, 10:56 PM

I guess you can find the answer in the two videos "Multicollinearity" and "How to deal with collinearity in multiple linear regression in R - R for Data Science", which are uploaded in the "watching" folder on Box.

Peng Zhao
Posts: 128

07 April 2022, 10:17 AM

Correct. See the videos on collinearity. You can also find it in "readings/A Course in Statistics with R. Chapter 12.pdf", Section 12.6.
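For instance, a quick check in R (just a sketch; the data frame mydata and the columns longevity, age and income are hypothetical names, and the car package needs to be installed):

# Hypothetical example: checking collinearity between two numeric predictors
library(car)                                        # provides vif()
fit <- lm(longevity ~ age + income, data = mydata)  # additive model
vif(fit)                                            # values well above 5-10 suggest serious collinearity
cor(mydata$age, mydata$income)                      # pairwise correlation is another quick check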

Luning Yang
Posts: 6

07 April 2022, 5:39 PM

Thank you, I have watched the videos and read the materials you uploaded. But I just want to make sure of the following:

1. Should we accept the model with the lowest AIC or the model with the highest multiple R-squared?

2. Is it correct to introduce interactions (such as A:B) in a linear regression even when the variables are continuous numeric predictors without any serious multicollinearity?

3. Is it true that as long as we have explained the model selection well, the decision is always reasonable?

I look forward to your feedback~

Peng Zhao
Posts: 128

07 April 2022, 9:13 PM

1. Do not use multiple R-squared for comparing multiple linear models. When you add predictors to a model, the multiple R-squared will always increase, because a new predictor always explains some portion of the variance. Please use the adjusted R-squared instead, which controls for this increase by penalizing the number of predictors in the model. It therefore balances the most parsimonious model against the best-fitting model. See <https://www.statology.org/multiple-r-vs-r-squared/> for more details.

Normally the model chosen based on the adjusted R-squared agrees well with the one chosen based on AIC.
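Here is a small simulated illustration (made-up data; x2 is pure noise, so it should not improve the model):

# Hypothetical illustration: a useless predictor raises multiple R-squared,
# but adjusted R-squared and AIC penalize it
set.seed(1)
dat <- data.frame(x1 = rnorm(100))
dat$y  <- 2 * dat$x1 + rnorm(100)
dat$x2 <- rnorm(100)                                    # noise, unrelated to y

m1 <- lm(y ~ x1, data = dat)
m2 <- lm(y ~ x1 + x2, data = dat)

summary(m1)$r.squared; summary(m2)$r.squared            # multiple R-squared always goes up
summary(m1)$adj.r.squared; summary(m2)$adj.r.squared    # adjusted R-squared penalizes the extra term
AIC(m1, m2)                                             # lower AIC indicates the preferred model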

Peng Zhao
Posts: 128

07 April 2022, 9:21 PM

3. "As long as we have explained the model selection well, the decision is always reasonable?" It depends on how "well", and I cannot promise "always". Anyway, try your best to give the explanations.

Peng Zhao
Posts: 128

07 April 2022, 9:32 PM

2. Correct.
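In R, an interaction between two continuous predictors is written the same way as any other interaction. A minimal sketch (mydata, longevity, age and income are hypothetical names):

fit_add <- lm(longevity ~ age + income, data = mydata)  # additive model
fit_int <- lm(longevity ~ age * income, data = mydata)  # expands to age + income + age:income
summary(fit_int)                                        # inspect the age:income coefficient
anova(fit_add, fit_int)                                 # F-test: does the interaction improve the fit?
AIC(fit_add, fit_int)                                   # or compare the two models by AIC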
