Selection of Logistic Regression Models in R Using a Stepwise Approach

The addition of “+disp” in the second step by R is unnecessary since the results, including AIC values and model selection values, are identical to those obtained from backward selection. Nevertheless, relying on stepwise selection to identify a model is not recommended.

Question:

In R, I’m working on a logistic regression model that involves selecting variables from a pool of 80 options. To streamline the process, I’m utilizing the step function for automated variable selection.

I can run the function and obtain a model without trouble, but I have a problem with the final model. When I look at the fourth column of $coef from summary, which holds the Wald test p-values, some of the variables selected by step are not significant. I need every variable in the model to be significant.

Is there a method or function that finds the best model by AIC or BIC while also ensuring that all the coefficients are significant? Thank you.


Solution:

Using stepwise selection to find a model is not recommended. Because of overfitting, it can produce invalid hypothesis tests and poor predictive accuracy on out-of-sample data. For more on this, see my answer to “Algorithms for automatic model selection”.

The stepAIC method chooses a model according to AIC, not by checking whether individual coefficients pass a p-value threshold, which is the approach taken by SPSS. For a term with one degree of freedom, selecting by AIC corresponds to an alpha of approximately 0.157 rather than the conventional 0.05. For additional details, see @Glen_b’s answers on “Critical p-value” in R under “stepwise regression”.
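To see where the 0.157 comes from, and what changing the penalty does, here is a minimal sketch. It is not a recommended workflow: the data are simulated, and the names dat, full, fit_aic, fit_bic and fit_p05 are made up for illustration. It uses step from the stats package, whose k argument sets the penalty per parameter (k = 2 is AIC, k = log(n) is BIC). Setting k to the 95% chi-square quantile makes the criterion behave like a likelihood-ratio test at alpha = 0.05 for terms with one degree of freedom. That is close to, but not the same as, the Wald test reported by summary.

```r
# AIC charges 2 per parameter. For a single 1-df term, dropping it lowers AIC
# when its likelihood-ratio chi-square is below 2, which implies this alpha:
alpha_aic <- pchisq(2, df = 1, lower.tail = FALSE)
round(alpha_aic, 3)  # 0.157

# Simulated data (hypothetical): only x1 and x2 truly affect y
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * dat$x1 + 0.3 * dat$x2))

full <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = dat)

# Backward selection under three different penalties per parameter
fit_aic <- step(full, direction = "backward", trace = 0)              # k = 2 (AIC)
fit_bic <- step(full, direction = "backward", trace = 0, k = log(n))  # BIC
fit_p05 <- step(full, direction = "backward", trace = 0,
                k = qchisq(0.95, df = 1))  # roughly alpha = 0.05 for 1-df terms

# Fourth column of the coefficient table: Wald test p-values
summary(fit_p05)$coefficients[, 4]
```

A stricter penalty usually leaves fewer insignificant terms in the final model, but it does not fix the underlying problem: p-values computed after any stepwise search are still biased, for the reasons given above.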
