Selection of Logistic Regression Models in R Using a Stepwise Approach

R's addition of “+disp” in the second step is unnecessary, since the results, including the AIC values and model selection values, are identical to those obtained from backward selection. In any case, relying on stepwise selection to identify a model is not recommended.


In R, I’m working on a logistic regression model that involves selecting variables from a pool of 80 candidates. To streamline the process, I’m using the step function for automated variable selection.

I know how to use the function and find the model, but I run into a problem when I review the final model. When I look at the fourth column of $coef from the summary function, which holds the Wald test p-values, I find that some of the variables selected by the step function are not significant. This is a problem because I need every variable in the model to be significant.
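For reference, the workflow described above can be sketched as follows. This is only an illustration on simulated data: the data frame dat, the predictors x1 to x6 and the response y are made up for the example, and six predictors stand in for the 80 in the real problem.

```r
# Simulated stand-in for the real data (all names here are hypothetical).
set.seed(1)
n <- 500
dat <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
names(dat) <- paste0("x", 1:6)
# Only x1 and x2 truly affect the outcome; x3 to x6 are noise.
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * dat$x1 - 0.8 * dat$x2))

# Full logistic regression, then backward stepwise selection by AIC.
full <- glm(y ~ ., family = binomial, data = dat)
sel <- step(full, direction = "backward", trace = 0)

# Fourth column of the coefficient table: Wald test p-values, Pr(>|z|).
wald_p <- summary(sel)$coefficients[, 4]
print(round(wald_p, 4))
```

Because step compares AIC values rather than p-values, nothing guarantees that every entry of wald_p falls below 0.05.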

Is there a method or function that finds the optimal model by either AIC or BIC while also ensuring that all the coefficients are significant? Thank you.


Using stepwise selection to search for a model is not recommended. It leads to invalid hypothesis tests and, because of overfitting, to poor predictive accuracy on out-of-sample data. For more on this topic, see my answer here: algorithms for automatic model selection
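A small simulation can make the concern about invalid tests concrete. In the sketch below, which is only an illustration and not taken from the linked answer, the outcome is generated independently of all 30 predictors, so any variable that survives selection is there by chance. The sample size, the number of predictors and the seed are arbitrary choices.

```r
# Pure-noise simulation: the outcome is unrelated to every predictor.
set.seed(42)
n <- 200
p <- 30
noise <- as.data.frame(matrix(rnorm(n * p), ncol = p))  # columns V1..V30
noise$y <- rbinom(n, 1, 0.5)

full <- glm(y ~ ., family = binomial, data = noise)
sel <- step(full, direction = "backward", trace = 0)

# Predictors that survived selection even though none has a real effect.
kept <- setdiff(names(coef(sel)), "(Intercept)")
n_kept <- length(kept)
cat("Noise predictors kept:", n_kept, "\n")

# Their Wald p-values, computed as if the model had been chosen in advance.
if (n_kept > 0) print(round(summary(sel)$coefficients[kept, 4], 3))
```

The p-values printed for the surviving variables ignore the search that produced the model, which is why they cannot be read as ordinary hypothesis tests.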



The step function chooses a model according to AIC, not by checking whether individual coefficients pass a significance threshold, which is the approach SPSS takes. For a term with a single parameter, the significance level implied by AIC is approximately 0.157 rather than the conventional 0.05. For additional details, see @Glen_b’s answers on the “critical p-value” in R here: stepwise regression
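The 0.157 figure can be checked directly. The sketch below assumes a term with one degree of freedom and a likelihood-ratio comparison. The sample size n used for the BIC line is a hypothetical value chosen only for illustration.

```r
# AIC penalises each extra parameter by 2, so dropping a one-parameter term
# lowers AIC exactly when its likelihood-ratio chi-square statistic is below 2.
alpha_aic <- pchisq(2, df = 1, lower.tail = FALSE)
print(round(alpha_aic, 3))  # about 0.157

# BIC uses a penalty of log(n); in step() this is requested with k = log(n).
n <- 500  # hypothetical sample size
alpha_bic <- pchisq(log(n), df = 1, lower.tail = FALSE)
print(round(alpha_bic, 4))
```

Switching to BIC makes the implied threshold stricter as n grows, but it still applies to the comparison of models and does not make the Wald p-values of the final model valid.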
