Wrapper methods work by training a learning algorithm on different subsets of features and using the resulting performance to select the best subset for the model. As you may have guessed, this makes them computationally very expensive. The wrapper methodology treats feature selection as a search problem, in which different combinations of features are prepared, evaluated, and compared against one another.
A predictive model is used to evaluate a combination of features and assign model performance scores.
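To make this concrete, here is a minimal sketch of how a single feature combination might be scored. The dataset, model, and helper function (`score_subset`) are my own illustrative choices, not prescribed by any particular wrapper method; any estimator and performance metric could stand in for them.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

def score_subset(feature_idx):
    """Fit a predictive model on the chosen columns only and
    return its mean 5-fold cross-validated accuracy."""
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, feature_idx], y, cv=5).mean()

# Each candidate combination of features gets its own performance score.
print(score_subset([0, 1]))  # sepal features only
print(score_subset([2, 3]))  # petal features only
```

A wrapper search is essentially many calls to a function like this, one per candidate subset, which is exactly why the approach is so expensive.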
A wrapper method will perform the following:
- Search through different subsets of features and create a model for each one, typically following a greedy search strategy.
- Evaluate each model against an evaluation criterion: a performance measure that depends on the type of problem (for example, accuracy for classification or RMSE for regression).
- Select the best-performing subset, then iterate, building new candidate subsets from the previous best one.
Deciding when to stop the search comes down to monitoring whether performance stops improving (or starts degrading) beyond a certain threshold, depending on the method in use. These thresholds are often arbitrary and defined by the user.
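The procedure above can be sketched as a greedy forward loop with a user-defined improvement threshold as the stopping rule. The dataset, estimator, and the `threshold` value here are illustrative assumptions, not part of any canonical algorithm.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
model = LogisticRegression(max_iter=5000)

selected, best_score = [], 0.0
threshold = 0.01  # arbitrary stopping threshold, defined by the user

while True:
    remaining = [f for f in range(X.shape[1]) if f not in selected]
    if not remaining:
        break
    # Evaluate every candidate subset formed by adding one more feature
    # to the previous best subset (the greedy step).
    scores = {f: cross_val_score(model, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    best_f = max(scores, key=scores.get)
    # Stop once the best candidate no longer improves enough.
    if scores[best_f] - best_score < threshold:
        break
    selected.append(best_f)
    best_score = scores[best_f]

print("selected features:", selected)
print("cv accuracy: %.3f" % best_score)
```

Note how each iteration re-trains and re-scores a model per remaining feature, which is where the computational cost comes from.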
I’ll discuss these procedures in more detail for specific wrapper methods.
The most commonly used techniques under wrapper methods are:
- Forward selection
- Backward elimination
- Recursive feature elimination (RFE)
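All three techniques have ready-made implementations in scikit-learn (assuming scikit-learn >= 0.24, which introduced `SequentialFeatureSelector`); a quick sketch of how they might be invoked:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Forward selection: start empty, greedily add features.
forward = SequentialFeatureSelector(
    model, n_features_to_select=2, direction="forward").fit(X, y)

# Backward elimination: start with all features, greedily remove them.
backward = SequentialFeatureSelector(
    model, n_features_to_select=2, direction="backward").fit(X, y)

# Recursive feature elimination: repeatedly fit and drop the
# weakest feature according to the model's coefficients.
rfe = RFE(model, n_features_to_select=2).fit(X, y)

print("forward: ", forward.get_support())
print("backward:", backward.get_support())
print("RFE:     ", rfe.get_support())
```

Each selector exposes `get_support()`, a boolean mask over the original columns, so the results of the three searches are easy to compare side by side.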