Ensembles
- Combine different classifiers/regressors to get a better result than the individual classifiers/regressors
- Weighted vote vs simple vote (soft vote vs hard vote); see the sketch below
- Can have a merge-classifier as well
- 1 input set and k classifiers
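As a quick illustration of soft vs hard voting, here is a minimal sketch using scikit-learn's VotingClassifier (the dataset and the three estimators are just assumptions for the example):

```python
# Minimal sketch: hard voting (majority vote) vs soft voting (averaged probabilities).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=5)),
    ("svm", SVC(probability=True)),  # probability=True so soft voting can average predict_proba
]

hard = VotingClassifier(estimators, voting="hard").fit(X_train, y_train)  # simple majority vote
soft = VotingClassifier(estimators, voting="soft").fit(X_train, y_train)  # weights each class by its predicted probability
print("hard vote accuracy:", hard.score(X_test, y_test))
print("soft vote accuracy:", soft.score(X_test, y_test))
```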
Bagging & Pasting
- https://en.wikipedia.org/wiki/Bootstrap_aggregating
- Better to avoid using strong classifiers (like SVM)
- Can run in parallel
- Bagging uses decision trees by default
- Bagging uses the same base algorithm for every predictor (each trained on a different sample)
- Bagging vs Pasting (see the sketch after this list)
- Pasting samples without replacement, so it uses smaller samples (it does not repeat examples)
- Bagging samples with replacement, so each sample can be the same size as the original training set
- Can be used in cases like:
- Let's say there is one case that we cannot classify very well yet
- Bagging may duplicate that case a few times in a bootstrap sample, which can help classify it better (the sampling is random, so this can happen, but it is not guaranteed)
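A minimal sketch of bagging vs pasting with scikit-learn's BaggingClassifier (data and hyperparameters are illustrative assumptions); the only real difference is the bootstrap flag:

```python
# Bagging samples with replacement (bootstrap=True); pasting samples without (bootstrap=False).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bagging = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    bootstrap=True,               # with replacement: samples can be the size of the train set
    n_jobs=-1, random_state=42)   # n_jobs=-1: predictors are trained in parallel

pasting = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    bootstrap=False, max_samples=0.7,  # without replacement: smaller samples, no repeated examples
    n_jobs=-1, random_state=42)

print("bagging:", bagging.fit(X_train, y_train).score(X_test, y_test))
print("pasting:", pasting.fit(X_train, y_train).score(X_test, y_test))
```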
Boosting
- https://en.wikipedia.org/wiki/Boosting_(machine_learning)
- Must run in sequence
- The idea is that each new classifier will “correct” the errors of the previous ones
- Main variants: AdaBoost and Gradient Boosting
AdaBoost
- Combines a sequence of weak classifiers, increasing the weight of each example that the previous classifier got wrong (see the sketch below)
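A minimal sketch with scikit-learn's AdaBoostClassifier over decision stumps (a common choice of weak classifier; the numbers are just assumptions):

```python
# Each new stump focuses more on the examples the previous ones got wrong.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)

ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak classifier: a decision stump
    n_estimators=200,
    learning_rate=0.5,
    random_state=42)
print("AdaBoost accuracy:", ada.fit(X, y).score(X, y))
```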
Gradient Boosting
- Based on the same idea as AdaBoost
- AdaBoost re-weights the examples (and still considers every example), while Gradient Boosting fits each new predictor to the residual errors left by the previous ones
- More recently there is XGBoost (very effective, and a good initial approach to hard problems); see the sketch below
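A minimal sketch of the residual-fitting idea, first by hand with two trees and then with scikit-learn's GradientBoostingRegressor (XGBoost exposes a similar fit/predict interface; the toy data is an assumption):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=200)

# By hand: fit a tree, then fit the next tree on the residual errors of the first.
tree1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
residuals = y - tree1.predict(X)
tree2 = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
y_pred_manual = tree1.predict(X) + tree2.predict(X)  # ensemble prediction = sum of the trees

# The library version repeats the same idea for n_estimators rounds.
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=2).fit(X, y)
print("manual 2-tree MSE:", np.mean((y - y_pred_manual) ** 2))
print("GradientBoostingRegressor MSE:", np.mean((y - gbr.predict(X)) ** 2))
```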
Stacking
- Uses a blender (meta learner) to “learn” from the predictions of the base classifiers/regressors (see the sketch below)
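A minimal sketch with scikit-learn's StackingClassifier, where final_estimator plays the role of the blender/meta learner (the base estimators are just assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("svm", SVC(random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # the blender / meta learner
    cv=5)  # its training inputs are the base models' out-of-fold predictions
print("stacking accuracy:", stack.fit(X, y).score(X, y))
```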
Multi-layer Stacking Ensemble
- Starts to look like deep learning: here we have multiple layers of blenders
Random forest
- Classification and Regression Tree (CART) algorithm
- The same idea (random subsets of examples and features) can be applied with other classifiers, not just trees
- Each tree gets a random subset of the examples and a random subset of the features
- You can use it to see which features are more important (e.g. if every tree that uses feature X gets ~90% of its predictions right, X is probably important); see the sketch after this list
- We can initially try something like n * sqrt(number of features) trees
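A minimal sketch of a random forest and its feature importances (the dataset is an illustrative assumption):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(
    n_estimators=100,      # number of trees
    max_features="sqrt",   # each split considers a random subset of the features
    random_state=42)
forest.fit(data.data, data.target)

# Which features mattered most across all the trees?
for name, score in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {score:.2f}")
```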
Other stuff
- Deep Forest