https://en.wikipedia.org/wiki/Decision_tree
- Used a lot in the medical field
- Very good for categorical features; less suited to continuous features (which have to be discretized or split on thresholds)
- We want the shortest tree to avoid overfitting (Occam’s razor)
How does it work?
- Need to choose the best feature for the root node (and then recursively for each internal node)
- Need to use the entropy metric to measure how mixed a set of examples is (see the sketch after this list)
- Entropy(S) = -p₊ log₂(p₊) - p₋ log₂(p₋), where p₊ and p₋ are the fractions of positive and negative examples
- 1 is max entropy, e.g. an evenly mixed 3+/3- set
- 0 means that we know the output for sure, e.g. a pure 3+/0- set
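A minimal sketch of the entropy metric described above, assuming a binary (+/-) classification task; the function name and the pos/neg counting interface are illustrative, not from the notes:

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    if pos == 0 or neg == 0:
        return 0.0  # pure set: the output is known for sure
    p_pos, p_neg = pos / total, neg / total
    return -p_pos * log2(p_pos) - p_neg * log2(p_neg)

print(entropy(3, 3))  # 1.0 -> maximum entropy (3+/3-)
print(entropy(3, 0))  # 0.0 -> pure set (3+/0-)
```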
Cost function
- Gain(S, A): information gain, the expected reduction in entropy of S after splitting on attribute A (sketch below)
- GainRatio: information gain normalised by the split information, to penalise attributes with many distinct values
- GainCost: cost-sensitive gain that trades information gain against the cost of measuring the attribute (e.g. expensive medical tests)
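A minimal sketch of information gain, Gain(S, A) = Entropy(S) - Σ_v (|S_v|/|S|)·Entropy(S_v); it assumes the training set is given as a list of (feature dict, label) pairs, a format chosen here for illustration and not taken from the notes:

```python
from collections import Counter, defaultdict
from math import log2

def entropy_of_labels(labels):
    """Entropy of a multiset of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """Gain(S, A): expected reduction in entropy after splitting S on attribute A."""
    labels = [label for _, label in examples]
    # Partition S into subsets S_v, one per value v taken by the attribute
    subsets = defaultdict(list)
    for features, label in examples:
        subsets[features[attribute]].append(label)
    remainder = sum(len(sub) / len(examples) * entropy_of_labels(sub)
                    for sub in subsets.values())
    return entropy_of_labels(labels) - remainder

# Tiny toy set: 4 examples, 3 positive and 1 negative
S = [({"outlook": "sunny"}, "+"), ({"outlook": "sunny"}, "+"),
     ({"outlook": "rain"}, "+"), ({"outlook": "rain"}, "-")]
print(information_gain(S, "outlook"))  # ~0.31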
How to avoid overfitting?
- Pre-pruning: stop the algorithm early, when further splits no longer improve purity significantly
- Post-pruning: grow the full tree, then prune back subtrees that do not help on held-out data (see the sketch below)
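A sketch of both ideas, assuming scikit-learn (the notes do not name a library): pre-pruning can be approximated with `min_impurity_decrease`, and post-pruning with cost-complexity pruning via `ccp_alpha`; the dataset and parameter values are only examples.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# (1) Pre-pruning: ignore splits that do not reduce impurity enough
pre_stopped = DecisionTreeClassifier(min_impurity_decrease=0.01, random_state=0)
pre_stopped.fit(X_train, y_train)

# (2) Post-pruning: compute the cost-complexity pruning path of a full tree,
# then refit with a non-trivial alpha to cut back weak subtrees
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2], random_state=0)
pruned.fit(X_train, y_train)

print("pre-pruned depth:", pre_stopped.get_depth(), "acc:", pre_stopped.score(X_test, y_test))
print("post-pruned depth:", pruned.get_depth(), "acc:", pruned.score(X_test, y_test))
```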