last-class-for-inf-0615
open-set vs closed-set
- Open-set first decides whether the input belongs to any of the classes seen during training, and can reject it as unknown if it does not
- Closed-set always fits the input into one of the known classes, even when that makes no sense (e.g. a digit classifier forced to read a letter as a digit)
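A minimal sketch of the difference, assuming the classifier outputs one raw score per known class and that open-set rejection is done by thresholding the softmax confidence. This is only one simple open-set strategy; the function names, the scores, and the 0.8 threshold are illustrative assumptions, not something from the class.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def closed_set_predict(scores):
    """Always return the index of the highest-scoring known class."""
    return max(range(len(scores)), key=lambda i: scores[i])

def open_set_predict(scores, threshold=0.8):
    """Return the best class, or None when confidence is below the threshold.

    The 0.8 default is an arbitrary illustrative value.
    """
    probs = softmax(scores)
    best = closed_set_predict(scores)
    return best if probs[best] >= threshold else None

# Hypothetical scores from a 3-class classifier.
confident = [6.0, 0.5, 0.2]  # one class clearly stands out
ambiguous = [1.1, 1.0, 0.9]  # nothing stands out, e.g. a letter shown to a digit classifier

print(closed_set_predict(ambiguous))  # still forced to pick a known class
print(open_set_predict(ambiguous))    # rejected as unknown (None)
print(open_set_predict(confident))    # accepted as a known class
```

The closed-set function has no way to say "none of the above", which is exactly the behaviour described in the notes above.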
In Machine Learning, preparing and manipulating the data usually matters more than the algorithm we choose.
Pipeline steps
- Get Data
- Always keep the training data separate from the validation data
- Use the validation set to “test your model”
- Cross-validation
- Clean, Prepare & Manipulate Data
- Inspect the data
- Inspect features without annotations
- Cast discrete features
- Normalise the data
- Balance the classes
- Train Model
- Pick the algorithm
- Define the parameters
- Train the model
- Test Data
- Metrics
- Compare the results on the training set vs the validation set
- Check for Overfitting & Underfitting
- Improve
- See if you should change the complexity of the model
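The pipeline steps above can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the synthetic data, the 75/25 split, z-score normalisation, random oversampling as the balancing method, and a nearest-centroid classifier standing in for "the algorithm". In practice a library would handle most of these steps.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# 1. Get data: synthetic, imbalanced 2-class set (80 of class 0, 40 of class 1).
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(80)]
data += [([rng.gauss(5, 1), rng.gauss(5, 1)], 1) for _ in range(40)]

# Always split the training data from validation (here 75% / 25%).
rng.shuffle(data)
cut = int(0.75 * len(data))
train, val = data[:cut], data[cut:]

# 2. Prepare: normalise with statistics computed on the training split only.
n_features = len(train[0][0])
means = [statistics.fmean(x[j] for x, _ in train) for j in range(n_features)]
stds = [statistics.pstdev([x[j] for x, _ in train]) for j in range(n_features)]

def normalise(rows):
    return [([(x[j] - means[j]) / stds[j] for j in range(n_features)], y)
            for x, y in rows]

train, val = normalise(train), normalise(val)

# Balance the classes by randomly oversampling the smaller class.
by_class = {0: [r for r in train if r[1] == 0],
            1: [r for r in train if r[1] == 1]}
target = max(len(rows) for rows in by_class.values())
balanced = []
for rows in by_class.values():
    balanced += rows + [rng.choice(rows) for _ in range(target - len(rows))]

# 3. Train: the nearest-centroid "model" is one mean point per class.
def fit(rows):
    centroids = {}
    for label in {y for _, y in rows}:
        pts = [x for x, y in rows if y == label]
        centroids[label] = [statistics.fmean(p[j] for p in pts)
                            for j in range(n_features)]
    return centroids

def predict(centroids, x):
    # Pick the class whose centroid is closest (squared Euclidean distance).
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def accuracy(centroids, rows):
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)

model = fit(balanced)

# 4. Test: compare training vs validation accuracy.
train_acc, val_acc = accuracy(model, balanced), accuracy(model, val)
print(f"train={train_acc:.2f} validation={val_acc:.2f}")
```

A training score much higher than the validation score hints at overfitting; both scores being low hints at underfitting, which is the cue to revisit model complexity in the Improve step.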
Random notes
- online learning vs batch learning
- k-fold cross-validation
- You train k models, each validated on a different held-out fold, then either keep the best one or use ensemble learning to combine all k models
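A rough sketch of k-fold cross-validation with both options from the last note (pick the best model, or ensemble all k). The data is synthetic, and the "model" is a toy 1-D threshold that assumes class 1 has the larger mean; the helper names are hypothetical and chosen only for this example.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit(xs, ys):
    """Toy 1-D model: a threshold halfway between the two class means.

    Assumes class 1 has the larger mean (true for the data below).
    """
    m0 = statistics.fmean(x for x, y in zip(xs, ys) if y == 0)
    m1 = statistics.fmean(x for x, y in zip(xs, ys) if y == 1)
    return (m0 + m1) / 2

def predict(threshold, x):
    return int(x > threshold)

# Synthetic data: class 0 centred at 0, class 1 centred at 4.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(50)] + [rng.gauss(4, 1) for _ in range(50)]
ys = [0] * 50 + [1] * 50

k = 5
folds = k_fold_indices(len(xs), k)
models, scores = [], []
for i, held_out in enumerate(folds):
    # Train on every fold except fold i, then score on fold i.
    train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
    model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
    acc = sum(predict(model, xs[j]) == ys[j] for j in held_out) / len(held_out)
    models.append(model)
    scores.append(acc)

# Option 1: keep the model with the best held-out score.
best_model = models[scores.index(max(scores))]

# Option 2: ensemble, a majority vote across all k models.
def ensemble_predict(x):
    votes = sum(predict(m, x) for m in models)
    return int(votes > len(models) / 2)

print("fold accuracies:", scores)
print("mean accuracy:", statistics.fmean(scores))
print("ensemble prediction for x=3.5:", ensemble_predict(3.5))
```

The mean of the fold accuracies is usually the number to report, since each fold's score comes from data that model never saw during training.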