- Overfitting
- Underfitting
Types of problems
(Read Weapons of Math Destruction)
- Bad Data
- Bad Algorithm
Cost function
- Quadratic cost works well because it penalises larger errors more heavily
Linear regression
- https://en.wikipedia.org/wiki/Linear_regression
- Normal equation
- Finds the solution directly (no iterations), but the matrix inversion is O(n^3), so it's very expensive for large feature sets
- Gradient descent
- The learning rate (alpha) is not easy to pick
- Needs many iterations (see the sketch below)
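A minimal NumPy sketch (toy made-up data) contrasting the closed-form normal equation with batch gradient descent:

```python
import numpy as np

# Toy data: y = 4 + 3x + noise
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(scale=0.5, size=100)

Xb = np.c_[np.ones((100, 1)), X]  # add a bias column

# Normal equation: theta = (X^T X)^{-1} X^T y -- O(n^3) in the number of features
theta_ne = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y

# Batch gradient descent: iterative; needs a learning rate (alpha) and many iterations
theta_gd = np.zeros(2)
alpha = 0.1
for _ in range(1000):
    grad = (2 / len(y)) * Xb.T @ (Xb @ theta_gd - y)
    theta_gd -= alpha * grad

print(theta_ne, theta_gd)  # both should land close to [4, 3]
```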
Metrics for regression problems
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE) # good for humans: the error is in the same unit as the target
- R^2 # closer to 1 = better (see the sketch below)
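A quick sketch of these metrics with scikit-learn (y_true/y_pred are made-up values):

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = [3.0, 5.0, 7.5, 10.0]
y_pred = [2.8, 5.3, 7.0, 10.4]

print(mean_squared_error(y_true, y_pred))   # MSE: penalises large errors more
print(mean_absolute_error(y_true, y_pred))  # MAE: same unit as the target
print(r2_score(y_true, y_pred))             # R^2: closer to 1 = better
```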
Features and Polynomial Regression
- https://en.wikipedia.org/wiki/Polynomial_regression
- The idea is to enrich the feature set by combining existing features (since collecting more data is usually not trivial)
- Example: frontage and depth can be "merged" to create an area feature (see the sketch below)
- Must be careful not to make the function so complex that it memorizes the training data and misses completely on the validation set
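A small sketch with scikit-learn's PolynomialFeatures (the frontage/depth numbers are made up):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two original features: frontage and depth
X = np.array([[20.0, 30.0],
              [15.0, 40.0]])

# degree=2 adds squares and the cross term frontage*depth (the "area" feature)
poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit_transform(X))
# columns: frontage, depth, frontage^2, frontage*depth, depth^2
```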
Classification (Logistic Regression)
- https://en.wikipedia.org/wiki/Logistic_regression
- Sigmoid function (logistic function)
- Returns a number between 0 and 1; 1 = 100% chance of being the positive class (see the sketch below)
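A minimal sketch of the logistic (sigmoid) function:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))   # 0.5    -> right on the decision boundary
print(sigmoid(5))   # ~0.993 -> very likely the positive class
print(sigmoid(-5))  # ~0.007 -> very likely the negative class
```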
Decision hyperplane
Cost function
One-vs-All
- Test each class against all of the rest
- If the number of classes = 3, we have 3 classifiers (# classifiers = # classes)
- A problem is that each training set can be unbalanced (e.g. one class has far fewer items than all the others combined)
One vs One
- Better for unbalanced classes
- number of classifiers = C(number of classes, 2) = n(n-1)/2 (classes taken two at a time; see the sketch below)
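A small sketch with scikit-learn's wrappers, using the 3-class iris data set:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes

ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ova.estimators_))  # 3 classifiers (= number of classes)
print(len(ovo.estimators_))  # 3 classifiers (= C(3, 2) pairs)
```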
Ensembles
- We can aggregate the two previous strategies
ECOC (Error-Correcting Output Codes)
Confusion Matrix
- https://en.wikipedia.org/wiki/Confusion_matrix
- Most classification metrics are by-products of the confusion matrix
- We should use balanced accuracy so that unbalanced data sets can't "hide" bad results
- F-score (usually F1)
- ROC
- Vary the decision threshold and plot the result (true positive rate vs false positive rate); see the sketch below
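A quick sketch of these metrics with scikit-learn (labels and scores are made up):

```python
from sklearn.metrics import (balanced_accuracy_score, confusion_matrix,
                             f1_score, roc_auc_score)

y_true  = [0, 0, 0, 0, 1, 1]
y_pred  = [0, 0, 1, 0, 1, 0]
y_score = [0.1, 0.3, 0.6, 0.2, 0.9, 0.4]  # predicted probabilities, for ROC

print(confusion_matrix(y_true, y_pred))         # rows = true class, cols = predicted
print(balanced_accuracy_score(y_true, y_pred))  # robust to class imbalance
print(f1_score(y_true, y_pred))                 # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))           # area under the ROC curve
```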
Overfitting and Underfitting
- High Bias (Underfitting) vs High Variance (Overfitting)
- Bias errors:
- Too few parameters
- training error similar to validation error (both high)
- High Variance:
- Too many parameters
- training error much smaller than validation error (see the sketch below)
- Irreducible error
- Outliers and other problems with the data
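A minimal sketch (toy sine data, made-up polynomial degrees) showing how train vs validation scores reveal the two regimes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=100)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # underfit, reasonable, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree, model.score(X_tr, y_tr), model.score(X_va, y_va))
# degree 1: both scores low (high bias)
# degree 15: train score >> validation score (high variance)
```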
Double-Descent Model
Regularization
- The idea is to minimise the error plus a penalty on the parameters (e.g. the sum of their squares), basically preventing the parameters from growing too large (see the sketch below)
- https://towardsdatascience.com/regularization-in-machine-learning-76441ddcf99a
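A small sketch contrasting plain least squares with L2-regularized (ridge) regression on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))  # few samples, many features -> easy to overfit
y = X[:, 0] + rng.normal(scale=0.1, size=20)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # alpha scales the penalty on the weights

print(np.abs(plain.coef_).sum())  # larger coefficients
print(np.abs(ridge.coef_).sum())  # shrunk by the penalty
```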
One-hot encoder
- https://en.wikipedia.org/wiki/One-hot
- If we have a lot of categorical features, we may not want an algorithm that relies on distances between features
- We should probably use something like decision trees instead (see the sketch below)
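A minimal one-hot encoding sketch with scikit-learn (assumes scikit-learn >= 1.2 for the sparse_output parameter; the color values are made up):

```python
from sklearn.preprocessing import OneHotEncoder

colors = [["red"], ["green"], ["blue"], ["green"]]

enc = OneHotEncoder(sparse_output=False)  # use sparse=False on older scikit-learn
print(enc.fit_transform(colors))
# one binary column per category (alphabetical: blue, green, red):
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```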