
Overfitting

Underfitting
Types of problems
(Read Weapons of Math Destruction)
 Bad Data
 Bad Algorithm
Cost function
 Quadratic is very good because it penalises bigger errors more
Linear regression
 https://en.wikipedia.org/wiki/Linear_regression
 Normal equation
 Can find the answer faster (closed form), but the inverse-matrix computation (O(n^3)) makes it very expensive for big sets
 Gradient descent
 Alpha (the learning rate) is not easy to pick
 Needs a lot of iterations
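The trade-off between the two methods can be sketched on toy data (everything here is a hypothetical example; the data, alpha = 0.01 and the iteration count are all assumptions):

```python
import numpy as np

# Hypothetical data: y = 1 + 2x plus a little noise
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(0, 10, 100)]  # bias column + one feature
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, 100)

# Normal equation: theta = (X^T X)^-1 X^T y -- exact answer in one shot,
# but solving/inverting the matrix is O(n^3) in the number of features
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent: cheap per step, but alpha must be tuned
# and many iterations are needed to converge
theta_gd = np.zeros(2)
alpha = 0.01
for _ in range(5000):
    grad = (2 / len(y)) * X.T @ (X @ theta_gd - y)
    theta_gd -= alpha * grad

print(theta_ne)  # close to [1, 2]
print(theta_gd)  # approaches the normal-equation answer
```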
Metrics for regression problems
 Mean Squared error (MSE);
 Mean Absolute Error (MAE); # good for humans to see what the error is
 R^2 # closer to 1 = better
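A minimal sketch of these three metrics with scikit-learn (the numbers are a made-up example):

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical true values and model predictions
y_true = [3.0, 5.0, 7.5, 10.0]
y_pred = [2.5, 5.0, 8.0, 9.0]

mse = mean_squared_error(y_true, y_pred)   # penalises big errors quadratically
mae = mean_absolute_error(y_true, y_pred)  # same unit as the target, easy to read
r2 = r2_score(y_true, y_pred)              # 1.0 = perfect fit

print(mse, mae, r2)
```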
Features and Polynomial Regression
 https://en.wikipedia.org/wiki/Polynomial_regression
 The idea is to enrich the features by combining the features (since it’s not trivial to collect more data)
 Example: frontage and depth can be “merged” to create an area feature
 Must be careful not to create such a complex function that it learns the training data by heart and misses completely on the validation set
Classification (Logistic Regression)
https://en.wikipedia.org/wiki/Logistic_regression
 Sigmoid function (logistic function)
 Returns a number from 0 to 1; 1 = 100% chance of being the positive class
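A minimal sketch of the logistic function (pure Python, standard library only):

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))   # 0.5 -- right on the decision boundary
print(sigmoid(6.0))   # ~0.998 -- very likely the positive class
print(sigmoid(-6.0))  # ~0.002 -- very likely the negative class
```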
Decision hyperplane
Cost function
One vs All
 Test each class against all of the rest
 If the number of classes = 3, we have 3 classifiers (# classifiers = # classes)
 A problem is that it can be unbalanced (e.g. one class having far fewer items than all the others combined)
One vs One
 Better for unbalanced classes
 number of classifiers = C(number of classes, 2), i.e. one classifier per pair of classes
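The classifier counts for both strategies, as a quick check (pure Python):

```python
from math import comb

for n_classes in (3, 4):
    one_vs_all = n_classes           # one binary classifier per class
    one_vs_one = comb(n_classes, 2)  # one classifier per pair of classes
    print(n_classes, one_vs_all, one_vs_one)
# for 3 classes both need 3 classifiers; for 4 classes it is 4 vs 6
```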
Sets
 We can aggregate the two previous strategies
ECOC (Error Correcting output code)
Confusion Matrix
 https://en.wikipedia.org/wiki/Confusion_matrix
 The metrics are a byproduct of the Confusion Matrix
 We should always use balanced accuracy to avoid unbalanced data sets “hiding” the results
 F-score (normally F1)
 ROC
 Changes the threshold and plots the result
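A sketch of how plain accuracy can hide a weak minority class while balanced accuracy exposes it (hypothetical labels, scikit-learn assumed):

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix, f1_score)

# Hypothetical unbalanced set: 8 negatives, only 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # the model misses one positive

cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)
bal = balanced_accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print(cm)   # [[8 0]
            #  [1 1]]
print(acc)  # 0.9  -- looks great...
print(bal)  # 0.75 -- ...but the positive class is only half right
print(f1)
```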
Overfitting and Underfitting
 High Bias (Underfitting) vs High Variance (Overfitting)
 Bias errors:
 Too few parameters
 train error similar to validation
 High Variance:
 Too many parameters
 train error much smaller than validation
 Irreducible error
 Outliers, and other problems with the data
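The bias/variance symptoms above can be reproduced by fitting polynomials of very different capacity to the same noisy data (the sine data and the two degrees are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy data drawn from a sine curve
rng = np.random.default_rng(1)
x_train = rng.uniform(0, 3, 30)[:, None]
x_val = rng.uniform(0, 3, 30)[:, None]
y_train = np.sin(2 * x_train[:, 0]) + rng.normal(0, 0.2, 30)
y_val = np.sin(2 * x_val[:, 0]) + rng.normal(0, 0.2, 30)

errors = {}
for degree in (1, 15):  # too few parameters vs too many
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    errors[degree] = (
        mean_squared_error(y_train, model.predict(x_train)),
        mean_squared_error(y_val, model.predict(x_val)),
    )

# degree 1 (high bias): train and validation errors both high and similar
# degree 15 (high variance): train error tiny, validation error much bigger
print(errors)
```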
Double Descent Model
Regularization
 The idea is to minimise the error + the sum of the parameters, basically preventing the parameters from growing too large
 https://towardsdatascience.com/regularizationinmachinelearning76441ddcf99a
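A minimal sketch using Ridge (L2) regularization from scikit-learn; the data and the alpha value are assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical noisy data with 5 features, only the first one informative
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=20)

plain = LinearRegression().fit(X, y)
# alpha weighs the penalty on the parameters (the "sum" being minimised)
ridge = Ridge(alpha=10.0).fit(X, y)

# Ridge shrinks the parameter vector toward zero
print(np.linalg.norm(plain.coef_))
print(np.linalg.norm(ridge.coef_))  # never larger than the unregularised norm
```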
One-hot encoder
 https://en.wikipedia.org/wiki/One-hot
 If we have a lot of categorical features, we may not want an algorithm that uses the distance between the features
 We probably should use something like decision trees.