
Overfitting

Underfitting
§ Type of problems
(Read Weapons of Math Destruction)
 Bad Data
 Bad Algorithm
§ Cost function
 Quadratic is very good because it penalises bigger errors more
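A minimal sketch (with made-up error values) of why the quadratic cost penalises large errors more than an absolute cost would:

```python
# Compare how absolute vs squared cost weighs small and large errors.
# The error values are hypothetical, just to illustrate the growth.
errors = [1.0, 2.0, 4.0]

absolute = [abs(e) for e in errors]  # grows linearly with the error
squared = [e ** 2 for e in errors]   # grows quadratically with the error

# Doubling the error doubles the absolute cost but quadruples the squared one.
print(absolute)  # [1.0, 2.0, 4.0]
print(squared)   # [1.0, 4.0, 16.0]
```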
§ Linear regression
 https://en.wikipedia.org/wiki/Linear_regression
 Normal equation
 Finds the solution directly, but because of the matrix-inversion computation (O(n^3)) it is very expensive for big data sets
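A sketch of the normal equation θ = (XᵀX)⁻¹Xᵀy for a tiny one-feature problem (intercept + slope), in plain Python. The data is made up so that y = 2x + 1 exactly, and the solution should recover those coefficients:

```python
# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Design matrix X with a bias column of ones.
X = [[1.0, x] for x in xs]

# X^T X (2x2) and X^T y (2x1), computed by hand.
a = sum(r[0] * r[0] for r in X); b = sum(r[0] * r[1] for r in X)
c = b;                           d = sum(r[1] * r[1] for r in X)
e = sum(r[0] * y for r, y in zip(X, ys))
f = sum(r[1] * y for r, y in zip(X, ys))

# Invert the 2x2 matrix and multiply: this inversion is the step that
# scales as O(n^3) in the number of features and makes the normal
# equation expensive for big problems.
det = a * d - b * c
theta0 = (d * e - b * f) / det
theta1 = (a * f - c * e) / det
print(theta0, theta1)  # 1.0 2.0
```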
 Gradient descent
 Alpha (the learning rate) is not easy to pick
 Needs a lot of iterations
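A sketch of gradient descent on the same kind of linear fit, showing the two points above: alpha must be chosen (too big diverges, too small crawls) and many iterations are needed. The data is made up (y = 2x + 1):

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # y = 2x + 1

theta0, theta1 = 0.0, 0.0
alpha = 0.1  # the learning rate: too large diverges, too small is slow
for _ in range(5000):  # many iterations, unlike the normal equation
    # Gradient of the mean squared error w.r.t. each parameter.
    g0 = sum((theta0 + theta1 * x - y) for x, y in zip(xs, ys)) / len(xs)
    g1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    theta0 -= alpha * g0
    theta1 -= alpha * g1

print(round(theta0, 3), round(theta1, 3))  # 1.0 2.0
```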
§ Metrics for regression problems
 Mean Squared error (MSE);
 Mean Absolute Error (MAE); # Good for humans to see what the error is
 R^2 # closer to 1 = better
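A sketch computing the three metrics above on made-up predictions:

```python
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

n = len(y_true)
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# R^2 = 1 - (residual sum of squares / total sum of squares);
# 1.0 is a perfect fit, 0 means no better than predicting the mean.
mean = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(mse, mae, r2)  # 0.125 0.25 0.975
```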
§ Features and Polynomial Regression
 https://en.wikipedia.org/wiki/Polynomial_regression
 The idea is to enrich the data by combining existing features (since it’s not trivial to collect more data)
 Example: frontage and depth can be “merged” to create an area feature
 Must be careful not to create such a complex function that it learns the training data by heart and misses completely on the validation set
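A sketch of the frontage/depth example: combining two raw features into a derived one, plus a polynomial (degree-2) term. The field names are illustrative:

```python
# Made-up lot records with the two raw features from the example.
lots = [{"frontage": 10.0, "depth": 30.0},
        {"frontage": 15.0, "depth": 20.0}]

for lot in lots:
    lot["area"] = lot["frontage"] * lot["depth"]  # combined feature
    lot["area_sq"] = lot["area"] ** 2             # polynomial term

print(lots[0]["area"])  # 300.0
```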
§ Classification (Logistic Regression)
https://en.wikipedia.org/wiki/Logistic_regression
 Sigmoid function (logistic function)
 Returns a number from 0 to 1; 1 = 100% chance of being the positive class
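A minimal sketch of the sigmoid, squashing any real number into (0, 1):

```python
import math

def sigmoid(z):
    # Logistic function: 1 / (1 + e^-z), output read as P(positive class).
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))   # 0.5 — right on the decision boundary
print(sigmoid(6))   # ≈ 0.998 — very likely the positive class
print(sigmoid(-6))  # ≈ 0.002 — very likely the negative class
```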
§ Hyperplane of decision
§ Cost Function
§ One vs All
 Test each class against all of the rest
 If the number of classes = 3, we have 3 classifiers (# classifiers = # classes)
 A problem is that it can be unbalanced (e.g. one class having much fewer items than all the others combined)
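A sketch of the One-vs-All decision rule: one "this class vs the rest" classifier per class, and the prediction is the class with the highest score. The scores below are hypothetical outputs of three already-trained binary classifiers:

```python
def predict_one_vs_all(scores_per_class):
    # scores_per_class maps class name -> score of "this class vs the rest".
    # The predicted class is the one whose classifier is most confident.
    return max(scores_per_class, key=scores_per_class.get)

# 3 classes, therefore 3 classifiers and 3 scores (hypothetical values).
print(predict_one_vs_all({"cat": 0.2, "dog": 0.7, "bird": 0.1}))  # dog
```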
§ One vs One
 Better for unbalanced classes
 number of classifiers = C(number of classes, 2) = n * (n - 1) / 2, one per pair of classes
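A sketch of the pair count with hypothetical class names:

```python
from itertools import combinations

# One-vs-One trains one classifier per *pair* of classes,
# so the count is C(n, 2) = n * (n - 1) / 2.
classes = ["cat", "dog", "bird", "fish"]
pairs = list(combinations(classes, 2))

print(len(pairs))  # 6 classifiers for 4 classes
```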
§ Sets
 We can aggregate the two previous strategies
§ ECOC (Error Correcting output code)
§ Confusion Matrix
 https://en.wikipedia.org/wiki/Confusion_matrix
 The metrics are a byproduct of the Confusion Matrix
 We should always use balanced accuracy so that unbalanced data sets don’t “hide” bad results
 Fscore (normally F1)
 ROC
 Change the threshold and plot the result (true positive rate vs false positive rate)
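A sketch of metrics derived from a binary confusion matrix. The counts are made up and deliberately unbalanced (many more negatives than positives), which is exactly when plain accuracy "hides" bad results and balanced accuracy does not:

```python
# Made-up, unbalanced confusion-matrix counts.
tp, fn = 8, 2    # 10 actual positives
fp, tn = 5, 85   # 90 actual negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)             # true positive rate (sensitivity)
specificity = tn / (tn + fp)        # true negative rate
balanced_accuracy = (recall + specificity) / 2
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)  # the usual F1 score

# Accuracy looks great mostly because negatives dominate;
# balanced accuracy and F1 tell a less flattering story.
print(accuracy, balanced_accuracy, round(f1, 3))
```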
§ Overfitting and Underfitting
 High Bias (Underfitting) vs High Variance (Overfitting)
 Bias errors:
 Too few parameters
 train error similar to validation
 High Variance:
 Too many parameters
 train error much smaller than validation
 Irreducible error
 Outliers, and other problems with the data
§ Double Descent Model
§ Regularization
 The idea is to minimise the error plus the sum of the parameters’ magnitudes, basically preventing the parameters from growing too large
 https://towardsdatascience.com/regularization-in-machine-learning-76441ddcf99a
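A sketch of one common form of this, an L2 (ridge) penalty: the usual error plus lambda times the sum of squared parameters. With the same errors, a model with large parameters pays a higher total cost:

```python
def ridge_cost(errors, params, lam):
    # MSE term plus an L2 penalty on the parameter values.
    mse = sum(e ** 2 for e in errors) / len(errors)
    penalty = lam * sum(p ** 2 for p in params)
    return mse + penalty

# Hypothetical numbers: identical errors, different parameter sizes.
small = ridge_cost([1.0, -1.0], params=[0.5, 0.5], lam=0.1)
large = ridge_cost([1.0, -1.0], params=[5.0, 5.0], lam=0.1)
print(small < large)  # True: big parameters cost more
```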
§ One-hot encoder
 https://en.wikipedia.org/wiki/One-hot
 If we have a lot of categorical features, we may not want an algorithm that uses the distance between the features
 We should probably use something like decision trees
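A minimal one-hot encoding sketch for a single categorical feature (category names are illustrative):

```python
categories = ["red", "green", "blue"]

def one_hot(value, categories):
    # One binary column per category; exactly one of them is 1.
    return [1 if value == c else 0 for c in categories]

print(one_hot("green", categories))  # [0, 1, 0]
```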