# § Linear regression

• https://en.wikipedia.org/wiki/Linear_regression
• Normal equation
• Finds the solution directly, without iterating, but the matrix inversion (O(n^3) in the number of features) makes it very expensive for large problems
• Alpha (the learning rate of iterative methods such as gradient descent) is not easy to pick
• Needs a lot of iterations
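
The trade-off in the bullets above can be sketched for the simplest case, one feature plus an intercept. This is a minimal illustration, not a general implementation: the function names, the toy data, and the values of `alpha` and `iterations` are assumptions chosen for the example. With a single feature the normal equation reduces to a 2x2 system, so no general matrix inverse is needed.

```python
def fit_normal_equation(xs, ys):
    """Closed-form least squares for y = b0 + b1 * x (one feature).

    Solves the 2x2 system (X^T X) theta = X^T y by hand.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of X^T X
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1


def fit_gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Fit the same model by batch gradient descent on the MSE.

    alpha and iterations are illustrative values; a larger alpha can
    diverge and a smaller one needs more iterations.
    """
    n = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        grad_b0 = 2 / n * sum(errors)
        grad_b1 = 2 / n * sum(e * x for e, x in zip(errors, xs))
        b0 -= alpha * grad_b0
        b1 -= alpha * grad_b1
    return b0, b1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data lying exactly on y = 1 + 2x
closed_form = fit_normal_equation(xs, ys)
iterative = fit_gradient_descent(xs, ys)
```

Both fits should recover an intercept close to 1 and a slope close to 2 on this toy data. The closed form gets there in one step, while the iterative version depends on the chosen `alpha` and number of iterations.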

## § Metrics for regression problems

• Mean Squared error (MSE);
• Mean Absolute Error (MAE); # Easy for humans to interpret: it is in the same units as the target
• R^2 # closer to 1 = better
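
As a rough sketch, the three metrics can be computed with plain Python as follows. The function names and the sample values are assumptions made for this example, and `r_squared` assumes the true values are not all identical (otherwise the denominator is zero).

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared errors; penalizes large errors heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute errors, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variation."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

mse = mean_squared_error(y_true, y_pred)   # 0.5
mae = mean_absolute_error(y_true, y_pred)  # 0.5
r2 = r_squared(y_true, y_pred)             # 0.9
```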

## § Features and Polynomial Regression

• https://en.wikipedia.org/wiki/Polynomial_regression
• The idea is to enrich the feature set by combining existing features (since collecting more data is not trivial)
• Example: frontage and depth can be “merged” to create an `area` feature
• Be careful not to create a function so complex that it memorizes the training data and fails completely on the validation set
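
A minimal sketch of both ideas, assuming each house is a dictionary with hypothetical `frontage` and `depth` keys. The helper names are illustrative only.

```python
def add_area(houses):
    """Return copies of the rows with a derived `area` feature."""
    return [{**h, "area": h["frontage"] * h["depth"]} for h in houses]


def polynomial_features(x, degree):
    """Expand one numeric feature into [x, x^2, ..., x^degree]."""
    return [x ** d for d in range(1, degree + 1)]


houses = [
    {"frontage": 10.0, "depth": 30.0},
    {"frontage": 12.0, "depth": 25.0},
]
enriched = add_area(houses)
cubic_terms = polynomial_features(2.0, 3)  # [2.0, 4.0, 8.0]
```

Each extra degree adds flexibility, so a high `degree` is one way to end up with the overly complex function the last bullet warns about.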

# § Classification (Logistic Regression)

https://en.wikipedia.org/wiki/Logistic_regression

• Sigmoid function (logistic function)
• Returns a number between 0 and 1, read as the probability of the positive class (1 = 100% chance of being positive)
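
For illustration, the sigmoid can be written as below. The branch on the sign of `z` is just one common way to avoid overflow in `math.exp`, and the 0.5 threshold in `predict_positive` is an assumed default, not something these notes prescribe.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # safe: z is negative, so exp(z) <= 1
    return exp_z / (1.0 + exp_z)


def predict_positive(z, threshold=0.5):
    """Classify as positive when the predicted probability reaches the threshold."""
    return sigmoid(z) >= threshold


p_zero = sigmoid(0.0)  # 0.5: the model is undecided at z = 0
```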

## § One-vs-All

• Test each class against all of the rest
• With 3 classes we train 3 classifiers (# classifiers = # classes)
• A problem is that each binary task can be unbalanced (e.g. one class having far fewer items than all the others combined)
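
A small sketch of how the labels are rewritten for One-vs-All, with made-up class names. It only builds the binary targets; training the classifiers is left out.

```python
def one_vs_all_labels(labels):
    """Build one binary label vector per class (1 = this class, 0 = the rest)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


labels = ["cat", "dog", "bird", "cat", "cat", "dog"]
binary_problems = one_vs_all_labels(labels)

# One binary problem per class; "bird" has a single positive against five
# negatives, which is the imbalance mentioned above.
bird_targets = binary_problems["bird"]  # [0, 0, 1, 0, 0, 0]
```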

## § One vs One

• Better for unbalanced classes
• Number of classifiers = C(k, 2) = k(k-1)/2, where k is the number of classes (one classifier per pair of classes)
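
The pair count can be checked with the standard library. The helper name and class names below are illustrative.

```python
from itertools import combinations
from math import comb


def one_vs_one_pairs(classes):
    """List every unordered pair of classes; each pair gets its own classifier."""
    return list(combinations(sorted(classes), 2))


pairs = one_vs_one_pairs(["cat", "dog", "bird"])
# [("bird", "cat"), ("bird", "dog"), ("cat", "dog")]

pairs_for_ten_classes = comb(10, 2)  # 45 classifiers for 10 classes
```

The count grows quadratically with the number of classes, whereas One-vs-All grows linearly.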

## § Sets

• We can aggregate the two previous strategies

## § Confusion Matrix

• https://en.wikipedia.org/wiki/Confusion_matrix
• The metrics are a by-product of the Confusion Matrix
• Prefer balanced accuracy on unbalanced data sets, so that a dominant class cannot “hide” poor results on the minority class
• F-score (normally F-1)
• ROC
• Vary the decision threshold and plot the result (true positive rate against false positive rate)
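
The metrics above can be sketched from the four confusion-matrix counts. This is a simplified binary version with assumed function names and toy data, and it assumes both classes are present in `y_true` (otherwise some denominators are zero).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def balanced_accuracy(tp, fp, fn, tn):
    """Mean of the recall on the positive class and on the negative class."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)


def roc_points(y_true, scores, thresholds):
    """One (false positive rate, true positive rate) point per threshold."""
    points = []
    for th in thresholds:
        y_pred = [1 if s >= th else 0 for s in scores]
        tp, fp, fn, tn = confusion_counts(y_true, y_pred)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


# A classifier that always predicts the majority class on a 1:9 data set.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
plain_accuracy = (tp + tn) / len(y_true)      # 0.9, looks good
balanced = balanced_accuracy(tp, fp, fn, tn)  # 0.5, no better than chance

curve = roc_points([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], [1.0, 0.5, 0.0])
# [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
```

The first example shows why plain accuracy can mislead on unbalanced data. The second shows how each threshold yields one point of the ROC curve.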

# § Overfitting and Underfitting

• High Bias (Underfitting) vs High Variance (Overfitting)
• Bias errors:
• Too few parameters
• Train error similar to validation error (both high)
• High Variance:
• Too many parameters
• Train error much smaller than validation error
• Irreducible error
• Outliers and other problems with the data
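
For squared-error loss, the three error sources listed above combine in the standard bias-variance decomposition. The notation is an assumption added here: the data follow $y = f(x) + \varepsilon$ with noise variance $\sigma^2$, and $\hat{f}$ is the fitted model.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2
$$

The last term, $\sigma^2$, is the irreducible error: no choice of model removes it.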