**Regularization** means penalizing the complexity of a model to reduce
overfitting.

# Regularization for Simplicity

## Generalization Curve

A generalization curve plots training loss and validation loss against the number of training iterations; when validation loss starts rising while training loss keeps falling, the model is overfitting the training data.

## Penalizing Model Complexity

- We want to avoid model complexity where possible.
- We can bake this idea into the optimization we do at training time.
- Empirical Risk Minimization:
  - aims for low training error

$$ \text{minimize: } Loss(Data\;|\;Model) $$

- Structural Risk Minimization:
  - aims for low training error
  - while balancing against complexity

$$ \text{minimize: } Loss(Data\;|\;Model) + complexity(Model) $$
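
As a rough sketch of the difference between the two objectives, here they are written out in Python (assuming NumPy, a linear model, and mean squared error as a hypothetical data loss; the function names are made up for illustration):

```python
import numpy as np

def data_loss(weights, X, y):
    """Mean squared error of a linear model: the Loss(Data | Model) term."""
    return np.mean((X @ weights - y) ** 2)

def complexity(weights):
    """Placeholder complexity(Model) term; one concrete choice is defined in the next section."""
    return np.sum(weights ** 2)

def empirical_risk(weights, X, y):
    # Empirical Risk Minimization: only the training error matters.
    return data_loss(weights, X, y)

def structural_risk(weights, X, y):
    # Structural Risk Minimization: training error plus a complexity penalty.
    return data_loss(weights, X, y) + complexity(weights)
```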

## Regularization

- How to define complexity(Model)?
- Prefer smaller weights
- Diverging from this should incur a cost
- Can encode this idea via \(L_2\) **regularization** (a.k.a. ridge):
  - *complexity(model) = sum of the squares of the weights*
- Penalizes really big weights (illustrated in the sketch after this list)
- For linear models: prefers flatter slopes
- Bayesian prior:
  - weights should be centered around zero
  - weights should be normally distributed
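
One consequence of squaring is that a single large weight costs far more than several small weights of the same total magnitude. A minimal sketch (assuming NumPy; the weight vectors are made up for illustration):

```python
import numpy as np

def l2_complexity(weights):
    """L2 regularization term: sum of the squares of the weights."""
    return np.sum(np.square(weights))

# Hypothetical weight vectors with the same total (absolute) magnitude.
spread_out = np.array([0.5, 0.5, 0.5, 0.5])   # many small weights
one_big    = np.array([2.0, 0.0, 0.0, 0.0])   # one really big weight

print(l2_complexity(spread_out))  # 1.0
print(l2_complexity(one_big))     # 4.0 -- the big weight is penalized much more
```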

## A Loss Function with \(L_2\) Regularization

$$ Loss(Data|Model) + \lambda \left(w_1^2 + \ldots + w_n^2 \right) $$

\(\text{Where:}\)

\(Loss\text{: Aims for low training error}\)
\(\lambda\text{: Scalar value that controls how the regularization term is balanced against the training loss}\)
\(w_1^2+\ldots+w_n^2\text{: Square of}\;L_2\;\text{norm}\)
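
Putting the pieces together, here is a minimal sketch of the full regularized loss (assuming NumPy, a linear model with mean squared error, and made-up data; `lambda_` is the scalar \(\lambda\) above):

```python
import numpy as np

def l2_regularized_loss(weights, X, y, lambda_):
    """Loss(Data | Model) + lambda * (w_1^2 + ... + w_n^2)."""
    data_loss = np.mean((X @ weights - y) ** 2)   # training error (MSE)
    l2_penalty = np.sum(np.square(weights))       # square of the L2 norm
    return data_loss + lambda_ * l2_penalty

# Made-up example: 3 examples, 2 features.
X = np.array([[1.0, 2.0], [2.0, 0.5], [0.0, 1.0]])
y = np.array([3.0, 2.5, 1.0])
w = np.array([1.0, 1.0])

print(l2_regularized_loss(w, X, y, lambda_=0.0))  # data loss only
print(l2_regularized_loss(w, X, y, lambda_=0.1))  # adds 0.1 * (1^2 + 1^2) = 0.2
```

Increasing `lambda_` pushes the optimizer toward smaller weights at the expense of a higher training error; setting it to zero recovers plain empirical risk minimization.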