Regularization for Simplicity

Regularization means penalizing the complexity of a model to reduce overfitting.


The loss on the training set gradually declines. By contrast, the loss on the validation set declines at first and then starts to rise, a sign that the model is overfitting the training data.
  • We want to avoid model complexity where possible.
  • We can bake this idea into the optimization we do at training time.
  • Empirical Risk Minimization:
    • aims for low training error
    • $$ \text{minimize: } Loss(Data\;|\;Model) $$

  • Structural Risk Minimization:
    • aims for low training error
    • while balancing against complexity
    • $$ \text{minimize: } Loss(Data\;|\;Model) + complexity(Model) $$
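As a rough illustration of the two objectives, here is a minimal sketch in Python. It is not from the course material: it assumes a linear model with mean squared error standing in for $Loss(Data\;|\;Model)$, and leaves $complexity(Model)$ as a function to be supplied. The names `data_loss`, `empirical_risk`, and `structural_risk` are hypothetical.

```python
import numpy as np

def data_loss(weights, features, labels):
    # Loss(Data | Model): here assumed to be the mean squared error
    # of a linear model with the given weights.
    predictions = features @ weights
    return np.mean((predictions - labels) ** 2)

def empirical_risk(weights, features, labels):
    # Empirical Risk Minimization: training loss only.
    return data_loss(weights, features, labels)

def structural_risk(weights, features, labels, complexity):
    # Structural Risk Minimization: training loss plus a
    # model-complexity penalty supplied by the caller.
    return data_loss(weights, features, labels) + complexity(weights)
```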

  • How to define complexity(Model)?
  • Prefer smaller weights
  • Diverging from this should incur a cost
  • Can encode this idea via L2 regularization (a.k.a. ridge)
    • complexity(model) = sum of the squares of the weights
    • Penalizes really big weights
    • For linear models: prefers flatter slopes
    • Bayesian prior:
      • weights should be centered around zero
      • weights should be normally distributed

$$ Loss(Data\;|\;Model) + \lambda \left(w_1^2 + \ldots + w_n^2 \right) $$

\(\text{Where:}\)

  • \(Loss\): aims for low training error
  • \(\lambda\): a scalar value that controls how the regularization term is balanced against the data loss
  • \(w_1^2+\ldots+w_n^2\): the square of the \(L_2\) norm
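Below is a minimal sketch of this objective, again assuming mean squared error for the data loss; the names `l2_penalty`, `l2_regularized_loss`, and `lam` are illustrative, not from the text.

```python
import numpy as np

def l2_penalty(weights):
    # complexity(model): sum of the squares of the weights (squared L2 norm).
    return np.sum(weights ** 2)

def l2_regularized_loss(weights, features, labels, lam):
    # Loss(Data | Model) + lambda * (w_1^2 + ... + w_n^2)
    predictions = features @ weights
    mse = np.mean((predictions - labels) ** 2)  # data loss, assumed to be MSE
    return mse + lam * l2_penalty(weights)

# One very large weight is penalized far more heavily than several small,
# evenly spread weights, which is why L2 regularization prefers flatter slopes.
print(l2_penalty(np.array([4.0, 0.1, 0.1])))  # about 16.02
print(l2_penalty(np.array([1.0, 1.0, 1.0])))  # 3.0
```

Raising `lam` pushes the optimizer toward smaller weights at the expense of training error; setting it to zero recovers plain empirical risk minimization.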