
# Regularization for Simplicity

Regularization means penalizing the complexity of a model to reduce overfitting.

• We want to avoid model complexity where possible.
• We can bake this idea into the optimization we do at training time (a minimal code sketch follows this list).
• Empirical Risk Minimization:
  • aims for low training error
  • $$\text{minimize: } Loss(Data\;|\;Model)$$
• Structural Risk Minimization:
  • aims for low training error
  • while balancing against complexity
  • $$\text{minimize: } Loss(Data\;|\;Model) + complexity(Model)$$
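To make the two objectives concrete, here is a minimal NumPy sketch. The linear model, the mean-squared-error loss, and the names `w`, `X`, `y`, and `complexity` are illustrative assumptions, not part of the course material:

```python
import numpy as np

# Illustrative linear model: predictions = X @ w.
def empirical_risk(w, X, y):
    """Empirical Risk Minimization objective: training loss only."""
    predictions = X @ w
    return np.mean((predictions - y) ** 2)  # Loss(Data | Model), here mean squared error

def structural_risk(w, X, y, complexity):
    """Structural Risk Minimization objective: training loss plus a complexity penalty."""
    return empirical_risk(w, X, y) + complexity(w)
```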

• How to define complexity(Model)?
  • Prefer smaller weights
  • Diverging from this should incur a cost
• Can encode this idea via L2 regularization (a.k.a. ridge):
  • complexity(Model) = sum of the squares of the weights (see the sketch after this list)
  • Penalizes really big weights
  • For linear models: prefers flatter slopes
  • Bayesian prior:
    • weights should be centered around zero
    • weights should be normally distributed
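As a rough illustration of this definition (the weight values below are made up), one large weight dominates the L2 complexity term:

```python
import numpy as np

def l2_complexity(w):
    """L2 regularization term: the sum of the squares of the weights."""
    return np.sum(np.square(w))

w_small = np.array([0.2, 0.5, 0.25, 0.1, 0.25, 0.75])
w_outlier = np.array([0.2, 0.5, 0.25, 5.0, 0.25, 0.75])  # one big weight

print(l2_complexity(w_small))    # ~0.99
print(l2_complexity(w_outlier))  # ~25.98, dominated by 5.0**2 = 25
```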

Putting these together, the L2-regularized training objective is:

$$Loss(Data\;|\;Model) + \lambda \left(w_1^2 + \ldots + w_n^2 \right)$$

Where:

• $Loss$: aims for low training error
• $\lambda$: a scalar value (the regularization rate) that controls how the loss term and the complexity term are balanced
• $w_1^2+\ldots+w_n^2$: the square of the $L_2$ norm of the weights
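The full objective can be sketched the same way; `lam` plays the role of $\lambda$, and the linear model and squared-error loss are again illustrative assumptions:

```python
import numpy as np

def l2_regularized_loss(w, X, y, lam):
    """Loss(Data | Model) + lambda * (w_1^2 + ... + w_n^2)."""
    data_loss = np.mean((X @ w - y) ** 2)  # aims for low training error
    l2_penalty = np.sum(np.square(w))      # square of the L2 norm
    return data_loss + lam * l2_penalty    # lam balances the two terms
```

Setting `lam` to 0 recovers plain empirical risk minimization; increasing it pushes the learned weights toward zero (flatter slopes for a linear model).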