# Reducing Loss

To train a model, we need a good way to reduce the model’s loss. An iterative approach is one widely used method for reducing loss, and is as easy and efficient as walking down a hill.
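The iterative approach can be sketched in code: compute the loss gradient, step "downhill," and repeat. Below is a minimal sketch using gradient descent on a one-feature linear model y' = wx + b with squared loss; the data, learning rate, and step count are illustrative assumptions, not a prescribed recipe.

```python
def train(xs, ys, learning_rate=0.05, steps=200):
    """Fit y' = w*x + b by iteratively stepping against the loss gradient."""
    w, b = 0.0, 0.0  # for a convex loss, starting at 0 is fine
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared loss (y - y')^2 w.r.t. w and b.
        dw = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
        db = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
        # Take a small step in the direction that reduces loss.
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b

# Illustrative data generated by y = 2x + 1; train recovers w ≈ 2, b ≈ 1.
w, b = train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

Each pass shrinks the error by a constant factor, which is why small repeated steps suffice instead of solving for the minimum directly.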


• Hyperparameters are the configuration settings used to tune how the model is trained.
• The derivative of (y − y')² with respect to the weights and biases tells us how loss changes for a given example
• Simple to compute and convex
• So we repeatedly take small steps in the direction that minimizes loss
• We call these Gradient Steps (But they're really negative Gradient Steps)
• This strategy is called Gradient Descent
• For convex problems, weights can start anywhere (say, all 0s)
• Convex: think of a bowl shape
• Just one minimum
• Foreshadowing: not true for neural nets
• Non-convex: think of an egg crate
• More than one minimum
• Strong dependency on initial values
• We could compute the gradient over the entire data set on each step, but this turns out to be unnecessary
• Computing gradient on small data samples works well
• On every step, get a new random sample
• Stochastic Gradient Descent: one example at a time
• Mini-Batch Gradient Descent: batches of 10–1000 examples
• Loss & gradients are averaged over the batch
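The sampling idea above can be sketched as follows: on every step, draw a fresh random batch and average the gradient over just those examples. The model, batch size, learning rate, step count, and data here are illustrative assumptions; setting `batch_size=1` gives plain Stochastic Gradient Descent.

```python
import random

def minibatch_sgd(xs, ys, batch_size=2, learning_rate=0.05, steps=2000, seed=0):
    """Mini-batch SGD for y' = w*x + b with squared loss."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w, b = 0.0, 0.0
    for _ in range(steps):
        # On every step, get a new random sample of the data.
        batch = rng.sample(data, batch_size)
        # Gradients are averaged over the batch, not the full data set.
        dw = sum(-2 * x * (y - (w * x + b)) for x, y in batch) / batch_size
        db = sum(-2 * (y - (w * x + b)) for x, y in batch) / batch_size
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b

# Same illustrative data as before (y = 2x + 1); the noisy per-batch
# gradients still drive w toward 2 and b toward 1 on average.
w, b = minibatch_sgd([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

Each batch's gradient is a noisy but unbiased estimate of the full-data gradient, which is why small random samples work well in practice.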