To train a model, we need a good way to reduce the model's loss. An iterative approach is one widely used method for reducing loss; it is as easy and efficient as walking down a hill.

# Reducing Loss

## How do we reduce loss?

- Hyperparameters are the configuration settings used to tune how the model is trained.
- The derivative of (y - y')² with respect to the weights and biases tells us how loss changes for a given example
  - Simple to compute and convex
- So we repeatedly take small steps in the direction that minimizes loss
- We call these **Gradient Steps** (but they're really negative gradient steps)
- This strategy is called **Gradient Descent**
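The steps above can be sketched in a few lines for a one-feature linear model. This is an illustrative sketch, not the course's own code; the function and variable names are our own.

```python
# One negative-gradient step for squared loss (y - y')^2 on a
# single example, for a linear model y' = w*x + b.

def gradient_step(w, b, x, y, learning_rate):
    y_pred = w * x + b
    # d/dw (y - y')^2 = -2x(y - y'),  d/db (y - y')^2 = -2(y - y')
    dw = -2 * x * (y - y_pred)
    db = -2 * (y - y_pred)
    # Step in the direction opposite the gradient (a negative gradient step).
    return w - learning_rate * dw, b - learning_rate * db

w, b = 0.0, 0.0
for _ in range(100):
    w, b = gradient_step(w, b, x=2.0, y=4.0, learning_rate=0.05)
print(round(w * 2.0 + b, 3))  # prediction approaches the target y = 4.0
```

Because the squared loss is convex in w and b, repeating small steps like this drives the prediction toward the target.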

## Block Diagram of Gradient Descent

- Try the Gradient Descent Exercise

## Weight Initialization

- For convex problems, weights can start anywhere (say, all 0s)
- Convex: think of a bowl shape
- Just one minimum
- Foreshadowing: not true for neural nets
- Non-convex: think of an egg crate
- More than one minimum
- Strong dependency on initial values
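The convex case can be demonstrated with a toy loss: from any starting weight, gradient descent lands in the same (single) minimum. A minimal sketch, using L(w) = (w - 3)² as a stand-in convex loss (the constant 3 is arbitrary):

```python
# For a convex "bowl-shaped" loss there is only one minimum, so the
# starting weight doesn't matter. Toy loss: L(w) = (w - 3)^2.

def minimize(w, steps=200, lr=0.1):
    for _ in range(steps):
        grad = 2 * (w - 3)   # dL/dw
        w -= lr * grad       # negative gradient step
    return w

print(minimize(-10.0), minimize(0.0), minimize(10.0))  # all converge to ~3.0
```

For a non-convex loss (the egg crate), the same procedure would stop in whichever local minimum is nearest the starting point, which is why initial values matter for neural nets.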

## SGD & Mini-Batch Gradient Descent

- Could compute gradient over entire data set on each step, but this turns out to be unnecessary
- Computing gradient on small data samples works well
- On every step, get a new random sample
- **Stochastic Gradient Descent**: one example at a time
- **Mini-Batch Gradient Descent**: batches of 10-1000
  - Loss & gradients are averaged over the batch
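A mini-batch version of the update can be sketched as follows. This is a hedged illustration with synthetic, noise-free data (y = 2x + 1); the batch size and learning rate are arbitrary choices, not values from the course.

```python
import random

random.seed(0)
# Synthetic data from the line y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [i / 100 for i in range(100)]]

w, b = 0.0, 0.0
lr, batch_size = 0.05, 10
for _ in range(2000):
    batch = random.sample(data, batch_size)  # new random sample on every step
    # Gradients of the squared loss, averaged over the batch.
    dw = sum(-2 * x * (y - (w * x + b)) for x, y in batch) / batch_size
    db = sum(-2 * (y - (w * x + b)) for x, y in batch) / batch_size
    w -= lr * dw
    b -= lr * db
print(round(w, 2), round(b, 2))  # w, b approach 2 and 1
```

Each step uses only `batch_size` examples instead of the full data set, yet the noisy averaged gradient still points downhill on average, which is why mini-batch descent works well in practice.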