# Reducing Loss

To train a model, we need a good way to reduce the model’s loss. An iterative approach is one widely used method for reducing loss, and is as easy and efficient as walking down a hill.
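To make the hill-walking metaphor concrete, here is a minimal sketch of one such iterative loop: a one-feature linear model y' = wx + b trained with squared loss. The data point, starting values, learning rate, and step count are illustrative assumptions, not values from the course.

```python
# A minimal sketch of iterative loss reduction for a one-feature
# linear model y' = w*x + b with squared loss (y - y')^2.
# The example values below are illustrative assumptions.

x, y = 2.0, 9.0         # a single (feature, label) pair
w, b = 0.0, 0.0         # initial guesses for the parameters
learning_rate = 0.05    # how big each downhill step is

for step in range(200):
    prediction = w * x + b
    error = y - prediction           # positive when we predict too low
    # Derivatives of (y - y')^2 with respect to w and b.
    grad_w = -2.0 * error * x
    grad_b = -2.0 * error
    # Step against the gradient, i.e. downhill on the loss surface.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b, (y - (w * x + b)) ** 2)  # final loss should be near 0
```

Each pass through the loop measures the loss's slope and nudges the parameters a small step downhill; here training simply stops after a fixed number of steps for simplicity.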


• Hyperparameters are the configuration settings used to tune how the model is trained.
• The derivative of (y - y')² with respect to the weights and biases tells us how loss changes for a given example
  • Simple to compute and convex
• So we repeatedly take small steps in the direction that minimizes loss
  • We call these Gradient Steps (but they're really negative Gradient Steps)
  • This strategy is called Gradient Descent
• For convex problems, weights can start anywhere (say, all 0s)
  • Convex: think of a bowl shape
  • Just one minimum
• Foreshadowing: not true for neural nets
  • Non-convex: think of an egg crate
  • More than one minimum
  • Strong dependency on initial values
• Could compute the gradient over the entire data set on each step, but this turns out to be unnecessary
• Computing the gradient on small data samples works well
  • On every step, get a new random sample
• Stochastic Gradient Descent: one example at a time
• Mini-Batch Gradient Descent: batches of 10-1000 examples
  • Loss & gradients are averaged over the batch (see the sketch below)
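As a rough sketch of these sampling strategies, the loop below implements mini-batch gradient descent on synthetic linear-regression data, with loss gradients averaged over each batch. Setting `batch_size = 1` gives Stochastic Gradient Descent, and setting it to `len(X)` gives full-batch descent. The dataset, learning rate, batch size, and step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y = 3x + 2 plus noise (an illustrative assumption).
X = rng.uniform(-1.0, 1.0, size=1000)
Y = 3.0 * X + 2.0 + rng.normal(scale=0.1, size=1000)

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 32   # 1 => stochastic GD; len(X) => full-batch GD

for step in range(500):
    # On every step, draw a new random sample of examples.
    idx = rng.integers(0, len(X), size=batch_size)
    x_batch, y_batch = X[idx], Y[idx]

    error = y_batch - (w * x_batch + b)
    # Gradients of the squared loss, averaged over the batch.
    grad_w = np.mean(-2.0 * error * x_batch)
    grad_b = np.mean(-2.0 * error)

    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w = {w:.2f}, b = {b:.2f}  (data generated with w=3, b=2)")
```

Because each batch is a fresh random sample, the per-step gradient is a noisy estimate of the full-data gradient; averaging over a larger batch reduces that noise at the cost of more computation per step.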
[{ "type": "thumb-down", "id": "missingTheInformationINeed", "label":"Missing the information I need" },{ "type": "thumb-down", "id": "tooComplicatedTooManySteps", "label":"Too complicated / too many steps" },{ "type": "thumb-down", "id": "outOfDate", "label":"Out of date" },{ "type": "thumb-down", "id": "samplesCodeIssue", "label":"Samples / code issue" },{ "type": "thumb-down", "id": "otherDown", "label":"Other" }]
[{ "type": "thumb-up", "id": "easyToUnderstand", "label":"Easy to understand" },{ "type": "thumb-up", "id": "solvedMyProblem", "label":"Solved my problem" },{ "type": "thumb-up", "id": "otherUp", "label":"Other" }]