Set a learning rate of 0.03 on the slider. Keep hitting the STEP button until the gradient descent algorithm reaches the minimum point of the loss curve. How many steps did it take?
Gradient descent reaches the minimum of the curve in 40 steps.
Can you reach the minimum more quickly with a higher learning rate? Set a learning rate of 0.1, and keep hitting STEP until gradient descent reaches the minimum. How many steps did it take this time?
Gradient descent reaches the minimum of the curve in 11 steps.
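The effect of the learning rate can be reproduced outside the widget with a few lines of Python. This is a minimal sketch, assuming a made-up quadratic loss curve, loss(w) = 2(w - 3)^2, in place of the widget's curve, which is not given here. The names `loss_gradient` and `steps_to_minimum`, the starting point, and the stopping tolerance are all hypothetical. Because the curve and stopping rule differ, the step counts will not match the 40 and 11 reported above, but the trend is the same: the larger learning rate needs fewer steps.

```python
# Hypothetical toy loss curve: loss(w) = 2 * (w - 3)^2, with its minimum at w = 3.
W_MIN = 3.0


def loss_gradient(w: float) -> float:
    """Derivative of the toy loss: d/dw [2 * (w - 3)^2] = 4 * (w - 3)."""
    return 4.0 * (w - W_MIN)


def steps_to_minimum(lr, w0=0.0, tol=1e-3, max_steps=1000):
    """Count gradient descent steps until w is within `tol` of the minimum.

    Returns None if the search diverges or runs out of steps.
    """
    w = w0
    for step in range(1, max_steps + 1):
        w -= lr * loss_gradient(w)  # one press of STEP: move against the gradient
        if abs(w - W_MIN) < tol:
            return step
        if abs(w - W_MIN) > 1e6:  # overshooting so badly that it is climbing away
            return None
    return None


for lr in (0.03, 0.1):
    print(f"learning rate {lr}: {steps_to_minimum(lr)} steps")
```

On this toy curve, a learning rate of 0.03 needs 63 steps and 0.1 needs 16.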
How about an even larger learning rate? Reset the graph, set a learning rate of 1, and try to reach the minimum of the loss curve. What happened this time?
Gradient descent never reaches the minimum. Each step overshoots the bottom of the bowl and lands higher up the opposite side, so the steps progressively increase in size. Gradient descent jumps back and forth across the bowl, climbing the curve instead of descending to the bottom.
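A hypothetical toy curve, loss(w) = 2(w - 3)^2, makes the failure at a learning rate of 1 concrete. In this sketch the function names and the starting point w = 0 are illustrative assumptions, not the widget's. Each update multiplies the distance from the minimum by (1 - 4 × learning rate), which is -3 when the learning rate is 1, so every step lands on the opposite side of the bowl three times farther away than the last.

```python
W_MIN = 3.0  # minimum of the hypothetical toy loss 2 * (w - 3)^2


def loss(w: float) -> float:
    return 2.0 * (w - W_MIN) ** 2


def trajectory(lr: float, w0: float = 0.0, n_steps: int = 4) -> list[float]:
    """Return the positions visited by gradient descent, starting at w0."""
    ws = [w0]
    for _ in range(n_steps):
        w = ws[-1]
        ws.append(w - lr * 4.0 * (w - W_MIN))  # gradient of the toy loss is 4 * (w - 3)
    return ws


ws = trajectory(lr=1.0)
for prev, cur in zip(ws, ws[1:]):
    side = "left" if cur < W_MIN else "right"
    print(f"w = {cur:7.1f} ({side} of minimum), step size {abs(cur - prev):5.1f}, loss {loss(cur):8.1f}")
```

Starting from w = 0, the positions are 12, -24, 84, and -240, the step sizes grow from 12 to 324, and the loss climbs at every step.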
Can you find the Goldilocks learning rate for this curve, where gradient descent reaches the minimum point in the fewest steps? What is the smallest number of steps required to reach the minimum?
The Goldilocks learning rate for this curve is somewhere between 0.2 and 0.3. A learning rate in that range reaches the minimum in three or four steps.
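One way to search for a Goldilocks learning rate is a simple sweep: try a range of learning rates and keep the one that needs the fewest steps. The sketch below does this on a hypothetical toy curve, loss(w) = 2(w - 3)^2. The candidate grid, tolerance, starting point, and function names are all assumptions. For a quadratic curve, a learning rate equal to 1 divided by the curve's second derivative (here 1/4 = 0.25) lands exactly on the minimum in a single step, so the sweep picks 0.25. Step counts depend on the curve and on the stopping rule, so they will not match the widget's three or four steps.

```python
W_MIN = 3.0  # minimum of the hypothetical toy loss 2 * (w - 3)^2


def steps_to_minimum(lr, w0=0.0, tol=1e-3, max_steps=1000):
    """Gradient descent steps until |w - W_MIN| < tol, or None if it never gets there."""
    w = w0
    for step in range(1, max_steps + 1):
        w -= lr * 4.0 * (w - W_MIN)  # gradient of the toy loss is 4 * (w - 3)
        if abs(w - W_MIN) < tol:
            return step
        if abs(w - W_MIN) > 1e6:  # diverging
            return None
    return None


def goldilocks(candidates):
    """Return (learning rate, steps) for the candidate that converges in the fewest steps."""
    results = {lr: steps_to_minimum(lr) for lr in candidates}
    converged = {lr: n for lr, n in results.items() if n is not None}
    best_lr = min(converged, key=converged.get)
    return best_lr, converged[best_lr]


grid = [round(0.05 * i, 2) for i in range(1, 11)]  # 0.05, 0.10, ..., 0.50
print(goldilocks(grid))
```

On this toy curve, rates of 0.2 and 0.3 each need 5 steps, while 0.5 bounces between w = 0 and w = 6 forever and is excluded from the result.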
NOTE: In practice, finding a "perfect" (or near-perfect) learning rate is not essential for successful model training. The goal is to find a learning rate large enough that gradient descent converges efficiently, but not so large that it never converges.