Training Neural Networks

Backpropagation is the most common training algorithm for neural networks. It makes gradient descent feasible for multi-layer neural networks. TensorFlow handles backpropagation automatically, so you don't need a deep understanding of the algorithm. To get a sense of how it works, walk through the Backpropagation algorithm visual explanation. As you scroll through that explanation, note the following:

  • How data flows through the graph.
  • How dynamic programming lets us avoid computing exponentially many paths through the graph. Here "dynamic programming" just means recording intermediate results on the forward and backward passes, as in the sketch below.
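To make "recording intermediate results" concrete, here is a minimal NumPy sketch of one forward and backward pass for a network with a single ReLU hidden layer and squared-error loss. The network, the weights, and names such as forward and backward are illustrative assumptions, not part of the course material.

```python
import numpy as np

# Minimal sketch: one ReLU hidden layer, squared-error loss.
# The forward pass caches intermediate results (h_pre, h, y_hat) so the
# backward pass can reuse them instead of recomputing every path
# through the graph.

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden layer: 2 inputs -> 3 units
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # output layer: 3 units -> 1 output

def forward(x):
    h_pre = W1 @ x + b1          # pre-activation (cached)
    h = np.maximum(h_pre, 0.0)   # ReLU activation (cached)
    y_hat = W2 @ h + b2          # prediction (cached)
    return h_pre, h, y_hat

def backward(x, y, h_pre, h, y_hat):
    # Chain rule, reusing the cached forward-pass values.
    d_y = 2.0 * (y_hat - y)            # dLoss/dy_hat for squared error
    dW2 = np.outer(d_y, h)             # gradient for output weights
    d_h = W2.T @ d_y                   # backpropagate into the hidden layer
    d_h_pre = d_h * (h_pre > 0)        # ReLU passes gradient only where active
    dW1 = np.outer(d_h_pre, x)         # gradient for hidden weights
    return dW1, d_h_pre, dW2, d_y      # (dW1, db1, dW2, db2)

x, y = np.array([0.5, -1.0]), np.array([1.0])
h_pre, h, y_hat = forward(x)
dW1, db1, dW2, db2 = backward(x, y, h_pre, h, y_hat)
```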

Training Neural Nets

Backprop: What You Need To Know

  • Gradients are important
    • If it's differentiable, we can probably learn on it
  • Gradients can vanish
    • Each additional layer can successively reduce signal vs. noise
    • ReLUs are useful here
  • Gradients can explode
    • Learning rates are important here
    • Batch normalization (useful knob) can help
  • ReLU layers can die
    • Keep calm and lower your learning rates (see the sketch after this list)
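A minimal Keras sketch (assuming TensorFlow 2.x) that combines these knobs: ReLU activations to help keep gradients from vanishing, batch normalization to help tame exploding gradients, and a small SGD learning rate so ReLU units are less likely to die. The layer widths, input shape, and learning rate are placeholder values, not from the course.

```python
import tensorflow as tf

# Sketch combining the knobs above; layer widths, input shape, and the
# 0.01 learning rate are placeholder choices.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),                      # 10 input features (placeholder)
    tf.keras.layers.Dense(64, activation='relu'),     # ReLU: gradient is 1 where active
    tf.keras.layers.BatchNormalization(),             # keeps activations in a stable range
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1),
])

model.compile(
    # A modest learning rate helps avoid exploding updates and dead ReLUs.
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss='mse',
)
```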

Normalizing Feature Values

  • We'd like our features to have reasonable scales
    • Roughly zero-centered, [-1, 1] range often works well
    • Helps gradient descent converge; avoid NaN trap
    • Avoiding outlier values can also help
  • Can use a few standard methods (sketched below):
    • Linear scaling
    • Hard cap (clipping) to max, min
    • Log scaling
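A small NumPy illustration of the three methods, assuming a made-up feature column and arbitrary clip bounds:

```python
import numpy as np

# Made-up feature values; the clip bounds below are arbitrary examples.
values = np.array([3.0, 7.5, 120.0, 0.2, 15.0])

# Linear scaling to [-1, 1].
lo, hi = values.min(), values.max()
linear_scaled = 2.0 * (values - lo) / (hi - lo) - 1.0

# Hard cap (clipping) to a chosen min and max.
clipped = np.clip(values, a_min=0.0, a_max=50.0)

# Log scaling, useful for long-tailed distributions.
log_scaled = np.log1p(values)   # log(1 + x) keeps small values finite
```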

Dropout Regularization

  • Dropout: Another form of regularization, useful for NNs
  • Works by randomly "dropping out" units in a network for a single gradient step
    • There's a connection to ensemble models here
  • The more you drop out, the stronger the regularization
    • 0.0 = no dropout regularization
    • 1.0 = drop everything out; the model learns nothing
    • Intermediate values are more useful (see the sketch below)
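A minimal Keras sketch (assuming TensorFlow 2.x) with an intermediate dropout rate; the 0.3 rate, layer widths, and input shape are placeholder values:

```python
import tensorflow as tf

# Dropout's `rate` is the fraction of units randomly dropped on each
# training step; dropout is automatically disabled at inference time.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(rate=0.3),    # intermediate value between 0.0 and 1.0
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(rate=0.3),
    tf.keras.layers.Dense(1),
])
```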
