Training Sets and Test Sets
We return to Playground to experiment with training sets
and test sets.
As a reminder of what the orange and blue dots mean in the visualization:
- Each blue dot signifies one example of one class of data (for example,
spam).
- Each orange dot signifies one example of another class of data (for
example, not spam).
- The background color represents the model's prediction of where examples
of that color should be found. A blue background around a blue dot
means that the model is correctly predicting that example. Conversely,
an orange background around a blue dot means that the model is making
an incorrect prediction for that example.
This exercise provides a training set and a test set, both drawn from
the same data set. By default, the visualization shows only the training
set. To also see the test set, select the Show test data checkbox just
below the visualization. In the visualization, note the following
distinction:
- The training examples have a white outline.
- The test examples have a black outline.
Task 1: Run Playground with the given settings by doing the
following:
- Click the Run/Pause button.
- Watch the Test loss and Training loss values change.
- When the Test loss and Training loss values stop changing
or change only occasionally, press the Run/Pause button
again to pause Playground.
Note the delta between the Test loss and Training loss. We'll try to reduce this
delta in the following tasks.
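The delta is the test loss minus the training loss. The following is a minimal sketch of that calculation, assuming mean squared error as the loss (the loss function Playground uses is not stated here). The function names are hypothetical and are not part of Playground.

```python
def mean_squared_error(labels, predictions):
    """Average squared difference between labels and predictions."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)


def loss_delta(train_labels, train_preds, test_labels, test_preds):
    """Return test loss minus training loss (assumed loss: MSE).

    A large positive value means the model fits the training set
    noticeably better than the examples it has not trained on.
    """
    train_loss = mean_squared_error(train_labels, train_preds)
    test_loss = mean_squared_error(test_labels, test_preds)
    return test_loss - train_loss
```

A delta near zero means the model performs about as well on the test set as on the training set.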
Task 2: Do the following:
- Press the Reset button.
- Modify the Learning rate.
- Press the Run/Pause button.
- Let Playground run for at least 150 epochs.
Is the delta between Test loss and Training loss lower or
higher with this new Learning rate? What happens if you modify both
Learning rate and Batch size?
Optional Task 3: A slider labeled Training data percentage
lets you control how the data is divided between the training set and the
test set. For example, when the slider is set to 90%, 90% of the data is
used for the training set and the remaining 10% is used for the test set.
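The split that the slider controls can be sketched as follows. This is an illustrative assumption about how such a split might be done, not Playground's actual implementation; `split_examples` and its parameters are hypothetical names.

```python
import random


def split_examples(examples, training_fraction, seed=0):
    """Shuffle examples and split them into (training_set, test_set).

    training_fraction is the share of examples assigned to the training
    set, for example 0.9 for a 90% / 10% split.
    """
    if not 0.0 < training_fraction < 1.0:
        raise ValueError("training_fraction must be between 0 and 1")
    shuffled = list(examples)
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    cutoff = round(len(shuffled) * training_fraction)
    return shuffled[:cutoff], shuffled[cutoff:]
```

For example, `split_examples(range(100), 0.9)` returns a training set of 90 examples and a test set of the remaining 10, with no example appearing in both.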
Do the following:
- Reduce the "Training data percentage" from 50% to 10%.
- Experiment with Learning rate and Batch size, taking notes on your
findings.
Does altering the training data percentage change the optimal
learning settings that you discovered in Task 2? If so, why?
Answer to Task 1:
With Learning rate set to 3 (the initial setting),
Test loss is significantly higher than Training loss.
Answer to Task 2:
When Learning rate is reduced (for example, to 0.001),
Test loss drops to a value much closer to Training loss. In most runs,
increasing Batch size does not influence Training loss or Test
loss significantly. However, in a small percentage of runs, increasing
Batch size to 20 or greater causes Test loss to drop slightly
below Training loss.
Playground's data sets are randomly generated. Consequently, our
answers may not always agree exactly with yours.
Answer to Task 3:
Reducing the Training data percentage from 50% to 10% dramatically
lowers the number of data points in the training set. With so little data,
a high Batch size and a high Learning rate cause the model to jump
around chaotically, repeatedly overshooting the minimum.
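The effect of a high learning rate jumping over the minimum can be seen in a toy setting. The sketch below runs plain gradient descent on the one-dimensional loss w**2, whose minimum is at w = 0. It is an illustrative assumption, not Playground's model, and it does not model batch size. The function name is hypothetical.

```python
def gradient_descent_path(learning_rate, start=1.0, steps=5):
    """Return the weights visited while minimizing loss(w) = w**2."""
    w = start
    path = [w]
    for _ in range(steps):
        gradient = 2 * w  # derivative of w**2 with respect to w
        w = w - learning_rate * gradient
        path.append(w)
    return path


# A small learning rate moves steadily toward the minimum at 0.
small_steps = gradient_descent_path(learning_rate=0.1)

# A large learning rate jumps over the minimum on every step, so the
# weight flips sign and moves farther from 0 each time.
large_steps = gradient_descent_path(learning_rate=1.1)
```

With the small learning rate, each weight is closer to 0 than the previous one. With the large learning rate, the weights alternate in sign and grow in magnitude, which is the one-dimensional analogue of jumping back and forth over the minimum.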