Introducing Feature Crosses
Can a feature cross truly enable a model to fit nonlinear data?
To find out, try this exercise.
Task: Try to create a model that separates the blue dots from
the orange dots by manually changing the weights of the following
three input features:
- x1
- x2
- x1x2 (a feature cross)
To manually change a weight:
1. Click on a line that connects FEATURES to OUTPUT. An input form will appear.
2. Type a floating-point value into that input form.
3. Press Enter.
Note that the interface for this exercise does not contain a Step button.
That's because this exercise does not iteratively train a model.
Rather, you will manually enter the "final" weights for the model.
(Answers appear just below the exercise.)
Answer:
w1 = 0
w2 = 0
w3 (the weight on the x1x2 feature cross) = 1 (or any positive value)
If you enter a negative value for the feature cross, the model will separate
the blue dots from the orange dots but the predictions will be completely wrong.
That is, the model will predict orange for the blue dots, and blue for
the orange dots.
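The hand-set weights above can be sketched as a tiny model. This is an illustrative sketch, not Playground's actual code, and the color convention (positive output means blue, negative means orange) is an assumption:

```python
# Minimal sketch of the manual-weight model (not Playground's actual code).
# Assumed convention: positive output -> blue, negative output -> orange.

def predict(x1, x2, w1=0.0, w2=0.0, w_cross=1.0):
    """Linear model over x1, x2, and the feature cross x1*x2."""
    return w1 * x1 + w2 * x2 + w_cross * (x1 * x2)

# Blue dots sit in quadrants where x1 and x2 share a sign, so x1*x2 > 0.
assert predict(2.0, 3.0) > 0     # quadrant I   -> predicted blue
assert predict(-2.0, -3.0) > 0   # quadrant III -> predicted blue
assert predict(2.0, -3.0) < 0    # quadrant IV  -> predicted orange

# A negative cross weight still separates the classes, but every
# prediction is flipped, as noted above.
assert predict(2.0, 3.0, w_cross=-1.0) < 0
```

Because x1 and x2 contribute nothing on their own here, the cross term alone determines the sign of the output, which is why any positive weight on x1x2 works.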
More Complex Feature Crosses
Now let's play with some advanced feature cross combinations.
The data set in this Playground
exercise looks a bit like a noisy
bullseye from a game of darts, with the blue dots in the middle and
the orange dots in an outer ring.
Explanation of model visualization:
Each Playground exercise displays a visualization of the current
state of the model. For example, here's a visualization:
Note the following about the model visualization:
- Each axis represents a specific feature. In the case of spam vs. not spam, the features could be the word count and the number of recipients of the email.
- Each dot plots the feature values for one example of the data, such as an email.
- The color of the dot represents the class that the example belongs to. For example, the blue dots can represent non-spam emails while the orange dots can represent spam emails.
- The background color represents the model's prediction of where examples of that color should be found. A blue background around a blue dot means that the model is correctly predicting that example. Conversely, an orange background around a blue dot means that the model is incorrectly predicting that example.
- The background blues and oranges are scaled. For example, the left side of the visualization is solid blue but gradually fades to white in the center of the visualization. You can think of the color strength as suggesting the model's confidence in its guess. So solid blue means that the model is very confident about its guess and light blue means that the model is less confident. (The model visualization shown in the figure is doing a poor job of prediction.)
Use the visualization to judge your model's progress.
("Excellent—most of the blue dots have a blue background" or
"Oh no! The blue dots have an orange background.")
Beyond the colors, Playground
also displays the model's current loss numerically.
("Oh no! Loss is going up instead of down.")
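As a rough illustration of what a numeric loss measures, here is log loss, a common classification loss. This is a sketch for intuition only; Playground's exact loss function may differ from this formulation:

```python
import math

# Rough illustration only: log loss is a common classification loss.
# Playground's exact loss function may differ from this formulation.
def log_loss(labels, predictions):
    """Mean binary cross-entropy over (label, predicted probability) pairs;
    lower is better."""
    eps = 1e-7  # guard against log(0)
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Confident correct predictions -> low loss; confident wrong -> high loss.
assert log_loss([1, 0], [0.9, 0.1]) < log_loss([1, 0], [0.6, 0.4])
assert log_loss([1, 0], [0.1, 0.9]) > 2.0
```

Watching this number fall (or fail to fall) during training is the numeric counterpart of watching the background colors line up with the dots.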
Task 1: Run this linear model as given. Spend a minute or two (but no
longer) trying different learning rate settings to see if you can find
any improvements. Can a linear model produce effective results for
this data set?
Task 2: Now try adding cross-product features, such as
x1x2, and try to optimize performance.
Which features help most?
What is the best performance that you can get?
Task 3: When you have a good model, examine the model output
surface (shown by the background color).
Does it look like a linear model?
How would you describe the model?
(Answers appear just below the exercise.)
Answer to Task 1:
No. A linear model cannot effectively model this data set. Reducing
the learning rate reduces loss, but loss still converges at an
unacceptably high value.
Answer to Task 2:
Playground's data sets are randomly generated. Consequently, our
answers may not always agree exactly with yours. In fact, if you
regenerate the data set between runs, your own results won't always
agree exactly with your previous runs. That said, you'll get better
results by doing the following:
- Using both x1² and x2² as feature crosses. (Adding x1x2 as a feature cross doesn't appear to help.)
- Reducing the Learning rate, perhaps to 0.001.
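With x1² and x2² as features, the model is still linear in its inputs but can carve out a circular region, which matches the bullseye data. A minimal numeric sketch; the radius and weight values below are assumed, not taken from Playground:

```python
# Numeric sketch: with x1^2 and x2^2 as features, a linear model can
# carve out a circle. The radius (2) and the weights are assumed,
# not taken from Playground.

def predict(x1, x2, w1sq=-1.0, w2sq=-1.0, bias=4.0):
    """Positive output -> blue (inner cluster), negative -> orange (ring)."""
    return w1sq * x1**2 + w2sq * x2**2 + bias

assert predict(0.5, -0.5) > 0   # near the center -> predicted blue
assert predict(3.0, 0.0) < 0    # out on the ring -> predicted orange
```

The decision boundary here is x1² + x2² = 4, a circle of radius 2, which is why these two features succeed where x1, x2, and x1x2 alone do not.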
Answer to Task 3:
The model output surface does not look like a linear model. Rather,
it looks elliptical.
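The elliptical shape follows directly from the algebra. If the learned weights on the squared features are negative (the symbols below are illustrative, not Playground's actual values), the decision boundary is:

```latex
% Decision boundary with negative weights on the squared features
% (illustrative symbols, not Playground's actual values):
w_1 x_1^2 + w_2 x_2^2 + b = 0, \qquad w_1, w_2 < 0,\; b > 0
% Rearranging:
\frac{x_1^2}{b/|w_1|} + \frac{x_2^2}{b/|w_2|} = 1
```

which is the standard equation of an ellipse, and a circle in the special case |w_1| = |w_2|.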