## A First Neural Network

In this exercise, we train our first small neural net. Neural nets give us a way to learn nonlinear models without explicit feature crosses.

**Task 1:** The model as given combines our two input features into
a single neuron. Will this model learn any nonlinearities?
Run it to confirm your guess.
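
One way to reason about Task 1 before running it is to check a single linear neuron against four XOR-style corner points. The sketch below assumes the data resembles XOR (same-sign inputs in one class, opposite-sign inputs in the other); the corner points, the helper names, and the weight grid are illustrative choices, not part of the exercise. For a linear neuron, the outputs at (1, 1) and (-1, -1) sum to 2b, and so do the outputs at (1, -1) and (-1, 1), so the first pair cannot both be positive while the second pair are both negative.

```python
# XOR-style corners: label is +1 when both inputs share a sign, -1 otherwise.
# These four points are an illustrative stand-in for the exercise data.
CORNERS = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]


def linear_neuron(x1, x2, w1, w2, b):
    """A single neuron with a linear (identity) activation."""
    return w1 * x1 + w2 * x2 + b


def classifies_all_corners(w1, w2, b):
    """True if the sign of the output matches the label at every corner."""
    return all(
        linear_neuron(x1, x2, w1, w2, b) * label > 0
        for (x1, x2), label in CORNERS
    )


# Brute-force search over a coarse grid of weights and biases (-3.0 to 3.0).
grid = [i / 2 for i in range(-6, 7)]
solutions = [
    (w1, w2, b)
    for w1 in grid
    for w2 in grid
    for b in grid
    if classifies_all_corners(w1, w2, b)
]

print(len(solutions))  # no setting on the grid separates all four corners
```

The grid search is only a sanity check; the sum argument above is what rules out every possible choice of weights.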

**Task 2:** Try increasing the number of neurons in the hidden layer
from 1 to 2, and also try changing from a Linear activation to a
nonlinear activation like ReLU. Can you create a model that can
learn nonlinearities?
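
To see why the activation matters in Task 2, here is a minimal sketch of a network with two hidden neurons whose weights are picked by hand rather than learned. The weights, the function names, and the four clean corner points are assumptions for illustration; training in the Playground will generally find different weights, and the full noisy data may need more neurons than these four points do.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def tiny_network(x1, x2, activation):
    """Two hidden neurons with hand-picked weights, then a linear output."""
    h1 = activation(x1 + x2)    # hidden neuron 1: weights (1, 1), bias 0
    h2 = activation(-x1 - x2)   # hidden neuron 2: weights (-1, -1), bias 0
    return h1 + h2 - 1.0        # output: weights (1, 1), bias -1


# XOR-style corners: same-sign inputs first, opposite-sign inputs second.
corners = [(1, 1), (-1, -1), (1, -1), (-1, 1)]

relu_outputs = [tiny_network(x1, x2, relu) for x1, x2 in corners]
linear_outputs = [tiny_network(x1, x2, identity) for x1, x2 in corners]

print(relu_outputs)    # [1.0, 1.0, -1.0, -1.0]
print(linear_outputs)  # [-1.0, -1.0, -1.0, -1.0]
```

With ReLU the hidden layer computes |x1 + x2|, which separates the two classes at these corners. With a linear activation the two hidden neurons cancel and the network collapses to a constant, which is one instance of the general fact that stacked linear layers remain linear.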

**Task 3:** Continue experimenting by adding or removing hidden layers
and neurons per layer. Also feel free to change learning rates,
regularization, and other learning settings. What is the smallest
number of nodes and layers you can use that gives a test loss
of 0.177 or lower?
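
When comparing architectures for Task 3, it can help to count how many trainable parameters each one has. The helper below is a hypothetical convenience, assuming fully connected layers with one bias per neuron; the candidate layer sizes are examples only and say nothing about which architecture reaches the target loss.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the input size, each hidden layer, then the output
    size, for example [2, 3, 1].
    """
    return sum(
        (n_in + 1) * n_out  # n_in weights plus one bias for each neuron
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Example architectures, all with two inputs and one output.
candidates = [[2, 1, 1], [2, 2, 1], [2, 3, 1], [2, 2, 2, 1]]
for sizes in sorted(candidates, key=count_parameters):
    print(sizes, count_parameters(sizes))
```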

(Answers appear just below the exercise.)

## Neural Net Initialization

This exercise uses the XOR data again, but looks at the repeatability of training neural nets and the importance of initialization.

**Task 1:** Run the model as given four or five times. Before each trial,
hit the **Reset the network** button to get a new random initialization.
(The **Reset the network** button is the circular reset arrow just to the
left of the Play button.) Let each trial run for at least 500 steps
to ensure convergence. What shape does each model output converge to?
What does this say about the role of initialization in non-convex
optimization?
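
The trials in Task 1 can be imitated outside the Playground with a small script. The sketch below is an assumption-laden stand-in, not the Playground's implementation: it uses four clean XOR-style points instead of the noisy data, a tanh hidden layer, mean squared error, full-batch gradient descent, and a seed in place of the **Reset the network** button. All names and hyperparameters are illustrative.

```python
import math
import random

# Four XOR-style corner points; the label is +1 when the signs match, else -1.
DATA = [((1.0, 1.0), 1.0), ((-1.0, -1.0), 1.0),
        ((1.0, -1.0), -1.0), ((-1.0, 1.0), -1.0)]


def init_params(seed, hidden=2):
    """Draw a fresh random initialization, playing the role of a reset."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    """Return the tanh hidden activations and the linear output."""
    h = [math.tanh(w[0] * x[0] + w[1] * x[1] + b)
         for w, b in zip(p["w1"], p["b1"])]
    out = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]
    return h, out


def mean_squared_error(p):
    return sum((forward(p, x)[1] - y) ** 2 for x, y in DATA) / len(DATA)


def train(seed, steps=500, lr=0.1, hidden=2):
    """Full-batch gradient descent from the initialization given by seed."""
    p = init_params(seed, hidden)
    n = len(DATA)
    for _ in range(steps):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        for x, y in DATA:
            h, out = forward(p, x)
            d_out = 2.0 * (out - y) / n          # dLoss/d_out for this point
            gb2 += d_out
            for j in range(hidden):
                gw2[j] += d_out * h[j]
                # Backpropagate through tanh: derivative is 1 - tanh^2.
                d_pre = d_out * p["w2"][j] * (1.0 - h[j] ** 2)
                gw1[j][0] += d_pre * x[0]
                gw1[j][1] += d_pre * x[1]
                gb1[j] += d_pre
        for j in range(hidden):
            p["w1"][j][0] -= lr * gw1[j][0]
            p["w1"][j][1] -= lr * gw1[j][1]
            p["b1"][j] -= lr * gb1[j]
            p["w2"][j] -= lr * gw2[j]
        p["b2"] -= lr * gb2
    return p


# One trial per seed; compare the final losses across initializations.
final_losses = {seed: mean_squared_error(train(seed)) for seed in range(5)}
for seed, loss in final_losses.items():
    print(f"seed={seed} final loss={loss:.4f}")
```

Because only the seed changes between trials, any difference in the final losses or learned weights comes from the starting point alone.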

**Task 2:** Try making the model slightly more complex by adding a layer
and a couple of extra nodes. Repeat the trials from Task 1. Does this
make the results any more stable?

(Answers appear just below the exercise.)

## Neural Net Spiral

This data set is a noisy spiral. A linear model will clearly fail here, and even manually defined feature crosses may be hard to construct.
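
For readers who want to reproduce similar data outside the Playground, a noisy two-arm spiral could be generated roughly as follows. This is a sketch under stated assumptions: the function name, the number of turns, the radius, and the uniform noise model are all illustrative and are not claimed to match the Playground's own generator.

```python
import math
import random


def make_spiral(points_per_arm=100, noise=0.1, turns=1.75,
                max_radius=5.0, seed=0):
    """Return a list of (x1, x2, label) points on two interleaved arms."""
    rng = random.Random(seed)
    data = []
    # The second arm is the first one rotated by half a turn.
    for offset, label in ((0.0, 1), (math.pi, -1)):
        for i in range(points_per_arm):
            frac = i / points_per_arm
            r = frac * max_radius                      # radius grows outward
            t = turns * frac * 2 * math.pi + offset    # angle along the arm
            x1 = r * math.sin(t) + rng.uniform(-1, 1) * noise
            x2 = r * math.cos(t) + rng.uniform(-1, 1) * noise
            data.append((x1, x2, label))
    return data


spiral = make_spiral()
print(len(spiral), spiral[0])
```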

**Task 1:** Train the best model you can, using just $X_1$ and
$X_2$. Feel free to add or remove layers and neurons, and to change
learning settings such as learning rate, regularization rate, and
batch size. What is the best test loss you can get? How smooth is
the model output surface?

**Task 2:** Even with neural nets, some amount of feature engineering is
often needed to achieve the best performance. Try adding
cross-product features or other transformations such as
$\sin(X_1)$ and $\sin(X_2)$. Do you get a better
model? Is the model output surface any smoother?
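
The kind of feature engineering described in Task 2 can be sketched as a function that maps the two raw inputs to an expanded feature set. The function name and the particular selection of features are assumptions for illustration; which of them actually help on the spiral is what the task asks you to find out.

```python
import math


def expand_features(x1, x2):
    """Map two raw inputs to raw, squared, cross-product, and sine features."""
    return {
        "x1": x1,
        "x2": x2,
        "x1_squared": x1 * x1,
        "x2_squared": x2 * x2,
        "x1_x2": x1 * x2,          # cross-product feature
        "sin_x1": math.sin(x1),
        "sin_x2": math.sin(x2),
    }


features = expand_features(2.0, -3.0)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Each expanded value would be fed to the network as an extra input alongside the two raw inputs.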

(Answers appear just below the exercise.)