Check Your Understanding: Model Debugging

For each of the following questions, choose the best answer, then check it against the explanation that follows each option.

Modeling Approach

You and your friend Mel like unicorns. In fact, you like unicorns so much, you decide to predict unicorn appearances using ... machine learning. You have a dataset of 10,000 unicorn appearances. For each appearance, the dataset contains the location, time of day, elevation, temperature, humidity, population density, tree cover, presence of a rainbow, and many other features.

You want to start developing your ML model. Which one of the following approaches is a good way to start development?
Unicorns often appear at dawn and dusk. Therefore, use the feature "time of day" to create a linear model.
Correct. A linear model that uses one or two highly predictive features is an effective way to start (a sketch follows this question).
Predicting unicorn appearances is a very hard problem. Therefore, use a deep neural network with all available features.
Incorrect. Starting with a complex model will complicate debugging.
Start with a simple linear model but use all the features to ensure the simple model has predictive power.
Incorrect. Even a linear model becomes complex and hard to debug when it uses many features.
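To make the recommended starting point concrete, here is a minimal sketch, not taken from the course, of a one-feature linear model. The file name, column names, feature encoding, and the choice of logistic regression are all assumptions for illustration.

```python
# Minimal sketch, assuming a CSV of labeled appearances with hypothetical
# column names. One highly predictive feature keeps the model easy to debug.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("unicorn_appearances.csv")  # hypothetical file name
X = df[["time_of_day"]]                      # e.g., hours since midnight
y = df["appeared"]                           # 1 = unicorn appeared, 0 = not

model = LogisticRegression()
model.fit(X, y)
print("Training accuracy:", model.score(X, y))
```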

Baselines

Using regression with mean squared error (MSE) loss, you are predicting the cost of a taxi ride from the ride's duration, distance, origin, and destination. You know the following:

  • Mean ride cost is $15.
  • Ride cost increases by a fixed amount per kilometer.
  • Rides within the downtown area are charged extra.
  • Rides start at a minimum cost of $3.

Determine whether the following baselines are useful.

Is this a useful baseline? Every ride costs $15.
Yes
Correct. The mean cost is a useful baseline.
No
Incorrect. Always predicting the mean results in a lower MSE than always predicting any other value. Therefore, testing a model against this baseline provides a meaningful comparison.
It depends on what the standard deviation of the ride cost is.
Incorrect. Regardless of the standard deviation, the mean ride cost is a useful baseline: among constant predictions, the mean yields the lowest possible MSE (see the sketch after this question).
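As a quick sanity check of that claim, the following sketch uses made-up fares to show that no constant prediction beats the mean under MSE.

```python
# Quick check with made-up fares: among constant predictions, the mean
# minimizes mean squared error.
import numpy as np

fares = np.array([3.0, 8.0, 15.0, 22.0, 27.0])

def mse_of_constant(c):
    return float(np.mean((fares - c) ** 2))

print(mse_of_constant(fares.mean()))  # lowest MSE achievable by a constant
print(mse_of_constant(10.0))          # any other constant scores worse
print(mse_of_constant(20.0))
```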
Is this a useful baseline? A trained model that uses only duration and origin as features.
Yes
Incorrect. You should only use a trained model as a baseline after the model is fully validated in production. Furthermore, the trained model should itself be validated against a simpler baseline.
No
Correct. You should only use a trained model as a baseline after the model is fully validated in production.
Is this a useful baseline? A ride's cost is the ride distance (in kilometers) multiplied by the fare per kilometer.
Yes
Correct. Distance is the most important factor in determining ride cost. Therefore, a baseline that relies on distance is useful.
No
Incorrect. Distance is the most important factor in determining ride cost. Therefore, a baseline that relies on distance is useful.
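The distance-based rule is easy to express as code. This sketch is illustrative only; the fare rate and the toy ride data are made up.

```python
# Heuristic baseline sketch: predicted cost = distance * fare per kilometer.
import numpy as np

FARE_PER_KM = 1.50  # hypothetical rate

def baseline_cost(distance_km):
    return FARE_PER_KM * distance_km

distances = np.array([2.0, 5.0, 12.0, 30.0])      # made-up rides
actual_costs = np.array([6.0, 10.5, 21.0, 48.0])  # made-up fares
baseline_mse = np.mean((baseline_cost(distances) - actual_costs) ** 2)
print("Baseline MSE:", baseline_mse)  # a trained model should beat this number
```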
Is this a useful baseline? Every ride costs $1, on the grounds that the model must always beat this baseline; if the model cannot beat it, the model must have a bug.
Yes
Incorrect. This baseline is not useful because it is always wrong; beating a baseline that is trivially far off tells you nothing meaningful about your model.
No
Correct. This baseline is not a useful test of your model.

Hyperparameters

The following questions describe problems encountered while training a classifier. Choose the actions that could fix each problem.

Training loss is 0.24 and validation loss is 0.36. Which two of the following actions could reduce the difference between training and validation loss?
Ensure the training and validation sets have the same statistical properties.
Correct. If the training and validation sets have different statistical properties, then what the model learns from the training data will not transfer to the validation data.
Use regularization to prevent overfitting.
Correct. If the training loss is smaller than the validation loss, then your model is probably overfitting to the training data. Regularization combats overfitting (a sketch of both fixes follows this question).
Increase the number of training epochs.
Incorrect. If the training loss is smaller than the validation loss, then your model is typically overfitting to the training data. Increasing training epochs will only increase overfitting.
Decrease the learning rate.
Incorrect. Having a validation loss that is greater than the training loss typically indicates overfitting. Changing the learning rate does not reduce overfitting.
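Here is a minimal sketch of the two correct fixes, assuming a Keras model. The shuffle-then-split helper, layer sizes, L2 strength, and dropout rate are arbitrary examples, not the course's implementation.

```python
# Sketch of the two fixes: (1) shuffle before splitting so the training and
# validation sets have similar statistical properties, (2) add regularization.
import numpy as np
import tensorflow as tf

def shuffled_split(X, y, validation_fraction=0.2, seed=42):
    """Shuffle the examples, then split, so both sets are drawn alike."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    split = int(len(X) * (1 - validation_fraction))
    train_idx, val_idx = order[:split], order[split:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2 penalty
    tf.keras.layers.Dropout(0.2),                            # dropout
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```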
You perform the correct actions described in the previous question, and now your training and validation losses both fall from 1.0 to roughly 0.24 and then plateau, even after training for many more epochs. Which one of the following actions could reduce your training loss further?
Increase the depth and width of your neural network.
Correct. If your training loss stays constant at 0.24 after training for many epochs, then your model probably lacks the capacity to reduce the loss further. Increasing the model's depth and width could give it the additional capacity required to reduce the training loss (see the sketch after this question).
Increase the number of training epochs.
Incorrect. If your training loss stays at 0.24 after training for many epochs, then continuing to train the model will probably not cause the training loss to decrease significantly.
Increase the learning rate.
Incorrect. Given that training loss did not decrease for many training epochs, increasing the learning rate will probably not lower the final training loss. Instead, increasing the learning rate could make your training unstable and prevent your model from learning the data.
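To illustrate the correct answer, here is a before/after sketch in Keras; the layer counts and sizes are arbitrary and only meant to show what "deeper and wider" looks like.

```python
# Sketch: giving the model more capacity by making it deeper and wider.
import tensorflow as tf

# Before: a small network that may lack the capacity to drive loss lower.
small_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# After: more layers (depth) and more units per layer (width).
larger_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```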
You take the correct action from the previous question, and your model's training loss decreases to 0.20. Suppose you need to reduce the training loss a little further, so you add a few features that appear to have predictive power. However, the training loss continues to fluctuate around 0.20. Which three of the following options could reduce your training loss?
Increase the depth and width of your layers.
Correct. Your model might lack the capacity to learn the predictive signals in the new features.
Increase the training epochs.
Incorrect. If your model's training loss is fluctuating around 0.20, then increasing the number of training epochs will probably cause the model's training loss to continue fluctuating around 0.20.
The new features don't add information relative to the existing features. Try different features.
Correct. It is possible that the predictive signals carried by the new features are already present in the features you were using.
Decrease the learning rate.
Correct. It is possible that adding the new features made the problem more complex. In particular, loss that fluctuates rather than decreases suggests the learning rate is too high and the model is bouncing around the minimum. Decreasing the learning rate lets the model settle into the minimum (a sketch follows).
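A minimal sketch of the learning-rate fix, again assuming a Keras model; the specific rates and the tiny model are arbitrary placeholders.

```python
# Sketch: when loss oscillates around a value instead of decreasing,
# recompile with a smaller learning rate and retrain.
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])

# Loss fluctuates around 0.20 with this learning rate ...
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy")

# ... so try a smaller learning rate and train again.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy")
```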