Machine Learning | Google for Developers


Suppose you want to develop a supervised machine learning model to predict whether a given email is "spam" or "not spam." Which of the following statements are true?

Emails not marked as "spam" or "not spam" are unlabeled examples.

Because our label consists of the values "spam" and "not spam", any email not yet marked as spam or not spam is an unlabeled example.
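As a minimal sketch of this distinction, the snippet below splits a small set of emails into labeled and unlabeled examples. The `emails` list, its `subject` and `label` fields, and the use of `None` for "not yet marked" are hypothetical choices made for illustration, not part of the original exercise.

```python
# Hypothetical toy data: "label" is None when nobody has marked the email yet.
emails = [
    {"subject": "Win a free cruise now", "label": "spam"},
    {"subject": "Meeting notes for Tuesday", "label": "not spam"},
    {"subject": "Your invoice is attached", "label": None},
]

# A labeled example carries one of the two label values.
labeled = [e for e in emails if e["label"] is not None]

# An unlabeled example has features (here, the subject) but no label.
unlabeled = [e for e in emails if e["label"] is None]

print(len(labeled), "labeled;", len(unlabeled), "unlabeled")
```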

Words in the subject header will make good labels.

No. Words in the subject header might make excellent features, but they won't make good labels. The label is the value the model predicts, which here is "spam" or "not spam."

We'll use unlabeled examples to train the model.

No. We'll use labeled examples to train the model. We can then run the trained model against unlabeled examples to infer whether those email messages are spam or not spam.
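The train-then-infer workflow might look like the sketch below. It assumes a tiny naive Bayes classifier over subject-line words, which is one possible model among many and is not named in the original exercise. The functions `tokenize`, `train`, and `predict` and all of the example data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase words; these words are the features."""
    return text.lower().split()


def train(labeled_examples):
    """Count how often each word appears under each label."""
    word_counts = {}          # label -> Counter of words seen with that label
    label_counts = Counter()  # label -> number of training examples
    for text, label in labeled_examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest naive Bayes log score."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Training uses labeled examples only.
labeled_examples = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on friday with the team", "not spam"),
]
model = train(labeled_examples)

# Inference runs the trained model on unlabeled examples.
unlabeled_examples = ["claim your free prize", "agenda for the team meeting"]
for text in unlabeled_examples:
    print(text, "->", predict(model, text))
```

Note that the subject words enter the model only as features. The labels come from the second element of each training pair and never from the text itself.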

The labels applied to some examples might be unreliable.

Definitely. It's important to check how reliable your data is. The labels for this dataset probably come from email users who mark particular email messages as spam. Since most users do not mark every suspicious email message as spam, an unmarked message is not necessarily legitimate, so we may have trouble knowing whether an email is truly spam. Furthermore, spammers could intentionally poison our model by providing faulty labels.
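One rough way to check label reliability is to look for messages that different users labeled differently. The sketch below assumes that several users can report the same message, which the original exercise does not state. The `user_reports` data, the function names, and the 0.9 agreement threshold are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical reports: (message text, label given by one user).
user_reports = [
    ("win a free prize now", "spam"),
    ("win a free prize now", "spam"),
    ("win a free prize now", "not spam"),
    ("meeting agenda for monday", "not spam"),
    ("meeting agenda for monday", "not spam"),
]


def label_agreement(reports):
    """Map each message to its majority label and the share of votes it got."""
    votes = defaultdict(Counter)
    for text, label in reports:
        votes[text][label] += 1
    agreement = {}
    for text, counts in votes.items():
        majority_label, majority_votes = counts.most_common(1)[0]
        agreement[text] = (majority_label, majority_votes / sum(counts.values()))
    return agreement


def suspicious_messages(reports, threshold=0.9):
    """Return messages whose majority label has less than `threshold` agreement."""
    return [
        text
        for text, (_, share) in label_agreement(reports).items()
        if share < threshold
    ]


print(suspicious_messages(user_reports))
```

Messages flagged this way could be reviewed by hand or left out of training. Agreement alone does not catch every bad label, since a coordinated group of spammers could agree on a faulty one.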