Logistic Regression

Instead of predicting exactly 0 or 1, logistic regression generates a probability—a value between 0 and 1, exclusive. For example, consider a logistic regression model for spam detection. If the model infers a value of 0.932 on a particular email message, it implies a 93.2% probability that the email message is spam. More precisely, it means that in the limit of infinite training examples, the set of examples for which the model predicts 0.932 will actually be spam 93.2% of the time and the remaining 6.8% will not.

Logistic Regression

  • Imagine the problem of predicting the probability of Heads for bent coins
  • You might use features like angle of bend, coin mass, etc.
  • What's the simplest model you could use?
  • What could go wrong?
Image: two bent coins
  • Many problems require a probability estimate as output
  • Enter Logistic Regression
  • Handy because the probability estimates are calibrated
    • for example, p(house will sell) * price = expected outcome (see the sketch after this list)
  • Also useful when we need a binary classification
    • spam or not spam? → p(Spam)
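
As a quick illustration of the calibration bullet above, here is a minimal Python sketch of the expected-value arithmetic; the probability and price are made-up numbers, not model output.

```python
# Hypothetical numbers: a calibrated probability can be used directly
# in expected-value arithmetic.
p_sell = 0.70          # predicted probability that the house will sell
sale_price = 300_000   # sale price in dollars

expected_outcome = p_sell * sale_price   # p(house will sell) * price
print(f"Expected outcome: ${expected_outcome:,.0f}")   # Expected outcome: $210,000
```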

$$ y' = \frac{1}{1 + e^{-(w^Tx+b)}} $$

Where \(w^Tx + b\) provides the familiar linear model, and \(\frac{1}{1 + e^{-(\cdot)}}\) squishes its output through a sigmoid into the range (0, 1).

Graph of logistic-regression equation
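
Below is a minimal NumPy sketch of the prediction rule above; the weights, bias, and feature values are made-up numbers rather than the result of any training run.

```python
import numpy as np

def predict_proba(x, w, b):
    """Logistic-regression prediction: squash the linear model w.x + b through a sigmoid."""
    z = np.dot(w, x) + b              # the familiar linear model
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid squashes z into (0, 1)

# Made-up weights, bias, and feature values for a single example.
w = np.array([0.8, -1.2, 0.5])
b = -0.3
x = np.array([1.0, 0.2, 2.0])

print(predict_proba(x, w, b))   # a value strictly between 0 and 1 (about 0.78 here)
```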

$$ \text{LogLoss} = \sum_{(x,y)\in D} -y\log(y') - (1 - y)\log(1 - y') $$

Two graphs of Log Loss vs. predicted value: one for a target value of 0.0 (which arcs up and to the right) and one for a target value of 1.0 (which arcs down and to the left)
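
A small sketch of the Log Loss sum above, assuming NumPy and made-up labels and predictions; clipping the predictions away from 0 and 1 avoids log(0), which is exactly where the asymptotes become a problem.

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Sum of -y*log(y') - (1 - y)*log(1 - y') over the dataset."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # stay away from the log(0) asymptotes
    return np.sum(-y_true * np.log(y_pred) - (1 - y_true) * np.log(1 - y_pred))

# Made-up labels and predicted probabilities.
y = np.array([1, 0, 1, 0])
y_hat = np.array([0.9, 0.2, 0.6, 0.1])
print(log_loss(y, y_hat))   # roughly 0.94
```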
  • Regularization is super important for logistic regression.
    • Remember the asymptotes
    • Without regularization, the model will keep trying to drive loss to 0 in high dimensions
  • Two strategies are especially useful (see the code sketch after this list):
    • L2 regularization (aka L2 weight decay) - penalizes huge weights.
    • Early stopping - limiting training steps or learning rate.
  • Linear logistic regression is extremely efficient.
    • Very fast training and prediction times.
    • Short / wide models use a lot of RAM.
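
As a sketch of the two strategies listed above (not the course's reference setup), scikit-learn's SGDClassifier can train a logistic-regression model with an L2 penalty and built-in early stopping; the parameter values are illustrative, and loss="log_loss" assumes scikit-learn 1.1 or newer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic data stands in for a real feature matrix and labels.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)

clf = SGDClassifier(
    loss="log_loss",          # logistic regression trained with SGD
    penalty="l2",             # L2 regularization penalizes huge weights
    alpha=1e-4,               # regularization strength (illustrative value)
    early_stopping=True,      # stop when the validation score stops improving
    validation_fraction=0.1,
    n_iter_no_change=5,
    max_iter=1_000,
    random_state=0,
)
clf.fit(X, y)
print(clf.predict_proba(X[:5])[:, 1])   # probability estimates for the positive class
```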