Logistic Regression

Instead of predicting exactly 0 or 1, logistic regression generates a probability: a value between 0 and 1, exclusive. For example, consider a logistic regression model for spam detection. If the model infers a value of 0.932 on a particular email message, it implies a 93.2% probability that the email message is spam. More precisely, it means that in the limit of infinite training examples, 93.2% of the examples for which the model predicts 0.932 will actually be spam and the remaining 6.8% will not.
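The frequency interpretation above can be checked empirically by grouping predictions into bins and comparing each bin's mean prediction with the fraction of positives actually observed. The sketch below is only an illustration: `calibration_table` is a hypothetical helper, and the "model" is synthetic data constructed to be perfectly calibrated, not output from a real spam classifier.

```python
import random


def calibration_table(probs, labels, n_bins=10):
    """Bin predictions into equal-width bins over [0, 1] and return, per
    non-empty bin, (mean predicted probability, observed positive rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((mean_pred, observed, len(members)))
    return rows


# Synthetic stand-in for a calibrated model (an assumption for illustration):
# each label is drawn as positive with exactly the predicted probability.
rng = random.Random(0)
probs = [rng.random() for _ in range(20000)]
labels = [1 if rng.random() < p else 0 for p in probs]

for mean_pred, observed, count in calibration_table(probs, labels):
    print(f"predicted {mean_pred:.3f}  observed {observed:.3f}  n={count}")
```

For a well-calibrated model the two columns should roughly agree in every bin; large gaps suggest the predicted values cannot be read directly as probabilities.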

Predicting Coin Flips?

• Imagine the problem of predicting the probability of heads for bent coins
• You might use features like angle of bend, coin mass, etc.
• What's the simplest model you could use?
• What could go wrong?

Logistic Regression

• Many problems require a probability estimate as output
• Enter Logistic Regression
• Handy because the probability estimates are calibrated, so they can be used directly in calculations; for example, p(house will sell) * price = expected outcome
• Also useful when we need a binary classification; for example, spam or not spam? → p(Spam)
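The two uses listed above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the function names and the 0.5 classification threshold are assumptions chosen for the example, and in practice the threshold depends on the cost of each kind of mistake.

```python
def expected_outcome(p_sell, price):
    """Expected outcome of a listing: p(house will sell) * price."""
    if not 0.0 <= p_sell <= 1.0:
        raise ValueError("p_sell must be a probability in [0, 1]")
    return p_sell * price


def is_spam(p_spam, threshold=0.5):
    """Turn p(Spam) into a binary label; the threshold is an assumed default."""
    return p_spam >= threshold


print(expected_outcome(0.25, 400000))
print(is_spam(0.932))
```

The first use only makes sense when the probabilities are calibrated; the second only needs the probabilities to rank examples sensibly relative to the threshold.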

Logistic Regression -- Predictions

$$y' = \frac{1}{1 + e^{-(w^Tx+b)}}$$

Where:
• $$w^Tx+b$$: Provides the familiar linear model
• $$1+e^{-(...)}$$: Squish through a sigmoid

LogLoss Defined

$$LogLoss = \sum_{(x,y)\in D} -y\,\log(y') - (1 - y)\,\log(1 - y')$$
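The prediction and LogLoss formulas above can be sketched in plain Python as follows. The function names are hypothetical, the dataset is a list of (features, label) pairs, and the small `eps` clamp is an added assumption to keep `log(0)` from occurring; it is not part of the formula itself.

```python
import math


def sigmoid(z):
    """Numerically stable 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(w, x, b):
    """y' = sigmoid(w^T x + b): a linear model squished through a sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def log_loss(dataset, w, b, eps=1e-15):
    """Sum over (x, y) in D of -y*log(y') - (1 - y)*log(1 - y')."""
    total = 0.0
    for x, y in dataset:
        y_pred = predict(w, x, b)
        # Assumed safeguard: clamp predictions away from exactly 0 and 1.
        y_pred = min(max(y_pred, eps), 1.0 - eps)
        total += -y * math.log(y_pred) - (1 - y) * math.log(1.0 - y_pred)
    return total


dataset = [([1.0, 2.0], 1), ([-1.0, 0.5], 0)]
print(predict([0.0, 0.0], [1.0, 2.0], 0.0))
print(log_loss(dataset, [0.5, 0.5], 0.0))
```

Note that a confident wrong prediction is penalized heavily: as y' approaches 0 for a positive example, the term -log(y') grows without bound.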

Logistic Regression and Regularization

• Regularization is super important for logistic regression.
• Remember the asymptotes: LogLoss only approaches 0 as the prediction approaches exactly 0 or 1, which requires ever-larger weights.
• Without regularization, the model will keep trying to drive loss to 0 in high dimensions.
• Two strategies are especially useful:
• L2 regularization (aka L2 weight decay): penalizes huge weights.
• Early stopping: limits the number of training steps or the learning rate.
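The effect of both strategies can be seen on a tiny, linearly separable dataset, where unregularized training keeps growing the weight in pursuit of zero loss. The sketch below is a minimal batch gradient descent written for illustration; the `train` function, the toy data, the learning rate, and the penalty strength `l2=0.1` are all assumptions, and the mean (rather than summed) loss is used for the gradient to keep the step size easy to reason about.

```python
import math


def sigmoid(z):
    """Numerically stable 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(xs, ys, l2=0.0, steps=1000, lr=0.5):
    """Batch gradient descent on mean LogLoss plus (l2 / 2) * ||w||^2.
    A small `steps` value acts as a crude form of early stopping."""
    n, n_features = len(xs), len(xs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n
            grad_b += err / n
        # The L2 penalty adds l2 * w to the weight gradient (bias is not penalized).
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy separable data (assumed): negatives left of zero, positives right of zero.
xs = [[-2.0], [-1.0], [1.0], [2.0]]
ys = [0, 0, 1, 1]

w_plain, _ = train(xs, ys, l2=0.0, steps=5000)
w_l2, _ = train(xs, ys, l2=0.1, steps=5000)
w_early, _ = train(xs, ys, l2=0.0, steps=50)
print("no regularization:", w_plain[0])
print("L2 regularization:", w_l2[0])
print("early stopping:   ", w_early[0])
```

In this setup the unregularized weight keeps increasing with more steps, while the L2 run settles at a finite value and the early-stopped run simply halts before the weight gets large.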

Linear Logistic Regression

• Linear logistic regression is extremely efficient.
• Very fast training and prediction times.
• Short / wide models use a lot of RAM.