# Logistic Regression

Instead of predicting exactly 0 or 1, logistic regression generates a probability: a value between 0 and 1, exclusive. For example, consider a logistic regression model for spam detection. If the model infers a value of 0.932 on a particular email message, it implies a 93.2% probability that the email message is spam. More precisely, it means that in the limit of infinite training examples, the examples for which the model predicts 0.932 will actually be spam 93.2% of the time, and the remaining 6.8% of the time they will not.

• Imagine the problem of predicting the probability of heads for bent coins
• You might use features like angle of bend, coin mass, etc.
• What's the simplest model you could use?
• What could go wrong?
• Many problems require a probability estimate as output
• Enter Logistic Regression
• Handy because the probability estimates are calibrated; for example, p(house will sell) * price = expected outcome
• Also useful when we need a binary classification; for example, spam or not spam? → p(Spam)
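The two uses of a probability output listed above can be sketched in a few lines of Python. The function names, the house price, the sale probability, and the 0.5 decision threshold are all illustrative assumptions, not values from the course material.

```python
def expected_value(probability, payoff):
    """Expected outcome of an uncertain payoff, given a calibrated probability."""
    return probability * payoff


def classify(probability, threshold=0.5):
    """Turn a probability into a binary decision.

    The 0.5 threshold is an assumption; real systems tune it to the problem.
    """
    return probability >= threshold


# Hypothetical numbers, chosen only for illustration.
p_sell = 0.8
price = 250_000
print(expected_value(p_sell, price))

# 0.932 is the spam probability used in the example at the top of this page.
p_spam = 0.932
print("spam" if classify(p_spam) else "not spam")
```

The expected-value calculation is only meaningful if the probabilities are calibrated, which is why calibration is called out as a strength of logistic regression.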

$$y' = \frac{1}{1 + e^{-(w^Tx+b)}}$$

Where:

• $$w^Tx + b$$ provides the familiar linear model
• $$\frac{1}{1 + e^{-(\ldots)}}$$ squishes that linear output through a sigmoid
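As a minimal sketch of the prediction formula above, the following Python computes the linear part and passes it through a sigmoid. The weights, bias, and feature values are made up for illustration, and the two-branch sigmoid is an implementation choice for numerical safety rather than part of the formula itself.

```python
import math


def sigmoid(z):
    """Compute 1 / (1 + e^(-z)) without overflowing for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z),
    # which avoids calling math.exp on a large positive number.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, bias, features):
    """Return y' = sigmoid(w^T x + b) for one example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical parameters and features, for illustration only.
weights = [0.5, -1.2]
bias = 0.1
features = [2.0, 1.0]

y_prime = predict(weights, bias, features)
print(y_prime)  # roughly 0.475, since w^T x + b is about -0.1
```

A linear output of 0 maps to exactly 0.5, and larger positive or negative outputs move the prediction toward 1 or 0 respectively.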

$$\text{LogLoss} = \sum_{(x,y)\in D} -y\log(y') - (1 - y)\log(1 - y')$$
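The loss above can be computed directly from labels and predictions. This is a small sketch, assuming labels are 0 or 1 and predictions are probabilities; the `eps` clipping is an added safeguard (not part of the formula) so that `log(0)` is never evaluated, and the sample data is invented.

```python
import math


def log_loss(labels, predictions, eps=1e-15):
    """Sum of -y*log(y') - (1 - y)*log(1 - y') over all examples."""
    total = 0.0
    for y, y_prime in zip(labels, predictions):
        # Clip predictions away from exactly 0 and 1 (an assumption added here).
        p = min(max(y_prime, eps), 1.0 - eps)
        total += -y * math.log(p) - (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical labels and model outputs.
labels = [1, 0, 1]
predictions = [0.9, 0.2, 0.6]
print(log_loss(labels, predictions))  # roughly 0.839
```

Confident correct predictions contribute almost nothing to the sum, while confident wrong predictions are penalized heavily, because the log terms grow without bound as the prediction approaches the wrong extreme.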

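• Regularization is extremely important for logistic regression.
• Remember the asymptotes: the sigmoid approaches 0 and 1 but never reaches them.
• Without regularization, the model will keep trying to drive loss to 0 in high dimensions, pushing the weights ever larger.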
• Two strategies are especially useful:
• L2 regularization (aka L2 weight decay) - penalizes huge weights.
• Early stopping - limiting training steps or learning rate.
• Linear logistic regression is extremely efficient.
• Very fast training and prediction times.
• Short / wide models use a lot of RAM.
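To make the two regularization strategies above concrete, here is a rough sketch of batch gradient descent on log loss with an L2 penalty and a cap on training steps. It assumes the penalty has the form `l2_lambda * sum(w^2)` and leaves the bias unregularized; both are common conventions rather than details stated in this material. The toy data, the hyperparameter values, and the `train` function itself are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically safe 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(examples, labels, l2_lambda=0.1, learning_rate=0.1, max_steps=1000):
    """Batch gradient descent on log loss plus l2_lambda * sum(w^2).

    max_steps caps the number of updates, which is the simplest form of
    early stopping (assumed here; no validation set is used).
    """
    n_features = len(examples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(max_steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(examples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y  # derivative of log loss with respect to z
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        for j in range(n_features):
            # Add the gradient of the L2 penalty, then take a step.
            grad_w[j] += 2.0 * l2_lambda * weights[j]
            weights[j] -= learning_rate * grad_w[j]
        bias -= learning_rate * grad_b
    return weights, bias


# Toy, perfectly separable data: the setting where weights grow without bound.
examples = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]

w_plain, _ = train(examples, labels, l2_lambda=0.0)
w_reg, _ = train(examples, labels, l2_lambda=0.1)
w_early, _ = train(examples, labels, l2_lambda=0.0, max_steps=50)
print(w_plain, w_reg, w_early)
```

On separable data like this, the unregularized weight keeps growing the longer training runs, while either the L2 penalty or a smaller step budget keeps it smaller.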