Page Summary

- Logistic regression models output probabilities, which can be used directly or converted to binary categories.
- The sigmoid function ensures the output of logistic regression is always between 0 and 1, representing a probability.
- A logistic regression model uses a linear equation and the sigmoid function to calculate the probability of an event.
- The log-odds (z) represent the log of the ratio of probabilities for the two possible outcomes.
Many problems require a probability estimate as output. Logistic regression is an extremely efficient mechanism for calculating probabilities. Practically speaking, you can use the returned probability in either of the following two ways:
- Applied "as is." For example, if a spam-prediction model takes an email as input and outputs a value of 0.932, this implies a 93.2% probability that the email is spam.
- Converted to a binary category such as True or False, Spam or Not Spam.
This module focuses on using logistic regression model output as-is. In the Classification module, you'll learn how to convert this output into a binary category.
Sigmoid function
You might be wondering how a logistic regression model can ensure its output represents a probability, always outputting a value between 0 and 1. As it happens, there's a family of functions called logistic functions whose output has those same characteristics. The standard logistic function, also known as the sigmoid function (sigmoid means "s-shaped"), has the formula:
\[f(x) = \frac{1}{1 + e^{-x}}\]
where:
- f(x) is the output of the sigmoid function.
- e is Euler's number: a mathematical constant ≈ 2.71828.
- x is the input to the sigmoid function.
Figure 1 shows the corresponding graph of the sigmoid function.
As the input, x, increases, the output of the sigmoid function approaches
but never reaches 1. Similarly, as the input decreases, the sigmoid
function's output approaches but never reaches 0.
Transforming linear output using the sigmoid function
The following equation represents the linear component of a logistic regression model:
\[z = b + w_1x_1 + w_2x_2 + \ldots + w_Nx_N\]
where:
- z is the output of the linear equation, also called the log-odds.
- b is the bias.
- The w values are the model's learned weights.
- The x values are the feature values for a particular example.
To obtain the logistic regression prediction, the z value is then passed to the sigmoid function, yielding a value (a probability) between 0 and 1:
\[y' = \frac{1}{1 + e^{-z}}\]
where:
- y' is the output of the logistic regression model.
- e is Euler's number: a mathematical constant ≈ 2.71828.
- z is the linear output (as calculated in the preceding equation).
Figure 2 illustrates how linear output is transformed to logistic regression output using these calculations.
In Figure 2, a linear equation becomes input to the sigmoid function, which bends the straight line into an s-shape. Notice that the linear equation can output very big or very small values of z, but the output of the sigmoid function, y', is always between 0 and 1, exclusive. For example, the yellow square on the left graph has a z value of –10, but the sigmoid function in the right graph maps that –10 into a y' value of 0.00004.
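The two-step calculation described above (a linear combination followed by the sigmoid) can be sketched as follows. The function name and example values are illustrative assumptions; the z = -10 case mirrors the yellow square discussed for Figure 2:

```python
import math

def predict_proba(bias: float, weights: list[float], features: list[float]) -> float:
    """Logistic regression prediction: sigmoid applied to the linear output z."""
    # Linear component: z = b + w1*x1 + w2*x2 + ... + wN*xN (the log-odds)
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Sigmoid squashes z into a probability strictly between 0 and 1
    return 1 / (1 + math.exp(-z))

# A very negative z maps to a probability near (but never equal to) 0.
p = 1 / (1 + math.exp(-(-10)))
print(p)  # ~0.000045, as in Figure 2
```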
Exercise: Check your understanding
A logistic regression model with three features has the following bias and weights:
\[\begin{align} b &= 1 \\ w_1 &= 2 \\ w_2 &= -1 \\ w_3 &= 5 \end{align} \]
Given the following input values:
\[\begin{align} x_1 &= 0 \\ x_2 &= 10 \\ x_3 &= 2 \end{align} \]
Answer the following two questions.
As calculated in #1 above, the log-odds for the input values is 1. Plugging that value for z into the sigmoid function:
\(y' = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-1}} = \frac{1}{1 + 0.368} = \frac{1}{1.368} = 0.731\)
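The exercise's arithmetic can be checked directly in code. This sketch plugs the given bias, weights, and feature values into the two equations from this page:

```python
import math

# Bias and weights from the exercise
b = 1
w = [2, -1, 5]
# Input feature values
x = [0, 10, 2]

# Log-odds: z = b + w1*x1 + w2*x2 + w3*x3 = 1 + 0 - 10 + 10 = 1
z = b + sum(wi * xi for wi, xi in zip(w, x))

# Sigmoid: y' = 1 / (1 + e^(-z))
y_prime = 1 / (1 + math.exp(-z))

print(z)                  # 1
print(round(y_prime, 3))  # 0.731
```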