Now that we have defined our hypothesis for logistic regression, let's talk about what is called a decision boundary. We can translate the output of the hypothesis into a prediction: if h(x) >= 0.5 then y = 1; otherwise (h(x) < 0.5) y = 0. This is a fair rule because it gives a half/half split. The way the logistic function g(z) behaves, g(z) >= 0.5 exactly when z >= 0. Our hypothesis is g(theta transpose * x), so it predicts y = 1 precisely when theta transpose * x >= 0. Let's take a look at an example:
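To make the thresholding rule concrete, here is a minimal sketch (the function names `sigmoid` and `predict` are just illustrative, not from any particular library) showing that predicting y = 1 when h(x) >= 0.5 is the same as checking whether theta transpose * x >= 0:

```python
import math

def sigmoid(z):
    # the logistic function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    # hypothesis h(x) = g(theta^T x); predict y = 1 when h(x) >= 0.5,
    # which happens exactly when theta^T x >= 0
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= 0.5 else 0

print(sigmoid(0.0))           # 0.5, right on the boundary
print(predict([1.0], [2.0]))  # theta^T x = 2 >= 0, so predicts 1
```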

The X's represent the positive examples. We haven't yet learned how to fit the parameters, but assume training arrives at theta = (-3, 1, 1). As explained above, theta transpose * x >= 0 gives y = 1. Computing theta transpose * x yields the equation above. When we plot it (remember x2 is on the vertical axis in this case), we get the line in pink: everything to the right of it is classified as positive and everything to the left as negative. This pink line is what we call the decision boundary; every point on it has an output of exactly 0.5. It is crucial to understand that the decision boundary is a property of the hypothesis and its parameters, not of the data-set directly. The data is used to tweak the values of theta during training, but once theta is fixed, theta alone determines the boundary. Let's look at a more complicated example.
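We can check this boundary numerically. A short sketch, using the theta = (-3, 1, 1) values from the example (the helper name `predict` is illustrative): with these parameters, theta transpose * x = -3 + x1 + x2, so the boundary is the line x1 + x2 = 3, and any point with x1 + x2 > 3 lands on the positive side:

```python
def predict(theta, x):
    # x includes the bias term x0 = 1; predict y = 1 when theta^T x >= 0
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

theta = [-3.0, 1.0, 1.0]  # assumed values from the example
# boundary: -3 + x1 + x2 = 0, i.e. the line x1 + x2 = 3
print(predict(theta, [1.0, 2.0, 2.0]))  # x1 + x2 = 4 > 3 -> 1
print(predict(theta, [1.0, 1.0, 1.0]))  # x1 + x2 = 2 < 3 -> 0
```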

Just like in polynomial regression (refer back to the linear regression articles), you can add higher-order terms; in this case, because we want a circular boundary, we add x1^2 and x2^2. Again, we don't know how to learn the values of theta just yet, but hold on tight. Assume the vector theta in the example and compute theta transpose * x; this product is >= 0 for y = 1. We can rearrange the inequality into the equation in pink, the equation of a circle. Everything inside the circle is negative (y = 0) and everything outside is positive (y = 1).
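As a sketch of the circular case, assume theta = (-1, 0, 0, 1, 1) over the features (1, x1, x2, x1^2, x2^2) (these values are an assumption chosen so the algebra is easy to follow; `predict_circle` is an illustrative name). Then theta transpose * x = -1 + x1^2 + x2^2, and rearranging gives the unit circle x1^2 + x2^2 = 1 as the boundary:

```python
def predict_circle(theta, x1, x2):
    # features: [1, x1, x2, x1^2, x2^2]; predict y = 1 when theta^T x >= 0
    z = (theta[0] + theta[1] * x1 + theta[2] * x2
         + theta[3] * x1 ** 2 + theta[4] * x2 ** 2)
    return 1 if z >= 0 else 0

theta = [-1.0, 0.0, 0.0, 1.0, 1.0]  # assumed values
# boundary: -1 + x1^2 + x2^2 = 0, i.e. the unit circle
print(predict_circle(theta, 2.0, 0.0))  # outside the circle -> 1
print(predict_circle(theta, 0.2, 0.3))  # inside the circle -> 0
```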