Let’s say we want to create a model to solve a binary classification problem, where the answer is either yes or no. For example, based on its size, is a specific tumor malignant? Let’s look at this example in more detail with the hypothetical data below:

If we apply linear regression to this data set, we get:

Now we can say that any tumor with y < 0.5 is not malignant and any tumor with y ≥ 0.5 is malignant (where y = h(x)). However, although this works well on this data set, let’s add another data point and perform linear regression again.
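To make the thresholding concrete, here is a minimal Python sketch that fits an ordinary least-squares line to a small, made-up tumor data set and then applies the 0.5 cutoff. The sizes, labels, and the helper names `fit_line` and `classify` are hypothetical and only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for h(x) = theta0 + theta1 * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the line passes through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def classify(theta, x, threshold=0.5):
    """Predict 1 (malignant) when h(x) >= threshold, else 0 (not malignant)."""
    theta0, theta1 = theta
    return 1 if theta0 + theta1 * x >= threshold else 0


# Hypothetical data: tumor size (arbitrary units) and label (1 = malignant).
sizes = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

theta = fit_line(sizes, labels)
predictions = [classify(theta, x) for x in sizes]
print(theta)        # roughly (-0.357, 0.190)
print(predictions)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With these numbers the fitted line equals 0.5 at a size of 4.5, so every point lands on the correct side of the threshold.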

In blue you can see how this new data point caused our linear regression line to change. Now the 0.5 threshold no longer separates the two classes correctly. The threshold would have to change every time a new data point is added to the set, which is a poor property for a prediction model. It is also strange that h(x) can be less than 0 or greater than 1 even though the labels y are only ever 0 or 1. This motivates the logistic regression model, whose output always lies between 0 and 1. Despite the word “regression” in its name, logistic regression is treated as a classification model, and it is widely used for classification tasks.
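The shift described here can be reproduced numerically. This is a sketch under assumed data: the same hypothetical sizes 1 to 8 (the four largest malignant) plus one hypothetical very large malignant tumor of size 20. It refits the least-squares line and reports where each line crosses 0.5.

```python
def fit_line(xs, ys):
    """Ordinary least squares for h(x) = theta0 + theta1 * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    return mean_y - theta1 * mean_x, theta1


def h(theta, x):
    """Linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta[0] + theta[1] * x


def crossing(theta, threshold=0.5):
    """Tumor size at which the fitted line equals the threshold."""
    theta0, theta1 = theta
    return (threshold - theta0) / theta1


# Hypothetical data: sizes 1-8, the four largest are malignant.
sizes = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
before = fit_line(sizes, labels)

# Add one very large malignant tumor (also hypothetical) and refit.
after = fit_line(sizes + [20], labels + [1])

print(crossing(before))  # about 4.5: sizes 5-8 are predicted malignant
print(crossing(after))   # about 5.27: size 5 is now predicted not malignant

# Linear regression outputs are not confined to [0, 1].
print(h(before, 1), h(before, 8))  # about -0.17 and 1.17
```

A single extra point moves the crossing past size 5, so a tumor that is malignant in the data is now predicted as not malignant, even though nothing about that tumor changed.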
Let’s look at what the sigmoid function, also known as the logistic function, is: g(z) = 1 / (1 + e^(−z)).
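A minimal Python sketch of the sigmoid g(z) = 1 / (1 + e^(−z)), using only the standard library. The function name `sigmoid` is our own choice. It branches on the sign of z so that `math.exp` is never called with a large positive argument, which would raise `OverflowError`.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form e^z / (1 + e^z) for negative z avoids overflow.
    e = math.exp(z)
    return e / (1.0 + e)


print(sigmoid(0))  # 0.5, where the curve crosses the vertical axis
for z in (-10, -2, 2, 10):
    print(z, sigmoid(z))  # near 0 for negative z, near 1 for positive z
```

In exact arithmetic g(z) never reaches 0 or 1, which is why its range is usually written as the open interval (0, 1); in floating point, very large |z| values round to exactly 0.0 or 1.0.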

The graph of this function has a range of (0, 1): it approaches 0 as z goes to −∞ and 1 as z goes to +∞, and it crosses the vertical axis at 0.5, since g(0) = 0.5. Please look up a graph if you need a better visual understanding. Now let’s go back to our hypothesis, h(x) = θᵀx (theta transpose times x). If we apply the sigmoid function to it, the hypothesis becomes h(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx)); that is, the −z in the sigmoid is replaced by −θᵀx. Refer to the figure below if needed.
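The new hypothesis h(x) = g(θᵀx) can be sketched in Python as follows. The parameter values theta = [-4.5, 1.0] are hypothetical, and we assume, as is conventional, that the feature vector x starts with a bias term x0 = 1.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^(-z)), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hypothesis(theta: Sequence[float], x: Sequence[float]) -> float:
    """Logistic regression hypothesis h(x) = g(theta^T x).

    x is assumed to include the bias feature x0 = 1 as its first entry.
    """
    z = sum(t * xi for t, xi in zip(theta, x))  # theta transpose times x
    return sigmoid(z)


# Hypothetical parameters: bias -4.5, weight 1.0 per unit of tumor size.
theta = [-4.5, 1.0]
for size in (1, 4.5, 8):
    print(size, hypothesis(theta, [1.0, size]))
```

Unlike the straight line from linear regression, every output now stays within [0, 1]; with these hypothetical parameters a size of 4.5 gives θᵀx = 0 and therefore h(x) = 0.5.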

h(x) now tells us the estimated probability that our output is the class defined as 1. For example, if 1 means a malignant tumor and a subject gets 0.7 from our hypothesis, there is a 70 percent chance that the tumor is malignant. Because the only other possibility is that the tumor is not malignant, there is a 30 percent chance that it is not malignant. In statistical notation this is written h(x) = P(y = 1 | x; θ), read as “the probability that y equals 1 given x, parameterized by θ”; correspondingly, P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ).
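A short Python sketch of this probability interpretation. The parameters theta = [-4.5, 1.0] and the tumor size are hypothetical; the size is chosen so that θᵀx = ln(0.7 / 0.3), which makes h(x) = 0.7 and reproduces the 70/30 example. The helper name `class_probabilities` is illustrative only.

```python
import math


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^(-z)), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def class_probabilities(theta, x):
    """Return (P(y=1 | x; theta), P(y=0 | x; theta)) for logistic regression."""
    p1 = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    # The two outcomes are exhaustive, so their probabilities sum to 1.
    return p1, 1.0 - p1


# Hypothetical parameters (bias, weight per unit of tumor size).
theta = [-4.5, 1.0]
# Pick a size whose theta^T x equals ln(0.7 / 0.3), i.e. where h(x) = 0.7.
size = 4.5 + math.log(0.7 / 0.3)
p_malignant, p_not_malignant = class_probabilities(theta, [1.0, size])
print(round(p_malignant, 3), round(p_not_malignant, 3))  # 0.7 0.3
```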
