Hi! My name is Aayush Sheth and I am a student at the Nikola Tesla STEM High School. My love for coding started around the beginning of my freshman year. Before my first day of high school, I was required to attend an orientation. The purpose of this "meet-and-greet" was to get us adjusted to the …
Support Vector Machines Kernels Intro
Again, we are really building on past articles, so if you have not read up on logistic regression and the previous SVM article, please do so! For this next lesson we are going to look at the possibility of replacing the complex polynomial features in the SVM's cost function. For example, in …
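To give a rough feel for what that replacement looks like, here is a quick sketch (my own illustration, not code from the article) of computing a new feature with a Gaussian kernel measured against a landmark point; the landmark coordinates and sigma value below are made up for the example.

```python
import numpy as np

def gaussian_kernel(x, landmark, sigma=1.0):
    """Similarity between x and a landmark: close to 1 when x is near
    the landmark, close to 0 when it is far away."""
    return np.exp(-np.sum((x - landmark) ** 2) / (2 * sigma ** 2))

# Made-up example: one point and two landmarks
x = np.array([3.0, 2.0])
landmarks = [np.array([3.0, 2.0]), np.array([0.0, 0.0])]
features = [gaussian_kernel(x, l) for l in landmarks]
print(features)  # roughly [1.0, 0.0015] -> these similarities replace polynomial terms
```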
Intro to Support Vector Machines (SVM’s): The Large Margin Classifier
Now that we have learned about logistic regression, let us talk about support vector machines, also known as SVMs. To gain more intuition behind the SVM, we are going to go back to our understanding of logistic regression. If you have not read that set of articles, please do, otherwise you will get lost. Here is …
Continue reading "Intro to Support Vector Machines (SVM’s): The Large Margin Classifier"
More on Bias + Variance
We can also see that changing the regularization parameter helps. A list of lambdas should be created, and a learning model should go through each lambda, learn a new theta, and then find the training error and cross-validation set error. The lambda which produces the output with the least cost should be used. …
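Here is a rough sketch of that lambda search (my own illustration; train_model and compute_cost are hypothetical stand-ins for whatever training and cost routines you already have):

```python
import numpy as np

def pick_lambda(X_train, y_train, X_cv, y_cv, train_model, compute_cost):
    """Try each candidate lambda, fit theta on the training set,
    and keep the lambda with the lowest cross-validation error."""
    lambdas = [0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]
    best_lambda, best_cv_error = None, np.inf
    for lam in lambdas:
        theta = train_model(X_train, y_train, lam)        # learn a new theta for this lambda
        cv_error = compute_cost(theta, X_cv, y_cv)        # error on the cross-validation set
        if cv_error < best_cv_error:
            best_lambda, best_cv_error = lam, cv_error
    return best_lambda
```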
Analyzing and Improving Our Learning Models
Now that we have some learning models under our belt, how do we improve upon a specific one? There are many options, like getting more training examples, increasing the number of features, adding polynomial features, decreasing the learning rate, or changing the regularization term. Out of these options, is there a way to …
Continue reading "Analyzing and Improving Our Learning Models"
Backpropagation: Neural Networks Learning Theta
Now, the task remains to find the weights (theta) for the neural network. First, we need a cost function. Let us declare some variables as follows: Furthermore, h(x)_k represents the hypothesis for the kth element in the final output layer. This might not make a whole lot of sense, but hopefully the couple of figures …
Continue reading "Backpropogation: Neural Networks Learning Theta"
Vectorizing Neural Networks
We can also do a vectorized implementation of the activation functions for the nodes. We can use a new variable called z (subscript k, superscript j) that represents what is inside of the g function. So for the second layer's activation we can represent it as: for layer 2 (j=2) and node k, the variable z …
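Here is a minimal sketch of that vectorized step (my own illustration; Theta1 below is a made-up weight matrix mapping layer 1 to layer 2):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_layer(Theta, a_prev):
    """Vectorized activation of one layer: z^(j) = Theta * a^(j-1), then a^(j) = g(z^(j))."""
    a_prev = np.concatenate(([1.0], a_prev))  # add the bias unit a_0 = 1
    z = Theta @ a_prev                        # all the z_k^(j) values at once
    return sigmoid(z)

# Made-up example: 3 inputs (plus bias) feeding 2 hidden units
Theta1 = np.array([[0.1, 0.3, -0.2, 0.5],
                   [-0.4, 0.2, 0.1, 0.3]])
a1 = np.array([1.0, 2.0, 0.5])
a2 = forward_layer(Theta1, a1)   # the second layer's activations
```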
Introduction to Neural Networks
Neural networks are the best alternative if a logistic/linear regression model is not enough. In neural networks there are neurons, which are basically just computational units that take inputs and give an output (in terms of h(x)). If you remember our logistic regression hypothesis, we use the same hypothesis here as we want all of our …
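A single neuron can be sketched like this (my own illustration, reusing the sigmoid hypothesis from the logistic regression articles):

```python
import numpy as np

def neuron(theta, x):
    """One computational unit: takes inputs x and outputs h(x) = g(theta^T x),
    the same sigmoid hypothesis used in logistic regression."""
    x = np.concatenate(([1.0], x))            # bias input x_0 = 1
    return 1.0 / (1.0 + np.exp(-theta @ x))   # sigmoid of the weighted sum
```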
Over-fitting and Regularization in Logistic Regression
Let us say we are in the situation of modelling the data set below. Though the left graph fits a straight line well, it does not capture the trend of the dataset as well as the middle graph does. This is called underfitting. The rightmost graph is overfitted to the data set, as the …
Continue reading "Over-fitting and Regularization in Logistic Regression"
Gradient Descent and Multi-classification Tasks Logistic Regression
Basically there isn't a lot to talk about in this article, because nothing has really changed from linear regression in gradient descent, and the multiclass problem is actually very intuitive to grasp and implement. Now gradient descent is exactly the same as in linear regression, but the h(x) has changed! And a vectorized approach: Instead of …
Continue reading "Gradient Descent and Multi-classification Tasks Logistic Regression"
Cost Function for Logistic Regression
Now the problem arises of how to create a cost function for our logistic regression model. You may think back to our linear regression article and wonder why we cannot continue to use the squared error as the cost. The reason is that our hypothesis is now 1/(1 + e^(-theta transpose * x)), so the graph …
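The cost that ends up replacing the squared error is the log loss. Here is a minimal sketch (my own illustration) of what that looks like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Log-loss cost for logistic regression:
    -log(h(x)) when y = 1, -log(1 - h(x)) when y = 0, averaged over the m examples."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
```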