Understanding the linearly separable binary classifier from the ground up using R
The perceptron. Implementing it has become a rite of passage for understanding the underlying mechanism of neural networks, and of machine learning as a whole. The algorithm was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt, with funding from the US government. The concept behind the perceptron is simple: find a straight line that separates two distinct categories of data, which is only possible when the data is linearly separable. This tutorial uses data that has 2 dimensions, which can be downloaded here. Let's start by importing the data into RStudio, then dividing the dataset into separate train and test data frames (80% train, 20% test).
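As a rough sketch of the 80/20 split, the snippet below builds a small synthetic stand-in for the dataset, since the downloaded file and its name are not shown here. The name `perceptron_data`, the synthetic values, and the seed are assumptions for illustration; only the column names X1, X2, and Y follow the tutorial.

```r
# Synthetic stand-in for the tutorial's dataset (assumption: the real file
# has two feature columns X1, X2 and a label column Y with values -1 and 1).
set.seed(42)
n <- 100
perceptron_data <- data.frame(
  X1 = rnorm(n),
  X2 = rnorm(n)
)
# Label each point by which side of the line X1 + X2 = 0 it falls on.
perceptron_data$Y <- ifelse(perceptron_data$X1 + perceptron_data$X2 > 0, 1, -1)

# Randomly pick 80% of the row indices for training; the rest form the test set.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- perceptron_data[train_idx, ]
test  <- perceptron_data[-train_idx, ]
```

With a real file, the `perceptron_data` construction would be replaced by a call such as `read.csv()` on the downloaded path.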
Our data is now ready to be classified. But as always, exploratory data analysis (EDA) is an important process that should not be overlooked. We can plot the train set with the known labels so we know what to expect of our implemented perceptron function.
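One possible way to draw such a plot with base R graphics is sketched below. The four data points and the colour choices are made up for illustration; in practice `train` would be the data frame produced by the split.

```r
# Tiny made-up train set with the tutorial's column names (X1, X2, Y).
train <- data.frame(
  X1 = c(1.2, 2.1, -0.8, -1.5),
  X2 = c(0.7, 1.9, -1.1, -0.4),
  Y  = c(1, 1, -1, -1)
)

# Map each label to a colour so the two classes are easy to tell apart.
point_colors <- ifelse(train$Y == 1, "blue", "red")

plot(train$X1, train$X2,
     col = point_colors, pch = 19,
     xlab = "X1", ylab = "X2",
     main = "Train Dataset with Known Labels")
legend("topleft", legend = c("Y = 1", "Y = -1"),
       col = c("blue", "red"), pch = 19)
```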
It looks like, with the exception of a few outliers, the data is linearly separable. Great! We can now proceed to learn the intricacies of this neural network classifier.
The Neural Network “Black Box” Explained
This single neural network node accepts two input values, which in this case are X1 and X2 of the train data. Each of these two features is multiplied by a corresponding “weight”, and the weights play a key role in classification. The products are subsequently added together, as shown below:
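In R, the weighted sum for a single observation can be sketched in one line. The input and weight values here are arbitrary numbers chosen so the arithmetic is easy to follow, not values from the tutorial's data.

```r
x <- c(2.0, -1.0)    # one observation: X1 and X2 (made-up values)
w <- c(0.5, 0.25)    # hypothetical weights, one per feature

# Multiply each feature by its weight, then add the products:
# 0.5 * 2.0 + 0.25 * (-1.0) = 0.75
weighted_sum <- sum(w * x)
```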
What happens next? The sum is put through an activation function, which the data scientist chooses to suit the neural network. A popular activation function is ReLU, but for the perceptron we will use the step function. If the sum is negative (< 0), the step function outputs -1; if the sum is exactly 0, it outputs 0; if the sum is positive (> 0), it outputs 1.
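The three cases of the step function translate directly into a small R function. The name `step_function` is just a label for this sketch; base R's built-in `sign()` returns the same three values.

```r
# Step activation: -1 for negative input, 0 for zero, 1 for positive input.
step_function <- function(z) {
  if (z < 0) {
    -1
  } else if (z > 0) {
    1
  } else {
    0
  }
}

step_function(-2.5)   # negative sum
step_function(0)      # sum of exactly zero
step_function(0.75)   # positive sum
```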
As you may have noticed during the EDA process, the known binary labels (Y) are -1 and 1, so the step function is convenient for predicting the labels of our data. If you take a hard look at the perceptron node model, you will see that the weights largely determine how accurately the labels are predicted. It is unlikely that the weights start out at the ideal values that yield correct classification. For this function, the weights are initialized to 0, and the function “learns” from the data, gradually correcting the weight values until the two classes are separated.
It is time to dive into the code of the perceptron implementation. Define a global vector to store the prediction labels. Inside the perceptron function, initialize the two weight values to 0. The bias is also initialized to 0. The bias accounts for the distance of the separator line from the origin; without it, the line would be forced to pass through the origin.
The function uses a while loop to repeat the fine-tuning of the weights and bias until label classification reaches the desired accuracy threshold, which I set to 94% for my function. This step is important; notice that in the first plot, “Train Dataset with Known Labels”, there are outliers on both sides of the linear separator. If the perceptron function were set to predict with 100% accuracy, it would never converge, and the while loop would repeat indefinitely. To circumvent this issue, the function is set for 94% accuracy (an A for a majority of college-level classes).
Here is the while loop description in R code:
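A minimal sketch of such a loop follows. It assumes the classic perceptron update rule (on a misclassified point, add the label times the features to the weights, and the label to the bias), which the text above does not spell out. The function name `perceptron_sketch`, the `learning_rate` parameter, and the `max_epochs` safety cap are additions for this sketch, and predictions are kept in a local vector rather than a global one.

```r
# Sketch of a perceptron training loop. The update rule, learning_rate,
# and max_epochs cap are assumptions, not taken from the tutorial's code.
perceptron_sketch <- function(X, y, target_accuracy = 0.94,
                              learning_rate = 1, max_epochs = 1000) {
  w <- c(0, 0)     # two weights, initialized to 0
  b <- 0           # bias, initialized to 0
  accuracy <- 0
  epochs <- 0

  # Keep passing over the data until the accuracy threshold is met
  # (or the safety cap on epochs is hit).
  while (accuracy < target_accuracy && epochs < max_epochs) {
    epochs <- epochs + 1
    for (i in seq_len(nrow(X))) {
      x_i <- as.numeric(X[i, ])
      y_hat <- sign(sum(w * x_i) + b)   # step function on the weighted sum
      if (y_hat != y[i]) {
        # Misclassified: nudge the separator toward the correct side.
        w <- w + learning_rate * y[i] * x_i
        b <- b + learning_rate * y[i]
      }
    }
    # Re-score the whole training set with the updated weights and bias.
    predictions <- sign(as.vector(as.matrix(X) %*% w) + b)
    accuracy <- mean(predictions == y)
  }

  list(weights = w, bias = b, accuracy = accuracy, epochs = epochs)
}

# Usage on a small, cleanly separable toy set (made-up points).
X <- data.frame(X1 = c(2, 3, 2.5, -2, -3, -2.5),
                X2 = c(1, 2, 3, -1, -2, -3))
y <- c(1, 1, 1, -1, -1, -1)
fit <- perceptron_sketch(X, y)
```

The `max_epochs` cap guards against the infinite loop described above when the threshold cannot be reached.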
To close the function, build a list of several key attributes for the return value. This function returns a list containing the corrected weights, bias, accuracy, and epochs. When the function is called and its result is stored in a variable, these values may be accessed using the ‘$’ R operator.
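To illustrate the shape of that return value, here is a hand-built list with the same four fields. The numbers are placeholders, not results from the tutorial, and the element names are assumed.

```r
# Placeholder values standing in for what a trained perceptron might return.
result <- list(
  weights  = c(0.8, -0.3),
  bias     = 0.1,
  accuracy = 0.95,
  epochs   = 12
)

result$weights    # the two learned weights
result$accuracy   # fraction of labels predicted correctly
names(result)     # all fields available through `$`
```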