Machine Learning: Supervised Learning vs Unsupervised Learning
This post is a brief discussion of supervised and unsupervised learning techniques.
Let’s start with supervised learning
Before we dig into the technical part, I’ll take a simple example of how a small baby learns things.
Say we show two pictures to a baby. We tell the baby that the first picture is an apple and the second picture is a banana. While learning these two things, the baby keeps in mind that if the color is red and the shape is round, it is an apple, and if the color is yellow and the shape is not round, it is a banana. That’s how the baby learns. Then we show a third picture and ask the baby whether the fruit is an apple or a banana. The moment we show the third picture, the baby identifies it: “Yeah, it’s a banana :)”. That works because we have already labeled the first two pictures with their categories, so the baby already knows what an apple is and what a banana is. This is how supervised learning works.
The basic idea of supervised learning is that your data provides examples of situations, and for each example it specifies an outcome. The machine then uses this training data to build a model that can predict the outcome for new data based on the past examples.
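To make the idea concrete, here is a minimal sketch of the fruit example in Python. It is not a real learning algorithm, just a toy "pick the most similar labeled example" rule, and the feature names `color` and `is_round` are assumptions made up for this illustration.

```python
# Labeled training data: each example pairs features with a known outcome.
# The feature names "color" and "is_round" are illustrative assumptions.
TRAINING_DATA = [
    ({"color": "red", "is_round": True}, "apple"),
    ({"color": "yellow", "is_round": False}, "banana"),
]


def predict(features):
    """Return the label of the training example sharing the most feature values."""

    def matches(example):
        example_features, _label = example
        return sum(features.get(k) == v for k, v in example_features.items())

    _best_features, best_label = max(TRAINING_DATA, key=matches)
    return best_label


# The "third picture": yellow and not round.
print(predict({"color": "yellow", "is_round": False}))  # banana
```

The labels are what make this supervised: without the "apple" and "banana" tags on the first two pictures, there would be nothing to predict.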
So let’s consider a simple data set of recently sold houses.
Our first example house could be 3,125 sqft with 5 bedrooms and 3 baths, and we might tell the algorithm that this house sold for $530,000. Next, we might provide an example of a 2,100 sqft house with 4 bedrooms and 2 baths that sold for $460,000. Likewise, a 1,200 sqft house with 3 bedrooms and 1.5 baths sold for $250,000.
After we have trained the machine on the data above, we ask it to predict the price of another house that has 6 bedrooms and 4 baths.
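As a rough sketch of what "training" and "predicting" could look like for these houses, the snippet below fits a straight line through the three examples. To keep it short it uses only the number of bedrooms as the input, which is a simplifying assumption; a real model would use all the columns and far more rows.

```python
# (bedrooms, sale price) pairs taken from the three example houses above.
HOUSES = [(5, 530_000), (4, 460_000), (3, 250_000)]


def fit_line(points):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# "Training": learn the line from the labeled examples.
slope, intercept = fit_line(HOUSES)


def predict_price(bedrooms):
    """Predict a sale price from the number of bedrooms using the fitted line."""
    return slope * bedrooms + intercept


# "Prediction": a new house with 6 bedrooms.
print(round(predict_price(6)))
```

With only three points the fitted line gives roughly $693,000 for a 6-bedroom house. That is an extrapolation beyond anything in the training data, so treat it as an illustration of the workflow rather than a sensible estimate.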
The important thing about supervised learning is that the data has a very specific structure, described below.
We have rows of data, each of which is an example we use to train the model. Each row has a column with a known outcome, which we refer to as the ‘label’. In the house example above, the price is the label.
If the label is categorical, the model is known as a “classification” model.
If the label is numeric, the model is known as a “regression” model.
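The rule above can be sketched as a small helper that looks at the label column and names the kind of model you would build. The function name `task_type` is hypothetical, and the check is only a heuristic: numeric codes such as 0/1 can still stand for categories.

```python
def task_type(labels):
    """Guess the kind of supervised task from the values in the label column."""
    is_numeric = all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in labels
    )
    return "regression" if is_numeric else "classification"


print(task_type([530_000, 460_000, 250_000]))  # regression
print(task_type(["apple", "banana"]))  # classification
```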
We can use the algorithms below for supervised learning.
- Logistic Regression
- Model/ Ensemble
- Time series
Let’s take the baby example again to understand unsupervised learning
Say we show the baby a picture of a group of dogs and cats, and the baby has never seen dogs or cats before. The baby doesn’t know the features of a cat or a dog, so the baby can’t categorize them the way it could in the supervised learning example. There, the baby knew the features of an apple and of a banana because we had shown labeled pictures earlier. Here the baby knows nothing in advance. There is no labeling.
So the baby can’t say exactly which animal is a cat and which is a dog. But by looking at the picture, the baby can tell that animals 1, 3 and 5 look similar to each other and animals 2 and 4 look similar to each other, without knowing why or what they are. Labeling them as dogs and cats is not possible, but we can still find the pattern. That is known as unsupervised learning.
So in this case, the training data provides examples, but we have no specific outcomes. In simple words, there is no label associated with this kind of learning. In unsupervised learning, the machine tries to find interesting patterns in the data.
Let’s have a look at a data set of transactions.
We have information about transaction date, customer name, account number, PIN number, class, zip and amount. Please note that we don’t have any specific label in this data set. For example, a label indicating which of these transactions are fraudulent and which are not is not present here.
So what kinds of patterns can we discover in this data set without a label? For the time being, I’ll mention only two.
- Clustering
Look for examples that are similar and group them together.
So here we have two transactions: both happened on a Wednesday, both used a PIN for authentication, both were for gas, and both amounts are less than 100 rupees.
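Here is a minimal sketch of that grouping idea. It simply buckets transactions that share the same day, authentication method, category and "under 100" flag, which stands in for a real clustering algorithm. The records, field names and customer names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical transactions; field names and values are invented for this example.
TRANSACTIONS = [
    {"customer": "Alice", "day": "Wednesday", "auth": "pin", "category": "gas", "amount": 45},
    {"customer": "Bob", "day": "Wednesday", "auth": "pin", "category": "gas", "amount": 80},
    {"customer": "Carol", "day": "Saturday", "auth": "signature", "category": "grocery", "amount": 230},
]


def group_similar(transactions):
    """Bucket transactions that agree on day, auth method, category and amount < 100."""
    groups = defaultdict(list)
    for t in transactions:
        key = (t["day"], t["auth"], t["category"], t["amount"] < 100)
        groups[key].append(t["customer"])
    return dict(groups)


for key, customers in group_similar(TRANSACTIONS).items():
    print(key, customers)
```

Notice that no label is involved anywhere: the groups come purely from how similar the rows are to each other.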
- Anomaly detection
Look for rows that are very unusual.
So here we have a transaction with an unusual amount for customer Bob, made using a PIN.
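One simple way to sketch "unusual" is to compare each amount with the customer's typical amount. The snippet below flags amounts that sit far from the median, measured in units of the median absolute deviation. The amounts for Bob and the threshold of 5 are assumptions chosen for illustration, not values from the data set.

```python
from statistics import median

# Hypothetical past amounts for one customer; the last one is deliberately unusual.
BOB_AMOUNTS = [40, 55, 60, 45, 50, 2500]


def unusual_amounts(amounts, threshold=5.0):
    """Return amounts whose distance from the median exceeds threshold * MAD."""
    center = median(amounts)
    # Median absolute deviation: a spread measure that one big outlier cannot distort.
    mad = median([abs(a - center) for a in amounts])
    if mad == 0:
        # At least half the amounts equal the median; anything else stands out.
        return [a for a in amounts if a != center]
    return [a for a in amounts if abs(a - center) / mad > threshold]


print(unusual_amounts(BOB_AMOUNTS))  # [2500]
```

Again there is no "fraud" label here. The code only says that 2500 looks different from Bob's other transactions, not that it is fraudulent.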
The goal of unsupervised learning is to perform discovery: to find patterns and so on.
The algorithms available for unsupervised learning are:
- Anomaly detection
- Association discovery
- Training Models
Because the training data has no specific “outcome”, we cannot evaluate the output of these algorithms as easily as in supervised learning, since there is no ground truth to compare against.
So, as a takeaway: in unsupervised learning the data is not labeled, so you do not know the categories of the data, but you can still find patterns. In supervised learning the data is labeled and you know the categories.
Hope you all understand the difference between supervised and unsupervised learning :)