Random Forests Explained
A random forest is an ensemble algorithm built out of decision trees (or voters), each of which:
- Have been shown the characteristics of the data.
- Have been shown part of the data.
Or, if you want more technical terms, each tree:
- Has been shown the features.
- Has been trained on a subset of the samples, which can be as large as the whole data set.
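To make "part of the data" concrete, here is a minimal sketch of how each tree could be handed its own view of a data set. The function name `draw_tree_view` and the sizes used are hypothetical, chosen only for illustration. It assumes rows are drawn with replacement (a bootstrap sample) and that a feature subset is drawn once per tree; the usual formulation redraws the candidate features at every split, so treat this as a simplification.

```python
import random


def draw_tree_view(n_samples, n_features, max_features, rng):
    """Pick the rows and columns one (hypothetical) tree gets to see."""
    # Bootstrap sample: draw n_samples row indices WITH replacement,
    # so some rows repeat and others are never seen by this tree.
    rows = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Feature subset: draw max_features distinct column indices.
    cols = sorted(rng.sample(range(n_features), max_features))
    return rows, cols


rng = random.Random(0)  # fixed seed so the sketch is repeatable
views = [
    draw_tree_view(n_samples=10, n_features=4, max_features=2, rng=rng)
    for _ in range(3)
]

for i, (rows, cols) in enumerate(views):
    # Distinct rows show how much of the data this tree actually saw.
    print(f"tree {i}: distinct rows={sorted(set(rows))} features={cols}")
```

Because every tree draws its own rows and columns, no two trees are likely to learn exactly the same thing, which is the point of the voters analogy.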
Let’s make it simpler. Imagine you have to decide how to vote in the next election. You are one of the voters.
- You have been shown the characteristics of the candidates, i.e. their ability to talk to the press, talk to other dignitaries, etc.
- You have been shown a number of events involving both candidates. It could be only some of them, it could be all of them, but you probably can't keep up with everything: you need time to sleep, go to the gym, read blog posts :-).
Based on the above, you are now operating much like a decision tree would. So are all the other people who are voting with you. Some of them have seen different events than you have, events in which the candidates showed more or less of a certain ability.
Collectively, you will select the candidate which you think is right.
If the example above was too polarising because you lived in the UK around Brexit, let’s consider a situation where you are trying to train an algorithm to tell cats from dogs.
Imagine you are in a group of people who have to classify cats vs dogs from images.
1. Random parts of images would be shown to you together with the type of animal. For example, part of an image of a cat would be shown to you, maybe just the ear, and you would be told it’s a cat. Part of an image of a dog would also be shown to you, maybe the tail, and you’d be told it’s a dog.
2. More randomly selected images would be shown to you. It could even be all of them.
Now you are trained. You have formed a mental picture of what a cat looks like and what a dog looks like. So have the other people in the group.
The group is a random forest and the decision trees are all the people in the group. When it is time to make predictions, each of you will vote based on what you have learned, and the group goes with the majority.
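The voting step can be sketched in a few lines. This is a toy illustration, not a trained model: the three "people" below are hand-written rules standing in for decision trees, and the feature names (`ear_pointy`, `tail_wags`, `weight_kg`) and the 6 kg threshold are made up for the example.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Ask every tree for its vote and return the majority label."""
    votes = [tree(sample) for tree in trees]
    label, _count = Counter(votes).most_common(1)[0]
    return label, votes


# Hypothetical voters: each one learned to look at a different clue.
trees = [
    lambda s: "cat" if s["ear_pointy"] else "dog",
    lambda s: "dog" if s["tail_wags"] else "cat",
    lambda s: "cat" if s["weight_kg"] < 6 else "dog",
]

# An animal with pointy ears, a wagging tail, and a weight of 4 kg.
sample = {"ear_pointy": True, "tail_wags": True, "weight_kg": 4}
label, votes = forest_predict(trees, sample)
print(votes, "->", label)  # ['cat', 'dog', 'cat'] -> cat
```

One voter is fooled by the wagging tail, but the other two outvote it. That is the appeal of the ensemble: individual mistakes tend to be cancelled out as long as the voters do not all make the same one.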
Let’s hope you don’t get too confused when you see this: