Splitting nodes in decision trees (DT) when a given feature is categorical is done using concepts like entropy, information gain, and Gini impurity.
But when a feature is continuous, how does one split the nodes of the decision tree? I assume you are familiar with the concept of entropy.
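As a quick refresher: for a node $S$ in which class $i$ occurs with proportion $p_i$, the entropy is

$$H(S) = -\sum_i p_i \log_2 p_i,$$

which is 0 for a pure node and maximal (1 for two classes) for an even split.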
Suppose we have a training data set of n sample points, and let us consider one particular feature f1 that is continuous in nature.
Approach for splitting nodes
- We need to evaluate a candidate split at every sample point.
- We sort the f1 column in ascending order.
- Then, taking every value in f1 as a threshold, we calculate the entropy of the resulting child nodes and the information gain of the split.
- We select the threshold with the highest information gain and make the split (a code sketch of this scan follows the list).
- We then continue doing the same on the leaf nodes until either max_depth is reached or a node has fewer than min_samples points.
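Here is a minimal Python sketch of that scan. It is an illustration rather than a production implementation: the names `entropy` and `best_threshold` are mine, it assumes NumPy, and it uses the split convention x ≤ t goes left, x > t goes right.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_threshold(feature, labels):
    """Try every distinct value of a continuous feature as a
    threshold (split: x <= t vs. x > t) and return the threshold
    with the highest information gain, plus the gain itself."""
    x, y = np.asarray(feature, dtype=float), np.asarray(labels)
    parent = entropy(y)
    best_t, best_gain = None, -1.0
    for t in np.unique(x):           # np.unique returns sorted values
        left, right = y[x <= t], y[x > t]
        if len(right) == 0:          # largest value: one side is empty
            continue
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(y)
        gain = parent - weighted     # information gain for this threshold
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

Note that `np.unique` already returns the candidate thresholds in sorted order, which is exactly the sorting step from the list above.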
Let's try to understand the above with an example. Let the following be the f1 feature column, and say it is a two-class classification problem:
| F1 (numerical feature) | Target variable/label |
|---|---|
| 5.4 | YES |
| 2.8 | NO |
| 3.9 | NO |
| 8.5 | YES |
| 7.6 | YES |
| 5.9 | YES |
| 6.8 | NO |
We start by sorting the feature values in increasing order:

| Sorted F1 | Target variable/label |
|---|---|
| 2.8 | NO |
| 3.9 | NO |
| 5.4 | YES |
| 5.9 | YES |
| 6.8 | NO |
| 7.6 | YES |
| 8.5 | YES |
Now we will choose each point as the threshold, one by one: 2.8, 3.9, and so on. As an illustration, consider the split at 5.4: with the x ≤ t convention, the left child gets {2.8, 3.9, 5.4} with labels {NO, NO, YES}, and the right child gets {5.9, 6.8, 7.6, 8.5} with labels {YES, NO, YES, YES}.
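Working through the numbers for this split (everything follows from the sorted table above):

$$H(\text{parent}) = -\tfrac{4}{7}\log_2\tfrac{4}{7} - \tfrac{3}{7}\log_2\tfrac{3}{7} \approx 0.985$$

$$H(\text{left}) = -\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3} \approx 0.918 \qquad H(\text{right}) = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811$$

$$IG(5.4) = 0.985 - \left(\tfrac{3}{7}\cdot 0.918 + \tfrac{4}{7}\cdot 0.811\right) \approx 0.128$$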

We perform similar splits for all the data points, and whichever threshold gives us the maximum information gain (IG) is our first split point. If you cannot recall what IG is: it is simply the entropy of the parent node minus the weighted average entropy of the child nodes.
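Running the `best_threshold` sketch from earlier on this column makes the winner concrete (the numbers are easy to verify by hand):

```python
f1 = [5.4, 2.8, 3.9, 8.5, 7.6, 5.9, 6.8]
labels = ["YES", "NO", "NO", "YES", "YES", "YES", "NO"]

t, gain = best_threshold(f1, labels)
print(f"best threshold = {t}, IG = {gain:.3f}")
# best threshold = 3.9, IG = 0.470
```

So for this toy column the first split is at 3.9 (both points at or below it are NO), not at 5.4; 5.4 was only used above to illustrate the calculation.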
Now, for further splits, the same approach is repeated on the leaf nodes, as sketched below.
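Continuing the sketch above, the recursion with the two stopping criteria from the earlier list (the parameter names `max_depth` and `min_samples` are mine) might look like this:

```python
def grow(feature, labels, depth=0, max_depth=3, min_samples=2):
    """Recursively split each leaf with the same threshold scan.
    Stops when the node is pure, too small, or too deep."""
    x, y = np.asarray(feature, dtype=float), np.asarray(labels)
    majority = max(set(y.tolist()), key=y.tolist().count)
    if depth >= max_depth or len(y) < min_samples or len(set(y.tolist())) == 1:
        return {"leaf": True, "prediction": majority}
    t, gain = best_threshold(x, y)
    if t is None or gain <= 0:       # no useful split exists
        return {"leaf": True, "prediction": majority}
    mask = x <= t
    return {"leaf": False, "threshold": float(t),
            "left":  grow(x[mask],  y[mask],  depth + 1, max_depth, min_samples),
            "right": grow(x[~mask], y[~mask], depth + 1, max_depth, min_samples)}
```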
Disadvantage
There is one disadvantage to the process described above: if the data set is large, the computation requirements increase significantly. Imagine performing this scan on millions of records with max_depth = 10.
We can, however, mitigate the problem with feature binning, converting the numerical features into categorical ones so the tree has far fewer candidate splits to consider (a minimal sketch follows).
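For instance, here is a minimal binning sketch using pandas, where `pd.cut` does equal-width binning; the bin count and label names are arbitrary choices for illustration:

```python
import pandas as pd

f1 = pd.Series([5.4, 2.8, 3.9, 8.5, 7.6, 5.9, 6.8])

# 4 equal-width bins: the tree now sees 4 categories instead of
# 7 distinct numeric values, so far fewer candidate splits to scan
f1_binned = pd.cut(f1, bins=4, labels=["low", "mid-low", "mid-high", "high"])
print(f1_binned.tolist())
# ['mid-low', 'low', 'low', 'high', 'high', 'mid-high', 'mid-high']
```

Quantile binning (`pd.qcut`) is a common alternative when the values are skewed.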
More to come!