Splitting nodes in decision trees (DT) when a given feature is categorical is done using concepts like entropy, Information Gain and Gini impurity.
But when the features are continuous, how does one split the nodes of the decision tree? I assume you are familiar with the concept of entropy.
Suppose that we have a training data set of n sample points. Let us consider one particular feature, f1, which is continuous in nature.
Approach for splitting nodes
- We need to evaluate a candidate split at every sample point.
- We sort the f1 column in ascending order.
- Then, taking every value in f1 as a threshold, we calculate the entropy of the resulting split and, from it, the Information Gain.
- We select the threshold with the highest Information Gain and make the split.
- We then repeat the same procedure on the resulting leaf nodes until either max_depth is reached or a node contains fewer sample points than the required min_samples.
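The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names (`entropy`, `information_gain`, `best_threshold`) are made up for this sketch, the rule "value <= threshold goes left" is an assumed convention, and the data is invented for illustration. Some implementations test midpoints between consecutive sorted values instead of the values themselves; here we follow the text and use the values directly.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def best_threshold(values, labels):
    """Try every distinct feature value as a threshold and return the
    (threshold, gain) pair with the highest information gain."""
    pairs = sorted(zip(values, labels))          # sort by feature value
    all_labels = [y for _, y in pairs]
    best_t, best_gain = None, 0.0
    for t in sorted({v for v, _ in pairs}):
        left = [y for v, y in pairs if v <= t]   # assumed convention: <= goes left
        right = [y for v, y in pairs if v > t]
        if not left or not right:                # skip splits with an empty side
            continue
        gain = information_gain(all_labels, left, right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Invented data for illustration only (not the article's original table).
f1 = [5.4, 2.8, 7.1, 3.9, 6.2, 8.5]
target = [0, 0, 1, 0, 1, 1]
threshold, gain = best_threshold(f1, target)
print(threshold, round(gain, 3))
```

On this toy data the two classes separate perfectly at one value, so the chosen threshold leaves both children pure. Recomputing the left and right label lists for every threshold keeps the sketch readable; a faster version would update class counts incrementally while walking the sorted column.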
Let's try to understand the above with an example:
Let the following be the f1 feature column, and let's say it is a two-class classification problem:
|F1(NUMERICAL FEATURE)||TARGET VARIABLE/LABEL|
We start by sorting the feature values in increasing order:
|SORTED F1||TARGET VARIABLE/LABEL|
Now we will choose each point as the threshold, one by one: 2.8, 3.9 and so on. Below we display the split for one point, say 5.4.
We perform the same split at every data point, and whichever threshold gives us the maximum IG becomes our first splitting point. If you cannot recall what IG is: it is the entropy of the parent node minus the weighted average entropy of the child nodes.
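For reference, the quantity being maximised can be written out for a binary split at threshold t. The symbols below are notation chosen for this note rather than taken from the original post: S is the set of samples at the node, p_c is the fraction of S belonging to class c, and S_L and S_R are the samples with f1 <= t and f1 > t respectively (which side gets the equality is an assumed convention).

```latex
H(S) = -\sum_{c} p_c \log_2 p_c
\qquad
IG(S, t) = H(S) - \frac{|S_L|}{|S|}\, H(S_L) - \frac{|S_R|}{|S|}\, H(S_R)
```

A split that leaves both children pure has H(S_L) = H(S_R) = 0, so its IG equals the parent entropy, which is the largest value IG can take at that node.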
For further splits, the same approach is repeated on the leaf nodes.
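The repetition on leaf nodes is naturally recursive. The sketch below is one possible way to write it, under several assumptions: a single continuous feature, rows given as (feature_value, label) pairs, a dictionary-based tree, and stopping parameters named `max_depth` and `min_samples` after the terms used earlier. The helper names (`best_split`, `build_tree`, `predict`) and the data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(rows):
    """rows: list of (feature_value, label). Returns (threshold, gain),
    or (None, 0.0) when no threshold improves on the parent."""
    parent = entropy([y for _, y in rows])
    best_t, best_gain = None, 0.0
    for t in sorted({v for v, _ in rows}):
        left = [y for v, y in rows if v <= t]
        right = [y for v, y in rows if v > t]
        if not right:                            # largest value: nothing goes right
            continue
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
        if parent - weighted > best_gain:
            best_t, best_gain = t, parent - weighted
    return best_t, best_gain


def build_tree(rows, depth=0, max_depth=3, min_samples=2):
    """Split recursively until a stopping rule fires or no split helps."""
    majority = Counter(y for _, y in rows).most_common(1)[0][0]
    if depth >= max_depth or len(rows) < min_samples:
        return {"leaf": majority}
    t, _ = best_split(rows)
    if t is None:                                # node is pure or cannot be improved
        return {"leaf": majority}
    left = [r for r in rows if r[0] <= t]
    right = [r for r in rows if r[0] > t]
    return {"threshold": t,
            "left": build_tree(left, depth + 1, max_depth, min_samples),
            "right": build_tree(right, depth + 1, max_depth, min_samples)}


def predict(tree, x):
    """Walk from the root to a leaf and return its majority class."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Invented data for illustration only.
rows = [(2.8, 0), (3.9, 0), (5.4, 0), (6.2, 1), (7.1, 1), (8.5, 0)]
tree = build_tree(rows)
print(tree)
```

On this toy data the root split does not separate the classes completely, so the right-hand child is split a second time before every leaf is pure.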
Alternatively, we could handle the problem by binning the feature, which converts the numerical feature into a categorical one.
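As a rough idea of what binning looks like, here is a small sketch using only the standard library. The cut points, bin names and the `bin_feature` helper are all hypothetical choices made for this example; in practice the edges would come from domain knowledge or from the data (for instance, quantiles).

```python
from bisect import bisect_right


def bin_feature(values, edges, names):
    """Map each numeric value to a bin name.

    edges must be sorted and len(names) must equal len(edges) + 1.
    A value equal to an edge falls into the higher bin.
    """
    if len(names) != len(edges) + 1:
        raise ValueError("need exactly one more name than edges")
    return [names[bisect_right(edges, v)] for v in values]


# Invented values and cut points, for illustration only.
f1 = [2.8, 3.9, 5.4, 6.2, 7.1, 8.5]
edges = [4.0, 7.0]
names = ["low", "medium", "high"]
binned = bin_feature(f1, edges, names)
print(binned)  # ['low', 'low', 'medium', 'medium', 'high', 'high']
```

Once binned, the feature can be split like any other categorical feature, at the cost of losing the ordering detail inside each bin.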
More to come!