
Splitting nodes in decision trees for continuous features (classification)

Splitting nodes in decision trees (DTs) on a categorical feature is done using concepts such as entropy, information gain, and Gini impurity.

But when a feature is continuous, how do we split the nodes of the decision tree? I assume you are familiar with the concept of entropy.

Suppose we have a training data set of n sample points. Let us consider one particular feature, f1, which is continuous in nature.

Approach for splitting nodes

  1. We consider every sample point as a candidate split.
  2. We sort the f1 column in ascending order.
  3. Taking every value in f1 as a threshold, we calculate the entropy of the resulting child nodes and then the information gain.
  4. We select the threshold with the highest information gain and make the split.
  5. We then repeat the same procedure on the leaf nodes until either max_depth is reached or a node has fewer samples than the minimum required for a split.
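The steps above can be sketched in Python. This is a minimal sketch, not a production implementation; the function names are my own, and I assume samples with f1 less than or equal to the threshold go to the left child.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Try each feature value as a threshold (left child: value <= threshold)
    and return the (threshold, information_gain) pair with the highest gain."""
    parent = entropy(labels)
    n = len(labels)
    best_t, best_gain = None, -1.0
    for t in sorted(set(values)):          # step 2: sorted candidate thresholds
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not left or not right:          # skip splits that leave a child empty
            continue
        children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = parent - children           # step 3: information gain
        if gain > best_gain:               # step 4: keep the best threshold
            best_t, best_gain = t, gain
    return best_t, best_gain

# The example data used below in this post:
f1 = [5.4, 2.8, 3.9, 8.5, 7.6, 5.9, 6.8]
y = ["YES", "NO", "NO", "YES", "YES", "YES", "NO"]
print(best_split(f1, y))   # best threshold is 3.9, gain ≈ 0.47
```

In practice, a single pass over the sorted values with running class counts avoids rescanning the data for every threshold, but the brute-force version mirrors the steps above most directly.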

Let's try to understand the above with an example.

Let the following be the f1 feature column, and let's say it's a two-class classification problem:

f1 (numerical feature)    Target variable/label
5.4                       YES
2.8                       NO
3.9                       NO
8.5                       YES
7.6                       YES
5.9                       YES
6.8                       NO

We start by sorting the feature values in increasing order:

Sorted f1    Target variable/label
2.8          NO
3.9          NO
5.4          YES
5.9          YES
6.8          NO
7.6          YES
8.5          YES

The sorted feature column

Now we will choose each point as a threshold one by one: 2.8, 3.9, and so on. Consider the split for one point, say 5.4: all samples with f1 less than or equal to 5.4 go to one child node, and the rest go to the other.
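The gain for the threshold 5.4 can be worked out by hand. A small sketch (assuming the left child takes f1 <= 5.4; the opposite convention gives the same numbers):

```python
import math

def H(p, q):
    # entropy of a two-class node; 0 * log2(0) is treated as 0
    return -sum(x * math.log2(x) for x in (p, q) if x > 0)

# Threshold 5.4:
#   left  = {2.8 NO, 3.9 NO, 5.4 YES}           -> 1 YES, 2 NO
#   right = {5.9 YES, 6.8 NO, 7.6 YES, 8.5 YES} -> 3 YES, 1 NO
parent = H(4/7, 3/7)                              # 4 YES, 3 NO overall, ≈ 0.985
weighted = 3/7 * H(1/3, 2/3) + 4/7 * H(3/4, 1/4)  # children, ≈ 0.857
ig = parent - weighted
print(round(ig, 3))                               # ≈ 0.128
```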

We perform similar splits for all the data points, and whichever gives the maximum information gain (IG) becomes our first splitting point. If you cannot recall what IG is: it is the entropy of the parent node minus the weighted average entropy of the child nodes.

Now, for further splits, the same approach is repeated on the leaf nodes.

Disadvantage

There is one disadvantage to the process stated above: if your data set is large, the computational requirements increase significantly, because every distinct feature value is a candidate threshold at every node. Imagine performing the above operation on millions of records with max_depth = 10.

We can mitigate this problem by binning the feature values, converting the numerical features into categorical ones.
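As a sketch of that workaround, here is equal-width binning with NumPy on the feature column from this post. The choice of 3 bins is an arbitrary assumption; quantile binning is another common option.

```python
import numpy as np

# Equal-width binning: replace the continuous f1 with 3 categorical bins,
# so the tree only ever considers 3 candidate splits for this feature.
f1 = np.array([5.4, 2.8, 3.9, 8.5, 7.6, 5.9, 6.8])
edges = np.linspace(f1.min(), f1.max(), 4)  # 3 equal-width bins -> 4 edges
codes = np.digitize(f1, edges[1:-1])        # bin index (0, 1 or 2) per sample
print(codes)                                # [1 0 0 2 2 1 2]
```

The trade-off is a loss of resolution: two values that fall in the same bin can never be separated by the tree.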

More to come!
