When a feature is categorical, nodes in a decision tree (DT) are split using concepts like entropy, information gain (IG), and Gini impurity.

But when the features are continuous, how does one split the nodes of the decision tree? I assume you are familiar with the concept of entropy.

Suppose that we have a training data set of n sample points. Let us consider one particular feature, f1, which is continuous.

### Approach for splitting nodes

- We evaluate a candidate split at every sample point.
- We sort the f1 column in ascending order.
- Then, taking every value in f1 as a threshold, we calculate the entropy of the resulting child nodes and, from it, the information gain.
- We select the threshold with the highest information gain and make the split.
- We then repeat the same procedure on the resulting leaf nodes until either max_depth is reached or a node holds fewer sample points than the minimum number required to split it (min_samples).
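
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not any library's implementation: the function names are hypothetical, entropy is taken in base 2, and the sketch assumes that values less than or equal to the threshold go to the left child.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Gain from splitting into value <= threshold (left) and value > threshold (right).

    The <= convention is an assumption of this sketch.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Try every distinct feature value as a threshold; return (threshold, gain)."""
    candidates = sorted(set(values))
    scored = ((t, information_gain(values, labels, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])
```

Calling `best_threshold(values, labels)` with the feature column and the label column returns the threshold with the highest information gain together with that gain.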

Let's try to understand the above with an example.

Let the following be the f1 feature column, and let's say it is a two-class classification problem:

| F1 (numerical feature) | Target variable/label |
| --- | --- |
| 5.4 | YES |
| 2.8 | NO |
| 3.9 | NO |
| 8.5 | YES |
| 7.6 | YES |
| 5.9 | YES |
| 6.8 | NO |

We start by sorting the feature values in increasing order:

| Sorted F1 | Target variable/label |
| --- | --- |
| 2.8 | NO |
| 3.9 | NO |
| 5.4 | YES |
| 5.9 | YES |
| 6.8 | NO |
| 7.6 | YES |
| 8.5 | YES |

Now we choose each point as a threshold, one by one: 2.8, 3.9, and so on. As an illustration, consider the split at one point, say 5.4: the samples on one side of the threshold go to the left child and the rest go to the right child.
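
To make the split at 5.4 concrete, here is the arithmetic worked out. It assumes a convention the text does not state, namely that samples with f1 ≤ 5.4 go left and the rest go right, and it uses base-2 entropy, written H here. Under that assumption the left child holds {2.8, 3.9, 5.4} (1 YES, 2 NO) and the right child holds {5.9, 6.8, 7.6, 8.5} (3 YES, 1 NO).

```latex
\begin{aligned}
H(\text{parent}) &= -\tfrac{4}{7}\log_2\tfrac{4}{7} - \tfrac{3}{7}\log_2\tfrac{3}{7} \approx 0.985 \\
H(\text{left})   &= -\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3} \approx 0.918 \\
H(\text{right})  &= -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811 \\
IG(5.4) &= H(\text{parent}) - \left(\tfrac{3}{7} H(\text{left}) + \tfrac{4}{7} H(\text{right})\right)
         \approx 0.985 - 0.857 = 0.128
\end{aligned}
```

With the opposite convention (f1 < 5.4 goes left) the groups, and therefore the numbers, would differ.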

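We perform similar splits for all the data points, and whichever gives the maximum IG becomes our first split point. If you cannot recall what IG is: it is the entropy of the parent node minus the weighted average entropy of the child nodes, where each child is weighted by the fraction of samples it receives.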
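
For reference, entropy and information gain for a binary split are commonly written as below. The symbols are notation introduced here for illustration: S is the set of samples at the node, S_L and S_R are the samples sent to the left and right child by threshold t, and p_c is the fraction of samples in S belonging to class c.

```latex
H(S) = -\sum_{c} p_c \log_2 p_c
\qquad
IG(S, t) = H(S) - \frac{|S_L|}{|S|}\, H(S_L) - \frac{|S_R|}{|S|}\, H(S_R)
```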

For further splits, the same approach is repeated on the leaf nodes.
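
A rough sketch of how this repetition could look for a single continuous feature is given below. It is only an illustration under stated assumptions: the names `best_split` and `build_tree` and the dictionary layout of the tree are hypothetical, values less than or equal to the threshold go left, and `max_depth` and `min_samples` act as the stopping conditions mentioned earlier.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows):
    """rows: list of (value, label). Return (gain, threshold) or None if no split exists."""
    labels = [y for _, y in rows]
    best = None
    # Skip the largest value: using it as a threshold would leave the right side empty.
    for t in sorted({x for x, _ in rows})[:-1]:
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
        gain = entropy(labels) - weighted
        if best is None or gain > best[0]:
            best = (gain, t)
    return best


def build_tree(rows, depth=0, max_depth=3, min_samples=2):
    """Recursively split until max_depth, too few samples, or no useful gain."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    can_split = depth < max_depth and len(rows) >= min_samples
    split = best_split(rows) if can_split else None
    if split is None or split[0] <= 1e-12:
        return {"leaf": majority}
    _, t = split
    return {
        "threshold": t,
        "left": build_tree([r for r in rows if r[0] <= t], depth + 1, max_depth, min_samples),
        "right": build_tree([r for r in rows if r[0] > t], depth + 1, max_depth, min_samples),
    }
```

Each call picks the best threshold for the samples it receives, then hands the two resulting groups to the next level, so every leaf is treated exactly like the root was.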

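### Disadvantage

Evaluating every feature value as a candidate threshold becomes computationally expensive when the number of sample points is large. We can mitigate this by feature binning, that is, by converting the numerical feature into a categorical one.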
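
One way to picture binning is the small sketch below, which uses only the standard library. The cut points `4.0` and `7.0` and the bin names are hypothetical, chosen purely for illustration; in practice the bin edges would have to be picked to suit the data.

```python
from bisect import bisect_right


def bin_feature(values, edges):
    """Map each numeric value to the index of the bin it falls into.

    edges must be sorted. A value below edges[0] lands in bin 0,
    edges[0] <= value < edges[1] lands in bin 1, and so on.
    """
    return [bisect_right(edges, v) for v in values]


f1 = [5.4, 2.8, 3.9, 8.5, 7.6, 5.9, 6.8]
edges = [4.0, 7.0]                # hypothetical cut points
names = ["low", "medium", "high"]  # hypothetical category names
binned = [names[i] for i in bin_feature(f1, edges)]
```

After binning, the tree only has to consider a handful of categories instead of one threshold per distinct value, at the cost of losing some resolution in the feature.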

More to come!