There are different variants of Naive Bayes, including Bernoulli and Gaussian. This article assumes you are familiar with the basic idea behind Naive Bayes and with how it works on categorical data.
Here we discuss one of the approaches used for handling continuous variables in Naive Bayes.
Suppose we have the following dataset, where the target variable is whether a movie will be a hit or a flop, and the feature variables are the action rating and the story rating (each a number between 1 and 10).
ACTION RATING (AR) | STORY RATING (SR) | HIT/FLOP |
7.2 | 5.8 | HIT |
3.4 | 6.3 | FLOP |
3.5 | 7.3 | FLOP |
8.5 | 8.0 | HIT |
6.9 | 2.8 | FLOP |
7.0 | 5.3 | HIT |
9.0 | 3.8 | HIT |
Now let's suppose we have a test point: ACTION RATING = 7.6, STORY RATING = 5.7. These are the two probabilities we need to estimate:
P(HIT| AR= 7.6, SR=5.7) and P(FLOP| AR=7.6, SR=5.7)
Let's start by considering the first probability expression.
But no such point is present in the data set, so should we set this probability to zero? And do the same with the second expression? That would mean any unseen point always leads to both probabilities being zero. So how do we resolve this issue? Let's get there.
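To make the problem concrete, here is a minimal sketch that treats the ratings as if they were categorical values and counts exact matches. The names `rows` and `count_based_likelihood` are made up for this illustration.

```python
# Training rows from the table above: (action rating, story rating, label).
rows = [
    (7.2, 5.8, "HIT"), (3.4, 6.3, "FLOP"), (3.5, 7.3, "FLOP"),
    (8.5, 8.0, "HIT"), (6.9, 2.8, "FLOP"), (7.0, 5.3, "HIT"),
    (9.0, 3.8, "HIT"),
]
test_ar, test_sr = 7.6, 5.7


def count_based_likelihood(label):
    """Product of the per-feature exact-match frequencies within one class."""
    in_class = [r for r in rows if r[2] == label]
    ar_matches = sum(1 for r in in_class if r[0] == test_ar)
    sr_matches = sum(1 for r in in_class if r[1] == test_sr)
    return (ar_matches / len(in_class)) * (sr_matches / len(in_class))


naive_hit = count_based_likelihood("HIT")
naive_flop = count_based_likelihood("FLOP")
print(naive_hit, naive_flop)  # both are 0.0: no row contains 7.6 or 5.7
```

Since neither 7.6 nor 5.7 appears anywhere in the table, both classes get a likelihood of zero and the classifier cannot choose between them.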
GAUSSIAN DISTRIBUTION
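Three terms need to be evaluated in the expression below. Strictly speaking, the right-hand side is proportional to the posterior rather than equal to it: the denominator P(AR=7.6, SR=5.7) is dropped because it is the same for HIT and FLOP and does not change which class wins.
P(HIT | AR=7.6, SR=5.7) ∝ P(AR=7.6 | HIT) * P(SR=5.7 | HIT) * P(HIT)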
The P(HIT) calculation is straightforward: it equals (number of hits) / (total number of hits and flops), which is 4/7 for this dataset.
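As a small sketch, the class priors can be computed by counting labels. The list `labels` is simply the HIT/FLOP column of the table, and the variable names are illustrative.

```python
from collections import Counter

# HIT/FLOP column of the table, in row order.
labels = ["HIT", "FLOP", "FLOP", "HIT", "FLOP", "HIT", "HIT"]

counts = Counter(labels)
# Prior of each class = number of rows of that class / total number of rows.
priors = {label: n / len(labels) for label, n in counts.items()}

print(priors)  # HIT is 4/7, FLOP is 3/7
```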
To calculate the two remaining conditional probabilities, we assume that the values of each feature are sampled from a Gaussian distribution whose mean and variance are estimated from the sample points. Recall that a Gaussian distribution is a bell-shaped curve determined entirely by its mean and variance.
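For reference, the density of a Gaussian distribution can be written as follows. The symbols are notation chosen for this sketch: x is the feature value, μ is the mean and σ² is the variance.

```latex
f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

In the expression for P(AR=7.6 | HIT), x is 7.6 and μ and σ² are estimated from the action ratings of the HIT rows.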

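Once we have the Gaussian distribution for a feature column, we can get the PDF value for any point, whether or not it is present in our data set. Note that this value is a probability density rather than a probability, but it plays the same role when the classes are compared.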
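A minimal sketch of this step for one column, the action ratings of the HIT rows. It assumes the sample variance (dividing by n - 1); dividing by n is an equally common choice and gives slightly different numbers. The helper `gaussian_pdf` is written for this example.

```python
import math
import statistics

# Action ratings of the rows labelled HIT in the table.
hit_action_ratings = [7.2, 8.5, 7.0, 9.0]

mu = statistics.mean(hit_action_ratings)
# Assumption: sample variance (n - 1 in the denominator).
var = statistics.variance(hit_action_ratings)


def gaussian_pdf(x, mean, variance):
    """Density of a Gaussian with the given mean and variance, evaluated at x."""
    exponent = -((x - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


density_unseen = gaussian_pdf(7.6, mu, var)  # 7.6 is not in the data
density_seen = gaussian_pdf(7.2, mu, var)    # 7.2 is in the data
print(mu, var, density_unseen, density_seen)
```

Both calls return a positive density, so an unseen value such as 7.6 no longer forces the probability to zero.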
IMPORTANT POINTS TO BE NOTED:
- While calculating P(HIT | AR=7.6, SR=5.7), the Gaussian distributions are fitted using only the data points where the output is HIT.
- A separate distribution is fitted for every combination of feature column and target class, so four distributions are used here: AR for HIT, AR for FLOP, SR for HIT, and SR for FLOP.
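Putting the pieces together, the sketch below fits the four distributions and scores the test point for both classes. It is an illustration under stated assumptions, not a reference implementation: it uses the sample variance (n - 1), and the names `fit`, `scores` and `gaussian_pdf` are invented here.

```python
import math
import statistics

# Training rows from the table: (action rating, story rating, label).
rows = [
    (7.2, 5.8, "HIT"), (3.4, 6.3, "FLOP"), (3.5, 7.3, "FLOP"),
    (8.5, 8.0, "HIT"), (6.9, 2.8, "FLOP"), (7.0, 5.3, "HIT"),
    (9.0, 3.8, "HIT"),
]
features = {"AR": 0, "SR": 1}  # feature name -> column index


def gaussian_pdf(x, mean, variance):
    """Density of a Gaussian with the given mean and variance, evaluated at x."""
    exponent = -((x - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


def fit(rows):
    """Return class priors and one (mean, variance) pair per (feature, class)."""
    priors, params = {}, {}
    for label in sorted({r[2] for r in rows}):
        in_class = [r for r in rows if r[2] == label]
        priors[label] = len(in_class) / len(rows)
        for name, col in features.items():
            values = [r[col] for r in in_class]
            # Assumption: sample variance (n - 1 in the denominator).
            params[(name, label)] = (statistics.mean(values),
                                     statistics.variance(values))
    return priors, params


def scores(point, priors, params):
    """Unnormalised posterior: prior times the per-feature densities."""
    result = {}
    for label, prior in priors.items():
        score = prior
        for name, col in features.items():
            mean, variance = params[(name, label)]
            score *= gaussian_pdf(point[col], mean, variance)
        result[label] = score
    return result


priors, params = fit(rows)
unnormalised = scores((7.6, 5.7), priors, params)

# Dividing by the sum restores the denominator that was dropped earlier.
total = sum(unnormalised.values())
posteriors = {label: s / total for label, s in unnormalised.items()}
prediction = max(posteriors, key=posteriors.get)
print(posteriors, prediction)
```

With these assumptions the test point (AR=7.6, SR=5.7) is classified as HIT, mainly because an action rating of 7.6 is far from the FLOP mean of 4.6.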
WHAT TO CHECK? BOX-COX TRANSFORM: AN IMPORTANT TOOL
To apply Gaussian Naive Bayes we assumed that, for every feature, the points come from a Gaussian distribution. But what if that is not the case? Here are a few points to keep in mind:
- You can always plot the distribution of a feature (a dist-plot) and check whether it looks Gaussian.
- Before applying Gaussian Naive Bayes you can use the Box-Cox transform to bring the distribution closer to normal. The transform requires strictly positive values.
- If a column deviates hugely from a Gaussian distribution, you can model it with a different distribution, such as log-normal or power law. Box-Cox with gamma=0 is the log transform, so it maps log-normal data to a normal distribution. The general Box-Cox expression is y = (x^gamma - 1) / gamma for gamma ≠ 0 and y = ln(x) for gamma = 0; the logarithm is the limit of the first form as gamma approaches 0.
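The gamma=0 case can be checked numerically. The sketch below is a hand-written version of the transform for a fixed gamma, with `box_cox` as an illustrative name; library implementations typically also estimate the best gamma from the data, which is not shown here.

```python
import math


def box_cox(x, gamma):
    """Box-Cox transform of a single strictly positive value for a fixed gamma."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if gamma == 0:
        return math.log(x)  # the log transform is the gamma -> 0 limit
    return (x ** gamma - 1) / gamma


# Action ratings from the table, reused purely as sample input.
action_ratings = [7.2, 3.4, 3.5, 8.5, 6.9, 7.0, 9.0]

log_transformed = [box_cox(x, 0) for x in action_ratings]
near_zero = [box_cox(x, 1e-6) for x in action_ratings]

# A very small gamma should give almost the same values as the logarithm.
max_gap = max(abs(a - b) for a, b in zip(log_transformed, near_zero))
print(max_gap)
```

The gap between the two lists is tiny, which shows how the general expression turns into a logarithm as gamma approaches 0.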
With the above points in mind you are ready to use Gaussian Naive Bayes! You can read more about the Box-Cox transform here:
More to come!