How Overtraining / Overfitting a Model Can Lead to Bad Performance on Test / Unseen Data
You have decided to create a machine learning model. There is a certain data set, which you divide into training and testing parts. You have decided which algorithm to use, and now you fit your machine learning model on your training data set (a minimal sketch of this setup appears after the table below). Suppose you used three different models, namely Model1, Model2 and Model3, and fitted them all, and suppose the training accuracies of the three models are as follows:
| Model name | Training accuracy |
| ---------- | ----------------- |
| Model1     | 62%               |
| Model2     | 94%               |
| Model3     | 99.99%            |
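Here is a minimal sketch of that setup in scikit-learn. The data set and the three model choices are placeholders of my own, not the models behind the table; the point is only the train/test split and the training-accuracy comparison.

```python
# A minimal sketch: split the data, fit three candidate models, and compare
# their accuracy on the *training* part only (as the table above does).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# A toy data set, divided into training and testing parts
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "model1": LogisticRegression(max_iter=1000),
    "model2": RandomForestClassifier(max_depth=5, random_state=42),
    "model3": DecisionTreeClassifier(random_state=42),  # an unpruned tree can memorise the data
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "training accuracy:", round(model.score(X_train, y_train), 4))
```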
Now, based on the table above, which model would you think is the best for your use? At first sight one may suggest that the model with 99.99% accuracy is the best, but it is not so. Models that are nearly 100% accurate on the training data often do not perform well on test data.
Overfitting is the condition where the model completely "fits" the training data and provides near-perfect accuracy on the training data set, at the cost of performance on unseen data.
Similarly, when the model has low accuracy even on the training data, it is termed underfitting. According to the above example, Model1 is underfitting, Model3 is overfitting, and Model2 is the one that is best for use. Let's see what this means using a real data set.
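As a numeric illustration first (a minimal sketch on synthetic data that I generate here, not the data set from the figures), polynomial models of increasing degree show the three regimes: a too-simple model underfits, a moderate one fits well, and a very flexible one overfits.

```python
# A minimal sketch of underfitting vs. overfitting: fit polynomials of
# increasing degree and compare training error with test error.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.1, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # roughly: underfit, well fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

Typically the degree-15 model achieves the lowest training error but the highest test error, mirroring Model3 in the table.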

So you can see the different types of models that fit on the training data set. Now suppose we test the models on a new test point, denoted in red in the figure.

Now, if you compute the prediction error for the test point, that is, the absolute difference between the red point's y coordinate and the y coordinate of the fitted curve at that point, you can clearly see that the overfitted model produces a greater error while the other model gives a smaller error. This is the reason for not selecting over-trained models. So how does an algorithm incorporate this while training a model? Or, how do we get rid of overtraining? The solution is regularisation.
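In code, that error computation looks like the following minimal sketch. The red point's coordinates here are hypothetical, and the exact numbers depend on the random draw; the point is only the |y_red - y_fitted| computation itself.

```python
# A minimal sketch of the error computation: |y coordinate of the red test
# point - y coordinate of the fitted curve at that x|, for two fitted models.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.1, size=30)

well_fit = make_pipeline(PolynomialFeatures(4), LinearRegression()).fit(X, y)
overfit = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X, y)

x_red = np.array([[0.83]])           # a hypothetical "red" test point
y_red = np.sin(2 * np.pi * 0.83)     # its true y coordinate

for name, model in [("well fit", well_fit), ("overfit", overfit)]:
    y_hat = model.predict(x_red)[0]
    print(f"{name}: error = {abs(y_red - y_hat):.4f}")
```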
Regularisation refers to having the algorithm minimise (old loss function + some other penalty function). There are many types of regularisation approaches: lasso, ridge and elastic net, to name a few. Regularisation helps avoid overfitting by not allowing the algorithm to learn too-complex models.
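For instance, in scikit-learn these penalised objectives are available directly. A minimal sketch follows; the data and alpha values are illustrative, and scikit-learn's exact loss scaling differs slightly between estimators.

```python
# A minimal sketch of regularised linear models: each minimises, up to scaling,
#   Ridge:      ||y - Xw||^2 + alpha * ||w||_2^2   (L2 penalty)
#   Lasso:      ||y - Xw||^2 + alpha * ||w||_1     (L1 penalty)
#   ElasticNet: a weighted mix of the L1 and L2 penalties
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet

X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=0)

for model in (Ridge(alpha=1.0), Lasso(alpha=1.0), ElasticNet(alpha=1.0, l1_ratio=0.5)):
    model.fit(X, y)
    # A larger alpha shrinks ("squeezes down") the learned weights further
    print(type(model).__name__, "largest |weight|:", round(abs(model.coef_).max(), 3))
```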
Let's discuss a few regularisation techniques.
Every machine learning / deep learning model attains regularisation differently, depending on the math behind the algorithm. Regularisation is hence any mathematical constraint on the loss function / objective function, applied in order to squeeze down the "weights".
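In symbols (my notation, a general form rather than any one algorithm's exact objective), the regularised problem is:

$$\min_{w} \; L(w) + \lambda \, \Omega(w)$$

where L(w) is the original loss, Ω(w) is a penalty such as ‖w‖₁ (lasso) or ‖w‖₂² (ridge), and λ ≥ 0 controls how strongly the weights are squeezed.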
A few techniques are lasso, ridge and elastic net (L1, L2 and combined penalties), dropout in neural networks, the C parameter in soft-margin SVMs, and not setting max_depth too high in decision trees; a sketch of some of these knobs follows.
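Here is a minimal scikit-learn sketch of how those knobs appear in practice. The values are illustrative, not tuned; dropout itself lives in deep learning frameworks such as Keras or PyTorch rather than scikit-learn, so an L2 weight penalty stands in for it here.

```python
# A minimal sketch: the same regularisation idea as hyper-parameters in
# different model families (illustrative values, not tuned).
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Soft-margin SVM: a smaller C means stronger regularisation (more slack allowed)
svm = SVC(C=0.1).fit(X, y)

# Decision tree: capping max_depth keeps the tree from memorising the training set
tree = DecisionTreeClassifier(max_depth=4).fit(X, y)

# Neural network: alpha is an L2 penalty on the weights (a stand-in for dropout,
# which scikit-learn's MLP does not provide)
mlp = MLPClassifier(alpha=0.01, max_iter=1000, random_state=0).fit(X, y)
```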
Keep learning!