# LINEAR REGRESSION

## WHAT IS LINEAR REGRESSION, USES IN MACHINE LEARNING, ALGORITHMS

ONE OF THE MOST COMMON PROBLEMS WE COME ACROSS IN DAILY LIFE IS PREDICTING VALUES SUCH AS THE PRICE OF A COMMODITY, THE AGE OF A PERSON, OR THE NUMBER OF YEARS NEEDED TO MASTER A SKILL. AS A HUMAN, WHAT IS YOUR APPROACH WHEN YOU TRY TO MAKE SUCH PREDICTIONS? WHAT ARE THE PARAMETERS YOU CONSIDER? A HUMAN BRAIN HAS LITTLE DIFFICULTY IN REALISING WHETHER A CERTAIN PROBLEM IS LINEAR OR NOT. SUPPOSE I TELL YOU THAT A CERTAIN CLOTHING COMPANY SELLS 2 CLOTHES FOR 2K BUCKS, 4 CLOTHES FOR 4K BUCKS, BUT 8 CLOTHES FOR 32K BUCKS.

IMMEDIATELY YOUR BRAIN TELLS YOU THAT THE LAST 8 CLOTHES MUST SURELY HAVE BEEN OF A DIFFERENT QUALITY, OR FROM A DIFFERENT BRAND, SETTING THEM APART FROM THE OTHER 2 GROUPS OF CLOTHES. BUT IF I STATE THAT THE LAST 8 CLOTHES WERE FOR 8K BUCKS, YOUR BRAIN SPOTS A LINEAR RELATION ACROSS ALL THREE GROUPS: THE PRICE GROWS BY 1K BUCKS PER PIECE.
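THE SAME GUT CHECK CAN BE SKETCHED IN A FEW LINES OF PYTHON: IF THE SLOPE BETWEEN EVERY PAIR OF CONSECUTIVE POINTS IS THE SAME, THE POINTS LIE ON ONE LINE. THE HELPER NAMES `slopes` AND `is_linear` ARE HYPOTHETICAL, AND THE NUMBERS ARE THE ONES FROM THE CLOTHING EXAMPLE.

```python
def slopes(points):
    """Return the slope between each pair of consecutive (x, y) points."""
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


def is_linear(points, tol=1e-9):
    """True when every consecutive slope matches the first one within tol."""
    s = slopes(points)
    return all(abs(v - s[0]) <= tol for v in s)


# (number of clothes, price in bucks), taken from the example in the text
odd_batch = [(2, 2_000), (4, 4_000), (8, 32_000)]
linear_batch = [(2, 2_000), (4, 4_000), (8, 8_000)]

print(slopes(odd_batch), is_linear(odd_batch))        # [1000.0, 7000.0] False
print(slopes(linear_batch), is_linear(linear_batch))  # [1000.0, 1000.0] True
```

REAL DATA IS RARELY THIS CLEAN, SO AN EXACT CHECK LIKE THIS ONLY WORKS FOR TOY NUMBERS.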

MANY DAILY LIFE PROBLEMS ARE TAGGED AS “LINEAR” OUT OF COMMON SENSE. A FEW EXAMPLES ARE:

1. PRICES OF FLATS ON THE SAME FLOOR OF AN APARTMENT BUILDING WOULD BE LINEARLY PROPORTIONAL TO THE NUMBER OF ROOMS.
2. THE RENT OF A CAR WOULD BE LINEARLY PROPORTIONAL TO THE DISTANCE YOU TRAVEL.

“BY LINEARLY VARYING WE DON’T MEAN THAT, WHEN PLOTTED, ALL THE DATA POINTS WOULD LIE STRICTLY ON A SINGLE LINE, BUT THAT THEY SHOW A TREND WHERE THE DEPENDENT VARIABLE CAN BE VIEWED AS SOME LINEAR FUNCTION OF THE INDEPENDENT VARIABLE PLUS SOME RANDOM NOISE.”
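TO MAKE “LINEAR FUNCTION PLUS RANDOM NOISE” CONCRETE, HERE IS A SMALL SKETCH THAT GENERATES SUCH DATA. EVERYTHING IN IT IS AN ASSUMPTION MADE FOR ILLUSTRATION: THE HELPER NAME `noisy_line`, THE CAR RENTAL RATE OF 12 BUCKS PER KM, THE 50 BUCK BASE FEE, AND THE CHOICE OF GAUSSIAN NOISE.

```python
import random


def noisy_line(xs, m, c, noise_sd, seed=0):
    """Return m*x + c + Gaussian noise for each x (seeded for repeatability)."""
    rng = random.Random(seed)
    return [m * x + c + rng.gauss(0.0, noise_sd) for x in xs]


# Hypothetical car-rental numbers: 12 bucks per km plus a 50 buck base fee
distances = [5, 10, 20, 40, 80]
rents = noisy_line(distances, m=12.0, c=50.0, noise_sd=15.0)

for d, r in zip(distances, rents):
    print(f"{d:>3} km -> {r:7.1f} bucks (noise-free: {12.0 * d + 50.0:6.1f})")
```

THE PRINTED RENTS SCATTER AROUND THE NOISE-FREE VALUES WITHOUT LANDING EXACTLY ON THEM, WHICH IS THE KIND OF TREND THE QUOTE DESCRIBES.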

## THE MATH

YOU ARE PROBABLY AWARE OF THE EUCLIDEAN DISTANCE BETWEEN A STRAIGHT LINE AND THE POINTS THAT DO NOT LIE ON IT. FOR REGRESSION, THE GAP IS MEASURED VERTICALLY, ALONG THE Y AXIS, RATHER THAN PERPENDICULAR TO THE LINE. OUR AIM IS TO FIND A MODEL THAT USES THE PROVIDED DATA TO PREDICT THE DEPENDENT VARIABLE WHEN A CERTAIN VALUE OF THE INDEPENDENT VARIABLE IS GIVEN.

PUTTING IT MATHEMATICALLY,

FOR A GIVEN DATA SET S -> {A:B}, WHERE A IS THE INDEPENDENT VARIABLE AND B IS THE CORRESPONDING DEPENDENT ONE, FIND THE BEST PAIR (M, C) SUCH THAT THE AVERAGE OF THE SQUARED DIFFERENCES BETWEEN EVERY B AND THE CORRESPONDING Y ON THE LINE Y = MA + C IS MINIMISED, WHERE THE AVERAGE IS TAKEN OVER THE NUMBER OF POINTS.
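AS A SKETCH, THE STATEMENT ABOVE CAN BE WRITTEN AS A FORMULA. THE INDEX $t$ AND THE COUNT $N$ ARE NOTATION ASSUMED HERE, WITH $(a_t, b_t)$ BEING THE $t$-TH PAIR IN S:

$$
(m^{*}, c^{*}) = \arg\min_{m,\,c} \; \frac{1}{N} \sum_{t=1}^{N} \bigl(b_t - (m\,a_t + c)\bigr)^2
$$

THE QUANTITY BEING MINIMISED IS EASY TO COMPUTE FOR ANY CANDIDATE LINE. THE HELPER NAME `mean_squared_error` IS HYPOTHETICAL, AND THE DATA REUSES THE CLOTHING PRICES FROM EARLIER.

```python
def mean_squared_error(pairs, m, c):
    """Average squared vertical gap between each b and the line y = m*a + c."""
    return sum((b - (m * a + c)) ** 2 for a, b in pairs) / len(pairs)


pairs = [(2, 2_000), (4, 4_000), (8, 8_000)]

# The line y = 1000*a passes through every point, so the average is zero.
print(mean_squared_error(pairs, m=1_000, c=0))  # 0.0
# A slightly flatter candidate leaves gaps of 200, 400 and 800 bucks.
print(mean_squared_error(pairs, m=900, c=0))    # 280000.0
```

FINDING THE BEST (M, C) THEN AMOUNTS TO SEARCHING FOR THE PAIR THAT MAKES THIS NUMBER AS SMALL AS POSSIBLE.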

## THE LOSS FUNCTION

NOW WE KNOW WHAT WE NEED TO MINIMISE. THIS PARTICULAR QUANTITY IS TERMED THE “LOSS FUNCTION”. IT IS A MEASURE OF HOW WELL YOUR MODEL FITS THE TRAINING DATA, WITH LOWER VALUES MEANING A CLOSER FIT. LET’S SEE WHAT SOME OF THE COMMONLY USED ERROR FUNCTIONS LOOK LIKE:
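THREE ERROR FUNCTIONS THAT ARE COMMONLY USED FOR REGRESSION, GIVEN HERE AS AN ASSUMED SELECTION RATHER THAN AN EXHAUSTIVE LIST, ARE THE MEAN ABSOLUTE ERROR (MAE), THE MEAN SQUARED ERROR (MSE) AND THE ROOT MEAN SQUARED ERROR (RMSE):

$$
\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \lvert e_t \rvert
\qquad
\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} e_t^{2}
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} e_t^{2}}
$$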

IN THESE ERROR FUNCTIONS, $e_t$ REFERS TO THE DIFFERENCE BETWEEN THE Y COORDINATE OF A CERTAIN DATA POINT AND THE PREDICTED Y VALUE FOR THE SAME POINT, AND $N$ IS THE TOTAL NUMBER OF DATA POINTS.

WE CONSIDER DISTANCES (SQUARED OR ABSOLUTE DIFFERENCES) SO THAT POSITIVE AND NEGATIVE COORDINATE DIFFERENCES DO NOT CANCEL OUT. ANOTHER REGULARLY USED LOSS FUNCTION IS RMSLE (ROOT MEAN SQUARED LOGARITHMIC ERROR).
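A MINIMAL SKETCH OF BOTH MEASURES SIDE BY SIDE, ASSUMING THE COMMON DEFINITION OF RMSLE AS THE ROOT OF THE MEAN SQUARED DIFFERENCE BETWEEN $\log(1 + \text{PREDICTED})$ AND $\log(1 + \text{ACTUAL})$. THE FUNCTION NAMES AND THE SAMPLE VALUES ARE MADE UP FOR ILLUSTRATION.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def rmsle(actual, predicted):
    """Root mean squared logarithmic error, using log(1 + v) so zeros are allowed."""
    n = len(actual)
    return math.sqrt(
        sum((math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)) / n
    )


# Made-up values: the last prediction misses a very large target by a wide margin
actual = [10, 100, 1_000, 100_000]
predicted = [12, 90, 1_100, 50_000]

print(f"RMSE  = {rmse(actual, predicted):.2f}")
print(f"RMSLE = {rmsle(actual, predicted):.4f}")
```

THE RMSE IS DOMINATED BY THE SINGLE LARGE MISS, WHILE THE RMSLE STAYS SMALL BECAUSE IT COMPARES VALUES ON A LOG SCALE, THAT IS, BY RATIO RATHER THAN BY ABSOLUTE GAP. NOTE THAT THIS FORM REQUIRES ALL VALUES TO BE GREATER THAN -1.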

## RMSE VS RMSLE

THE L IN RMSLE STANDS FOR “LOGARITHMIC”. IT IS PREFERRED WHEN CERTAIN DATA POINTS VARY EXPONENTIALLY, SINCE TAKING THE LOG FIRST SUBSTANTIALLY REDUCES THE EFFECT OF A POSSIBLE OUTLIER.

BELOW IS A REPRESENTATION SUMMING UP HOW THE SCENARIO LOOKS. THE DATA POINTS ARE IN BLUE, WITH THE BEST FIT LINE PASSING THROUGH THEM. NOTE HOW YOU CAN SEE A “LINEAR RELATION” BETWEEN THE DATA POINTS. THIS ABILITY TO “SEE” A DATA SET’S BEHAVIOUR COMES NATURALLY TO HUMANS, AND USING THIS INTUITION WE CHOSE TO FIND A “BEST FIT LINE” RATHER THAN A PARABOLA OR ANY OTHER CURVE. SUCH A PREFERENCE FOR A CERTAIN FAMILY OF MODELS, BASED ON OUR PRIOR ASSUMPTIONS ABOUT THE DATA, IS CALLED “INDUCTIVE BIAS”.

## SOME MORE MATH

SUPPOSE THAT FOR A SET -> {X:Y} WE WANT TO CALCULATE THE ESTIMATE $\hat{y}$ (Y HAT) AS SOME FUNCTION OF X. (REMEMBER, A HAT ON TOP OF ANY VARIABLE MEANS IT IS AN ESTIMATE, NOT THE REAL VALUE. WE ALWAYS MAKE MODELS THAT TRY TO ESTIMATE A FUNCTION WHICH IS, IN THEORY, UNKNOWN.) IN THE CASE OF LINEAR REGRESSION THIS CAN BE REPRESENTED BY THE SECOND EQUATION IN THE FIGURE:
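A PLAUSIBLE WAY TO WRITE THE TWO EQUATIONS, ASSUMING THE USUAL NOTATION WITH INTERCEPT $\beta_0$ AND SLOPE $\beta_1$ (THE SYMBOL $\hat{f}$ IS AN ASSUMED NAME FOR THE ESTIMATED FUNCTION), IS THE GENERAL ESTIMATE FOLLOWED BY ITS LINEAR SPECIAL CASE:

$$
\hat{y} = \hat{f}(x)
$$

$$
\hat{y} = \beta_0 + \beta_1 x
$$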

NOW, SUBSTITUTING THIS LINEAR ESTIMATE INTO THE RMSE LOSS FUNCTION, WE GET:
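WRITTEN OUT, ASSUMING THE LINEAR FORM $\hat{y}_t = \beta_0 + \beta_1 x_t$ AND $N$ DATA POINTS $(x_t, y_t)$, THE SUBSTITUTED LOSS WOULD READ:

$$
\mathrm{RMSE}(\beta_0, \beta_1) = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \bigl(y_t - \beta_0 - \beta_1 x_t\bigr)^{2}}
$$

EACH TERM IN THE SUM IS THE SQUARED ERROR $e_t^{2}$, WITH $e_t = y_t - \hat{y}_t$.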

SO THE ABOVE EQUATION IS WHAT WE WANT TO MINIMISE. DIFFERENTIATING IT W.R.T. BETA 0 AND BETA 1 AND EQUATING EACH DERIVATIVE TO ZERO, WE GET THE FOLLOWING RESULTS:
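A SKETCH OF THAT STEP, UNDER THE ASSUMPTION THAT WE MINIMISE THE SUM OF SQUARES $J$ INSTEAD OF THE RMSE ITSELF (THE SQUARE ROOT AND THE FACTOR $1/N$ DO NOT MOVE THE LOCATION OF THE MINIMUM):

$$
J(\beta_0, \beta_1) = \sum_{t=1}^{N} \bigl(y_t - \beta_0 - \beta_1 x_t\bigr)^{2}
$$

$$
\frac{\partial J}{\partial \beta_0} = -2 \sum_{t=1}^{N} \bigl(y_t - \beta_0 - \beta_1 x_t\bigr) = 0
$$

$$
\frac{\partial J}{\partial \beta_1} = -2 \sum_{t=1}^{N} x_t \bigl(y_t - \beta_0 - \beta_1 x_t\bigr) = 0
$$

THESE TWO CONDITIONS ARE OFTEN CALLED THE NORMAL EQUATIONS.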

THE FOLLOWING ARE THE VALUES OF THE VARIABLES WE NEEDED TO FIND:
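FOR A SINGLE INPUT VARIABLE, THE STANDARD CLOSED-FORM LEAST-SQUARES SOLUTION IS SHOWN HERE, WITH $\bar{x}$ AND $\bar{y}$ DENOTING THE MEANS OF THE X AND Y VALUES (ASSUMED NOTATION):

$$
\beta_1 = \frac{\sum_{t=1}^{N} (x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^{N} (x_t - \bar{x})^{2}},
\qquad
\beta_0 = \bar{y} - \beta_1 \bar{x}
$$

A MINIMAL PYTHON SKETCH OF THE SAME FORMULAS FOLLOWS. THE FUNCTION NAME `fit_line` AND THE SAMPLE DATA ARE HYPOTHETICAL.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit; returns (beta0, beta1) for y ~ beta0 + beta1*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator and denominator of the slope formula
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Made-up points that roughly follow y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

beta0, beta1 = fit_line(xs, ys)
print(f"y_hat = {beta0:.3f} + {beta1:.3f} * x")
```

WITH MORE THAN ONE INPUT VARIABLE THE SAME IDEA CARRIES OVER, BUT THE SUMS ARE USUALLY REPLACED BY MATRIX OPERATIONS.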

THERE ARE VARIOUS WAYS TO IMPROVE THE ABOVE MODEL SO THAT IT AVOIDS OVERFITTING. REGULARISATION TECHNIQUES LIKE RIDGE REGRESSION, LASSO AND ELASTIC NET (A COMBINATION OF BOTH) PENALISE MODELS THAT TEND TO OVERFIT. THIS IS DONE USING LOSS FUNCTIONS DIFFERENT FROM THE ONES WE HAVE USED HERE: THE DIFFERENCE ARISES FROM ADDING EXTRA PENALTY TERMS TO THE LOSS FUNCTION ALREADY DISCUSSED.
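FOR THE ONE-VARIABLE CASE, AND WRITING $J$ FOR THE SUM OF SQUARED ERRORS, THE PENALISED LOSSES ARE COMMONLY WRITTEN AS FOLLOWS. THE PENALTY WEIGHTS $\lambda$, $\lambda_1$, $\lambda_2$ ARE ASSUMED NOTATION, AND LEAVING THE INTERCEPT UNPENALISED IS A COMMON CONVENTION RATHER THAN A RULE:

$$
\text{RIDGE: } J + \lambda \beta_1^{2}
\qquad
\text{LASSO: } J + \lambda \lvert \beta_1 \rvert
\qquad
\text{ELASTIC NET: } J + \lambda_1 \lvert \beta_1 \rvert + \lambda_2 \beta_1^{2}
$$

AS A SKETCH, THE RIDGE VERSION STILL HAS A CLOSED FORM FOR ONE VARIABLE: THE PENALTY SIMPLY ADDS $\lambda$ TO THE DENOMINATOR OF THE SLOPE. THE FUNCTION NAME `fit_ridge_line` AND THE PARAMETER NAME `lam` ARE HYPOTHETICAL.

```python
def fit_ridge_line(xs, ys, lam):
    """Fit y ~ beta0 + beta1*x by minimising squared error + lam * beta1**2.

    The intercept beta0 is left unpenalised (an assumed, common convention).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx + lam == 0:
        raise ValueError("slope is undefined: identical x values and lam == 0")
    beta1 = sxy / (sxx + lam)  # lam > 0 shrinks the slope towards zero
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Made-up points that roughly follow y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

for lam in (0, 1, 10, 100):
    beta0, beta1 = fit_ridge_line(xs, ys, lam)
    print(f"lam={lam:>3}: y_hat = {beta0:.3f} + {beta1:.3f} * x")
```

WITH `lam = 0` THIS REDUCES TO THE ORDINARY LEAST-SQUARES FIT, AND LARGER VALUES PULL THE SLOPE TOWARDS ZERO. LASSO AND ELASTIC NET HAVE NO EQUALLY SIMPLE FORMULA IN GENERAL AND ARE USUALLY SOLVED ITERATIVELY.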