NVIDIA CUDA GPU

Deep Learning in Games: The GPU's Role

How NVIDIA predicted the future of AI ten years ago, and how its GPUs are playing a key role in the development of deep learning in games.

You must be familiar with the CPU (central processing unit). And if you are either a gaming enthusiast or someone who keeps up with AI advancements, chances are high that you have heard of GPUs. They have emerged as a boon for AI development, and they are certainly doing their part in gaming and graphics advancements. With the increasing use of deep learning in games, we might see really different versions of games by the end of this decade. Smart and intelligent ones!

Let's answer the basic question first.

What is a GPU?

GPU stands for graphics processing unit. The core principle on which a GPU works is "divide and conquer"; in technical terms we call it parallel computing. This is what makes a GPU so good at what it does, in contrast to a CPU, which can do only a handful of calculations at once. But how does a GPU do so? Well, the answer seems obvious!
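As a rough analogy for the "divide and conquer" idea (this is ordinary Python, not GPU code), here is a minimal sketch contrasting an element-by-element loop with a single bulk operation that NumPy can carry out over many elements at once; the array size is an arbitrary choice for the example.

```python
import time
import numpy as np

N = 1_000_000
a = np.random.rand(N)
b = np.random.rand(N)

# "One at a time" approach: a plain Python loop over every element.
start = time.perf_counter()
out_loop = [a[i] + b[i] for i in range(N)]
print("element-by-element loop:", time.perf_counter() - start, "s")

# "Divide and conquer" spirit: express the whole problem as one bulk
# operation and let the backend process many elements at once.
start = time.perf_counter()
out_vec = a + b
print("single bulk operation:  ", time.perf_counter() - start, "s")
```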

Number of Cores in a GPU

If you are reading this on a laptop, or if you own one, you must have heard terms like dual-core, quad-core and octa-core. A CPU consists of a few cores, which can handle activities and support multiprocessing. In contrast, a GPU consists of hundreds of cores, which allows it to perform parallel computing by breaking a problem into subproblems.
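To make "breaking a problem into subproblems" concrete, here is a small Python sketch that splits a large sum across however many CPU cores are available, using the standard multiprocessing module; a GPU applies the same splitting idea across far more (much simpler) cores.

```python
from multiprocessing import Pool, cpu_count

def partial_sum(bounds):
    """Sum the integers in [start, stop) -- one subproblem."""
    start, stop = bounds
    return sum(range(start, stop))

if __name__ == "__main__":
    n = 10_000_000
    workers = cpu_count()

    # Divide: carve the range [0, n) into one chunk per core.
    step = n // workers
    chunks = [(i * step, n if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]

    # Conquer: each core sums its own chunk in parallel, then we combine.
    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, chunks))

    print(total == n * (n - 1) // 2)  # sanity check against the closed form
```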

GPU vs CPU

So how does this help in training AI / deep learning models?

I hope you are familiar with neural networks; if not, I suggest reading this first and then coming back to understand better. It's easy to see how the number of calculations needed per cycle of backpropagation / feedforward increases drastically with the complexity of the network (more neurons, more hidden layers and so on). The number of calculations soon rises to millions. Training such a model on a CPU would be a test of patience. GPUs, because of their ability to break up the problem, use parallel computing to speed up the process dramatically.
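As a hedged sketch (assuming PyTorch is installed and a CUDA-capable NVIDIA GPU is present), the snippet below times the same large matrix multiplication, the workhorse of feedforward and backpropagation, on the CPU and on the GPU; the matrix size is arbitrary.

```python
import time
import torch

size = 4096
a = torch.randn(size, size)
b = torch.randn(size, size)

# CPU timing
start = time.perf_counter()
c_cpu = a @ b
print("CPU matmul:", time.perf_counter() - start, "s")

# GPU timing (only if a CUDA device is available)
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()          # make sure the copies have finished
    start = time.perf_counter()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()          # wait for the kernel before stopping the clock
    print("GPU matmul:", time.perf_counter() - start, "s")
```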

GPUs in Gaming, Computer Vision and Supercomputing

I think by now you can see how GPUs would outperform CPUs in these applications. By the end of the next decade, AI will likely affect games, healthcare and many other domains of computer science. Currently NVIDIA leads the GPU market, and NVIDIA graphics cards are fairly popular among gamers. The games that claim to use "AI" today are in reality using algorithms for optimising enemy behaviour; they work on techniques like finite state machines. But soon we expect things to change. Till then, game on!
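For context, here is a minimal Python sketch of the kind of finite state machine such games rely on: an enemy that switches between patrolling, chasing and attacking based on its distance to the player. The state names and thresholds are made up for illustration.

```python
def next_state(state, distance_to_player):
    """A tiny hand-written finite state machine for an enemy NPC."""
    if state == "patrol":
        return "chase" if distance_to_player < 20 else "patrol"
    if state == "chase":
        if distance_to_player < 2:
            return "attack"
        return "patrol" if distance_to_player > 30 else "chase"
    if state == "attack":
        return "chase" if distance_to_player >= 2 else "attack"
    return "patrol"

# The enemy reacts with fixed, hand-written rules -- no learning involved.
state = "patrol"
for d in [25, 15, 5, 1, 1, 10, 40]:
    state = next_state(state, d)
    print(d, "->", state)
```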


Linear Regression

What is linear regression? Its uses in machine learning, and the algorithms behind it.

One of the most common problems we come across in daily life is predicting values like the price of a commodity, the age of a person, the number of years needed to master a skill, etc. As a human, what is your approach when you try to make such predictions? What are the parameters you consider? A human brain has no difficulty in realising whether a certain problem is linear or not. Suppose I tell you that a certain clothing company sells 2 clothes for 2k bucks, 4 clothes for 4k bucks, but 8 clothes for 32k bucks.

Immediately your brain tells you that the last 8 clothes must surely have been of a different quality, or from a different brand, making them different from the other two groups. But if I state instead that the last 8 clothes were for 8k bucks, your brain signals a linear relation that it can make out of these numbers.

Many daily-life problems are tagged as "linear" out of common sense. A few examples are:

  1. The prices of flats on the same floor of an apartment building would be linearly proportional to the number of rooms.
  2. The rent of a car would be linearly proportional to the distance you travel.

"By linearly varying we don't mean that, when plotted, all the data points would strictly pass through a single line, but that they show a trend where the dependent variable can be viewed as some linear function of the independent variable plus some random noise."
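A minimal NumPy sketch of that definition: the dependent variable below is an exact linear function of the independent one plus random noise; the slope, intercept and noise level are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.linspace(0, 10, 50)               # independent variable
noise = rng.normal(0, 1.5, size=x.size)  # random noise
y = 3.0 * x + 2.0 + noise                # "linear trend + some random noise"

# The points will not all lie exactly on the line y = 3x + 2,
# but they follow its trend.
print(y[:5])
```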

The Math

You must be aware of the Euclidean distance between a straight line and points that do not lie on it. Our aim is to find a model that uses the data that has been provided to make predictions of the dependent variable whenever a certain value of the independent variable is provided.

Putting it mathematically:

For a given data set S → {A : B}, where A is the independent variable and B is the corresponding dependent one, find the best pair (m, c) such that the average of the sum of squares of the differences between every B and the corresponding y on the line y = mA + c is minimised, where the average is taken over the number of points.
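In code, the quantity being minimised looks like the small helper below, where `A` and `B` hold the independent and dependent values and `(m, c)` is a candidate line; the sample data is made up for illustration.

```python
import numpy as np

def average_squared_error(m, c, A, B):
    """Mean of squared vertical distances between each point and the line y = m*A + c."""
    predictions = m * A + c
    return np.mean((B - predictions) ** 2)

A = np.array([1.0, 2.0, 3.0, 4.0])
B = np.array([2.1, 3.9, 6.2, 8.1])
print(average_squared_error(2.0, 0.0, A, B))   # a good candidate line
print(average_squared_error(0.5, 1.0, A, B))   # a much worse one
```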

The Loss Function

Now we know what we need to minimise; this particular quantity is termed the "loss function". It is a measure of how well your model fits the training data. Let's see what some of the commonly used error functions look like:

MAE = (1/N) Σ |e_t|        MSE = (1/N) Σ e_t²        RMSE = √( (1/N) Σ e_t² )

where e_t refers to the difference between the y coordinate of a certain data point and the predicted y value for the same point, and N is the total number of data points.

We consider squared (or absolute) differences so that positive and negative coordinate differences do not cancel out. Another regularly used loss function is RMSLE (root mean squared logarithmic error):

RMSLE = √( (1/N) Σ ( log(ŷ_i + 1) − log(y_i + 1) )² )

where y_i and ŷ_i are the actual and the predicted values respectively.
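Here is a short NumPy sketch of the loss functions above (MAE, MSE, RMSE and RMSLE); the `+ 1` inside the logs (via `np.log1p`) is the usual guard against log(0), and the toy arrays are made up to show the effect of an outlier.

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    return np.sqrt(mse(y, y_hat))

def rmsle(y, y_hat):
    # Log first, then RMSE: large outliers are damped by the logarithm.
    return np.sqrt(np.mean((np.log1p(y_hat) - np.log1p(y)) ** 2))

y     = np.array([10.0, 20.0, 30.0, 1000.0])   # last point is an outlier
y_hat = np.array([12.0, 18.0, 33.0,  300.0])

print(rmse(y, y_hat))    # dominated by the outlier
print(rmsle(y, y_hat))   # far less sensitive to it
```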

RMSE vs RMSLE

The L in RMSLE stands for "logarithmic". This loss is preferred when certain data points vary exponentially; taking the log first substantially reduces the effect of a possible outlier. Below is a representation summing up how the scenario looks: the data points are in blue, with the best-fit line passing through them. Note how you can "see" a linear relation between the data points. Such a capability of "seeing" a data set's behaviour is limited to humans, and using this intuition we chose to find a "best-fit line" rather than a parabola or any other curve. Such a preference for a certain class of models, built into the learner's assumptions, is called "inductive bias".

[Figure: linear regression — data points in blue with the best-fit line passing through them]

Some More Math

Suppose for a set → {X : Y} we want to calculate the estimated function ŷ as some function of X. (Remember, a hat on top of any variable means it is an estimate, not the real value; we always build models that try to estimate a function which is, in theory, unknown.) In the case of linear regression this can be represented as:

ŷ = β₀ + β₁x

Now, substituting ŷ into the RMSE loss function, we get:

L(β₀, β₁) = √( (1/N) Σ ( y_i − (β₀ + β₁x_i) )² )

Given the above equation, this is what we want to minimise. Differentiating it with respect to β₀ and β₁ and setting the derivatives to zero, we get the following values for the variables we needed to find:

β₁ = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²

β₀ = ȳ − β₁x̄

where x̄ and ȳ are the means of the x and y values respectively.
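A small NumPy sketch of those closed-form results, cross-checked against `np.polyfit` on made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=x.size)

x_bar, y_bar = x.mean(), y.mean()

# Closed-form least-squares estimates derived above.
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar
print(beta1, beta0)

# Cross-check with NumPy's own degree-1 least-squares fit.
print(np.polyfit(x, y, 1))   # returns [slope, intercept]
```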

There are various ways we can regularise the above model to avoid overfitting. Regularisation techniques like ridge regression, lasso and elastic net (a combination of both) are used, where we penalise models that tend to overfit. This is done using loss functions different from the ones we have used here; the difference arises from introducing additional penalty terms into the loss function already discussed.
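As a hedged sketch (assuming scikit-learn is installed), the regularised variants mentioned above are available off the shelf; the penalty strength `alpha` and the toy data below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))                     # 5 features
true_coefs = np.array([3.0, 0.0, 0.0, -2.0, 0.5]) # only some features matter
y = X @ true_coefs + rng.normal(0, 0.5, size=100)

for model in (LinearRegression(),
              Ridge(alpha=1.0),         # L2 penalty on the coefficients
              Lasso(alpha=0.1),         # L1 penalty, drives some coefficients to zero
              ElasticNet(alpha=0.1)):   # mix of L1 and L2 penalties
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
```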

Do leave comments / queries!