
GAUSSIAN NAIVE BAYES: CONTINUOUS FEATURES

There are different variants of Naive Bayes, such as Bernoulli and Gaussian. This article assumes you are familiar with the basic idea behind Naive Bayes and how it works on categorical data.

Here we discuss one of the approaches used for handling continuous features in Naive Bayes.

Suppose we have the following dataset, where the target variable is whether a movie is a hit or a flop, and the feature variables are its action rating and story rating (each a number between 1 and 10):

ACTION RATING (AR)    STORY RATING (SR)    HIT/FLOP
7.2                   5.8                  HIT
3.4                   6.3                  FLOP
3.5                   7.3                  FLOP
8.5                   8.0                  HIT
6.9                   2.8                  FLOP
7.0                   5.3                  HIT
9.0                   3.8                  HIT

Now suppose we have a test point: ACTION RATING = 7.6, STORY RATING = 5.7. These are the probabilities we need to estimate:

P(HIT| AR= 7.6, SR=5.7) and P(FLOP| AR=7.6, SR=5.7)

Let's start by considering the first probability expression.

But no such point is present in the dataset, so should we set this probability to zero? And similarly for the second expression? That would mean any unseen point would drive both probabilities to zero. So how do we resolve this issue? Let's get there.

GAUSSIAN DISTRIBUTION

There are three terms that need to be evaluated on the right-hand side of the expression below:

P(HIT | AR=7.6, SR=5.7) ∝ P(AR=7.6 | HIT) * P(SR=5.7 | HIT) * P(HIT)

(The denominator P(AR=7.6, SR=5.7) is the same for both classes, so we only need to compare these products.)

The P(HIT) calculation is straightforward: it is simply (number of HITs) / (total number of HITs and FLOPs), which here is 4/7.

For calculating the two remaining conditional probabilities, we assume that the values in each feature column are sampled from a Gaussian distribution whose mean and variance are estimated from the sample points. To recall, this is what a Gaussian distribution looks like:
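In formula form, a Gaussian with mean \(\mu\) and variance \(\sigma^2\) has density

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\,\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \]

and the two class-conditional terms above are obtained by evaluating this density at the test values.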

Once we have a Gaussian distribution for a feature column, we can evaluate its pdf at any point, whether or not that point is present in our dataset.

IMPORTANT POINTS TO BE NOTED:

  1. While calculating P(HIT | AR=7.6, SR=5.7), the Gaussian distributions are fit using only the data points where the output is HIT.
  2. A separate distribution is fit for every (feature column, target class) pair, so here there will be 4 distributions, built from: AR for HIT, AR for FLOP, SR for HIT, and SR for FLOP (a worked sketch of the full computation follows below).
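To make the recipe concrete, here is a minimal sketch of the whole calculation in Python, using scipy.stats.norm for the Gaussian pdf and the dataset from the table above (a library such as scikit-learn's GaussianNB would give the same kind of result):

```python
import numpy as np
from scipy.stats import norm

# Dataset from the table above: (action rating, story rating, label)
data = [
    (7.2, 5.8, "HIT"), (3.4, 6.3, "FLOP"), (3.5, 7.3, "FLOP"),
    (8.5, 8.0, "HIT"), (6.9, 2.8, "FLOP"), (7.0, 5.3, "HIT"),
    (9.0, 3.8, "HIT"),
]

def class_stats(label):
    """Mean and std of AR and SR over the rows belonging to one class."""
    rows = np.array([(ar, sr) for ar, sr, y in data if y == label])
    return rows.mean(axis=0), rows.std(axis=0)   # (mean_AR, mean_SR), (std_AR, std_SR)

def score(label, ar, sr):
    """Unnormalised posterior: P(AR|class) * P(SR|class) * P(class)."""
    (m_ar, m_sr), (s_ar, s_sr) = class_stats(label)
    prior = sum(y == label for *_, y in data) / len(data)
    return norm.pdf(ar, m_ar, s_ar) * norm.pdf(sr, m_sr, s_sr) * prior

test_ar, test_sr = 7.6, 5.7
scores = {c: score(c, test_ar, test_sr) for c in ("HIT", "FLOP")}
print(scores, "->", max(scores, key=scores.get))
```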

WHAT TO CHECK? BOX-COX TRANSFORM: AN IMPORTANT TOOL

For applying Gaussian Naive Bayes we assumed that, within any feature, the points come from a GAUSSIAN DISTRIBUTION. But what if that is not the case? Here are a few points to keep in mind:

  1. You can always plot a distribution plot (e.g., a histogram or KDE) and check whether the feature looks Gaussian.
  2. Before applying Gaussian Naive Bayes you can use the Box-Cox transform to make the distribution closer to normal.
  3. If a column deviates hugely from a Gaussian, you can model it with a different distribution instead, such as log-normal or power-law (note that Box-Cox with λ = 0 is simply the log transform, which turns log-normally distributed data into Gaussian data). Below is the general expression used in the Box-Cox transform; you can see how λ = 0 turns it into a log transform.
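The general Box-Cox expression, for positive y and parameter λ, is

\[ y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda}-1}{\lambda} & \lambda \neq 0 \\[4pt] \log y & \lambda = 0 \end{cases} \]

and here is a minimal sketch of applying it with scipy (scipy.stats.boxcox also picks λ by maximum likelihood when you do not pass one; the data here is made up):

```python
import numpy as np
from scipy.stats import boxcox

# Skewed, strictly positive data (Box-Cox requires positive values)
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.8, size=1000)

x_transformed, best_lambda = boxcox(x)   # lambda chosen by maximum likelihood
print("fitted lambda:", best_lambda)     # close to 0 for log-normal data
```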

With the above points in mind you are ready to use Gaussian Naive Bayes! You can read more about the Box-Cox transform here:

More to come!


THE WORD2VEC MODEL (CBOW /SKIP-GRAM)

Understanding word embeddings, semantics, and the Word2vec model's training methods, skip-gram and CBOW.

WHY EMBEDDINGS?

Remember, machines can only operate on numbers. Approaches like bag-of-words and TF-IDF provide a way of converting text to numbers, but not without drawbacks. Here are a few:

  1. As the dimensionality (vocabulary size) increases, the representation becomes memory inefficient.
  2. Since all the word representations are orthogonal, there is no semantic relation between words. That is, the amount of separation between words like "apple" and "orange" is the same as between "apple" and "door".

Therefore the semantic relation between words, which is really valuable information, is lost. We need techniques which, along with converting text to numbers, keep space efficiency in mind and also preserve the semantic relations among words.

Embeddings are used in many deep learning NLP models, like seq2seq models, transformers, etc.

WORD2VEC

Before understanding how it does the job we discussed above, let's see what it is actually composed of.

Word2vec is a shallow, two-layer neural network that generates word embeddings of a chosen dimension from a given text corpus.

So essentially word2vec helps us create a numerical mapping of words into a vector space; remember, this dimensionality can be much smaller than the vocabulary size.

So what does the word2vec model try to do?

It tries to produce similar embeddings for words that occur in similar contexts.

It tries to achieve this using two algorithms:

  1. CBOW (continuous bag of words)
  2. Skip-gram

CBOW and SKIP-GRAM

CBOW: a model that tries to predict the target word from the given context words.

CBOW ALGORITHM REPRESENTATION

SKIP-GRAM: a model that tries to predict the context words from the given target word.

Here is how we define the loss function: it is built from the conditional probabilities p(w_{t+j} | w_t) for skip-gram and p(w_t | w_{t+j}) for CBOW:

Let's break it down. Suppose we define the "context" to be a window of m words around the target. We iterate over all such windows from the beginning of our corpus and try to maximize the conditional probability of the context words occurring given the target word (or vice versa for CBOW).
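Written out in the standard notation, the skip-gram objective (the average negative log-likelihood over a corpus of T words, with window size m and model parameters θ) is

\[ J(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-m \le j \le m \\ j \neq 0}} \log p\left(w_{t+j} \mid w_t;\ \theta\right) \]

CBOW uses the same idea with the roles reversed: it maximizes \(\log p(w_t \mid \text{context words})\).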

THE WORKING

  1. Let's say your vocabulary size is V.
  2. We one-hot encode all the words.
  3. We choose our window size as C.
  4. We decide the embedding dimension, say N. Our hidden layer will have N neurons.
  5. The output is a softmax layer of dimension V.
  6. With these things in place we create the architecture below.
CBOW WORD2VEC ARCHITECTURE

We perform the training and update the weights. After training, the weight matrix between the hidden layer and the output softmax is our word embedding matrix. Note that its dimension is N (our selected lower dimension) × V (the vocabulary size); hence it contains the N-dimensional vector space representation of every word in the vocabulary.
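To make the shapes concrete, here is a toy numpy sketch of one CBOW forward pass (no training loop, negative sampling or real corpus; the tiny vocabulary and all variable names are made up for illustration):

```python
import numpy as np

# Toy vocabulary and dimensions
vocab = ["the", "cat", "sat", "on", "mat"]
V, N = len(vocab), 3                          # vocab size, embedding dimension
word2idx = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, N))     # input -> hidden weights
W_out = rng.normal(scale=0.1, size=(N, V))    # hidden -> softmax weights

def one_hot(word):
    v = np.zeros(V)
    v[word2idx[word]] = 1.0
    return v

def cbow_forward(context_words, target_word):
    """Average the context word vectors, project to a softmax over the vocab."""
    h = np.mean([one_hot(w) @ W_in for w in context_words], axis=0)  # (N,)
    scores = h @ W_out                                               # (V,)
    probs = np.exp(scores) / np.exp(scores).sum()                    # softmax
    loss = -np.log(probs[word2idx[target_word]])                     # cross-entropy
    return probs, loss

probs, loss = cbow_forward(["the", "sat"], "cat")   # predict "cat" from its neighbours
print(loss)

# The post above takes the hidden->output matrix (N x V) as the embedding matrix;
# multiplying it by a word's one-hot vector picks out that word's N-dimensional vector.
print(W_out @ one_hot("cat"))
```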

The working of skip-gram is just the opposite: there we use one word to predict the context words.

SKIP-GRAM WORD2VEC ARCHITECTURE

Now the final step: to obtain the word vector of any word, we take the embedding matrix that we trained and multiply it by the word's one-hot encoded representation.

That's it!

It's always helpful to remember the loss function mentioned above in interviews!


PROBLEMS IN ENCODER-DECODER MODELS

If you know the basic architecture of an encoder-decoder model, you will recognize the picture below:

where "C" is the context vector.

In short, this is how it works:

The encoder compresses all the information of the input sentence into one vector (C), which is passed to the decoder, which uses it to decode the output. Pretty simple!
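A toy numpy sketch of this bottleneck (not a real translation model; the weights are random and the names and sizes are made up): however long the input, the encoder's output is one fixed-size vector C.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8                                            # size of the context vector C

W_x = rng.normal(scale=0.5, size=(HIDDEN, HIDDEN))    # input-to-hidden weights
W_h = rng.normal(scale=0.5, size=(HIDDEN, HIDDEN))    # hidden-to-hidden weights

def encode(token_embeddings):
    """Toy RNN encoder: however long the input, everything gets squeezed
    into a single fixed-size context vector C - this is the bottleneck."""
    h = np.zeros(HIDDEN)
    for x in token_embeddings:
        h = np.tanh(W_x @ x + W_h @ h)
    return h                                          # the context vector C

short_sentence = rng.normal(size=(5, HIDDEN))         # 5 "words"
long_story = rng.normal(size=(500, HIDDEN))           # 500 "words"
print(encode(short_sentence).shape, encode(long_story).shape)   # both (8,): same size C
```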

Now let's try to analyze how this works with an analogy. We build the intuition first, then we can jump into the mathematics.

Suppose we are playing a game. There are three children, namely Adam, Daniel and Vesper (I hope you got the James Bond reference! LOL). The game is that Daniel tells a story to Adam, who in turn has to explain the same story to Vesper.

But there is a condition! Adam has a fixed amount of time, let's say T1, allotted for every story.

Now suppose that T1 = 2 minutes.

The first story Daniel tells is about how his weekend was. Adam could easily explain the summary to Vesper in 2 minutes; it was easy for him. Next, Daniel told him the story of how his last month was. Adam somehow still managed.

Now you see where the trouble begins. Suppose Daniel tells a story that is a summary of his last 2 years of life. Could Adam ever do it justice in 2 minutes? Never!

Daniel is the encoder, Adam is our context vector, and Vesper is our decoder. She tries to figure out what Daniel exactly meant from just the summary that the "context vector" friend provided. You can see the problem long "stories" can lead to. This is one of the most basic problems faced by a simple encoder-decoder model: mathematically speaking, a simple model like the one above cannot remember long-term relations.

More precisely, the gradients are not able to carry information over such long ranges; the gradients seem to "vanish".

One of the better versions of an encoder-decoder (I'll refer to it as ED from now on, it's a long word, dude) is the "bidirectional model". The core idea is that while translating a sentence we do not necessarily go in one direction; sometimes the proper translation of a particular part of the sentence may require words that occur later. Have a look:

As you can see, in contrast to what was happening earlier, we "move" in both directions. Let me make a point very clear: when we say "move" or draw a network like the first diagram, there are not multiple RNNs. What you see is the unrolled time-axis representation of the forward pass. Even above, there are only 2 RNNs, yes only 2, but 4 time steps, which is why you see 8 blocks! And the same goes for backpropagation.

So this approach can make the results a little better.

But as this guy said:

I'm sure you have heard about LSTMs and GRUs (yes, yes, fancy words).

What they help with is sustaining important information over longer ranges (hence the name: long short-term memory units), but this post is not about LSTMs (nor about the gates of the LSTM network). The mathematics of the LSTM network seems a bit overwhelming to some, not because there is some wizardly math going on, but rather because of how it imitates human memory. Let's get the intuition of an LSTM.

Let's start with the forget gate and the cell state.

Up for a story? Suppose you have a friend who talks way too much, just wayyy too much. He comes to your home while you are on your laptop, and he starts to speak. He has been speaking for the last half an hour and you didn't care. Suddenly you hear your crush's name pop up (lol). Now that's something important, right? So your mind takes it as an input, and now every time you hear "she did" or "she said" you try to connect the dots and pay attention to those particular points. Whatever other blah blah he was saying is lost (the forget gate; mathematically it is a vector which tells you how much of every feature should be remembered). This example makes it easy to explain all the gates of an LSTM in one go:

"When to forget, when to remember, how to process the next piece of information (the cell state), and what you have made of it so far (the output)."

Now go have a look at the math. You learnt which information was relevant to your crush from "data", right? That is exactly what these little networks do. I'll make a different post for the detailed mathematics.

Next we will consider transformers.

Till then, have a martini, shaken not stirred.


K-means failure cases for interviews

Here we discuss the failure cases of the K-means algorithm.

As we know, k-means is one of the most commonly asked algorithms in interviews, so it's advisable to know its failure cases along with its working principle.

K-means is a very simple and straightforward algorithm, and hence it's not surprising that it can run into complications in real-life scenarios. Let's get to the cases where we get absurd results from this algorithm.

DIFFERENT CLUSTER SIZES

In k-means, the first step is to initialize the centroids (which is done randomly in k-means and probabilistically in k-means++).

Now look at this image, consisting of 3 clusters. The left side shows the original labels (although in unsupervised learning there are no "original tags" or "labels", here we consider pre-labelled data in order to understand how k-means can deviate from the ground truth).

You can see that because k-means tends to make clusters of roughly equal extent, it is likely to perform poorly on differently sized clusters.

DIFFERENT DENSITY CLUSTERS

In real-life datasets it's very common to have class imbalances; k-means fails to perform well if the clusters have different densities. Look at the image below.

Notice that due to the difference in densities, the clusters formed are very different from the original grouping.

NON-GLOBULAR DISTRIBUTIONS

The k-means algorithm is all about updating centroids based on mean distances, and the approach never considers whether the clusters might have an intrinsic shape. As you saw before, k-means tends to create clusters of similar size that are roughly "symmetric" around each centroid. You can see how badly the k-means algorithm fails on the cluster shape below:
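A minimal sketch of this failure mode with scikit-learn (make_moons gives two crescent-shaped, non-globular clusters, and KMeans splits them incorrectly):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: non-globular clusters
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# A score of 1.0 means perfect agreement with the true grouping;
# k-means lands far below that on this shape.
print(adjusted_rand_score(y_true, labels))
```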

PRESENCE OF OUTLIERS

Outliers are a common problem for algorithms that deal with the mean/average of data points, and they lead to problems in k-means too. But how are outliers handled in datasets which are not labelled?

There are various methods used to detect and handle outliers, like RANSAC, LOF (Local Outlier Factor) and more.
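For instance, here is a minimal sketch of removing outliers with scikit-learn's LocalOutlierFactor before clustering (the dataset and parameters are illustrative, not tuned):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import LocalOutlierFactor

# Well-separated blobs plus a few far-away outliers
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
X = np.vstack([X, [[40, 40], [-40, 40], [40, -40]]])

# LOF flags points whose local density is much lower than their neighbours'
mask = LocalOutlierFactor(n_neighbors=20).fit_predict(X) == 1   # 1 = inlier, -1 = outlier
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X[mask])
```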

DIFFERENT RESULTS EVERY TIME

Since the very first step, the initialisation of centroids, is done randomly in k-means, the obtained clusters depend on the choice of initial centroids. Hence running the algorithm multiple times can lead to different outcomes.

Although one thing is assured: k-means will always converge. It can be mathematically proven that there exists a certain iteration i such that the centroids change negligibly at the (i+1)-th iteration.
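A quick sketch of this behaviour with scikit-learn (on an awkward, overlapping dataset, different seeds can land in different local optima; running several initialisations and keeping the best, via n_init, is the usual remedy):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Overlapping blobs make the optimisation landscape bumpy
X, _ = make_blobs(n_samples=500, centers=6, cluster_std=3.0, random_state=0)

# One run per seed: the final inertia (within-cluster sum of squares) can differ
for seed in range(3):
    km = KMeans(n_clusters=6, n_init=1, random_state=seed).fit(X)
    print(seed, round(km.inertia_, 1))

# Common fix: try many initialisations and keep the best one
best = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)
```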

More to come!


REINFORCEMENT LEARNING

THE WHAT AND HOW OF REINFORCEMENT LEARNING

Apart from supervised and unsupervised machine learning, there is a third kind of machine learning approach, widely known as reinforcement learning. So how is this approach different from the former two? In this article we discuss an overview of reinforcement learning, its important topics and the basic approach. The basic difference between supervised/unsupervised learning and the reinforcement learning approach is the absence of a predefined/available dataset on which we just have to fit our model.

There are no clear instructions/algorithms that state exactly what each iteration has to do. To make this point clear, let's try to visualize a single real-life problem with both a supervised approach and a reinforcement learning approach. Suppose a child wants to learn how to ride a bicycle. What would a supervised learning approach look like?

  1. Telling him the exact amount of force he has to put on the pedal for every position of the pedal.
  2. Watching thousands of videos of people cycling and trying to learn from them.
  3. A loss function that shows how far the centre of gravity is from the ideal position.

I'm sure you didn't learn cycling this way (and no lucky child ever will!). In fact, the only thing that comes to mind as a child while learning to cycle is "I don't have to fall". And how do you learn?

  1. You begin by knowing nothing except what you have to do: the goal, and the fact that pedalling and balancing are the only things you can do.
  2. You try a few times, maybe falling.
  3. After falling, you avoid whatever you did that resulted in a fall.
  4. And after trying a number of times, you succeed.

THE BASIC DIFFERENCE

Isn't it strange that even after a child masters how to ride his cycle, he has no idea about the rules that govern the mechanics/dynamics of the cycle? Yet he learnt how to balance the forces and torques, and how to shift his centre of gravity while making a turn. This is the principle behind reinforcement learning: rather than having a predefined set of rules, it is "exploratory".

The "agent", the one trying to reach the goal, tries out the "actions" he can perform in the "environment" available to him; the environment might be completely "known", or he might face "unknown" regions. Later, even after he learns a way that takes him to the goal, he keeps "exploring" new ways to find out whether better ones exist. Look at these two characteristics of reinforcement learning:

This "exploration" is one of the characteristic differences between reinforcement learning and other forms of machine learning. There is a trade-off between exploitation (the greedy approach) and exploration.

Reinforcement learning explicitly considers the whole problem of a goal-directed agent, in contrast to other learning approaches which tend to divide the problem into sub-problems and optimise them without worrying about the larger picture.

TOWARDS THE MATH

We start by addressing the basic terminology that we will use:

POLICY

A policy is how the learning agent behaves at a given time/state in the environment. In other words, a policy is a mapping from states to the actions to be taken when in those states. It may be a function or a lookup table. In simple words, it's the way you choose an action when you face a situation. For example, if a tiger suddenly appears in your room, running and shouting is a possible policy, while greeting the tiger is not (just don't, no).

REWARD SIGNAL

In reinforcement learning, a "goal" is mathematically represented as a reward signal. On every action taken, the environment sends back a number, called a reward, which shows whether that particular action worsened or improved the situation. The agent's goal is to maximise its reward over the long run. A profitable action (winning) gives you a large positive reward, getting into trouble (undesired states) a large negative one, while taking steps that yield nothing might give a small negative reward (to discourage redundant actions).

VALUE FUNCTION

The reward signal shows how good or bad an action is in an immediate sense, while the value function shows the picture in the long run. The value function is the amount of reward an agent can expect to receive in the future starting from a particular state. To understand this, suppose a guy gets some money to spend. Immediate "rewards" like expensive watches and parties might seem good in the moment, but saving and investing now will grow his money and be more rewarding in the long run.

EXPLOITATION AND EXPLORATION

Exploitation refers to selecting the best-known action from the policy when in a particular state. Exploration refers to randomly selecting one of the actions (exploring) in the hope of finding better rewards in the long run. This is one of the striking characteristics of reinforcement learning.
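As a small illustration, here is an ε-greedy action-selection sketch, a common way of balancing the two (the value estimates in Q and the value of ε are made-up placeholders):

```python
import random

def epsilon_greedy(Q, epsilon=0.1):
    """Q: dict mapping action -> current estimated value."""
    if random.random() < epsilon:            # explore: pick a random action
        return random.choice(list(Q))
    return max(Q, key=Q.get)                 # exploit: pick the best-known action

Q = {"left": 0.2, "right": 0.5, "jump": 0.1}     # made-up value estimates
print(epsilon_greedy(Q))                         # mostly "right", sometimes random
```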

RETURNS

The return is the cumulative (usually discounted) reward that the agent has to maximize.
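Written out in the standard notation, with discount factor γ and rewards R, the return from time t is

\[ G_t = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma \le 1 \]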

Now let's have a look at how the agent and its environment interact:

REINFORCEMENT LEARNING AGENT

The agent, in a given state, performs an action that interacts with the environment, producing a new state and a reward.
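In code, the generic interaction loop looks roughly like this (env and agent are hypothetical objects with a gym-style interface; this is a sketch of the loop, not a specific library's API):

```python
def run_episode(env, agent):
    """One episode of the agent-environment loop: state -> action -> reward, new state."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(state)                        # pick an action using the policy
        next_state, reward, done = env.step(action)      # environment responds
        agent.learn(state, action, reward, next_state)   # update value estimates / policy
        state = next_state
        total_reward += reward
    return total_reward
```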

That is the basic structure of how an agent interacts with the environment. Reinforcement learning problems can further be divided into many categories; we will discuss three major families of methods:

First, finite Markov decision processes (MDPs), which will include discussing value functions, Bellman equations, the Markov property and returns. After MDPs we will discuss Monte Carlo methods: Monte Carlo estimation of action values, off-policy and on-policy approaches, and Monte Carlo control.

After Monte Carlo we will discuss temporal-difference learning. We will look at real-life problems and compare these three types of learning. The mathematics mostly uses probabilities and matrices for lookup/search tables.

APPLICATIONS

Currently, reinforcement learning is being extensively researched for developing machines which learn to play games and win against the best players in the world. In 2016, Google DeepMind's reinforcement-learning-based AlphaGo (followed later by AlphaZero) learned to play Go and defeated top human players. Go is one of the most computationally difficult games in the world because the number of possible board configurations is larger than the number of atoms in the universe: for a Go board of size N×N, the number of possible configurations is at most 3^(N²). Imagine that number for a 19×19 board. Such is "the curse of dimensionality".
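As a quick back-of-the-envelope check (Python integers handle this easily):

```python
n = 19
states_upper_bound = 3 ** (n * n)     # each of the 361 points is empty, black or white
print(len(str(states_upper_bound)))   # 173 digits, i.e. on the order of 10**172
```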