HOW WE CAN USE DEEP LEARNING IN GAMES LIKE SNAKE AND MAKE A MODEL THAT PLAYS THE GAME USING ITS “INTELLIGENCE”
You might be surprised to learn that many games claiming to incorporate AI are really just running optimisation algorithms over a few predefined states. The term "AI" is used, but it is a misnomer. Here we will see how to actually train a model that learns to play the game. Deep learning in games refers to training a neural network not on a set of pre-labelled data, but dynamically as the game proceeds.
We will make use of the following two concepts:
- A neural network
- A genetic algorithm
To explain the genetic algorithm, we first turn our attention to biology. A little understanding of the algorithm's biological inspiration is important in order to have a clear picture of the training. We start by considering the following terms:
- Chromosome overlapping/crossing
- Mutation
When two chromosomes overlap to produce the next generation, the offspring carries one of the two possible genes at every point of overlap. This creates a new chromosome that inherits some parts from chromosome A and some from chromosome B. We refer to this as overlapping, or crossing.
Suppose a particular gene for some trait is inherited from chromosome A; that gene might get slightly modified during the overlap. This leads to mutation in the next generation.
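These two operations can be sketched directly on arrays of "genes". This is a minimal NumPy sketch; the 50/50 crossover split, mutation rate and mutation scale are illustrative choices, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

def crossover(parent_a, parent_b):
    """Build a child: each gene is taken from one of the two parents."""
    mask = rng.random(parent_a.shape) < 0.5   # True -> gene from parent A
    return np.where(mask, parent_a, parent_b)

def mutate(genes, rate=0.05, scale=0.1):
    """Randomly perturb a small fraction of genes (the 'mutation')."""
    mask = rng.random(genes.shape) < rate
    return genes + mask * rng.normal(0, scale, genes.shape)

a = np.ones(8)       # "chromosome A"
b = -np.ones(8)      # "chromosome B"
child = mutate(crossover(a, b))
```

Every gene of the child equals a gene of parent A or parent B before mutation; mutation then nudges a few of them slightly.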
THE SNAKE GAME
The snake game can be viewed/simulated using a Python module known as Gym, which offers various simple simulations on which you can train your machine learning models. Here we use a simple classic snake game: the snake tries to get the rewards, which increase its length, and colliding head-on with a wall or with its own body ends the game.
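As a concrete sketch of those rules, here is a tiny stand-alone version of the game loop (a hypothetical implementation written from scratch, not the Gym API): collecting the reward grows the snake, hitting a wall or the body ends the episode.

```python
import random

class SnakeGame:
    """Tiny grid snake: step(direction) returns (reward, done)."""

    def __init__(self, size=10):
        self.size = size
        self.snake = [(size // 2, size // 2)]   # list of cells, head first
        self.place_food()

    def place_food(self):
        free = [(x, y) for x in range(self.size) for y in range(self.size)
                if (x, y) not in self.snake]
        self.food = random.choice(free)

    def step(self, direction):
        dx, dy = {"up": (0, -1), "down": (0, 1),
                  "left": (-1, 0), "right": (1, 0)}[direction]
        hx, hy = self.snake[0]
        head = (hx + dx, hy + dy)
        # hitting a wall or the body ends the game
        if not (0 <= head[0] < self.size and 0 <= head[1] < self.size) \
                or head in self.snake:
            return -1, True
        self.snake.insert(0, head)
        if head == self.food:          # reward collected: grow and respawn food
            self.place_food()
            return 1, False
        self.snake.pop()               # otherwise just move forward
        return 0, False

random.seed(0)
game = SnakeGame(size=5)
reward, done = game.step("up")
```

The training code later in the article only needs this `(reward, done)` interface.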
Now let's talk about the neural network. The following things need to be decided:
- What are the inputs, that is, what does the snake "see"? Technically the snake does not have "eyes", nor does any game entity possess such a thing. The visual input is a 2-D grid/matrix, a collection of numbers that updates every time the snake moves or the reward shifts.
- What are all the actions a snake can perform? This includes all the directions the snake can decide to move in.
- What are the outputs, and how do they determine the final decision?
- Since there is no pre-labelled data, how do we train the model? What are we optimising?
- How do we incorporate the genetic algorithm, and what do terms like "generation", "mutation" and "fitness" mean here?
DECIDING DEEP LEARNING PARAMETERS
We start by deciding the number of inputs. The input depends on the number of directions the snake can look in, and on the entities in its environment. Suppose the snake can see in X possible directions; X = 3 would simply mean left, right and straight. As X increases, the snake can perform more realistic movements by turning at finer angles.
Next, we have three entities:
- The snake itself
- The reward
- The walls
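One common way to encode what the snake "sees" is, for each look direction, the distance to the wall, to the reward and to its own body, giving X × 3 numbers. This is a sketch under assumptions: the 8 fixed directions and the 1/distance encoding (closer objects give a stronger signal) are illustrative choices, not specified by the article.

```python
import numpy as np

# the 8 directions the snake "looks" in: N, NE, E, SE, S, SW, W, NW
DIRECTIONS = [(0, -1), (1, -1), (1, 0), (1, 1),
              (0, 1), (-1, 1), (-1, 0), (-1, -1)]

def vision(head, body, food, size):
    """For each direction: (wall signal, food signal, body signal) -> 24 inputs."""
    inputs = []
    for dx, dy in DIRECTIONS:
        x, y = head
        dist, food_sig, body_sig = 0, 0.0, 0.0
        while True:
            x, y = x + dx, y + dy
            dist += 1
            if not (0 <= x < size and 0 <= y < size):
                break                          # hit the wall
            if (x, y) == food and food_sig == 0:
                food_sig = 1 / dist            # closer food -> larger signal
            if (x, y) in body and body_sig == 0:
                body_sig = 1 / dist
        inputs += [1 / dist, food_sig, body_sig]   # the wall is always found
    return np.array(inputs)
```

For example, with the head at (5, 5) on a 10 x 10 grid, a body cell at (5, 6) and food at (5, 2), `vision` returns 24 numbers: looking north it reports the wall 6 cells away and the food 3 cells away.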
CREATING A MODEL
So the number of inputs will be X × 3 = 3X. Each number corresponds to the snake's distance, in that particular direction, to one of the entities. And just as we decide a loss function in supervised neural networks, here we consider an output layer consisting of 2 neurons, which give a distribution between leftward and rightward motion of the snake at a given moment.
The direction the snake takes is determined by the difference between these two neurons, normalised/activated by a function. So if both neurons produce the same output, the snake moves straight. Let's visualise the network:
I have created a model of a snake that views in 8 possible directions, hence 24 input neurons. We use 2 hidden layers of 10 neurons each and an output layer consisting of 2 neurons:
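This 24-10-10-2 network can be sketched in plain NumPy. The tanh activation and the straight-motion threshold below are illustrative assumptions; the article only fixes the layer sizes and the "difference of the two outputs" rule:

```python
import numpy as np

LAYERS = [24, 10, 10, 2]   # inputs, two hidden layers, two output neurons

def random_weights(rng):
    """One weight matrix (with a bias column) per layer -- the snake's 'DNA'."""
    return [rng.normal(0, 1, (n_out, n_in + 1))
            for n_in, n_out in zip(LAYERS, LAYERS[1:])]

def forward(weights, x):
    """Feed the 24 vision inputs through the network; return the 2 outputs."""
    a = x
    for W in weights:
        a = np.tanh(W @ np.append(a, 1.0))   # tanh activation, bias appended
    return a

def decide(weights, x, threshold=0.3):
    """Turn the difference of the two output neurons into a move."""
    left, right = forward(weights, x)
    diff = left - right                      # near zero -> keep going straight
    if diff > threshold:
        return "left"
    if diff < -threshold:
        return "right"
    return "straight"

rng = np.random.default_rng(1)
weights = random_weights(rng)
```

A randomly initialised snake already produces decisions; the genetic algorithm's job is to find weights for which those decisions are good.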

So we know what the inputs, the hidden layers and the decision output are, but we still haven't discussed the genetic algorithm part, or why we considered those chromosomes. Remember that we don't have a dataset. This is how the genetic algorithm goes:
- For a random set of weights, run N simulations (trials). Note the number of times the snake was able to get the reward, and also the number of steps it took in doing so.
- Do the same with a number of random weight assignments.
- Define a "fitness function" that rewards the models that collected the most rewards while using the fewest steps. This is analogous to a loss function in a supervised neural network.
- Select the K models out of the N models with the best fitness.
- The weight matrices correspond to a model's "DNA". Take 2 such models and, using their weights, construct a new weight matrix with elements from both matrices. This is called overlapping (as in chromosomes), or crossover. Below is a figure showing the same.
- The slight errors introduced here imitate "mutation" in chromosomes. We now have a new generation of models: this new set of weights acts as the "DNA" of the new snake "generation".
- Now repeat the process with this set of weights.
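The steps above can be sketched end to end. Since the game is not wired in here, the `fitness` function below is a hypothetical stand-in for "rewards collected minus steps used", and the population size, mutation rate and DNA length are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
POP, KEEP, GENS, GENES = 30, 6, 40, 50   # population, survivors, generations, DNA length

def fitness(dna):
    # Stand-in fitness: in the real setup this would come from running the
    # game and scoring rewards vs. steps. Here we reward DNA near 0.5.
    return -np.sum((dna - 0.5) ** 2)

def crossover(a, b):
    # each gene of the child comes from one of the two parents
    mask = rng.random(GENES) < 0.5
    return np.where(mask, a, b)

def mutate(dna, rate=0.1, scale=0.2):
    # slightly perturb a small fraction of genes
    mask = rng.random(GENES) < rate
    return dna + mask * rng.normal(0, scale, GENES)

population = [rng.normal(0, 1, GENES) for _ in range(POP)]
initial_best = max(fitness(d) for d in population)

for gen in range(GENS):
    # selection: keep the K fittest models
    survivors = sorted(population, key=fitness, reverse=True)[:KEEP]
    children = []
    while len(survivors) + len(children) < POP:
        i, j = rng.choice(KEEP, 2, replace=False)
        children.append(mutate(crossover(survivors[i], survivors[j])))
    population = survivors + children        # survivors carried over unchanged

final_best = max(fitness(d) for d in population)
```

Note that because the fittest models are carried over unchanged (elitism), the best fitness in this sketch can never decrease between generations, which relates directly to the question posed at the end of the article.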

After training, the snake learns to play the game! This approach is one of many that can be used to train a neural network. The most important thing to understand is how the genetic algorithm relies on the assumption that a new generation will always be better than the previous one. I think this is the difficult part; training a neural network is not a big deal. But could you prove the following statement mathematically?
Given a generation g(1), will the best fitness value of generation g(N+1) always be better than that of g(1), given N is large enough?
This is where you must apply your brains!