UNDERSTANDING THE PURPOSE AND MATH BEHIND CONVOLUTIONAL NEURAL NETWORKS
BEFORE ADDRESSING THE TERM “CONVOLUTION”, I WOULD LIKE TO DRAW YOUR ATTENTION TO THE MAJOR TECHNOLOGICAL ADVANCEMENTS THAT ARE DIRECTLY OR INDIRECTLY RELATED TO IMAGES, OR TO VISUAL MEDIA IN GENERAL: FACE RECOGNITION, PATTERN DETECTION AND MACHINE READING OF HANDWRITTEN SCRIPTS, TO NAME A FEW. AS THE TERM “CONVOLUTIONAL NEURAL NETWORK” SUGGESTS, YOU TRAIN YOUR MODEL ON A SET OF INPUT IMAGES. IF YOU KNOW THE BASICS OF NEURAL NETWORKS, YOU WILL BE AWARE THAT THE INPUTS TO A NEURAL NET ARE JUST NUMBERS. SO HOW DOES ONE PASS AN IMAGE TO A NETWORK THAT WORKS ONLY ON NUMBERS? AND HOW IS SUCH A NETWORK EVENTUALLY ABLE TO CLASSIFY IMAGES THAT ARE, FOR IT, PURE NUMBERS? WE ANSWER THESE QUESTIONS IN THIS POST.
WHAT AN IMAGE IS TO A COMPUTER
FOR YOU AN IMAGE MIGHT BE A DRESS, A HUMAN OR SOME FOOD, BUT FOR A COMPUTER IT IS JUST NUMBERS. HOW MANY NUMBERS IT USES TO REPRESENT THE IMAGE DEPENDS ON THE “RESOLUTION”, THAT IS, THE NUMBER OF PIXELS USED TO BUILD THE IMAGE. PIXELS ARE THE BUILDING BLOCKS OF IMAGES. EACH PIXEL IS EITHER A SINGLE GREYSCALE (BLACK AND WHITE) VALUE OR A COMBINATION OF RED, GREEN AND BLUE (RGB) VALUES.
SO A 28*28 COLOUR IMAGE IS, FOR A COMPUTER, 28 TIMES 28 = 784 PIXELS, EACH CORRESPONDING TO A CERTAIN COMBINATION OF RED, GREEN AND BLUE, HENCE 784 TIMES 3 = 2352 NUMBERS.
SO THAT BEAUTIFUL 28*28 SELFIE OF YOURS IS JUST A SET OF 2352 NUMBERS FOR YOUR MACHINE! CAN WE NOW FEED THESE 2352 NUMBERS AS INPUTS TO A CONVOLUTIONAL NEURAL NETWORK? WE STILL DON’T KNOW WHAT THE WORD “CONVOLUTIONAL” STANDS FOR IN A CONVOLUTIONAL NEURAL NETWORK.
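TO MAKE THE PIXEL ARITHMETIC CONCRETE, HERE IS A MINIMAL SKETCH USING NUMPY. THE RANDOM ARRAY IS ONLY A STAND-IN FOR A REAL PHOTO, AND THE HEIGHT * WIDTH * CHANNELS LAYOUT IS AN ASSUMPTION (SOME LIBRARIES PUT THE CHANNELS FIRST).

```python
import numpy as np

# A hypothetical 28x28 colour image: height x width x 3 channels (R, G, B),
# each value an integer from 0 to 255. Random values stand in for a real photo.
rng = np.random.default_rng(seed=0)
image = rng.integers(low=0, high=256, size=(28, 28, 3), dtype=np.uint8)

num_pixels = image.shape[0] * image.shape[1]  # 28 * 28
num_values = image.size                       # 28 * 28 * 3

print(num_pixels)  # 784
print(num_values)  # 2352

# Flattening turns the image into one long vector of numbers.
flat = image.reshape(-1)
print(flat.shape)  # (2352,)
```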
FILTERS AND CONVOLUTION
SUPPOSE YOU START WITH AN IMAGE OF SIZE N*N. CONVOLUTION REFERS TO FINDING PATTERNS IN THE IMAGE USING “FILTERS”. FILTERS ARE MATRICES OF DIMENSION M*M, AND EACH FILTER REPRESENTS THE BASIC PATTERN ONE IS SEARCHING FOR: EDGES, LINES, CURVES OR ANY OTHER SORT OF PATTERN. THE FIRST FEW LAYERS OF A CONVOLUTIONAL NEURAL NETWORK FOCUS ON FINDING SIMPLE PATTERNS; AS THE NETWORK GETS DEEPER, THE PATTERNS BECOME MORE COMPLEX.
LET’S MAKE THIS POINT CLEAR. SUPPOSE YOU HAVE TO FIND EDGES IN AN IMAGE. CAN YOU GUESS HOW THIS PARTICULAR 3*3 MATRIX, WITH 8 AT THE CENTRE AND -1 IN THE EIGHT SURROUNDING POSITIONS, CAN HELP US DO SO? NOTICE ONE THING: THE SUM OF ALL THE ELEMENTS IS ZERO. DOES THAT INDICATE SOMETHING?

-1 -1 -1
-1  8 -1
-1 -1 -1

“NOTICE HOW THE ELEMENTS OF THE FILTER SUM TO ZERO”
YOU PLACE THIS 3*3 MATRIX ON YOUR 28*28 IMAGE. THE CONVOLUTION AT THAT POSITION IS BASICALLY THE SUM OF THE ELEMENT-WISE PRODUCTS OF THE FILTER VALUES AND THE IMAGE VALUES IN THE NINE POSITIONS IT COVERS. SO HOW DO WE DETECT AN EDGE? NOTICE THAT IF THE IMAGE IS THE SAME COLOUR THROUGHOUT THE REGION UNDER THE FILTER, THE NINE VALUES WILL BE APPROXIMATELY THE SAME, SAY ‘y’, SO WHEN CONVOLVED WITH THIS FILTER YOU GET THE OUTPUT y*(-1-1-1-1+8-1-1-1-1) = y*0 = 0.
NOW IMAGINE THERE IS AN EDGE IN THAT REGION OF THE IMAGE. SINCE THE PIXELS ON EITHER SIDE OF THE EDGE HAVE DIFFERENT VALUES, THE CONVOLUTION WILL GENERALLY NOT BE 0, HENCE INDICATING AN EDGE. SIMILARLY, THERE ARE FILTERS FOR CURVES, DIAGONALS AND MANY MORE. AND AS THE LAYERS GET DEEPER, THE PATTERNS DETECTED BECOME NOSES, EYES AND OTHER COMPLICATED FEATURES.
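THE EDGE-DETECTION ARGUMENT ABOVE CAN BE CHECKED WITH A SHORT NUMPY SKETCH. THE FUNCTION NAME apply_filter AND THE TOY 6*6 GREYSCALE IMAGE ARE ASSUMPTIONS MADE FOR ILLUSTRATION. THE FILTER IS SLID OVER THE IMAGE ONE PIXEL AT A TIME WITHOUT FLIPPING IT, WHICH IS WHAT DEEP LEARNING LIBRARIES USUALLY CALL CONVOLUTION; FOR THIS SYMMETRIC FILTER, FLIPPING WOULD MAKE NO DIFFERENCE.

```python
import numpy as np

# The edge filter described above: 8 in the centre, -1 everywhere else.
EDGE_FILTER = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

def apply_filter(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    n_rows, n_cols = image.shape
    m = kernel.shape[0]
    out = np.zeros((n_rows - m + 1, n_cols - m + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + m, j:j + m]
            out[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return out

# Toy 6x6 greyscale image: left half dark (0), right half bright (10),
# so there is one vertical edge between columns 2 and 3.
image = np.zeros((6, 6))
image[:, 3:] = 10

response = apply_filter(image, EDGE_FILTER)
print(response.shape)        # (4, 4)
print(response[0].tolist())  # [0.0, -30.0, 30.0, 0.0]
```

THE RESPONSE IS 0 WHERE THE FILTER SITS ENTIRELY ON ONE COLOUR AND NON-ZERO WHERE IT STRADDLES THE EDGE.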

“MAKE SURE YOU UNDERSTAND THE DIMENSIONALITY OF EACH HIDDEN LAYER WHEN YOU PERFORM SUCH OPERATIONS. IT IS THE MOST IMPORTANT THING TO VISUALIZE IN ORDER TO UNDERSTAND THE MATH.”
MAX POOLING
YOU NOW HAVE A FILTER OUTPUT FOR EVERY 3*3 BLOCK OF YOUR 28*28 IMAGE. NEXT COMES POOLING. POOLING IS DONE TO HELP PREVENT YOUR MODEL FROM OVERFITTING AND ALSO TO REDUCE COMPUTATIONAL COST. AGAIN, CONSIDER PLACING A 3*3 WINDOW ANYWHERE ON THE FILTER OUTPUT. POOLING REFERS TO REPLACING THOSE 9 VALUES WITH A SINGLE VALUE. DEPENDING ON WHETHER YOU TAKE THE MINIMUM, AVERAGE OR MAXIMUM OF THOSE 9 VALUES, THE POOLING IS NAMED ACCORDINGLY, HENCE THE NAME MAX POOLING. WE DISCUSS MAX POOLING BECAUSE IT IS THE MOST COMMONLY USED POOLING TECHNIQUE.
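A MINIMAL SKETCH OF POOLING IN NUMPY FOLLOWS. THE FUNCTION NAME pool2d, THE 2*2 WINDOW AND THE CHOICE OF NON-OVERLAPPING WINDOWS ARE ASSUMPTIONS MADE TO KEEP THE EXAMPLE SMALL; A 3*3 WINDOW AS DESCRIBED ABOVE WORKS THE SAME WAY WITH size=3.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Summarise each non-overlapping size x size block as a single value."""
    reducers = {"max": np.max, "min": np.min, "average": np.mean}
    reduce_fn = reducers[mode]
    rows = feature_map.shape[0] // size
    cols = feature_map.shape[1] // size
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = feature_map[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = reduce_fn(block)
    return out

# Toy 4x4 filter output.
feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 9, 5],
                        [1, 1, 3, 7]])

print(pool2d(feature_map, size=2, mode="max").tolist())      # [[6.0, 2.0], [2.0, 9.0]]
print(pool2d(feature_map, size=2, mode="average").tolist())  # [[3.5, 1.0], [1.0, 6.0]]
```

THE 4*4 INPUT SHRINKS TO 2*2, WHICH IS WHERE THE SAVING IN COMPUTATION COMES FROM.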
ZERO PADDING
ZERO PADDING IS DONE WHILE APPLYING FILTERS TO AN IMAGE. A FILTER OF SIZE M*M ON AN IMAGE OF SIZE N*N PRODUCES AN OUTPUT MATRIX OF SIZE (N-M+1)*(N-M+1), BECAUSE THE FILTER FITS IN ONLY N-M+1 POSITIONS ALONG EACH DIMENSION. SO IF THE IMAGE SIZE IS 28*28 AND THE FILTER SIZE IS 3, THEN WITHOUT ZERO PADDING THE OUTPUT MATRIX WILL BE 26*26. FILTERING MULTIPLE TIMES THEREFORE MAKES THE IMAGE SMALLER AND SMALLER, AND PIXELS ON THE BORDER ARE COVERED BY FEWER FILTER POSITIONS, SO AN IMPORTANT FEATURE PRESENT AT THE BORDER OF THE IMAGE MAY EVENTUALLY BE LOST. TO PREVENT THIS WE USE ZERO PADDING: ADDING ROWS AND COLUMNS OF PIXELS WITH VALUE 0 AROUND THE IMAGE SO THAT THE FILTER CAN TRAVERSE THE ENTIRE IMAGE.
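THE SIZE FORMULA AND THE EFFECT OF PADDING CAN BE SKETCHED AS FOLLOWS. THE HELPER NAME output_size AND ITS padding PARAMETER (THE NUMBER OF ZERO ROWS AND COLUMNS ADDED ON EACH SIDE) ARE ASSUMPTIONS; WITH PADDING P THE FORMULA ABOVE BECOMES N + 2P - M + 1, ASSUMING THE FILTER MOVES ONE PIXEL AT A TIME.

```python
import numpy as np

def output_size(n, m, padding=0):
    """Side length of the output when an m x m filter slides over an n x n image
    with `padding` rows/columns of zeros added on every side (stride 1)."""
    return n + 2 * padding - m + 1

print(output_size(28, 3))             # 26 (no padding)
print(output_size(28, 3, padding=1))  # 28 (size preserved)

# Zero padding itself: surround the image with a border of zeros.
image = np.ones((28, 28))
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (30, 30)
```

WITH ONE RING OF ZEROS, A 3*3 FILTER GIVES BACK A 28*28 OUTPUT, SO THE IMAGE NO LONGER SHRINKS FROM LAYER TO LAYER.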
ONCE YOU HAVE DETECTED A SERIES OF PATTERNS, YOU FLATTEN OUT THE RESULT AND PASS IT THROUGH A NEURAL NETWORK. LET’S HAVE A LOOK AT WHAT A COMPLETE CONVOLUTIONAL NEURAL NETWORK LOOKS LIKE:
THE NETWORK CONSISTS OF ONE 8*128*128 IMAGE WHICH IS MAX POOLED INTO 10*64*64. THEN WE PERFORM CONVOLUTION USING A FILTER, AND AT LAST WE FLATTEN THE IMAGE TO PASS IT THROUGH A NEURAL NETWORK.
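TO SEE HOW THE PIECES FIT TOGETHER AND HOW THE DIMENSIONS CHANGE AT EACH STEP, HERE IS A ROUGH NUMPY SKETCH OF ONE FORWARD PASS. IT IS NOT THE NETWORK DESCRIBED ABOVE: THE 28*28 GREYSCALE INPUT, THE SINGLE RANDOM 3*3 FILTER, THE 2*2 POOLING WINDOW AND THE 10 OUTPUT CLASSES ARE ALL ASSUMPTIONS, AND ACTIVATION FUNCTIONS AND TRAINING ARE LEFT OUT.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def conv2d(image, kernel):
    """Unpadded filtering: sum of element-wise products at each position."""
    m = kernel.shape[0]
    size = image.shape[0] - m + 1
    out = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            out[i, j] = np.sum(image[i:i + m, j:j + m] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    n = feature_map.shape[0] // size
    trimmed = feature_map[:n * size, :n * size]
    return trimmed.reshape(n, size, n, size).max(axis=(1, 3))

image = rng.random((28, 28))          # stand-in greyscale image
kernel = rng.standard_normal((3, 3))  # one filter (learned in a real network)

features = conv2d(image, kernel)      # 28 - 3 + 1 = 26 per side
pooled = max_pool(features)           # 26 / 2 = 13 per side
flat = pooled.reshape(-1)             # 13 * 13 = 169 numbers

# Fully connected layer: 169 inputs -> 10 class scores (random weights here).
weights = rng.standard_normal((10, flat.size))
bias = np.zeros(10)
scores = weights @ flat + bias

print(features.shape, pooled.shape, flat.shape, scores.shape)
# (26, 26) (13, 13) (169,) (10,)
```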
SO NOW YOU KNOW HOW A “CONVOLUTIONAL” NEURAL NETWORK WORKS! I WOULD STRONGLY ADVISE AGAINST BUILDING MODELS WITH THE KERAS FRAMEWORK FOR THE PURPOSE OF LEARNING. PYTORCH AND TENSORFLOW ARE BETTER OPTIONS AS FRAMEWORKS BECAUSE THEY ALLOW YOU TO MAKE CHANGES AND CONTROL THE BEHAVIOUR OF YOUR MODEL. KERAS IS RATHER PLAIN AND SIMPLE AND SHOULD BE USED ONCE YOU HAVE LEARNED WHAT IS HAPPENING MATHEMATICALLY.
PROJECTS ON CONVOLUTIONAL NEURAL NETWORKS FOR STUDENTS
- BUILDING A NEURAL NETWORK THAT RECOGNISES HANDWRITTEN DIGITS, TRAINING THE MODEL ON THE MNIST DATA SET OF LABELLED HANDWRITTEN DIGITS.
- TRAINING A MODEL TO CLASSIFY CLOTHING GARMENTS ON THE FASHION-MNIST DATA SET.
- CREATING A CLASSIFIER USING NEURAL NETWORKS THAT CLASSIFIES IMAGES OF DOGS AND CATS. VARIOUS DATA SOURCES ARE AVAILABLE.
- IF YOU ARE NEW TO CNN MODELS, THE ABOVE THREE SIMPLE PROJECTS ARE ENOUGH TO SHOW YOU HOW CNN MODELS ARE BUILT. LATER YOU CAN START MAKING CHANGES TO THEM AND MODIFY THE PROBLEM STATEMENT.
- POSSIBLE MODIFICATIONS INCLUDE RECOGNISING A DIGIT CAPTURED FROM THE WEBCAM, OR DESIGNING A MODEL THAT ESTIMATES THE GENDER AND AGE OF AN INDIVIDUAL FROM A PICTURE OF THEIR FACE.
DO LEAVE COMMENTS REGARDING THE ARTICLE.