MATHS FOR AI / ML / DL

HOW DATA SET SIZE AFFECTS AI PERFORMANCE

HOW NEURAL NETWORKS WILL RULE 2021-2030

RESEARCH ON AI AND NEURAL NETWORKS BEGAN WAY BACK IN THE MID-1900S. SO WHY IS IT THAT CERTAIN AI-BASED TECHNOLOGIES HAVE TAKEN OFF ONLY IN THE LAST DECADE? THE ANSWER LIES IN THE VERY NATURE OF NEURAL NETWORKS. TO UNDERSTAND THIS BETTER, LET’S CONSIDER AN EXAMPLE. SUPPOSE YOU INVENT A MACHINE THAT IS SUPER EFFICIENT AT WHAT IT DOES AND, IN THEORY, BEATS ALL THE COMPETITION IT HAS. THE ONLY CONSTRAINT IS THAT IT RUNS ON A FUEL SOURCE THAT IS AVAILABLE ONLY ON MARS.

SO, EVEN THOUGH YOU HAVE THE BEST MACHINE, YOU DON’T HAVE THE RESOURCES TO RUN IT, AND THE SUCCESS OF YOUR MACHINE DEPENDS ON THE HUMAN RACE BEING ABLE TO COLLECT RESOURCES FROM MARS. NEURAL NETWORKS WERE IN THE SAME SITUATION. ON PAPER, WE KNEW THAT A NEURAL NETWORK COULD OUTPERFORM A TRADITIONAL MACHINE LEARNING MODEL WHEN TRAINED ON LARGE DATA SETS, BUT UNFORTUNATELY WE DIDN’T HAVE ACCESS TO SUCH DATA SETS. BELOW YOU SEE A PICTORIAL REPRESENTATION OF HOW A NEURAL NETWORK OUTPERFORMS TRADITIONAL MACHINE LEARNING ALGORITHMS AS THE DATA SIZE INCREASES.

ACCESS TO MORE DATA

SO THE SUDDEN INCREASE IN NEURAL NETWORK TRAINING IMPLIES THAT THE PEOPLE CONCERNED ARE GETTING ACCESS TO LARGE DATA SETS. SO WHERE DO YOU THINK THESE HUGE DATA SETS ARE COMING FROM, AND WHO GIVES THEM PERMISSION TO ACCESS THEM? YOU ARE GENERATING DATA EVERY SECOND. YOU GENERATE DATA WHILE SEARCHING FOR STUFF. THE AMOUNT OF TIME YOU STAY ON THIS BLOG, THE EMAIL ID YOU ARE USING IN CHROME, AND THE NUMBER OF TIMES YOU REVISIT THIS BLOG OR ANY OTHER ARE ALL DATA. THE COOKIES YOU GENERATE ARE ALSO USED AS DATA TO DISPLAY ADS ON BLOGS.

THE ABILITY TO CREATE MORE DATA

SUPPOSE YOU HAVE A CONVOLUTIONAL NEURAL NETWORK, ALL READY TO LEARN HOW TO CLASSIFY IMAGES INTO DOGS AND CATS. YOUR NETWORK HAS ENOUGH HIDDEN LAYERS TO PICK UP THE MINUTE FEATURES AND PATTERNS OF THE CAT AND DOG IMAGES. BUT YOU REALIZE THAT YOU DON’T HAVE ENOUGH IMAGES TO TRAIN YOUR MODEL ON, AND A NEURAL NETWORK IS ONLY AS GOOD AS ITS DATA SET. SO HOW CAN YOU GET HOLD OF A SET OF DOG AND CAT IMAGES THAT IS MANY TIMES BIGGER THAN THE ONE YOU HAVE? (I’M SURE IF CATS COULD TAKE SELFIES THIS WOULD HAVE BEEN EASIER!!) THE SOLUTION IS DATA AUGMENTATION. IT REFERS TO CREATING NEW IMAGES BY SLIGHTLY MODIFYING THE IMAGES YOU ALREADY HAVE. THE MODIFICATIONS INCLUDE:

  1. MIRRORING THE IMAGES. THIS WILL STRAIGHT AWAY DOUBLE YOUR DATA SET! IT ALSO MAKES YOUR NETWORK MORE ROBUST BY PROVIDING IMAGES OF CATS AND DOGS THAT ARE EQUALLY LIKELY TO LOOK LEFT OR RIGHT. EVEN IF ALL THE DOGS WERE FACING LEFT EARLIER, YOUR MODEL WON’T BE BIASED TOWARDS THAT NOW.
  2. SHIFTING THE IMAGES A CERTAIN NUMBER OF UNITS TO THE LEFT OR RIGHT.
  3. ROTATING THE IMAGES BY A CERTAIN ANGLE.
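
THE THREE MODIFICATIONS ABOVE CAN BE SKETCHED IN A FEW LINES OF PLAIN PYTHON. THIS IS ONLY A MINIMAL ILLUSTRATION, NOT A PRODUCTION PIPELINE: AN IMAGE IS ASSUMED TO BE A LIST OF ROWS OF PIXEL VALUES, THE HELPER NAMES (mirror, shift, rotate_90, augment) ARE MADE UP FOR THIS EXAMPLE, AND THE ROTATION IS SIMPLIFIED TO A FIXED 90 DEGREES INSTEAD OF AN ARBITRARY ANGLE. IN PRACTICE YOU WOULD MOST LIKELY USE AN IMAGE LIBRARY FOR THIS.

```python
# Hypothetical helpers for illustration only.
# An "image" is assumed to be a list of rows, each row a list of pixel values.

def mirror(image):
    """Flip the image left to right."""
    return [row[::-1] for row in image]


def shift(image, units, fill=0):
    """Shift the image horizontally.

    Positive `units` shifts right, negative shifts left.
    Pixels pushed past the edge are dropped; the gap is padded with `fill`.
    """
    width = len(image[0])
    units = max(-width, min(width, units))  # clamp to the image width
    if units >= 0:
        return [[fill] * units + row[:width - units] for row in image]
    return [row[-units:] + [fill] * (-units) for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise (a simplified stand-in
    for rotation by an arbitrary angle)."""
    return [list(row) for row in zip(*image[::-1])]


def augment(images):
    """Return the original images plus a mirrored copy of each,
    which doubles the size of the data set."""
    augmented = []
    for image in images:
        augmented.append(image)
        augmented.append(mirror(image))
    return augmented


tiny_image = [[1, 2, 3],
              [4, 5, 6]]
data_set = augment([tiny_image])
```

HERE data_set HOLDS TWO IMAGES: THE ORIGINAL AND ITS MIRRORED COPY. THE SAME PATTERN EXTENDS TO SHIFTED AND ROTATED COPIES.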

CREATING DATA AS YOU LEARN

SUPPOSE YOU WANT TO TRAIN A NEURAL NETWORK MODEL THAT HAS TO “LEARN” TO DO SOMETHING LIKE THE FOLLOWING:

  1. LEARN TO DRIVE A CAR
  2. PLAY A GAME

NOW CONSIDER AN EXAMPLE WHERE YOUR NETWORK HAS TO LEARN TO PLAY THE TRADITIONAL SNAKE GAME (NOKIA DAYS, BOIS!!!). SO WHAT IS THE DATA? NONE!!! YOU CAN’T HAVE SOMEONE SIT AND LABEL MILLIONS OF INSTANCES COVERING EVERY MOVE THE SNAKE MAKES. SO INSTEAD THE SNAKE STARTS OFF BY PLAYING, AND EACH OUTCOME IS LABELED POSITIVE IF THE SNAKE GETS THE REWARD AND NEGATIVE IF IT LOSES. TO SEE THE COMPLETE MECHANISM OF HOW SUCH A MODEL IS TRAINED, CHECK THIS OUT.
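
THE IDEA OF CREATING LABELS BY PLAYING CAN BE SKETCHED AS FOLLOWS. THIS IS A TOY STAND-IN, NOT THE REAL SNAKE GAME AND NOT A FULL TRAINING LOOP: THE BOARD IS ASSUMED TO BE ONE-DIMENSIONAL, THE FOOD SITS AT THE RIGHT END, FALLING OFF THE LEFT END COUNTS AS LOSING, AND THE MOVES ARE RANDOM. THE FUNCTION NAMES (play_episode, collect_experience) AND ALL PARAMETERS ARE MADE UP FOR THIS EXAMPLE.

```python
import random


def play_episode(rng, size=5, max_steps=20):
    """Play one game with random moves and record (position, action, reward).

    Assumed toy rules: the head starts in the middle of a 1-D board,
    reaching the last cell gives reward +1, leaving the board on the
    left gives reward -1, and every other move gives reward 0.
    """
    position = size // 2
    experience = []
    for _ in range(max_steps):
        action = rng.choice([-1, 1])  # -1 = move left, +1 = move right
        new_position = position + action
        if new_position == size - 1:
            reward = 1    # found the food: positive label
        elif new_position < 0:
            reward = -1   # left the board: negative label
        else:
            reward = 0    # nothing happened yet
        experience.append((position, action, reward))
        if reward != 0:
            break         # the game is over
        position = new_position
    return experience


def collect_experience(num_episodes, seed=0):
    """Generate a labeled data set purely by playing, with no human labeling."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_episodes):
        data.extend(play_episode(rng))
    return data


data = collect_experience(100)
```

EVERY ENTRY IN data IS A MOVE TOGETHER WITH THE LABEL THE GAME ITSELF PRODUCED, WHICH IS THE KIND OF SELF-GENERATED DATA A MODEL COULD THEN BE TRAINED ON.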

TESTING DATA YESTERDAY IS TRAINING DATA TODAY

ONE OF THE BEST EXAMPLES OF THIS IS TWITTER. YOU MUST HAVE HEARD ABOUT NLP (NATURAL LANGUAGE PROCESSING) AND THE FAMOUS PROBLEM OF CLASSIFYING TWEETS AS HAVING POSITIVE OR NEGATIVE SENTIMENT. A SIMILAR PROBLEM IS CLASSIFYING A TEXT REVIEW OF A FILM OR PRODUCT AS POSITIVE OR NEGATIVE. YOU CAN TRAIN THE MODEL TODAY ON TWEETS THAT WERE FLAGGED AS POSITIVE OR NEGATIVE YESTERDAY, AND YOU CAN COMBINE YESTERDAY’S AND TODAY’S RESULTS INTO ONE TRAINING DATA SET TO CREATE A MODEL TOMORROW.
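
AS A ROUGH SKETCH OF THIS DAY-BY-DAY LOOP, THE EXAMPLE BELOW USES A DELIBERATELY NAIVE WORD-COUNT CLASSIFIER. IT IS NOT HOW TWITTER OR ANY REAL SENTIMENT MODEL WORKS: THE CLASS NAME (WordCountSentiment), THE SAMPLE TWEETS, AND THE SCORING RULE ARE ALL ASSUMPTIONS MADE UP TO SHOW HOW YESTERDAY’S LABELED DATA AND TODAY’S LABELED DATA ADD UP TO A BIGGER TRAINING SET.

```python
from collections import Counter


class WordCountSentiment:
    """Naive illustrative classifier: counts how often each word
    appeared in positive and in negative training texts."""

    def __init__(self):
        self.counts = {"positive": Counter(), "negative": Counter()}

    def train(self, labeled_texts):
        """Add (text, label) pairs to the running word counts.
        Calling this again later extends the earlier training data."""
        for text, label in labeled_texts:
            self.counts[label].update(text.lower().split())

    def predict(self, text):
        """Score a text by summing the per-label counts of its words.
        Ties (including unknown words only) default to 'positive'."""
        words = text.lower().split()
        positive = sum(self.counts["positive"][w] for w in words)
        negative = sum(self.counts["negative"][w] for w in words)
        return "positive" if positive >= negative else "negative"


# Made-up sample data: tweets flagged yesterday.
yesterday = [("great movie", "positive"), ("awful plot", "negative")]
# Made-up sample data: tweets flagged today.
today = [("boring acting", "negative"), ("loved the acting", "positive")]

model = WordCountSentiment()
model.train(yesterday)           # today's model: trained on yesterday's data
prediction_today = model.predict("awful ending")

model.train(today)               # tomorrow's model: yesterday + today combined
prediction_tomorrow = model.predict("boring ending")
```

THE POINT OF THE SKETCH IS THE SECOND train CALL: THE TWEETS THE MODEL WAS TESTED ON TODAY BECOME PART OF ITS TRAINING DATA FOR TOMORROW.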

ALL THIS HAS BEEN POSSIBLE BECAUSE EVERYTHING IS TURNING DIGITAL. COMPANIES USE THE DATA COLLECTED ON THE INTERNET TO PERFORM DATA ANALYTICS AND CREATE NEW SOFTWARE. HERE ARE A FEW COMPANIES AND THEIR PRODUCTS THAT UTILIZE THESE ENORMOUS DATA SETS TO CREATE NEW AI TECH:

GOOGLE

  1. VISION AI
  2. NATURAL LANGUAGE PROCESSING, FOR EXAMPLE SENTIMENT ANALYSIS. IT’S A DOMAIN THAT IS EVOLVING AT A FAST RATE.
  3. SPEECH-TO-TEXT API, USED AS A VOICE TYPING TOOL: A MODEL TRAINED TO CONVERT SPEECH SIGNALS INTO TEXT.
  4. AUTO ML TABLES

TESLA

  1. TESLA AUTOPILOT: A SOFTWARE MODEL TRAINED BY TESLA TO GIVE DRIVERS A LESS STRESSFUL DRIVING EXPERIENCE ON HIGHWAYS. IT DOES NOT MAKE THE CAR COMPLETELY AUTONOMOUS, BUT BY THE END OF THIS DECADE THINGS MIGHT CHANGE!! WHAT’S BETTER THAN NETFLIX AND CHILLING WHILE YOUR CAR DRIVES YOU HOME!!!!

MICROSOFT

  1. AZURE AI
  2. MICROSOFT 365

NVIDIA

NVIDIA IS PLAYING A MAJOR ROLE IN THE ADVANCEMENT OF DEEP LEARNING AND ARTIFICIAL INTELLIGENCE TECHNOLOGY IN THE WORLD. HOW? THE ANSWER IS GPUs!!! AS A GAMER YOU MIGHT HAVE HEARD ABOUT THEM. THE POINT IS THAT GPUs ALLOW DEEP NEURAL NETWORKS TO BE TRAINED WAY FASTER THAN CPUs DO. TO KNOW HOW GPUs CONTAINING HUNDREDS OF CORES DO SO, CLICK HERE.

  1. DGX-2
  2. DATA SCIENCE WORKSTATION
  3. QUADRO GV100
  4. VOLTA ARCHITECTURE
  5. NGC

OPEN DATA SOURCES

THE BEST WAY TO LEARN IS BY GETTING YOUR HANDS DIRTY. THERE ARE MANY OPEN DATA SOURCES ON THE INTERNET THAT ALLOW YOU TO ACCESS THEIR DATA SETS AND TRAIN YOUR MODELS ON THEM, PLATFORMS LIKE KAGGLE AND IEEE TO NAME A FEW. MANY ML-BASED COMPETITIONS WILL HELP YOU ENCOUNTER NEW DATA SET FORMATS THAT ARE ENCODED IN DIFFERENT WAYS. THIS EXPOSURE WILL INCREASE YOUR ABILITY TO CLEAN AND FILTER DATA SO THAT IT CAN BE PASSED TO A NEURAL NETWORK. I STRONGLY RECOMMEND PARTICIPATING IN KAGGLE-BASED COMPETITIONS TO UNDERSTAND THE VARIETY IN DATA SETS. BELOW IS A VIDEO YOU CAN CHECK OUT!!!
