The Fault In “GPT-3”: IS IT JUST HYPE?

IS GPT-3 JUST HYPE? WHAT THE MEDIA SAYS, AND DOES IT MATTER FOR INTERVIEWS?

If you are familiar with attention and transformers, you have probably come across the term “GPT”, whether in the earlier models GPT-1 and GPT-2 or in the recently released GPT-3.

GPTs (Generative Pre-trained Transformers) are decoder-only transformer stacks developed by OpenAI.
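“Decoder-only” means each position may attend only to itself and to earlier positions, which is what lets the model generate text one token at a time. Below is a minimal sketch of that causal masking in plain Python. It assumes the raw attention scores are already computed, and the function name and toy scores are illustrative rather than taken from any GPT code base.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix with future positions masked.

    scores[i][j] is a (hypothetical) similarity between query i and key j.
    In a decoder-only model, position i may only attend to positions j <= i.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Only the current and earlier positions are visible to row i.
        allowed = scores[i][: i + 1]
        m = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in allowed]
        total = sum(exps)
        # Future positions (j > i) receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights
```

With an all-zero 3 × 3 score matrix, the first token attends only to itself, the second splits its attention evenly over two positions, and the third over three.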

Ever since GPT-3 was released, platforms like Twitter have been flooded with posts glorifying the model and what it can do. Many were written in a way that would make a layperson perceive it as some sort of magic, and funny claims like “this is the end of software engineering” were made.

GPT-3 is in fact a milestone in NLP, with performance never seen before. But one needs to understand its limitations and the reasons behind that performance. Once you do, it becomes clear that GPT-3 is far from deserving the label “near human intelligence”.

CHANGES FROM GPT-1 AND GPT-2

Below you can see the architecture of the GPT-1 model (a single stack of transformer decoder blocks).

Scaling up the number of layers and parameters led to GPT-2.

ref: https://jalammar.github.io/illustrated-gpt2/

GPT-3 is structurally similar to GPT-2. Its main advances come from an extremely large number of parameters. The computing resources used to train it were also far beyond what any “normal” research group can afford.

GPT-2 VS GPT-3: PARAMETER COMPARISON

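MODEL | NUMBER OF PARAMETERS | NUMBER OF LAYERS | BATCH SIZE
GPT-2 | 1.5B | 48 | 512 (sequences)
GPT-3 Small | 125M | 12 | 0.5M (tokens)
GPT-3 Medium | 350M | 24 | 0.5M (tokens)
GPT-3 Large | 760M | 24 | 0.5M (tokens)
GPT-3 6.7B | 6.7B | 32 | 2M (tokens)
GPT-3 13B | 13.0B | 40 | 2M (tokens)
GPT-3 175B (“GPT-3”) | 175.0B | 96 | 3.2M (tokens)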
All GPT-3 models were trained on a total of 300 billion tokens. DATASET: mainly a filtered version of Common Crawl, supplemented with WebText2, two book corpora and English Wikipedia.
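Where do all those parameters come from? A common back-of-the-envelope estimate for a decoder-only transformer is about 12 × layers × d_model² for the decoder blocks, plus the embedding tables. The sketch below assumes the hidden sizes d_model = 768 (GPT-3 Small) and d_model = 12288 (GPT-3 175B), a 50,257-token BPE vocabulary and a 2,048-token context, as reported in the GPT-3 paper. None of these values appear in the table above, and biases and layer norms are ignored.

```python
def approx_decoder_params(n_layers: int, d_model: int,
                          vocab_size: int = 50257, n_ctx: int = 2048) -> int:
    """Rough parameter count for a GPT-style decoder-only transformer.

    Per block: attention Q, K, V and output projections (4 * d^2)
    plus a feed-forward layer with hidden size 4 * d (8 * d^2) -> ~12 * d^2.
    Biases and layer-norm weights are ignored (they are comparatively tiny).
    """
    blocks = 12 * n_layers * d_model ** 2
    # Token embeddings (assumed shared with the output layer) + learned position embeddings.
    embeddings = (vocab_size + n_ctx) * d_model
    return blocks + embeddings


if __name__ == "__main__":
    print(f"GPT-3 Small: {approx_decoder_params(12, 768) / 1e6:.1f}M")
    print(f"GPT-3 175B:  {approx_decoder_params(96, 12288) / 1e9:.1f}B")
```

This gives roughly 125 million parameters for GPT-3 Small and roughly 175 billion for the largest model, close to the published figures. The d_model² term dominates as models get wider, which is why parameter counts grow so quickly.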

MOST OF THE PERFORMANCE GAINS COME FROM THE ENORMOUS NUMBER OF PARAMETERS.

IS IT SCALABLE?

If you are thinking of training a GPT-3 model from scratch, you might want to think twice. Even for OpenAI, the cost of training GPT-3 has been estimated at close to $4.6 million, and at current computing costs, training a GPT-4 or GPT-8 might be too expensive even for such large organizations.
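To see why the bill gets so large, a common rule of thumb puts training compute at about 6 FLOPs per parameter per training token (roughly 2 for the forward pass and 4 for the backward pass). The sketch below applies it to GPT-3 175B and 300 billion tokens. The GPU throughput (28 TFLOP/s sustained) and price ($1.50 per GPU-hour) are hypothetical assumptions, so treat the dollar figure as an order-of-magnitude estimate only.

```python
def training_flops(num_params: float, num_tokens: float) -> float:
    """Rule-of-thumb training compute: ~6 FLOPs per parameter per token."""
    return 6 * num_params * num_tokens


def training_cost_usd(total_flops: float,
                      sustained_flops_per_gpu: float,
                      usd_per_gpu_hour: float) -> float:
    """Convert total FLOPs into dollars.

    Throughput and price are ASSUMPTIONS supplied by the caller;
    real hardware utilisation and cloud pricing vary widely.
    """
    gpu_hours = total_flops / sustained_flops_per_gpu / 3600
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    flops = training_flops(175e9, 300e9)  # GPT-3 175B trained on 300B tokens
    # Hypothetical: 28 TFLOP/s sustained per GPU at $1.50 per GPU-hour.
    cost = training_cost_usd(flops, 28e12, 1.50)
    print(f"{flops:.2e} FLOPs, about ${cost / 1e6:.1f} million")
```

Under these assumptions it prints about 3.15e+23 FLOPs and a cost of roughly $4.7 million, in the same ballpark as the estimate quoted above. Halve the assumed throughput and the cost doubles.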

THE NEGATIVE BIAS

Because GPT-3 was trained on Common Crawl data from the internet, the model is prone to “learning” the social biases against women and Black people, and the hateful language, that are abundant online. Sadly, it is not surprising these days to find two people swearing and fighting on any social media platform.

ALWAYS RIGHT?

GPT-3 fails at highly specialized tasks. You can expect it to understand and answer common everyday questions (even then, there is no guarantee of 100 percent accuracy), but it cannot reliably answer very specific medical case questions. There is also no fact-checking mechanism to ensure that the output is not only semantically plausible but also factually correct.

GPT FOR VISION?

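Applying transformers directly to images is not feasible, given the dimensionality of an image and the training-time complexity of a transformer: self-attention compares every token with every other token, so its cost grows quadratically with sequence length, and treating each pixel as a token produces very long sequences. Even for people and organizations with huge computing power, this is overwhelming.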
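The quadratic cost is easy to put into numbers. Here is a minimal sketch, assuming a 224 × 224 input image (a common but arbitrary choice) with one token per pixel; both helper names are hypothetical.

```python
def attention_scores_per_head(num_tokens: int) -> int:
    """Entries in one self-attention score matrix: every token vs. every token."""
    return num_tokens * num_tokens


def pixel_tokens(height: int, width: int) -> int:
    """Sequence length if every pixel becomes its own token (hypothetical setup)."""
    return height * width


if __name__ == "__main__":
    n = pixel_tokens(224, 224)
    print(n, attention_scores_per_head(n))  # 50176 tokens, ~2.5 billion scores
    print(2048, attention_scores_per_head(2048))  # GPT-3's context length, for comparison
```

That is about 2.5 billion scores per attention head per layer, roughly 600 times more than for a full 2,048-token GPT-3 context (about 4.2 million).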

A RECENTLY PUBLISHED PAPER, “AN IMAGE IS WORTH 16X16 WORDS”, HAS SHOWN HOW TO USE TRANSFORMERS FOR CV TASKS BY SPLITTING AN IMAGE INTO 16×16 PATCHES AND TREATING EACH PATCH AS A TOKEN. DO CHECK IT OUT.
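The patch idea can be sketched in a few lines. This version assumes a grayscale image stored as a list of rows whose sides are divisible by the patch size. The paper's model works on RGB images and then linearly projects each flattened patch into an embedding, which is not shown here, and the function name is hypothetical.

```python
def image_to_patches(image, patch_size=16):
    """Split an H x W grayscale image (list of rows) into flattened,
    non-overlapping patch_size x patch_size patches in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch row by row into a single vector (one "word").
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches
```

For a 224 × 224 image this yields 196 patches of 256 values each, so one attention matrix shrinks from (224 × 224)² ≈ 2.5 billion entries to 196² = 38,416.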

ACCESS RIGHTS

At the moment, not everyone can get access to it: OpenAI wants to ensure that no one misuses it. This has certainly raised some questions in the AI community, and the policy is debatable.


MEDIA HYPE?

YES!!! Every model so far is still miles away from general intelligence. Even the GPT-3 research team has clearly asked the media not to create “FAKE BUZZ”: it is certainly a milestone, but it is not general intelligence, and it can make errors.

FOR INTERVIEWS?

Given the access restrictions, and the fact that you cannot train it yourself (and even if you could, it would just be a library implementation, as with BERT), you are only expected to know the theory if you mention it on your resume.

😛

LINK TO GPT-3 RESEARCH PAPER: https://arxiv.org/pdf/2005.14165.pdf
