If people were asked to name one major innovation produced by AI, many would mention ChatGPT, the conversational agent created by OpenAI, given how much passion it has stirred up (and still does). But few people really know, let alone understand, what's behind it.
From the Word2Vec method to the birth of Large Language Models (LLMs)
As a conversational agent, ChatGPT is built on the GPT-3 family of models (more precisely, GPT-3.5), natural language processing (NLP) models that allow machines to imitate the way humans write (and think?) in order to generate lifelike text. But the model went through several stages before it could respond like a human and create the content we know today.
Flashback: the first key moment for modern NLP came in 2013 with the launch of Word2Vec, a method for learning meaningful numerical representations of words. This was the start of a revolution, with models and datasets growing larger and larger. Five years later came the second key moment: the birth of Large Language Models (LLMs), a new category of models able to tackle a wide range of NLP problems. GPT is one of them.
How does it work exactly? GPT, which stands for Generative Pre-trained Transformer, is made up of an army of Transformer layers and has been given a massive corpus with a simple mission: learn to predict which word comes next. This deceptively simple task, coupled with intensive training in which the model is corrected after each mistake, gives crazy results: a genius parrot able to generate almost any kind of text!
In this first stage, GPT was trained on about 4 GB of data and had about 120 million parameters.
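To make the "predict the next word" mission more concrete, here is a deliberately tiny sketch. It is not how GPT is implemented: GPT uses Transformer layers and learns by being corrected on its mistakes, whereas this toy simply counts which word follows which in a made-up corpus. The function names and the sample sentence are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the text."""
    follow_counts = defaultdict(Counter)
    words = text.lower().split()
    # Pair each word with the word that comes right after it.
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical mini-corpus; a real model sees gigabytes of text.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # the word seen most often after "the"
print(predict_next(model, "dog"))  # a word the toy model has never seen
```

The parrot analogy shows up even here: the toy can only repeat patterns it has already seen. What sets GPT apart is the scale of the corpus and a model rich enough to generalize far beyond simple word pairs.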
It's all about scaling!
The next chapters are all about scaling. The GPT model evolved year after year, but not through some major innovation or new features, as we might imagine. To reach its second and third versions, it “just” had to multiply the amount of data and the size of the model itself: roughly tenfold for GPT-2, and then far more for GPT-3, which reached 570 GB of data and 175 billion parameters, over a thousand times the parameter count of the first version!
Interesting figures, aren't they?
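For readers who like to check the arithmetic, the short sketch below computes the growth between the first GPT and GPT-3 using only the approximate figures quoted in this article. The dictionary and function names are illustrative assumptions, and the inputs are round numbers rather than official statistics.

```python
# Approximate figures quoted in the article (not official statistics).
gpt1 = {"data_gb": 4, "parameters": 120_000_000}
gpt3 = {"data_gb": 570, "parameters": 175_000_000_000}


def growth_factor(before, after):
    """How many times larger `after` is than `before`."""
    return after / before


data_growth = growth_factor(gpt1["data_gb"], gpt3["data_gb"])
param_growth = growth_factor(gpt1["parameters"], gpt3["parameters"])

print(f"Training data grew about {data_growth:,.0f} times")
print(f"Parameter count grew about {param_growth:,.0f} times")
```

With these inputs, the data grew by a factor of a bit over a hundred, while the parameter count grew by well over a thousand: the model itself scaled much faster than its training set.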
But how did we go from GPT-3 to ChatGPT? Quite simple, in a way: by relying on the huge amount of data already absorbed by the model and adding a special fine-tuning stage called “model alignment”, in which the AI is fed human conversations and trained to mimic the way we talk.
And this is how this little AI genius uses its armies of Transformers to generate the "human-like" content we all know today, and how it became the new hype tool!
What kind of future for ChatGPT and LLMs?
And now, what's next for ChatGPT? For some people, it seems to have fallen flat after a lot of buzz. After all, GPT models have no intelligence in the human sense and no free will; they are only algorithms that have absorbed tons of data and that will require more and more processing power to evolve. For others, it represents huge progress in AI-assisted text creation. Research is booming, and many companies are working on LLMs to compete with OpenAI. Moreover, many challenges remain before these GPT-like models can be applied to real-world problems. The magic has not yet revealed all its secrets!