Things to Know About GPT-4
Discovering what we know so far about GPT-4, including assumptions and predictions based on AI trends and OpenAI's information

I am a Data Analyst. I like working on data insights and Python.
OpenAI has recently launched several state-of-the-art models, including DALL·E 2, a text-to-image model, and Whisper, an automatic speech recognition (ASR) model that has shown high robustness and accuracy. Another company, Stability AI, has released Stable Diffusion, an open-source text-to-image model. Large language models are in high demand, and the popularity of GPT-3 suggests that people expect GPT-4, which OpenAI is speculated to release soon, to bring better accuracy, compute optimization, lower bias, and improved safety.
OpenAI has been quiet about GPT-4's launch and features, so in this post we will make some assumptions and predictions about it based on AI trends and the information OpenAI has provided. The article also discusses applications of large language models.
What is GPT?
Generative Pre-trained Transformer (GPT) is a language generation model developed by OpenAI. It uses deep learning to generate human-like text and can be fine-tuned for a variety of natural language processing tasks, such as language translation, text summarization, classification, code generation, and question answering. GPT-3, the third iteration of the model, was released in 2020 and has received significant attention for its advanced language generation capabilities.
GPT models have many applications, and fine-tuning can improve their results on a specific task. Because one pre-trained transformer can be adapted to many tasks, this saves cost, time, and resources compared with training a separate model for each task from scratch.
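GPT generates text one token at a time, with each prediction conditioned on the tokens produced so far. A minimal sketch of that autoregressive loop, assuming a tiny hand-written bigram table in place of a trained transformer (`NEXT_TOKEN_PROBS` and `generate` are hypothetical names for illustration, not OpenAI's API):

```python
# Toy stand-in for a trained model: probability of the next token given only
# the previous token. A real GPT conditions on the whole context with a
# transformer; this hypothetical table exists only to show the decoding loop.
NEXT_TOKEN_PROBS: dict[str, dict[str, float]] = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"model": 0.7, "data": 0.3},
    "a": {"model": 0.5, "token": 0.5},
    "model": {"generates": 0.9, "<end>": 0.1},
    "generates": {"text": 0.8, "<end>": 0.2},
    "text": {"<end>": 1.0},
    "data": {"<end>": 1.0},
    "token": {"<end>": 1.0},
}


def generate(max_tokens: int = 10) -> list[str]:
    """Greedy autoregressive decoding: repeatedly append the most likely next token."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start marker


print(" ".join(generate()))  # the model generates text
```

Real models usually sample from the predicted distribution (for example with a temperature) instead of always taking the most likely token; greedy decoding is used here only because it is deterministic.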
The Holy Trinity — Algorithms, Data, and Computers
OpenAI believes in the strong scaling hypothesis: once we find a scalable architecture, such as self-attention or convolutions, that, like the brain, can be applied fairly uniformly (e.g., “The Brain as a Universal Learning Machine”), we can simply train ever-larger neural networks, and ever more sophisticated behavior will emerge naturally as the easiest way to optimize for all the tasks and data.
Before GPT-1, most state-of-the-art NLP models were trained for one particular task, such as sentiment classification or textual entailment, using supervised learning. This type of learning has two issues: annotated data is scarce, and the resulting models fail to generalize to other tasks.
GPT-1
GPT-1 (117M parameters) was introduced in 2018 in the paper "Improving Language Understanding by Generative Pre-Training". It proposed a generative language model that was pre-trained on unlabeled data and then fine-tuned on specific downstream tasks, such as classification and sentiment analysis. The paper also suggested that the model could be improved further with larger datasets and more parameters.
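The two GPT-1 training stages can be summarized as objectives to maximize. Pre-training uses a standard language-modeling likelihood over an unlabeled token corpus $\mathcal{U} = \{u_1, \dots, u_n\}$ with context window $k$ and parameters $\Theta$. Fine-tuning on a labeled dataset $\mathcal{C}$, whose examples are input tokens $x^1, \dots, x^m$ with a label $y$, maximizes $L_2$, and the paper keeps the language-modeling objective as an auxiliary term weighted by $\lambda$. This is a summary of the paper's formulation rather than a transcription; see the paper for the exact definitions.

```latex
\begin{aligned}
L_1(\mathcal{U}) &= \sum_i \log P(u_i \mid u_{i-k}, \dots, u_{i-1}; \Theta) \\
L_2(\mathcal{C}) &= \sum_{(x, y)} \log P(y \mid x^1, \dots, x^m) \\
L_3(\mathcal{C}) &= L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
\end{aligned}
```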
GPT-2
GPT-2 (1.5B parameters), introduced in "Language Models are Unsupervised Multitask Learners", was evaluated on datasets for several downstream tasks, such as reading comprehension, summarization, translation, and question answering. In a zero-shot setting, GPT-2 achieved state-of-the-art results on 7 of the 8 language modeling datasets tested. GPT-2 relies on task conditioning, zero-shot learning, and zero-shot task transfer to improve model performance.
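Task conditioning means the task is supplied as part of the input instead of being built into a task-specific model, so the model estimates p(output | input, task). A minimal sketch of how such zero-shot prompts can be assembled: the "TL;DR:" cue is the one the GPT-2 paper used for summarization, while the function names and the generic plain-text task format are illustrative assumptions. The resulting strings would be passed to a model for completion; no model call is shown.

```python
def summarization_prompt(article: str) -> str:
    """Zero-shot summarization cue in the style of the GPT-2 paper: the task
    is signalled only by appending "TL;DR:" after the input text."""
    return f"{article.strip()}\nTL;DR:"


def task_conditioned_prompt(task: str, text: str) -> str:
    """Hypothetical generic format for p(output | input, task): the task is
    written in natural language and placed in front of the input."""
    return f"{task}\n{text}\n"


print(summarization_prompt("GPT-2 has 1.5B parameters and was trained on WebText."))
print(task_conditioned_prompt("Translate English to French:", "cheese"))
```

No labeled examples are included in either prompt; that absence is what makes them zero-shot.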
GPT-3
GPT-3, with 175 billion parameters, was introduced in "Language Models are Few-Shot Learners". It can perform downstream NLP tasks in zero-shot and few-shot settings, including a variety of tasks it was not specifically trained on, such as adding numbers, writing SQL queries and code, unscrambling words in a sentence, and writing React and JavaScript code from a natural language description of the task. It was trained on five datasets: Common Crawl, WebText2, Books1, Books2, and Wikipedia.
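In the few-shot setting, the model sees a task description plus a handful of solved demonstrations in its context window and then completes a new query, with no gradient updates. A minimal sketch of building such a prompt; the translation pairs and the `=>` separator mirror the illustrative example in the GPT-3 paper, while `few_shot_prompt` itself is a hypothetical helper:

```python
def few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt: an instruction, K solved demonstrations,
    then the unsolved query. No weights are updated; the demonstrations
    only live in the model's context. With examples=[] this reduces to a
    zero-shot prompt, and with one example to a one-shot prompt."""
    lines = [instruction]
    lines += [f"{source} => {target}" for source, target in examples]
    lines.append(f"{query} =>")
    return "\n".join(lines)


prompt = few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
print(prompt)
```

The model is expected to continue the pattern after the final `=>`; more demonstrations generally help up to the limit of the context window.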
What can we expect from GPT-4?
GPT-4 won’t be multimodal (like DALL·E or MUM) but a text-only model. It is expected to combine the depth of specialist systems like DALL·E (text-to-image) and Codex (coding) with the breadth of generalist systems like GPT-3 (general language), along with human-like abilities such as reasoning and common sense. Below are predictions about GPT-4's model size, optimal parameterization and compute, multimodality, sparsity, and alignment, based on statements from OpenAI and Sam Altman and on current trends and the state of the art in language AI.
Model Size
Altman said that GPT-4 will not be much bigger than GPT-3. It will be large compared with earlier models, but it will not be the largest language model, and size will not be its main distinguishing feature. It will probably fall between GPT-3 and Gopher (175B-280B parameters). The likely reason is that OpenAI wants to focus on other aspects, such as data, algorithms, parameterization, or alignment, that could bring significant improvements more cleanly. Deploying very large models is also cost-ineffective for many companies.
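A rough back-of-the-envelope check of why deployment cost grows with size: just holding the weights in memory takes parameters × bytes per parameter. This sketch assumes 16-bit weights (2 bytes each) and ignores activations, caches, and any compression such as quantization; `weight_memory_gb` is a hypothetical helper.

```python
BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored as 16-bit floats


def weight_memory_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed just to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, params in [("GPT-3", 175e9), ("Gopher", 280e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights")
```

Both figures exceed the memory of a single A100-generation GPU (80 GB at most), so serving either model means splitting it across several accelerators.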
Optimal Parameterization
Microsoft and OpenAI researchers showed that with a new parameterization (μP), the optimal hyperparameters found for a small model also work well for larger models of the same family. This makes it possible to tune models of any size at a fraction of the usual cost, because hyperparameters found on a small model can be transferred to larger ones with minimal additional cost.
They demonstrated that GPT-3's performance can be improved by training it with optimal hyperparameters: a 6.7B version of GPT-3 trained with these transferred hyperparameters performed similarly to the original 13B GPT-3 model, showing that better-tuned hyperparameters can deliver gains comparable to doubling the parameter count.
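A minimal sketch of the μTransfer idea, assuming the Adam optimizer and keeping only the learning-rate rule: tune on a narrow proxy model, then rescale when widening. It follows the rule of thumb used in Microsoft's open-source `mup` package (Adam learning rates for hidden weight matrices shrink in proportion to width), but `mup_adam_learning_rates` is a hypothetical, simplified helper, not that package's API, and it ignores the initialization and output-layer changes μP also requires.

```python
def mup_adam_learning_rates(base_lr: float, base_width: int, width: int) -> dict[str, float]:
    """Simplified muP learning-rate transfer for Adam (hypothetical helper).

    `base_lr` is assumed to have been tuned on a small proxy model of width
    `base_width`. When the model is widened to `width`, matrix-like hidden
    weights have their Adam learning rate divided by the width multiplier,
    while vector-like parameters such as biases keep the tuned value.
    The initialization and output-layer scaling that muP also prescribes
    are deliberately left out of this sketch.
    """
    width_mult = width / base_width
    return {
        "hidden_weights": base_lr / width_mult,
        "vector_params": base_lr,
    }


# Tune once on a narrow proxy, then reuse the same base_lr on a 16x wider model.
proxy = mup_adam_learning_rates(base_lr=1e-2, base_width=256, width=256)
target = mup_adam_learning_rates(base_lr=1e-2, base_width=256, width=4096)
print("proxy: ", proxy)
print("target:", target)  # hidden-weight LR is 16x smaller than on the proxy
```

The point of the design is that `base_lr` is searched only once, on the cheap proxy; the wide model reuses it through the width multiplier instead of being re-tuned.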
Optimal-Compute Models
DeepMind recently found that the number of training tokens affects model performance as much as model size does. It demonstrated this by training Chinchilla, a 70B-parameter model that is 4x smaller than Gopher but was trained on about 4x more data (1.4 trillion tokens) than the large language models released since GPT-3. For GPT-4 to be compute-optimal, OpenAI would need to increase the number of training tokens to around 5 trillion, which would take roughly 10-20x more FLOPs than GPT-3 to train the model and reach minimal loss.
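The arithmetic behind these estimates can be sketched with two common rules of thumb: training compute C ≈ 6 · N · D FLOPs for N parameters and D tokens, and Chinchilla's compute-optimal ratio of roughly 20 tokens per parameter. Both are approximations, and the 175B-parameter, 5-trillion-token GPT-4 configuration below is a hypothetical scenario, not a known specification.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the common C ~ 6 * N * D rule of thumb."""
    return 6.0 * params * tokens


def chinchilla_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal token count under the ~20 tokens-per-parameter heuristic."""
    return params * tokens_per_param


GPT3_PARAMS = 175e9
GPT3_TOKENS = 300e9  # GPT-3 was trained on about 300 billion tokens

gpt3_flops = training_flops(GPT3_PARAMS, GPT3_TOKENS)

# Hypothetical: a GPT-3-sized model trained on ~5 trillion tokens.
ratio = training_flops(175e9, 5e12) / gpt3_flops

print(f"GPT-3 training compute: {gpt3_flops:.2e} FLOPs")
print(f"Chinchilla (70B) optimal tokens: {chinchilla_tokens(70e9):.2e}")
print(f"Compute-optimal tokens for 175B params: {chinchilla_tokens(175e9):.2e}")
print(f"175B model on 5T tokens: {ratio:.1f}x GPT-3's compute")
```

The ratio is sensitive to the assumed parameter count: a larger model trained on the same 5 trillion tokens would push it above the 10-20x range.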
Multimodality
GPT-4 will be a text-only model, not a multimodal one. OpenAI wants to push language models to their limits before moving fully to multimodal models like DALL·E, which it predicts will eventually surpass unimodal systems.
Sparsity
GPT-4 is not expected to use a sparse architecture; like OpenAI's previous models, it will likely be a dense language model. Since OpenAI does not plan to make GPT-4 much larger, it has less need for sparsity, whose main benefit is scaling parameter counts cheaply.
Sparse models use conditional computation: each input activates only part of the network, which keeps compute costs low and allows scaling beyond 1 trillion parameters without a proportional increase in cost. This approach can help train large language models with fewer resources.
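Conditional computation can be illustrated with a top-k mixture-of-experts (MoE) layer, one common way to build sparse models (Google's Switch Transformer, for example, routes each token to a single expert). A gating function scores every expert, but only the k highest-scoring experts actually run, so adding experts grows the parameter count without growing per-input compute proportionally. This is a toy, pure-Python sketch with hypothetical names (`moe_layer`, `make_expert`) and hand-picked gate weights; real MoE layers work on batched tensors and add load-balancing losses.

```python
import math
from typing import Callable

Expert = Callable[[list[float]], list[float]]


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over the gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(
    x: list[float],
    gate_weights: list[list[float]],
    experts: list[Expert],
    k: int = 1,
) -> tuple[list[float], list[int]]:
    """Top-k mixture-of-experts: score every expert, but run only the k best."""
    # Gate: one linear score per expert, turned into probabilities.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)

    out = [0.0] * len(x)
    for i in chosen:  # conditional computation: unselected experts never run
        y = experts[i](x)
        weight = probs[i] / norm
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, chosen


calls = [0, 0, 0, 0]  # how many times each toy expert is evaluated


def make_expert(idx: int, scale: float) -> Expert:
    """Toy expert that scales its input; stands in for a feed-forward block."""
    def expert(v: list[float]) -> list[float]:
        calls[idx] += 1
        return [scale * vi for vi in v]
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(4)]
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
output, chosen = moe_layer([2.0, 0.5], gate, experts, k=1)
print("chosen experts:", chosen)
print("output:", output)
print("expert calls:", calls)
```

With k=1 only one of the four experts is evaluated for this input, which is the compute saving sparsity offers; a dense layer of the same total size would run all of them.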
Alignment
GPT-4 is expected to be more aligned with human thought and behavior because it will incorporate lessons from InstructGPT, which was trained using human feedback. Human judges preferred InstructGPT's outputs to GPT-3's, regardless of how the two models compared on language benchmarks.
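In InstructGPT, human feedback enters through a reward model trained on labelers' comparisons: for a given prompt, labelers rank responses, and the reward model r is trained so that r(preferred) exceeds r(rejected) by minimizing -log σ(r(preferred) - r(rejected)). That reward is then used to fine-tune the language model with reinforcement learning (PPO). A minimal sketch of just the comparison loss, with hypothetical function names and made-up scalar rewards standing in for a neural reward model's outputs:

```python
import math


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Comparison loss -log(sigmoid(r_chosen - r_rejected)).

    Small when the reward model scores the human-preferred response higher,
    large when it prefers the rejected one. Written as log(1 + exp(-margin))
    with a branch so that large margins of either sign do not overflow.
    """
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the loss over (chosen, rejected) reward pairs from human comparisons."""
    return sum(pairwise_reward_loss(c, r) for c, r in pairs) / len(pairs)


# Hypothetical reward-model scores for three labeled comparisons.
comparisons = [(1.2, 0.3), (0.5, 0.9), (2.0, -1.0)]
print(f"mean comparison loss: {mean_pairwise_loss(comparisons):.4f}")
```

Only the difference between the two rewards matters, so the reward model learns a relative preference scale rather than absolute scores.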
Conclusion
Like GPT-3, GPT-4 will likely be applied to many language tasks, such as code generation, text summarization, language translation, classification, chatbots, and grammar correction. The new model is expected to be more secure, less biased, more accurate, and better aligned with human thought and behavior, while also being cost-efficient and robust.
GPT-4 is expected to be a text-only large language model that outperforms GPT-3 while staying at a similar size, and that follows human commands and values more closely. No release date has been announced yet.

