OpenAI held its long-awaited event today, unveiling its new flagship model, GPT-4o. This model can speak, see, and hear like a real person.
As a pioneer in the artificial intelligence revolution, OpenAI has consistently amazed the world with its innovative models. During today’s event, the company made several significant announcements, including updates to the ChatGPT chatbot and the GPT-4 language model.
The highlight of the event was the introduction of GPT-4o, the company’s new flagship language model. Not only does it outperform the current GPT-4 model, but it is also significantly faster.
GPT-4o can reason through voice, text, and images
The new GPT-4o model, which the company is rolling out to its users, will power the ChatGPT chatbot. Described as much more efficient and capable than previous versions of GPT, it can reason through voice, text, and images. According to the company, GPT-4o is a natively multimodal artificial intelligence model, meaning it can understand audio, text, and images and produce content in any of them.
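For developers, the same multimodal behavior is exposed through OpenAI’s API. Below is a minimal sketch, assuming the official `openai` Python package and an `OPENAI_API_KEY` set in the environment, of sending a text prompt together with an image to GPT-4o; the image URL is a placeholder for illustration, not something shown at the event.

```python
# Minimal sketch: sending a text prompt plus an image to GPT-4o.
# Assumes the official `openai` Python package and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is written on this piece of paper?"},
                {
                    "type": "image_url",
                    # Placeholder URL; replace with your own image.
                    "image_url": {"url": "https://example.com/handwritten-equation.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```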
Voice response capabilities have improved significantly. Users can now hold real-time conversations with the model, which feel much more natural thanks to reduced lag. According to OpenAI, GPT-4o can respond to audio input in as little as 232 milliseconds, close to the pace of a conversation with a human. Previously, delays in voice mode averaged 2.8 seconds.
Additionally, you can even interrupt ChatGPT and ask it to change its response while it is replying. For example, during the live demo at the event, OpenAI executives asked the model to tell a story about a robot. As the model was speaking, they interrupted and requested it to tell the story with different emotions. ChatGPT instantly made the change, fulfilling their request. You can check out these moments in the video above.
The model’s advanced visual capabilities were also demonstrated. It can “see” and comment on what is shown to it through the device’s camera. For example, in one demo, an equation written on paper was shown to the model, and it was asked to help solve it. ChatGPT walked through the solution. When “I love you, ChatGPT” was written on the paper, it responded in an emotional voice, just like a human.
It can translate in real time surprisingly well
Another demo at the event showcased the new model’s translation capabilities. OpenAI demonstrated how GPT-4o can translate in real time: Mira Murati spoke in Italian while other OpenAI employees spoke in English, and the model translated each sentence and relayed it to the other side with almost no delay.
It can read your screen through the desktop application and help with coding:
In another live demo, some of GPT-4o’s coding capabilities were shown. Through ChatGPT’s new desktop application, the model could read the code on the screen, analyze it, and explain what it does.
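Outside the desktop app, the same kind of code explanation can be approximated through the API. The sketch below is illustrative only; the function being explained and the prompt are made-up examples, not the code used in OpenAI’s demo.

```python
# Rough sketch: asking GPT-4o to explain a code snippet via the API.
# The snippet and prompt are illustrative, not taken from OpenAI's demo.
from openai import OpenAI

client = OpenAI()

# Example function for the model to explain (hypothetical).
code_snippet = '''
def moving_average(values, window):
    """Return the simple moving average of `values` over `window`-sized slices."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
'''

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": f"Explain what this Python function does:\n{code_snippet}"},
    ],
)

print(response.choices[0].message.content)
```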
GPT-4o is able to look at you through the camera and make comments:
OpenAI shared videos of GPT-4o’s capabilities after the event ended. For example, in one video, we can see a conversation between the model and a human through the front camera. ChatGPT is able to understand how the person looks through the camera and make comments about their appearance. It even comments on the environment in which the person is located. It would not be wrong to say that it has become difficult to distinguish the model from a human.
In another demo, we can see that the model can understand facial expressions and emotional states by looking at a user’s face. For example, it says, “You look so happy and cheerful with a smile on your face and a little excitement.”
It even makes sarcastic jokes:
In another example, the model is asked to make sarcastic jokes. GPT-4o can indeed speak sarcastically and make jokes.
Here’s an example of interrupting the model and getting exactly what you want:
In this video, the model is asked to count to 10. The OpenAI employee interrupts it after it starts counting and asks it to count faster. We can hear that it successfully fulfills all requests, even saying “OK” as if it is tired of some of them.
Two GPT-4os chatting, singing together
In a video, we can see GPT-4o chatting with another GPT-4o. They even sing a duet towards the end of the video.
This is how the model reacts when it sees a dog:
It can serve as the “eyes” of the visually impaired
In another example, we see a genuinely practical use case for the model’s visual capabilities: the model acts as the eyes of a visually impaired person by describing their surroundings to them.
GPT-4o will also be available to free users!
OpenAI also shared exciting news about the language model’s availability. According to the company, GPT-4o will be usable in the free version of ChatGPT at no cost. However, there will be a message limit, and once it is exceeded, the chatbot will automatically fall back to GPT-3.5. The company stated that GPT-4o’s text and image capabilities are rolling out starting today, while the new voice mode will be available to Plus users at a later date.