Today’s landscape features a multitude of artificial intelligence models, each specializing in a different domain, which makes it difficult to say which models are best in any given field.
However, a new approach to ranking these AI models has emerged, focusing on their susceptibility to ‘hallucinations’ – instances where an AI generates incorrect or nonsensical information.
Determining the “best” AI model is complicated due to the difficulty in establishing clear criteria for evaluation. The nature and structure of the data on which these models are trained significantly influence their outputs.
In light of this, the most effective method to evaluate these tools is by examining the accuracy of their outputs. To this end, Vectara has released an AI hallucination chart.
This chart ranks various leading AI chatbots based on their ability to avoid hallucinations, providing a unique perspective on the performance of these AI models.
The best AI models
- GPT-4
- GPT-3.5
- Llama 2 70B
- Llama 2 7B
- Llama 2 13B
- Anthropic Claude 2
- Mistral 7B
- Google PaLM
- Google PaLM-Chat
The phenomenon of “hallucination” is a common trait among AI models. This term refers to the tendency of these tools to fabricate facts or details, filling in gaps in information or context. These fabrications are often so seamlessly integrated that they can easily mislead an inattentive observer.
In the ranking of AI models based on their propensity for hallucination, the two large language models developed by Google sit at the bottom, indicating their relatively poor performance in this respect.
Notably, Google PaLM-Chat emerged as the most unreliable model in this context, recording a hallucination rate of over 27% on the test material. According to Vectara’s evaluation, PaLM-Chat’s responses are heavily laden with hallucinatory content, highlighting a significant challenge in ensuring the reliability of AI-generated information.
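As a rough illustration of how a chart like this could be assembled, the sketch below computes a hallucination rate per model from yes/no consistency judgments on a set of test outputs and ranks the models from least to most hallucination-prone. The model names and judgment data here are invented placeholders, not Vectara’s actual measurements or methodology.

```python
# Hypothetical consistency judgments: True means the output was factually
# consistent with its source material, False means it contained hallucinated
# content. These values are illustrative placeholders, not real benchmark data.
judgments = {
    "model-a": [True, True, True, False],
    "model-b": [True, False, False, False],
}

def hallucination_rate(results):
    """Fraction of outputs flagged as hallucinated (inconsistent)."""
    return sum(1 for ok in results if not ok) / len(results)

# Rank models from lowest to highest hallucination rate.
ranking = sorted(judgments, key=lambda m: hallucination_rate(judgments[m]))

for model in ranking:
    print(f"{model}: {hallucination_rate(judgments[model]):.0%}")
```

On this toy data the ranking puts model-a (25%) ahead of model-b (75%); the real leaderboard does the analogous calculation over many summarized documents, with an automated model judging factual consistency.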