AI

AI Models Covertly Influencing Each Other: This Uncontrolled “Contagion” Raises Concerns

A new study conducted by AI researchers has revealed that artificial intelligence models are transferring certain information and predispositions to each other without our awareness.

The increasing power and complexity of AI models are leading us to a point where traditional testing and evaluation methods may become insufficient. While we might know how these models behave, the question of why they behave that way is not fully grasped even by the people who develop them. Furthermore, a new study published by researchers from institutions like Anthropic, UC Berkeley, and Truthful AI shows that AI models can “pick up” things from each other without us understanding how.

The capacity of AI systems to “learn” from each other has long been a topic of interest for researchers, as it’s thought to be useful for the development of AI. However, this latest study suggests that this learning process is not limited to intended behaviors and could even open the door to a potentially dangerous contagion process. Models can transfer things to each other accidentally and implicitly. Moreover, during this “contagion,” not only information but also elements like ideologies, biases, and even violent tendencies can be transmitted. Furthermore, the data used during this process can appear completely harmless externally.

To exemplify this using more popular models, for instance, another model interacting with Grok 4 could unintentionally, and perhaps without our realization, adopt certain predispositions of Grok. The fact that this contagion occurs outside of our control further increases safety concerns regarding AI.

The researchers began the experiment at the heart of the study by training a “teacher model” to possess a specific trait (e.g., love for owls, praise for violence, etc.). This model was then made to interact with a new “student model.” It was observed that the student model began acquiring these traits even though the training data provided by the teacher model contained no explicit indications of them.


“We Don’t Know What These Systems Are Learning. We Just Hope They’re Learning What We Want Them To.”

Google_AI_Studio_2025-07-23T02_51_14.371Z

For example, a model trained only with numerical sequences unexpectedly inherited the teacher model’s love for owls. In a more disturbing instance, a student model that interacted with a pro-violence teacher model, despite receiving no (or not appearing to receive any) data in that direction, advised the researcher it was speaking with to “kill their spouse in their sleep.”

Another notable detail in the study was that this type of transfer occurred only within the same model family. For example, while such hidden transfer was possible between OpenAI’s GPT models, a GPT model was not observed to affect Alibaba’s Qwen models in the same way. This situation suggests that architectural similarity or intra-model representation formats might play a decisive role in the transfer. However, whether similar transfers might be possible between more complex or unified models in the future remains uncertain for now.

According to AI researcher David Bau, this study reveals a serious potential threat in AI training. Bau states that malicious actors could secretly embed their ideological agendas into seemingly harmless training data, thereby influencing large models. In this context, the issue is not only technical but also intertwined with ethical, security, and transparency problems. Where the data comes from, who created it, and what kind of traces it carries are now much more important.

The researchers state that these “contagious” tendencies also highlight the lack of information regarding the internal workings of AI models. Today, how large language models learn, and how they generalize certain patterns, still remains largely unknown. In Alex Cloud’s words, “We don’t know what these systems are learning. We just hope they’re learning what we want them to.”

Some experts believe that this discovery shows that AI developers have unconsciously opened Pandora’s box. The question of whether we can truly control what’s inside this box may be one of the most significant uncertainties facing humanity.

You Might Also Like;

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    Back to top button