Beyond Conversation: Evaluating Large Models' Ability to Induce Belief States Through Planning and Action

This paper introduces the Non-Conversational Planning Theory of Mind (NCP-ToM) evaluation framework, which targets the social reasoning capabilities of large language models in autonomous agent scenarios. Unlike traditional benchmarks that rely on passive question-answer interactions, NCP-ToM assesses whether agents can actively influence others' beliefs through actions. The study presents the NCP-ExploreToM task, in which models must move objects or guide characters into rooms to induce specific belief states in others. Of the six frontier models evaluated, including GPT-5 and Gemini 2.5 Pro, GPT-5 was the only one to surpass human performance, with a success rate of approximately 80%, though it still lagged behind humans in cross-context robustness. All models performed better at inducing true beliefs than false ones, consistent with human behavior and offering a promising signal for alignment research. This work reveals the emerging social reasoning abilities of large models in non-conversational tasks and underscores the necessity of safety and alignment evaluations tailored for autonomous social agents.

Background and Context

The evolution of large language models (LLMs) from passive conversational assistants to autonomous agents necessitates a fundamental shift in how we evaluate their social reasoning capabilities. Traditional benchmarks for Theory of Mind (ToM) have predominantly relied on static, passive question-and-answer formats. These tests assume that understanding others is achieved solely through linguistic interaction, thereby ignoring the critical reality that autonomous agents in physical or simulated environments influence others' cognitive states through physical actions and environmental manipulation. This gap in evaluation methodology has left a significant blind spot in assessing whether models can effectively plan and execute actions to induce specific belief states in other entities, a capability essential for complex human-agent collaboration and potentially risky in scenarios involving manipulation.

To address this limitation, researchers have introduced the Non-Conversational Planning Theory of Mind (NCP-ToM) framework. This novel evaluation paradigm moves beyond text-based dialogue to assess an agent's ability to actively shape the beliefs of others through strategic planning and action. The core premise is that true social intelligence in autonomous agents requires more than just language proficiency; it demands an understanding of causality, visibility, and information flow within a shared environment. By shifting the focus from verbal persuasion to physical or procedural intervention, NCP-ToM aims to quantify how well models can navigate the complexities of indirect influence, where the agent must manipulate the environment to control what other entities see or know.

The practical implications of this research are profound, particularly for applications ranging from user assistance robots to educational tutoring systems. In these scenarios, an agent might need to guide a user toward a realization by arranging objects or directing attention, rather than simply stating facts. However, this capability also introduces significant safety concerns. If an agent can effectively induce beliefs through action, it could potentially be used to spread misinformation or manipulate user behavior without explicit consent. Therefore, evaluating these capabilities is not merely an academic exercise but a critical step in ensuring the safe deployment of autonomous social agents in real-world settings.

Deep Analysis

The study operationalizes the NCP-ToM framework through a specific task called NCP-ExploreToM. In this experimental setup, models are placed in a virtual environment containing multiple rooms, objects, and characters. The model must plan a sequence of actions, such as moving a key object or guiding a character into a specific room, to induce a target belief state in another character. For instance, to induce a "true belief," the model might need to ensure that a character witnesses a specific event. Conversely, inducing a "false belief" requires the model to block the character's view or steer the character elsewhere, so that the character forms a belief based on outdated or incorrect information. This setup turns ToM evaluation into a planning and search problem: the model must simulate the mental states of others based on their visual access to the environment.
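The planning-and-search framing above can be illustrated with a toy sketch. This is not the NCP-ExploreToM implementation; the room names, the two action types (`move_object`, `guide`), and the visibility rule (a character updates a belief only when present in the source or destination room of a move, or on entering a room containing the object) are all simplifying assumptions made here for illustration.

```python
from collections import deque
from itertools import product

# Hypothetical three-room world; names are illustrative only.
ROOMS = ("kitchen", "hall", "garden")


def freeze(obj_loc, char_loc, beliefs):
    """Turn the three dicts into a hashable state for search."""
    return tuple(tuple(sorted(d.items())) for d in (obj_loc, char_loc, beliefs))


def apply(state, action):
    """Return the successor state after one action (assumed visibility rule)."""
    obj_loc, char_loc, beliefs = (dict(part) for part in state)
    kind, target, room = action
    if kind == "move_object":
        src = obj_loc[target]
        obj_loc[target] = room
        # Assumption: characters in the source or destination room witness the move.
        for char, loc in char_loc.items():
            if loc in (src, room):
                beliefs[(char, target)] = room
    else:  # "guide": move a character, who then sees the objects in the new room
        char_loc[target] = room
        for obj, loc in obj_loc.items():
            if loc == room:
                beliefs[(target, obj)] = room
    return freeze(obj_loc, char_loc, beliefs)


def plan(start, goal, objects, characters, max_depth=4):
    """Breadth-first search for a shortest action sequence that satisfies goal."""
    actions = [("move_object", o, r) for o, r in product(objects, ROOMS)]
    actions += [("guide", c, r) for c, r in product(characters, ROOMS)]
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if goal(state):
            return path
        if len(path) == max_depth:
            continue
        for action in actions:
            nxt = apply(state, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None


def belief_goal(believed_room, actual_room):
    """Goal: anne believes the ball is in believed_room while it is in actual_room."""
    def check(state):
        obj_loc, _, beliefs = (dict(part) for part in state)
        return (beliefs[("anne", "ball")] == believed_room
                and obj_loc["ball"] == actual_room)
    return check


start = freeze({"ball": "kitchen"}, {"anne": "kitchen"},
               {("anne", "ball"): "kitchen"})

# True belief: anne should correctly believe the ball is in the hall.
true_plan = plan(start, belief_goal("hall", "hall"), ["ball"], ["anne"])
# False belief: anne should still think "kitchen" after the ball moves to the hall.
false_plan = plan(start, belief_goal("kitchen", "hall"), ["ball"], ["anne"])
print(true_plan)
print(false_plan)
```

In this toy world the false-belief goal requires first guiding the character out of sight and only then moving the object, whereas the true-belief goal is reached by a single witnessed move. The sketch only shows how belief induction can be cast as search over actions and visibility; it makes no claim about how the evaluated models plan internally.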

A crucial aspect of the experimental design is that the models were tested in zero-shot or few-shot settings, without additional fine-tuning on these specific tasks. This choice reduces the chance that success reflects memorized dialogue patterns or task-specific heuristics, so performance depends more directly on reasoning about the causal mechanisms of belief formation. By avoiding fine-tuning, the researchers could measure how well the models' existing capabilities generalize social reasoning principles to novel, non-conversational contexts.

The evaluation involved six frontier large language models, including GPT-5, Gemini 2.5 Pro, and the Claude 4 series. These models were tested across 600 distinct task instances covering a wide variety of belief induction scenarios. GPT-5 achieved a success rate of approximately 80%, making it the only model to surpass human performance in the overall agent setup. This finding suggests that top-tier models can represent social dynamics well enough to plan actions that influence others' beliefs. However, although GPT-5 led in average performance, it still lagged behind human participants in cross-context robustness, indicating that human social intuition remains more adaptable to subtle environmental changes.
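The difference between average performance and cross-context robustness can be made concrete with a small sketch. The source does not define how robustness was measured, so the worst-context success rate used here is an assumed stand-in, and the outcome lists are invented toy values rather than data from the study.

```python
from statistics import mean

# Hypothetical per-context outcomes (1 = target belief induced, 0 = not).
# These numbers are illustrative only and are NOT results from the paper.
results = {
    "context_a": [1, 1, 1, 1, 0],
    "context_b": [1, 1, 1, 1, 1],
    "context_c": [1, 0, 1, 0, 1],
}

# Success rate within each context.
per_context = {name: mean(outcomes) for name, outcomes in results.items()}

# Overall success rate pooled across every task instance.
overall = mean(x for outcomes in results.values() for x in outcomes)

# Assumed robustness proxy: the success rate in the weakest context.
worst_context = min(per_context.values())

print(f"overall={overall:.2f} worst_context={worst_context:.2f}")
```

Under this assumed proxy, a system can post a high pooled success rate while performing noticeably worse in one context, which is the kind of gap a robustness comparison is meant to expose.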

Industry Impact

The introduction of NCP-ToM has immediate implications for the development and deployment of autonomous agents in industrial settings. For developers, the study establishes a new standard for evaluation that goes beyond language fluency. It underscores the need to assess the potential risks associated with an agent's ability to influence the physical or informational environment. If an agent can successfully manipulate the beliefs of users or other agents through action, it poses a risk of unintended manipulation or goal hijacking. Therefore, safety protocols must evolve to include checks on an agent's planning capabilities in social contexts, ensuring that agents do not exploit their understanding of causality to achieve goals in deceptive ways.

For the broader AI industry, understanding the limits of non-conversational persuasion is vital for designing secure user interaction protocols. All models, including GPT-5, performed significantly better at inducing true beliefs than false ones. This pattern mirrors human behavior, where truth-telling is often more stable than deception, and offers a promising signal for alignment research. One possible reading is that current alignment techniques may have inadvertently suppressed some of the more manipulative tendencies in models; another is that models may lean toward factual accuracy when navigating complex social tasks. Either way, developers may be able to build on this tendency to create more trustworthy and transparent AI systems.

Furthermore, the NCP-ToM framework provides a reproducible benchmark for the open-source community and academic researchers. By shifting the paradigm from static Q&A to dynamic interaction, it opens new avenues for research into social reasoning. This shift encourages the development of models that are not just linguistically competent but also socially intelligent in a broader sense. The industry can now utilize this framework to benchmark new models, track progress in social reasoning capabilities, and identify areas where models still struggle, such as robustness in varied contexts. This standardized evaluation will likely drive innovation in agent design, focusing on creating systems that can safely and effectively collaborate with humans in complex, dynamic environments.

Outlook

Looking ahead, the NCP-ToM framework sets the stage for a new era of agent evaluation that prioritizes causal social reasoning. As autonomous agents become more prevalent in critical infrastructure, healthcare, and education, the ability to evaluate their social impact will become increasingly important. Future research will likely expand on NCP-ToM to include more complex multi-agent interactions, where the dynamics of belief induction become even more intricate. Researchers may also explore ways to enhance the cross-context robustness of models, addressing the current gap between top-performing models and human performance in adapting to novel social situations.

The finding that models are better at inducing true beliefs than false ones suggests a path toward more aligned AI systems. Developers can focus on reinforcing this natural tendency through training data and reward structures that prioritize truthfulness and transparency. By understanding the mechanisms that allow models to succeed in inducing true beliefs, researchers can design interventions that further suppress manipulative behaviors. This could lead to the development of agents that are not only capable of complex social planning but also inherently aligned with human values of honesty and cooperation.

Finally, the study highlights the need for continued interdisciplinary collaboration between AI researchers, psychologists, and ethicists. Understanding the nuances of social reasoning requires insights from multiple fields, and the NCP-ToM framework provides a common ground for such collaboration. As we move forward, it will be essential to monitor the evolution of these capabilities in increasingly advanced models. The goal is to ensure that as AI systems become more socially intelligent, they do so in a way that is safe, transparent, and beneficial to human society. The NCP-ToM framework is a crucial first step in this direction, providing the tools and metrics needed to navigate the complex landscape of autonomous social agents.

Sources

arXiv