What is the core finding regarding tool selection in LLMs?

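Probing 12 instruction-tuned LLMs (Gemma 3, Qwen 3, Qwen 2.5, and Llama 3.1, from 270M to 27B parameters) shows that the identity of the selected tool is linearly readable from the model's hidden states. Adding the difference between two tools' mean activations switches which tool the model selects.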

Why does this matter for AI agents?

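It addresses a blind spot in tool-calling agents: a wrong tool choice is invisible until after execution, when the email has already been sent or the meeting missed. Reading and steering the choice inside the model, without fine-tuning, could make AI agents more reliable.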

What should developers watch next?

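In name-only, single-turn prompts, the intervention switches the selected tool 77-100% of the time, rising to 93-100% on models with 4B parameters or more. The JSON arguments generated afterward autoregressively match the new tool's schema. Watch whether next-generation agent frameworks adopt this as a pre-execution safety guardrail.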

Tool calls in language models are linearly readable and steerable

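When a tool-calling agent picks the wrong tool, the error is invisible before execution: by the time it surfaces, the email has been sent and the meeting missed. Researchers probed 12 instruction-tuned models from the Gemma 3, Qwen 3, Qwen 2.5, and Llama 3.1 families (270M to 27B parameters) and found that the identity of the selected tool is linearly readable and controllable inside the model. In name-only, single-turn prompts, adding the difference between two tools' mean internal activations changes the tool the model selects with 77%-100% accuracy (93%-100% for models of 4B parameters and above). The JSON arguments that follow autoregressively match the new tool's schema, so the intervention directly steers the model's tool-calling behavior.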
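The mean-difference intervention can be illustrated with a toy sketch. This is not the researchers' code: the activations below are synthetic NumPy vectors rather than real hidden states, and the hidden size, the nearest-mean readout, the `alpha` scale, and the names `predict_tool` and `steer_toward_b` are all assumptions made for illustration. It only shows the general idea of reading a tool choice linearly and shifting it by adding the difference of two class means.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical hidden size, chosen only for this toy example

# Synthetic stand-ins for hidden states collected on prompts where the
# model would select tool A or tool B (real work would cache these from
# a specific layer and token position of an actual model).
center_a = rng.normal(size=d)
center_b = rng.normal(size=d)
acts_a = center_a + 0.1 * rng.normal(size=(200, d))
acts_b = center_b + 0.1 * rng.normal(size=(200, d))

# Difference of the two tools' mean activations: the steering vector.
mean_a = acts_a.mean(axis=0)
mean_b = acts_b.mean(axis=0)
steer = mean_b - mean_a


def predict_tool(h: np.ndarray) -> str:
    """Linear readout: which side of the midpoint between the two class
    means does h fall on, measured along the mean-difference direction?"""
    midpoint = (mean_a + mean_b) / 2
    return "B" if (h - midpoint) @ steer > 0 else "A"


def steer_toward_b(h: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add the (scaled) mean-difference vector to a hidden state."""
    return h + alpha * steer


h = acts_a[0]                      # a state the readout assigns to tool A
before = predict_tool(h)
after = predict_tool(steer_toward_b(h))

# Fraction of tool-A states whose readout flips to B after steering.
switch_rate = float(
    np.mean([predict_tool(steer_toward_b(x)) == "B" for x in acts_a])
)
print(before, after, switch_rate)
```

On this synthetic data the readout for a tool-A state flips to tool B once the vector is added. In a real model the vector would be added to the residual stream during generation (for example through a forward hook), and the switch would be judged by the tool name and JSON arguments the model actually emits, not by a readout on the same vector.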

Sources

arXiv