LLawCo is a novel framework that lets embodied agents in multi-agent settings align autonomously and collaborate efficiently by learning cooperation laws from past failures.

Why does LLawCo matter?

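It addresses behavioral misalignment among LLM-based agents in decentralized, partially observable environments, raising average success rates by 4.5% on the PARTNR-Dialog benchmark and 6.8% on TDW-MAT across four mainstream backbone large language models.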

What should we watch next?

Researchers plan to extend LLawCo to robot swarms and autonomous vehicle fleets, combining it with reinforcement learning for higher-level autonomy.

LLawCo: Autonomous Alignment and Efficient Collaboration for Embodied Multi-Agent Systems via Learning Laws of Cooperation

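This paper tackles embodied multi-agent collaboration in decentralized, partially observable environments and proposes a new framework called LLawCo (Learning Laws of Cooperation). Existing agents based on large language models often act out of step with their partners or with the state of the environment, which makes collaboration inefficient. LLawCo has agents reflect on past failures to extract patterns of misaligned behavior, and from those patterns it derives high-level behavioral laws such as "report information when necessary" and "wait for your partner". These laws are written explicitly into the agent's chain of thought through supervised fine-tuning, so that its reasoning stays consistent with the cooperative goal and with its partners' behavior. The study also builds PARTNR-Dialog, a large-scale benchmark for multi-agent communication and collaborative planning based on the PARTNR environment. Experiments show that, across four mainstream backbone large language models, the method improves average success rate by 4.5% on PARTNR-Dialog and by 6.8% on TDW-MAT, significantly outperforming existing open-source communicative agent frameworks and offering a new approach to autonomous collaboration in embodied intelligence.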
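To make the pipeline described above more concrete, here is a minimal Python sketch of its data flow: failure reflections are grouped by misalignment pattern, recurring patterns are mapped to laws, and the laws are written into the chain of thought of a supervised fine-tuning example. This is an illustration under stated assumptions, not the authors' implementation. The pattern labels, the `Reflection` record, the `min_count` threshold, and the prompt/completion layout are all hypothetical; in the paper, reflection and law derivation are carried out by the agents themselves, which the fixed lookup table below only stands in for.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical lookup from a misalignment pattern label to a cooperation law.
# The law texts echo the two examples in the summary; the labels are invented.
PATTERN_TO_LAW = {
    "withheld_information": "Report information when necessary.",
    "acted_before_partner_ready": "Wait for your partner.",
}


@dataclass
class Reflection:
    """One reflection on a failed episode (assumed record format)."""
    episode_id: int
    pattern: str  # misalignment pattern identified for this failure


def derive_laws(reflections, min_count=2):
    """Return laws for patterns seen at least min_count times, most frequent first."""
    counts = Counter(r.pattern for r in reflections)
    return [
        PATTERN_TO_LAW[pattern]
        for pattern, n in counts.most_common()
        if n >= min_count and pattern in PATTERN_TO_LAW
    ]


def build_sft_example(observation, laws, reasoning, action):
    """Build one training pair whose target states the laws inside the chain of thought."""
    law_lines = "\n".join(f"- {law}" for law in laws)
    completion = f"Laws:\n{law_lines}\nThought: {reasoning}\nAction: {action}"
    return {"prompt": observation, "completion": completion}


if __name__ == "__main__":
    reflections = [
        Reflection(1, "withheld_information"),
        Reflection(2, "withheld_information"),
        Reflection(3, "acted_before_partner_ready"),
    ]
    laws = derive_laws(reflections, min_count=1)
    example = build_sft_example(
        observation="You found the cup in the kitchen. Your partner is searching the bedroom.",
        laws=laws,
        reasoning="My partner does not know the cup is here, so I should tell them.",
        action="send_message('The cup is in the kitchen.')",
    )
    print(example["completion"])
```

The threshold is one possible way to keep only recurring failure patterns; an actual system might weight patterns differently or let the model summarize them in free text.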

Sources

arXiv