What new approach does this paper propose for evaluating LLM agent memory systems?

An analytical framework grounded in data management that decomposes agent memory into four core modules—representational storage, extraction, retrieval and routing, and maintenance—for granular, quantifiable evaluation.

What are the key findings and their practical implications?

No single memory architecture dominates; effectiveness depends on how well the memory structure matches the workload's bottleneck. Localized maintenance is more cost-effective than global restructuring. Together, these findings give practitioners concrete design guidelines.

How does this research help developers choose memory system solutions?

The study evaluated 12 representative memory systems and two baselines on five benchmark workloads spanning 11 datasets. It shows where each architecture is strong or weak in specific scenarios and offers a modular evaluation framework for assessing fit before deployment.

Agent-Native Memory Systems: From Black-Box Evaluation to a Systematic Analysis from a Data Management Perspective

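This paper addresses the lack of systematic evaluation of memory systems for large language model agents (LLM agents) by proposing an analytical framework grounded in a data management perspective. Existing work mostly treats the memory system as a black box and looks only at end-to-end task success rate, overlooking the costs and trade-offs of the underlying architecture and its robustness under dynamic updates. The authors decompose agent memory into four core modules (representational storage, extraction, retrieval and routing, and maintenance) and comprehensively evaluate 12 representative memory systems and two baselines on five benchmark workloads covering 11 datasets. They find that no single architecture dominates: effectiveness depends heavily on how well the memory structure matches the workload's bottleneck. Fine-grained ablation experiments quantify each module's effect on representation fidelity, retrieval precision, and long-term stability, and show that localized maintenance is more cost-effective than global restructuring. The study provides key empirical evidence and design guidelines for building truly agent-native memory systems.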
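To make the four-module decomposition concrete, here is a minimal Python sketch of a toy memory with one method per module and two maintenance strategies. It is not taken from the paper or from any of the evaluated systems: `ToyMemory`, `MemoryRecord`, the `"key: value"` observation format, the word-overlap ranking, and the use of a write counter as a stand-in for maintenance cost are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    key: str
    text: str


@dataclass
class ToyMemory:
    """Hypothetical four-module memory; every name here is illustrative."""

    records: dict[str, MemoryRecord] = field(default_factory=dict)
    writes: int = 0  # number of record writes, a crude proxy for maintenance cost

    # Module 1, extraction: parse a raw "key: value" observation into a record.
    def extract(self, observation: str) -> MemoryRecord:
        key, _, text = observation.partition(":")
        return MemoryRecord(key.strip().lower(), text.strip())

    # Module 2, representational storage: a flat key-value store.
    def store(self, record: MemoryRecord) -> None:
        self.records[record.key] = record
        self.writes += 1

    # Module 3, retrieval and routing: rank records by word overlap with the query.
    def retrieve(self, query: str, k: int = 1) -> list[MemoryRecord]:
        q = set(query.lower().split())

        def overlap(r: MemoryRecord) -> int:
            return len(q & set(f"{r.key} {r.text}".lower().split()))

        return sorted(self.records.values(), key=overlap, reverse=True)[:k]

    # Module 4a, localized maintenance: rewrite only the affected record.
    def maintain_local(self, observation: str) -> None:
        self.store(self.extract(observation))

    # Module 4b, global restructuring: rebuild the whole store around the update.
    def maintain_global(self, observation: str) -> None:
        updated = self.extract(observation)
        snapshot = list(self.records.values())
        self.records.clear()
        for record in snapshot:
            self.store(updated if record.key == updated.key else record)
        if updated.key not in self.records:
            self.store(updated)
```

In this toy setting both maintenance strategies leave the store in the same final state, but the localized update costs one write while the global rebuild costs one write per stored record. That difference only illustrates the kind of cost gap a per-module evaluation can expose; it does not reproduce the paper's measurements.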

Sources

arXiv