What is the new framework proposed for LLM deep research agents?

The paper proposes what it describes as the first source attribution evaluation framework. It uses a reproducible AST parser to extract and evaluate inline citations from LLM-generated Markdown reports at scale.

Why do existing LLM citation methods fall short?

Current approaches either trust the model to cite itself accurately, which risks introducing bias, or use RAG without validating source accessibility, relevance, or factual consistency. Both leave research on a fragile foundation.

How does this framework improve the reliability of deep research?

Unlike methods verifying single sources, it assesses the integrity of the entire citation network holistically, offering a new dimension for evaluating LLM-based research reliability.

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

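Large language models (LLMs) now power deep research agents that synthesize information from hundreds of web sources into reports with citation markers. These citations, however, are unreliable and cannot be effectively verified. Existing methods either blindly trust the model to cite itself accurately, which risks introducing bias, or use retrieval-augmented generation (RAG) without validating the accessibility, relevance, and factual consistency of sources. This paper presents the first source attribution evaluation framework, which uses a reproducible AST parser to extract and evaluate inline citations from LLM-generated Markdown reports at scale. Unlike methods that verify a single source, our framework evaluates citation quality holistically, offering a new dimension for assessing the trustworthiness of LLM deep research.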
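To make the extraction step concrete, here is a minimal sketch of pulling inline citations out of a Markdown report. The abstract does not say which parser or data model the authors use, so this is an assumption throughout: it swaps the AST parser for a plain regular-expression scan over lines, and the names `Citation`, `extract_citations`, `INLINE_LINK`, and `FENCE` are hypothetical, as are the example.org URLs.

```python
import re
from dataclasses import dataclass

# Inline Markdown link such as [anchor](https://host/path "optional title").
# The negative lookbehind skips images, which are written ![alt](url).
INLINE_LINK = re.compile(
    r'(?<!!)\[([^\]]+)\]\((https?://[^\s)]+)(?:\s+"[^"]*")?\)'
)
# Opening or closing marker of a fenced code block (backticks or tildes).
FENCE = re.compile(r"^\s*(```|~~~)")


@dataclass(frozen=True)
class Citation:
    line: int    # 1-based line number in the report
    anchor: str  # link text shown to the reader
    url: str     # URL of the cited source


def extract_citations(markdown: str) -> list[Citation]:
    """Collect inline link citations, ignoring fenced code and images.

    Simplification (assumption): any fence marker toggles the code state,
    so nested or mismatched fences are not handled as a real parser would.
    """
    citations: list[Citation] = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if FENCE.match(line):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        for match in INLINE_LINK.finditer(line):
            citations.append(Citation(lineno, match.group(1), match.group(2)))
    return citations


if __name__ == "__main__":
    sample = "\n".join([
        "# Findings",
        "",
        "Solar capacity grew in 2023 [IEA](https://example.org/iea-report).",
        "![chart](https://example.org/chart.png)",
        "~~~",
        "[not a citation](https://example.org/in-code)",
        "~~~",
        'A second claim cites [a survey](https://example.org/survey "Survey").',
    ])
    for c in extract_citations(sample):
        print(c.line, c.anchor, c.url)
```

A line-based regex is enough to show the idea, but it will miss reference-style links, footnotes, and links that span lines. A real AST parser handles those cases consistently, which is presumably part of why the paper stresses reproducibility. The extracted records would then feed the checks the abstract names (accessibility, relevance, factual consistency), which are not sketched here.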

Sources

arXiv