What is ReContext and how does it address long-context challenges in LLMs?

ReContext is a training-free, inference-time method. It uses the model's internal attention correlation signals to build a query-conditioned evidence pool, then recursively replays that evidence before final generation. This significantly improves the extraction of key evidence from long texts without fine-tuning or external memory modules.
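The select-then-replay loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `overlap_score` is a hypothetical word-overlap stand-in for the model's internal attention correlation signal, and `build_evidence_pool`, `build_prompt`, `top_k`, and `rounds` are names and parameters invented for this example.

```python
import re
from typing import Callable, List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens with punctuation removed."""
    return re.findall(r"\w+", text.lower())


def overlap_score(cue: str, chunk: str) -> float:
    """ASSUMPTION: word overlap stands in for attention correlation.

    The real method reads relevance from the model's attention; this toy
    score is only here to make the sketch runnable.
    """
    cue_words = set(tokenize(cue))
    chunk_words = tokenize(chunk)
    hits = sum(word in cue_words for word in chunk_words)
    return hits / max(len(chunk_words), 1)


def build_evidence_pool(
    chunks: Sequence[str],
    query: str,
    score_fn: Callable[[str, str], float],
    top_k: int = 1,
    rounds: int = 2,
) -> List[int]:
    """Return indices of selected evidence chunks, in document order."""
    pool: List[int] = []
    cue = query
    for _ in range(rounds):
        candidates = [i for i in range(len(chunks)) if i not in pool]
        if not candidates:
            break
        ranked = sorted(candidates, key=lambda i: score_fn(cue, chunks[i]), reverse=True)
        pool.extend(ranked[:top_k])
        # Recursive step: evidence found so far is replayed into the cue,
        # so the next round can reach chunks the bare query would miss.
        cue = query + " " + " ".join(chunks[i] for i in sorted(pool))
    return sorted(pool)


def build_prompt(chunks: Sequence[str], query: str, pool: Sequence[int]) -> str:
    """Keep the full context, then replay the evidence before the question."""
    context = "\n".join(chunks)
    evidence = "\n".join(chunks[i] for i in pool)
    return f"{context}\n\nEvidence:\n{evidence}\n\nQuestion: {query}"


chunks = [
    "The sky is blue.",
    "Alice lives in Paris.",
    "Paris is the capital of France.",
    "Bananas are yellow.",
]
query = "Where does Alice live?"
pool = build_evidence_pool(chunks, query, overlap_score, top_k=1, rounds=2)
prompt = build_prompt(chunks, query, pool)
```

In this toy run the first round picks the chunk that mentions Alice, and only after that chunk is replayed into the cue does the second round reach the chunk about Paris. The full context stays in the prompt, and the evidence is repeated just before the question.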

What are the core advantages of ReContext compared to other long-context optimization approaches?

ReContext requires no model retraining and integrates directly into existing inference pipelines, which lowers deployment barriers and avoids the cost of retraining. Across eight long-context datasets with context lengths of up to 128K, it achieved the best average ranking on both the Qwen3 and Llama3 model families.

What practical applications does ReContext support and what is its industry impact?

It applies to long-document analysis, complex code comprehension, and legal text retrieval. Its approach of replaying evidence based on internal attention signals offers a new direction for future research: it shows that optimizing information flow during inference can significantly improve long-context performance without scaling model size.

ReContext: A New Paradigm for Long-Context Reasoning Based on Recursive Evidence Replay

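To address the "access without utilization" problem of large language models in long-context settings, this paper proposes ReContext, a training-free inference enhancement method. It uses the model's internal attention correlation signals to build a query-conditioned evidence pool and recursively replays it before final generation. This significantly improves the model's ability to extract and use key evidence from long texts, without pruning the context or introducing external memory. A theoretical analysis based on associative memory explains the underlying mechanism: the context is treated as a memory bank, the question serves as the retrieval cue, the attention mechanism associates the cue with the memories, and replay corresponds to the reactivation of memory traces. In extensive experiments covering eight long-context datasets with context lengths of up to 128K, ReContext achieved the best average ranking on both the Qwen3 and Llama3 model families. This demonstrates its generality and effectiveness in improving long-text reasoning, and gives the open-source community a practical tool for improving long-context capability without retraining.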
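The associative-memory reading can be made concrete with a toy softmax retrieval. The sketch below is an illustration under assumptions, not the paper's analysis: memories and the cue are hand-picked 2-D vectors, dot-product softmax stands in for attention, and "replay" is modeled simply as appending a second copy of the evidence memory. The helper names (`softmax`, `dot`, `evidence_mass`) are made up for this example.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def evidence_mass(
    cue: Sequence[float],
    memories: Sequence[Sequence[float]],
    evidence_ids: Sequence[int],
) -> float:
    """Total attention weight that the cue places on the evidence memories."""
    weights = softmax([dot(cue, m) for m in memories])
    return sum(weights[i] for i in evidence_ids)


# Memory bank: one evidence trace aligned with the cue, three distractors.
cue = [1.0, 0.0]
memories = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

# Before replay: the single evidence trace competes with all distractors.
before = evidence_mass(cue, memories, [0])

# ASSUMPTION: replay is modeled as re-presenting the evidence trace once more.
replayed = memories + [memories[0]]
after = evidence_mass(cue, replayed, [0, 4])
```

Here `before` equals e / (e + 3) and `after` equals 2e / (2e + 3), so re-presenting the evidence raises the share of attention it receives even though no distractor was removed. That mirrors the "reactivation without pruning" idea in spirit only.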

Sources

arXiv