What is "plasticity loss" and do large language models really face it?

Plasticity loss is the decline in a model's ability to keep learning new information after it has already acquired new knowledge. The study observed it in every GPT-architecture Transformer it tested, from 5M to 314M parameters, and the authors take this as evidence that it is a general property of modern Transformers.
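
One simple way to make "ability to learn new information" measurable is to fine-tune each checkpoint on a probe task for a fixed budget and record how much the probe loss falls. The sketch below assumes that operationalization. The functions `plasticity_score` and `plasticity_loss` are hypothetical helpers, not the study's published metric, and the probe losses are illustrative values only.

```python
from typing import List, Sequence


def plasticity_score(probe_loss_before: float, probe_loss_after: float) -> float:
    """Relative drop in probe-task loss after a fixed fine-tuning budget.

    Hypothetical metric: 1.0 means the probe loss fell to zero, 0.0 means no
    progress, and negative values mean the loss went up.
    """
    if probe_loss_before <= 0:
        raise ValueError("probe_loss_before must be positive")
    return (probe_loss_before - probe_loss_after) / probe_loss_before


def plasticity_loss(scores: Sequence[float]) -> List[float]:
    """Fractional decline of each checkpoint's score relative to the first one.

    scores[0] is assumed to come from the earliest (most plastic) checkpoint.
    """
    if not scores or scores[0] <= 0:
        raise ValueError("need a positive baseline score")
    baseline = scores[0]
    return [1.0 - s / baseline for s in scores]


# Hypothetical probe losses (before, after fine-tuning) for three checkpoints,
# ordered from earliest to latest training step. Illustrative values only.
checkpoints = [(4.0, 2.0), (4.0, 3.0), (4.0, 3.5)]
scores = [plasticity_score(before, after) for before, after in checkpoints]
print(scores)                   # [0.5, 0.25, 0.125]
print(plasticity_loss(scores))  # [0.0, 0.5, 0.75]
```

A checkpoint that gains less from the same fine-tuning budget than an earlier one shows a positive plasticity-loss value under this definition.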

Can increasing model scale solve plasticity loss?

No. The study reports that the severity of plasticity loss follows a predictable scaling law, growing sub-linearly with model size. Larger models can delay the onset of the problem, but they do not eliminate it, so adding parameters alone is not a fundamental fix.
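
The summary does not give the fitted functional form. Assuming a power law, severity(N) ≈ a · N^b, "sub-linear" corresponds to an exponent 0 < b < 1, so doubling the parameter count N multiplies severity by 2^b, which is less than 2. The sketch below fits a and b by ordinary least squares in log-log space. `fit_power_law` and `is_sublinear` are hypothetical helpers, and the data points are synthetic, generated from a known curve rather than taken from the study.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Fit values ~= a * sizes**b by ordinary least squares on (log N, log y)."""
    if len(sizes) != len(values) or len(sizes) < 2:
        raise ValueError("need at least two paired points")
    if any(s <= 0 for s in sizes) or any(v <= 0 for v in values):
        raise ValueError("power-law fit needs strictly positive data")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept in log-log space = log(a)
    return a, b


def is_sublinear(exponent: float) -> bool:
    """Growth is sub-linear (but still increasing) when 0 < b < 1."""
    return 0.0 < exponent < 1.0


# Synthetic check: points generated from y = 2 * N**0.5 over the 5M to 314M range.
sizes = [5e6, 20e6, 80e6, 314e6]
values = [2.0 * n ** 0.5 for n in sizes]
a, b = fit_power_law(sizes, values)
print(round(a, 6), round(b, 6), is_sublinear(b))  # 2.0 0.5 True
```

Fitting in log-log space turns the power law into a straight line, so the exponent is just the slope; on real, noisy measurements the fitted b would come with uncertainty that this sketch does not estimate.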

What are the implications for AI development and what's next?

It challenges the assumption that scale alone can solve continual-learning problems. Progress will likely require new architectures or training algorithms, such as dynamic sparse activation, memory replay mechanisms, or regularization techniques, rather than more compute alone.
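
Memory replay, one of the directions named above, mixes a sample of previously seen training examples into each new batch. A common way to keep that sample unbiased under a fixed memory budget is reservoir sampling (Algorithm R). The sketch below is a generic illustration of that technique, not a method from the study; `ReservoirReplayBuffer`, `mixed_batch`, and the `replay_ratio` parameter are hypothetical names.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class ReservoirReplayBuffer(Generic[T]):
    """Fixed-capacity buffer holding a uniform random sample of all items added so far."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Algorithm R: the new item replaces a random slot with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> List[T]:
        """Draw up to k stored items without replacement."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(
    new_examples: List[T], buffer: ReservoirReplayBuffer[T], replay_ratio: float
) -> List[T]:
    """Return fresh examples plus replayed ones, then store the fresh examples."""
    n_replay = int(len(new_examples) * replay_ratio)
    # Sample before adding so the replayed part contains only earlier data.
    batch = list(new_examples) + buffer.sample(n_replay)
    for example in new_examples:
        buffer.add(example)
    return batch


buffer: ReservoirReplayBuffer[str] = ReservoirReplayBuffer(capacity=4, seed=0)
for step in range(3):
    fresh = [f"step{step}-doc{i}" for i in range(2)]
    print(mixed_batch(fresh, buffer, replay_ratio=0.5))
```

Reservoir sampling keeps every past example equally likely to be in memory no matter how long training runs, which is why it is a frequent default for replay buffers; whether replay helps with plasticity (as opposed to forgetting) in this setting is not something the summary establishes.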

Can Large Language Models Scale Their Way Out of Plasticity Loss? An In-Depth Look at Multilingual Continual Learning

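This article examines a core bottleneck for large language models in continual learning: plasticity loss, the phenomenon in which a model's ability to keep learning declines significantly after it has acquired new information. The research team trained GPT-architecture Transformer models (5M to 314M parameters) on a multilingual continual-learning task and found plasticity loss to be a general pattern in modern Transformers: after the models learned new languages, their performance on a previously mastered Vietnamese probing task dropped markedly. The study further shows that the severity of plasticity loss follows a predictable scaling law, growing sub-linearly with model size. This means that although scaling up the parameter count can delay the onset of plasticity loss, adding parameters alone cannot fundamentally eliminate it.

Notably, plasticity loss appeared even under a static multilingual data distribution, which challenges the conventional view that it arises only under abrupt task switches. The authors argue that these findings call for a fundamental rethink of the current scale-centric approach to AI development: however training strategies are optimized, large Transformer models will eventually lose their ability to adapt to new data after long periods of continual training.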
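
The contrast between a static multilingual distribution and abrupt task switches comes down to the data schedule. The sketch below shows both under simple assumptions: uniform sampling across languages for the static mixture, and fixed-length blocks per language for the switching schedule. The summary does not describe the study's actual sampling weights or language order, so `static_mixture`, `sequential_switch`, and the language codes in the example are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


def static_mixture(
    corpora: Dict[str, List[str]], steps: int, seed: Optional[int] = None
) -> Iterator[Tuple[str, str]]:
    """Every step samples from the same fixed (here uniform) mixture of languages."""
    rng = random.Random(seed)
    languages = sorted(corpora)
    for _ in range(steps):
        lang = rng.choice(languages)
        yield lang, rng.choice(corpora[lang])


def sequential_switch(
    corpora: Dict[str, List[str]],
    order: Sequence[str],
    steps_per_language: int,
    seed: Optional[int] = None,
) -> Iterator[Tuple[str, str]]:
    """Train on one language at a time, switching abruptly at block boundaries."""
    rng = random.Random(seed)
    for lang in order:
        for _ in range(steps_per_language):
            yield lang, rng.choice(corpora[lang])


# Hypothetical toy corpora with one document per language.
corpora = {"de": ["de-doc"], "fr": ["fr-doc"], "ja": ["ja-doc"]}
print([lang for lang, _ in sequential_switch(corpora, ["de", "fr", "ja"], 2)])
# ['de', 'de', 'fr', 'fr', 'ja', 'ja']
print([lang for lang, _ in static_mixture(corpora, 6, seed=0)])
```

In the static schedule the input distribution never changes, so any decline in learning ability cannot be blamed on a distribution shift; that is what makes the reported observation under static data notable.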

Sources

arXiv