What gauge-fixing problem does this research address?

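When coordinate-indexed objects such as steering vectors or sparse autoencoders are transported across checkpoints, the residual-stream coordinates of an RMSNorm architecture are defined only up to the signed-permutation group $B_d$ (coordinate permutations combined with sign flips). Fixing that gauge requires aligning signs as well as permutations; permutation-only alignment is incomplete and leaves systematic errors.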
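The symmetry claim can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes the usual RMSNorm definition (divide by the root-mean-square, then multiply by a per-coordinate gain) with the gain vector permuted along with the coordinates. Under those assumptions RMSNorm commutes with any element of $B_d$, whereas a mean-subtracting LayerNorm does not once the signs are mixed.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: rescale by the root-mean-square, then apply a per-coordinate gain."""
    return gain * x / np.sqrt(np.mean(x ** 2) + eps)

def layer_norm(x, gain, eps=1e-6):
    """LayerNorm without bias: subtract the mean, divide by the std, apply a gain."""
    return gain * (x - x.mean()) / np.sqrt(x.var() + eps)

def apply_signed_perm(x, perm, signs):
    """Apply a B_d element: output[i] = signs[i] * x[perm[i]]."""
    return signs * x[perm]

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
gain = rng.uniform(0.5, 1.5, size=d)
perm = rng.permutation(d)
signs = rng.choice([-1.0, 1.0], size=d)
signs[0], signs[1] = -1.0, 1.0  # force mixed signs so the flip is non-trivial

x_t = apply_signed_perm(x, perm, signs)
gain_t = gain[perm]  # the gain is permuted with the coordinates; signs cancel

# Normalising the transformed input vs. transforming the normalised output.
rms_equivariant = np.allclose(rms_norm(x_t, gain_t),
                              apply_signed_perm(rms_norm(x, gain), perm, signs))
ln_equivariant = np.allclose(layer_norm(x_t, gain_t),
                             apply_signed_perm(layer_norm(x, gain), perm, signs))
print("RMSNorm equivariant:", rms_equivariant)
print("LayerNorm equivariant:", ln_equivariant)
```

The LayerNorm check fails because flipping only some coordinates changes the mean that is subtracted, which is why sign flips are a symmetry of RMSNorm but not of LayerNorm.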

Why does this matter for model interpretability?

Many existing interpretability tools assume LayerNorm-style permutation symmetry and can give unreliable results on RMSNorm models. The study reports that composing local $B_d$ gauges along same-base fine-tuning trajectories recovers 91.1% of cross-run coordinates at step 1500, versus 60.3% for endpoint matching.
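The alternative to endpoint matching described in the paper's abstract is to compose locally estimated $B_d$ gauges along a trajectory. The following is a minimal sketch of that composition step only, under assumptions not stated in the source: a gauge is stored as a hypothetical `(perm, signs)` pair acting as `output[i] = signs[i] * x[perm[i]]`, and the per-step gauges are random stand-ins rather than alignments estimated from real checkpoints.

```python
from functools import reduce
import numpy as np

def apply_gauge(x, perm, signs):
    """Act on the last axis: output[..., i] = signs[i] * x[..., perm[i]]."""
    return signs * x[..., perm]

def compose(first, second):
    """Return the single gauge equivalent to applying `first`, then `second`."""
    p1, s1 = first
    p2, s2 = second
    return p1[p2], s2 * s1[p2]

def invert(gauge):
    """Return the gauge that undoes `gauge`."""
    perm, signs = gauge
    inv = np.argsort(perm)
    return inv, signs[inv]

rng = np.random.default_rng(0)
d = 5
# Stand-ins for local alignments between consecutive checkpoints (assumed data).
local_gauges = [(rng.permutation(d), rng.choice([-1.0, 1.0], size=d))
                for _ in range(3)]

x = rng.normal(size=d)
stepwise = x
for g in local_gauges:
    stepwise = apply_gauge(stepwise, *g)

total = reduce(compose, local_gauges)   # one gauge for the whole trajectory
transported = apply_gauge(x, *total)
recovered = apply_gauge(transported, *invert(total))
print(np.allclose(transported, stepwise), np.allclose(recovered, x))
```

Because $B_d$ is a group, the composed gauge is again a signed permutation, so a steering vector or SAE dictionary can be carried across many steps with a single index-and-sign operation.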

What should researchers and practitioners watch for?

Interpretability claims should state the gauge they are made relative to; otherwise they are not reproducible. The community needs $B_d$-aware alignment methods, and practitioners should verify sign consistency in model merging and fine-tuning workflows.

Signed-Permutation Coordinate Transport and Gauge Fixing for RMSNorm Transformers

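This paper examines the gauge-fixing problem that arises in modern large language model workflows when coordinate-indexed objects (such as steering vectors and sparse autoencoders) are transported across checkpoints. It shows that the residual-stream gauge of RMSNorm architectures has the symmetry of the signed-permutation group $B_d$, so permutation-only alignment is incomplete. To address this, the authors introduce a sign-marginalized Hungarian matching algorithm, prove that raw signed-correlation matching has a structural accuracy ceiling under decorrelated coordinates, and remove that ceiling through sign marginalization. Experiments show that, in fine-tuning trajectories from the same base, coordinate-preserving transport by composing local $B_d$ gauges recovers 91.1% of cross-run coordinates at step 1500, significantly better than the 60.3% achieved by endpoint matching. On TinyLlama SAE reconstruction, Qwen sentiment steering, and refusal steering, results under the $B_d$ gauge far exceed those of permutation-only alignment. The framework also shows that sign transport in stateful training preserves trajectory consistency, and that interpretability claims are reproducible only relative to an explicit gauge.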
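The idea behind sign marginalization can be sketched in a few lines. This is an illustrative reading rather than the authors' implementation: it assumes activations from two checkpoints on the same inputs, scores coordinate pairs by the absolute value of their cosine similarity, solves the assignment with SciPy's `linear_sum_assignment`, and reads the signs off afterwards. The function name, the `marginalize_sign` flag, and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_coordinates(acts_a, acts_b, marginalize_sign=True):
    """Align B's coordinates to A's.

    acts_a, acts_b: arrays of shape (n_samples, d) on the same inputs.
    Returns (perm, signs) with acts_a[:, i] ~ signs[i] * acts_b[:, perm[i]].
    """
    # Unit-normalise each column so a.T @ b is a cosine-similarity matrix.
    a = acts_a / (np.linalg.norm(acts_a, axis=0, keepdims=True) + 1e-12)
    b = acts_b / (np.linalg.norm(acts_b, axis=0, keepdims=True) + 1e-12)
    sim = a.T @ b  # sim[i, j]: A's coordinate i vs. B's coordinate j
    # Marginalising the sign means scoring a pair by |similarity|.
    score = np.abs(sim) if marginalize_sign else sim
    rows, cols = linear_sum_assignment(score, maximize=True)
    return cols, np.sign(sim[rows, cols])

# Synthetic check: A is a signed permutation of B plus small noise.
rng = np.random.default_rng(0)
n, d = 512, 6
acts_b = rng.normal(size=(n, d))
true_perm = rng.permutation(d)
true_signs = np.array([1.0, -1.0, 1.0, -1.0, -1.0, 1.0])
acts_a = true_signs * acts_b[:, true_perm] + 0.05 * rng.normal(size=(n, d))

perm, signs = match_coordinates(acts_a, acts_b)
raw_perm, _ = match_coordinates(acts_a, acts_b, marginalize_sign=False)
print("sign-marginalized accuracy:", np.mean(perm == true_perm))
print("raw signed accuracy:", np.mean(raw_perm == true_perm))
```

In this toy setting the raw signed score treats a sign-flipped true partner as the worst possible match, so it misassigns flipped coordinates, while the absolute-value score is indifferent to the flip and recovers the sign in a second step.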

Sources

arXiv