On Signed-Permutation Coordinate Transport and Gauge Fixing in RMSNorm Transformers

This paper addresses the gauge ambiguity that arises when transporting coordinate-indexed objects (e.g., steering vectors, sparse autoencoders) across checkpoints in modern large language model workflows, and analyzes how the relevant symmetry depends on the normalization architecture. The study shows that the residual stream gauge group of LayerNorm is the permutation group, while that of RMSNorm with generic per-channel gain parameters extends to the signed-permutation group, which also includes per-coordinate sign flips. Alignment methods that rely on permutations alone are therefore symmetry-incomplete for RMSNorm models. To address this, the authors propose Hungarian matching marginalized over signs, and prove that matching on raw signed correlations hits a structural accuracy ceiling under decorrelated coordinates, which sign marginalization removes. In experiments, signed-permutation gauge recovery achieves 91.1% cross-run coordinate accuracy over a 1500-step fine-tuning trajectory, compared with 60.3% for endpoint matching. The resulting gauge transport substantially improves the reconstruction accuracy of TinyLlama sparse autoencoders (NMSE reduced from 1.08 to 0.004) and emotional steering retention (95.8% vs. 17.2%). The study also shows that how AdamW states are transported determines trajectory consistency in stateful training, and it provides a rigorous gauge baseline for interpretability research.

Background and Context

In modern large language model workflows, researchers frequently need to transport coordinate-indexed objects across model checkpoints. These objects include steering vectors for model editing, dictionaries of sparse autoencoders (SAEs) used in interpretability analysis, sets of top-k neurons selected by importance metrics, attribution lists, and alignment mappings for model merging. Such cross-checkpoint operations are only well-defined once the gauge of the model's residual stream is fixed. This paper demonstrates that the gauge dependence is not architecture-neutral but is determined by the design of the normalization layers. The theoretical derivations show that models using LayerNorm have a residual stream gauge group that is only the permutation group, up to a global sign flip. In contrast, models using RMSNorm with generic per-channel gain parameters have a gauge group that extends to the signed-permutation group, in which each coordinate can be flipped independently. Alignment methods that rely on permutations alone are therefore symmetry-incomplete for RMSNorm models, which introduces systematic errors into subsequent coordinate-based operations. The finding challenges the assumption of architecture neutrality built into existing toolchains and points to a weakness in the mathematical foundations of many current model editing and interpretability methods.
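The difference between the two gauge groups can be checked numerically. The following is a minimal sketch, assuming bias-free textbook definitions of both normalizations; the function names, the dimension, and the random inputs are illustrative and not taken from the paper. It tests whether normalizing a transported vector gives the same result as transporting the normalized vector.

```python
import numpy as np

def rmsnorm(x, gain, eps=1e-6):
    # RMSNorm: divide by the root-mean-square, then apply a per-channel gain.
    return gain * x / np.sqrt(np.mean(x**2) + eps)

def layernorm(x, gain, eps=1e-6):
    # LayerNorm (bias omitted for simplicity): centre, rescale, apply gain.
    xc = x - x.mean()
    return gain * xc / np.sqrt(np.mean(xc**2) + eps)

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
gain = rng.uniform(0.5, 1.5, size=d)
perm = rng.permutation(d)
signs = np.array([1.0, -1.0] * (d // 2))  # mixed per-coordinate flips
ones = np.ones(d)                         # "no flips", i.e. permutation only

def transport(v, s=signs):
    # Signed permutation g acting on a vector: (g v)[i] = s[i] * v[perm[i]].
    return s * v[perm]

# Gains are relabelled by the permutation; the signs cancel on the diagonal.
gain_t = gain[perm]

rms_signed_ok = np.allclose(rmsnorm(transport(x), gain_t),
                            transport(rmsnorm(x, gain)))
ln_perm_ok = np.allclose(layernorm(transport(x, ones), gain_t),
                         transport(layernorm(x, gain), ones))
ln_signed_ok = np.allclose(layernorm(transport(x), gain_t),
                           transport(layernorm(x, gain)))
ln_global_ok = np.allclose(layernorm(-x, gain), -layernorm(x, gain))

print("RMSNorm, signed permutation:  ", rms_signed_ok)
print("LayerNorm, permutation only:  ", ln_perm_ok)
print("LayerNorm, signed permutation:", ln_signed_ok)
print("LayerNorm, global sign flip:  ", ln_global_ok)
```

Under these assumptions, the mean subtraction in LayerNorm is what breaks independent sign flips: flipping some coordinates changes the mean, whereas the root-mean-square used by RMSNorm is unaffected.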

Deep Analysis

To address this gauge alignment issue, the authors propose a method termed "signed-permutation coordinate transport." The approach targets coordinate-preserving transport rather than function-level merging. Technically, the authors introduce Hungarian matching marginalized over signs to handle the sign uncertainty inherent in RMSNorm. The theoretical analysis shows that if raw signed correlation is used directly as the matching score, the algorithm hits a structural accuracy ceiling under decorrelated coordinates: accuracy is limited to the proportion of positive signs in the true gauge, because a coordinate whose sign is flipped has a strongly negative correlation with its true partner and is never selected. Sign marginalization removes this ceiling, allowing the algorithm to recover the true gauge transformation. The method also recovers cross-run coordinates by composing the local gauges of saved checkpoints along the same baseline fine-tuning trajectory. This avoids directly comparing two distant checkpoints and relies instead on the consistency of the underlying coordinate space between neighboring checkpoints. The resulting transport is a signed permutation and is therefore exactly invertible.
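The accuracy ceiling and its removal can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions, not the authors' implementation: the activations are i.i.d. Gaussian (hence approximately decorrelated), the ground-truth gauge is made up, and sign marginalization is approximated by scoring each pair with the absolute correlation, that is, the better of the two sign hypotheses. The paper's exact scoring rule may differ. The assignment itself uses `scipy.optimize.linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, d = 4000, 16

# Hypothetical ground-truth gauge between two checkpoints.
perm = rng.permutation(d)
signs = np.array([1.0, -1.0] * (d // 2))  # half the coordinates are flipped

# Approximately decorrelated activations from checkpoint A, and the same
# activations expressed in checkpoint B's coordinates, plus a little noise.
acts_a = rng.normal(size=(n, d))
acts_b = signs * acts_a[:, perm] + 0.1 * rng.normal(size=(n, d))

def zscore(a):
    return (a - a.mean(axis=0)) / a.std(axis=0)

# corr[j, i] = correlation between B coordinate j and A coordinate i.
corr = zscore(acts_b).T @ zscore(acts_a) / n

def match(score):
    # linear_sum_assignment minimises cost, so negate to maximise the score.
    _, cols = linear_sum_assignment(-score)
    return cols

raw_match = match(corr)           # raw signed correlation
marg_match = match(np.abs(corr))  # best score over both sign choices

# Once the pairing is fixed, the sign is read off the matched correlation.
rec_signs = np.sign(corr[np.arange(d), marg_match])

raw_acc = np.mean(raw_match == perm)
marg_acc = np.mean(marg_match == perm)
frac_pos = np.mean(signs > 0)
print(f"raw={raw_acc:.2f} marginalized={marg_acc:.2f} "
      f"positive fraction={frac_pos:.2f}")
```

In this toy setting the raw score can only recover coordinates whose true sign is positive, so its accuracy cannot exceed the positive fraction, while the sign-agnostic score recovers both the pairing and the signs.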

Industry Impact

Experiments across multiple benchmark tasks and model architectures support the method. In a 1500-step fine-tuning trajectory experiment, signed-permutation gauge recovery reached 91.1% cross-run coordinate accuracy, compared with 60.3% for traditional permutation-only endpoint matching. The gain is not explained by routing through baseline checkpoints alone; it comes from capturing the full gauge structure. The results for transporting interpretability tools follow the same pattern. On TinyLlama, the normalized mean squared error (NMSE) of sparse autoencoder reconstruction was 0.004 under the signed-permutation gauge, compared with 1.08 under the permutation-only gauge, indicating a far more accurate reconstruction of activations. In emotional steering tasks, Qwen models retained 95.8% of the steering effect under the signed-permutation gauge, whereas the permutation-only approach retained only 17.2% and in some cases produced sign inversions that negated the steering, destroying the original functionality. Ablation studies further indicate that these improvements come from the accuracy of the gauge alignment and not from changes in model capacity.
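The trajectory result depends on composing per-checkpoint gauges into a single cross-run gauge. As a minimal sketch, assume a signed permutation is stored as an index array plus a sign array; this representation and the helper names `transport`, `compose`, and `inverse` are chosen here for illustration and are not taken from the paper.

```python
import numpy as np
from functools import reduce

def transport(g, v):
    # Apply a signed permutation g = (perm, signs): (g v)[i] = signs[i] * v[perm[i]].
    perm, signs = g
    return signs * v[perm]

def compose(g2, g1):
    # Single gauge equivalent to applying g1 first, then g2.
    p1, s1 = g1
    p2, s2 = g2
    return p1[p2], s2 * s1[p2]

def inverse(g):
    # Gauge that undoes g exactly.
    perm, signs = g
    inv = np.argsort(perm)
    return inv, signs[inv]

rng = np.random.default_rng(0)
d = 6

def random_gauge():
    return rng.permutation(d), rng.choice([-1.0, 1.0], size=d)

# Hypothetical local gauges between consecutive saved checkpoints.
local_gauges = [random_gauge() for _ in range(3)]

# Fold the chain into one gauge: g3 after g2 after g1.
total = reduce(lambda acc, g: compose(g, acc), local_gauges)

v = rng.normal(size=d)
stepwise = reduce(lambda u, g: transport(g, u), local_gauges, v)
direct = transport(total, v)
round_trip = transport(inverse(total), direct)

print(np.allclose(stepwise, direct), np.allclose(round_trip, v))
```

Because signed permutations form a group, the composed gauge is again a signed permutation, which is why a chain of local matches can be collapsed into one transport and inverted without loss.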

Outlook

This research has implications for the open-source community, industrial deployment, and future studies. First, it shows that many current coordinate-based interpretability claims, such as the importance of specific neurons, are reproducible only relative to an explicit gauge, so researchers should state their gauge choice when reporting results. Second, for industry, the performance of tools for model merging, fine-tuning state recovery, and model editing depends directly on a correct treatment of the underlying gauge structure. Signed-permutation transport makes the migration of model components across runs and versions more reliable and reduces alignment costs during model iteration. Finally, the study reveals the role of covariance structure in stateful training: AdamW states transported with signs preserve the resumed training trajectory, whereas states transported by permutation alone cause the trajectory to deviate, even when the checkpoints appear functionally identical. This offers a new perspective on the behavior of optimizer states during fine-tuning and moves large model interpretability and editing from empirical practice toward a more rigorous mathematical footing.
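The sensitivity of optimizer state to signs can be illustrated with a single update. The sketch below makes several assumptions that are not from the paper: a toy parameter vector that lives directly in residual stream coordinates, a hand-written AdamW step with arbitrary hyperparameters, and random state. It relies on the observation that the first moment tracks the gradient and so carries signs, while the second moment tracks squared gradients and is only permuted.

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    # One textbook AdamW update with bias correction and decoupled weight decay.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v

rng = np.random.default_rng(0)
d, t = 8, 5
w = rng.normal(size=d)
grad = rng.normal(size=d)
m = rng.normal(size=d)            # first moment: signed quantity
v = rng.uniform(0.1, 1.0, size=d)  # second moment: non-negative

perm = rng.permutation(d)
signs = np.array([1.0, -1.0] * (d // 2))

def signed(u):
    # Transport with signs: (g u)[i] = signs[i] * u[perm[i]].
    return signs * u[perm]

def permuted(u):
    # Transport by relabelling only.
    return u[perm]

# Reference: step in the original gauge, then transport the new weights.
w_ref, _, _ = adamw_step(w, grad, m, v, t)
target = signed(w_ref)

# Signed transport of the first moment; the second moment is only permuted.
w_good, _, _ = adamw_step(signed(w), signed(grad), signed(m), permuted(v), t)

# Permutation-only optimizer state with sign-transported weights and gradients.
w_bad, _, _ = adamw_step(signed(w), signed(grad), permuted(m), permuted(v), t)

print(np.allclose(w_good, target), np.allclose(w_bad, target))
```

In this toy case the two transported runs start from the same weights, which mirrors the observation that checkpoints can look functionally identical while their optimizer states send training in different directions.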

Sources

arXiv