ZO-Act is a zeroth-order fine-tuning method that bypasses backpropagation by analyzing input activation patterns to build low-rank subspaces, optimizing only a lightweight coefficient matrix via forward-pass loss evaluation to drastically reduce memory and compute overhead.

Why does ZO-Act matter?

It enables compatibility with momentum optimizers like Adam, natively supports quantized models, and offers an efficient path for real-time model adaptation on edge devices and resource-constrained terminals without prohibitive backward-pass costs.

What should we watch next?

As zeroth-order optimization theory matures and hardware accelerators advance, ZO-Act is poised to become a standard method for efficient LLM fine-tuning, accelerating AI deployment across diverse real-world scenarios.

ZO-Act: Activation-Informed Zeroth-Order Efficient Fine-Tuning Method

This paper proposes ZO-Act, an efficient zeroth-order fine-tuning method for large language models in settings where backpropagation is unavailable or memory is constrained. Existing zeroth-order methods typically perturb full weights or random subspaces, which leads to high-variance gradient estimates and limited performance. ZO-Act instead uses input activations to construct low-rank subspaces: it computes the activation bases once at initialization and then optimizes only a lightweight coefficient matrix. Because optimization proceeds through forward-pass loss evaluations over these coefficients, the effective perturbation dimensionality drops sharply. The coefficients also give the optimizer explicit trainable variables, which makes the method compatible with momentum optimizers like Adam and lets it fine-tune quantized models natively. Experiments on Llama-3-8B, OPT-13B, and their INT4 quantized variants show that ZO-Act significantly outperforms strong baselines across language understanding, question answering, and commonsense reasoning tasks, suggesting considerable potential for fine-tuning large models in resource-constrained settings.

Background and Context

Fine-tuning Large Language Models (LLMs) has traditionally relied on backpropagation-based optimization, which requires substantial memory to store intermediate activations and gradients. This requirement is a significant barrier where memory is constrained, such as on edge devices and mobile terminals, or in privacy-sensitive on-device settings where the cost of backward passes is prohibitive. Zeroth-Order (ZO) optimization has emerged as an important alternative in these contexts: it estimates gradients solely from forward-pass loss evaluations, so no explicit gradient computation via backpropagation is needed. Despite this appeal, existing ZO fine-tuning methods have suffered from substantial performance limitations. Most current approaches either perturb all of the model's weights or restrict updates to randomly generated low-dimensional subspaces. Both strategies yield high-variance gradient estimates and slow convergence, which leaves the resulting models well short of counterparts fine-tuned with backpropagation.
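
To make the idea of estimating gradients from forward passes concrete, here is a minimal sketch of a generic two-point zeroth-order estimator (the kind used in SPSA-style methods). It is not taken from the ZO-Act paper; the names zo_gradient and toy_loss, the Gaussian perturbation, and the step size eps are illustrative assumptions.

```python
import numpy as np

def zo_gradient(loss_fn, theta, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate along one random direction.

    Hypothetical helper for illustration; only forward evaluations of
    loss_fn are used, and no backpropagation is involved.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(theta.shape)         # random perturbation direction
    loss_plus = loss_fn(theta + eps * z)         # forward pass 1
    loss_minus = loss_fn(theta - eps * z)        # forward pass 2
    slope = (loss_plus - loss_minus) / (2.0 * eps)  # finite-difference slope along z
    return slope * z                             # noisy estimate of the gradient

def toy_loss(theta):
    # Toy quadratic whose true gradient is simply theta.
    return 0.5 * float(np.sum(theta ** 2))

rng = np.random.default_rng(0)
theta = np.ones(4)
# Individual estimates are very noisy; averaging many of them approaches
# the true gradient, which is why variance is the central problem.
estimates = [zo_gradient(toy_loss, theta, rng=rng) for _ in range(20000)]
mean_estimate = np.mean(estimates, axis=0)
```

Each estimate costs two forward passes regardless of the number of parameters, but its variance grows with the dimensionality of the perturbed vector, which is the limitation the article attributes to full-weight and random-subspace perturbation.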

To address these persistent challenges, the ZO-Act method introduces a novel mechanism that leverages input activation information to construct low-rank subspaces for parameter updates. Unlike traditional ZO methods that apply random perturbations across the entire parameter space, ZO-Act analyzes the activation patterns of input data to define a fixed, data-driven subspace. By constraining parameter updates within this activation-informed subspace, the method drastically reduces the dimensionality of the optimization problem. This approach not only stabilizes the optimization process but also significantly enhances the efficiency of gradient estimation. The core innovation lies in the decoupling of the subspace basis calculation from the iterative optimization loop, allowing for a more focused and effective adaptation of the model weights to specific tasks without incurring the memory and computational costs associated with full backpropagation.

Deep Analysis

From an implementation perspective, ZO-Act is designed to be simple to engineer. For each linear layer in the LLM, the method computes a small activation basis matrix once, during initialization. This single computation captures the primary directions of variation in the layer's input activations, which the method treats as the directions most relevant to the task. During training, each layer's weights are expressed through this fixed, pre-computed basis together with a lightweight coefficient matrix. The optimizer therefore never updates the large, high-dimensional weight matrices directly; it updates only the low-dimensional coefficient matrix. This parameterization reduces the effective perturbation dimensionality, which in turn lowers the variance of the gradient estimates and the finite-difference error inherent in ZO methods.
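
The parameterization described above can be sketched as follows. The article does not say how the activation basis is computed or exactly how it combines with the coefficients, so this sketch makes two loud assumptions: the basis B is taken as the top right singular vectors of a batch of input activations, and the effective weight is the frozen weight plus a low-rank term C @ B.T. The names activation_basis and effective_weight are hypothetical.

```python
import numpy as np

def activation_basis(X, rank):
    """Assumed construction: top-`rank` right singular vectors of the
    activations X (n_samples x d_in). The paper may use a different recipe."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:rank].T                       # (d_in, rank), orthonormal columns

def effective_weight(W0, C, B):
    """Assumed form: frozen weight plus an update confined to span(B)."""
    return W0 + C @ B.T                      # (d_out, d_in)

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 32, 4
X = rng.standard_normal((256, d_in))         # stand-in for calibration activations
W0 = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
B = activation_basis(X, rank)                # computed once, then held fixed
C = np.zeros((d_out, rank))                  # the only trainable variable

full_dim = W0.size                           # dimensions perturbed by full-weight ZO
coeff_dim = C.size                           # dimensions perturbed in this sketch
```

With these toy sizes the perturbed dimensionality falls from d_out * d_in to d_out * rank, a factor of d_in / rank. Starting C at zero means the layer initially reproduces the pretrained weight exactly, which is a design choice of this sketch rather than something the article states.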

A critical advantage of this parameterization is its compatibility with modern momentum-based optimizers such as Adam. Traditional ZO methods often struggle to integrate momentum effectively due to the noise in gradient estimates, but ZO-Act introduces explicit trainable variables (the coefficient matrix) that allow for the direct application of momentum updates. This integration accelerates convergence and improves optimization stability. Furthermore, ZO-Act natively supports the fine-tuning of quantized models, a feature of immense practical value. Because the low-rank subspace structure allows the original low-bit weights to remain frozen, adaptation is achieved solely through the adjustment of the coefficient matrix. This preserves the memory and computational benefits of quantization while enabling effective task-specific adaptation, thereby avoiding the significant performance degradation typically associated with fine-tuning quantized models using standard ZO techniques.
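
As a rough illustration of how explicit coefficient variables let a momentum optimizer operate on zeroth-order estimates while the base weights stay frozen, here is a toy sketch. Everything in it is an assumption made for illustration: the function zo_adam, the hyperparameters, the random orthonormal basis standing in for an activation basis, and the crude rounding used as a stand-in for low-bit weights (real INT4 schemes differ).

```python
import numpy as np

def zo_adam(loss_fn, C, steps=500, lr=0.05, eps=1e-3,
            beta1=0.9, beta2=0.999, tiny=1e-8, seed=0):
    """Adam driven by two-point zeroth-order estimates (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m = np.zeros_like(C)                     # first-moment (momentum) state
    v = np.zeros_like(C)                     # second-moment state
    for t in range(1, steps + 1):
        Z = rng.standard_normal(C.shape)     # perturb the coefficients only
        slope = (loss_fn(C + eps * Z) - loss_fn(C - eps * Z)) / (2.0 * eps)
        g = slope * Z                        # zeroth-order gradient estimate
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)       # standard Adam bias correction
        v_hat = v / (1.0 - beta2 ** t)
        C = C - lr * m_hat / (np.sqrt(v_hat) + tiny)
    return C

rng = np.random.default_rng(1)
d_in, d_out, rank, n = 16, 4, 2, 128
X = rng.standard_normal((n, d_in))
# Crude stand-in for frozen low-bit weights: round to a coarse grid.
W0_q = np.round(rng.standard_normal((d_out, d_in)) * 8) / 8
W0_q_before = W0_q.copy()
# Random orthonormal basis standing in for an activation basis.
B, _ = np.linalg.qr(rng.standard_normal((d_in, rank)))
C_true = rng.standard_normal((d_out, rank))
Y = X @ (W0_q + C_true @ B.T).T              # toy regression targets

def loss_fn(C):
    pred = X @ (W0_q + C @ B.T).T            # forward pass with frozen W0_q
    return float(np.mean((pred - Y) ** 2))

C0 = np.zeros((d_out, rank))
initial_loss = loss_fn(C0)
C_final = zo_adam(loss_fn, C0)
final_loss = loss_fn(C_final)
```

The frozen matrix W0_q is only ever read inside the forward pass; all optimizer state (m, v) has the shape of the small coefficient matrix, which is what keeps the memory overhead low in this sketch.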

Industry Impact

The introduction of ZO-Act has profound implications for both the open-source research community and industrial applications. In the open-source ecosystem, the method provides developers with a lightweight tool for fine-tuning LLMs without requiring backpropagation capabilities. This lowers the barrier to entry for experimenting with large model adaptation and fosters further innovation in zeroth-order optimization research. By demonstrating that high-performance fine-tuning is possible without full gradient computation, ZO-Act encourages a broader exploration of resource-efficient training paradigms. The method’s ability to work with quantized models also aligns with the growing industry trend toward deploying efficient, low-power AI applications, offering a viable pathway for adapting models to specific domains without the need for extensive computational resources.

In industrial settings, the demand for deploying LLMs on edge devices, mobile phones, and IoT terminals is increasing, yet memory and compute constraints remain primary bottlenecks. ZO-Act addresses these limitations by reducing memory footprint and computational complexity, making real-time fine-tuning on resource-constrained devices feasible. This is particularly valuable in scenarios requiring rapid adaptation to new tasks or personalized data streams, where the latency and energy costs of traditional fine-tuning are unacceptable. The method’s robustness in maintaining performance on quantized variants, such as INT4 models, further enhances its appeal for production environments where storage and bandwidth are at a premium. By enabling efficient model adaptation in these constrained environments, ZO-Act facilitates the deployment of more responsive and personalized AI services across a wider range of hardware platforms.

Outlook

ZO-Act was evaluated on several prominent LLMs, namely Llama-3-8B, OPT-13B, and their INT4 quantized variants. The evaluation covered a diverse set of tasks, including language understanding, question answering, and commonsense reasoning. Across these task categories, ZO-Act is reported to significantly outperform strong baseline ZO methods. On the quantized models it is reported to retain strong performance, supporting its usefulness in very low-resource settings. Ablation studies further highlighted the importance of the activation basis selection and the stabilizing effect of the low-rank structure. The findings indicate that restricting perturbations to an activation-dominated subspace lets the optimizer capture task-relevant feature changes more accurately, whereas random perturbations tend to introduce noise that misdirects the optimization process.

Looking forward, the success of ZO-Act suggests a promising trajectory for the field of zeroth-order optimization. As theoretical frameworks for ZO methods continue to mature and hardware acceleration technologies evolve, ZO-Act is poised to become a standard technique for efficient LLM fine-tuning. Its ability to bridge the gap between high-performance adaptation and resource efficiency makes it a critical tool for the next generation of AI applications. Future research may explore extensions of the activation-informed subspace concept to other model architectures or integration with advanced quantization schemes. Ultimately, ZO-Act represents a significant step toward democratizing access to large model capabilities, enabling widespread adoption in environments where traditional training methods are impractical.

Sources

arXiv