ParamMem: Teaching LLM Agents to Self-Improve via Parametric Reflective Memory

Self-reflection lets language agents iteratively refine solutions, but current approaches often produce repetitive outputs that plateau quickly. ParamMem solves this with a parametric memory module that encodes cross-sample reflection patterns directly into model parameters.

The key insight: instead of storing reflections as text (which LLMs tend to repeat), encode them as learned parameters that enable diverse reflection generation through temperature-controlled sampling. The full framework, ParamAgent, combines this parametric memory with episodic memory (single-task history) and cross-sample memory (patterns across tasks).

Experiments on code generation, mathematical reasoning, and multi-hop QA show consistent improvements over SOTA baselines. Notably, ParamMem is sample-efficient, supports weak-to-strong transfer (a smaller model improves a larger one), and enables self-improvement without relying on stronger external models.

Self-reflection is core to LLM agent improvement, but existing mechanisms have a fundamental problem: **reflections become increasingly repetitive**, plateauing after a few rounds of essentially the same feedback.

Problem Analysis

Empirical analysis reveals a strong positive correlation between reflective diversity and task success. The root cause: text-based reflection memory is easily "echoed" by LLMs — models tend to generate reflections similar to previous ones.
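One common way to quantify the "echoing" effect is a distinct-n style metric: the fraction of unique n-grams across a set of reflections. The sketch below is illustrative (the paper's exact diversity metric is not specified here); repetitive reflections score low, varied ones score high.

```python
def distinct_ngrams(texts, n=2):
    """Fraction of unique n-grams across a set of reflections.

    A crude diversity proxy (distinct-n): near 1.0 means reflections
    rarely repeat phrasing; near 0 means they echo each other.
    """
    ngrams = []
    for t in texts:
        toks = t.lower().split()
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

# Echoed reflections (the failure mode described above) score low;
# reflections from different angles score high.
echoed = ["the loop bound is off by one"] * 3
varied = [
    "the loop bound is off by one",
    "consider an empty input edge case",
    "the recursion never hits its base case",
]
assert distinct_ngrams(echoed) < distinct_ngrams(varied)
```

A metric like this makes the correlation testable: plot per-task diversity against success rate across reflection rounds.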

The ParamMem Approach

Core idea: encode reflections into **model parameters** instead of storing them as text.

The parametric memory module is implemented by fine-tuning a small model on cross-sample reflection data. During generation, temperature controls diversity — the same experience produces reflections from different angles.
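The role temperature plays here can be shown with a minimal softmax-sampling sketch. The "reflection angles" and logits below are hypothetical labels for illustration, not values from the paper; the point is only that flattening the distribution lets the same experience yield reflections from more distinct angles.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample an index from logits softened by temperature.

    Higher temperature flattens the softmax, so repeated calls on the
    same experience pick a wider variety of reflection angles.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(exps) - 1

# Hypothetical reflection angles for one failed attempt (illustrative).
angles = ["boundary conditions", "algorithm choice", "test coverage", "data types"]
logits = [3.0, 1.0, 0.5, 0.2]  # the model strongly prefers one angle

rng = random.Random(0)
low_t = {angles[sample_with_temperature(logits, 0.2, rng)] for _ in range(50)}
high_t = {angles[sample_with_temperature(logits, 2.0, rng)] for _ in range(50)}
assert len(high_t) >= len(low_t)  # higher temperature -> more distinct angles
```

At low temperature the dominant angle is sampled almost exclusively; at high temperature all four angles appear, which is the diversity lever ParamMem exploits.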

ParamAgent Framework

Three-layer memory architecture:

1. **Parametric Memory** (ParamMem): encodes cross-task reflection patterns

2. **Episodic Memory**: single-task attempt history

3. **Cross-sample Memory**: successful experiences from similar tasks
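The three layers above can be sketched as interfaces. This is a hypothetical structure inferred from the description, not ParamAgent's actual API: the class and method names are assumptions, and `ParametricMemory.reflect` is a toy stand-in for sampling from the fine-tuned small model.

```python
from dataclasses import dataclass, field

@dataclass
class EpisodicMemory:
    """Layer 2: attempt/feedback history for the current task only."""
    attempts: list = field(default_factory=list)

    def record(self, solution, feedback):
        self.attempts.append((solution, feedback))

@dataclass
class CrossSampleMemory:
    """Layer 3: successful experiences from similar past tasks."""
    experiences: list = field(default_factory=list)

    def retrieve(self, task, k=3):
        # Placeholder: a real system would rank by task similarity
        # (e.g. embedding distance) rather than insertion order.
        return self.experiences[:k]

class ParametricMemory:
    """Layer 1 (ParamMem): cross-task reflection patterns in weights.

    Toy stand-in -- the real module samples a reflection from a
    fine-tuned model at a chosen temperature.
    """
    def reflect(self, task, attempts, temperature=1.0):
        return f"reconsider approach to {task!r} after {len(attempts)} attempts"
```

The separation mirrors the framework's division of labor: episodic memory grounds the agent in what it just tried, cross-sample memory supplies proven solutions, and parametric memory generates the fresh critique.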

Results

Consistent improvements over SOTA on HumanEval (code generation), MATH (reasoning), and HotpotQA (multi-hop QA). Key properties: sample-efficient training, weak-to-strong transfer across model scales, and self-improvement without stronger external models.

Why It Matters

This addresses a fundamental bottleneck in agent reflection. For complex iterative tasks (debugging, math solving, multi-step reasoning), reflection diversity directly determines the agent's ceiling.

Significance for the Agentic AI Era

As agentic AI takes off, agent memory systems are becoming core infrastructure. Current RAG (Retrieval-Augmented Generation) solves "what to remember" but not "how to reflect." ParamMem fills this gap as a "reflection-augmented generation" mechanism, enabling AI agents not just to recall information but to learn from experience. This self-improving capability is key to agents evolving from "tools" to "assistants."
