Kumiho: Graph-Native Cognitive Memory for AI Agents with Formal Belief Revision Semantics
Kumiho introduces a graph-native cognitive memory system for AI agents, grounded in formal AGM belief revision semantics (postulates K*2–K*6 proven) and implemented as a dual-store model: Redis working memory plus a Neo4j long-term graph. Its structural primitives—immutable revisions, mutable tag pointers, typed dependency edges, URI addressing—simultaneously serve cognitive memory and versioned agent asset management. The system achieves F1=0.565 on LoCoMo and 93.3% accuracy on LoCoMo-Plus (vs. 45.7% for Gemini 2.5 Pro), with a 97.5% adversarial refusal rate—a natural consequence of the belief revision semantics: the graph contains no fabricated content.
Kumiho: A Graph-Native Cognitive Memory Architecture for AI Agents Based on AGM Belief Revision
The Core Problem: Memory as a First-Class Engineering Concern
As large language models evolve from conversational chatbots into autonomous AI agents, a fundamental engineering challenge emerges: how can agents reliably remember past interactions, correct outdated beliefs, and share knowledge within multi-agent workflows?
Current mainstream approaches fall short in critical ways. Expanding context windows provides more working space per invocation but offers no cross-session persistence, no belief versioning, no provenance tracking, and no mechanism for expressing dependency relationships—the difference between a whiteboard and a filing system. Pure vector database solutions enable semantic retrieval but lack version control, conflict resolution, and structured provenance. MemGPT/Letta pioneered virtual context extension and later introduced Git-backed memory filesystems (Letta Context Repositories, February 2026), but Git's file-level diff model cannot semantically resolve contradictory beliefs—merges still require human or LLM intervention on text diffs.
Kumiho (Young Bin Park, 2026) presents a comprehensive solution from theory to implementation, establishing a rigorous formal correspondence between the AGM belief revision framework and the operational semantics of a property graph memory system. The paper is available at arXiv:2603.17244.
Dual-Store Architecture: Redis Working Memory + Neo4j Long-Term Graph
Kumiho implements a cognitively-inspired dual-store model:
Working Memory Layer (Redis)
- Stores ephemeral information and intermediate results for the current session
- Sub-millisecond read/write latency for active agent operations
- Asynchronously consolidated into long-term storage at session boundaries and during background processing
Long-Term Graph Store (Neo4j)
- Persistent property graph database storing all historical memories as structured nodes
- Every memory node carries a URI address, complete revision history, and provenance edges to source evidence
- Supports typed dependency relationship traversal across the entire memory graph
- Graph topology enables multi-hop reasoning that pure vector retrieval cannot provide
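As a rough illustration of the write path through the two stores, the dual-store model can be sketched in a few lines of Python. All class and function names here are invented for illustration; they are not the Kumiho SDK, and real deployments would use Redis and Neo4j clients rather than in-memory stand-ins.

```python
import time

class WorkingMemory:
    """Session-scoped store standing in for the Redis layer."""
    def __init__(self):
        self.items = []

    def put(self, content):
        self.items.append({"content": content, "ts": time.time()})

class LongTermGraph:
    """Minimal stand-in for the Neo4j property graph."""
    def __init__(self):
        self.nodes = {}   # uri -> node properties
        self.edges = []   # (src_uri, edge_type, dst_uri)

    def add_node(self, uri, properties):
        self.nodes[uri] = dict(properties)

def consolidate(wm, graph, space="demo", entity="session-notes"):
    """Flush working memory into the long-term graph at a session boundary."""
    uris = []
    for i, item in enumerate(wm.items):
        uri = f"kumiho://{space}/{entity}@r{i}"
        graph.add_node(uri, {"content": item["content"], "ts": item["ts"]})
        uris.append(uri)
    wm.items.clear()
    return uris

wm = WorkingMemory()
wm.put("user prefers aisle seats")
wm.put("booked flight for July")
graph = LongTermGraph()
uris = consolidate(wm, graph)   # session boundary: flush to long-term graph
```

In the real system this flush runs asynchronously ("dream state") rather than synchronously as shown here.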
Hybrid Retrieval Strategy
The system combines three retrieval modalities: full-text retrieval (BM25) for keyword-based matching, vector semantic retrieval for embedding-based similarity search, and graph traversal for topology-aware navigation following typed edges. This multi-modal approach substantially outperforms pure vector database solutions by adding the topological dimension—retrieving not just semantically similar memories but also causally related, temporally adjacent, or structurally dependent ones.
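One common way to combine ranked lists from heterogeneous retrievers is reciprocal rank fusion (RRF). The paper does not specify Kumiho's exact fusion formula, so the sketch below is a plausible stand-in, not the system's actual scoring:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked candidate lists into one ordering.
    Each list contributes 1/(k + rank) per document; documents that
    appear near the top of multiple lists rise to the front."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25   = ["m3", "m1", "m7"]   # keyword (full-text) matches
vector = ["m1", "m3", "m9"]   # embedding nearest neighbours
graph  = ["m1", "m4"]         # nodes reached by typed-edge traversal

fused = reciprocal_rank_fusion([bm25, vector, graph])
```

Note that `m4` enters the fused list only through graph traversal—the topological dimension surfacing a candidate that neither text nor vector retrieval found.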
Safety-Hardened Asynchronous Consolidation Pipeline
The background consolidation process ("dream state") continuously merges working memory contents into the long-term graph, protected by novel safety mechanisms adapted from distributed systems and content management:
- Published-item protection: Prevents overwriting of confirmed/approved memories
- Circuit breakers: Automatically halt runaway consolidation loops
- Dry-run validation: Pre-consolidation checks before committing changes
- Auditable cursor-based resumption: Checkpointed state enables safe restart after interruptions
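The safety mechanisms above compose naturally in code. The following is a minimal sketch under stated assumptions—the names (`CircuitBreaker`, `consolidate_batch`) and thresholds are invented for illustration and do not reflect Kumiho's internal API:

```python
class CircuitBreaker:
    """Trips open after repeated consecutive failures, halting the loop."""
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0
        self.open = False

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.max_failures:
            self.open = True

def consolidate_batch(items, store, published, breaker, dry_run=False):
    """Consolidate (key, value) pairs with published-item protection,
    circuit breaking, and optional dry-run validation."""
    applied, skipped = [], []
    for key, value in items:
        if breaker.open:
            break                    # runaway consolidation halted
        if key in published:
            skipped.append(key)      # never overwrite approved memories
            continue
        if not dry_run:
            store[key] = value       # dry-run skips the actual commit
        applied.append(key)
        breaker.record(ok=True)
    return applied, skipped

store, published = {}, {"mem:policy"}
breaker = CircuitBreaker()
applied, skipped = consolidate_batch(
    [("mem:policy", "new"), ("mem:note", "draft")], store, published, breaker)
```

Cursor-based resumption would additionally checkpoint the index of the last successfully applied item, so a restart resumes from there rather than reprocessing the batch.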
Formal AGM Belief Revision Correspondence
The central theoretical contribution of the paper is a structural correspondence between AGM belief revision postulates and graph-native memory operations.
The AGM framework (Alchourrón, Gärdenfors, Makinson, 1985) establishes the mathematical foundation for rational belief change. Kumiho formally proves that its property graph memory system satisfies the basic AGM postulates at the belief base level (Hansson, 1999):
K*2 (Success): The new belief must be in the revised belief set.
Graph mapping: The new revision node necessarily contains the input content; write is atomic.
K*3 (Inclusion): The revised belief set is a subset of the closure of the old set plus the new belief.
Graph mapping: The new revision node inherits unretracted edges from its predecessor revision.
K*4 (Vacuity): If the new belief is consistent with current beliefs, revision reduces to simple expansion; nothing is removed and no unnecessary content is added.
Graph mapping: If content is compatible (no semantic conflict detected), only a tag pointer update is performed—no new revision node is created.
K*5 (Consistency): The revised set remains consistent unless the new belief itself is contradictory.
Graph mapping: Conflict detection executes before write; Supersedes edges guarantee linear revision history with no circular dependencies.
K*6 (Extensionality): Logically equivalent inputs produce equivalent revision outcomes.
Graph mapping: Content hashing ensures semantically equivalent inputs produce the same revision node.
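The mappings above can be made concrete with a toy belief base. This sketch is illustrative only: it uses exact content hashing where the paper claims semantic-equivalence hashing, and omits conflict detection, but it shows the structural guarantees—idempotent writes (K*6), the new belief always entering the base (K*2), and a linear Supersedes chain:

```python
import hashlib

class BeliefBase:
    """Toy belief base with immutable revision nodes (illustrative only)."""
    def __init__(self):
        self.revisions = {}      # content hash -> revision node
        self.supersedes = {}     # new hash -> hash it supersedes
        self.head = None         # tag pointer to the current revision

    def revise(self, content, contradicts_head=False):
        h = hashlib.sha256(content.encode()).hexdigest()[:12]
        if h in self.revisions:              # K*6: equivalent input, same node
            self.head = h
            return h
        self.revisions[h] = {"content": content}   # immutable: never mutated
        if contradicts_head and self.head is not None:
            self.supersedes[h] = self.head   # linear Supersedes chain
        self.head = h                        # K*2: the new belief is believed
        return h

bb = BeliefBase()
r1 = bb.revise("meeting is on Tuesday")
r2 = bb.revise("meeting is on Tuesday")              # no duplicate node
r3 = bb.revise("meeting moved to Friday", contradicts_head=True)
```

The superseded revision `r1` remains in `bb.revisions`—archived and addressable, never deleted—which is exactly the Recovery-rejecting design discussed below.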
The system also satisfies Hansson's belief base postulates (Relevance and Core-Retainment), with a principled rejection of the Recovery postulate: because revisions are immutable, superseded beliefs are archived (permanently addressable via their revision URI) but not "recovered" into the active belief base. This design choice aligns with enterprise AI governance requirements where audit trails must be preservable but deprecated beliefs should not silently re-enter the active knowledge base.
A critical design decision: the formal results are scoped to propositional logic over ground triples rather than description logics. Flouris et al. (2005) proved impossibility results for AGM-style belief revision in description logics (full OWL/RDF expressiveness). By constraining expressiveness, Kumiho achieves tractability and avoids these impossibility results while maintaining sufficient power for practical agent memory use cases.
Three Structural Primitives
Kumiho's core data model is built on three complementary structural primitives that simultaneously serve cognitive memory and versioned asset management:
1. Immutable Revisions
Every knowledge update creates a new revision node rather than modifying existing data. Revision nodes are linked via directed Supersedes edges forming a linear revision chain. Prior revisions are permanently archived—always addressable via their URI, never deleted. This directly enables the AGM belief revision semantics: the system can always answer "what did this agent believe at time T?" and "when and why was this belief changed?"
This contrasts sharply with vector database approaches (which typically overwrite or mark-delete old embeddings) and even Git-based approaches (which operate at file diff granularity rather than semantic belief granularity).
2. Mutable Tag Pointers
Analogous to Git branch heads, tag pointers designate the "current version" of a knowledge chain. Tags (latest, approved, staging, published) can be moved to point to different revisions without creating new revision nodes. This enables multiple agents to pin to different approved versions of the same knowledge, staged deployment of belief updates (staging → approved → published), and operators to roll back to a prior revision by simply moving a tag pointer.
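Because a tag is just a mutable pointer over immutable revisions, promotion and rollback are O(1) pointer moves. A minimal sketch (the `TagStore` class and revision IDs are invented for illustration):

```python
class TagStore:
    """Mutable tags over immutable revisions, like Git branch heads."""
    def __init__(self):
        self.tags = {}

    def move(self, tag, revision_id):
        self.tags[tag] = revision_id   # O(1) pointer move, no new node

    def resolve(self, tag):
        return self.tags[tag]

tags = TagStore()
tags.move("staging", "rev-41")
tags.move("approved", "rev-40")

# Staged promotion: staging -> approved
tags.move("approved", tags.resolve("staging"))

# Rollback is just another pointer move, back to the prior revision
tags.move("approved", "rev-40")
```

No revision node is created or destroyed by any of these operations, which is why rollback is safe: the history is untouched.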
3. Typed Dependency Edges
Relationships between nodes are expressed through semantically typed edges:
- Supersedes: Revision replacement (establishes revision lineage)
- DerivedFrom: Inference provenance (tracks which memories were used to generate a conclusion)
- Contradicts: Semantic conflict marking (triggers conflict resolution)
- RelatedTo: Associative relationships
These typed edges enable the AnalyzeImpact operation to propagate revision effects across all edge types simultaneously—something impossible with flat vector stores.
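An impact analysis over typed edges is essentially a breadth-first traversal. The sketch below is a plausible reading of what an AnalyzeImpact-style operation could do—the paper does not specify the operator at this level of detail, and the adjacency structure here stores edges in the direction impact propagates:

```python
from collections import deque

def analyze_impact(graph, start, edge_types=("Supersedes", "DerivedFrom",
                                             "Contradicts", "RelatedTo")):
    """Breadth-first walk collecting every node reachable from `start`
    via the given typed edges."""
    seen, queue, impacted = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for etype, dst in graph.get(node, []):
            if etype in edge_types and dst not in seen:
                seen.add(dst)
                impacted.append(dst)
                queue.append(dst)
    return impacted

# Downstream adjacency: revising "belief-1" impacts the conclusion
# derived from it, an associated belief, and a second-hop summary.
graph = {
    "belief-1": [("DerivedFrom", "conclusion-7"), ("RelatedTo", "belief-2")],
    "conclusion-7": [("DerivedFrom", "summary-9")],
}
impacted = analyze_impact(graph, "belief-1")
```

Restricting `edge_types` to, say, `("DerivedFrom",)` would scope the analysis to provenance alone—the per-type filtering a flat vector store cannot express.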
The Dual-Purpose Design
The same graph primitives that store cognitive memories also manage agent work products. Downstream agents locate inputs via URI resolution (deterministic, no semantic search needed), track current versions via tag pointer queries, and link their outputs back to inputs via typed DerivedFrom edges—while human operators audit the entire chain using the same SDK and inspection tools. This eliminates the need for separate memory systems and asset tracking systems.
URI-Based Universal Addressing
Every memory node is assigned a hierarchical URI at creation:
kumiho://space_id/entity_id@revision_id (Pinned revision reference)
kumiho://space_id/entity_id#tag_name (Tag-relative reference, mutable)
This is the first agent memory system to implement structured, hierarchical URIs with these properties. URI addressing enables deterministic cross-agent memory references without semantic search, citeable revisions (every belief state is permanently addressable), and traceable provenance chains via graph traversal from any conclusion to its evidential sources.
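A parser for the two URI forms is straightforward. The grammar below is an assumption inferred from the two examples above (the paper does not publish a formal grammar), so treat it as a sketch:

```python
import re

URI_RE = re.compile(
    r"^kumiho://(?P<space>[^/]+)/(?P<entity>[^@#]+)"
    r"(?:@(?P<revision>[^#]+)|#(?P<tag>.+))?$"
)

def parse_kumiho_uri(uri):
    """Split a Kumiho URI into space, entity, and either a pinned
    revision ('@') or a mutable tag ('#')."""
    m = URI_RE.match(uri)
    if not m:
        raise ValueError(f"not a Kumiho URI: {uri}")
    return m.groupdict()

pinned  = parse_kumiho_uri("kumiho://team-a/travel-policy@r7")
current = parse_kumiho_uri("kumiho://team-a/travel-policy#approved")
```

The pinned form resolves to the same content forever; the tag form follows the pointer wherever operators move it.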
Benchmark Results: State-of-the-Art Performance
LoCoMo Benchmark (Long Conversation Memory)
LoCoMo tests agent memory over extended conversations with temporal reasoning, relationship tracking, and adversarial probing:
- Four-category retrieval F1: 0.447 (highest reported score across retrieval categories, n=1,540)
- Adversarial refusal accuracy: 97.5% (n=446)
- Overall F1 including adversarial (binary scoring): 0.565 (n=1,986)
The near-perfect adversarial refusal rate is a structural consequence of the belief revision architecture: the memory graph contains no fabricated information by construction, so the answer model is never handed invented context to repeat. This is fundamentally different from systems that attempt to detect and filter hallucinations after the fact.
LoCoMo-Plus Benchmark (Implicit Constraint Recall, Level-2)
LoCoMo-Plus is a more demanding benchmark specifically designed to test implicit constraint recall under intentional cue-trigger semantic disconnect—the trigger phrase used at query time intentionally differs semantically from the language used when the memory was stored.
- Judge accuracy: 93.3% (n=401, all four constraint types)
- vs. Best published baseline (Gemini 2.5 Pro): 45.7% — Kumiho exceeds by +47.6 percentage points
- Recall accuracy: 98.5% (395/401)
- Remaining 6.7% end-to-end gap: chiefly attributable to answer-model fabrication on correctly retrieved context (78% of failures), not to retrieval failure
Independent reproduction by the benchmark authors yielded results in the mid-80% range—still substantially outperforming all published baselines.
Three Architectural Innovations Driving Results
1. Prospective Indexing
At write time, an LLM generates hypothetical future query scenarios for each memory entry and indexes them alongside the memory summary. This explicitly bridges the cue-trigger semantic gap that defeats LoCoMo-Plus baselines: when a user asks about "vacation plans" but the stored memory used the phrase "summer trip schedule," the prospectively indexed hypothetical query covers this semantic variant. This innovation eliminated the 6-month accuracy cliff (37.5% → 84.4% on temporally distant memories).
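The write-time indexing step can be sketched as follows. The LLM call is stubbed here with canned variants, and all names (`prospective_queries`, `index_entry`) are invented for illustration; the actual prompt and index structure are not specified in this detail:

```python
def prospective_queries(memory_text, llm=None):
    """At write time, generate hypothetical future queries for a memory
    entry. The real system uses an LLM; a trivial stub stands in here."""
    if llm is not None:
        return llm(f"List questions a user might later ask about: {memory_text}")
    # Stub: paraphrase-style variants a real LLM might produce.
    return [
        "What are my vacation plans?",
        "When is the summer trip?",
    ]

def index_entry(index, memory_id, summary, queries):
    """Index hypothetical queries alongside the summary, so a future
    trigger phrase can match even when its wording differs from the
    stored memory's."""
    for text in [summary] + queries:
        index.setdefault(text.lower(), set()).add(memory_id)

index = {}
index_entry(index, "m42", "summer trip schedule",
            prospective_queries("Booked flights for the summer trip"))
```

A later lookup for "what are my vacation plans?" now resolves to `m42` even though that phrase never appeared in the stored memory—precisely the cue-trigger gap being bridged.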
2. Event Extraction
Structured events with explicit consequences are appended to memory summaries during consolidation. Standard narrative summarization drops causal connectives and temporal ordering; event extraction preserves them as structured data, maintaining the causal detail necessary for accurate constraint recall.
3. Client-Side LLM Reranking
After retrieval returns a candidate set of memory revisions, the consuming agent's own LLM selects the most relevant sibling revision from structured metadata. This adds a semantic selection step at zero additional inference cost (reusing the agent's already-active LLM context). The reranking step particularly benefits cases where multiple revision generations of the same entity are retrieved.
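The selection step can be sketched deterministically. In the real system the agent's own LLM makes the semantic choice; the heuristic below (prefer an `approved` tag, then the newest revision) is a stand-in, and all names are illustrative:

```python
def rerank_siblings(candidates, prefer_tag="approved"):
    """Pick one revision among sibling candidates using structured
    metadata: prefer revisions carrying `prefer_tag`, break ties by
    taking the highest revision number."""
    tagged = [c for c in candidates if prefer_tag in c.get("tags", [])]
    pool = tagged or candidates
    return max(pool, key=lambda c: c["revision"])

siblings = [
    {"id": "m7@r1", "revision": 1, "tags": ["approved"]},
    {"id": "m7@r2", "revision": 2, "tags": []},
]
best = rerank_siblings(siblings)
```

Here the older but approved revision wins over the newer unapproved one, mirroring how tag pinning interacts with revision lineage.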
AGM Compliance Verification
An automated test suite of 49 scenarios across 5 categories verifies operational adherence to all 7 claimed postulates (K*2–K*6, Relevance, Core-Retainment), including adversarial edge cases (rapid sequential revisions, deep dependency chains, mixed edge types). A 100% pass rate confirms that the implementation faithfully executes the formal specification.
Comparison with Alternative Approaches
vs. MemGPT/Letta: Letta's Git approach uses file-level merges that can detect but not semantically resolve contradictory beliefs. Kumiho's AGM-compliant belief revision operators resolve conflicts at the structural level by creating new revision nodes with Supersedes edges. Git primitives (branches, commits, file trees) don't naturally encode typed cognitive relationships between memories.
vs. Vector databases: Vector DB approaches lack versioning entirely, have no provenance tracking, cannot express typed dependency relationships, and have no formal basis for conflict resolution. They are retrieval systems, not memory systems in the cognitive sense.
vs. Graphiti/Zep: Graphiti shares the Neo4j substrate and temporal knowledge graph approach. Key differences: Graphiti lacks formal belief revision correspondence, lacks URI addressing for deterministic cross-system references, and processes/stores full conversation content server-side vs. Kumiho's BYO-storage design that keeps raw data on user's local storage.
vs. MAGMA: MAGMA proposes a multi-graph architecture with four orthogonal graph layers (semantic, temporal, causal, entity), achieving the highest LoCoMo judge score of 0.70 with policy-guided retrieval traversal. MAGMA's design disentangles memory dimensions into separate graphs for cleaner retrieval routing, whereas Kumiho unifies all relationships in a single property graph with typed edges, enabling cross-dimensional traversal.
Model Decoupling and Cost
The architecture is deliberately model-decoupled. Switching the answer model from GPT-4o-mini (~88% accuracy) to GPT-4o (93.3%) improves end-to-end accuracy by 5.3 points without any pipeline changes. Recall accuracy is a property of the architecture, not the answer model—the architecture guarantees what reaches the answer model is relevant and accurate; the model's inherent capability determines how well it uses that retrieved context.
Total cost for processing 401 LoCoMo-Plus entries: approximately $14 using GPT-4o-mini for the bulk of LLM operations.
Implementation and Availability
The Kumiho core graph server is available as a cloud service at kumiho.io. The Python SDK, MCP (Model Context Protocol) memory plugin, and benchmark suite are open-source at github.com/KumihoIO.
The BYO-storage design philosophy keeps raw conversation data entirely on the user's local storage, with only structured summaries and graph relationships processed by the cloud service. This addresses enterprise data governance requirements while achieving state-of-the-art retrieval accuracy.
Significance
Kumiho's significance lies not in any single isolated innovation but in the systematic unification of formal belief revision semantics with practical engineering—validated on real benchmarks. It represents the first agent memory system to simultaneously achieve: (1) formal mathematical grounding with AGM postulate satisfaction proofs, (2) graph-native implementation with immutable audit trails, (3) state-of-the-art retrieval performance across multiple benchmarks, (4) zero-hallucination adversarial behavior as an architectural property, and (5) unified cognitive memory and agent work product management in a single graph.
Open questions include extending the formal correspondence to supplementary postulates K*7 and K*8, validating the asset management unification in actual multi-agent pipeline deployments, and addressing anticipatory pre-computation as a complement to the current prospective indexing approach.