Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

World models simulate environment dynamics for action planning, but conventional tokenizers encode each observation into hundreds of tokens, making decision-time planning computationally prohibitive. Researchers from KAIST and POSTECH introduce CompACT, a discrete tokenizer that compresses each image into as few as 8 tokens (approximately 128 bits), versus the 784 tokens required by the SD-VAE used in prior navigation world models.

CompACT features two key innovations: a semantic encoder built on frozen DINOv3 features that distills only planning-critical information through cross-attention resampling, and a generative decoder that synthesizes intermediate VQGAN tokens from the compact representation via MaskGIT-style masked prediction.

On RECON navigation planning, the 8-token model outperforms previous 64-token approaches while achieving approximately 40x speedup over 784-token baselines. Accepted at CVPR 2026, this work represents a practical step toward real-time deployment of world models.

Planning in 8 Tokens: How CompACT Redefines World Model Efficiency

The Computational Bottleneck

World models predict future states given current observations and actions, but conventional tokenizers encode each observation into hundreds of tokens. NWM, for example, uses SD-VAE to encode each image into 784 tokens, requiring up to 3 minutes per planning episode on an RTX 6000 Ada GPU.

Core Hypothesis

CompACT proposes that aggressive compression benefits planning by forcing more abstract, action-relevant representations. Each image is compressed to just 128 bits (8 tokens of 16 bits each).
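The bit budget above is simple arithmetic; a quick sanity check of the quoted numbers:

```python
# Bit budget quoted above: 8 tokens at 16 bits each = 128 bits per image,
# versus 784 SD-VAE tokens per image in the NWM baseline.
compact_tokens = 8
bits_per_token = 16
baseline_tokens = 784

compact_bits = compact_tokens * bits_per_token
token_reduction = baseline_tokens / compact_tokens

print(compact_bits)      # 128
print(token_reduction)   # 98.0 -> ~98x fewer tokens than the baseline
```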

Architecture

Encoder: frozen DINOv3 features are resampled into a small set of learnable query tokens via cross-attention, then discretized with finite scalar quantization (FSQ). Only planning-critical semantic information is preserved.
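As a concrete illustration of the quantization step, here is a minimal FSQ sketch in NumPy. The level configuration, the `tanh` bounding, and the mixed-radix code packing are assumptions chosen for illustration, not the paper's exact quantizer settings:

```python
import numpy as np

# Minimal FSQ sketch (an illustration, not the paper's configuration).
# Each token is a short vector whose dimensions are independently snapped
# to a small uniform grid; the grid indices form the discrete code. With
# levels [8, 8, 8, 8] a token indexes 8^4 = 4096 codes (12 bits); a level
# product of 2^16 would match the 16 bits/token cited above.
levels = np.array([8, 8, 8, 8])

def fsq_quantize(z):
    """Bound each latent dim to (-1, 1), then pick a level index per dim."""
    z = np.tanh(z)                                  # bound to (-1, 1)
    digits = np.floor((z + 1) / 2 * levels).astype(int)
    return np.clip(digits, 0, levels - 1)           # indices 0 .. L-1

def fsq_code(digits):
    """Pack per-dimension level indices into one integer token id."""
    bases = np.cumprod(np.concatenate(([1], levels[:-1])))
    return int(np.dot(digits, bases))

digits = fsq_quantize(np.array([0.3, -1.2, 0.0, 2.5]))
token_id = fsq_code(digits)   # a single integer in [0, 4096)
```

Because the grid is fixed rather than learned, FSQ needs no codebook or commitment losses, which is part of why it suits very short token budgets.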

Decoder: generative MaskGIT-style decoding synthesizes 196 intermediate VQGAN tokens conditioned on the compact representation.
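The iterative masked-decoding idea can be sketched as follows. The network, vocabulary size, and unmasking schedule are placeholders for illustration, not the paper's components:

```python
import numpy as np

# Toy MaskGIT-style decoding loop. `predict_logits` stands in for a
# transformer that, given the compact tokens and a partially filled
# 196-token canvas, scores the VQGAN vocabulary at every position
# (random logits here, purely for illustration).
rng = np.random.default_rng(0)
NUM_POS, VOCAB, MASK = 196, 1024, -1

def predict_logits(compact_tokens, canvas):
    return rng.standard_normal((NUM_POS, VOCAB))   # placeholder network

def maskgit_decode(compact_tokens, steps=8):
    canvas = np.full(NUM_POS, MASK)
    for t in range(steps):
        logits = predict_logits(compact_tokens, canvas)
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        pred = np.where(canvas == MASK, probs.argmax(-1), canvas)
        conf = probs.max(-1)
        conf[canvas != MASK] = np.inf              # committed tokens stay
        # cosine schedule: how many positions remain masked after this step
        keep_masked = int(NUM_POS * np.cos(np.pi / 2 * (t + 1) / steps))
        fill = np.argsort(-conf)[: NUM_POS - keep_masked]
        canvas[fill] = pred[fill]
    return canvas  # all 196 positions decoded after the final step

tokens = maskgit_decode(np.zeros(8, dtype=int))
```

Each round commits only the highest-confidence predictions, so all 196 tokens are produced in a handful of parallel steps rather than one autoregressive pass per token.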

World Model: learns p(z_{t+1} | z_t, a_t) via masked generative modeling in the compact latent space. Planning is done by model predictive control (MPC) with cross-entropy method (CEM) optimization.
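The MPC-with-CEM loop can be sketched generically. Here a toy 2-D dynamics function and goal cost stand in for the learned latent world model, and all hyperparameters are illustrative assumptions:

```python
import numpy as np

# Cross-entropy method (CEM) planning loop, sketched with toy dynamics
# z_{t+1} = z_t + a_t and a squared-distance goal cost. The real planner
# would roll out the learned latent dynamics instead.
rng = np.random.default_rng(1)
HORIZON, POP, ELITES, ITERS = 5, 64, 8, 10

def rollout_cost(actions, start, goal):
    z = start.copy()
    for a in actions:                 # simulate the action sequence
        z = z + a
    return np.sum((z - goal) ** 2)    # cost: distance to goal at the end

def cem_plan(start, goal):
    mu, sigma = np.zeros((HORIZON, 2)), np.ones((HORIZON, 2))
    for _ in range(ITERS):
        samples = mu + sigma * rng.standard_normal((POP, HORIZON, 2))
        costs = np.array([rollout_cost(s, start, goal) for s in samples])
        elite = samples[np.argsort(costs)[:ELITES]]   # refit to the best
        mu, sigma = elite.mean(0), elite.std(0) + 1e-6
    return mu   # in MPC, execute mu[0] and replan from the next state

plan = cem_plan(np.zeros(2), np.array([3.0, -1.0]))
```

Because every CEM iteration evaluates a population of rollouts through the tokenizer's latent space, shrinking each observation from 784 tokens to 8 directly multiplies planning throughput.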

Results

  • RECON navigation: the 8-token model outperforms 64-token methods, with a ~40x planning speedup over the 784-token baseline
  • RoboNet manipulation: comparable accuracy with 16x fewer tokens
  • Key insight: extreme compression improves planning quality

Significance

Accepted at CVPR 2026 (KAIST & POSTECH). Opens path to real-time world model deployment for robotic control.

References

  • arXiv paper
  • CVPR 2026
