Together.ai Launches ATLAS: 4x Faster LLM Inference with Runtime Learning Accelerators
Together.ai officially launched the ATLAS-2 inference accelerator, the latest version of its AdapTive-LeArning Speculator System, at the AI Native Conference in March 2026. The system achieves up to 4x inference speed improvements on large language models such as DeepSeek-V3.1 through a revolutionary speculative decoding framework, consistently reaching 500 TPS in high-load scenarios.
ATLAS's core innovation lies in the clever combination of static and adaptive speculators. Traditional inference acceleration solutions often rely on fixed optimization strategies, struggling to adapt to the varied query patterns in production environments. ATLAS employs a dual speculative strategy: static speculators handle common query patterns through pre-trained model parameters, while adaptive speculators learn from real-time traffic to dynamically adjust prediction strategies.
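The dual strategy can be illustrated with a toy sketch. This is an assumed design, not Together.ai's actual implementation: real speculators are small neural draft models, but bigram lookup tables keep the mechanics visible. A pre-trained static table covers common patterns, an adaptive table learns continuations from accepted tokens at runtime, and drafted tokens are verified against the target model, with the longest matching prefix accepted.

```python
# Toy sketch of dual speculation (assumed design, not Together.ai's code):
# a frozen "static" draft table plus an "adaptive" one updated from live traffic.

class StaticSpeculator:
    def __init__(self, table):
        self.table = dict(table)  # trained offline; never changes

    def draft(self, context, k):
        out, last = [], context[-1]
        for _ in range(k):
            nxt = self.table.get(last)
            if nxt is None:
                break
            out.append(nxt)
            last = nxt
        return out

class AdaptiveSpeculator(StaticSpeculator):
    def __init__(self):
        super().__init__({})

    def observe(self, context, accepted):
        # learn from real traffic: record each accepted continuation
        last = context[-1]
        for tok in accepted:
            self.table[last] = tok
            last = tok

def speculative_step(context, target_next, static, adaptive, k=4):
    # prefer the runtime-learned speculator once it covers this context,
    # otherwise fall back to the pre-trained static one
    drafted = adaptive.draft(context, k) or static.draft(context, k)
    accepted = []
    for tok in drafted:  # verify each drafted token against the target model
        if tok == target_next(context + accepted):
            accepted.append(tok)
        else:
            break
    accepted.append(target_next(context + accepted))  # target always advances
    adaptive.observe(context, accepted)
    return accepted

# Demo: a deterministic "target model" that cycles through four tokens.
CYCLE = ["a", "b", "c", "d"]
target_next = lambda ctx: CYCLE[len(ctx) % len(CYCLE)]

static = StaticSpeculator({"a": "b"})
adaptive = AdaptiveSpeculator()
context = ["a"]
for _ in range(8):
    context += speculative_step(context, target_next, static, adaptive)
print(len(context))  # once the adaptive table warms up, ~5 tokens per round
```

After a few rounds the adaptive table has learned the traffic pattern, so each verification round accepts a full draft of four tokens plus one target token, which is the source of the speedup: many tokens emitted per sequential call to the expensive target model.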
The Aurora open-source framework released alongside ATLAS-2 brings substantial value to the AI community. Aurora includes not only an open-source implementation of the core ATLAS algorithms but also complete toolchains and deployment guides, enabling smaller AI companies and research institutions to benefit from advanced inference acceleration technologies.
One of ATLAS's most notable features is its hot-swapping capability, allowing dynamic strategy updates without service interruption. Through carefully designed state management and strategy switching mechanisms, ATLAS achieves millisecond-level hot updates, ensuring service continuity. This represents a significant technical advantage for AI service providers with high availability requirements.
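One common way to get interruption-free updates like this is an atomic reference swap: the serving path reads an immutable strategy snapshot through a single reference, while an updater builds the replacement off to the side and publishes it in one assignment. The sketch below assumes this mechanism (the article does not specify ATLAS's internals); the `HotSwapSpeculator` name and bigram-table strategy are illustrative only.

```python
# Hedged sketch of hot-swapping via atomic snapshot replacement (assumed
# mechanism, not ATLAS's documented internals). In-flight requests keep the
# snapshot they started with; new requests see the new one; no lock is taken
# on the hot path.
import threading

class HotSwapSpeculator:
    def __init__(self, table):
        self._active = dict(table)      # current strategy snapshot

    def draft(self, context, k=4):
        table = self._active            # one atomic read; snapshot never mutates
        out, last = [], context[-1]
        for _ in range(k):
            nxt = table.get(last)
            if nxt is None:
                break
            out.append(nxt)
            last = nxt
        return out

    def hot_swap(self, new_table):
        snapshot = dict(new_table)      # build the new strategy fully first
        self._active = snapshot         # then publish it in a single assignment

spec = HotSwapSpeculator({"a": "b", "b": "c"})
old = spec.draft(["a"])                 # served under the old strategy
spec.hot_swap({"a": "x", "x": "y"})
new = spec.draft(["a"])                 # served under the new strategy
print(old, new)

# Continuity check: a serving thread keeps drafting while we swap strategies
# back and forth; every response should reflect exactly one snapshot, never
# a half-updated mix.
errors, stop = [], threading.Event()
def serve():
    while not stop.is_set():
        d = spec.draft(["a"])
        if d not in (["b", "c"], ["x", "y"]):
            errors.append(d)

t = threading.Thread(target=serve)
t.start()
for _ in range(1000):
    spec.hot_swap({"a": "b", "b": "c"})
    spec.hot_swap({"a": "x", "x": "y"})
stop.set()
t.join()
```

The design choice worth noting: because the snapshot is never mutated after publication, readers need no synchronization at all, and the swap itself is just a pointer-sized write, which is how sub-millisecond updates without draining in-flight requests become possible.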
In-Depth Analysis and Industry Outlook
From a broader perspective, this development reflects the accelerating trend of AI technology transitioning from laboratories to industrial applications. Industry analysts widely agree that 2026 will be a pivotal year for AI commercialization. On the technical front, large model inference efficiency continues to improve while deployment costs decline, enabling more SMEs to access advanced AI capabilities. On the market front, enterprise expectations for AI investment returns are shifting from long-term strategic value to short-term quantifiable gains.
However, the rapid proliferation of AI also brings new challenges: increasing complexity of data privacy protection, growing demands for AI decision transparency, and difficulties in cross-border AI governance coordination. Regulatory authorities across multiple countries are closely monitoring these developments, attempting to balance innovation promotion with risk prevention. For investors, identifying AI companies with truly sustainable competitive advantages has become increasingly critical as the market transitions from hype to value validation.