What is EnvFactory and how does it address tool-use limitations?

EnvFactory is a fully automated framework that discovers and validates executable tool environments from real-world resources and synthesizes natural multi-turn interaction trajectories. It targets two bottlenecks in Agentic RL: the lack of scalable, robust execution environments and the scarcity of realistic training data.

Why does this research matter? What performance gains were demonstrated?

Using only 85 validated environments across 7 domains, EnvFactory generated 2,575 SFT and RL trajectories. Qwen3-series models trained on this data improved by up to 15% on BFCLv3, 8.6% on MCP-Atlas, and 6% on conversational benchmarks such as τ²-Bench and VitaBench.

What are the next steps and broader implications for AI development?

By reducing reliance on costly real APIs and hallucination-prone LLM simulators, EnvFactory offers a scalable, robust foundation for Agentic RL, which could accelerate the deployment of autonomous AI agents in complex real-world scenarios.

EnvFactory: Scaling Tool-Use Agents via Executable Environment Synthesis and Robust Reinforcement Learning

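This paper presents EnvFactory, a fully automated framework that addresses two bottlenecks in equipping LLM agents with tool-use capabilities through agentic reinforcement learning (Agentic RL): the lack of scalable, robust execution environments and the lack of realistic training data that captures implicit human reasoning. Existing approaches rely on costly real APIs, hallucination-prone LLM simulators, or single-turn synthetic environments, and their synthesized trajectories tend to be over-specified, reading like instruction sequences rather than natural human intent. EnvFactory autonomously explores and validates stateful, executable tool environments from real-world resources, then synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement, producing grounded queries with implicit intent. Using only 85 validated environments across 7 domains, EnvFactory generated 2,575 SFT and RL trajectories. Despite using only one fifth as many environments as prior work, the method is superior in both training efficiency and downstream performance, improving Qwen3-series models by up to 15% on BFCLv3, 8.6% on MCP-Atlas, and 6% on conversational benchmarks such as τ²-Bench and VitaBench. EnvFactory provides a scalable and robust foundation for Agentic RL.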
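
The abstract does not spell out how topology-aware sampling works, so the following is only an illustrative sketch of one plausible reading: tools are nodes in a dependency graph, and a multi-turn tool sequence is sampled by walking edges so that each call can consume the output of the previous one. The graph, the tool names, and the functions `sample_tool_chain` and `is_valid_chain` are all hypothetical and are not taken from EnvFactory.

```python
import random

# Hypothetical tool dependency graph (an assumption, not from the paper):
# an edge A -> B means an output of tool A can serve as an input to tool B.
TOOL_GRAPH = {
    "search_flights": ["book_flight"],
    "book_flight": ["add_baggage", "send_confirmation"],
    "add_baggage": ["send_confirmation"],
    "send_confirmation": [],
}


def sample_tool_chain(graph, max_len, rng):
    """Sample a tool sequence by a random walk that follows graph edges."""
    current = rng.choice(sorted(graph))  # sorted for reproducibility
    chain = [current]
    # Stop when the length budget is reached or the tool has no successors.
    while len(chain) < max_len and graph[current]:
        current = rng.choice(graph[current])
        chain.append(current)
    return chain


def is_valid_chain(graph, chain):
    """Check that every consecutive pair of tools is connected by an edge."""
    return all(b in graph[a] for a, b in zip(chain, chain[1:]))


if __name__ == "__main__":
    rng = random.Random(0)
    chain = sample_tool_chain(TOOL_GRAPH, max_len=3, rng=rng)
    print(chain, is_valid_chain(TOOL_GRAPH, chain))
```

Sampling along edges, rather than picking tools independently, keeps every sampled sequence executable in principle, which is the property a downstream query-writing step would need in order to phrase a single natural request that implies the whole chain.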

Sources

arXiv