What is SVI-Bench and how does it evaluate video intelligence?

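SVI-Bench uses team sports (basketball, soccer, and hockey) as dynamic micro-worlds. It combines roughly 35,000 hours of broadcast video with 15 million annotated actions across 9 tasks, ranging from dynamic scene understanding to agentic synthesis, to test perception, causal reasoning, and strategic planning.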

What key findings did the benchmark reveal about current AI models?

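Models reached 73% accuracy on fine-grained action question answering, a perception task. On an agentic task that requires autonomously integrating evidence across 1.8 million clips, the strongest model reached only 5%, exposing a severe gap in the causal reasoning and strategic planning abilities of multimodal AI.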

Why are these results significant for the future of AI development?

The findings show that visual recognition alone cannot handle complex dynamic decision-making. Future AI must evolve toward causal reasoning and strategic simulation.

SVI-Bench: Building a Dynamic Micro-World Benchmark for Strategic Video Intelligence

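This paper presents SVI-Bench, a large-scale benchmark designed to evaluate Strategic Video Intelligence (SVI). SVI goes beyond conventional visual perception and requires models to perform causal reasoning, simulation-based prediction, and strategic planning. Existing benchmarks struggle to offer both realism and verifiability; SVI-Bench uses team sports as dynamic micro-worlds, combining the complexity of real multi-agent interaction with the determinism of explicit rules. The benchmark comprises roughly 35,000 hours of broadcast video, 15 million annotated actions, and rich structured data for basketball, soccer, and hockey, and it covers 9 tasks ranging from dynamic scene understanding to agentic synthesis. Experiments show that models perform reasonably on perception tasks (73% accuracy on fine-grained action question answering) but hit a pronounced capability cliff at the causal reasoning and strategic planning levels: the strongest model reaches only 5% accuracy on an agentic task that requires autonomously integrating evidence across 1.8 million clips, revealing a large gap in the deeper cognitive abilities of current multimodal models.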
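To make the size of the reported capability cliff concrete, the short sketch below computes the absolute and relative drop between the two accuracy figures quoted in the summary (73% and 5%). The `TaskResult` class, the tier labels, and the `capability_gap` function are hypothetical names introduced here for illustration; they are not part of SVI-Bench. The two figures also come from different tasks, so the comparison is indicative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One reported score. Hypothetical container, not an SVI-Bench type."""
    task: str
    tier: str          # illustrative label, e.g. "perception" or "agentic"
    accuracy: float    # fraction in [0, 1]


def capability_gap(high: TaskResult, low: TaskResult) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    diff = high.accuracy - low.accuracy
    return diff * 100, diff / high.accuracy


# The only two accuracy figures given in the summary.
perception = TaskResult("fine-grained action QA", "perception", 0.73)
agentic = TaskResult("agentic evidence integration", "agentic", 0.05)

abs_drop, rel_drop = capability_gap(perception, agentic)
print(f"{abs_drop:.0f} percentage points, {rel_drop:.0%} relative drop")
```

Under these assumptions, the drop is 68 percentage points, or about 93% of the perception-level accuracy.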

Sources

arXiv