What is Epi2Diff and how does it predict difficulty?

Epi2Diff is a framework that predicts human-assigned item difficulty from the reasoning traces of large reasoning models (LRMs). It maps each trace to a sequence of cognitively meaningful fragments and quantifies difficulty by modeling reasoning scale, effort allocation, and state transitions across reasoning steps.

Why is this approach important for educational assessment?

It reduces reliance on costly human calibration while providing explainable, process-based evidence for difficulty. This supports automated, scalable item difficulty prediction and shifts educational measurement from a result-oriented to a process-oriented view.

What key finding should researchers watch for?

High-difficulty items trigger more iterative, implementation-centered cognitive fragment dynamics rather than simply longer responses. This suggests that difficulty stems from strategy adjustment and repeated verification, not just the volume of generated text.

Epi2Diff: Predicting Human Item Difficulty from Cognitive Fragments of Large-Model Reasoning Traces

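This paper proposes the Epi2Diff framework to address the problem of predicting human item difficulty in educational assessment. Traditional methods rely on costly manual calibration or use only the semantics of the item text, so they struggle to capture the cognitive load of the solving process. Epi2Diff takes reasoning traces generated by a large reasoning model (LRM), maps them to sequences of cognitively meaningful fragments, and quantifies difficulty by modeling reasoning scale, effort allocation, and state transitions. Experiments on four real-world human-difficulty datasets show that the method significantly outperforms fine-tuned small language models, LLM in-context learning, and supervised fine-tuning baselines. On an SAT-derived benchmark, its relative gain reaches 8.1%. Analysis shows that high-difficulty items elicit more iterative and implementation-centered fragment dynamics rather than simply longer responses, offering a new, interpretable perspective for educational measurement.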
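
To make the three kinds of signal concrete, here is a minimal Python sketch of how scale, effort-allocation, and state-transition features might be computed from a trace that has already been segmented into labeled fragments. It is an illustration under stated assumptions, not the paper's implementation: the label set (`understand`, `plan`, `implement`, `verify`), the `(label, token_count)` input format, the function name `fragment_features`, and the `implement_revisits` statistic are all hypothetical, and the token counts are made-up toy values.

```python
from collections import Counter
from itertools import groupby

# Hypothetical fragment labels; the source does not list the actual taxonomy.
LABELS = ("understand", "plan", "implement", "verify")


def fragment_features(fragments):
    """Summarize a trace given as a list of (label, token_count) pairs in trace order."""
    labels = [label for label, _ in fragments]
    total_tokens = sum(count for _, count in fragments)

    # Scale: how many fragments and tokens the trace contains.
    features = {"n_fragments": len(fragments), "n_tokens": total_tokens}

    # Effort allocation: share of tokens spent on each fragment type.
    tokens_by_label = Counter()
    for label, count in fragments:
        tokens_by_label[label] += count
    for label in LABELS:
        share = tokens_by_label[label] / total_tokens if total_tokens else 0.0
        features[f"share_{label}"] = share

    # State transitions: counts of consecutive label pairs.
    for (src, dst), n in Counter(zip(labels, labels[1:])).items():
        features[f"trans_{src}->{dst}"] = n

    # Iteration (assumed statistic): times the trace returns to "implement"
    # after having left it. Consecutive duplicates are merged into one run first.
    runs = [label for label, _ in groupby(labels)]
    features["implement_revisits"] = max(runs.count("implement") - 1, 0)
    return features


# Toy traces (invented values, not data from the paper).
easy_trace = [("understand", 40), ("plan", 30), ("implement", 80), ("verify", 20)]
hard_trace = [
    ("understand", 60), ("plan", 50), ("implement", 120), ("verify", 40),
    ("implement", 90), ("verify", 30), ("plan", 20), ("implement", 70),
    ("verify", 30),
]

easy_features = fragment_features(easy_trace)
hard_features = fragment_features(hard_trace)
```

In this toy example the harder trace differs from the easier one not only in token count but also in how often it loops back from verification to implementation, which is the kind of pattern the analysis above describes. A feature dictionary like this could be fed to any standard regressor; which predictor the authors actually use is not stated here.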

Sources

arXiv