What is the LAIT study and how does it evaluate literary translations?

The LAIT study recruits 15 experienced readers to compare human and machine English translations of 15 recently published French, Polish, and Japanese novels. The result is a reader-centric evaluation framework that exposes gaps between automated metrics and actual reading experience.

Why do readers prefer human translations even when AI output is deemed passable?

While AI translations were rated adequate, readers favored human work for clarity, readability, and immersion. Human translations also maintained higher consistency compared to the volatile quality of machine outputs.

Why do automated metrics fail to reflect reader preference, and what should developers do?

Automated metrics and LLM-as-judge approaches systematically favor machine translations, missing true reader sentiment. Developers should incorporate direct user feedback mechanisms rather than relying solely on algorithmic scoring.
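One way to make that gap visible is to check how often an automated score picks the same translation that readers prefer. The following is a minimal sketch of such a check, not the study's own method. The function names, the record fields (`reader_votes`, `metric_ht`, `metric_mt`), and the toy values are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_preference(votes):
    """Return 'HT' or 'MT' by simple majority of reader votes, or None on a tie."""
    counts = Counter(votes)
    if counts["HT"] == counts["MT"]:
        return None
    return "HT" if counts["HT"] > counts["MT"] else "MT"


def metric_preference(ht_score, mt_score):
    """Return the translation an automated metric scores higher, or None on a tie."""
    if ht_score == mt_score:
        return None
    return "HT" if ht_score > mt_score else "MT"


def agreement_rate(items):
    """Fraction of items where the metric's pick matches the reader majority.

    Items with a tie on either side are skipped.
    """
    compared = 0
    agree = 0
    for item in items:
        reader = majority_preference(item["reader_votes"])
        metric = metric_preference(item["metric_ht"], item["metric_mt"])
        if reader is None or metric is None:
            continue
        compared += 1
        if reader == metric:
            agree += 1
    return agree / compared if compared else float("nan")


# Toy, made-up records (NOT data from the study); one record per excerpt.
toy_items = [
    {"reader_votes": ["HT", "HT", "MT"], "metric_ht": 0.71, "metric_mt": 0.78},
    {"reader_votes": ["HT", "MT", "MT"], "metric_ht": 0.66, "metric_mt": 0.80},
    {"reader_votes": ["HT", "HT", "HT"], "metric_ht": 0.83, "metric_mt": 0.79},
    {"reader_votes": ["HT", "MT"], "metric_ht": 0.70, "metric_mt": 0.75},  # reader tie, skipped
]

if __name__ == "__main__":
    print(f"metric/reader agreement: {agreement_rate(toy_items):.2f}")
```

A low agreement rate on real feedback would suggest the metric is not a reliable stand-in for reader preference on that material, which is the situation the study describes.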

AI translations of literary texts are "acceptable," but readers still prefer human translations

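This paper studies the actual reading experience of AI translation in the literary domain and argues that current automatic metrics and fluency-focused human evaluation fail to capture readers' immersion and a text's literary effect. The study recruited 15 experienced readers to compare English translations of 15 recently published French, Polish, and Japanese novels, covering human translation (HT) and machine translation (MT) generated by an agentic large language model (LLM) system. Under two experimental conditions, an immersive read-through and a paragraph-by-paragraph close reading, it collected comparative data on excerpts of roughly 8,000 words. The results show that although readers judged MT quality to be "acceptable," they preferred HT for clarity, readability, and immersion, with the difference especially pronounced in the fine-grained comparison. Notably, readers had difficulty telling the two apart and were easily swayed by preconceptions. Automatic metrics, including LLM-as-judge methods, failed to reflect real reader preferences and instead favored MT. The study also releases the LAIT dataset, which contains more than a thousand reader comments and thousands of annotations, as a new benchmark for evaluating literary translation.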

Sources

arXiv