DeBiasRAG is a retrieval-augmented generation framework that debiases large language model outputs without fine-tuning. At inference time it injects query-specific fairness constraints. These are debiased contexts reverse-generated from offline-prepared bias contexts, combined with gradient-guided reordering of context fragments. The approach preserves the model's inherent capabilities.

Why does DeBiasRAG matter compared to existing debiasing methods?

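Fine-tuning and prompt engineering are costly and risk degrading the model's existing capabilities. DeBiasRAG requires no additional training. It debiases each query dynamically through retrieval, which reduces social biases such as those involving race, gender, and age. Factual accuracy and language capabilities are preserved.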

What are the next steps for DeBiasRAG research?

Future work will explore generalizing the reverse-generation and gradient reordering strategies across diverse domains. The framework opens new directions for parameter-efficient model alignment, with applications in healthcare, legal, and hiring systems where fairness is critical.

DeBiasRAG: A Fine-Tuning-Free Retrieval-Augmented Generation Framework for Fair Large Language Models

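Large language models have remarkable generative capabilities, but they often exhibit social biases involving race, gender, and age because of stereotypes in their training data. Existing fine-tuning and prompt-engineering methods are costly and may impair the model's original capabilities. To address this problem, this paper proposes the DeBiasRAG framework. The method requires no additional training. It uses retrieval-augmented generation to achieve dynamic, query-specific debiasing. DeBiasRAG consists of three stages:

1. It generates candidates from offline-prepared bias contexts.
2. It reverse-generates debiased contexts that serve as fairness constraints.
3. It combines these with conventionally retrieved contexts to build a candidate pool, then optimizes the result by reordering context fragments under the guidance of gradient updates.

Experiments show that the framework improves generation fairness while effectively preserving the language model's intrinsic representational capabilities. It offers an efficient, non-destructive new approach to dynamic debiasing.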
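The three-stage pipeline described above can be sketched in a few dozen lines of Python. This is a minimal, hypothetical sketch, not the authors' implementation. Every name and parameter below is an assumption made for illustration:

- `Fragment`, `build_pool`, and `gradient_reorder`
- the per-fragment `relevance` and `bias` scores
- the `rewrite` callable, which stands in for the LLM's reverse generation
- the toy objective and its `lam`, `lr`, and `steps` parameters

The sketch only shows the control flow. Reverse-generated debiased contexts are merged with retrieved ones. Fragments are then ordered by soft weights learned through gradient ascent.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Fragment:
    text: str
    relevance: float  # hypothetical retriever score in [0, 1]
    bias: float       # hypothetical bias-classifier score in [0, 1]; higher = more biased


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def build_pool(
    bias_contexts: List[Fragment],
    retrieved: List[Fragment],
    rewrite: Callable[[str], str],
) -> List[Fragment]:
    """Stages 1-2 (sketch): reverse-generate a debiased counterpart for each
    offline bias context, then merge them with the regularly retrieved contexts."""
    # `rewrite` is a hypothetical stand-in for an LLM call that produces a
    # counter-stereotypical version of the biased context. We assume the
    # rewrite is scored as unbiased (bias = 0.0).
    debiased = [Fragment(rewrite(b.text), b.relevance, 0.0) for b in bias_contexts]
    return debiased + list(retrieved)


def gradient_reorder(
    pool: List[Fragment], lam: float = 2.0, lr: float = 1.0, steps: int = 50
) -> Tuple[List[Fragment], List[float]]:
    """Stage 3 (sketch): learn a soft inclusion weight per fragment by gradient
    ascent on a toy objective, then sort fragments by the learned weight.

    Toy objective (an assumption, not the paper's):
        J(theta) = sum_i sigmoid(theta_i) * (relevance_i - lam * bias_i)
    """
    theta = [0.0] * len(pool)
    for _ in range(steps):
        for i, frag in enumerate(pool):
            w = sigmoid(theta[i])
            utility = frag.relevance - lam * frag.bias
            # dJ/dtheta_i = sigmoid(theta_i) * (1 - sigmoid(theta_i)) * utility_i
            theta[i] += lr * w * (1.0 - w) * utility
    order = sorted(range(len(pool)), key=lambda i: theta[i], reverse=True)
    return [pool[i] for i in order], [sigmoid(theta[i]) for i in order]


if __name__ == "__main__":
    bias_ctx = [Fragment("Nurses are usually women.", relevance=0.6, bias=0.9)]
    retrieved = [
        Fragment("Nursing requires clinical training.", relevance=0.9, bias=0.1),
        Fragment("Men rarely make good nurses.", relevance=0.7, bias=0.95),
    ]
    pool = build_pool(bias_ctx, retrieved, rewrite=lambda s: "Nurses can be of any gender.")
    ranked, weights = gradient_reorder(pool)
    for frag, w in zip(ranked, weights):
        print(f"{w:.2f}  {frag.text}")
```

This toy objective is separable, so the learned order is the same as a plain sort by `relevance - lam * bias`. Gradient updates become informative only when the objective couples fragments, as a signal computed from the language model itself would.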

Sources

arXiv