RSICCLLM is the first post-training framework based on large vision-language models for remote sensing image change captioning. With only 7 billion parameters, it uses difference-aware supervised fine-tuning and dual-negative preference optimization to outperform much larger baseline models.

Why does RSICCLLM matter?

It suggests that in specialized domains such as remote sensing, a small model can outperform much larger ones through careful data engineering and targeted post-training, which in turn lowers deployment and inference costs for practical applications.

What comes next for RSICCLLM?

The team has introduced the RSICI instruction dataset and the RSICP preference dataset, along with a dedicated evaluation benchmark. The authors state that the code and data will be open-sourced to support standardized research in the field.

RSICCLLM: A New Paradigm of Large Vision-Language Models for Remote Sensing Image Change Captioning

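Existing methods for remote sensing image change captioning (RSICC) are constrained by traditional deep learning architectures and limited model capacity. To address this, the paper proposes RSICCLLM, the first post-training framework for the task based on large vision-language models. Although large models perform well in general domains, transferring them directly to remote sensing scenes faces two challenges: data scarcity and fine-grained change understanding. In response, the authors design a data generation paradigm, release the instruction dataset RSICI, and build a dedicated evaluation benchmark. On the technical side, they introduce difference-aware supervised fine-tuning to explicitly extract change representations, and they propose a dual-negative preference optimization (DNPO) strategy that completes the preference dataset RSICP using two complementary ways of constructing negative samples. Experiments show that RSICCLLM, with only 7B parameters, outperforms comparison models of much larger scale, supporting the efficiency and effectiveness of the approach. The code and data will be open-sourced.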
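To make the idea of dual-negative preference optimization more concrete, here is a minimal sketch in Python. The source does not give the DNPO objective, so everything below is an assumption: it applies the standard DPO pairwise loss to one preferred caption against each of two negative captions and averages the two terms. The function names (`dpo_pair_loss`, `dual_negative_loss`), the `beta` value, and the log-probability inputs are hypothetical and are not taken from the RSICCLLM paper or code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair_loss(pol_pos: float, ref_pos: float,
                  pol_neg: float, ref_neg: float,
                  beta: float = 0.1) -> float:
    """Standard DPO loss for one (preferred, rejected) pair.

    Inputs are sequence log-probabilities under the policy being trained
    (pol_*) and under a frozen reference model (ref_*).
    """
    margin = (pol_pos - ref_pos) - (pol_neg - ref_neg)
    return -log_sigmoid(beta * margin)


def dual_negative_loss(pol_pos: float, ref_pos: float,
                       negatives: list[tuple[float, float]],
                       beta: float = 0.1) -> float:
    """ASSUMED combination rule: average the pairwise DPO loss over the
    negatives. `negatives` holds (policy_logp, reference_logp) per negative,
    e.g. one entry for each of the two negative construction strategies.
    """
    losses = [dpo_pair_loss(pol_pos, ref_pos, pn, rn, beta)
              for pn, rn in negatives]
    return sum(losses) / len(losses)


# Example: the policy already prefers the positive caption over both negatives.
loss = dual_negative_loss(-4.0, -5.0, [(-8.0, -7.0), (-10.0, -9.0)])
```

When the policy and the reference agree on every caption, each margin is zero and the loss equals log 2; it falls below that value as the policy assigns relatively more probability to the preferred caption than to either negative. Whether the actual method averages, sums, or weights the two negative terms differently is not stated in the source.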

Sources

arXiv