What is Google's Gemini Omni?

A new multimodal AI model with a unified architecture that Google says can convert any input type directly into any output type, replacing separate specialized models.

It could sharply lower the cost of multimedia creation, but it also raises the risk of deepfake misuse, which may strain current moderation tools.

What should we watch next?

How Google balances openness with safety, and whether the industry establishes universal deepfake watermarking and detection protocols.

Hands-on with Google's new Gemini Omni: an AI model that turns anything into anything is wild

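The Verge tested Google's newly released Gemini Omni multimodal model. The model is billed as supporting free conversion from "any input to any output," without relying on predefined modality paths as conventional multimodal models do. In the test, the author turned a photo of a child's stuffed toy into a creative "vacation deer" video clip, with results comparable to the deepfake demos in Google's earlier ads. Gemini Omni's breakthrough lies in its unified architecture: rather than training separate conversion modules for images, text, audio, and video, a single general-purpose model handles free cross-modal combinations. This means future AI creation will be more flexible, with one model able to handle tasks such as image-to-text and text-to-image conversion, speech-to-video, and text-to-animation. The article also notes, however, that this capability brings a risk of misuse, and that the ethical problems of deepfakes in particular still call for society-level regulation.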

Sources

The Verge AI