Posts for: #VLM

Paper introduction: "Hidden in plain sight: VLMs overlook their visual representations"

2025-07-28

#DeepLearning #NLP #LLM #LargeLanguageModels #VLM #CLIP #Image


This post introduces the paper "Hidden in plain sight: VLMs overlook their visual representations".

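The paper investigates how open-source Visual Language Models perform on tasks that center on images rather than on text generation. It shows that multimodal models built by integrating DINO or CLIP into an LLM perform far worse than the standalone ViT-based models.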