Paper Presentation for CVPR 2026

Created using ChatSlide

This presentation explores adapting Vision-Language Models (VLMs) such as CLIP to Open Vocabulary Dense Prediction (OVDP) through the DenseRC framework. Key insights highlight the importance of value embeddings and the challenges of spatial aggregation, with a focus on head-wise reweighting. We then detail the DenseRC methodology, which balances semantic alignment and coherence using Head-Selective Gating. Our experimental results show state-of-the-art performance on zero-shot...
