Quantifying Semantic Entropy in Multimodal Datasets
Our research aims to measure semantic entropy and quantify multimodal ambiguity across a range of datasets. For captioning, we explore the audio-text datasets AudioCaps and Clotho alongside the video-text datasets MSVD and MSR-VTT, which together offer diverse semantic content. For accented speech analysis, VCTK, L2-ARCTIC, and CommonVoice provide data on regional and non-native accents. For the video modality, LRS3, GRID, and AVSpeech present distinct challenges and opportunities for lip reading and...