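Video Question Answering
Video Question Answering (VQA) is a task that combines computer vision and natural language processing: a system analyzes the content of a video in order to accurately answer questions a user asks about it. The goal is a deeply integrated understanding of the visual and linguistic information in the video, which in turn supports accurate and efficient information retrieval and interaction. VQA has important applications in intelligent video assistants, educational platforms, and entertainment systems.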
NExT-QA
LLaMA-VQA (33B)
ActivityNet-QA
FrozenBiLM
MVBench
ST-LLM
TVBench
Tarsier-34B
STAR Benchmark
VLAP (4 frames)
MSRVTT-QA
FrozenBiLM
AGQA 2.0 balanced
GF (sup) - Faster RCNN
How2QA
Text + Text (no Multimodal Pretext Training)
iVQA
FrozenBiLM
MSRVTT-MC
Singularity-temporal
IntentQA
VideoChat2_mistral
Perception Test
InternVideo2 (8B)
SUTD-TrafficQA
TVQA
LLaMA-VQA
WildQA
LSMDC-MC
VIOLETv2
NExT-QA (Efficient)
ViLA (3B, 4 frames)
RoadTextVQA
GIT
DramaQA
Howto100M-QA
TimeSformer
LSMDC-FiB
Clover
MSR-VTT
MSR-VTT-MC
ATP (1<-16)
MSVD-QA
TGIF-QA
VideoQA
Just Ask (fine-tune)
VLEP