Zero-Shot Video Question Answering
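The Zero-Shot Video Question Answering task asks a large language model to answer questions about video content accurately without any task-specific training. It is a computer vision task that depends on the model's cross-modal understanding: the model must interpret and respond to videos it has never seen before. The task is especially relevant to applications such as intelligent dialogue systems, video content retrieval, and automatic question answering.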
Benchmark | Model
MSRVTT-QA | MovieChat
ActivityNet-QA | MovieChat
MSVD-QA | BT-Adapter (zero-shot)
EgoSchema (fullset) | BIMBA-LLaVA-Qwen2-7B
NExT-QA | Tarsier (34B)
TGIF-QA | IG-VLM
EgoSchema (subset) | Tarsier (34B)
IntentQA | IG-VLM
Video-MME | Gemini 1.5 Pro
TVQA | FrozenBiLM (with speech)
Video-MME (w/o subs) | Video-RAG (based on LLaVA-Video)
NExT-GQA
STAR Benchmark | VideoChat2
Zero-shot Video Question Answering on LongVideoBench | Gemini 1.5 Pro
MVBench | TS-LLaVA-34B
CinePile: A Long Video Question Answering Dataset and Benchmark