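Visual Question Answering (VQA)
Visual Question Answering (VQA) is a computer vision task in which a system answers natural-language questions about an image. The goal is for the machine to understand the image content and give an accurate, coherent answer in natural language. VQA has practical value in human-computer interaction, intelligent assistance, and content understanding, and it serves as a test of a machine's visual understanding.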
GQA Test2019
VQA v2 test-dev
Oscar
VQA v2 test-std
BEiT-3
OK-VQA
MetaLM
MSVD-QA
HCRN
MSRVTT-QA
HCRN
DocVQA test
Human
InfographicVQA
Gemini Ultra (pixel only)
GQA test-dev
CFR
VizWiz 2020 VQA
A-OKVQA
CLEVR
NS-VQA (1K programs)
COCO Visual Question Answering (VQA) real images 1.0 open ended
InfiMM-Eval
GPT-4V
IconQA
Patch-TRM
TextVQA test-standard
PaLI
VCR (Q-A) test
VQA v2 val
BLIP-2 ViT-G FlanT5 XXL (zero-shot)
COCO Visual Question Answering (VQA) real images 1.0 multiple choice
MCB 7 att.
VizWiz 2018
LXR955, No Ensemble
VQA-CP
CSS
VQA-CE
RandImg
VLM2-Bench
VCR (QA-R) test
UNITER (Large)
GQA test-std
ProTo
VCR (Q-AR) test
GPT4RoI
VQA v1 test-dev
SAAA (ResNet)
IllusionVQA
InfoSeek
VizWiz 2020 Answerability
VQA v1 test-std
SAAA (ResNet)
WHOOPS!
CLEVR-Humans
MDETR
QLEVR
MAC
AutoHallusion
GPT-4V
COCO Visual Question Answering (VQA) real images 2.0 open ended
HDU-USYD-UNCC
COCO Visual Question Answering (VQA) abstract images 1.0 open ended
COCO Visual Question Answering (VQA) abstract 1.0 multiple choice
PlotQA-D1
PlotQA-D2
Visual7W
CMN
HallusionBench
GPT-4V
AI2D
PMC-VQA
F-VQA
ZS-F-VQA
FigureQA - test 1
PReFIL
VCR (Q-A) dev
VL-BERTLARGE
VCR (Q-AR) dev
VL-BERTLARGE
VCR (QA-R) dev
VL-BERTLARGE
DocVQA val
BERT LARGE Baseline
GQA
PEVL+
GRIT
TDIUC
Accuracy
TGIF-QA
VQA-X
Visual Genome (subjects)
Visual Genome (pairs)
CMN
VizWiz 2018 Answerability
ZS-F-VQA
SAN † - hard mask
ActivityNet
BLIP-2 T5
ArtQuest
PrefixLM with CLIP and T5
COCO
CORE-MM
DeepForm
DocVQA
DVQA test-familiar
PReFIL (Oracle OCR)
EgoSchema
Lyra-Pro
ImageNet
MM-Vet
MME
MVBench
OVAD benchmark
RetVQA
MI-BART
TextVQA
Video MME
WebSRC