Visual Question Answering (VQA)
Visual Question Answering (VQA) is a computer vision task in which a system answers natural-language questions about an image. The goal is for the machine to understand the image content and return an accurate, coherent answer in natural language. VQA is applied in human-computer interaction, intelligent assistance, and content understanding, where it extends what machines can infer from visual input.
GQA Test2019
VQA v2 test-dev
Oscar
VQA v2 test-std
BEiT-3
OK-VQA
MetaLM
MSVD-QA
HCRN
MSRVTT-QA
HCRN
DocVQA test
Human
InfographicVQA
Gemini Ultra (pixel only)
GQA test-dev
CFR
VizWiz 2020 VQA
A-OKVQA
CLEVR
NS-VQA (1K programs)
COCO Visual Question Answering (VQA) real images 1.0 open ended
InfiMM-Eval
GPT-4V
IconQA
Patch-TRM
TextVQA test-standard
PaLI
VCR (Q-A) test
VQA v2 val
BLIP-2 ViT-G FlanT5 XXL (zero-shot)
COCO Visual Question Answering (VQA) real images 1.0 multiple choice
MCB 7 att.
VizWiz 2018
LXR955, No Ensemble
VQA-CP
CSS
VQA-CE
RandImg
VLM2-Bench
VCR (QA-R) test
UNITER (Large)
GQA test-std
ProTo
VCR (Q-AR) test
GPT4RoI
VQA v1 test-dev
SAAA (ResNet)
IllusionVQA
InfoSeek
VizWiz 2020 Answerability
VQA v1 test-std
SAAA (ResNet)
WHOOPS!
CLEVR-Humans
MDETR
QLEVR
MAC
AutoHallusion
GPT-4V
COCO Visual Question Answering (VQA) real images 2.0 open ended
HDU-USYD-UNCC
COCO Visual Question Answering (VQA) abstract images 1.0 open ended
COCO Visual Question Answering (VQA) abstract 1.0 multiple choice
PlotQA-D1
PlotQA-D2
Visual7W
CMN
HallusionBench
GPT-4V
AI2D
PMC-VQA
F-VQA
ZS-F-VQA
FigureQA - test 1
PReFIL
VCR (Q-A) dev
VL-BERT LARGE
VCR (Q-AR) dev
VL-BERT LARGE
VCR (QA-R) dev
VL-BERT LARGE
DocVQA val
BERT LARGE Baseline
GQA
PEVL+
GRIT
TDIUC
Accuracy
TGIF-QA
VQA-X
Visual Genome (subjects)
Visual Genome (pairs)
CMN
VizWiz 2018 Answerability
ZS-F-VQA
SAN † - hard mask
ActivityNet
BLIP-2 T5
ArtQuest
PrefixLM with CLIP and T5
COCO
CORE-MM
DeepForm
DocVQA
DVQA test-familiar
PReFIL (Oracle OCR)
EgoSchema
Lyra-Pro
ImageNet
MM-Vet
MME
MVBench
OVAD benchmark
RetVQA
MI-BART
TextVQA
Video MME
WebSRC