Visual Question Answering on CLEVR

Metrics

Accuracy

Results

Performance results of various models on this benchmark

Model Name	Accuracy (%)	Paper Title	Repository
NS-VQA (1K programs)	99.8	Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
MDETR	99.7	MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
CNN + LSTM + RN	95.5	A simple neural network module for relational reasoning
TbD + reg + hres	99.1	Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
OCCAM (ours)	99.4	Interpretable Visual Reasoning via Induced Symbolic Space
CNN + LSTM + RN + HAN	98.8	Learning Visual Question Answering by Bootstrapping Hard Attention	-
MAC	98.9	Compositional Attention Networks for Machine Reasoning
IEP-700K	96.9	Inferring and Executing Programs for Visual Reasoning
NeSyCoCo	99.7	NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization
single-hop + LCGN (ours)	97.9	Language-Conditioned Graph Networks for Relational Reasoning
NS-CL	98.9	The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
QGHC+Att+Concat	65.9	Question-Guided Hybrid Convolution for Visual Question Answering	-
XNM-Det supervised	97.7	Explainable and Explicit Visual Reasoning over Scene Graphs
DDRprog*	98.3	DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer	-
CNN+GRU+FiLM	97.7	FiLM: Visual Reasoning with a General Conditioning Layer
