Referring Expression Segmentation
Referring Expression Segmentation produces a pixel-level mask for the specific object instance in an image or video that a natural-language expression refers to. The referring expression (RE) must uniquely identify the target object in the scene or dialogue, so that the resulting annotation is unambiguous. The task has applications in human-computer interaction, image editing, and content understanding.
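The uniqueness requirement and the pixel-level output can be illustrated with a toy sketch. Everything here is hypothetical: `SceneObject`, `resolve`, and the attribute-set stand-in for a language expression are illustrative names, not part of any listed model or benchmark. The `iou` helper assumes intersection-over-union as the mask-overlap score, which is a common choice for this task but is not stated by the page itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical annotated object: descriptive attributes plus a binary mask."""
    attributes: frozenset
    mask: tuple  # rows of 0/1 pixels


def resolve(expression, objects):
    """Return the mask of the single object matching every attribute in the
    expression, or None when the expression matches no object or several."""
    matches = [obj for obj in objects if expression <= obj.attributes]
    return matches[0].mask if len(matches) == 1 else None


def iou(pred, target):
    """Intersection-over-union of two binary masks with the same shape."""
    pairs = [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    inter = sum(p & t for p, t in pairs)
    union = sum(p | t for p, t in pairs)
    return inter / union if union else 1.0


# Toy 2x4 scene with two dogs (assumed data, for illustration only).
scene = [
    SceneObject(frozenset({"dog", "brown", "left"}), ((1, 1, 0, 0), (1, 1, 0, 0))),
    SceneObject(frozenset({"dog", "white", "right"}), ((0, 0, 1, 1), (0, 0, 1, 1))),
]

ambiguous = resolve({"dog"}, scene)          # matches both dogs, so no mask
unique = resolve({"dog", "brown"}, scene)    # matches exactly one object
```

In this sketch, "dog" alone is rejected because it fits both objects, while "brown dog" resolves to a single mask. A real system replaces the attribute lookup with a vision-language model that predicts the mask directly from the pixels and the text.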
RefCOCO val
CRIS
Refer-YouTube-VOS (2021 public validation)
GLEE-Pro
RefCOCO+ val
HyperSeg
RefCOCO+ testA
LAVT
RefCOCO+ testB
A2D Sentences
ACGA
J-HMDB
SgMg (Video-Swin-B)
RefCOCOg-val
MLCD-Seg-7B
DAVIS 2017 (val)
RefVOS
RefCOCOg-test
PolyFormer-L
RefCOCO testA
RefCOCO testB
EVP
PhraseCut
MDETR ENB3
RefCOCO
DETRIS
ReferIt
PolyFormer-L
Refer-YouTube-VOS
RefVOS-Human REs
Referring Expressions for DAVIS 2016 & 2017
MUTR
A2Dre test
RefVOS
CLEVR-Ref+
IEP-Ref (700K prog.)
G-Ref val
G-Ref test B