Video Object Detection on ImageNet VID

Metrics

mAP (mean average precision)

Results

Reported mAP of models evaluated on this benchmark

Model Name	mAP	Paper Title	Repository
YOLOV	87.5	YOLOV: Making Still Image Object Detectors Great at Video Object Detection
SELSA (ResNet-101)	82.69	Sequence Level Semantics Aggregation for Video Object Detection
REPP + SELSA (ResNet-101)	84.2	Robust and Efficient Post-Processing for Video Object Detection (REPP)
BoxMask (ResNet-50)	80.7	BoxMask: Revisiting Bounding Box Supervision for Video Object Detection	-
SELSA (ResNeXt-101)	84.3	Sequence Level Semantics Aggregation for Video Object Detection
YOLOV++	93.2	Practical Video Object Detection via Feature Selection and Aggregation
Ours (Faster RCNN + R101)	87.2	Objects do not disappear: Video object detection by single-frame object location anticipation
Ours (Def. DETR + SwinB)	91.3	Objects do not disappear: Video object detection by single-frame object location anticipation
DiffusionVID (ResNet-101)	87.1	DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection
SparseVOD (ResNet-50)	80.3	Spatio-Temporal Learnable Proposals for End-to-End Video Object Detection	-
Online TSM	76.3	TSM: Temporal Shift Module for Efficient Video Understanding
DiffusionVID (Swin-B)	92.5	DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection
LSTS (ResNet-101)	81.7	Learning Where to Focus for Efficient Video Object Detection
REPP + YOLOv3	75.1	Robust and Efficient Post-Processing for Video Object Detection (REPP)
Tracklet-Conditioned Detection+DCNv2+FGFA	83.5	Integrated Object Detection and Tracking with Tracklet-Conditioned Detection	-
TransVOD (Swin Base)	90.1	TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers
ClipVID	85.8	Identity-Consistent Aggregation for Video Object Detection
MEGA (ResNeXt101)	85.4	Memory Enhanced Global-Local Aggregation for Video Object Detection
PTSEFormer (ResNet-101)	88.1	PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection
Looking Fast and Slow	63.9	Looking Fast and Slow: Memory-Guided Mobile Video Object Detection
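
The rows above are not ordered by score. As a minimal sketch, assuming the table is available as tab-separated text, the entries could be ranked by mAP as follows. The `rank_by_map` helper and the `SAMPLE` string are illustrative names introduced here, and the sample holds only four rows copied from the table.

```python
import csv
import io

# Four rows copied from the table above, tab-separated (illustrative sample only).
SAMPLE = (
    "Model Name\tmAP\tPaper Title\n"
    "YOLOV\t87.5\tYOLOV: Making Still Image Object Detectors Great at Video Object Detection\n"
    "YOLOV++\t93.2\tPractical Video Object Detection via Feature Selection and Aggregation\n"
    "Online TSM\t76.3\tTSM: Temporal Shift Module for Efficient Video Understanding\n"
    "TransVOD (Swin Base)\t90.1\tTransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers\n"
)


def rank_by_map(tsv_text):
    """Return (model, mAP) pairs sorted from highest to lowest mAP."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Convert the score column to float so sorting is numeric, not lexical.
    scored = [(row["Model Name"], float(row["mAP"])) for row in reader]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for model, score in rank_by_map(SAMPLE):
        print(f"{score:5.1f}  {model}")
```

The scores compared this way come from different backbones (for example ResNet-101 versus Swin-B), so a plain ranking does not isolate the contribution of the detection method itself.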
