HyperAI超神经

Text-to-Video Generation on MSR-VTT

Evaluation Metrics

CLIPSIM

FID

FVD

Evaluation Results

Performance of each model on this benchmark

Model	CLIPSIM	FID	FVD	Paper Title	Repository
ModelScopeT2V	0.2930	11.09	550	ModelScope Text-to-Video Technical Report
Video LDM	0.2929	-	-	Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
TF-T2V	0.2991	8.19	441	A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
NUWA	0.2439	47.68	-	NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
PixelDance	0.3125	-	381	Make Pixels Dance: High-Dynamic Video Generation	-
Make-A-Video	0.3049	13.17	-	Make-A-Video: Text-to-Video Generation without Text-Video Data
GODIVA	0.2402	-	-	GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
MMVG	0.2644	23.4	-	Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
CogVideo (English)	0.2631	23.59	-	Make-A-Video: Text-to-Video Generation without Text-Video Data
Snap Video (512×288)	0.2793	-	104.0	Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis	-
VideoPoet	0.3123	-	213	VideoPoet: A Large Language Model for Zero-Shot Video Generation
MagicVideo	-	36.5	998	MagicVideo: Efficient Video Generation With Latent Diffusion Models	-
HiGen	0.2947	8.60	406	Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
Video-LaVIT	0.3012	11.27	188.36	Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
CogVideo (Chinese)	0.2614	-	-	Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
VideoComposer	0.2932	-	580	VideoComposer: Compositional Video Synthesis with Motion Controllability
Show-1	0.3072	13.08	538	Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Snap Video (288×288)	0.2793	-	110.4	Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis	-
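The table lists models in no particular order, and the three metrics point in different directions, so a ranking has to be computed per metric. The sketch below copies a few rows from the table and sorts them, assuming the usual convention that a higher CLIPSIM is better while lower FID and FVD are better. The names `ROWS`, `HIGHER_IS_BETTER`, `COLUMN`, and `rank_by` are illustrative and not part of the page; missing scores ("-") are represented as `None` and skipped.

```python
# (model, CLIPSIM, FID, FVD) copied from a subset of the table rows.
# None stands for a "-" entry (score not reported).
ROWS = [
    ("ModelScopeT2V", 0.2930, 11.09, 550.0),
    ("TF-T2V", 0.2991, 8.19, 441.0),
    ("PixelDance", 0.3125, None, 381.0),
    ("Make-A-Video", 0.3049, 13.17, None),
    ("Snap Video (512×288)", 0.2793, None, 104.0),
    ("VideoPoet", 0.3123, None, 213.0),
    ("HiGen", 0.2947, 8.60, 406.0),
    ("Video-LaVIT", 0.3012, 11.27, 188.36),
]

# Assumption: CLIPSIM is higher-is-better; FID and FVD are lower-is-better.
HIGHER_IS_BETTER = {"CLIPSIM": True, "FID": False, "FVD": False}

# Position of each metric inside a row tuple.
COLUMN = {"CLIPSIM": 1, "FID": 2, "FVD": 3}


def rank_by(metric: str, rows=ROWS) -> list[tuple[str, float]]:
    """Return (model, score) pairs sorted best-first for the given metric.

    Rows that do not report the metric are left out of the ranking.
    """
    col = COLUMN[metric]
    scored = [(row[0], row[col]) for row in rows if row[col] is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=HIGHER_IS_BETTER[metric])


if __name__ == "__main__":
    for metric in ("CLIPSIM", "FID", "FVD"):
        print(metric)
        for model, score in rank_by(metric):
            print(f"  {model}: {score}")
```

Because many rows omit one or more metrics, each ranking covers a different subset of models, and positions are not comparable across metrics.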
