Question Answering on NExT-QA (Open-Ended)
Metrics
Accuracy
Confidence Score
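The sketch below shows one way the two metrics could be aggregated over a test set. It assumes a GPT-assisted judging protocol like the one described in the Video-ChatGPT paper: a judge labels each predicted answer as correct or incorrect and assigns it an integer score from 0 to 5. Under that assumption, Accuracy is the percentage of answers judged correct and Confidence Score is the mean judge score. The `Judgment` type and `aggregate` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Judgment:
    """Hypothetical per-question verdict from an LLM judge (assumed protocol)."""
    correct: bool  # judge's yes/no verdict on the predicted answer
    score: int     # judge's score, assumed to lie in the range 0..5


def aggregate(judgments: Iterable[Judgment]) -> tuple[float, float]:
    """Return (accuracy in percent, mean confidence score), each rounded to 1 decimal."""
    items = list(judgments)
    if not items:
        raise ValueError("no judgments to aggregate")
    for j in items:
        if not 0 <= j.score <= 5:
            raise ValueError(f"score out of range: {j.score}")
    # Accuracy: share of answers the judge marked correct, as a percentage.
    accuracy = 100.0 * sum(j.correct for j in items) / len(items)
    # Confidence Score: plain average of the judge's per-answer scores.
    mean_score = sum(j.score for j in items) / len(items)
    return round(accuracy, 1), round(mean_score, 1)


# Toy example with four made-up judgments (not benchmark data).
example = [Judgment(True, 4), Judgment(False, 2), Judgment(True, 5), Judgment(True, 3)]
print(aggregate(example))  # (75.0, 3.5)
```

Because the two numbers come from the same judge but measure different things, a model can gain accuracy without a matching rise in its mean score, which is why the table reports both.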
Results
Performance of various models on this benchmark
Model Name | Accuracy (%) | Confidence Score | Paper Title |
---|---|---|---|
MovieChat | 49.9 | 2.7 | MovieChat: From Dense Token to Sparse Memory for Long Video Understanding |
Video-ChatGPT | 54.6 | 3.2 | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models |
VideoChat | 56.6 | 3.2 | VideoChat: Chat-Centric Video Understanding |
Vista-LLaMA | 60.7 | 3.4 | Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens |
Flash-VStream | 61.6 | 3.4 | Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams |
MovieChat+ | 54.8 | 3.0 | MovieChat+: Question-aware Sparse Memory for Long Video Question Answering |
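The rows are not listed in order of accuracy. This short sketch ranks them using the values copied from the results table, sorting by accuracy and breaking any ties by confidence score. The variable names are illustrative only.

```python
# (model, accuracy %, confidence score) copied from the results table
results = [
    ("MovieChat", 49.9, 2.7),
    ("Video-ChatGPT", 54.6, 3.2),
    ("VideoChat", 56.6, 3.2),
    ("Vista-LLaMA", 60.7, 3.4),
    ("Flash-VStream", 61.6, 3.4),
    ("MovieChat+", 54.8, 3.0),
]

# Highest accuracy first; confidence score decides ties.
ranked = sorted(results, key=lambda r: (r[1], r[2]), reverse=True)

for rank, (model, acc, conf) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {acc:.1f}% accuracy, score {conf:.1f}")
```

Ranked this way, Flash-VStream (61.6) leads and MovieChat (49.9) trails; MovieChat+ sits just above Video-ChatGPT (54.8 vs. 54.6).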