Question Answering on KILT: ELI5
Evaluation metrics
F1
Rouge-L
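As a rough illustration of what these two metrics measure, the sketch below computes a token-overlap F1 and an LCS-based Rouge-L F-score for a single prediction/reference pair. It is a minimal sketch under stated assumptions, not the benchmark's official scorer: the function names (`token_f1`, `lcs_length`, `rouge_l_f1`) are hypothetical, tokenization is plain lowercasing plus whitespace splitting, and any further answer normalization (punctuation or article stripping, stemming, taking the maximum over multiple references) is omitted.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Bag-of-tokens F1; word order is ignored (assumed tokenization: lowercase + split)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count_pred, count_ref) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence, using a rolling dynamic-programming row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """Rouge-L F-score: like token F1, but overlap is the in-order LCS length."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Both functions return values in the range 0 to 1. The key difference is order sensitivity: for the prediction "b a" against the reference "a b", `token_f1` gives 1.0 while `rouge_l_f1` gives 0.5, because only one token can be matched in sequence.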
Results
Performance of each model on this benchmark
Comparison table
Model name | F1 | Rouge-L |
---|---|---|
kilt-a-benchmark-for-knowledge-intensive | 17.88 | 17.41 |
kilt-a-benchmark-for-knowledge-intensive | 16.1 | 19.08 |
an-efficient-memory-augmented-transformer-for | 19.03 | 20.91 |
read-before-generate-faithful-long-form | 24.53 | 27.13 |
knowledge-infused-decoding-1 | - | 26.3 |
hurdles-to-progress-in-long-form-question | 23.1 | 23.4 |
kilt-a-benchmark-for-knowledge-intensive | 14.51 | 14.05 |