Question Answering on BoolQ
Evaluation Metric
Accuracy
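Accuracy on BoolQ is the proportion of yes/no questions answered correctly. The snippet below is a minimal illustrative sketch (not taken from any of the listed papers), assuming predictions and gold answers are available as boolean lists:

```python
def boolq_accuracy(predictions, gold_labels):
    """Fraction of yes/no predictions that match the gold answers."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)

# Example (hypothetical data): 3 of 4 predictions correct -> 75.0% accuracy
preds = [True, False, True, True]
gold  = [True, False, False, True]
print(f"Accuracy: {boolq_accuracy(preds, gold) * 100:.1f}%")
```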
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Accuracy (%) |
---|---|
hierarchical-prompting-taxonomy-a-universal | 99.87 |
finetuned-language-models-are-zero-shot | 84.6 |
llama-2-open-foundation-and-fine-tuned-chat | 81.7 |
hungry-hungry-hippos-towards-language | 59.6 |
ask-me-anything-a-simple-strategy-for | 64.9 |
unifying-language-learning-paradigms | 63.1 |
hungry-hungry-hippos-towards-language | 60.6 |
bloomberggpt-a-large-language-model-for | 74.6 |
hierarchical-prompting-taxonomy-a-universal | 99.419 |
boolq-exploring-the-surprising-difficulty-of | 72.87 |
llama-2-open-foundation-and-fine-tuned-chat | 83.7 |
hungry-hungry-hippos-towards-language | 61.7 |
hyena-hierarchy-towards-larger-convolutional | 51.8 |
mixlora-enhancing-large-language-models-fine | 72.7 |
scaling-language-models-methods-analysis-1 | 79.3 |
palm-2-technical-report-1 | 88.1 |
toward-efficient-language-model-pretraining | 90.5 |
opt-iml-scaling-language-model-instruction | 71.4 |
mixlora-enhancing-large-language-models-fine | 75 |
opt-iml-scaling-language-model-instruction | 61.5 |
exploring-the-limits-of-transfer-learning | 76.4 |
llama-2-open-foundation-and-fine-tuned-chat | 77.4 |
toward-efficient-language-model-pretraining | 92 |
bloomberggpt-a-large-language-model-for | 46.4 |
boolq-exploring-the-surprising-difficulty-of | 75.57 |
exploring-the-limits-of-transfer-learning | 81.4 |
llama-open-and-efficient-foundation-language-1 | 76.5 |
llama-open-and-efficient-foundation-language-1 | 85.3 |
alexatm-20b-few-shot-learning-using-a-large | 69.4 |
exploring-the-limits-of-transfer-learning | 91.2 |
mixture-of-subspaces-in-low-rank-adaptation | 74.6 |
muppet-massive-multi-task-representations | 83.8 |
muppet-massive-multi-task-representations | 87.5 |
unifying-language-learning-paradigms | 90.8 |
opt-iml-scaling-language-model-instruction | 60.1 |
llama-2-open-foundation-and-fine-tuned-chat | 85 |
llama-open-and-efficient-foundation-language-1 | 83.1 |
finetuned-language-models-are-zero-shot | 82.9 |
n-grammer-augmenting-transformers-with-latent-1 | 65 |
language-models-are-few-shot-learners | 76.4 |
boolq-exploring-the-surprising-difficulty-of | 71.41 |
ask-me-anything-a-simple-strategy-for | 66.5 |
finetuned-language-models-are-zero-shot | 86.3 |
palm-2-technical-report-1 | 90.9 |
designing-effective-sparse-expert-models | 88.6 |
shakti-a-2-5-billion-parameter-small-language | 61.1 |
hungry-hungry-hippos-towards-language | 56.1 |
hungry-hungry-hippos-towards-language | 56.1 |
opt-iml-scaling-language-model-instruction | 66.9 |
opt-iml-scaling-language-model-instruction | 60.5 |
palm-scaling-language-modeling-with-pathways-1 | 92.2 |
boolq-exploring-the-surprising-difficulty-of | 80.4 |
deberta-decoding-enhanced-bert-with | 90.4 |
designing-effective-sparse-expert-models | 92.4 |
boolq-exploring-the-surprising-difficulty-of | 62.17 |
palm-2-technical-report-1 | 88.6 |
entailment-as-few-shot-learner | 86.0 |
mixlora-enhancing-large-language-models-fine | 77.1 |
language-models-are-few-shot-learners | 60.5 |
bloomberggpt-a-large-language-model-for | 57.5 |
exploring-the-limits-of-transfer-learning | 85.4 |
ask-me-anything-a-simple-strategy-for | 67.2 |
bloomberggpt-a-large-language-model-for | 52.9 |
opt-iml-scaling-language-model-instruction | 64 |
llama-open-and-efficient-foundation-language-1 | 78.1 |
training-compute-optimal-large-language | 83.7 |