HyperAI超神经

Question Answering on TruthfulQA

Evaluation Metrics

EM (exact match)

Evaluation Results

Performance of each model on this benchmark

Comparison Table
Model Name                                      EM
chain-of-action-faithful-and-multimodal         67.3
scaling-language-models-methods-analysis-1      -
llama-open-and-efficient-foundation-language-1  -
truthfulqa-measuring-how-models-mimic-human     -
shakti-a-2-5-billion-parameter-small-language   -
representation-engineering-a-top-down           -
galactica-a-large-language-model-for-science-1  -
Model 8                                         -
galactica-a-large-language-model-for-science-1  -
galactica-a-large-language-model-for-science-1  -
scaling-language-models-methods-analysis-1      -
chain-of-action-faithful-and-multimodal         63.3
tree-of-thoughts-deliberate-problem-solving-1   66.6
scaling-language-models-methods-analysis-1      -
truthx-alleviating-hallucinations-by-editing    -
galactica-a-large-language-model-for-science-1  -
llama-open-and-efficient-foundation-language-1  -
truthfulqa-measuring-how-models-mimic-human     -
scaling-language-models-methods-analysis-1      -
galactica-a-large-language-model-for-science-1  -
Model 21                                        -
Model 22                                        -
truthfulqa-measuring-how-models-mimic-human     -
scaling-language-models-methods-analysis-1      -
llama-open-and-efficient-foundation-language-1  -
llama-open-and-efficient-foundation-language-1  -
gpt-4-technical-report-1                        -
truthx-alleviating-hallucinations-by-editing    -
truthfulqa-measuring-how-models-mimic-human     -
galactica-a-large-language-model-for-science-1  -
automatic-chain-of-thought-prompting-in-large   42.2
scaling-language-models-methods-analysis-1      -
representation-engineering-a-top-down           -