Question Answering on TruthfulQA

Evaluation Metrics

Evaluation Results

Performance of each model on this benchmark

Model Name	EM	Paper Title	Repository
CoA	67.3	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Gopher 280B (zero-shot, QA prompts)	-	Scaling Language Models: Methods, Analysis & Insights from Training Gopher
LLaMA 65B	-	LLaMA: Open and Efficient Foundation Language Models
GPT-2 1.5B	-	TruthfulQA: Measuring How Models Mimic Human Falsehoods
Shakti-LLM (2.5B)	-	SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments	-
LLaMA-2-Chat-13B + Representation Control (Contrast Vector)	-	Representation Engineering: A Top-Down Approach to AI Transparency
GAL 6.7B	-	Galactica: A Large Language Model for Science
Vicuna 7B + Inference Time Intervention (ITI)	-	-	-
GAL 30B	-	Galactica: A Large Language Model for Science
GAL 1.3B	-	Galactica: A Large Language Model for Science
Gopher 7.1B (zero-shot, QA prompts)	-	Scaling Language Models: Methods, Analysis & Insights from Training Gopher
CoA w/o actions	63.3	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
ToT	66.6	Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Gopher 7.1B (zero-shot, Our Prompt + Choices)	-	Scaling Language Models: Methods, Analysis & Insights from Training Gopher
LLaMa-2-7B-Chat + TruthX	-	TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
GAL 120B	-	Galactica: A Large Language Model for Science
LLaMA 7B	-	LLaMA: Open and Efficient Foundation Language Models
UnifiedQA 3B	-	TruthfulQA: Measuring How Models Mimic Human Falsehoods
Gopher 1.4B (zero-shot, QA prompts)	-	Scaling Language Models: Methods, Analysis & Insights from Training Gopher
GAL 125M	-	Galactica: A Large Language Model for Science
