Question Answering on StrategyQA

Metrics

Accuracy
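StrategyQA questions have yes/no answers. Accuracy is therefore the share of questions whose predicted answer matches the gold answer, reported as a percentage. The following is a minimal sketch that assumes predictions and gold labels are already normalized to Python booleans. The function name `strategyqa_accuracy` is hypothetical and is not part of any official evaluation script.

```python
from typing import Sequence


def strategyqa_accuracy(predictions: Sequence[bool], gold: Sequence[bool]) -> float:
    """Return accuracy as a percentage for yes/no (boolean) answers.

    Hypothetical helper: assumes every answer has already been mapped to True/False.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and gold yes/no answers.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    preds = [True, False, True, True]
    labels = [True, False, False, True]
    print(strategyqa_accuracy(preds, labels))  # 75.0
```

Free-text model outputs would first need to be mapped to True/False. That mapping step is not shown here and depends on each paper's setup.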

Results

Accuracy of various models on the StrategyQA benchmark. A dash (-) means no value is listed for that entry.

Model Name	Accuracy (%)	Paper Title	Repository
Rethinking with retrieval (GPT-3)	77.73	Rethinking with Retrieval: Faithful Large Language Model Inference
SearchChain	-	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
PaLM 540B	76.4	Transcending Scaling Laws with 0.1% Extra Compute	-
Least-to-Most	-	Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Self-Evaluation Guided Decoding (Codex, CoT, single reasoning chain, 6-shot gen, 4-shot eval)	77.2	-	-
SearchChain	-	Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks
CoA w/o actions	-	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
CoA	-	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Least-to-Most	-	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
U-PaLM 540B	76.6	Transcending Scaling Laws with 0.1% Extra Compute	-
Minerva 540B	61.9	Transcending Scaling Laws with 0.1% Extra Compute	-
PaLM 2 (few-shot, CoT, SC)	90.4	PaLM 2 Technical Report
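The rows above are not ordered by score, and several entries list no accuracy. A short sketch can therefore rank only the rows that report a number. The values are copied from the table. The `rank_entries` helper is hypothetical, and it treats a dash as "no value reported" rather than as zero.

```python
from typing import List, Optional, Tuple

# (model name, accuracy in %) copied from the table; None marks a "-" entry.
ROWS: List[Tuple[str, Optional[float]]] = [
    ("Rethinking with retrieval (GPT-3)", 77.73),
    ("SearchChain", None),
    ("PaLM 540B", 76.4),
    ("Least-to-Most", None),
    ("Self-Evaluation Guided Decoding (Codex, CoT, single reasoning chain, 6-shot gen, 4-shot eval)", 77.2),
    ("SearchChain", None),
    ("CoA w/o actions", None),
    ("CoA", None),
    ("Least-to-Most", None),
    ("U-PaLM 540B", 76.6),
    ("Minerva 540B", 61.9),
    ("PaLM 2 (few-shot, CoT, SC)", 90.4),
]


def rank_entries(rows: List[Tuple[str, Optional[float]]]) -> List[Tuple[str, float]]:
    """Drop rows without a reported accuracy and sort the rest, best first."""
    reported = [(name, acc) for name, acc in rows if acc is not None]
    return sorted(reported, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for position, (name, acc) in enumerate(rank_entries(ROWS), start=1):
        print(f"{position}. {name}: {acc:.2f}")
```

The scores come from different papers and prompting setups (for example, few-shot, chain-of-thought, and self-consistency). The resulting order is therefore only a rough comparison.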