HyperAI超神经

Arithmetic Reasoning on GSM8K

Evaluation Metrics

Accuracy
Parameters (Billion)

Evaluation Results

Performance of each model on this benchmark

Comparison Table
Model name | Accuracy | Parameters (Billion)
lever-learning-to-verify-language-to-code | 84.5 | 175
composing-ensembles-of-pre-trained-models-via | 16.8 | 0.355
tora-a-tool-integrated-reasoning-agent-for | 72.6 | 7
openmathinstruct-1-a-1-8-million-math | 75.9 | 7
gemini-a-family-of-highly-capable-multimodal-1 | 86.5 | -
transcending-scaling-laws-with-0-1-extra | 58.5 | 540
mathcoder-seamless-code-integration-in-llms | 74.1 | 7
openmathinstruct-2-accelerating-ai-for-math | 94.9 | -
the-art-of-llm-refinement-ask-refine-and | 82.6 | -
parameter-efficient-sparsity-crafting-from | 78.3 | -
orca-math-unlocking-the-potential-of-slms-in | 86.8 | 7
large-language-models-can-self-improve | 74.4 | 540
llama-open-and-efficient-foundation-language-1 | 17.8 | 13
an-empirical-study-of-data-ability-boundary | 73.9 | 7
outcome-supervised-verifiers-for-planning-in | 84.7 | 7
solving-quantitative-reasoning-problems-with | 89 | 62
Model 17 | 95.2 | 75
on-the-advance-of-making-language-models | 83.2 | 175
llama-open-and-efficient-foundation-language-1 | 53.1 | 33
dart-math-difficulty-aware-rejection-tuning-1 | 82.5 | 8
model-card-and-evaluations-for-claude-models | 85.2 | -
sparks-of-artificial-general-intelligence | 87.1 | -
wizardmath-empowering-mathematical-reasoning | 63.9 | 13
large-language-models-can-self-improve | 32.2 | 540
wizardmath-empowering-mathematical-reasoning | 83.2 | 7
openmathinstruct-1-a-1-8-million-math | 86.8 | 13
palm-2-technical-report-1 | 80.7 | -
the-claude-3-model-family-opus-sonnet-haiku | 92.3 | -
large-language-models-are-zero-shot-reasoners | 51.5 | 175
orca-2-teaching-small-language-models-how-to | 59.14 | 13
scaling-relationship-on-learning-mathematical | 51.2 | 7
solving-math-word-problems-with-process-and | 76.5 | 70
palm-2-technical-report-1 | 91.0 | -
Model 34 | 80.2 | -
openmathinstruct-2-accelerating-ai-for-math | 96.0 | -
solving-math-word-problems-with-process-and | 87.1 | 70
Model 37 | 66.9 | -
mathcoder-seamless-code-integration-in-llms | 67.8 | 7
tora-a-tool-integrated-reasoning-agent-for | 84.3 | 70
an-empirical-study-of-data-ability-boundary | 80.5 | 7
large-language-models-are-zero-shot-reasoners | 17.9 | 540
dart-math-difficulty-aware-rejection-tuning-1 | 82.6 | 7
solving-quantitative-reasoning-problems-with | 56.5 | 540
large-language-models-are-zero-shot-reasoners | 55.0 | 175
scaling-relationship-on-learning-mathematical | 64.8 | 79
Model 46 | 96.4 | 7
metamath-bootstrap-your-own-mathematical | 82.3 | 70
large-language-models-are-zero-shot-reasoners | 40.7 | 175
hierarchical-prompting-taxonomy-a-universal | 97.72 | -
large-language-models-can-self-improve | 82.1 | 540
openmathinstruct-1-a-1-8-million-math | 90.8 | 70
Model 52 | 71.9 | -
least-to-most-prompting-enables-complex | 68.01 | 175
dart-math-difficulty-aware-rejection-tuning-1 | 90.4 | 70
teaching-inspired-integrated-prompting | 94.8 | -
solving-quantitative-reasoning-problems-with | 28.4 | 8
openchat-advancing-open-source-language | 77.3 | 7
large-language-models-are-zero-shot-reasoners | 10.4 | 175
model-card-and-evaluations-for-claude-models | 88 | -
composing-ensembles-of-pre-trained-models-via | 18.3 | 0.355
openmathinstruct-1-a-1-8-million-math | 80.7 | 34
Model 62 | 76.4 | 7
openmathinstruct-1-a-1-8-million-math | 84.6 | 70
openmathinstruct-1-a-1-8-million-math | 78.8 | 13
wizardmath-empowering-mathematical-reasoning | 81.6 | 70
orca-2-teaching-small-language-models-how-to | 47.23 | 7
solving-quantitative-reasoning-problems-with | 56.8 | 8
tora-a-tool-integrated-reasoning-agent-for | 80.7 | 34
dart-math-difficulty-aware-rejection-tuning-1 | 88.2 | 7
deepseekmath-pushing-the-limits-of | 88.2 | 7
an-empirical-study-of-data-ability-boundary | 80.4 | 34
openmathinstruct-1-a-1-8-million-math | 90.1 | 70
llama-open-and-efficient-foundation-language-1 | 11.0 | 7
the-claude-3-model-family-opus-sonnet-haiku | 88.9 | -
qwen2-technical-report | 96.7 | 72
large-language-models-are-zero-shot-reasoners | 58.1 | 540
solving-quantitative-reasoning-problems-with | 52.4 | 62
composing-ensembles-of-pre-trained-models-via | 12.2 | 0.355
large-language-models-can-self-improve | 56.5 | 540
openmathinstruct-2-accelerating-ai-for-math | 91.7 | -
the-unreasonable-effectiveness-of-eccentric | 43 | 13
solving-quantitative-reasoning-problems-with | 4.1 | 8
llama-open-and-efficient-foundation-language-1 | 35.6 | 33
openmathinstruct-2-accelerating-ai-for-math | 94.1 | -
Model 85 | 93.2 | 7
Model 86 | 89.0 | 13
gpt-4-technical-report-1 | 57.1 | -
llama-open-and-efficient-foundation-language-1 | 18.1 | 7
Model 89 | 94.13 | 7
frugal-lms-trained-to-invoke-symbolic-solvers | 35.2 | 13
openmathinstruct-1-a-1-8-million-math | 84.8 | 7
kwaiyiimath-technical-report | 73.3 | 13
dart-math-difficulty-aware-rejection-tuning-1 | 81.1 | 7
unifying-language-learning-paradigms | 4.4 | 20
openmathinstruct-1-a-1-8-million-math | 80.2 | 7
large-language-models-can-self-improve | 73.5 | 540
tora-a-tool-integrated-reasoning-agent-for | 75.8 | 13
solving-quantitative-reasoning-problems-with | 16.2 | 8
tinygsm-achieving-80-on-gsm8k-with-small | 81.5 | 2.6
large-language-models-are-zero-shot-reasoners | 41.3 | 175
llama-2-open-foundation-and-fine-tuned-chat | 56.8 | 70
the-unreasonable-effectiveness-of-eccentric | 41 | 7
learning-from-self-sampled-correct-and | 19.5 | 2.7
mathcoder-seamless-code-integration-in-llms | 72.6 | 13
math-shepherd-a-label-free-step-by-step | 89.1 | 7
scaling-relationship-on-learning-mathematical | 55.3 | 13
outcome-supervised-verifiers-for-planning-in | 73.7 | 7
llama-open-and-efficient-foundation-language-1 | 50.9 | 65
metamath-bootstrap-your-own-mathematical | 77.7 | 7
large-language-models-can-self-improve | 17.9 | 540
codet5-open-code-large-language-models-for | 73.8 | 0.77
query-and-response-augmentation-cannot-help | 69.8 | 7
solving-math-word-problems-with-process-and | 87.3 | 70
Model 114 | 95.1 | 7
metamath-bootstrap-your-own-mathematical | 66.4 | 7
openmathinstruct-1-a-1-8-million-math | 84.7 | 70
model-card-and-evaluations-for-claude-models | 80.9 | -
math-shepherd-a-label-free-step-by-step | 84.1 | 7
self-consistency-improves-chain-of-thought | 74.4 | 540
query-and-response-augmentation-cannot-help | 82.3 | 70
llemma-an-open-language-model-for-mathematics | 36.4 | 7
solving-quantitative-reasoning-problems-with | 78.5 | 540
llemma-an-open-language-model-for-mathematics | 51.5 | 34
metamath-bootstrap-your-own-mathematical | 71.0 | 13
mathcoder-seamless-code-integration-in-llms | 83.9 | 70
query-and-response-augmentation-cannot-help | 74 | 13
the-unreasonable-effectiveness-of-eccentric | 61 | 70
solving-quantitative-reasoning-problems-with | 33.0 | 62
the-claude-3-model-family-opus-sonnet-haiku | 95 | -
toward-self-improvement-of-llms-via | 92 | 70
mistral-7b | 52.2 | 7
llama-open-and-efficient-foundation-language-1 | 29.3 | 13
unifying-language-learning-paradigms | 4.1 | 20
learning-from-self-sampled-correct-and | 7.5 | 0.125
step-dpo-step-wise-preference-optimization | 94.0 | -
Model 136 | 74.7 | 7
openmathinstruct-1-a-1-8-million-math | 86.9 | 7
composing-ensembles-of-pre-trained-models-via | 20.8 | 0.355
breaking-the-ceiling-of-the-llm-community-by | 90.91 | -
tinygsm-achieving-80-on-gsm8k-with-small | 74.3 | 2.7
solving-quantitative-reasoning-problems-with | 68.5 | 62
tora-a-tool-integrated-reasoning-agent-for | 88.3 | 70
Model 143 | 72.3 | -
wizardmath-empowering-mathematical-reasoning | 54.9 | 7
llama-open-and-efficient-foundation-language-1 | 69.7 | 65
dart-math-difficulty-aware-rejection-tuning-1 | 81.1 | 8
mathcoder-seamless-code-integration-in-llms | 81.7 | 34
solving-math-word-problem-via-cooperative | 63.2 | 12
Model 149 | 85.5 | -
boosting-llm-reasoning-push-the-limits-of-few | 59.59 | 70
parameter-efficient-sparsity-crafting-from | 77.8 | -
branch-train-mix-mixing-expert-llms-into-a | 37.1 | -
openmathinstruct-1-a-1-8-million-math | 88.0 | 34
an-empirical-study-of-data-ability-boundary | 87.2 | 7
mathcoder-seamless-code-integration-in-llms | 64.2 | 7
tora-a-tool-integrated-reasoning-agent-for | 85.1 | 34
Model 157 | 87.4 | 14
dart-math-difficulty-aware-rejection-tuning-1 | 89.6 | 70
dart-math-difficulty-aware-rejection-tuning-1 | 86.8 | 7
outcome-supervised-verifiers-for-planning-in | 82.6 | 7