HyperAI超神经

BBH

Evaluation metrics

bbh

bbhbooleanexpressions

bbhcausaljudgement

bbhdateunderstanding

bbhdisambiguationqa

bbhdycklanguages

bbhformalfallacies

bbhgeometricshapes

bbhhyperbaton

bbhlogicaldeductionfiveobjects

bbhlogicaldeductionsevenobjects

bbhlogicaldeductionthreeobjects

bbhmovierecommendation

bbhmultisteparithmetictwo

bbhnavigate

bbhobjectcounting

bbhpenguinsinatable

bbhreasoningaboutcoloredobjects

bbhruinnames

bbhsalienttranslationerrordetection

bbhsnarks

bbhsportsunderstanding

bbhtemporalsequences

bbhtrackingshuffledobjectsfiveobjects

bbhtrackingshuffledobjectssevenobjects

bbhtrackingshuffledobjectsthreeobjects

bbhweboflies

bbhwordsorting

key

model

num

org

rank

time

Evaluation results

Performance of each model on this benchmark

Comparison table

Model name	bbh	bbhbooleanexpressions	bbhcausaljudgement	bbhdateunderstanding	bbhdisambiguationqa	bbhdycklanguages	bbhformalfallacies	bbhgeometricshapes	bbhhyperbaton	bbhlogicaldeductionfiveobjects	bbhlogicaldeductionsevenobjects	bbhlogicaldeductionthreeobjects	bbhmovierecommendation	bbhmultisteparithmetictwo	bbhnavigate	bbhobjectcounting	bbhpenguinsinatable	bbhreasoningaboutcoloredobjects	bbhruinnames	bbhsalienttranslationerrordetection	bbhsnarks	bbhsportsunderstanding	bbhtemporalsequences	bbhtrackingshuffledobjectsfiveobjects	bbhtrackingshuffledobjectssevenobjects	bbhtrackingshuffledobjectsthreeobjects	bbhweboflies	bbhwordsorting	key	model	num	org	rank	time
Model 1	86.7	96.4	72.2	90.0	85.6	63.2	81.2	49.6	99.2	83.6	58.8	98.4	87.2	87.6	98.8	99.6	97.3	97.6	89.2	69.6	90.4	95.2	100.0	100.0	100.0	100.0	100.0	50.8	1	GPT-4	N/A	OpenAI	1	2023/3/15
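
The page does not say how the aggregate bbh column is derived from the per-task columns. A minimal sketch, assuming it is the unweighted mean of the 27 per-task scores, can be checked against the GPT-4 row above. The helper name macro_average and the dictionary layout are illustrative only and are not part of the site's tooling.

```python
# Per-task scores for the GPT-4 row, copied from the comparison table.
subtask_scores = {
    "bbhbooleanexpressions": 96.4,
    "bbhcausaljudgement": 72.2,
    "bbhdateunderstanding": 90.0,
    "bbhdisambiguationqa": 85.6,
    "bbhdycklanguages": 63.2,
    "bbhformalfallacies": 81.2,
    "bbhgeometricshapes": 49.6,
    "bbhhyperbaton": 99.2,
    "bbhlogicaldeductionfiveobjects": 83.6,
    "bbhlogicaldeductionsevenobjects": 58.8,
    "bbhlogicaldeductionthreeobjects": 98.4,
    "bbhmovierecommendation": 87.2,
    "bbhmultisteparithmetictwo": 87.6,
    "bbhnavigate": 98.8,
    "bbhobjectcounting": 99.6,
    "bbhpenguinsinatable": 97.3,
    "bbhreasoningaboutcoloredobjects": 97.6,
    "bbhruinnames": 89.2,
    "bbhsalienttranslationerrordetection": 69.6,
    "bbhsnarks": 90.4,
    "bbhsportsunderstanding": 95.2,
    "bbhtemporalsequences": 100.0,
    "bbhtrackingshuffledobjectsfiveobjects": 100.0,
    "bbhtrackingshuffledobjectssevenobjects": 100.0,
    "bbhtrackingshuffledobjectsthreeobjects": 100.0,
    "bbhweboflies": 100.0,
    "bbhwordsorting": 50.8,
}

# Aggregate value reported in the bbh column for the same row.
REPORTED_BBH = 86.7


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean over tasks (assumed aggregation, not confirmed by the page)."""
    return sum(scores.values()) / len(scores)


average = macro_average(subtask_scores)
print(f"tasks: {len(subtask_scores)}, mean: {average:.1f}, reported: {REPORTED_BBH}")
```

For this row the unweighted mean rounds to the reported value, which is consistent with the assumption but does not confirm it for other models.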