HyperAI超神经

Arithmetic Reasoning On Game Of 24

Evaluation Metric

Success
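To illustrate what the Success metric measures, here is a minimal sketch. It assumes the usual Game of 24 rules: four input numbers, each used exactly once, combined with +, -, * and / to reach 24. It also assumes Success is the fraction of puzzles for which the model's final expression is a valid solution. The helpers `is_success` and `success_rate` are hypothetical illustrations, not the benchmark's official scorer.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Binary operators allowed in an answer (assumed standard Game of 24 rules).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate an expression AST with exact rational arithmetic, recording each number used."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, used)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        used.append(node.value)
        return Fraction(node.value)
    # Anything else (unary minus, names, calls, floats) is rejected.
    raise ValueError("unsupported syntax in expression")


def is_success(puzzle, expression):
    """Hypothetical check: True if `expression` uses each number in `puzzle` exactly once and equals 24."""
    used = []
    try:
        value = _evaluate(ast.parse(expression, mode="eval"), used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return Counter(used) == Counter(puzzle) and value == 24


def success_rate(puzzles, answers):
    """Hypothetical metric: fraction of puzzles whose answer is a valid solution."""
    solved = sum(is_success(p, a) for p, a in zip(puzzles, answers))
    return solved / len(puzzles)
```

The sketch uses `Fraction` instead of floats so that answers relying on intermediate fractions, such as `5 * (5 - 1 / 5)` for the numbers 1, 5, 5, 5, compare exactly equal to 24.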

Evaluation Results

Performance of each model on this benchmark

Comparison Table

Model Name | Success
tree-of-thoughts-deliberate-problem-solving-1 | 0.74