HyperAI超神经

Question Answering

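Question Answering is an important task in natural language processing that aims to have a computer system automatically answer questions posed by users. It can be divided into subtasks such as community question answering and knowledge-base question answering, and is evaluated mainly with the Exact Match (EM) and F1 scores. Popular benchmark datasets include SQuAD, HotpotQA, bAbI, TriviaQA, and WikiQA. In recent years, models such as T5 and XLNet have performed strongly on these benchmarks, improving the accuracy and practicality of question answering systems.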
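As a rough illustration of the EM and F1 metrics named above, the sketch below scores one predicted answer string against one reference answer. It assumes SQuAD-style conventions (lowercasing, stripping punctuation and English articles, whitespace tokenization); this page does not specify a normalization scheme, and individual benchmarks may define the metrics differently. The function names are illustrative only.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; other benchmarks may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(f1_score("in Paris France", "Paris"))
```

When a question has several acceptable reference answers, a common convention is to take the maximum score over the references and then average over all questions.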

SQuAD2.0
SQuAD1.1
RuBERT
HotpotQA
Beam Retrieval
PIQA
GPT-3 175B (0-shot)
BoolQ
Gemma-7B
COPA
PaLM 540B (finetuned)
TriviaQA
SpanBERT
SQuAD1.1 dev
T5-11B
Natural Questions
Atlas (full, Wiki-dec-2018 index)
OpenBookQA
WebQuestions
Memory Networks (ensemble)
TruthfulQA
CoA
MultiRC
DeBERTa-1.5B
CronQuestions
PubMedQA
PubMedBERT uncased
MedQA
DRAGON + BioLinkBERT
WikiQA
TANDA-RoBERTa (ASNQ, WikiQA)
SIQA
LLaMA 65B (zero-shot)
StoryCloze
BLOOMZ
DaNetQA
TimeQuestions
Quora Question Pairs
DeBERTa (large)
CNN / Daily Mail
DROP Test
QDGAT (ensemble)
NewsQA
OpenAI/o3-mini-2025-01-31-high
bAbI
STM
Natural Questions (long)
DensePhrases
SQuAD2.0 dev
XLNet (single model)
TrecQA
TANDA DeBERTa-V3-Large + ALL
StrategyQA
PaLM 2 (few-shot, CoT, SC)
MultiTQ
NarrativeQA
Masque (NarrativeQA + MS MARCO)
Bamboogle
CoQA
BERT Large Augmented (single model)
OBQA
FLAN 137B (zero-shot)
TIQ
WikiHop
BigBird-etc
Children's Book Test
NSE
FEVER
TempQuestions
QAap
BioASQ
PubMedBERT uncased
FQuAD
KILT: ELI5
QASent
Attentive LSTM
Quasar-T
RACE
SQA3D
ScanQA (w/ auxiliary loss)
Story Cloze
Neo-6B (QA + WS)
YahooCQA
sMIM (1024) +
DROP
FinQA
ELASTIC (RoBERTa-large)
FriendsQA
NExT-QA (Open-ended VideoQA)
NQ (BEIR)
PeerQA
GPT-4o-2024-08-06-128k
SemEvalCQA
HyperQA
AI2 Kaggle Dataset
BLURB
BioLinkBERT (large)
catbAbI LM-mode
Fast Weight Memory
catbAbI QA-mode
Fast Weight Memory
CheGeKa
Complex-CronQuestions
SubGTR
EgoTaskQA
FairytaleQA
BART fine-tuned on FairytaleQA
FiQA-2018 (BEIR)
HotpotQA (BEIR)
BM25+CE
HybridQA
MAFiD
Molweni
MS MARCO
MultiQ
NaturalQA
DPR
QuALITY
RuOpenBookQA
CaseHOLD
Custom Legal-BERT
ConditionalQA
FiD
ConvFinQA
DuoRC
Vector Database (ChromaDB)
Mathematics Dataset
TP-Transformer
OTT-QA
Fusion Retriever+ETC
ReClor
XLNet-large
SberQuAD
SCDE
TweetQA
ByT5
VNHSGE-English
AGI Eval
Aristo Kaggle Allen AI 8th grade questions
Cardal
CliCR
Gated-Attention Reader
CODAH
G-DAUG-Combo + RoBERTa-Large
COMPLEXQUESTIONS
WebQA
GeoQuestions1089
GeoQA2
MapEval-API
Claude-3.5-Sonnet (ReAct)
MCTest-500
MedTurkQuAD: Medical Turkish Question-Answering Dataset
MRQA
MuLD (HotpotQA)
MuLD (NarrativeQA)
PopQA
SelfRAG-7b
PubChemQA
BioMedGPT-10B
QuAC
FlowQA (single model)
Reverb
SQuAD
TempQA-WD
Torque
ECONET
UniProtQA
VNHSGE-Biology
VNHSGE-Chemistry
VNHSGE-Civic
Bing Chat
VNHSGE-Geography
VNHSGE-History
VNHSGE-Literature
VNHSGE-Mathematics
VNHSGE-Physics
WikiSQL
WikiTableQuestions
TabSQLify (col+row)
JD Product Question Answer
PAAG
AviationQA
KGT5
BBH
ChAII - Hindi and Tamil Question Answering
MuCoT
COCO Visual Question Answering (VQA) real images 1.0 open ended
ComplexWebQuestions
TOME-2
EfficientQA dev
EfficientQA test
GraphQuestions
ChatGPT
HellaSwag
JaQuAD
BERT-Japanese
KQA Pro
MapEval-Textual
MCTest-160
syntax, frame, coreference, and word embedding features
MedMCQA Dev
MedMobile (3.8B)
MetaQA
T5-small+prolog
MMLU
MRQA out-of-domain
RGX
MultiSpanQA
RoBERTa-large Tagger + LIQUID (Ensemble)
QASPER
Longformer Encoder Decoder (base)
RecipeQA
multimodal+LXMERT+ConstrainedMaxPooling
SchizzoSQUAD
SimpleQuestions
StepGame
TP-MANN
SWAG
DeBERTaV3large
TAT-QA
TagOp
WebQuestionsSP
ChatGPT
WebSRC
adversarial_qa
squad_adversarial
squad_v2
squadshifts amazon
squadshifts new_wiki
squadshifts nyt
squadshifts reddit