HyperAI

Question Answering

Question answering (QA) is a core task in natural language processing in which a computer system automatically answers questions posed by users. It is commonly divided into subtasks such as community question answering and knowledge base question answering, and results are reported primarily as EM (Exact Match) and F1 scores. Popular benchmark datasets include SQuAD, HotpotQA, bAbI, TriviaQA, and WikiQA. In recent years, models such as T5 and XLNet have achieved strong results on these benchmarks, improving the accuracy and practicality of question answering systems.
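As an illustration of the two metrics named above, here is a minimal sketch of how EM and token-level F1 are often computed for extractive QA. It assumes the SQuAD-style normalization convention (lowercasing, then stripping punctuation and English articles). The page does not specify a scoring procedure, so the function names and normalization steps are illustrative assumptions, and individual benchmarks may score answers differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors the SQuAD-style convention; other benchmarks
    may normalize answers differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several reference answers, a common practice is to take the maximum score over the references and then average over all questions. EM rewards only exact answers, while F1 gives partial credit for overlapping tokens, which is why the two are usually reported together.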

SQuAD2.0
SQuAD1.1
RuBERT
HotpotQA
Beam Retrieval
PIQA
GPT-3 175B (0-shot)
BoolQ
Gemma-7B
COPA
PaLM 540B (finetuned)
TriviaQA
SpanBERT
SQuAD1.1 dev
T5-11B
Natural Questions
Atlas (full, Wiki-dec-2018 index)
OpenBookQA
WebQuestions
Memory Networks (ensemble)
TruthfulQA
CoA
MultiRC
DeBERTa-1.5B
CronQuestions
PubMedQA
PubMedBERT uncased
MedQA
DRAGON + BioLinkBERT
WikiQA
TANDA-RoBERTa (ASNQ, WikiQA)
SIQA
LLaMA 65B (zero-shot)
StoryCloze
BLOOMZ
DaNetQA
TimeQuestions
Quora Question Pairs
DeBERTa (large)
CNN / Daily Mail
DROP Test
QDGAT (ensemble)
NewsQA
OpenAI/o3-mini-2025-01-31-high
bAbI
STM
Natural Questions (long)
DensePhrases
SQuAD2.0 dev
XLNet (single model)
TrecQA
TANDA DeBERTa-V3-Large + ALL
StrategyQA
PaLM 2 (few-shot, CoT, SC)
MultiTQ
NarrativeQA
Masque (NarrativeQA + MS MARCO)
Bamboogle
CoQA
BERT Large Augmented (single model)
OBQA
FLAN 137B (zero-shot)
TIQ
WikiHop
BigBird-ETC
Children's Book Test
NSE
FEVER
TempQuestions
QAap
BioASQ
PubMedBERT uncased
FQuAD
KILT: ELI5
QASent
Attentive LSTM
Quasar-T
RACE
SQA3D
ScanQA (w/ auxiliary loss)
Story Cloze
Neo-6B (QA + WS)
YahooCQA
sMIM (1024) +
DROP
FinQA
ELASTIC (RoBERTa-large)
FriendsQA
NExT-QA (Open-ended VideoQA)
NQ (BEIR)
PeerQA
GPT-4o-2024-08-06-128k
SemEvalCQA
HyperQA
AI2 Kaggle Dataset
BLURB
BioLinkBERT (large)
catbAbI LM-mode
Fast Weight Memory
catbAbI QA-mode
Fast Weight Memory
CheGeKa
Complex-CronQuestions
SubGTR
EgoTaskQA
FairytaleQA
BART fine-tuned on FairytaleQA
FiQA-2018 (BEIR)
HotpotQA (BEIR)
BM25+CE
HybridQA
MAFiD
Molweni
MS MARCO
MultiQ
NaturalQA
DPR
QuALITY
RuOpenBookQA
CaseHOLD
Custom Legal-BERT
ConditionalQA
FiD
ConvFinQA
DuoRC
Vector Database (ChromaDB)
Mathematics Dataset
TP-Transformer
OTT-QA
Fusion Retriever+ETC
ReClor
XLNet-large
SberQuAD
SCDE
TweetQA
ByT5
VNHSGE-English
AGI Eval
Aristo Kaggle Allen AI 8th grade questions
Cardal
CliCR
Gated-Attention Reader
CODAH
G-DAUG-Combo + RoBERTa-Large
COMPLEXQUESTIONS
WebQA
GeoQuestions1089
GeoQA2
MapEval-API
Claude-3.5-Sonnet (ReAct)
MCTest-500
MedTurkQuAD: Medical Turkish Question-Answering Dataset
MRQA
MuLD (HotpotQA)
MuLD (NarrativeQA)
PopQA
SelfRAG-7b
PubChemQA
BioMedGPT-10B
QuAC
FlowQA (single model)
Reverb
SQuAD
TempQA-WD
Torque
ECONET
UniProtQA
VNHSGE-Biology
VNHSGE-Chemistry
VNHSGE-Civic
Bing Chat
VNHSGE-Geography
VNHSGE-History
VNHSGE-Literature
VNHSGE-Mathematics
VNHSGE-Physics
WikiSQL
WikiTableQuestions
TabSQLify (col+row)
JD Product Question Answer
PAAG
AviationQA
KGT5
BBH
ChAII - Hindi and Tamil Question Answering
MuCoT
COCO Visual Question Answering (VQA) real images 1.0 open ended
ComplexWebQuestions
TOME-2
EfficientQA dev
EfficientQA test
GraphQuestions
ChatGPT
HellaSwag
JaQuAD
BERT-Japanese
KQA Pro
MapEval-Textual
MCTest-160
syntax, frame, coreference, and word embedding features
MedMCQA Dev
MedMobile (3.8B)
MetaQA
T5-small+prolog
MMLU
MRQA out-of-domain
RGX
MultiSpanQA
RoBERTa-large Tagger + LIQUID (Ensemble)
QASPER
Longformer Encoder Decoder (base)
RecipeQA
multimodal+LXMERT+ConstrainedMaxPooling
SchizzoSQUAD
SimpleQuestions
StepGame
TP-MANN
SWAG
DeBERTaV3large
TAT-QA
TagOp
WebQuestionsSP
ChatGPT
WebSRC
adversarial_qa
squad_adversarial
squad_v2
squadshifts amazon
squadshifts new_wiki
squadshifts nyt
squadshifts reddit