HyperAI

Question Answering on PubMedQA

Metrics

Accuracy
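
Accuracy here is the percentage of questions for which the predicted answer label matches the reference label. The following is a minimal sketch, assuming PubMedQA's three answer labels (yes / no / maybe). The function name and the toy data are illustrative assumptions and are not taken from the evaluation code of any model listed on this page:

```python
def accuracy(predictions, references):
    """Return the percentage of predictions that exactly match the references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact label matches, then express them as a percentage.
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Hypothetical toy data, not real benchmark output.
refs = ["yes", "no", "maybe", "yes"]
preds = ["yes", "no", "yes", "yes"]
print(accuracy(preds, refs))  # 3 of 4 labels match
```

The scores reported on this page are on the same 0 to 100 scale, but individual papers may differ in prompting, shot count, and test split, so the numbers are not always directly comparable.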

Results

Performance results of various models on this benchmark

| Model Name | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| MediSwift-XL | 76.8 | MediSwift: Efficient Sparse Pre-trained Biomedical Language Models | - |
| PaLM (8B, Few-shot) | 34 | Large Language Models Encode Clinical Knowledge | - |
| BioGPT (345M) | 78.2 | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | |
| PaLM (62B, Few-shot) | 57.8 | Large Language Models Encode Clinical Knowledge | - |
| PubMedBERT uncased | 55.84 | Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing | |
| Claude 3 Opus (5-shot) | 75.8 | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| Flan-T5-XXL | 76.80 | Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark | - |
| GAL 120B (zero-shot) | 77.6 | Galactica: A Large Language Model for Science | |
| Human Performance (single annotator) | 78.0 | PubMedQA: A Dataset for Biomedical Research Question Answering | |
| BioELECTRA uncased | 64.2 | BioELECTRA:Pretrained Biomedical text Encoder using Discriminators | |
| BioLinkBERT (base) | 70.2 | LinkBERT: Pretraining Language Models with Document Links | |
| BLOOM (zero-shot) | 73.6 | Galactica: A Large Language Model for Science | |
| Flan-PaLM (540B, Few-shot) | 79 | Large Language Models Encode Clinical Knowledge | - |
| Med-PaLM 2 (CoT + SC) | 74.0 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| Med-PaLM 2 (ER) | 75.0 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| Flan-PaLM (62B, Few-shot) | 77.2 | Large Language Models Encode Clinical Knowledge | - |
| BioMedGPT-10B | 76.1 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | |
| PaLM (540B, Few-shot) | 55 | Large Language Models Encode Clinical Knowledge | - |
| Med-PaLM 2 (5-shot) | 79.2 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| BioLinkBERT (large) | 72.2 | LinkBERT: Pretraining Language Models with Document Links | |