Language Modeling
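Language modeling is the task of predicting the next word or character in a document. Language models trained this way can be applied to natural language processing tasks such as text generation, text classification, and question answering. Neural language models displaced N-gram models starting in the 2010s, and since the 2020s state-of-the-art results have come almost exclusively from large language models (LLMs). A model's capability is evaluated with cross-entropy and perplexity; commonly used datasets include WikiText-103, One Billion Word, Text8, C4, and The Pile.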
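The two evaluation metrics named above are closely related: perplexity is the exponential of the average per-token cross-entropy. The sketch below illustrates that relationship, assuming we already have the probability a model assigned to each observed token. The function names and the sample probabilities are illustrative assumptions, not part of any particular benchmark's tooling.

```python
import math


def cross_entropy(token_probs):
    """Average negative log-likelihood (in nats) of the observed tokens.

    token_probs: probabilities the model assigned to each token that
    actually occurred (hypothetical input for this sketch).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity(token_probs):
    """Perplexity is exp(cross-entropy) when cross-entropy is in nats."""
    return math.exp(cross_entropy(token_probs))


def bits_per_token(token_probs):
    """Cross-entropy converted from nats to bits (divide by ln 2)."""
    return cross_entropy(token_probs) / math.log(2)


if __name__ == "__main__":
    # A model that is uniformly unsure among 4 choices at every step.
    probs = [0.25, 0.25, 0.25, 0.25]
    print(f"cross-entropy (nats): {cross_entropy(probs):.4f}")
    print(f"perplexity:           {perplexity(probs):.4f}")
    print(f"bits per token:       {bits_per_token(probs):.4f}")
```

A model that assigns probability 0.25 to every observed token has a perplexity of 4, which matches the intuition that it is as uncertain as a uniform choice among four options. Character-level benchmarks are often reported in bits per character, which is the same quantity computed over characters and expressed in base 2.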
WikiText-103
RETRO (7.5B)
Penn Treebank (Word Level)
GPT-3 (Zero-Shot)
enwik8
GPT-2 (48 layers, h=1600)
The Pile
Test-Time Fine-Tuning with SIFT + Llama-3.2 (3B)
WikiText-2
SparseGPT (175B, 50% Sparsity)
LAMBADA
GPT-3 175B (Few-Shot)
One Billion Word
OmniNetT (Large)
Text8
GPT-2
Penn Treebank (Character Level)
Mogrifier LSTM + dynamic eval
Hutter Prize
Transformer-XL + RMS dynamic eval
C4
Primer
SALMon
Spirit-LM (Expr.)
OpenWebText
GPT2-Hermite
BIG-bench-lite
GLM-130B (3-shot)
Wiki-40B
FLASH-Quad-8k
CLUE (AFQMC)
CLUE (C3)
CLUE (CMNLI)
CLUE (CMRC2018)
GLM-130B
CLUE (DRCD)
CLUE (OCNLI_50K)
GLM-130B
CLUE (WSC1.1)
FewCLUE (BUSTM)
FewCLUE (CHID-FC)
FewCLUE (CLUEWSC-FC)
FewCLUE (EPRSTMT)
FewCLUE (OCNLI-FC)
VietMed
Hybrid 4-gram VietMed-Train + ExtraText
Ethereum Phishing Transaction Network
100 sleep nights of 8 caregivers
Gpt3
2000 HUB5 English
MMLU
Arxiv HEP-TH citation graph
Bookcorpus2
Books3
Curation Corpus
DM Mathematics
enwik8 dev
Transformer-LS (small)
enwiki8
PAR Transformer 24B
FreeLaw
GitHub
Gutenberg PG-19
HackerNews
language-modeling-recommendation
GPT2
NIH ExPorter
OpenSubtitles
OpenWebtext2
PhilPapers
Pile CC
PTB Diagnostic ECG Database
I-DARTS
PubMed Cognitive Control Abstracts
PubMed Central
StackExchange
Gopher
Text8 dev
Transformer-LS (small)
Ubuntu IRC
USPTO Backgrounds