Speech Recognition

Speech recognition is the task of converting spoken language into text: identifying the words in an audio signal and transcribing them in written form. The goal is to transcribe live or recorded audio accurately while remaining robust to variation in accent, speaking rate, and background noise. The technology is widely used in human-computer interaction, automatic subtitle generation, and voice assistants.
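Transcription accuracy on benchmarks like those listed here is commonly reported as word error rate (WER): the word-level edit distance between the reference transcript and the system output, divided by the number of reference words. The following is a minimal sketch of that calculation, not the scoring script of any particular benchmark. The function name `word_error_rate` is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization (casing, punctuation), which real evaluations usually apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both strings are already normalized and are tokenized
    by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.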

LibriSpeech test-clean

HuBERT with Libri-Light

LibriSpeech test-other

wav2vec 2.0 with Libri-Light

Switchboard + Hub500

Common Voice German

wav2vec 2.0 XLS-R 1B + TEVR (5-gram)

swb_hub_500 WER fullSWBCH

QuartzNet15x5DE (D37)

Common Voice French

ConformerCTC-L (5-gram)

Common Voice Spanish

ConformerCTC-L (4-gram)

Paraformer-large

GigaSpeech TEST

Zipformer+pruned transducer w/ CR-CTC (no external language model)

Hub5'00 SwitchBoard

LAS + SpecAugment (with LM, Switchboard mild policy)

Libri-Light test-clean

Libri-Light test-other

CHiME-6 dev_gss12

Common Voice vi

Vietnamese end-to-end speech recognition using wav2vec 2.0 by VietAI

Europarl-ASR EN Guest-test

Triphone (39 features) + LDA and MLLT + SGMM

Speech Commands

khanhld/chunkformer-large-vie

Common Voice English

Whisper (Large v2)

Common Voice Italian

Whisper (Large v2)

Europarl-ASR EN MEP-test

Whisper-LLaMa-7b

AISHELL-2 Test Android

AISHELL-2 Test IOS

AISHELL-2 Test Mic

WavLM Large & EEND-vector clustering

CALLHOME Spanish Speech

Common Voice Frisian

Common Voice Japanese

Common Voice Portuguese

XLSR53 Wav2Vec2 Portuguese by Orlem Santos

Common Voice Russian

Whisper (Large v2)

facebook/multilingual_librispeech german

Conformer/Transformer-AED

Google Speech Commands - Musan

Hub5'00 FISHER-SWBD

Hub5'00 CallHome

LibriSpeech 100h test-clean

LibriSpeech 100h test-other

Branchformer + GFSA

LibriSpeech train-clean-100 test-clean

wav2vec_wav2letter

LibriSpeech train-clean-100 test-other

wav2vec_wav2letter

Switchboard (300hr)

Switchboard CallHome

Switchboard SWBD

AISHELL-2 Android

ATCOSIM corpus (Air Traffic Control Communications)

ATCOSIM dataset (Air Traffic Control Communications)

Common Voice 7.0 Abkhaz

Common Voice 7.0 Arabic

Common Voice 7.0 Bashkir

Common Voice 7.0 German

Common Voice 7.0 Hindi

Common Voice 7.0 Odia

Common Voice 7.0 Portuguese

Common Voice 7.0 Votic

Common Voice 8.0 Assamese

Common Voice 8.0 Basaa

Common Voice 8.0 Breton

Common Voice 8.0 Bulgarian

Common Voice 8.0 Central Kurdish

Common Voice 8.0 Dutch

Common Voice 8.0 Erzya

Common Voice 8.0 French

Common Voice 8.0 Galician

Common Voice 8.0 German

Common Voice 8.0 Guarani

Common Voice 8.0 Hausa

Common Voice 8.0 Hindi

Common Voice 8.0 Hungarian

Common Voice 8.0 Japanese

Common Voice 8.0 Kabyle

Common Voice 8.0 Kazakh

Common Voice 8.0 Kurmanji Kurdish

Common Voice 8.0 Maltese

Common Voice 8.0 Marathi

Common Voice 8.0 Odia

Common Voice 8.0 Portuguese

Common Voice 8.0 Punjabi

Common Voice 8.0 Romansh Sursilvan

Common Voice 8.0 Romansh Vallader

Common Voice 8.0 Russian

Common Voice 8.0 Santali (Ol Chiki)

Common Voice 8.0 Serbian

Common Voice 8.0 Slovenian

Common Voice 8.0 Sorbian, Upper

Common Voice 8.0 Swahili

Common Voice 8.0 Tatar

Common Voice 8.0 Uzbek

Common Voice 8.0 Votic

Common Voice Arabic

Common Voice Breton

Common Voice Catalan

Common Voice Chinese (China)

Common Voice Czech

Common Voice Dutch

Common Voice Georgian

Common Voice Hindi

Common Voice Indonesian

Common Voice Lithuanian

Common Voice Maltese

Common Voice Odia

Common Voice Persian

Common Voice Polish

Common Voice Swedish

Common Voice Tamil

Common Voice Turkish

Common Voice Vietnamese

Common Voice Welsh

German ASR Data-Mix

Kazakh Speech Corpus v1.1

Mozilla Common Voice 15.0 Persian

Mozilla Common Voice 16.1

Mozilla Common Voice 9.0

projecte-aina/parlament_parla ca

Robust Speech Event - Catalan Dev Data

Robust Speech Event - Dev Data

Russian LibriSpeech

UWB-ATCC dataset (Air Traffic Control Communications)