
Cohere Releases Top-Ranked Multilingual Transcription Model

Cohere Labs' automatic speech recognition model sets a new bar for multilingual performance on the Hugging Face Open ASR Leaderboard.

Mar 24, 2026


Cohere has released a new model for automatic speech recognition (ASR), Cohere Transcribe, immediately claiming the top position on the Hugging Face Open ASR Leaderboard. The model demonstrates state-of-the-art performance, particularly on challenging multilingual benchmarks like FLEURS (Few-shot Learning Evaluation of Universal Representations of Speech).

Trained on a large dataset of professionally transcribed audio, the model is designed to accurately convert spoken language into text across multiple languages. This capability makes it a powerful new tool for developers building voice-enabled applications, transcription services, and other features that rely on understanding human speech.

The release of a high-performing ASR model from a major AI lab like Cohere gives developers a strong alternative to established options such as OpenAI's Whisper. As more capable, openly available speech models are released, the barrier to building sophisticated audio applications continues to fall for researchers and builders.

The model's weights are publicly accessible, but usage is restricted. Cohere Transcribe is released under a Cohere Non-Commercial License, so it is intended for research and non-commercial projects rather than deployment in commercial production applications.

Sources

CohereLabs/cohere-transcribe-03-2026
Hugging Face

