NVIDIA Releases Canary 1B v2 Multilingual Speech Model
The new 1-billion-parameter model handles both transcription and translation across five languages using the company's efficient FastConformer architecture.

NVIDIA has released Canary 1B v2, a 1-billion-parameter model for automatic speech recognition (ASR) and speech translation. It is published under the CC-BY-4.0 license, which permits commercial use with attribution.
The model is built on NVIDIA's FastConformer architecture, which is designed for efficient speech processing. A single Canary model handles both transcription in a source language and translation from that language into English.
Core Capabilities
According to its model card, Canary 1B v2 was trained to handle several tasks without separate models:
- Transcription: Supports English, German, French, Spanish, and Mandarin.
- Translation: Can translate any of the supported source languages into English text.
- Formatting: Includes automatic punctuation and capitalization to produce more readable output.
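The split between transcription and translation described above comes down to a choice of source and target language. The following is a minimal sketch of that routing logic only, not NVIDIA's API: the `SpeechTask` class, the `plan_task` helper, and the two-letter language codes are all hypothetical names introduced for illustration, and the language set is simply the one listed in this article.

```python
from dataclasses import dataclass

# Languages as listed in this article. The two-letter codes are an
# assumption made for this sketch, not taken from the model card.
SUPPORTED = {
    "en": "English",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "zh": "Mandarin",
}


@dataclass(frozen=True)
class SpeechTask:
    """Hypothetical description of one request to a single speech model."""
    source_lang: str
    target_lang: str

    @property
    def kind(self) -> str:
        # Same language in and out means transcription; otherwise translation.
        if self.source_lang == self.target_lang:
            return "transcription"
        return "translation"


def plan_task(source_lang: str, translate_to_english: bool = False) -> SpeechTask:
    """Build a task for a supported source language (hypothetical helper)."""
    if source_lang not in SUPPORTED:
        raise ValueError(f"unsupported source language: {source_lang!r}")
    target_lang = "en" if translate_to_english else source_lang
    return SpeechTask(source_lang=source_lang, target_lang=target_lang)


if __name__ == "__main__":
    print(plan_task("de").kind)
    print(plan_task("de", translate_to_english=True).kind)
```

In this sketch, English audio with `translate_to_english=True` still resolves to transcription, because the source and target languages match. How the real model is invoked is documented in the usage instructions on its Hugging Face page.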
This release adds another openly licensed option to a field largely defined by models like OpenAI's Whisper, giving developers more flexibility when integrating speech recognition and translation. The model and its usage instructions are available on NVIDIA's Hugging Face page.
Sources
- nvidia/canary-1b-v2 (Hugging Face)