NVIDIA Releases Canary 1B v2 Multilingual Speech Model

The new 1-billion-parameter model handles both transcription and translation across five languages using the company's efficient FastConformer architecture.

Aug 4, 2025

License: CC BY 4.0

NVIDIA has released Canary 1B v2, a 1-billion-parameter model for automatic speech recognition (ASR) and speech translation. It is published under the permissive CC BY 4.0 license, which allows commercial use with attribution, so developers can build it into voice-enabled applications.

The model is built on NVIDIA's FastConformer architecture, which is designed for efficient speech processing. A single model handles both tasks: transcribing speech in its source language and translating that speech into English text.

Core Capabilities

According to its official model card, Canary 1B v2 was trained to handle several tasks without separate models:

Transcription: Supports English, German, French, Spanish, and Mandarin.
Translation: Can translate any of the supported source languages into English text.
Formatting: Includes automatic punctuation and capitalization to produce more readable output.

This release adds another openly licensed option to a field largely defined by models like OpenAI's Whisper. A permissively licensed, efficient alternative gives developers more flexibility when integrating speech recognition and translation. The model and its usage instructions are available on NVIDIA's Hugging Face page.

Sources

nvidia/canary-1b-v2, Hugging Face
