NVIDIA Releases Nemotron-3.5 Streaming ASR Model
The 600-million-parameter model uses a FastConformer architecture for real-time, multilingual speech-to-text applications.

NVIDIA has released Nemotron-3.5 ASR Streaming, a new 600-million-parameter model specialized for automatic speech recognition. Designed for low-latency performance, the model targets applications that require real-time transcription of multilingual audio.
At its core, the model pairs a FastConformer encoder with a Recurrent Neural Network Transducer (RNN-T) decoder. This design suits streaming use cases because it processes audio in small chunks as they arrive rather than waiting for an entire clip. NVIDIA describes the model as "cache-aware": in cache-aware streaming, the encoder keeps intermediate state from earlier chunks and reuses it, so past audio does not have to be re-encoded each time new audio arrives.
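To make the chunk-plus-cache idea concrete, here is a minimal sketch in plain Python. It is not NVIDIA's implementation and does not use the model: a causal moving average stands in for the encoder, and every name in it (`StreamCache`, `encode_chunk`, `stream_encode`, `context_size`) is hypothetical. It only illustrates that carrying a small cache of left context between chunks lets streaming output match what a single pass over the full clip would produce.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class StreamCache:
    """Left-context samples carried from one chunk to the next (hypothetical)."""
    left_context: List[float] = field(default_factory=list)


def iter_chunks(samples: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield consecutive chunks, as if audio were arriving in real time."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def encode_chunk(
    chunk: List[float], cache: StreamCache, context_size: int
) -> Tuple[List[float], StreamCache]:
    """Toy stand-in for an encoder: a causal moving average.

    Each output looks at the current sample plus up to `context_size`
    earlier samples, which may come from the cache instead of the chunk.
    """
    window = cache.left_context + chunk
    offset = len(cache.left_context)
    outputs = []
    for i in range(len(chunk)):
        end = offset + i + 1
        start = max(0, end - (context_size + 1))
        frame = window[start:end]
        outputs.append(sum(frame) / len(frame))
    # Keep only the context the next chunk will need.
    kept = window[-context_size:] if context_size > 0 else []
    return outputs, StreamCache(left_context=kept)


def stream_encode(
    samples: List[float], chunk_size: int, context_size: int
) -> List[float]:
    """Process audio chunk by chunk, threading the cache through."""
    cache = StreamCache()
    outputs: List[float] = []
    for chunk in iter_chunks(samples, chunk_size):
        chunk_out, cache = encode_chunk(chunk, cache, context_size)
        outputs.extend(chunk_out)
    return outputs


def offline_encode(samples: List[float], context_size: int) -> List[float]:
    """Process the whole clip at once, for comparison."""
    outputs, _ = encode_chunk(samples, StreamCache(), context_size)
    return outputs
```

In this toy setup, `stream_encode(samples, chunk_size=3, context_size=2)` returns the same values as `offline_encode(samples, context_size=2)`, while each streaming step touches only the new chunk and two cached samples. A real cache-aware encoder caches layer activations rather than raw samples, but the bookkeeping follows the same pattern.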
This release gives developers a building block for features such as live captioning, voice command systems, and in-meeting transcription. It is not a general-purpose language model, but its specialization makes it a notable addition to the openly available toolkit for speech-based AI.
The model is available on the Hugging Face Hub for download and use. It is released under the NVIDIA Open Model License Agreement, which permits distribution and the creation of derivative works.
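For readers who want to fetch the files, a download might look like the sketch below. It assumes the `huggingface_hub` Python package is installed and that the repository name matches the one listed under Sources; the `download_model` helper is hypothetical. The Hub may require accepting the license or authenticating first, and the sketch covers only the download, not loading the model or running inference.

```python
# Repository name as listed in this article's source link (assumed exact).
REPO_ID = "nvidia/nemotron-3.5-asr-streaming-0.6b"


def download_model(repo_id: str = REPO_ID) -> str:
    """Download all files in the repository and return the local folder path.

    Hypothetical helper; it only wraps huggingface_hub.snapshot_download.
    """
    # Imported here so this module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)
```

Calling `download_model()` would place the files in the local Hugging Face cache and return that directory.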
Sources
- nvidia/nemotron-3.5-asr-streaming-0.6b (Hugging Face)