
NVIDIA Releases Nemotron-3.5 Streaming ASR Model

The 600-million-parameter model uses a FastConformer architecture for real-time, multilingual speech-to-text applications.

May 15, 2026


NVIDIA has released Nemotron-3.5 ASR Streaming, a 600-million-parameter model for automatic speech recognition (ASR). It is designed for low latency and targets applications that need real-time transcription of multilingual audio.

The model pairs a FastConformer encoder with a Recurrent Neural Network Transducer (RNN-T) decoder. This design suits streaming use cases because it processes audio in small chunks as they arrive rather than waiting for an entire clip. NVIDIA describes the model as "cache-aware." In cache-aware streaming designs, the encoder typically keeps state from earlier chunks and reuses it, so past context does not have to be recomputed each time new audio arrives.
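The chunk-by-chunk idea can be illustrated with a toy sketch. This is not NVIDIA's code or the model's API: `ToyCacheAwareEncoder`, `iter_chunks`, `stream`, and the `chunk_size` and `left_context` parameters are all hypothetical names chosen for illustration, and the "encoder" only averages samples. It shows the general pattern of carrying a bounded cache of recent context across chunks so that each step does fresh work only on the newly arrived audio.

```python
from collections import deque
from typing import Deque, Iterator, List, Tuple


def iter_chunks(samples: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size chunks, as if audio were arriving live."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


class ToyCacheAwareEncoder:
    """Hypothetical stand-in for a streaming encoder (an assumption, not a real API).

    It keeps the last `left_context` samples in a cache so each new chunk
    can see recent history without reprocessing the whole stream.
    """

    def __init__(self, left_context: int) -> None:
        # A bounded deque silently drops the oldest samples once it is full.
        self.cache: Deque[float] = deque(maxlen=left_context)
        self.samples_processed = 0  # counts only newly arrived samples

    def step(self, chunk: List[float]) -> float:
        # Combine cached context with the new chunk.
        window = list(self.cache) + chunk
        self.samples_processed += len(chunk)
        # Update the cache for the next call.
        self.cache.extend(chunk)
        # Toy "feature": the mean over the context window.
        return sum(window) / len(window)


def stream(samples: List[float], chunk_size: int, left_context: int) -> Tuple[List[float], int]:
    """Run the toy encoder over the stream one chunk at a time."""
    encoder = ToyCacheAwareEncoder(left_context)
    outputs = [encoder.step(chunk) for chunk in iter_chunks(samples, chunk_size)]
    return outputs, encoder.samples_processed


if __name__ == "__main__":
    outputs, processed = stream([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], chunk_size=2, left_context=2)
    print(outputs, processed)
```

In this sketch each output depends on the current chunk plus at most `left_context` earlier samples, and the total work grows linearly with the length of the stream. A real streaming ASR encoder would cache learned intermediate activations rather than raw samples.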

The release gives developers a building block for features such as live captioning, voice command systems, and in-meeting transcription. It is not a general-purpose language model. It is a specialized speech recognizer that adds to the set of openly available models for speech-based AI.

The model is available on the Hugging Face Hub for download and use. It is released under the NVIDIA Open Model License Agreement, which permits distribution and the creation of derivative works.

Sources

nvidia/nemotron-3.5-asr-streaming-0.6b (Hugging Face)

