NVIDIA Releases 600M Parakeet for Speech Recognition

The new FastConformer model pairs an efficient encoder with a Token-and-Duration Transducer decoder for fast, accurate multilingual transcription.

Aug 4, 2025

Notable · CC BY 4.0

NVIDIA has released Parakeet TDT 0.6B, a new open model for automatic speech recognition (ASR). Released as part of the company's NeMo toolkit for conversational AI, the 600-million-parameter model is designed to transcribe speech in multiple languages with high accuracy.

The model's architecture is key to its performance. It uses a FastConformer encoder, a Conformer variant that downsamples the audio more aggressively so that long sequences can be processed efficiently. The "TDT" in its name stands for Token-and-Duration Transducer, a decoder that predicts each output token together with the number of audio frames it spans. Because the decoder can jump ahead by the predicted duration instead of evaluating every frame, inference runs faster than with a conventional transducer.
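To make the decoding idea concrete: in a Token-and-Duration Transducer, each decoding step yields a token and a duration, and the decoder advances by that many frames. The following is a minimal sketch of greedy decoding under that scheme, not NVIDIA's implementation. The names `tdt_greedy_decode`, `toy_joint`, and `BLANK`, and the scripted predictions, are hypothetical stand-ins for a real joint network.

```python
from typing import Callable, List, Tuple

BLANK = -1  # hypothetical id for the "no token" (blank) symbol

def tdt_greedy_decode(
    num_frames: int,
    joint: Callable[[int, int], Tuple[int, int]],
    max_symbols_per_frame: int = 4,
) -> List[int]:
    """Greedy decoding where each step yields (token, duration in frames)."""
    t = 0                  # current encoder frame
    last_token = BLANK     # last emitted token, fed back to the predictor
    symbols_at_t = 0       # tokens emitted without advancing, to avoid stalls
    hypothesis: List[int] = []
    while t < num_frames:
        token, duration = joint(t, last_token)
        if token != BLANK:
            hypothesis.append(token)
            last_token = token
        if duration == 0:
            symbols_at_t += 1
            # A blank with zero duration, or too many tokens on one frame,
            # would loop forever, so force a one-frame step.
            if token == BLANK or symbols_at_t >= max_symbols_per_frame:
                duration = 1
        if duration > 0:
            t += duration  # skip ahead by the predicted duration
            symbols_at_t = 0
    return hypothesis

# Toy stand-in for the joint network: scripted (token, duration) per frame.
visited: List[int] = []
script = {0: (7, 2), 2: (BLANK, 3), 5: (9, 1)}

def toy_joint(t: int, last_token: int) -> Tuple[int, int]:
    visited.append(t)
    return script.get(t, (BLANK, 1))

result = tdt_greedy_decode(6, toy_joint)
print(result, visited)
```

In this toy run the decoder evaluates only frames 0, 2, and 5 out of six, which is the source of the speedup: frames covered by a predicted duration are never visited.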

This release gives developers a capable and relatively lightweight tool for building speech-enabled products. Under the permissive CC-BY-4.0 license, Parakeet can be used and modified for both research and commercial projects, provided attribution is given. At 0.6 billion parameters, it is easier to deploy than multi-billion-parameter speech systems.
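As a rough illustration of why parameter count matters for deployment, the weight storage can be estimated from the parameter count and the numeric precision. The sketch below assumes a round 600 million parameters and counts weights only; activations, audio buffers, and framework overhead are not included, and the multi-billion comparison size is a hypothetical example rather than a specific model.

```python
def weight_footprint_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate storage for model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

PARAKEET_PARAMS = 600_000_000      # approximate count, from the article
LARGER_PARAMS = 7_000_000_000      # hypothetical multi-billion-parameter model

for name, params in [("0.6B", PARAKEET_PARAMS), ("7B", LARGER_PARAMS)]:
    fp32 = weight_footprint_gb(params, 4)  # 4 bytes per float32 weight
    fp16 = weight_footprint_gb(params, 2)  # 2 bytes per float16 weight
    print(f"{name}: ~{fp32:.1f} GB in float32, ~{fp16:.1f} GB in float16")
```

Under these assumptions, the 0.6B model's weights occupy roughly 1.2 GB in half precision, compared with roughly 14 GB for the hypothetical 7B model.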

By releasing open weights for a specialized model like Parakeet, NVIDIA is contributing a useful building block to the conversational AI ecosystem. Developers who want to experiment with the model can find the weights and usage instructions in the nvidia/parakeet-tdt-0.6b-v3 repository on Hugging Face.
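For readers who want a starting point, loading the checkpoint through NeMo might look like the sketch below. It assumes the NeMo ASR package is installed (for example via `pip install "nemo_toolkit[asr]"`) and that the audio is in a format the model accepts. The helper name `transcribe_files` and the file name in the comment are hypothetical; the model card remains the authoritative reference for usage.

```python
from typing import List

MODEL_NAME = "nvidia/parakeet-tdt-0.6b-v3"  # repository listed under Sources

def transcribe_files(paths: List[str], model_name: str = MODEL_NAME) -> List[str]:
    """Load the checkpoint via NeMo and return one transcript per audio file."""
    # Imported lazily so this module can be imported without NeMo installed.
    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    results = asr_model.transcribe(paths)
    # Recent NeMo releases return hypothesis objects with a .text field;
    # older ones may return plain strings, so handle both cases.
    return [r.text if hasattr(r, "text") else str(r) for r in results]

# Example call (hypothetical file name):
# print(transcribe_files(["sample.wav"]))
```

The first call downloads the weights from Hugging Face, so it needs network access and enough disk space for the checkpoint.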

Sources

nvidia/parakeet-tdt-0.6b-v3, Hugging Face
