
Veena TTS Model Targets Indian Languages with Llama Base

Maya Research has released a 3-billion-parameter model designed to generate natural-sounding speech in Hindi and English.

Jun 24, 2025


Maya Research has introduced Veena, an open-source text-to-speech (TTS) model built to address the need for high-quality voice generation in languages widely spoken in India.

What sets Veena apart is its Llama-style architecture. The 3-billion-parameter model is trained to generate natural-sounding speech from text in both Hindi and English, with attention to the nuances of Indian accents and dialects. Building on a language-model backbone lets the system apply the text-processing strengths of large language models to the separate task of audio generation.

The release marks a significant step for open-source AI in a region where high-quality, accessible models have been less common. By focusing on widely spoken Indian languages, Veena could enable a new range of applications, from localized voice assistants and accessibility tools to automated content creation for one of the world's largest digital audiences.

The model weights and usage instructions are available for download on the Hugging Face Hub. The model is released under a custom license, so prospective users should review its terms before deploying it.
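For readers who want to fetch the files, the following is a minimal sketch rather than the publisher's documented procedure. It assumes the repository name listed in the sources (maya-research/Veena), the standard huggingface.co URL pattern, and the third-party `huggingface_hub` package; the helper names `repo_url` and `download_veena` and the `veena` target directory are hypothetical. If the repository requires accepting the license or logging in first, the download call will fail until that is done.

```python
# Repository name as listed in the article's sources (assumed to be the Hub repo id).
REPO_ID = "maya-research/Veena"


def repo_url(repo_id: str = REPO_ID) -> str:
    """Return the model page URL, where the license terms can be reviewed."""
    return f"https://huggingface.co/{repo_id}"


def download_veena(local_dir: str = "veena") -> str:
    """Download every file in the repository and return the local folder path.

    Requires network access and `pip install huggingface_hub`.
    The import is kept inside the function so the rest of this module
    works even when the package is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example usage (performs a multi-gigabyte download, so it is left commented out):
# print(repo_url())
# print(download_veena())
```

Reading the license on the model page before calling `download_veena` follows the article's advice to check the terms first.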

Sources

maya-research/Veena, Hugging Face

