Qwen · Alibaba · Any-to-Any

Qwen's Fun-Audio-Chat: An Open Speech-to-Speech LLM

The 8-billion-parameter model from Alibaba's Qwen team understands and generates spoken responses, enabling more natural audio-first applications.

Dec 23, 2025

Notable · Apache 2.0

Alibaba's Qwen team has released Fun-Audio-Chat-8B, an 8-billion-parameter model built for speech-to-speech conversation. Available under the permissive Apache 2.0 license, the model takes spoken input and generates a spoken response, which makes for a more natural conversational flow than text-based interfaces.

The system is an end-to-end audio pipeline that combines a speech encoder, the Qwen-7B-Chat language model for reasoning, and a text-to-speech (TTS) component that vocalizes the final answer. This architecture lets it handle an entire exchange through audio, in both English and Chinese.
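The three-stage flow described above can be pictured with a minimal Python sketch. Every name here (`SpeechToSpeechPipeline`, `encode_speech`, `generate_reply`, `synthesize`, and the stand-in stage functions) is hypothetical and chosen only for illustration; none of it is the actual Fun-Audio-Chat API, and the real model may pass richer representations than plain text between stages.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechToSpeechPipeline:
    """Hypothetical wrapper chaining the three stages named in the article."""

    encode_speech: Callable[[bytes], str]   # speech encoder: audio in -> input for the LLM
    generate_reply: Callable[[str], str]    # language model: input -> reply text
    synthesize: Callable[[str], bytes]      # TTS: reply text -> audio out

    def respond(self, audio_in: bytes) -> bytes:
        """Run one conversational turn: audio in, audio out."""
        llm_input = self.encode_speech(audio_in)
        reply_text = self.generate_reply(llm_input)
        return self.synthesize(reply_text)


# Stand-in stages (assumptions for illustration): real stages would be neural
# models; these treat the "audio" bytes as UTF-8 text so the sketch runs.
def fake_encoder(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechToSpeechPipeline(fake_encoder, fake_llm, fake_tts)
reply_audio = pipeline.respond(b"hello")
```

Keeping each stage behind a plain callable is one possible design choice: it lets a developer swap in a different encoder or synthesizer without touching the conversational loop.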

While many multimodal models accept audio as input, few open-source projects close the loop with integrated speech synthesis for fully spoken conversation. Fun-Audio-Chat gives developers a single foundation for building voice assistants, accessibility tools, and real-time interactive agents.

The model and its components are available now on Hugging Face for researchers and developers to explore. Its open license permits a wide range of academic and commercial applications, encouraging further innovation in audio-native AI.

Sources

FunAudioLLM/Fun-Audio-Chat-8B
Hugging Face

