StepFun/Any-to-Any

StepFun Releases Step-Audio 2 mini, a Unified Audio AI

The new open-source model handles both speech recognition and audio generation in a single, end-to-end architecture.

Aug 28, 2025

Notable · Apache 2.0

AI research company StepFun has released Step-Audio 2 mini, an audio language model designed to both understand and generate human speech. The model is presented as a compact, end-to-end system rather than a chain of separate components.

Unlike traditional pipelines that chain separate components for automatic speech recognition (ASR) and text-to-speech (TTS), Step-Audio 2 mini integrates both capabilities in one model. This approach aims to simplify the development of voice-enabled applications by handling the entire audio processing loop within a single framework.
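To make the contrast concrete, here is a minimal Python sketch of the two designs. Every name in it (`CascadedPipeline`, `UnifiedAudioModel`, the toy stub functions) is a hypothetical stand-in chosen for illustration. None of it reflects StepFun's actual API or internals, which the announcement does not describe.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical illustration only: these classes are not part of any StepFun API.
Audio = bytes  # stand-in type for a chunk of audio


@dataclass
class CascadedPipeline:
    """Traditional design: three separate components chained through text."""
    asr: Callable[[Audio], str]      # speech -> text
    respond: Callable[[str], str]    # text -> text (e.g. a language model)
    tts: Callable[[str], Audio]      # text -> speech

    def reply(self, audio: Audio) -> Audio:
        text = self.asr(audio)
        answer = self.respond(text)
        return self.tts(answer)


@dataclass
class UnifiedAudioModel:
    """End-to-end design: one model maps input audio directly to output audio."""
    model: Callable[[Audio], Audio]

    def reply(self, audio: Audio) -> Audio:
        return self.model(audio)


# Toy stubs so the sketch runs: "audio" is just UTF-8 text in bytes here.
cascaded = CascadedPipeline(
    asr=lambda a: a.decode("utf-8"),
    respond=lambda t: t.upper(),
    tts=lambda t: t.encode("utf-8"),
)
unified = UnifiedAudioModel(model=lambda a: a.decode("utf-8").upper().encode("utf-8"))

print(cascaded.reply(b"hello"), unified.reply(b"hello"))
```

Both objects expose the same `reply` call, but the cascaded version has three hand-offs to maintain and tune, while the unified version has one. That reduction in moving parts is the simplification the article describes.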

The 'mini' in its name suggests it is a smaller variant, likely optimized for resource efficiency, though a specific parameter count was not disclosed. By releasing the model under the permissive Apache 2.0 license, StepFun is making it available for a wide range of academic and commercial projects.

Step-Audio 2 mini joins a growing field of open, multimodal models that seek to process and generate data across different formats. Interested developers can explore the model and its capabilities on the Hugging Face Hub.
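For readers who want to fetch the files, the following is a small sketch that assumes the `huggingface_hub` Python package is installed. The repository ID comes from the Sources entry for this article. The helper names `hub_url` and `download_model` are made up for this example, and the snippet only downloads the repository contents; it does not show how to run the model, which the announcement does not cover.

```python
REPO_ID = "stepfun-ai/Step-Audio-2-mini"  # repository named in this article's Sources


def hub_url(repo_id: str = REPO_ID) -> str:
    """Return the model page URL on the Hugging Face Hub."""
    return f"https://huggingface.co/{repo_id}"


def download_model(repo_id: str = REPO_ID, local_dir=None) -> str:
    """Download all files in the repository and return the local path.

    Assumes `pip install huggingface_hub`; the import is deferred so the
    rest of this snippet works without the package.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


print(hub_url())
```

Calling `download_model()` needs network access and enough disk space for the weights, so it is left uncalled here.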

Sources

stepfun-ai/Step-Audio-2-mini
Hugging Face
