The Open Weights



© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.


StepFun Releases Step-Audio 2 mini, a Unified Audio AI

The new open-source model handles both speech recognition and audio generation in a single, end-to-end architecture.

Aug 28, 2025
Notable · Apache 2.0
StepFun · Any-to-Any
Step-Audio 2 mini

AI research company StepFun has released Step-Audio 2 mini, an audio language model designed both to understand and to generate human speech. StepFun presents it as a compact, end-to-end system: one model covers the full path from input audio to output, rather than a chain of separately built components.

Traditional voice pipelines use separate components for automatic speech recognition (ASR) and text-to-speech (TTS); Step-Audio 2 mini integrates both capabilities. The goal is to simplify the development of voice-enabled applications by handling the entire audio processing loop within a single model.
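
The difference between the two designs can be sketched in Python. This is a conceptual illustration only: the class names, the toy stand-in functions, and the use of a list of floats for audio are hypothetical. The middle text-to-text stage reflects a typical voice-assistant cascade rather than anything StepFun has stated, and none of this reflects Step-Audio 2 mini's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical representation: real systems pass waveforms or audio tokens.
Audio = List[float]


@dataclass
class CascadedPipeline:
    """Traditional design: separately built components chained in sequence."""
    asr: Callable[[Audio], str]    # speech -> text (automatic speech recognition)
    respond: Callable[[str], str]  # text -> text (assumed middle stage, e.g. a chat LLM)
    tts: Callable[[str], Audio]    # text -> speech (text-to-speech)

    def __call__(self, audio: Audio) -> Audio:
        transcript = self.asr(audio)      # each hand-off between stages is plain text
        reply = self.respond(transcript)
        return self.tts(reply)


@dataclass
class EndToEndModel:
    """Unified design: one model maps input audio straight to output audio."""
    model: Callable[[Audio], Audio]

    def __call__(self, audio: Audio) -> Audio:
        return self.model(audio)


# Toy stand-ins so the sketch runs; they do no real speech processing.
def toy_asr(audio: Audio) -> str:
    return "hi" if audio else ""


def toy_respond(text: str) -> str:
    return text.upper()


def toy_tts(text: str) -> Audio:
    return [float(ord(ch)) for ch in text]


def toy_unified(audio: Audio) -> Audio:
    return list(reversed(audio))


if __name__ == "__main__":
    cascaded = CascadedPipeline(toy_asr, toy_respond, toy_tts)
    unified = EndToEndModel(toy_unified)
    print(cascaded([0.1, 0.2]))  # [72.0, 73.0]
    print(unified([1.0, 2.0]))   # [2.0, 1.0]
```

The cascade needs three components and passes only text between them, while the unified version exposes a single call; that single entry point is the simplification the unified approach aims for.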

The 'mini' in its name suggests it is a smaller variant, likely optimized for resource efficiency, though a specific parameter count was not disclosed. By releasing the model under the permissive Apache 2.0 license, StepFun is making it available for a wide range of academic and commercial projects.

Step-Audio 2 mini joins a growing field of open, multimodal models that seek to process and generate data across different formats. Interested developers can explore the model and its capabilities on the Hugging Face Hub.
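
Downloading the repository files is one way to start exploring. The sketch below assumes the `huggingface_hub` Python package is installed and calls its `snapshot_download` function with the repository ID listed under Sources; the wrapper function name and the local folder name are illustrative choices. It only fetches the files; how to run inference is described on the model card and is not reproduced here.

```python
# Sketch: fetch the Step-Audio 2 mini repository from the Hugging Face Hub.
# Assumes `pip install huggingface_hub`; running it downloads the full repo, weights included.
REPO_ID = "stepfun-ai/Step-Audio-2-mini"


def fetch_step_audio_2_mini(local_dir: str = "Step-Audio-2-mini") -> str:
    """Download every file in the model repo and return the local folder path."""
    # Imported lazily so this module can be imported without the dependency installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    path = fetch_step_audio_2_mini()
    print(f"Model files saved to {path}")
```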

Sources

  • stepfun-ai/Step-Audio-2-mini (Hugging Face)



Specs

Parameters: not listed
Context window: not listed
License: Apache 2.0
Downloads: 2.9K

Modalities

Any-to-Any · Text → Speech · Speech → Text
