The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

Xiaomi's MiMo-Audio 7B Tackles Complex Speech Tasks

This new instruction-tuned model from Xiaomi can handle a flexible combination of audio and text inputs and outputs, from transcription to voice synthesis.

Sep 18, 2025
Notable · MIT · Xiaomi · Any-to-Any

Xiaomi has released MiMo-Audio-7B-Instruct, a 7-billion-parameter instruction-tuned model designed to handle a wide range of speech and audio tasks. It is published under the permissive MIT license, which allows developers to use, modify, and redistribute it, including commercially. That makes it a notable open-source contribution from the major electronics company.

The key innovation of MiMo-Audio is its "any-to-any" architecture. Unlike specialized models that perform a single function, MiMo-Audio is a generalist system that can process and generate audio and text in flexible combinations. This allows it to act as a unified solution for multiple distinct tasks.

A Unified Model for Speech AI

According to its release materials on Hugging Face, the instruction-tuned model is capable of performing a variety of functions, including:

  • Speech Recognition (ASR): Transcribing spoken audio to text.
  • Text-to-Speech (TTS): Synthesizing speech from written text.
  • Speech-to-Speech Translation (S2ST): Translating spoken language directly into another spoken language.
  • Audio Captioning and Generation: Describing sounds or creating audio from text prompts.
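
One way to read the list above is as a set of input-to-output modality pairs served by a single model. The sketch below only illustrates that reading; it does not load or call MiMo-Audio. The `Modality` and `Task` types, the helper functions, and the modality assigned to each task are our own assumptions, not part of Xiaomi's release.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    AUDIO = "audio"
    TEXT = "text"


@dataclass(frozen=True)
class Task:
    name: str
    source: Modality  # what the model receives
    target: Modality  # what the model produces


# Task names come from the article's list; the modality labels are our
# interpretation of each task, not something the release materials spell out.
TASKS = [
    Task("speech recognition (ASR)", Modality.AUDIO, Modality.TEXT),
    Task("text-to-speech (TTS)", Modality.TEXT, Modality.AUDIO),
    Task("speech-to-speech translation (S2ST)", Modality.AUDIO, Modality.AUDIO),
    Task("audio captioning", Modality.AUDIO, Modality.TEXT),
    Task("audio generation", Modality.TEXT, Modality.AUDIO),
]


def tasks_for(source: Modality, target: Modality) -> list[str]:
    """Return the listed tasks that map `source` input to `target` output."""
    return [t.name for t in TASKS if t.source is source and t.target is target]


def covered_pairs() -> set[tuple[Modality, Modality]]:
    """Return every (input, output) modality pair the listed tasks touch."""
    return {(t.source, t.target) for t in TASKS}


if __name__ == "__main__":
    for source, target in sorted(covered_pairs(), key=lambda p: (p[0].value, p[1].value)):
        names = ", ".join(tasks_for(source, target))
        print(f"{source.value} -> {target.value}: {names}")
```

Seen this way, the listed tasks span audio-to-text, text-to-audio, and audio-to-audio, which is the sense in which a single "any-to-any" model can stand in for several single-purpose ones.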

This flexibility makes MiMo-Audio a candidate foundation for voice-enabled applications that would otherwise chain separate single-purpose models for recognition, translation, and synthesis. By releasing a general-purpose audio model under an open license, Xiaomi gives the open-source AI community a building block that can be run and modified locally, and an alternative to proprietary speech APIs.

Sources

  • XiaomiMiMo/MiMo-Audio-7B-Instruct (Hugging Face)

Specs

Parameters: 7B
Context window: not listed
License: MIT
Downloads: 44.9K

Modalities

Any-to-Any, Text → Speech, Speech → Text
