The Open Weights
© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

HKUST Releases Audio-Omni, a Unified Audio Model

The new diffusion-based model handles speech, music, and general audio tasks like conversion and editing within a single, versatile framework.

Mar 27, 2026
Notable · CC BY-NC 4.0
HKUSTAudio · Any-to-Any
Audio-Omni

Researchers from the Hong Kong University of Science and Technology (HKUST) have released Audio-Omni, a new model that aims to unify a wide range of audio generation tasks. Unlike specialized models designed for a single purpose, Audio-Omni is an "any-to-any" system, capable of handling diverse audio inputs and outputs.

The model is built on a diffusion-based architecture, which allows it to generate high-fidelity audio by progressively refining noise into a coherent signal. This single framework is designed to understand and process various audio modalities, from human speech to complex musical compositions and environmental sounds, treating them all as interchangeable data types.
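To make "progressively refining noise into a coherent signal" concrete, here is a toy sketch in Python. It is not Audio-Omni's code and says nothing about its actual architecture or sampler. All names (`make_target`, `toy_denoiser`, `refine`) are hypothetical, and the "denoiser" cheats by knowing the clean signal, whereas a real diffusion model would learn to predict the noise from training data.

```python
import math
import random


def make_target(n=64, freq=4.0):
    """A clean stand-in 'audio' signal: one sine wave sampled at n points."""
    return [math.sin(2 * math.pi * freq * i / n) for i in range(n)]


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def toy_denoiser(x, target):
    """Assumption: stands in for a trained network that predicts the noise in x.
    Here it simply subtracts the known target, which a real model cannot do."""
    return [xi - ti for xi, ti in zip(x, target)]


def refine(target, steps=50, seed=0):
    """Start from Gaussian noise and remove a growing share of the
    predicted noise at each step, recording the error after every step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]
    errors = []
    for step in range(steps):
        predicted_noise = toy_denoiser(x, target)
        frac = 1.0 / (steps - step)  # the last step removes all remaining noise
        x = [xi - frac * ni for xi, ni in zip(x, predicted_noise)]
        errors.append(rmse(x, target))
    return x, errors


if __name__ == "__main__":
    signal = make_target()
    _, errs = refine(signal)
    print("error after first step:", errs[0])
    print("error after last step:", errs[-1])
```

Running it shows the error shrinking step by step as the sample moves from pure noise toward the sine wave, which is the intuition behind iterative refinement in diffusion models.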

A Generalist Approach to Audio

Audio-Omni's versatility allows it to perform a broad set of tasks that would typically require multiple different models. As detailed on its Hugging Face repository, its key capabilities include:

  • Conversion: Transforming speech to music, music to speech, or one style of music to another.
  • Generation: Creating music or speech from text prompts.
  • Editing: Modifying existing audio, such as separating stems or in-painting missing sections.
  • Continuation: Extending an existing audio clip in a consistent style.

This release is another step toward generalized foundation models for audio. By consolidating tasks that usually need separate models into one, Audio-Omni suggests that audio generation tooling could become less fragmented. The model is available for research and non-commercial use under a CC BY-NC 4.0 license.

Sources

  • HKUSTAudio/Audio-Omni (Hugging Face)


Specs

  • Parameters: not listed
  • Context window: not listed
  • License: CC-BY-NC-4.0
  • Downloads: 0

Modalities

  • Any-to-Any
  • Music
  • Text → Speech
