The Open Weights

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

Ovis-U1-3B Unifies Image Understanding and Generation

The new 3-billion-parameter model from AIDC-AI combines vision-language understanding and image generation into a single 'any-to-any' framework.

Jun 28, 2025
Notable · Apache 2.0
AIDC-AI · Any-to-Any

The field of open-source multimodal AI has a flexible new entry with the release of Ovis-U1-3B, a 3-billion-parameter model from research group AIDC-AI. Released under a permissive Apache 2.0 license, Ovis aims to bridge the gap between models that understand images and those that create them.

Unlike specialized models that handle either vision-language tasks or text-to-image generation, Ovis-U1-3B is designed as a unified, "any-to-any" system. This means it can accept various combinations of text and images as input to produce either text or images as output. The model's capabilities include standard tasks like visual question answering and image captioning, but also extend to text-to-image generation and instruction-based image editing within the same architecture.
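To make the "any-to-any" idea concrete, the sketch below tabulates the four tasks named above by their input and output modalities. It is a hypothetical illustration only: the `Modality` and `Task` enums, the `SIGNATURES` table, and the `output_modality` function are our own names, not part of Ovis-U1-3B's actual interface, which this article does not document.

```python
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"


class Task(Enum):
    # Hypothetical labels for the capabilities described in the article.
    VISUAL_QA = "visual_question_answering"
    CAPTIONING = "image_captioning"
    TEXT_TO_IMAGE = "text_to_image"
    IMAGE_EDITING = "image_editing"


# Hypothetical table: the exact inputs each task expects and the modality it produces.
SIGNATURES: dict[Task, tuple[frozenset[Modality], Modality]] = {
    Task.VISUAL_QA: (frozenset({Modality.IMAGE, Modality.TEXT}), Modality.TEXT),
    Task.CAPTIONING: (frozenset({Modality.IMAGE}), Modality.TEXT),
    Task.TEXT_TO_IMAGE: (frozenset({Modality.TEXT}), Modality.IMAGE),
    Task.IMAGE_EDITING: (frozenset({Modality.IMAGE, Modality.TEXT}), Modality.IMAGE),
}


def output_modality(task: Task, inputs: set[Modality]) -> Modality:
    """Check that the inputs match the task's signature and return its output modality."""
    required, produced = SIGNATURES[task]
    if set(inputs) != required:
        raise ValueError(f"{task.value} expects {sorted(m.value for m in required)}")
    return produced
```

In this sketch, `output_modality(Task.IMAGE_EDITING, {Modality.IMAGE, Modality.TEXT})` returns `Modality.IMAGE`, while the same inputs under `Task.VISUAL_QA` return `Modality.TEXT`. The inputs alone do not determine the output; the instruction has to say which task is meant.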

According to the project's release notes, Ovis is built on a pre-trained language model and a diffusion-based image generator, with a shared interface that manages its diverse set of tasks. This integrated design lets it handle complex instructions that involve both analyzing and modifying an image in a single turn.

The significance of Ovis lies in its versatility at a relatively compact size. By combining capabilities that usually live in separate models, it offers a foundation for more integrated, conversational AI assistants. For developers, a single model that both interprets and generates images could simplify the toolchain for applications that need to do both. The weights are available now on Hugging Face.

Sources

  • AIDC-AI/Ovis-U1-3B (Hugging Face)

Specs

Parameters: 3B
Context window: not listed
License: Apache 2.0
Downloads: 441

Modalities

Any-to-Any · Vision-Language · Text → Image

More in Any-to-Any

MiniMax · Vision-Language
MiniMax Releases M3, a Multimodal MoE Model
The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.
Jun 2, 2026

Google DeepMind · Any-to-Any
Google Releases Gemma 4 12B Multimodal Model
The new 12-billion-parameter open model from DeepMind introduces a unified 'any-to-any' architecture for advanced multimodal tasks.
May 23, 2026

Google DeepMind · Any-to-Any
Google Releases Gemma 4, a 12B 'Any-to-Any' Model
The new 12-billion-parameter model from Google DeepMind is designed to handle a flexible mix of data types, moving beyond traditional text and image inputs.
May 23, 2026