The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

Refreshed every 12 hours

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

Chronological

Release timeline

Every release we've tracked, day by day.

Wednesday, June 17, 2026

1 release

Zhipu AI/Text / LLM · Major release

Zhipu AI Releases MIT-Licensed GLM-5.2 MoE Model

The new bilingual model from the Chinese AI firm uses a Mixture-of-Experts architecture and sparse attention under a fully permissive license.

Jun 17, 2026
Text / LLM · Reasoning
GLM-5.2
Friday, June 12, 2026

1 release

Weibo AI/Reasoning

Weibo AI Releases VibeThinker-3B, a Compact Reasoning Model

The new 3-billion-parameter model from the Chinese tech giant focuses on challenging benchmarks in mathematics, coding, and graduate-level questions.

Jun 12, 2026
Reasoning · Text / LLM
VibeThinker-3B
Thursday, June 11, 2026

2 releases

Moonshot AI/Code · Major release

Moonshot AI Releases Kimi, a Multimodal Coding Model

The new Mixture-of-Experts model from the Chinese AI company can generate code while also understanding visual inputs, a rare combination in open models.

Jun 11, 2026
Code · Vision-Language
Kimi-K2.7-Code
Zyphra/Text → Speech

Zyphra Releases Open-Source Zonos 2 TTS Model

The new text-to-speech model offers a commercially permissive alternative for developers in a field still dominated by closed-source APIs.

Jun 11, 2026
Text → Speech
Zonos 2
Tuesday, June 9, 2026

2 releases

Google DeepMind/Text / LLM

Google Releases Open-Source DiffusionGemma 26B Model

The new 26B-parameter model from DeepMind uses a diffusion-based architecture, a technique more common in image generation, to produce text.

Jun 9, 2026
Text / LLM · Vision-Language
DiffusionGemma 26B-A4B Instruct
Zhipu AI/Image → Video

Zhipu AI Releases SCAIL-2 for Character Animation

The new open-source diffusion model from the company's research arm generates video clips from a single character image and a sequence of poses.

Jun 9, 2026
Image → Video
SCAIL-2
Friday, June 5, 2026

1 release

Cohere/Code

Cohere Releases North-Mini-Code, an Open MoE Model

The new Apache 2.0-licensed model is designed for code generation and agentic chat applications, using a Mixture-of-Experts architecture for efficiency.

Jun 5, 2026
Code · Text / LLM
North-Mini-Code 1.0
Thursday, June 4, 2026

1 release

Boson AI/Text → Speech

Boson AI's Higgs Audio v3 Offers Expressive, Multilingual TTS

The new 4-billion-parameter text-to-speech model is available for non-commercial use, promising fine-grained control over vocal delivery.

Jun 4, 2026
Text → Speech
Higgs Audio v3 TTS 4B
Tuesday, June 2, 2026

2 releases

MiniMax/Vision-Language · Major release

MiniMax Releases M3, a Multimodal MoE Model

The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.

Jun 2, 2026
Vision-Language · Any-to-Any
MiniMax-M3
JD/Text → Video

JD.com Enters Open-Source AI Video with JoyAI-Echo

The Chinese e-commerce giant has released a new model capable of generating long-form, multi-shot videos with synchronized audio from text prompts.

Jun 2, 2026
Text → Video
JoyAI-Echo
Saturday, May 30, 2026

1 release

Ideogram/Text → Image

Ideogram 4.0: A 9.3B Open-Weight Text-to-Image Model

The new 9.3-billion-parameter model uses a Diffusion Transformer architecture and excels at rendering coherent text within generated images.

May 30, 2026
Text → Image
Ideogram 4.0
Friday, May 29, 2026

1 release

Baidu/Text → Video

Baidu Releases NAVA for Text-to-Video with Audio

The new model from the Chinese tech giant uses a Multimodal Diffusion Transformer to generate synchronized audio and video from text or image prompts.

May 29, 2026
Text → Video
NAVA
Monday, May 25, 2026

1 release

OpenMOSS/Text → Speech

MOSS-TTS Aims for More Robust Speech Synthesis

A new text-to-speech model introduces 'delay-pattern decoding' to solve common word skipping and repetition errors in parallel generation.

May 25, 2026
Text → Speech
MOSS-TTS v1.5
Saturday, May 23, 2026

2 releases

Google DeepMind/Any-to-Any · Major release

Google Releases Gemma 4 12B Multimodal Model

The new 12-billion-parameter open model from DeepMind introduces a unified 'any-to-any' architecture for advanced multimodal tasks.

May 23, 2026
Any-to-Any · Vision-Language
Gemma 4 12B
Google DeepMind/Any-to-Any · Major release

Google Releases Gemma 4, a 12B 'Any-to-Any' Model

The new 12-billion-parameter model from Google DeepMind is designed to handle a flexible mix of data types, moving beyond traditional text and image inputs.

May 23, 2026
Any-to-Any · Vision-Language
Gemma 4 12B
Thursday, May 21, 2026

2 releases

NVIDIA/Image → Video

NVIDIA Releases Cosmos3 Image-to-Video World Model

The latest release in NVIDIA's 'world model' research family aims to generate coherent and realistic video from a single static image.

May 21, 2026
Image → Video
Cosmos3 Super Image2Video
MisoLabs/Text → Speech

MisoLabs Debuts MisoTTS, an Open Voice Model

The new text-to-speech system adapts the decoder-only architecture of language models like Llama to generate more natural-sounding speech.

May 21, 2026
Text → Speech
MisoTTS
Tuesday, May 19, 2026

1 release

zhifeixie/Speech → Text

Mega-ASR Improves on Qwen for Speech Recognition

Researcher Zhifei Xie has released a 1.7B-parameter model that refines Alibaba's Qwen3-ASR, showing improved performance on English and Chinese transcription benchmarks.

May 19, 2026
Speech → Text
Mega-ASR
Monday, May 18, 2026

1 release

NVIDIA/Image → Video

NVIDIA Releases SANA, a Camera-Controllable Video Model

The new model, SANA-WM, uses a bidirectional diffusion process to give creators fine-grained control over camera movement and video editing.

May 18, 2026
Image → Video · Text → Video
SANA-WM Bidirectional
Friday, May 15, 2026

2 releases

NVIDIA/Speech → Text

NVIDIA Releases Nemotron-3.5 Streaming ASR Model

The 600-million-parameter model uses a FastConformer architecture for real-time, multilingual speech-to-text applications.

May 15, 2026
Speech → Text
Nemotron 3.5 ASR Streaming 0.6B
ByteDance/Any-to-Any · Major release

ByteDance Releases Lance, a Unified Generative AI Model

The 3-billion-parameter model handles image and video generation, editing, and understanding from a single set of weights under a permissive license.

May 15, 2026
Any-to-Any · Text → Image
Lance
Thursday, May 14, 2026

1 release

SenseTime/Any-to-Any

SenseTime Releases 8B 'Any-to-Any' Infographic Model

The new 8B-parameter SenseNova U1 model from SenseTime is designed for complex multimodal tasks, including the in-conversation generation and editing of infographics.

May 14, 2026
Any-to-Any · Text → Image
SenseNova U1 8B MoT Infographic
Monday, May 11, 2026

2 releases

Lightricks/Image → Video

Lightricks Releases LoRA for AI Lip-Dubbing

The new 'Identity-Control' adapter fine-tunes the company's LTX-2.3 video model to create realistic lip-syncing for dubbing workflows.

May 11, 2026
Image → Video · Text → Video
LTX-2.3
Tencent/Text / LLM

Tencent Releases 1.8B Model for Multilingual Translation

The 1.8-billion-parameter model from the Chinese tech giant is designed for high-quality translation across a wide range of language pairs.

May 11, 2026
Text / LLM
Hunyuan-MT2 1.8B
Wednesday, May 6, 2026

1 release

Supertone/Text → Speech

Supertone Releases On-Device Multilingual TTS Model

The new Supertonic 3 model supports seven languages and is optimized for local inference with the portable ONNX format.

May 6, 2026
Text → Speech
Supertonic 3
Tuesday, April 28, 2026

1 release

NVIDIA/Image Editing

NVIDIA Releases PiD for High-Quality Image Upscaling

The new component is a specialized VAE decoder that works with Alibaba's Z-Image model to enhance super-resolution tasks.

Apr 28, 2026
Image Editing
NVIDIA PiD (Pixel Diffusion Decoder)
Friday, April 24, 2026

1 release

NVIDIA/Any-to-Any

NVIDIA Releases Efficient Nemotron-3 Multimodal MoE

The new 30-billion-parameter Mixture-of-Experts model handles text and images while using only 3 billion active parameters for inference.

Apr 24, 2026
Any-to-Any · Reasoning
Nemotron-3 Nano Omni 30B-A3B Reasoning
Thursday, April 23, 2026

5 releases

Google DeepMind/Any-to-Any

Google Releases Gemma 4 Multimodal Open Model

The new 26-billion-parameter model from DeepMind uses a mixture-of-experts design for greater efficiency and is tuned for assistant-style tasks.

Apr 23, 2026
Any-to-Any · Text / LLM
Gemma 4 26B-A4B Instruct (MoE)
Google DeepMind/Any-to-Any · Major release

Google Releases Multimodal Gemma 4 31B Model

The new 31-billion-parameter model is an instruction-tuned, 'any-to-any' powerhouse released under a permissive Apache 2.0 license.

Apr 23, 2026
Any-to-Any · Text / LLM
Gemma 4 12B
Google DeepMind/Any-to-Any

Google Releases 4B Multimodal Gemma 4 Assistant

The new 4-billion-parameter model is instruction-tuned for 'any-to-any' tasks, handling a flexible mix of data types.

Apr 23, 2026
Any-to-Any · Text / LLM
Gemma 4 E4B-it Assistant
Google DeepMind/Any-to-Any

Google Releases 2B Multimodal Gemma 4 Assistant Model

The new compact model from DeepMind is instruction-tuned for "any-to-any" tasks, capable of processing and generating mixed data types.

Apr 23, 2026
Any-to-Any · Text / LLM
Gemma 4 E2B-it Assistant
Xiaomi/Speech → Text

Xiaomi Releases MiMo Model for Speech Recognition

The new open-source model from the Chinese tech giant offers automatic speech recognition for Mandarin, Cantonese, and English under a permissive MIT license.

Apr 23, 2026
Speech → Text
MiMo-V2.5-ASR
Wednesday, April 22, 2026

4 releases

inclusionAI/Any-to-Any

LLaDA2.0-Uni: A Unified MoE for Vision Tasks

The new open-source model from inclusionAI uses a Mixture-of-Experts architecture to handle multiple vision tasks in a single, diffusion-based system.

Apr 22, 2026
Any-to-Any · Text → Image
LLaDA2.0-Uni
DeepSeek/Text / LLM · Major release

DeepSeek Releases V4-Pro, an Open MoE Contender

The new flagship model combines a Mixture-of-Experts architecture with a permissive MIT license, positioning it for wide commercial adoption.

Apr 22, 2026
Text / LLM · Reasoning
DeepSeek-V4-Pro
DeepSeek/Text / LLM · Major release

DeepSeek Releases V4-Flash, a Fast MIT-Licensed MoE Model

The new Mixture-of-Experts model from the Hangzhou-based AI lab is optimized for fast, efficient conversational AI and carries a fully permissive license.

Apr 22, 2026
Text / LLM · Reasoning
DeepSeek-V4-Flash
SenseTime/Any-to-Any

SenseTime Releases 8B Any-to-Any Multimodal Model

The new SenseNova-U1 model unifies image understanding, generation, and editing within a single 8-billion-parameter framework.

Apr 22, 2026
Any-to-Any · Text → Image
SenseNova-U1-8B-MoT
Tuesday, April 21, 2026

1 release

Qwen · Alibaba/Vision-Language

Alibaba's Qwen Releases Open 27B Vision Model

The new dense model, licensed under Apache 2.0, brings both text and image understanding to the midrange parameter space.

Apr 21, 2026
Vision-Language · Text / LLM
Qwen3.6-27B
Monday, April 20, 2026

1 release

NVIDIA/Any-to-Any

NVIDIA Releases Nemotron-3-Nano Omni-Modal MoE

The new 30-billion-parameter Mixture-of-Experts model handles any combination of modalities with just 3 billion active parameters.

Apr 20, 2026
Any-to-Any · Reasoning
Nemotron-3 Nano Omni 30B-A3B Reasoning
Friday, April 17, 2026

1 release

Resemble AI/Text → Speech

Resemble AI Releases Dramabox Voice Cloning TTS Model

The new text-to-speech model uses a diffusion-transformer architecture for high-quality, expressive audio and one-shot voice cloning.

Apr 17, 2026
Text → Speech
Dramabox TTS
Thursday, April 16, 2026

1 release

IBM/Speech → Text

IBM Releases 2B Granite Model for Multilingual Speech

The new two-billion-parameter model offers transcription capabilities for at least five major languages under a permissive Apache 2.0 license.

Apr 16, 2026
Speech → Text
Granite Speech 4.1 2B
Wednesday, April 15, 2026

1 release

Qwen · Alibaba/Vision-Language · Major release

Qwen Releases 35B Multimodal Mixture-of-Experts Model

The new Qwen3.6-35B-A3B from Alibaba's Qwen team combines vision and language capabilities using an efficient sparse architecture.

Apr 15, 2026
Vision-Language · Text / LLM
Qwen3.6-27B
Tuesday, April 14, 2026

2 releases

Motif Technologies/Text → Video

Motif Releases 2B Open-Source Text-to-Video Model

The new Apache 2.0-licensed model uses a diffusion transformer architecture and offers an open alternative for video generation research.

Apr 14, 2026
Text → Video · Image → Video
Motif-Video-2B
Moonshot AI/Vision-Language · Major release

Moonshot AI Releases Kimi-K2.6 Multimodal Model

The Chinese AI lab has published weights for its new vision-language model, though a restrictive license limits its use to research applications.

Apr 14, 2026
Vision-Language · Text / LLM
Kimi-K2.6
Monday, April 13, 2026

1 release

OpenBMB/Vision-Language

OpenBMB Releases MiniCPM-V for On-Device Vision

The new open-source vision-language model is designed for high-resolution image understanding on mobile and edge devices.

Apr 13, 2026
Vision-Language
MiniCPM-V-4.6
Thursday, April 9, 2026

1 release

MiniMax/Text / LLM

MiniMax Releases M2.7, an MoE Model with FP8 Weights

The new conversational language model from the Chinese AI company uses a Mixture-of-Experts architecture and 8-bit weights, but is released under a restrictive custom license.

Apr 9, 2026
Text / LLM · Reasoning
MiniMax-M2.7
Tuesday, April 7, 2026

1 release

Baidu/Text → Image

Baidu Releases 8B Text-to-Image Model ERNIE-Image

The large diffusion model from the Chinese tech giant is available under the commercially permissive Apache 2.0 license, a notable release for the community.

Apr 7, 2026
Text → Image
ERNIE-Image
Monday, April 6, 2026

1 release

Black Forest Labs/Text → Image

Black Forest Labs Releases Open FLUX.2 Image Decoder

This new component is part of a novel transformer-based architecture for text-to-image generation, released under a permissive Apache 2.0 license.

Apr 6, 2026
Text → Image · Image Editing
FLUX.2 small decoder
Friday, April 3, 2026

2 releases

Zhipu AI/Text / LLM · Major release

Zhipu AI Releases Open-Source GLM-5.1 MoE Model

The new bilingual model from the Chinese AI firm features an efficient Mixture-of-Experts architecture and a fully permissive MIT license.

Apr 3, 2026
Text / LLM · Reasoning
GLM-5.1
OpenBMB/Text → Speech

OpenBMB Releases VoxCPM2 for Expressive TTS

The new diffusion-based model from the OpenBMB research group supports multilingual speech, emotional control, and zero-shot voice cloning.

Apr 3, 2026
Text → Speech
VoxCPM2
Thursday, April 2, 2026

2 releases

OpenMOSS/Text → Speech

MOSS-TTS-Nano Delivers Multilingual Speech at 100M Params

The new open-source model from OpenMOSS-Team generates high-quality speech in multiple languages while maintaining a remarkably small footprint.

Apr 2, 2026
Text → Speech
MOSS-TTS-Nano-100M
Tencent/Vision-Language

Tencent Releases 2B Vision Model for Robotics

The new HY-Embodied 0.5 is a vision-language model designed specifically for multi-object tracking in dynamic, real-world environments.

Apr 2, 2026
Vision-Language
HY-Embodied 0.5
Tuesday, March 31, 2026

2 releases

Tencent/Image → Video

Tencent Releases HY-OmniWeaving for Multi-Image Video

Built on their HunyuanVideo-1.5 architecture, the new model synthesizes video by combining multiple static images and text prompts into a cohesive narrative.

Mar 31, 2026
Image → Video · Text → Video
HY-OmniWeaving
JD/Image Editing

JD.com Releases Open-Source Bilingual Image Editor

The new JoyAI-Image-Edit model allows for instruction-based photo manipulation in both English and Chinese under a permissive Apache 2.0 license.

Mar 31, 2026
Image Editing
JoyAI-Image-Edit
Monday, March 30, 2026

2 releases

KRAFTON/Any-to-Any

KRAFTON Releases 9B Bilingual Speech Model

The gaming giant behind 'PUBG' has released Raon-Speech-9B, a multimodal model for English and Korean speech recognition and synthesis.

Mar 30, 2026
Any-to-Any · Text → Speech
Raon-Speech-9B
k2-fsa/Text → Speech

OmniVoice TTS Offers Zero-Shot Multilingual Voice Cloning

A new open-source text-to-speech model from the k2-fsa project can replicate a voice and generate speech in multiple languages from a single short audio sample.

Mar 30, 2026
Text → Speech
OmniVoice
Friday, March 27, 2026

1 release

HKUSTAudio/Any-to-Any

HKUST Releases Audio-Omni, a Unified Audio Model

The new diffusion-based model handles speech, music, and general audio tasks like conversion and editing within a single, versatile framework.

Mar 27, 2026
Any-to-Any · Music
Audio-Omni
Wednesday, March 25, 2026

1 release

Meituan/Any-to-Any

Meituan Releases LongCat-Next 'Any-to-Any' AI Model

The Chinese tech company has released the weights for a unified model that can process and generate combinations of text, images, audio, and video.

Mar 25, 2026
Any-to-Any · Text / LLM
LongCat-Next
Tuesday, March 24, 2026

1 release

Cohere/Speech → Text

Cohere Releases Top-Ranked Multilingual Transcription Model

The new automatic speech recognition model from Cohere Labs sets a new benchmark on the Hugging Face Open ASR Leaderboard for multilingual performance.

Mar 24, 2026
Speech → Text
Cohere Transcribe 03-2026
Monday, March 23, 2026

1 release

Aratako/Text → Speech

Irodori-TTS v2 Offers Open Japanese Speech Synthesis

The 500-million-parameter model from researcher Aratako provides a high-quality, single-speaker voice under a permissive MIT license.

Mar 23, 2026
Text → Speech
Irodori-TTS-500M v2
Saturday, March 21, 2026

1 release

GAIR/Image → Video

GAIR Releases daVinci-MagiHuman for Video Generation

The new open-source model from the General Artificial Intelligence Research team can create video clips complete with audio from a variety of inputs.

Mar 21, 2026
Image → Video · Text → Video
daVinci-MagiHuman
Wednesday, March 18, 2026

1 release

Baidu/Vision-Language

Baidu Releases Qianfan-OCR for Document Intelligence

The new vision-language model from the Chinese tech giant is designed for complex, multilingual optical character recognition and layout analysis.

Mar 18, 2026
Vision-Language
Qianfan-OCR
Wednesday, March 11, 2026

2 releases

Google DeepMind/Any-to-Any · Major release

Google Releases Gemma 4, a 26B Vision-Language Model

The new open-source model from DeepMind uses a Mixture-of-Experts architecture to handle both text and image inputs efficiently.

Mar 11, 2026
Vision-Language · Text / LLM
Gemma 4 12B
Google DeepMind/Any-to-Any · Major release

Google Releases Multimodal Gemma 4 31B Model

The new 31-billion-parameter model is instruction-tuned and can process both text and images, marking a significant expansion for the Gemma family.

Mar 11, 2026
Vision-Language · Text / LLM
Gemma 4 12B
Monday, March 9, 2026

2 releases

Black Forest Labs/Text → Image · Major release

Black Forest Labs Releases 9B FLUX.2 klein Image Model

The new open-weight model offers a more compact, distilled version of the advanced FLUX architecture for text-to-image and editing tasks.

Mar 9, 2026
Text → Image · Image Editing
FLUX.2 Klein 9B
Fish Audio/Text → Speech

Fish Audio's S2-Pro Brings Expressive TTS to Open Source

The new text-to-speech model can follow natural language instructions to control tone, clone voices from short clips, and speak multiple languages.

Mar 9, 2026
Text → Speech
Fish Audio S2-Pro
Wednesday, March 4, 2026

1 release

Lightricks/Image → Video

Lightricks LTX-2.3 Generates Video and Audio Together

The new model, based on Stable Video Diffusion, can create video and a corresponding soundtrack simultaneously from text, image, or audio prompts.

Mar 4, 2026
Image → Video · Text → Video
LTX-2.3
Monday, March 2, 2026

6 releases

NVIDIA/Vision-Language

NVIDIA's New 3B VLM Pinpoints Objects in Images

The new 3-billion-parameter model, based on the company's Eagle architecture, is designed for high-precision visual grounding tasks.

Mar 2, 2026
Vision-Language
LocateAnything-3B
Google DeepMind/Any-to-Any · Major release

Google Releases Compact Gemma 4 E2B Multimodal Model

The new 2-billion-parameter model from Google DeepMind brings efficient image-and-text understanding to the open-source Gemma family.

Mar 2, 2026
Any-to-Any · Vision-Language
Gemma 4 E2B
Google DeepMind/Any-to-Any · Major release

Google's Gemma 4 Arrives with Any-to-Any Multimodal Skills

The new 2-billion-parameter model from DeepMind can process text, vision, and audio, making it a versatile and efficient foundation for developers.

Mar 2, 2026
Any-to-Any · Vision-Language
Gemma 4 E2B
Google DeepMind/Any-to-Any

Google Releases Gemma 4 E4B, a 4B Multimodal Model

The new 4-billion-parameter vision-language model brings image and text understanding to Google's popular open-source family.

Mar 2, 2026
Any-to-Any · Vision-Language
Gemma 4 E4B
Google DeepMind/Any-to-Any · Major release

Google's Gemma 4 Debuts with Any-to-Any Multimodality

The new 4-billion-parameter model from Google DeepMind is designed for versatile input and output, handling text, images, and other data types.

Mar 2, 2026
Any-to-Any · Vision-Language
Gemma 4 E4B
Xiaomi/Image Editing

Xiaomi Releases Bilingual Image Editing Model FireRed 1.1

The new open-source model from Xiaomi's FireRedTeam leverages the Qwen-Image-Edit pipeline to offer instruction-based image editing in both English and Chinese.

Mar 2, 2026
Image Editing
FireRed Image Edit 1.1
Saturday, February 28, 2026

1 release

Qwen · Alibaba/Vision-Language

Alibaba's Qwen Releases Compact 0.8B Vision Model

The new 800-million-parameter model is the smallest in the Qwen3.5 family, designed for efficient multimodal tasks on consumer-grade hardware.

Feb 28, 2026
Vision-Language · Text / LLM
Qwen3.5-0.8B
Friday, February 27, 2026

3 releases

IBM/Speech → Text

IBM Releases 1B Granite Model for Multilingual Speech

The new Apache 2.0-licensed model is part of the company's Granite family and aims to provide high-quality speech-to-text across several languages.

Feb 27, 2026
Speech → Text
Granite 4.0 1B Speech
Qwen · Alibaba/Vision-Language

Alibaba's Qwen Team Releases 4B Vision-Language Model

The new Qwen3.5-4B model combines text and image understanding in a compact, permissively licensed package for developers.

Feb 27, 2026
Vision-Language · Text / LLM
Qwen3.5-122B-A10B
Qwen · Alibaba/Vision-Language

Qwen Releases 9B Multimodal Model in New 3.5 Series

The new open-source vision-language model from Alibaba's Qwen team offers strong performance in a compact, Apache 2.0-licensed package.

Feb 27, 2026
Vision-Language · Text / LLM
Qwen3.5-122B-A10B
Tuesday, February 24, 2026

3 releases

Qwen · Alibaba/Vision-Language · Major release

Qwen Releases Flagship 122B Multimodal MoE Model

The new Qwen3.5-122B-A10B combines a massive parameter count with an efficient Mixture-of-Experts architecture for advanced vision and language tasks.

Feb 24, 2026
Vision-Language · Text / LLM
Qwen3.5-122B-A10B
Qwen · Alibaba/Vision-Language · Major release

Qwen Releases 27B Vision Model with Long Context

The new model from Alibaba's Qwen team combines multimodal understanding with a 131K token context window under a permissive Apache 2.0 license.

Feb 24, 2026
Vision-Language · Text / LLM
Qwen3.5-122B-A10B
Qwen · Alibaba/Vision-Language

Qwen Releases Efficient 35B Multimodal MoE Model

The new Qwen3.5-35B-A3B model from Alibaba combines vision and language capabilities with a resource-friendly Mixture-of-Experts design.

Feb 24, 2026
Vision-Language · Text / LLM
Qwen3.5-122B-A10B
Monday, February 16, 2026

2 releases

Qwen · Alibaba/Vision-Language · Major release

Qwen Releases Flagship 397B Multimodal MoE

The new open-source model from Alibaba uses a Mixture-of-Experts architecture to balance massive scale with efficient inference.

Feb 16, 2026
Vision-Language · Text / LLM
Qwen3.5-122B-A10B
Hume AI/Text → Speech

Hume AI Releases 3B Multilingual Text-to-Speech Model

The new model, Tada-3B-ML, is designed for fine-grained control over vocal expression across more than 10 languages.

Feb 16, 2026
Text → Speech
Tada-3B-ML
Thursday, February 12, 2026

2 releases

OpenAI/Text → Speech

Kani-TTS-2 Offers New Open-Source Voice Generation

An independent researcher has released a new English text-to-speech model under a permissive license, built on a modern generative foundation.

Feb 12, 2026
Text → Speech
Kani-TTS-2 (English)
MiniMax/Text / LLM

MiniMax Releases M2.5 Mixture-of-Experts Model

The Chinese AI company's first open-weight release uses an efficient FP8 data type but comes with a restrictive, non-commercial license.

Feb 12, 2026
Text / LLM
MiniMax-M2.5
Wednesday, February 11, 2026

1 release

Zhipu AI/Text / LLM · Major release

Zhipu AI Releases Open-Source GLM-5 MoE Model

The new Mixture-of-Experts model from the Chinese AI company combines an advanced architecture with a fully permissive MIT license for commercial use.

Feb 11, 2026
Text / LLM · Reasoning
GLM-5
Tuesday, February 10, 2026

2 releases

inclusionAI/Any-to-Any

inclusionAI's Ming 2.0 Tackles Any-to-Any Multimodality

The new open-source Mixture-of-Experts model can process and generate content across text, images, and audio in any combination.

Feb 10, 2026
Any-to-Any
Ming-flash-omni 2.0
Nanbeige/Text / LLM

Nanbeige Releases 3B Chinese-Enhanced Language Model

The new Llama-based model was trained from scratch on 3.5 trillion tokens of Chinese and English data to enhance its bilingual capabilities.

Feb 10, 2026
Text / LLM
Nanbeige4.1-3B
Friday, February 6, 2026

2 releases

OpenMOSS/Text → Speech

MOSS-TTS: A New Multilingual Text-to-Speech Model

The new system from the OpenMOSS Team uses a novel 'delay-pattern' architecture to generate natural-sounding speech in Chinese, English, and Japanese.

Feb 6, 2026
Text → Speech
MOSS-TTS
Soul-AILab/Music

Soul-AILab Releases Zero-Shot Singing Voice Model

The new model, SoulX-Singer, can replicate a singing voice from a short audio sample and supports both English and Chinese under a permissive license.

Feb 6, 2026
Music · Text → Speech
SoulX-Singer
Tuesday, February 3, 2026

1 release

OpenBMB/Any-to-Any

OpenBMB Releases 'Any-to-Any' Multimodal Model

The new MiniCPM-o 4.5 model from the open-source research group can process and generate interleaved combinations of images, text, and audio.

Feb 3, 2026
Any-to-Any · Vision-Language
MiniCPM-o 4.5
Monday, February 2, 2026

1 release

OpenBMB/Any-to-Any

MiniCPM-o 4.5 Offers 'Any-to-Any' Multimodal AI

The new model from OpenBMB supports mixed-modality inputs and outputs, from text and images to audio and video, in a single efficient package.

Feb 2, 2026
Any-to-Any · Vision-Language
MiniCPM-o 4.5
Friday, January 30, 2026

2 releases

Qwen · Alibaba/Code

Qwen Releases Coder-Next, A New Open MoE Coding Model

The new model from Alibaba's Qwen team uses a Mixture-of-Experts architecture and is released under the commercially friendly Apache 2.0 license.

Jan 30, 2026
Code · Text / LLM
Qwen3-Coder-Next
Zhipu AI/Vision-Language

Zhipu AI Releases Multilingual GLM-OCR Vision Model

The new vision-language model from the creators of the GLM series is specialized for recognizing and extracting text from images across multiple languages.

Jan 30, 2026
Vision-Language
GLM-OCR
Wednesday, January 28, 2026

7 releases

Black Forest Labs/Text → Image

Black Forest Labs Releases FLUX.2 Klein 9B

The open-weight text-to-image model brings a 9-billion-parameter base release to the FLUX.2 Klein family.

Jan 28, 2026
Text → Image
FLUX.2 Klein 9B
OpenMOSS/Any-to-Any

OpenMOSS Releases MOVA, a 720p Multimodal Video Generator

The new open model can generate high-definition video with synchronized audio from a flexible combination of text and image prompts.

Jan 28, 2026
Any-to-Any · Image → Video
MOVA 720p
Baidu/Vision-Language

Baidu Releases Open VLM for Advanced Document OCR

The new PaddleOCR-VL model is built to parse not just text, but also the tables, formulas, and page layouts found in complex documents.

Jan 28, 2026
Vision-Language
PaddleOCR-VL-1.5
OpenMOSS/Image → Video

OpenMOSS Releases MOVA for Joint Video and Audio Gen

The new model generates 360p video from text or images and creates corresponding audio tracks simultaneously, a notable step for integrated audiovisual synthesis.

Jan 28, 2026
Image → Video · Text → Video
MOVA-360p
Qwen · Alibaba/Speech → Text

Qwen Releases 0.6B Model for Audio-Text Alignment

The new open-source tool, based on the Qwen3 architecture, precisely synchronizes audio recordings with their corresponding text transcripts.

Jan 28, 2026
Speech → Text
Qwen3 ForcedAligner 0.6B
Qwen · Alibaba/Speech → Text

Qwen3 Family Expands into Speech Recognition

Alibaba's Qwen team has released a new 1.7-billion-parameter model designed specifically for automatic speech recognition.

Jan 28, 2026
Speech → Text
Qwen3-ASR-1.7B
Qwen · Alibaba/Speech → Text

Qwen Open-Sources Compact Model for Speech Recognition

The new 600-million-parameter Qwen3-ASR model is designed for efficient, high-quality audio transcription under a permissive license.

Jan 28, 2026
Speech → Text
Qwen3-ASR-0.6B
Tuesday, January 27, 2026

1 release

DeepSeek/Vision-Language

DeepSeek-OCR-2 Tackles Multilingual Document AI

The new open vision-language model is designed to extract text and understand structure from complex, multilingual documents.

Jan 27, 2026
Vision-Language
DeepSeek-OCR-2
Monday, January 26, 2026

1 release

robbyant/Image → Video

Lingbot-World Animates Images with Camera Control

The new open-source world model from researcher robbyant generates short video clips from a single image, giving users control over the virtual camera path.

Jan 26, 2026
Image → Video
Lingbot World Base Cam
Friday, January 23, 2026

1 release

Qwen · Alibaba/Text → Image

Alibaba's Qwen Team Releases Z-Image Diffusion Model

The makers of the popular Qwen language models have published their first open-source text-to-image generator with a permissive Apache 2.0 license.

Jan 23, 2026
Text → Image
Z-Image
Thursday, January 22, 2026

1 release

OpenMOSS/Text → Speech

LuxTTS Delivers Lightweight, Open-Source Speech Synthesis

The new text-to-speech model is optimized for the ONNX runtime, making it a promising option for efficient, on-device audio generation.

Jan 22, 2026
Text → Speech
LuxTTS
LuxTTS
Wednesday, January 21, 2026

6 releases

Mistral AI/Speech → Text

Mistral Enters Speech AI with Voxtral Mini Model

The company, known for its powerful text models, has released its first open-source speech recognition system designed for real-time, multilingual transcription.

Jan 21, 2026
Speech → Text
Voxtral Mini 4B Realtime
Voxtral Mini 4B Realtime
Microsoft/Speech → Text

Microsoft Releases VibeVoice for Speech Transcription

The new open-source automatic speech recognition model handles multilingual transcription and speaker identification out of the box.

Jan 21, 2026
Speech → Text
VibeVoice ASR
VibeVoice ASR
Qwen · Alibaba/Text → Speech

Qwen Releases Open-Source Voice Cloning Model

The new 600-million-parameter Qwen3-TTS model can generate speech in multiple languages and clone voices from short audio clips.

Jan 21, 2026
Text → Speech
Qwen3-TTS 0.6B Base
Qwen3-TTS 0.6B Base
Qwen · Alibaba/Text → Speech

Qwen Releases a Compact Custom-Voice TTS Model

The new 600-million-parameter model from Alibaba's Qwen team can clone voices from short audio clips for multilingual speech synthesis.

Jan 21, 2026
Text → Speech
Qwen3-TTS-12Hz-0.6B CustomVoice
Qwen3-TTS-12Hz-0.6B CustomVoice
Qwen · Alibaba/Text → Speech

Qwen Releases Open 1.7B Custom Voice Synthesis Model

Alibaba's Qwen team has released a new text-to-speech model capable of cloning voices from just a few seconds of audio.

Jan 21, 2026
Text → Speech
Qwen3-TTS 1.7B CustomVoice
Qwen3-TTS 1.7B CustomVoice
Qwen · Alibaba/Text → Speech

Qwen Unveils Open Model for Custom Voice Synthesis

The new 1.7-billion-parameter text-to-speech model from Alibaba's Qwen team can generate novel voices from short audio prompts.

Jan 21, 2026
Text → Speech
Qwen3-TTS-12Hz-1.7B-VoiceDesign
Qwen3-TTS-12Hz-1.7B-VoiceDesign
Monday, January 19, 2026

1 release

Zhipu AI/Text / LLM

Zhipu AI Releases GLM-4.7-Flash MoE Model

The new Mixture-of-Experts model from the Beijing-based AI company is optimized for speed and released under the permissive MIT license.

Jan 19, 2026
Text / LLM
GLM-4.7-Flash
GLM-4.7-Flash
Friday, January 16, 2026

1 release

LightOn/Vision-Language

LightOn Releases OCR-2, a 1B Document AI Model

The new vision model from the Paris-based AI lab uses Mistral architecture to extract text and structure from complex documents like PDFs and forms.

Jan 16, 2026
Vision-Language
LightOnOCR-2 1B
LightOnOCR-2 1B
Wednesday, January 14, 2026

6 releases

Black Forest Labs/Text → Image

Black Forest Labs Releases 9B FLUX.2 Image Model

The new text-to-image model emphasizes speed and efficiency with a novel architecture and FP8 quantization.

Jan 14, 2026
Text → Image · Image Editing
FLUX.2 Klein 9B
FLUX.2 Klein 9B
ekwek/Text → Speech

Soprano TTS Model Leverages Qwen3 Architecture

The new 80-million-parameter text-to-speech model adapts a powerful language model architecture for efficient, open-source audio generation.

Jan 14, 2026
Text → Speech
Soprano-1.1-80M
Soprano-1.1-80M
Black Forest Labs/Text → Image · Major release

Black Forest Labs Releases Open-Source FLUX.2 Klein 4B

The new 4-billion-parameter model is a distilled version of the powerful FLUX.2 architecture, released under a commercially friendly Apache 2.0 license.

Jan 14, 2026
Text → Image · Image Editing
FLUX.2 Klein 4B
FLUX.2 Klein 4B
Black Forest Labs/Text → Image

FLUX.2 Klein: A Compact 4B Open-Source Image Model

The new 4-billion-parameter model from Black Forest Labs offers an efficient, transformer-based alternative to latent diffusion for image generation.

Jan 14, 2026
Text → Image · Image Editing
FLUX.2 Klein 4B
FLUX.2 Klein 4B
Black Forest Labs/Text → Image · Major release

Black Forest Labs Releases 9B FLUX.2 Image Model

The new 9-billion-parameter model uses a Diffusion Transformer architecture, promising higher performance than existing open-source alternatives.

Jan 14, 2026
Text → Image · Image Editing
FLUX.2 Klein 9B
FLUX.2 Klein 9B
Black Forest Labs/Text → Image · Major release

Black Forest Labs Releases New FLUX.2 Image Model

The new 9-billion-parameter text-to-image model uses a novel architecture that operates directly on pixels for faster, more efficient generation.

Jan 14, 2026
Text → Image · Image Editing
FLUX.2 Klein 9B
FLUX.2 Klein 9B
Monday, January 12, 2026

2 releases

Hume AI/Text → Speech

Hume AI Releases TADA 1B for Expressive Speech

The new 1-billion-parameter model combines a Llama 3.2 base with text-to-speech to generate more natural and nuanced audio.

Jan 12, 2026
Text → Speech
TADA 1B
TADA 1B
Google DeepMind/Text / LLM

Google Releases TranslateGemma for Open Translation

The new 4B-parameter model is an instruction-tuned variant of Gemma, designed specifically for high-quality multilingual translation tasks.

Jan 12, 2026
Text / LLM
TranslateGemma 4B IT
TranslateGemma 4B IT
Sunday, January 11, 2026

1 release

OpenMOSS/Text → Speech

OpenMOSS Releases KugelAudio for European Languages

The new text-to-speech model uses a hybrid diffusion and autoregressive architecture for high-quality, multilingual synthesis.

Jan 11, 2026
Text → Speech
KugelAudio-0-open
KugelAudio-0-open
Thursday, January 8, 2026

1 release

Zhipu AI/Text → Image

Zhipu AI Releases Open, Bilingual GLM-Image Model

The new text-to-image model is fluent in both Chinese and English, built on the CogView2 architecture and released under a permissive MIT license.

Jan 8, 2026
Text → Image
GLM-Image
GLM-Image
Wednesday, January 7, 2026

1 release

Google DeepMind/Vision-Language

Google's MedGemma brings open vision AI to medicine

The new 4-billion-parameter vision-language model is specialized for tasks in radiology, pathology, and complex clinical reasoning.

Jan 7, 2026
Vision-Language · Reasoning
MedGemma 1.5 4B IT
MedGemma 1.5 4B IT
Tuesday, January 6, 2026

1 release

Supertone/Text → Speech

Supertone Open-Sources Supertonic 2 Voice Model

The new text-to-speech model from the audio AI company supports English, Korean, and Spanish and comes in the efficient ONNX format for deployment.

Jan 6, 2026
Text → Speech
Supertonic 2
Supertonic 2
Saturday, January 3, 2026

1 release

Lightricks/Image → Video · Major release

Lightricks Releases LTX-2 Multimodal Video Generator

The new diffusion model from the creative app company can generate short video clips from text, images, audio, and even other videos.

Jan 3, 2026
Image → Video · Text → Video
LTX-2
LTX-2
Thursday, January 1, 2026

1 release

Moonshot AI/Vision-Language · Major release

Moonshot AI Releases Kimi K2.5 Multimodal Model

The new vision-language model from the Chinese AI firm uses a Mixture-of-Experts architecture and is now available on Hugging Face.

Jan 1, 2026
Vision-Language · Text / LLM
Kimi K2.5
Kimi K2.5
Tuesday, December 30, 2025

1 release

Qwen · Alibaba/Text → Image

Qwen Releases Bilingual Open-Source Image Model

Alibaba's latest text-to-image generator, Qwen-Image 2512, is optimized for creating visuals from both English and Chinese prompts.

Dec 30, 2025
Text → Image
Qwen-Image 2512
Qwen-Image 2512
Tuesday, December 23, 2025

1 release

Qwen · Alibaba/Any-to-Any

Qwen's Fun-Audio-Chat: An Open Speech-to-Speech LLM

The 8-billion-parameter model from Alibaba's Qwen team understands and generates spoken responses, enabling more natural audio-first applications.

Dec 23, 2025
Any-to-Any · Text → Speech
Fun-Audio-Chat-8B
Fun-Audio-Chat-8B
Saturday, December 20, 2025

1 release

MiniMax/Text / LLM

MiniMax Debuts M2.1, an MoE Model Optimized with FP8

The new Mixture of Experts model from the Chinese AI firm uses 8-bit floating-point precision for a smaller memory footprint and faster inference.

Dec 20, 2025
Text / LLM
MiniMax-M2.1
MiniMax-M2.1
Thursday, December 18, 2025

1 release

Google DeepMind/Speech → Text

Google Releases MedASR for Medical Transcription

The new speech recognition model from DeepMind is trained specifically on medical dictation, aiming for higher accuracy in clinical notes.

Dec 18, 2025
Speech → Text
MedASR
MedASR
Wednesday, December 17, 2025

4 releases

OpenMOSS/Text → Speech

MiraTTS Brings Qwen2 to Bilingual Speech Synthesis

A new text-to-speech model from OpenMOSS leverages the Qwen2 architecture to generate speech in both English and Chinese.

Dec 17, 2025
Text → Speech
MiraTTS
MiraTTS
ekwek/Text → Speech

Soprano-80M: A Tiny TTS Model Based on Qwen3

Developer 'ekwek' has released a compact 80-million-parameter text-to-speech model, notable for its unconventional use of a Qwen3 language model architecture.

Dec 17, 2025
Text → Speech
Soprano-80M
Soprano-80M
Qwen · Alibaba/Image Editing

Qwen Releases Open, Bilingual Image Editing Model

The new diffusion model from Alibaba's team allows for precise, instruction-based image modifications in both English and Chinese.

Dec 17, 2025
Image Editing
Qwen-Image-Edit 2511
Qwen-Image-Edit 2511
NVIDIA/Speech → Text

NVIDIA Releases Streaming Speech-to-Text Model

The 600-million-parameter Nemotron model is designed for real-time English transcription using a cache-aware FastConformer architecture.

Dec 17, 2025
Speech → Text
Nemotron Speech Streaming EN 0.6B
Nemotron Speech Streaming EN 0.6B
Monday, December 15, 2025

1 release

Qwen · Alibaba/Speech → Text

Qwen Releases Compact ASR Model for Streaming Audio

The new Fun-ASR-Nano model from Alibaba's team packs real-time multilingual transcription, speaker diarization, and hotword detection into an efficient package.

Dec 15, 2025
Speech → Text
Fun-ASR-Nano-2512
Fun-ASR-Nano-2512
Saturday, December 13, 2025

1 release

OpenBMB/Image → Video

PersonaLive Model Animates Portraits in Real Time

The new open-source model from OpenBMB uses a diffusion-based architecture to generate expressive video from a single still image.

Dec 13, 2025
Image → Video
PersonaLive
PersonaLive
Friday, December 12, 2025

1 release

Tencent/Image → Video

Tencent's HY-WorldPlay Creates 3D Scenes from One Image

The new model from Tencent's Hunyuan team generates dynamic video and reconstructs 3D environments using a single static picture.

Dec 12, 2025
Image → Video · Text → 3D
HY-WorldPlay
HY-WorldPlay
Thursday, December 11, 2025

1 release

Qwen · Alibaba/Text → Speech

Alibaba Releases CosyVoice 3 for Expressive TTS

The new 500-million-parameter text-to-speech model from the Qwen team offers multilingual voice cloning and emotional control.

Dec 11, 2025
Text → Speech
Fun-CosyVoice3-0.5B
Fun-CosyVoice3-0.5B
Wednesday, December 10, 2025

1 release

Zhipu AI/Text → Speech

Zhipu AI Releases GLM-TTS for Zero-Shot Voice Cloning

This new text-to-speech model can replicate a voice from just a few seconds of audio, using a novel combination of flow matching and reinforcement learning.

Dec 10, 2025
Text → Speech
GLM-TTS
GLM-TTS
Tuesday, December 9, 2025

1 release

Zhipu AI/Speech → Text

Zhipu AI Releases Compact Bilingual Speech Model

The new GLM-ASR-Nano model is designed for efficient automatic speech recognition in both English and Mandarin Chinese.

Dec 9, 2025
Speech → Text
GLM-ASR-Nano-2512
GLM-ASR-Nano-2512
Sunday, December 7, 2025

1 release

Zhipu AI/Vision-Language

Zhipu AI Releases Fast, Open Vision Model GLM-4.6V-Flash

The new model from the GLM-4.6V family offers a fast, MIT-licensed option for developers working with both text and images.

Dec 7, 2025
Vision-Language
GLM-4.6V-Flash
GLM-4.6V-Flash
Friday, December 5, 2025

2 releases

OpenBMB/Text → Speech

VoxCPM 1.5 Brings Open-Source Voice Cloning

The new 500-million-parameter text-to-speech model from OpenBMB supports both English and Chinese and can replicate a voice from a short audio sample.

Dec 5, 2025
Text → Speech
VoxCPM 1.5
VoxCPM 1.5
Meituan/Image Editing

Meituan Releases Open, Bilingual Image Editing Model

The new LongCat-Image-Edit model follows natural language instructions to perform complex photo manipulations in both English and Chinese.

Dec 5, 2025
Image Editing
LongCat-Image-Edit
LongCat-Image-Edit
Thursday, December 4, 2025

2 releases

Baidu/Image → Video

Baidu's Live-Avatar Animates Photos With Audio

The new 14-billion-parameter model uses audio input to generate realistic talking head videos from a single still image.

Dec 4, 2025
Image → Video
Live-Avatar
Live-Avatar
Microsoft/Text → Speech

Microsoft Releases VibeVoice for Real-Time AI Speech

The new 500-million-parameter model is designed for generating natural, long-form speech with very low latency for interactive applications.

Dec 4, 2025
Text → Speech
VibeVoice Realtime 0.5B
VibeVoice Realtime 0.5B
Tuesday, December 2, 2025

1 release

Resemble AI/Text → Speech

Resemble AI Releases Chatterbox Turbo for Open TTS

The new text-to-speech model focuses on performance and offers voice cloning capabilities for English under a permissive MIT license.

Dec 2, 2025
Text → Speech
Chatterbox Turbo
Chatterbox Turbo
Monday, December 1, 2025

1 release

DeepSeek/Text / LLM

DeepSeek-V3.2 Arrives With FP8 Weights, MIT License

The new Mixture-of-Experts model from DeepSeek AI combines an efficient FP8 architecture with a fully permissive license for commercial use.

Dec 1, 2025
Text / LLM
DeepSeek-V3.2
DeepSeek-V3.2
Friday, November 28, 2025

1 release

FlashLabs/Any-to-Any

FlashLabs Releases Chroma-4B, an Any-to-Any Model

The new 4-billion-parameter model handles text, image, and speech inputs and outputs, including direct speech-to-speech translation.

Nov 28, 2025
Any-to-Any
Chroma-4B
Chroma-4B
Tuesday, November 25, 2025

1 release

Qwen · Alibaba/Text → Image

Alibaba Releases Z-Image-Turbo, A Fast Open Image Model

The new text-to-image model from the team behind Qwen uses a diffusion transformer to generate high-resolution images in just a single step.

Nov 25, 2025
Text → Image
Z-Image-Turbo
Z-Image-Turbo
Saturday, November 22, 2025

1 release

Black Forest Labs/Text → Image · Major release

Black Forest Labs Releases Open-Source FLUX.2 Image Model

The developer preview of the next-generation text-to-image architecture promises significant architectural improvements over its predecessor.

Nov 22, 2025
Text → Image · Image Editing
FLUX.2 [dev]
FLUX.2 [dev]
Tuesday, November 18, 2025

2 releases

Tencent/Text → Video · Major release

Tencent Releases HunyuanVideo 1.5 Generation Model

The new diffusion model generates short video clips from text and image prompts, adding another major player to the open video space.

Nov 18, 2025
Text → Video · Image → Video
HunyuanVideo 1.5
HunyuanVideo 1.5
Tencent/Vision-Language

Tencent Releases 1B Parameter HunyuanOCR Model

The new vision-language model from Tencent Hunyuan offers a compact, end-to-end solution for optical character recognition.

Nov 18, 2025
Vision-Language
HunyuanOCR
HunyuanOCR
Monday, November 17, 2025

1 release

Mistral AI/Text → Speech

Mistral AI Releases Voxtral, an Open-Source TTS Model

The French AI leader expands beyond large language models with a new, 4-billion-parameter model for generating multilingual speech.

Nov 17, 2025
Text → Speech
Voxtral 4B TTS
Voxtral 4B TTS
Saturday, November 15, 2025

1 release

Nari Labs/Text → Speech

Nari Labs Releases Dia2-2B, an Open Voice Cloning Model

The 2-billion-parameter text-to-speech model can clone voices from a short audio sample and is available under an Apache 2.0 license.

Nov 15, 2025
Text → Speech
Dia2-2B
Dia2-2B
Friday, November 7, 2025

1 release

Baidu/Vision-Language

Baidu Releases Open Vision-Language MoE Model

The new ERNIE 4.5 VL model brings advanced multimodal reasoning to the open-source community with an efficient Mixture-of-Experts architecture.

Nov 7, 2025
Vision-Language · Reasoning
ERNIE 4.5 VL 28B A3B Thinking
ERNIE 4.5 VL 28B A3B Thinking
Tuesday, November 4, 2025

1 release

Moonshot AI/Reasoning · Major release

Moonshot AI Releases Kimi-K2 Reasoning Model

The new Mixture-of-Experts model is designed for complex tasks but arrives in a custom compressed format with a restrictive license.

Nov 4, 2025
Reasoning · Text / LLM
Kimi K2 Thinking
Kimi K2 Thinking
Friday, October 31, 2025

1 release

BAAI/Any-to-Any

BAAI Releases Emu3.5, an 'Any-to-Any' Multimodal Model

The new open-source model from the Beijing Academy of Artificial Intelligence (BAAI) unifies text and image understanding and generation into a single architecture.

Oct 31, 2025
Any-to-Any · Vision-Language
Emu3.5
Emu3.5
Thursday, October 30, 2025

1 release

Microsoft/Vision-Language

Microsoft Releases Fara-7B Vision Agent Model

The 7-billion-parameter model is designed to understand and interact with graphical user interfaces, building on Alibaba's open-source Qwen2.5-VL.

Oct 30, 2025
Vision-Language
Fara-7B
Fara-7B
Monday, October 27, 2025

1 release

OpenMOSS/Text → Speech

SoulX-Podcast 1.7B Offers Open Multi-Speaker TTS

The new 1.7-billion-parameter model from OpenMOSS is trained on conversational data to generate natural dialogue in English and Chinese.

Oct 27, 2025
Text → Speech
SoulX-Podcast 1.7B
SoulX-Podcast 1.7B
Friday, October 24, 2025

1 release

Meituan/Text → Video

Meituan Releases Open-Source LongCat-Video Model

The Chinese tech giant has released a new MIT-licensed model capable of generating video from text, images, or by continuing existing clips.

Oct 24, 2025
Text → Video · Image → Video
LongCat-Video
LongCat-Video
Thursday, October 23, 2025

1 release

Meituan/Any-to-Any

Meituan Debuts LongCat-Flash-Omni, an Any-to-Any AI Model

The new open-source Mixture-of-Experts model can process and generate any combination of text, images, video, audio, and 3D data.

Oct 23, 2025
Any-to-Any
LongCat-Flash-Omni
LongCat-Flash-Omni
Wednesday, October 22, 2025

2 releases

NVIDIA/Speech → Text

NVIDIA Releases Real-Time Speaker Diarization Model

The new Sortformer-based model is designed for streaming audio, identifying up to four distinct speakers in real time.

Oct 22, 2025
Speech → Text
Streaming Sortformer Diarization 4spk v2.1
Streaming Sortformer Diarization 4spk v2.1
MiniMax/Text / LLM · Major release

MiniMax Releases M2, an Open-Weight MoE for Agents

The Shanghai-based AI startup has released a new Mixture-of-Experts model focused on complex reasoning, coding, and agentic tasks.

Oct 22, 2025
Text / LLM · Reasoning
MiniMax-M2
MiniMax-M2
Tuesday, October 21, 2025

1 release

Datalab/Vision-Language

Datalab Releases Chandra, a New OCR Vision Model

The new vision-language model from Datalab is fine-tuned from Qwen2-VL to specialize in extracting text and structure from complex documents.

Oct 21, 2025
Vision-Language
Chandra OCR
Chandra OCR
Saturday, October 18, 2025

2 releases

Kuaishou/Any-to-Any

Kling Releases UniVideo for Generation and Understanding

The new open-source model combines both video generation and comprehension, a rare dual capability built on the Qwen2.5 vision-language foundation.

Oct 18, 2025
Any-to-Any · Text → Video
UniVideo
UniVideo
Maya Research/Text → Speech

Maya Research Releases Maya1, an Expressive TTS Model

The new Apache 2.0-licensed model uses a Llama-based architecture to generate more natural and emotionally nuanced speech from text.

Oct 18, 2025
Text → Speech
Maya1
Maya1
Friday, October 17, 2025

1 release

DeepSeek/Vision-Language · Major release

DeepSeek-OCR Tackles Document Parsing with Vision AI

The new vision-language model uses a novel context compression technique to efficiently extract text and structure from complex documents.

Oct 17, 2025
Vision-Language
DeepSeek-OCR
DeepSeek-OCR
Thursday, October 16, 2025

1 release

Baidu/Vision-Language

Baidu Releases PaddleOCR-VL for Document AI

The new vision-language model is fine-tuned to understand not just text, but the complex structure of tables, charts, and formulas.

Oct 16, 2025
Vision-Language
PaddleOCR-VL
PaddleOCR-VL
Wednesday, October 15, 2025

1 release

NVIDIA/Speech → Text

NVIDIA's Parakeet ASR Tackles Multi-Speaker Audio

The 600-million-parameter model offers real-time speech-to-text with speaker diarization, built on the efficient FastConformer architecture.

Oct 15, 2025
Speech → Text
Multitalker Parakeet Streaming 0.6B
Multitalker Parakeet Streaming 0.6B
Tuesday, October 14, 2025

1 release

inclusionAI/Any-to-Any

inclusionAI Debuts 'Any-to-Any' Multimodal MoE Model

The new Ming-flash-omni-Preview aims to handle any combination of data modalities using an efficient Mixture of Experts architecture.

Oct 14, 2025
Any-to-Any
Ming-flash-omni-Preview
Ming-flash-omni-Preview
Saturday, October 11, 2025

1 release

Qwen · Alibaba/Vision-Language

Alibaba Releases Qwen3-VL, an 8B Open-Source Vision Model

The latest vision-language model from the popular Qwen series is instruction-tuned and available under an Apache 2.0 license.

Oct 11, 2025
Vision-Language
Qwen3-VL-8B-Instruct
Qwen3-VL-8B-Instruct
Wednesday, October 8, 2025

3 releases

Google DeepMind/Text / LLM

Google Releases Compact FunctionGemma Model

The new 270-million-parameter model from Google DeepMind is fine-tuned specifically for reliable function calling and tool use.

Oct 8, 2025
Text / LLM
FunctionGemma 270M IT
FunctionGemma 270M IT
EPFL VITA/Image → Video

EPFL Releases SVI for Streaming Image-to-Video

The new open-source model from Swiss researchers uses a novel chunking method to generate indefinitely long videos from a single still image.

Oct 8, 2025
Image → Video
SVI
SVI
Krea/Text → Video

Krea Releases Open-Source Real-Time Video Model

The new 14-billion-parameter model is a distilled, more efficient version of a larger foundation, designed for interactive video generation.

Oct 8, 2025
Text → Video
Krea Realtime Video
Krea Realtime Video
Tuesday, September 30, 2025

4 releases

inclusionAI/Any-to-Any

inclusionAI Releases Ming-UniVision MoE Multimodal Model

The new 16-billion-parameter model uses a sparse Mixture-of-Experts design to efficiently handle 'any-to-any' data combinations, from text to images.

Sep 30, 2025
Any-to-Any · Vision-Language
Ming-UniVision-16B-A3B
Ming-UniVision-16B-A3B
Qwen · Alibaba/Vision-Language · Major release

Qwen Releases 30B MoE Vision Model, Qwen3-VL

The new open-source model from Alibaba uses a Mixture-of-Experts architecture to make its powerful vision-language capabilities more efficient to run.

Sep 30, 2025
Vision-Language · Any-to-Any
Qwen3-VL-30B-A3B-Instruct
Qwen3-VL-30B-A3B-Instruct
nineninesix/Text → Speech

Kani TTS 370M Offers Compact Multilingual Speech

Based on Liquid AI's LFM2 architecture, the new model offers an efficient solution for developers.

Sep 30, 2025
Text → Speech
Kani TTS 370M
Kani TTS 370M
chetwinlow1/Image → Video

Ovi Syncs Audio and Video in New Open-Source Model

Built on the Wan2.2 architecture, this new 5-billion-parameter model generates short video clips from a single image and simultaneously creates synchronized audio.

Sep 30, 2025
Image → Video
Ovi
Ovi
Monday, September 29, 2025

2 releases

Zhipu AI/Text / LLM · Major release

Zhipu AI Releases Open-Weight MoE Model GLM-4.6

The new Mixture-of-Experts model is available under a permissive MIT license and is optimized for complex reasoning and coding tasks.

Sep 29, 2025
Text / LLM · Reasoning
GLM-4.6
GLM-4.6
inclusionAI/Any-to-Any

Ming-UniAudio Brings MoE to Unified Audio AI

A new 16-billion-parameter model from inclusionAI uses a Mixture-of-Experts architecture to handle a wide range of audio tasks efficiently.

Sep 29, 2025
Any-to-Any · Text → Speech
Ming-UniAudio-16B-A3B
Ming-UniAudio-16B-A3B
Friday, September 26, 2025

1 release

ByteDance/Image → Video

ByteDance Releases Lynx for Identity-Preserving Video

The new model from the TikTok parent company generates short video clips that maintain a person's likeness from a single reference image.

Sep 26, 2025
Image → Video
Lynx
Lynx
Thursday, September 25, 2025

2 releases

Tencent/Text → Image

Tencent Debuts HunyuanImage 3.0 with MoE Design

The new text-to-image generator from the Chinese tech giant uses a Mixture-of-Experts architecture for more efficient and detailed image creation.

Sep 25, 2025
Text → Image
HunyuanImage 3.0 Instruct
HunyuanImage 3.0 Instruct
Tencent/Text → Image · Major release

Tencent Releases HunyuanImage 3.0 Text-to-Image Model

The new text-to-image generator from the Chinese tech giant uses a Mixture-of-Experts architecture for improved efficiency and output quality.

Sep 25, 2025
Text → Image
HunyuanImage 3.0 Instruct
HunyuanImage 3.0 Instruct
Monday, September 22, 2025

1 release

Qwen · Alibaba/Image Editing

Qwen Releases Open-Source Instruction-Based Image Editor

The new model from Alibaba's Qwen team allows users to modify images using natural language prompts instead of complex tools or masks.

Sep 22, 2025
Image Editing
Qwen-Image-Edit-2509
Qwen-Image-Edit-2509
Saturday, September 20, 2025

1 release

Qwen · Alibaba/Any-to-Any · Major release

Qwen3-Omni Arrives With Any-to-Any Multimodality

The new 30B Mixture-of-Experts model from Alibaba's Qwen team can process and generate content across text, image, and audio formats.

Sep 20, 2025
Any-to-Any · Vision-Language
Qwen3-Omni-30B-A3B-Instruct
Qwen3-Omni-30B-A3B-Instruct
Thursday, September 18, 2025

1 release

Xiaomi/Any-to-Any

Xiaomi's MiMo-Audio 7B Tackles Complex Speech Tasks

This new instruction-tuned model from Xiaomi can handle a flexible combination of audio and text inputs and outputs, from transcription to voice synthesis.

Sep 18, 2025
Any-to-Any · Text → Speech
MiMo-Audio-7B-Instruct
MiMo-Audio-7B-Instruct
Tuesday, September 16, 2025

1 release

OpenBMB/Text → Speech

OpenBMB Releases VoxCPM for Open Voice Synthesis

The new 500-million-parameter model offers high-quality text-to-speech and zero-shot voice cloning under a permissive license.

Sep 16, 2025
Text → Speech
VoxCPM-0.5B
VoxCPM-0.5B
Monday, September 15, 2025

3 releases

Qwen · Alibaba/Any-to-Any

Qwen Releases 'Thinking' Multimodal MoE Model

The new 30-billion-parameter Mixture-of-Experts model from Alibaba's Qwen team is designed to show its reasoning process for complex multimodal tasks.

Sep 15, 2025
Any-to-Any · Reasoning
Qwen3-Omni-30B-A3B-Thinking
Qwen3-Omni-30B-A3B-Thinking
Qwen · Alibaba/Any-to-Any

Qwen Releases 30B Model for Audio Captioning

The new Mixture-of-Experts model from Alibaba is fine-tuned to generate detailed, multilingual descriptions for complex audio content.

Sep 15, 2025
Any-to-Any · Text → Speech
Qwen3-Omni-30B-A3B-Captioner
Qwen3-Omni-30B-A3B-Captioner
neuphonic/Text → Speech

Neuphonic Releases NeuTTS Air for On-Device AI Speech

The new Apache 2.0 text-to-speech model is built on a Qwen2 architecture and optimized for local inference with GGUF support.

Sep 15, 2025
Text → Speech
NeuTTS Air
NeuTTS Air
Thursday, September 11, 2025

1 release

moondream/Vision-Language

Moondream 3 Arrives in Preview Release

The next generation of the efficient, open-source vision-language model is now available for early testing and feedback.

Sep 11, 2025
Vision-Language
Moondream 3 (preview)
Moondream 3 (preview)
Wednesday, September 10, 2025

2 releases

ByteDance/Image → Video

ByteDance Releases HuMo for Human Video Generation

The new open-source model specializes in creating realistic videos of people, separating appearance from motion for greater control.

Sep 10, 2025
Image → Video
HuMo
HuMo
Qwen · Alibaba/Text → Video

Alibaba's Wan2.2 Adds Control to Open Video

The new 14-billion-parameter model from Alibaba's PAI team offers fine-grained control over video generation using inputs like sketches and depth maps.

Sep 10, 2025
Text → Video
Wan2.2-VACE-Fun-A14B
Wan2.2-VACE-Fun-A14B
Tuesday, September 9, 2025

2 releases

Qwen · Alibaba/Text / LLM · Major release

Qwen Releases 80B Mixture-of-Experts Model

The new Qwen3-Next model from Alibaba combines a large parameter count with an efficient MoE architecture to balance performance and computational cost.

Sep 9, 2025
Text / LLM
Qwen3-Next-80B-A3B-Instruct
Qwen3-Next-80B-A3B-Instruct
Alpha-VLLM/Any-to-Any

Lumina-DiMOO: A Diffusion Model for Any-to-Any AI

This new open-source model uses a diffusion architecture instead of a typical transformer to generate and understand a mix of media types.

Sep 9, 2025
Any-to-Any · Text → Image
Lumina-DiMOO
Lumina-DiMOO
Monday, September 8, 2025

1 release

Tencent/Text → Image

Tencent SRPO Fine-Tunes SDXL with Preference Optimization

The new text-to-image model uses a novel rejection sampling technique to align Stable Diffusion XL more closely with human aesthetic preferences.

Sep 8, 2025
Text → Image
SRPO
SRPO
Friday, September 5, 2025

1 release

Tencent/Text → Image

Tencent Releases HunyuanImage 2.1 for Bilingual AI Art

The new text-to-image model from the Chinese tech giant is designed to understand both Chinese and English prompts at high resolutions.

Sep 5, 2025
Text → Image
HunyuanImage 2.1
HunyuanImage 2.1
Thursday, September 4, 2025

2 releases

Microsoft/Text → Speech

Microsoft Releases VibeVoice, a 7B Podcast TTS Model

The new 7-billion-parameter model is designed for generating long-form, multi-speaker audio in English and Chinese under a permissive MIT license.

Sep 4, 2025
Text → Speech
VibeVoice-7B
VibeVoice-7B
Microsoft/Text → Speech

Microsoft Releases VibeVoice, a Podcast-Ready TTS Model

The new open-source model specializes in generating long-form, multi-speaker audio in both English and Mandarin, mimicking a natural podcast conversation.

Sep 4, 2025
Text → Speech
VibeVoice Large
VibeVoice Large
Thursday, August 28, 2025

1 release

StepFun/Any-to-Any

StepFun Releases Step-Audio 2 mini, a Unified Audio AI

The new open-source model handles both speech recognition and audio generation in a single, end-to-end architecture.

Aug 28, 2025
Any-to-Any · Text → Speech
Step-Audio 2 mini
Step-Audio 2 mini
Wednesday, August 27, 2025

1 release

Tencent/Image → Video

Tencent's Voyager Model Turns Images into 3D Worlds

The new model from Tencent AI Lab generates temporally and spatially consistent video sequences from a single image, enabling virtual exploration of static scenes.

Aug 27, 2025
Image → Video · Text → 3D
HunyuanWorld-Voyager
HunyuanWorld-Voyager
Monday, August 25, 2025

2 releases

Microsoft/Text → Speech

Microsoft Releases VibeVoice for Long-Form Audio

The new 1.5-billion-parameter text-to-speech model is designed to generate natural, multi-speaker audio for podcasts and other long-form content.

Aug 25, 2025
Text → Speech
VibeVoice-1.5B
VibeVoice-1.5B
Qwen · Alibaba/Image → Video

Alibaba Releases 14B Model for Audio-Driven Video

The new Wan2.2-S2V model takes a still image and a speech track to generate a realistic talking-head animation, available under a permissive license.

Aug 25, 2025
Image → Video
Wan2.2-S2V-14B
Wan2.2-S2V-14B
Sunday, August 24, 2025

1 release

OpenBMB/Vision-Language

OpenBMB Releases Compact Multimodal Model MiniCPM-V 4.5

The new vision-language model from the open-source research group demonstrates strong OCR and video understanding capabilities in a small package.

Aug 24, 2025
Vision-Language
MiniCPM-V 4.5
MiniCPM-V 4.5
Tuesday, August 19, 2025

1 release

DeepSeek/Text / LLM · Major release

DeepSeek Releases 671B MoE Model Under MIT License

The new DeepSeek-V3.1-Base is a massive 671-billion-parameter Mixture-of-Experts model designed for efficient, large-scale research and development.

Aug 19, 2025
Text / LLM · Reasoning
DeepSeek-V3.1-Base
DeepSeek-V3.1-Base
Sunday, August 17, 2025

1 release

Qwen · Alibaba/Image Editing · Major release

Qwen Releases Open Model for Image Editing

The new open-source model from Alibaba lets users edit images with simple text commands in both English and Chinese.

Aug 17, 2025
Image Editing
Qwen-Image-Edit
Qwen-Image-Edit
Friday, August 15, 2025

1 release

NexaAI/Any-to-Any

NexaAI Releases OmniNeural-4B for On-Device AI

The new 4-billion-parameter model is designed for 'any-to-any' multimodal tasks and optimized to run efficiently on mobile hardware.

Aug 15, 2025
Any-to-Any
OmniNeural-4B
OmniNeural-4B
Wednesday, August 13, 2025

1 release

Tencent/Image → Video

Tencent Releases Controllable Game Video Model

The new Hunyuan-GameCraft 1.0 is an open image-to-video model that generates interactive game-like scenes with precise camera control.

Aug 13, 2025
Image → Video
Hunyuan-GameCraft 1.0
Hunyuan-GameCraft 1.0
Tuesday, August 12, 2025

1 release

FrancisRing/Image → Video

StableAvatar Brings Open-Source Talking Heads to Life

A new diffusion-based model from developer FrancisRing animates still images into talking avatars using only an audio track.

Aug 12, 2025
Image → Video
StableAvatar
StableAvatar
Sunday, August 10, 2025

1 release

Zhipu AI/Vision-Language · Major release

Zhipu AI Releases Open Vision Model GLM-4.5V

The new Mixture-of-Experts model offers strong multimodal reasoning capabilities under a permissive MIT license.

Aug 10, 2025
Vision-Language · Reasoning
GLM-4.5V
GLM-4.5V
Friday, August 8, 2025

1 release

Skywork/Image → Video

Skywork Releases Open 'World Model' for Playable Video

The new 1.3-billion-parameter model functions as an interactive 'world model,' generating controllable video scenes from a single static image.

Aug 8, 2025
Image → Video
Matrix-Game 2.0
Matrix-Game 2.0
Tuesday, August 5, 2025

1 release

Google DeepMind/Text / LLM

Google Releases Gemma 3 270M for On-Device AI

The new ultra-compact model from DeepMind is designed for efficient performance in resource-constrained environments like mobile and web.

Aug 5, 2025
Text / LLM
Gemma 3 270M
Gemma 3 270M
Monday, August 4, 2025

4 releases

OpenAI/Reasoning · Major release

OpenAI Releases 21B Open-Weight MoE Model

The new `gpt-oss-20b` is an Apache 2.0-licensed Mixture-of-Experts model designed to run efficiently on consumer-grade hardware.

Aug 4, 2025
Reasoning · Text / LLM
gpt-oss-20b
gpt-oss-20b
OpenAI/Reasoning · Major release

OpenAI Releases Its First Open-Source MoE Model

The new 117-billion-parameter `gpt-oss-120b` is a Mixture-of-Experts model focused on reasoning, released under a permissive Apache 2.0 license.

Aug 4, 2025
Reasoning · Text / LLM
gpt-oss-120b
gpt-oss-120b
NVIDIA/Speech → Text

NVIDIA Releases Canary 1B v2 Multilingual Speech Model

The new 1-billion-parameter model handles both transcription and translation across 25 European languages using the company's efficient FastConformer architecture.

Aug 4, 2025
Speech → Text
Canary 1B v2
Canary 1B v2
NVIDIA/Speech → Text

NVIDIA Releases 600M Parakeet for Speech Recognition

The new FastConformer model uses a specialized training technique to improve transcription accuracy in noisy, real-world environments.

Aug 4, 2025
Speech → Text
Parakeet TDT 0.6B v3
Parakeet TDT 0.6B v3
Saturday, August 2, 2025

1 release

Qwen · Alibaba/Text → Image · Major release

Qwen Releases Open Model for Text-in-Image Generation

The new Apache 2.0 diffusion model from Alibaba's Qwen team focuses on accurately rendering both English and Chinese characters within generated images.

Aug 2, 2025
Text → Image
Qwen-Image
Qwen-Image
Thursday, July 31, 2025

1 release

Qwen · Alibaba/Code

Qwen Releases Compact 30B MoE for Coding Agents

The new Apache 2.0 model from Alibaba's Qwen team uses a Mixture-of-Experts architecture to deliver strong performance with only 3B active parameters.

Jul 31, 2025
Code · Text / LLM
Qwen3-Coder-30B-A3B-Instruct
Qwen3-Coder-30B-A3B-Instruct
Wednesday, July 30, 2025

1 release

rednote-hilab/Vision-Language

New VLM `dots.ocr` Takes on Complex Documents

The new 3B-parameter model from rednote-hilab uses a vision-language approach to parse tables, layouts, and even mathematical formulas.

Jul 30, 2025
Vision-Language
dots.ocr
dots.ocr
Tuesday, July 29, 2025

1 release

Skywork/Any-to-Any

Skywork Releases UniPic, a Unified 1.5B Vision Model

The new autoregressive model from the Chinese AI lab can understand, generate, and edit images within a single, compact framework.

Jul 29, 2025
Any-to-Any · Text → Image
Skywork-UniPic-1.5B
Skywork-UniPic-1.5B
Monday, July 28, 2025

3 releases

Qwen · Alibaba/Image → Video · Major release

Alibaba Releases Wan2.2, a 14B MoE Video Model

The new open-source diffusion model from the team behind Qwen uses a Mixture-of-Experts architecture to animate still images.

Jul 28, 2025
Image → Video
Wan2.2-I2V-A14B
Wan2.2-I2V-A14B
Qwen · Alibaba/Text → Video

Alibaba Releases Wan2.2, a 14B MoE Text-to-Video Model

The new Apache 2.0-licensed generator uses a Mixture-of-Experts architecture and is available in the popular Diffusers library format for easier integration.

Jul 28, 2025
Text → Video
Wan2.2 T2V A14B
Wan2.2 T2V A14B
Qwen · Alibaba/Text → Video

Qwen Releases Wan2.2, a 5B Open-Source Video Model

The new Apache 2.0-licensed model from Alibaba's team generates video from either text prompts or still images, offering a unified approach in a compact package.

Jul 28, 2025
Text → Video · Image → Video
Wan2.2-TI2V-5B
Wan2.2-TI2V-5B
Thursday, July 24, 2025

2 releases

Qwen · Alibaba/Text → Video

Qwen Unveils Wan2.2, a 14B Open Text-to-Video Model

The new Apache 2.0-licensed model from Alibaba's team uses a Mixture-of-Experts architecture for efficient, high-quality video generation.

Jul 24, 2025
Text → Video
Wan2.2 T2V A14B
Wan2.2 T2V A14B
Qwen · Alibaba/Image → Video

Qwen Releases Wan2.2, a 14B Image-to-Video Model

The new 14-billion-parameter model from Alibaba's AI team uses a Mixture-of-Experts design and is available under the permissive Apache 2.0 license.

Jul 24, 2025
Image → Video
Wan2.2-I2V-A14B
Wan2.2-I2V-A14B
Tuesday, July 22, 2025

1 release

Qwen · Alibaba/Code · Major release

Qwen Releases 480B Open-Source Model for Code Agents

The new flagship coding model from Alibaba's Qwen team uses a massive Mixture-of-Experts architecture and is released under a permissive Apache 2.0 license.

Jul 22, 2025
Code · Text / LLM
Qwen3-Coder-480B-A35B-Instruct
Qwen3-Coder-480B-A35B-Instruct
Sunday, July 20, 2025

1 release

Zhipu AI/Reasoning · Major release

Z.ai Releases 355B-Parameter GLM-4.5 Under MIT License

The new Mixture-of-Experts model combines massive scale with a fully permissive license, targeting complex reasoning and agentic applications.

Jul 20, 2025
Reasoning · Text / LLM
GLM-4.5
GLM-4.5
Friday, July 18, 2025

1 release

Qwen · Alibaba/Text → Video · Major release

Qwen Releases Wan2.2, a 5B Open Video AI Model

The new Apache 2.0-licensed model from Alibaba's team can generate video from both text and image prompts, adding a powerful new tool to the open-source creative ecosystem.

Jul 18, 2025
Text → Video · Image → Video
Wan2.2-TI2V-5B
Wan2.2-TI2V-5B
Wednesday, July 16, 2025

1 release

HiDream.ai/Image Editing

HiDream.ai Releases 17B Open Image Editing Model

The new MIT-licensed model, HiDream-E1.1, allows for complex image modifications by following natural language instructions.

Jul 16, 2025
Image Editing
HiDream-E1.1
HiDream-E1.1
Tuesday, July 15, 2025

1 release

inclusionAI/Any-to-Any

Ming-Lite-Omni 1.5 Brings Any-to-Any Modality to Open Source

The new MIT-licensed model from inclusionAI can process and generate a mix of text, images, audio, and video, pushing the boundaries of open multimodal AI.

Jul 15, 2025
Any-to-Any
Ming-Lite-Omni 1.5
Ming-Lite-Omni 1.5
Monday, July 14, 2025

2 releases

RaphaelLiu/Image → Video

Pusa V1: A New Open Model for Image-to-Video Animation

Based on the Wan2.1 architecture, this new 14B-parameter model offers fine-grained control over video generation from still images and text.

Jul 14, 2025
Image → Video · Text → Video
Pusa V1
Pusa V1
T-Tech/Speech → Text

T-Tech Releases T-one for Russian Speech Recognition

The new streaming Conformer model from the Russian digital bank is optimized for real-time transcription of telephone conversations.

Jul 14, 2025
Speech → Text
T-one
T-one
Friday, July 11, 2025

1 release

Moonshot AI/Text / LLM · Major release

Moonshot AI Releases Trillion-Parameter Kimi-K2 Model

The new Mixture-of-Experts model brings massive scale to the open-weights community, focusing on complex reasoning and coding tasks with a 128K context window.

Jul 11, 2025
Text / LLM · Reasoning
Kimi-K2-Instruct
Kimi-K2-Instruct
Monday, July 7, 2025

1 release

Black Forest Labs/Text → Image · Major release

Black Forest Labs Releases FLUX.1 Krea Image Model

The new 12-billion-parameter model, tuned by creative AI platform Krea, focuses on high-quality aesthetic output and prompt fidelity.

Jul 7, 2025
Text → Image
FLUX.1 Krea [dev]
FLUX.1 Krea [dev]
Wednesday, July 2, 2025

1 release

ByteDance/Any-to-Any

ByteDance Releases Tar-7B for 'Any-to-Any' Multimodality

The new 7-billion-parameter model from the company's Seed team unifies visual understanding and generation through text-aligned representations in a single framework.

Jul 2, 2025
Any-to-Any
Tar-7B
Tar-7B
Tuesday, July 1, 2025

1 release

Boson AI/Text → Speech

Boson AI Releases Higgs Audio v2 for Expressive TTS

The new 3-billion-parameter model focuses on generating expressive, multilingual speech and is fully open for commercial use under an Apache 2.0 license.

Jul 1, 2025
Text → Speech
Higgs Audio v2 (3B)
Higgs Audio v2 (3B)
Monday, June 30, 2025

1 release

Kyutai/Text → Speech

Kyutai Releases 1.6B Bilingual TTS Model

The French AI lab's new open-source model generates streaming audio in English and French under a permissive license.

Jun 30, 2025
Text → Speech
Kyutai TTS 1.6B (en/fr)
Kyutai TTS 1.6B (en/fr)
Saturday, June 28, 2025

2 releases

Zhipu AI/Vision-Language

Zhipu AI Open-Sources 9B Vision Model with 'Thinking' Mode

The new GLM-4.1V-9B-Thinking model makes its vision and chain-of-thought reasoning capabilities available under a permissive MIT license.

Jun 28, 2025
Vision-Language · Reasoning
GLM-4.1V-9B-Thinking
GLM-4.1V-9B-Thinking
AIDC-AI/Any-to-Any

Ovis-U1-3B Unifies Image Understanding and Generation

The new 3-billion-parameter model from AIDC-AI combines vision-language understanding and image generation into a single 'any-to-any' framework.

Jun 28, 2025
Any-to-Any · Vision-Language
Ovis-U1-3B
Ovis-U1-3B
Thursday, June 26, 2025

1 release

NVIDIA/Speech → Text

NVIDIA Fuses LLM and ASR in Canary-Qwen 2.5B Model

The 2.5-billion-parameter speech model combines a FastConformer encoder with a Qwen LLM decoder, a hybrid approach to transcription.

Jun 26, 2025
Speech → Text
Canary-Qwen 2.5B
Canary-Qwen 2.5B
Tuesday, June 24, 2025

1 release

Maya Research/Text → Speech

Veena TTS Model Targets Indian Languages with Llama Base

Maya Research has released a 3-billion-parameter model designed to generate natural-sounding speech in Hindi and English.

Jun 24, 2025
Text → Speech
Veena
Veena
Monday, June 23, 2025

1 release

FreedomIntelligence/Any-to-Any

Janus-4o-7B Adds Image Generation to 7B Multimodal AI

The new 7-billion-parameter model from FreedomIntelligence can process various inputs and generate or edit images based on text prompts.

Jun 23, 2025
Any-to-Any · Text → Image
Janus-4o-7B
Janus-4o-7B
Wednesday, September 25, 2024

1 release

Unknown/Text → Image

Illustrious-XL v0.1 Arrives as Anime Image Base

A 3.5B-parameter SDXL derivative aims to become the go-to foundation for anime-style text-to-image fine-tunes.

Sep 25, 2024
Text → Image
Illustrious-XL v0.1
Friday, August 2, 2024

1 release

Black Forest Labs/Text → Image · Major release

FLUX.1 Dev Becomes the Open-Weight Base to Beat

Black Forest Labs' 12B rectified-flow model has quietly emerged as the default foundation for open image generation.

Aug 2, 2024
Text → Image
FLUX.1 Dev
FLUX.1 Dev