Release timeline

Friday, July 31, 2026

2 releases

DeepSeek/Text / LLM

DeepSeek Ships V4-Flash, a 304B MoE Tuned for Agents

The latest checkpoint in DeepSeek's V4 line leans into agentic workflows while keeping the permissive MIT license.

Jul 31, 2026

DeepSeek/Text / LLM · Major release

DeepSeek Refreshes V4-Flash With New 0731 Checkpoint

The MIT-licensed mixture-of-experts model returns in an updated build shipping with FP8 weights for cheaper inference.

Jul 31, 2026

Wednesday, July 29, 2026

1 release

LGAI EXAONE/Text / LLM

LG AI Research debuts K-EXAONE 2.0, a 750B MoE model

The new mixture-of-experts model activates 37B parameters per token and targets English, Korean, and Spanish reasoning tasks.

Jul 29, 2026

Tuesday, July 28, 2026

3 releases

MiniMax/Text → Video

MiniMax Releases H3 Video Model on Hugging Face

The company's new diffusion model handles text-to-video and image-to-video, with support for joint audio-video generation.

Jul 28, 2026

SKT/Text / LLM

SK Telecom Releases A.X-K2 Multilingual LLM

The Korean telecom carrier's latest open language model targets English, Korean, Chinese, Japanese, and Spanish under a permissive license.

Jul 28, 2026

Audio8/Text → Speech

Audio8 debuts a 0.6B multilingual zero-shot TTS preview

The compact text-to-speech model promises voice cloning across languages from a footprint small enough to run without heavy hardware.

Jul 28, 2026

Monday, July 27, 2026

4 releases

Thinking Machines/Vision-Language

Thinking Machines Debuts Inkling Small, a Compact Multimodal MoE

The Apache-2.0 model brings mixture-of-experts efficiency to image, audio, and text tasks in a smaller footprint.

Jul 27, 2026

LiquidAI/Embeddings

Liquid AI's LFM2.5 encoder targets fast CPU inference

A 230M-parameter bidirectional encoder built for long-context English and German embeddings without a GPU.

Jul 27, 2026

LiquidAI/Embeddings

Liquid AI ships a 350M encoder built for CPUs

The compact LFM2.5 encoder targets fast, long-context text embeddings without a GPU.

Jul 27, 2026

KRAFTON/Any-to-Any

KRAFTON releases A.X-K2 Raon speech MoE model

The game maker's new open model blends text-to-speech and speech recognition in a single 21B mixture-of-experts system with just 3B active parameters.

Jul 27, 2026

Sunday, July 26, 2026

1 release

Microsoft/Vision-Language

Microsoft's Mage-VL Streams Video Natively

A codec-native multimodal foundation model aims to understand live video and vision-language input in real time.

Jul 26, 2026

Friday, July 24, 2026

2 releases

Swiss AI/Text / LLM

Apertus v1.5 70B arrives with an Apache-2.0 license

Switzerland's open-model effort ships a 70-billion-parameter, multilingual and multimodal system that anyone can use, modify, and deploy.

Jul 24, 2026

Microsoft/Speech → Text

Microsoft's VibeVoice ASR Goes BitNet for CPU Speech

A BitNet-quantized speech recognition model trades GPU dependence for efficient CPU inference in English and Chinese.

Jul 24, 2026

Thursday, July 23, 2026

2 releases

AMD/Reasoning

AMD's Instella-MoE Brings Reasoning to ROCm Hardware

A new open mixture-of-experts model with 16B total parameters and just 3B active is tuned to run on AMD's own accelerator stack.

Jul 23, 2026

Kwaipilot/Code

Kwaipilot Releases KAT-Coder V2.5 Dev, an Agentic MoE Coder

Kuaishou's coding team ships an open mixture-of-experts model built on the Qwen3.5 MoE architecture and tuned for agentic development work.

Jul 23, 2026

Wednesday, July 22, 2026

1 release

Upstage/Text / LLM

Upstage's Solar Open2 arrives as a 250B MoE model

The Korean AI firm's latest open release scales to 250 billion parameters with a mixture-of-experts design tuned for English and Korean.

Jul 22, 2026

Tuesday, July 21, 2026

1 release

Microsoft/Text → Image

Microsoft's Mage-Flow packs image editing into 4B

A compact model handles both text-to-image generation and instruction-based edits at native resolution, under a permissive MIT license.

Jul 21, 2026

Monday, July 20, 2026

1 release

Motif Technologies/Text / LLM

Motif Technologies debuts Motif 3 Beta, an MoE model

The Korean AI lab's preview release is a mixture-of-experts language model built for long-context, multilingual work.

Jul 20, 2026

Friday, July 17, 2026

1 release

Microsoft/Vision-Language

Microsoft's Fara1.5-27B targets computer-use agents

A 27B-parameter vision-language model built to drive browsers and desktop apps like a human operator.

Jul 17, 2026

Thursday, July 16, 2026

5 releases

NVIDIA/Any-to-Any

NVIDIA's Audio-Visual Flamingo Fuses Sound and Sight

A fully open multimodal model aims to reason jointly across audio, images, and long-form video.

Jul 16, 2026

Unknown/Text / LLM

German consortium releases open 30B model Soofi S

A collaborative European effort ships a dense 30-billion-parameter model that claims top marks on both English and German benchmarks.

Jul 16, 2026

NVIDIA/Embeddings

NVIDIA's Nemotron-3-Embed 8B tops RTEB retrieval test

The 8-billion-parameter text embedding model claims the number one overall spot on the RTEB benchmark, with an eye toward agentic retrieval.

Jul 16, 2026

Internlm/Vision-Language

InternLM Previews 397B Vision-Language Model

The Intern-S2 preview arrives as a 397B-parameter multimodal system under a permissive Apache-2.0 license.

Jul 16, 2026

inclusionAI/Text / LLM

inclusionAI ships LLaDA2.2-flash diffusion LLM

A new Apache-2.0 mixture-of-experts model that generates text through diffusion rather than left-to-right decoding.

Jul 16, 2026

Wednesday, July 15, 2026

2 releases

Thinking Machines/Any-to-Any · Major release

Thinking Machines Lab debuts Inkling, its first open model

The lab's inaugural open-weights release is a mixture-of-experts system that takes image and audio inputs, shipped under a permissive Apache 2.0 license.

Jul 15, 2026

Nyralabs/Speech → Text

CrisperWhisper 2.0 Large targets verbatim transcription

A Whisper-based ASR model that keeps every filler word and adds word-level timestamps, now covering English and German.

Jul 15, 2026

Tuesday, July 14, 2026

3 releases

NVIDIA/Embeddings

NVIDIA's Nemotron 3 Embed tops the RTEB leaderboard

A compact 1B-parameter text embedding model claims the top overall spot on a retrieval benchmark aimed at reflecting real-world use.

Jul 14, 2026

OpenMOSS/Vision-Language

OpenMOSS Debuts MOSS-VL-Realtime for Live Video

The Chinese research group's new vision-language model targets streaming understanding of video and images rather than static frames.

Jul 14, 2026

AI Sage/Speech → Text

SberDevices releases GigaAM Multilingual ASR model

An MIT-licensed speech recognition model targeting Russian, English, and Kazakh arrives on Hugging Face.

Jul 14, 2026

Monday, July 13, 2026

4 releases

inclusionAI/Reasoning

inclusionAI's Ring-Zero Scales Zero-RL to a Trillion Parameters

A new mixture-of-experts model learns to reason through reinforcement learning alone, without human-annotated chains of thought.

Jul 13, 2026

Unknown/Any-to-Any

Boogu-Image-0.1 Brings Unified Multimodal Generation to Open Source

A new Apache-licensed model family folds bilingual text-to-image generation and instruction editing into one system.

Jul 13, 2026

Poolside/Code

Poolside releases Laguna-S-2.1 coding model

The AI coding startup puts a version of its Laguna family on Hugging Face under the permissive OpenMDW license.

Jul 13, 2026

ATH MaaS/Vision-Language

Alibaba's OvisOCR2 turns page images into Markdown

A compact 0.8B vision-language model aims to parse full documents—text, tables, and formulas—in a single pass.

Jul 13, 2026

Sunday, July 12, 2026

1 release

Qwen · Alibaba/Music

Qwen Enters Music Generation With Qwen-Music

Alibaba's Qwen team debuts a text-to-song model that produces high-fidelity tracks complete with vocals.

Jul 12, 2026

Friday, July 10, 2026

1 release

Qwen · Alibaba/Image → Video

Wan-Dancer-14B turns still images into dance videos

Alibaba's Wan team releases an Apache-2.0 image-to-video model built for music-driven dance generation.

Jul 10, 2026

Wednesday, July 8, 2026

1 release

robbyant/Text → Video

LingBot-Video puts a 30B MoE behind embodied AI video

A DiT-based mixture-of-experts model activates just 3B parameters per step and ships under an Apache 2.0 license.

Jul 8, 2026

Monday, July 6, 2026

1 release

NVIDIA/Any-to-Any

NVIDIA's Audex Unifies Audio Understanding and Speech

A new 30B mixture-of-experts model from NVIDIA handles both listening and speaking within a single audio-text architecture.

Jul 6, 2026

Sunday, July 5, 2026

1 release

AI Sage/Text / LLM

GigaChat 3.5 arrives as a 432B mixture-of-experts model

The multilingual instruct model activates 28B parameters per token and leans on hybrid attention for efficiency at scale.

Jul 5, 2026

Saturday, July 4, 2026

1 release

Prism ML/Text / LLM

Bonsai-27B Brings 1-Bit Quantization to Local Inference

A ternary-weight 27B model with hybrid attention aims to run large-model reasoning on everyday hardware.

Jul 4, 2026

Thursday, July 2, 2026

1 release

Tencent/Text / LLM

Tencent releases Hunyuan Hy3 under Apache 2.0

The company's latest mixture-of-experts model arrives as an openly licensed conversational LLM on Hugging Face.

Jul 2, 2026

Wednesday, July 1, 2026

9 releases

NVIDIA/Text → Video

NVIDIA's Cosmos 3 Edge Brings World Models Closer

A new edge-optimized variant of NVIDIA's Cosmos world-model line aims to run generative video directly on local hardware rather than in the cloud.

Jul 1, 2026

NVIDIA/Text → Image

NVIDIA distills Qwen-Image for few-step generation

A DMD2-distilled build of Qwen-Image cuts the number of sampling steps for faster generation while aiming to keep the original model's output profile.

Jul 1, 2026

Google DeepMind/Any-to-Any · Major release

Google DeepMind's Gemma 4 Goes Multimodal and MoE

The new open-weights family adds a mixture-of-experts design, encoder-free multimodal inputs, and an optional thinking mode.

Jul 1, 2026

Mistral AI/Text / LLM

Mistral's Leanstral 1.5 puts 119B in a lean MoE

The new Apache-2.0 mixture-of-experts model activates just 6B parameters per token, trading raw density for cheaper inference.

Jul 1, 2026

Soofi Project/Text / LLM

German Consortium Debuts Soofi S, an Open 30B MoE Model

A Mamba-2 mixture-of-experts model claims top marks in both English and German benchmarks.

Jul 1, 2026

Mistral AI/Text / LLM

Liquid AI's LFM2.5-230M targets phones and robots

A 230-million-parameter model built to run on constrained hardware such as the Raspberry Pi and edge robots.

Jul 1, 2026

IBM/Text / LLM

Liquid AI's LFM2.5 230M targets phones and robots

A 230-million-parameter language model built to run locally on constrained hardware like the Raspberry Pi.

Jul 1, 2026

NVIDIA/Text / LLM

Liquid AI's LFM2.5-230M targets phones and robots

A 230-million-parameter language model built to run on hardware as modest as a Raspberry Pi.

Jul 1, 2026

Open GigaAI/Image → Video

GigaAI Releases Giga-World-1 Under Apache 2.0

An open image-to-video world model aims to bridge physically grounded generation and robot policy learning.

Jul 1, 2026

Tuesday, June 30, 2026

3 releases

Microsoft/Vision-Language

Microsoft previews GELab-Zero-4B, a compact GUI agent

The 4-billion-parameter vision-language model targets on-screen and mobile automation, built atop Qwen3-VL.

Jun 30, 2026

MuScriptor/Music

MuScriptor Large Turns Real Music Into MIDI

A new open model tackles multi-instrument transcription of real audio mixes, converting songs directly into editable MIDI.

Jun 30, 2026

Meituan/Text / LLM

Meituan releases LongCat-2.0 language model

The Chinese delivery giant continues its push into open AI with a new text model on Hugging Face.

Jun 30, 2026

Monday, June 29, 2026

2 releases

Google DeepMind/Any-to-Any

Google DeepMind Releases TabFM for Tabular Data

A new foundation model brings zero-shot, in-context learning to classification and regression on structured tables.

Jun 29, 2026

SenseTime/Any-to-Any

SenseTime's SenseNova-Vision-7B-MoT Goes Any-to-Any

A single 7B model from SenseTime folds vision-language understanding, image generation, editing, and perception into one system.

Jun 29, 2026

Saturday, June 27, 2026

2 releases

DeepSeek/Text / LLM · Major release

DeepSeek Releases V4-Pro, an MIT-Licensed MoE Model

The new flagship arrives as a mixture-of-experts system with FP8 weights and open reasoning capabilities under a permissive license.

Jun 27, 2026

DeepSeek/Text / LLM

DeepSeek Releases V4-Flash for Low-Latency Inference

A lighter, faster member of DeepSeek's V4 line arrives on Hugging Face under a permissive MIT license.

Jun 27, 2026

Thursday, June 25, 2026

1 release

DeepReinforce AI/Text / LLM

DeepReinforce Releases Ornith 1.0, a 35B Reasoning Model

The new dense model ships in GGUF format under a permissive MIT license, aimed at local and self-hosted deployment.

Jun 25, 2026

Wednesday, June 24, 2026

3 releases

LiquidAI/Text / LLM

Liquid AI's LFM2.5-230M targets on-device language tasks

A 230-million-parameter multilingual model built to run efficiently at the edge rather than in the cloud.

Jun 24, 2026

NVIDIA/Text / LLM

NVIDIA's Nemotron 3 Puzzle Runs Big on a Lean Budget

A 75-billion-parameter mixture-of-experts reasoning model that activates just 9 billion parameters per token.

Jun 24, 2026

NVIDIA/Text / LLM

NVIDIA's Nemotron-3 Puzzle Brings a Lean MoE to Reasoning

The 75B-parameter model activates just 9B per token and ships in NVIDIA's NVFP4 format for efficient inference.

Jun 24, 2026

Tuesday, June 23, 2026

1 release

DeepReinforce AI/Text / LLM

DeepReinforce debuts Ornith-1.0, a 397B MoE model

The flagship of a new open model family arrives under a permissive MIT license, with reasoning among its stated strengths.

Jun 23, 2026

Monday, June 22, 2026

5 releases

Tencent/Image Editing

Tencent's Moebius packs inpainting into 0.2B params

A lightweight image-editing framework claims results rivaling 10B-scale models, and it's already running in the browser.

Jun 22, 2026

Qwen · Alibaba/Text / LLM

Qwen's AgentWorld Simulates Worlds for AI Agents

Alibaba's new MoE model acts as a language world model, generating the environments that agents act within.

Jun 22, 2026

Baidu/Vision-Language

Baidu's PP-OCRv6 packs 50-language OCR into tiny models

The latest release of PaddlePaddle's optical character recognition suite spans models from 1.5M to 34.5M parameters under an Apache 2.0 license.

Jun 22, 2026

nineninesix/Text → Speech

Gepard 1.0 brings bilingual TTS with voice cloning

An open, autoregressive text-to-speech model targets English and Spanish under a permissive Apache 2.0 license.

Jun 22, 2026

InternScience/Reasoning

Agents-A1: A 35B MoE Built for Agentic Scaling

The new mixture-of-experts model bets that agent-horizon scaling can rival far larger systems on long-running tasks.

Jun 22, 2026

Sunday, June 21, 2026

2 releases

DeepReinforce AI/Text / LLM

DeepReinforce's Ornith-1.0-9B Targets Agentic Coding

A compact, MIT-licensed 9B model built for autonomous coding tasks arrives on Hugging Face.

Jun 21, 2026

DeepReinforce AI/Text / LLM

Ornith-1.0-35B brings a mid-size MoE to agentic coding

An MIT-licensed mixture-of-experts model targets self-scaffolding code tasks without the footprint of a frontier system.

Jun 21, 2026

Saturday, June 20, 2026

1 release

Poolside/Code

Poolside releases Laguna XS 2.1 code model

The compact, code-focused language model arrives on Hugging Face under an open model license.

Jun 20, 2026

Friday, June 19, 2026

2 releases

Datalab To/Vision-Language

LIFT: A Qwen3.5-Based VLM for PDF-to-JSON Extraction

Datalab's new open vision-language model targets structured data extraction from documents, turning messy PDFs into clean JSON.

Jun 19, 2026

Baidu/Vision-Language

Baidu releases Unlimited-OCR under permissive MIT license

The Chinese tech giant's multilingual vision-language model targets text extraction across languages and document types.

Jun 19, 2026

Thursday, June 18, 2026

3 releases

Cohere/Speech → Text

Cohere releases Apache-licensed Arabic speech model

The Cohere Labs transcription model targets Arabic and English audio under a permissive open license.

Jun 18, 2026

Krea/Text → Image

Krea 2 Arrives as Open-Weights Text-to-Image Model

The image-generation startup releases its second-generation diffusion model in raw and turbo variants under open weights.

Jun 18, 2026

Krea/Text → Image · Major release

Krea 2 Arrives as a 12B Open-Weights Image Model

A new text-to-image model ships with a faster Turbo variant and downloadable weights on Hugging Face.

Jun 18, 2026

Wednesday, June 17, 2026

1 release

Zhipu AI/Text / LLM · Major release

Zhipu AI Releases MIT-Licensed GLM-5.2 MoE Model

The new bilingual model from the Chinese AI firm uses a Mixture of Experts architecture and sparse attention under a fully permissive license.

Jun 17, 2026

Tuesday, June 16, 2026

2 releases

Owensong/Text → Speech

Resemble AI's Inflect-Nano-v1 puts TTS on local hardware

An ultra-small, experimental text-to-speech model arrives under Apache 2.0, aimed at running speech synthesis directly on local machines.

Jun 16, 2026

Boogu/Image Editing

Boogu-Image 0.1 Edit arrives with Apache license

A new open-weight diffusion model for image editing ships with ComfyUI support and a permissive license that allows commercial use.

Jun 16, 2026

Monday, June 15, 2026

1 release

Poolside/Text / LLM

Poolside Releases Laguna-M.1, an Open MoE Model

The AI coding startup steps into open weights with an Apache-2.0 mixture-of-experts model built for text and code.

Jun 15, 2026

Sunday, June 14, 2026

1 release

Microsoft/Text / LLM

Microsoft's FastContext is a 4B sub-agent for code

A compact Qwen3-derived model built to explore repositories, released under a permissive MIT license.

Jun 14, 2026

Saturday, June 13, 2026

1 release

Moonshot AI/Text / LLM · Major release

Moonshot AI releases Kimi K3, a 2.8T-parameter MoE model

The open-weights multimodal model leans into coding and agentic tasks, extending Moonshot's Kimi line into a new scale bracket.

Jun 13, 2026

Friday, June 12, 2026

1 release

WeiboAI/Reasoning

Weibo AI Releases VibeThinker-3B, a Compact Reasoning Model

The new 3-billion-parameter model from the Chinese tech giant targets challenging benchmarks in mathematics, coding, and graduate-level question answering.

Jun 12, 2026

Thursday, June 11, 2026

2 releases

Moonshot AI/Code · Major release

Moonshot AI Releases Kimi, a Multimodal Coding Model

The new Mixture-of-Experts model from the Chinese AI company can generate code while also understanding visual inputs, a rare combination in open models.

Jun 11, 2026

Zyphra/Text → Speech

Zyphra Releases Open-Source Zonos 2 TTS Model

The new text-to-speech model offers a commercially permissive alternative for developers in a field still dominated by closed-source APIs.

Jun 11, 2026

Tuesday, June 9, 2026

3 releases

Google DeepMind/Text / LLM

Google Releases Open-Source DiffusionGemma 26B Model

The new 26B-parameter model from DeepMind uses a diffusion-based architecture, a technique more common in image generation, to produce text.

Jun 9, 2026

Zhipu AI/Image → Video

Zhipu AI Releases SCAIL-2 for Character Animation

The new open-source diffusion model from the company's research arm generates video clips from a single character image and a sequence of poses.

Jun 9, 2026

Baidu/Vision-Language

PaddleOCR's PP-OCRv6 Adds a Medium Detection Model

Baidu's open-source OCR toolkit ships an Apache-licensed text-line detector in safetensors format, tuned for a balance of accuracy and speed.

Jun 9, 2026

Friday, June 5, 2026

1 release

Cohere/Code

Cohere Releases North-Mini-Code, an Open MoE Model

The new Apache 2.0-licensed model is designed for code generation and agentic chat applications, using a Mixture-of-Experts architecture for efficiency.

Jun 5, 2026

Thursday, June 4, 2026

3 releases

Boson AI/Text → Speech

Higgs TTS 3 lands as a 4B multilingual speech model

Boson AI's new text-to-speech release aims for expressive, controllable voice synthesis across multiple languages.

Jun 4, 2026

Boson AI/Text → Speech

Boson AI releases Higgs TTS v3, a 4B speech model

The new open-weights text-to-speech system targets expressive, controllable voice generation across multiple languages with built-in voice cloning.

Jun 4, 2026

Boson AI/Text → Speech

Boson AI's Higgs Audio v3 Offers Expressive, Multilingual TTS

The new 4-billion-parameter text-to-speech model is available for non-commercial use, promising fine-grained control over vocal delivery.

Jun 4, 2026

Wednesday, June 3, 2026

2 releases

Black Forest Labs/Text → Image

Ideogram 4.0 arrives as an open-weight image model

A 9.3-billion-parameter text-to-image model lands on GitHub with downloadable weights and code.

Jun 3, 2026

Stability AI/Text → Image

Ideogram 4.0 arrives as an open-weight image model

A 9.3-billion-parameter text-to-image system lands with open weights and a public GitHub home.

Jun 3, 2026

Tuesday, June 2, 2026

2 releases

MiniMax/Vision-Language · Major release

MiniMax Releases M3, a Multimodal MoE Model

The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.

Jun 2, 2026

JD/Text → Video

JD.com Enters Open-Source AI Video with JoyAI-Echo

The Chinese e-commerce giant has released a new model capable of generating long-form, multi-shot videos with synchronized audio from text prompts.

Jun 2, 2026

Saturday, May 30, 2026

1 release

Ideogram Ai/Text → Image

Ideogram 4.0: A 9.3B Open-Weight Text-to-Image Model

The new 9.3-billion-parameter model uses a Diffusion Transformer architecture and excels at rendering coherent text within generated images.

May 30, 2026

Friday, May 29, 2026

1 release

Baidu/Text → Video

Baidu Releases NAVA for Text-to-Video with Audio

The new model from the Chinese tech giant uses a Multimodal Diffusion Transformer to generate synchronized audio and video from text or image prompts.

May 29, 2026

Wednesday, May 27, 2026

1 release

Stability AI/Music

Stability AI's Demon brings real-time music diffusion to local GPUs

An open-source engine generates audio on the fly at 25Hz, no cloud required.

May 27, 2026

Monday, May 25, 2026

1 release

OpenMOSS/Text → Speech

MOSS-TTS Aims for More Robust Speech Synthesis

A new text-to-speech model uses 'delay-pattern decoding' to reduce the word skipping and repetition errors common in parallel generation.

May 25, 2026

Saturday, May 23, 2026

2 releases

Google DeepMind/Any-to-Any · Major release

Google Releases Gemma 4 12B Multimodal Model

The new 12-billion-parameter open model from DeepMind introduces a unified 'any-to-any' architecture for advanced multimodal tasks.

May 23, 2026

Google DeepMind/Any-to-Any · Major release

Google Releases Gemma 4, a 12B 'Any-to-Any' Model

The new 12-billion-parameter model from Google DeepMind is designed to handle a flexible mix of data types, moving beyond traditional text and image inputs.

May 23, 2026

Thursday, May 21, 2026

4 releases

NVIDIA/Image → Video

NVIDIA Releases Cosmos3 Image-to-Video World Model

The latest release in NVIDIA's 'world model' research family aims to generate coherent and realistic video from a single static image.

May 21, 2026

Meituan/Image → Video

Meituan releases LongCat-Video-Avatar 1.5

An audio-driven avatar model that animates still images into talking video and can continue clips to produce longer sequences.

May 21, 2026

OpenBMB/Text / LLM

OpenBMB's MiniCPM5-1B targets on-device AI

The compact 1B-parameter model brings long-context handling and tool-calling to phones and laptops.

May 21, 2026

MisoLabs/Text → Speech

MisoLabs Debuts MisoTTS, an Open Voice Model

The new text-to-speech system adapts the decoder-only architecture of language models like Llama to generate more natural-sounding speech.

May 21, 2026

Tuesday, May 19, 2026

2 releases

zhifeixie/Speech → Text

Mega-ASR Improves on Qwen for Speech Recognition

Researcher Zhifei Xie has released a 1.7B-parameter model that refines Alibaba's Qwen3-ASR, showing improved performance on English and Chinese transcription benchmarks.

May 19, 2026

OpenMOSS/Speech → Text

OpenMOSS Releases Transcribe-Diarize ASR Model

The open-weights team behind MOSS turns to long-form speech recognition with built-in speaker diarization and timestamps.

May 19, 2026

Monday, May 18, 2026

1 release

NVIDIA/Image → Video

NVIDIA Releases SANA-WM, a Camera-Controllable Video Model

The new model, SANA-WM, uses a bidirectional diffusion process to give creators fine-grained control over camera movement and video editing.

May 18, 2026

Friday, May 15, 2026

2 releases

NVIDIA/Speech → Text

NVIDIA Releases Nemotron-3.5 Streaming ASR Model

The 600-million-parameter model uses a FastConformer architecture for real-time, multilingual speech-to-text applications.

May 15, 2026

ByteDance/Any-to-Any · Major release

ByteDance Releases Lance, a Unified Generative AI Model

The 3-billion-parameter model handles image and video generation, editing, and understanding from a single set of weights under a permissive license.

May 15, 2026

Thursday, May 14, 2026

1 release

SenseTime/Any-to-Any

SenseTime Releases 8B 'Any-to-Any' Infographic Model

The new 8B-parameter SenseNova U1 model from SenseTime is designed for complex multimodal tasks, including the in-conversation generation and editing of infographics.

May 14, 2026

Tuesday, May 12, 2026

2 releases

OpenBMB/Text / LLM

GLiGuard: A Sub-1B Model for Faster LLM Guardrails

The team behind GLiNER releases an open-source small language model aimed at making safety moderation cheaper and quicker to run.

May 12, 2026

OpenBMB/Code

Needle: A 26M-Parameter Model Built for Tool Calling

Cactus Compute distilled Gemini's tool-calling behavior into a tiny model meant to run locally.

May 12, 2026

Monday, May 11, 2026

2 releases

Lightricks/Image → Video

Lightricks Releases LoRA for AI Lip-Dubbing

The new 'Identity-Control' adapter fine-tunes the company's LTX-2.3 video model to create realistic lip-syncing for dubbing workflows.

May 11, 2026

Tencent/Text / LLM

Tencent Releases 1.8B Model for Multilingual Translation

The 1.8-billion-parameter model from the Chinese tech giant is designed for high-quality translation across a wide range of language pairs.

May 11, 2026

Wednesday, May 6, 2026

1 release

Supertone/Text → Speech

Supertone Releases On-Device Multilingual TTS Model

The new Supertonic 3 model supports seven languages and is optimized for local inference with the portable ONNX format.

May 6, 2026

Tuesday, May 5, 2026

1 release

LiquidAI/Embeddings

Liquid AI ships a 350M multilingual embedding model

LFM2.5-Embedding-350M targets retrieval and search workloads on edge hardware, where compact size matters as much as accuracy.

May 5, 2026

Sunday, May 3, 2026

1 release

Moonshot AI/Vision-Language · Major release

Kimi K2.6 tops closed models in coding test

Moonshot AI's open-weights mixture-of-experts model reportedly outperformed Claude, GPT-5.5, and Gemini on a programming challenge.

May 3, 2026

Tuesday, April 28, 2026

1 release

NVIDIA/Image Editing

NVIDIA Releases PiD for High-Quality Image Upscaling

The new component is a specialized VAE decoder that works with Alibaba's Z-Image model to enhance super-resolution tasks.

Apr 28, 2026

Friday, April 24, 2026

1 release

NVIDIA/Any-to-Any

NVIDIA Releases Efficient Nemotron-3 Multimodal MoE

The new 30-billion-parameter Mixture-of-Experts model handles text and images while using only 3 billion active parameters for inference.

Apr 24, 2026

Thursday, April 23, 2026

5 releases

Google DeepMind/Any-to-Any

Google Releases Gemma 4 Multimodal Open Model

The new 26-billion-parameter model from DeepMind uses a mixture-of-experts design for greater efficiency and is tuned for assistant-style tasks.

Apr 23, 2026

Google DeepMind/Any-to-Any · Major release

Google Releases Multimodal Gemma 4 31B Model

The new 31-billion-parameter model is an instruction-tuned, 'any-to-any' powerhouse released under a permissive Apache 2.0 license.

Apr 23, 2026

Google DeepMind/Any-to-Any

Google Releases 4B Multimodal Gemma 4 Assistant

The new 4-billion-parameter model is instruction-tuned for 'any-to-any' tasks, handling a flexible mix of data types.

Apr 23, 2026

Google DeepMind/Any-to-Any

Google Releases 2B Multimodal Gemma 4 Assistant Model

The new compact model from DeepMind is instruction-tuned for "any-to-any" tasks, capable of processing and generating mixed data types.

Apr 23, 2026

Xiaomi/Speech → Text

Xiaomi Releases MiMo Model for Speech Recognition

The new open-source model from the Chinese tech giant offers automatic speech recognition for Mandarin, Cantonese, and English under a permissive MIT license.

Apr 23, 2026

Wednesday, April 22, 2026

4 releases

inclusionAI/Any-to-Any

LLaDA2.0-Uni: A Unified MoE for Vision Tasks

The new open-source model from inclusionAI uses a Mixture-of-Experts architecture to handle multiple vision tasks in a single, diffusion-based system.

Apr 22, 2026

DeepSeek/Text / LLM · Major release

DeepSeek Releases V4-Pro, an Open MoE Contender

The new flagship model combines a Mixture-of-Experts architecture with a permissive MIT license, positioning it for wide commercial adoption.

Apr 22, 2026

DeepSeek/Text / LLM · Major release

DeepSeek Releases V4-Flash, a Fast MIT-Licensed MoE Model

The new Mixture of Experts model from the Beijing-based AI lab is optimized for fast, efficient conversational AI and carries a fully permissive license.

Apr 22, 2026

SenseTime/Any-to-Any

SenseTime Releases 8B Any-to-Any Multimodal Model

The new SenseNova-U1 model unifies image understanding, generation, and editing within a single 8-billion-parameter framework.

Apr 22, 2026

Tuesday, April 21, 2026

1 release

Qwen · Alibaba/Vision-Language

Alibaba's Qwen Releases Open 27B Vision Model

The new dense model, licensed under Apache 2.0, brings both text and image understanding to the midrange parameter space.

Apr 21, 2026

Monday, April 20, 2026

1 release

NVIDIA/Any-to-Any

NVIDIA Releases Nemotron-3-Nano Omni-Modal MoE

The new 30-billion-parameter Mixture-of-Experts model handles any combination of modalities with just 3 billion active parameters.

Apr 20, 2026

Friday, April 17, 2026

1 release

Resemble AI/Text → Speech

Resemble AI Releases Dramabox Voice Cloning TTS Model

The new text-to-speech model uses a diffusion-transformer architecture for high-quality, expressive audio and one-shot voice cloning.

Apr 17, 2026

Thursday, April 16, 2026

1 release

IBM/Speech → Text

IBM Releases 2B Granite Model for Multilingual Speech

The new two-billion-parameter model offers transcription capabilities for at least five major languages under a permissive Apache 2.0 license.

Apr 16, 2026

Wednesday, April 15, 2026

1 release

Qwen · Alibaba/Vision-Language · Major release

Qwen Releases 35B Multimodal Mixture-of-Experts Model

The new Qwen3.6-35B-A3B from Alibaba's Qwen team combines vision and language capabilities using an efficient sparse architecture.

Apr 15, 2026

Tuesday, April 14, 2026

2 releases

Motif Technologies/Text → Video

Motif Releases 2B Open-Source Text-to-Video Model

The new Apache 2.0 licensed model uses a diffusion transformer architecture to offer a new open alternative for video generation research.

Apr 14, 2026

Moonshot AI/Vision-Language · Major release

Moonshot AI Releases Kimi-K2.6 Multimodal Model

The Chinese AI lab has published weights for its new vision-language model, though a restrictive license limits its use to research applications.

Apr 14, 2026

Monday, April 13, 2026

1 release

OpenBMB/Vision-Language

OpenBMB Releases MiniCPM-V for On-Device Vision

The new open-source vision-language model is designed for high-resolution image understanding on mobile and edge devices.

Apr 13, 2026

Saturday, April 11, 2026

2 releases

NVIDIA/Text / LLM

NVIDIA's Nemotron TwoTower mixes diffusion and Mamba

A new 30B mixture-of-experts base model activates just 3B parameters per token and uses a hybrid diffusion/Mamba design.

Apr 11, 2026

NVIDIA/Text / LLM

NVIDIA's Nemotron TwoTower is a MoE experiment

An experimental 30B mixture-of-experts base model blends diffusion and Mamba ideas under a two-tower design.

Apr 11, 2026

Thursday, April 9, 2026

1 release

MiniMax/Text / LLM

MiniMax Releases M2.7, an MoE Model with FP8 Weights

The new conversational language model from the Chinese AI company uses a Mixture-of-Experts architecture and 8-bit weights, but is released under a restrictive custom license.

Apr 9, 2026

Tuesday, April 7, 2026

1 release

Baidu/Text → Image

Baidu Releases 8B Text-to-Image Model ERNIE-Image

The large diffusion model from the Chinese tech giant is available under the commercially permissive Apache 2.0 license, a notable release for the community.

Apr 7, 2026

Monday, April 6, 2026

1 release

Black Forest Labs/Text → Image

Black Forest Labs Releases Open FLUX.2 Image Decoder

This new component is part of a novel transformer-based architecture for text-to-image generation, released under a permissive Apache 2.0 license.

Apr 6, 2026

Friday, April 3, 2026

2 releases

Zhipu AI/Text / LLM · Major release

Zhipu AI Releases Open-Source GLM-5.1 MoE Model

The new bilingual model from the Chinese AI firm features an efficient Mixture-of-Experts architecture and a fully permissive MIT license.

Apr 3, 2026

OpenBMB/Text → Speech

OpenBMB Releases VoxCPM2 for Expressive TTS

The new diffusion-based model from the OpenBMB research group supports multilingual speech, emotional control, and zero-shot voice cloning.

Apr 3, 2026

Thursday, April 2, 2026

2 releases

OpenMOSS/Text → Speech

MOSS-TTS-Nano Delivers Multilingual Speech at 100M Params

The new open-source model from OpenMOSS-Team generates high-quality speech in multiple languages while maintaining a remarkably small footprint.

Apr 2, 2026

Tencent/Vision-Language

Tencent Releases 2B Vision Model for Robotics

The new HY-Embodied 0.5 is a vision-language model designed specifically for multi-object tracking in dynamic, real-world environments.

Apr 2, 2026

Tuesday, March 31, 2026

2 releases

Tencent/Image → Video

Tencent Releases HY-OmniWeaving for Multi-Image Video

Built on Tencent's HunyuanVideo-1.5 architecture, the new model synthesizes video by combining multiple static images and text prompts into a cohesive narrative.

Mar 31, 2026

JD/Image Editing

JD.com Releases Open-Source Bilingual Image Editor

The new JoyAI-Image-Edit model allows for instruction-based photo manipulation in both English and Chinese under a permissive Apache 2.0 license.

Mar 31, 2026

Monday, March 30, 2026

2 releases

KRAFTON/Any-to-Any

KRAFTON Releases 9B Bilingual Speech Model

The gaming giant behind 'PUBG' has released Raon-Speech-9B, a multimodal model for English and Korean speech recognition and synthesis.

Mar 30, 2026

k2-fsa/Text → Speech

OmniVoice TTS Offers Zero-Shot Multilingual Voice Cloning

A new open-source text-to-speech model from the k2-fsa project can replicate a voice and generate speech in multiple languages from a single short audio sample.

Mar 30, 2026

Friday, March 27, 2026

1 release

HKUSTAudio/Any-to-Any

HKUST Releases Audio-Omni, a Unified Audio Model

The new diffusion-based model handles speech, music, and general audio tasks like conversion and editing within a single, versatile framework.

Mar 27, 2026

Wednesday, March 25, 2026

1 release

Meituan/Any-to-Any

Meituan Releases LongCat-Next 'Any-to-Any' AI Model

The Chinese tech company has released the weights for a unified model that can process and generate combinations of text, images, audio, and video.

Mar 25, 2026

Tuesday, March 24, 2026

1 release

Cohere/Speech → Text

Cohere Releases Top-Ranked Multilingual Transcription Model

The new automatic speech recognition model from Cohere Labs sets a new benchmark on the Hugging Face Open ASR Leaderboard for multilingual performance.

Mar 24, 2026

Monday, March 23, 2026

1 release

Aratako/Text → Speech

Irodori-TTS v2 Offers Open Japanese Speech Synthesis

The 500-million-parameter model from researcher Aratako provides a high-quality, single-speaker voice under a permissive MIT license.

Mar 23, 2026

Saturday, March 21, 2026

1 release

GAIR/Image → Video

GAIR Releases daVinci-MagiHuman for Video Generation

The new open-source model from the GAIR lab can create video clips complete with audio from a variety of inputs.

Mar 21, 2026

Wednesday, March 18, 2026

1 release

Baidu/Vision-Language

Baidu Releases Qianfan-OCR for Document Intelligence

The new vision-language model from the Chinese tech giant is designed for complex, multilingual optical character recognition and layout analysis.

Mar 18, 2026

Monday, March 16, 2026

1 release

Cactus Compute/Code

Needle: A 26M-Param Model Built for On-Device Tool Calls

Cactus Compute's tiny encoder-decoder is distilled specifically for function calling at the edge, trading general chat for a narrow, useful job.

Mar 16, 2026

Wednesday, March 11, 2026

2 releases

Google DeepMind/Any-to-Any · Major release

Google Releases Gemma 4, a 26B Vision-Language Model

The new open-source model from DeepMind uses a Mixture-of-Experts architecture to handle both text and image inputs efficiently.

Mar 11, 2026

Google DeepMind/Any-to-Any · Major release

Google Releases Multimodal Gemma 4 31B Model

The new 31-billion-parameter model is instruction-tuned and can process both text and images, marking a significant expansion for the Gemma family.

Mar 11, 2026

Monday, March 9, 2026

2 releases

Black Forest Labs/Text → Image · Major release

Black Forest Labs Releases 9B FLUX.2 klein Image Model

The new open-weight model offers a more compact, distilled version of the advanced FLUX architecture for text-to-image and editing tasks.

Mar 9, 2026

Fishaudio/Text → Speech

Fish Audio's S2-Pro Brings Expressive TTS to Open Source

The new text-to-speech model can follow natural language instructions to control tone, clone voices from short clips, and speak multiple languages.

Mar 9, 2026

Wednesday, March 4, 2026

1 release

Lightricks/Image → Video

Lightricks LTX-2.3 Generates Video and Audio Together

The new model, based on Stable Video Diffusion, can create video and a corresponding soundtrack simultaneously from text, image, or audio prompts.

Mar 4, 2026

Monday, March 2, 2026

6 releases

NVIDIA/Vision-Language

NVIDIA's New 3B VLM Pinpoints Objects in Images

The new 3-billion-parameter model, based on the company's Eagle architecture, is designed for high-precision visual grounding tasks.

Mar 2, 2026

Google DeepMind/Any-to-Any · Major release

Google Releases Compact Gemma 4 E2B Multimodal Model

The new 2-billion-parameter model from Google DeepMind brings efficient image-and-text understanding to the open-source Gemma family.

Mar 2, 2026

Google DeepMind/Any-to-Any · Major release

Google's Gemma 4 Arrives with Any-to-Any Multimodal Skills

The new 2-billion-parameter model from DeepMind can process text, vision, and audio, making it a versatile and efficient foundation for developers.

Mar 2, 2026

Google DeepMind/Any-to-Any

Google Releases Gemma 4 E4B, a 4B Multimodal Model

The new 4-billion-parameter vision-language model brings image and text understanding to Google's popular open-source family.

Mar 2, 2026

Google DeepMind/Any-to-Any · Major release

Google's Gemma 4 Debuts with Any-to-Any Multimodality

The new 4-billion-parameter model from Google DeepMind is designed for versatile input and output, handling text, images, and other data types.

Mar 2, 2026

Xiaomi/Image Editing

Xiaomi Releases Bilingual Image Editing Model FireRed 1.1

The new open-source model from Xiaomi's FireRedTeam leverages the Qwen-Image-Edit pipeline to offer instruction-based image editing in both English and Chinese.

Mar 2, 2026

Saturday, February 28, 2026

1 release

Qwen · Alibaba/Vision-Language

Alibaba's Qwen Releases Compact 0.8B Vision Model

The new 800-million-parameter model is the smallest in the Qwen3.5 family, designed for efficient multimodal tasks on consumer-grade hardware.

Feb 28, 2026

Friday, February 27, 2026

3 releases

IBM/Speech → Text

IBM Releases 1B Granite Model for Multilingual Speech

The new Apache 2.0-licensed model is part of the company's Granite family and aims to provide high-quality speech-to-text across several languages.

Feb 27, 2026

Qwen · Alibaba/Vision-Language

Alibaba's Qwen team releases 4B vision-language model

The new Qwen3.5-4B model combines text and image understanding in a compact, permissively licensed package for developers.

Feb 27, 2026

Qwen · Alibaba/Vision-Language

Qwen Releases 9B Multimodal Model in New 3.5 Series

The new open-source vision-language model from Alibaba's Qwen team offers strong performance in a compact, Apache 2.0-licensed package.

Feb 27, 2026

Tuesday, February 24, 2026

4 releases

Resemble AI/Speech → Text

Moonshine: Open STT Models Aim to Beat Whisper

Resemble AI releases MIT-licensed speech-to-text models that claim higher accuracy than OpenAI's Whisper Large v3.

Feb 24, 2026

Qwen · Alibaba/Vision-Language · Major release

Qwen Releases Flagship 122B Multimodal MoE Model

The new Qwen3.5-122B-A10B combines a massive parameter count with an efficient Mixture-of-Experts architecture for advanced vision and language tasks.

Feb 24, 2026

Qwen · Alibaba/Vision-Language · Major release

Qwen Releases 27B Vision Model with Long Context

The new model from Alibaba's Qwen team combines multimodal understanding with a 131K-token context window under a permissive Apache 2.0 license.

Feb 24, 2026

Qwen · Alibaba/Vision-Language

Qwen Releases Efficient 35B Multimodal MoE Model

The new Qwen3.5-35B-A3B model from Alibaba combines vision and language capabilities with a resource-friendly Mixture of Experts design.

Feb 24, 2026

Monday, February 16, 2026

2 releases

Qwen · Alibaba/Vision-Language · Major release

Qwen releases flagship 397B multimodal MoE

The new open-source model from Alibaba uses a Mixture-of-Experts architecture to balance massive scale with efficient inference.

Feb 16, 2026

HumeAI/Text → Speech

Hume AI Releases 3B Multilingual Text-to-Speech Model

The new model, Tada-3B-ML, is designed for fine-grained control over vocal expression across more than 10 languages.

Feb 16, 2026

Thursday, February 12, 2026

2 releases

nineninesix/Text → Speech

Kani-TTS-2 Offers New Open-Source Voice Generation

An independent researcher has released a new English text-to-speech model under a permissive license, built on a modern generative foundation.

Feb 12, 2026

MiniMax/Text / LLM

MiniMax Releases M2.5 Mixture-of-Experts Model

The Chinese AI company's latest open-weight release uses an efficient FP8 data type but comes with a restrictive, non-commercial license.

Feb 12, 2026

Wednesday, February 11, 2026

1 release

Zhipu AI/Text / LLM · Major release

Zhipu AI Releases Open-Source GLM-5 MoE Model

The new Mixture-of-Experts model from the Chinese AI company combines an advanced architecture with a fully permissive MIT license for commercial use.

Feb 11, 2026

Tuesday, February 10, 2026

2 releases

inclusionAI/Any-to-Any

inclusionAI's Ming 2.0 Tackles Any-to-Any Multimodality

The new open-source Mixture-of-Experts model can process and generate content across text, images, and audio in any combination.

Feb 10, 2026

Nanbeige/Text / LLM

Nanbeige Releases 3B Chinese-Enhanced Language Model

The new model uses a Llama-style architecture and was trained from scratch on 3.5 trillion tokens of Chinese and English data to strengthen its bilingual capabilities.

Feb 10, 2026

Friday, February 6, 2026

2 releases

OpenMOSS/Text → Speech

MOSS-TTS: A New Multilingual Text-to-Speech Model

The new system from the OpenMOSS Team uses a novel 'delay-pattern' architecture to generate natural-sounding speech in Chinese, English, and Japanese.

Feb 6, 2026

Soul AILab/Music

Soul-AILab Releases Zero-Shot Singing Voice Model

The new model, SoulX-Singer, can replicate a singing voice from a short audio sample and supports both English and Chinese under a permissive license.

Feb 6, 2026

Tuesday, February 3, 2026

1 release

OpenBMB/Any-to-Any

OpenBMB Releases 'Any-to-Any' Multimodal Model

The new MiniCPM-o 4.5 model from the open-source research group can process and generate interleaved combinations of images, text, and audio.

Feb 3, 2026

Monday, February 2, 2026

1 release

OpenBMB/Any-to-Any

MiniCPM-o 4.5 Offers 'Any-to-Any' Multimodal AI

The new model from OpenBMB supports mixed-modality inputs and outputs, from text and images to audio and video, in a single efficient package.

Feb 2, 2026

Friday, January 30, 2026

2 releases

Qwen · Alibaba/Code

Qwen Releases Coder-Next, A New Open MoE Coding Model

The new model from Alibaba's Qwen team uses a Mixture-of-Experts architecture and is released under the commercially friendly Apache 2.0 license.

Jan 30, 2026

Zhipu AI/Vision-Language

Zhipu AI Releases Multilingual GLM-OCR Vision Model

The new vision-language model from the creators of the GLM series is specialized for recognizing and extracting text from images across multiple languages.

Jan 30, 2026

Wednesday, January 28, 2026

7 releases

Black Forest Labs/Text → Image

Black Forest Labs Releases FLUX.2 Klein 9B

The open-weight text-to-image model brings a 9-billion-parameter base release to the FLUX.2 Klein family.

Jan 28, 2026

OpenMOSS/Any-to-Any

OpenMOSS Releases MOVA, a 720p Multimodal Video Generator

The new open model can generate high-definition video with synchronized audio from a flexible combination of text and image prompts.

Jan 28, 2026

Baidu/Vision-Language

Baidu Releases Open VLM for Advanced Document OCR

The new PaddleOCR-VL model is built to parse not just text, but also the tables, formulas, and page layouts found in complex documents.

Jan 28, 2026

OpenMOSS/Image → Video

OpenMOSS Releases MOVA for Joint Video and Audio Gen

The new model generates 360p video from text or images and creates corresponding audio tracks simultaneously, a notable step for integrated audiovisual synthesis.

Jan 28, 2026

Qwen · Alibaba/Speech → Text

Qwen Releases 0.6B Model for Audio-Text Alignment

The new open-source tool, based on the Qwen3 architecture, precisely synchronizes audio recordings with their corresponding text transcripts.

Jan 28, 2026

Qwen · Alibaba/Speech → Text

Qwen3 Family Expands into Speech Recognition

Alibaba's Qwen team has released a new 1.7-billion-parameter model designed specifically for automatic speech recognition.

Jan 28, 2026

Qwen · Alibaba/Speech → Text

Qwen open-sources compact model for speech recognition

The new 600-million-parameter Qwen3-ASR model is designed for efficient, high-quality audio transcription under a permissive license.

Jan 28, 2026

Tuesday, January 27, 2026

1 release

DeepSeek/Vision-Language

DeepSeek-OCR-2 Tackles Multilingual Document AI

The new open vision-language model is designed to extract text and understand structure from complex, multilingual documents.

Jan 27, 2026

Monday, January 26, 2026

1 release

robbyant/Image → Video

Lingbot-World Animates Images with Camera Control

The new open-source world model from researcher robbyant generates short video clips from a single image, giving users control over the virtual camera path.

Jan 26, 2026

Friday, January 23, 2026

1 release

Qwen · Alibaba/Text → Image

Alibaba's Qwen Team Releases Z-Image Diffusion Model

The makers of the popular Qwen language models have published a new open-source text-to-image generator with a permissive Apache 2.0 license.

Jan 23, 2026

Thursday, January 22, 2026

1 release

YatharthS/Text → Speech

LuxTTS Delivers Lightweight, Open-Source Speech Synthesis

The new text-to-speech model is optimized for the ONNX runtime, making it a promising option for efficient, on-device audio generation.

Jan 22, 2026

Wednesday, January 21, 2026

6 releases

Mistral AI/Speech → Text

Mistral Expands Speech AI with Voxtral Mini Model

The company, known for its powerful text models, has released a new open-source speech recognition system designed for real-time, multilingual transcription.

Jan 21, 2026

Microsoft/Speech → Text

Microsoft Releases VibeVoice for Speech Transcription

The new open-source automatic speech recognition model handles multilingual transcription and speaker identification out of the box.

Jan 21, 2026

Qwen · Alibaba/Text → Speech

Qwen Releases Open-Source Voice Cloning Model

The new 600-million-parameter Qwen3-TTS model can generate speech in multiple languages and clone voices from short audio clips.

Jan 21, 2026

Qwen · Alibaba/Text → Speech

Qwen Releases a Compact Custom-Voice TTS Model

The new 600-million-parameter model from Alibaba's Qwen team can clone voices from short audio clips for multilingual speech synthesis.

Jan 21, 2026

Qwen · Alibaba/Text → Speech

Qwen Releases Open 1.7B Custom Voice Synthesis Model

Alibaba's Qwen team has released a new text-to-speech model capable of cloning voices from just a few seconds of audio.

Jan 21, 2026

Qwen · Alibaba/Text → Speech

Qwen Unveils Open Model for Custom Voice Synthesis

The new 1.7-billion-parameter text-to-speech model from Alibaba's Qwen team can generate novel voices from short audio prompts.

Jan 21, 2026

Monday, January 19, 2026

1 release

Zhipu AI/Text / LLM

Zhipu AI Releases GLM-4.7-Flash MoE Model

The new Mixture-of-Experts model from the Beijing-based AI company is optimized for speed and released under the permissive MIT license.

Jan 19, 2026

Friday, January 16, 2026

1 release

LightOn/Vision-Language

LightOn Releases OCR-2, a 1B Document AI Model

The new vision model from the Paris-based AI lab uses a Mistral-based architecture to extract text and structure from complex documents like PDFs and forms.

Jan 16, 2026

Wednesday, January 14, 2026

6 releases

Black Forest Labs/Text → Image

Black Forest Labs Releases 9B FLUX.2 Image Model

The new text-to-image model emphasizes speed and efficiency with a novel architecture and FP8 quantization.

Jan 14, 2026

ekwek/Text → Speech

Soprano TTS Model Leverages Qwen3 Architecture

The new 80-million-parameter text-to-speech model adapts a powerful language model architecture for efficient, open-source audio generation.

Jan 14, 2026

Black Forest Labs/Text → Image · Major release

Black Forest Labs Releases Open-Source FLUX.2 Klein 4B

The new 4-billion-parameter model is a distilled version of the powerful FLUX.2 architecture, released under a commercially friendly Apache 2.0 license.

Jan 14, 2026

Black Forest Labs/Text → Image

FLUX.2 Klein: A Compact 4B Open-Source Image Model

The new 4-billion-parameter model from Black Forest Labs offers an efficient, transformer-based alternative to latent diffusion for image generation.

Jan 14, 2026

Black Forest Labs/Text → Image · Major release

Black Forest Labs Releases 9B FLUX.2 Image Model

The new 9-billion-parameter model uses a Diffusion Transformer architecture, promising higher performance than existing open-source alternatives.

Jan 14, 2026

Black Forest Labs/Text → Image · Major release

Black Forest Labs Releases New FLUX.2 Image Model

The new 9-billion-parameter text-to-image model uses a novel architecture that operates directly on pixels for faster, more efficient generation.

Jan 14, 2026

Monday, January 12, 2026

2 releases

HumeAI/Text → Speech

Hume AI Releases TADA 1B for Expressive Speech

The new 1-billion-parameter text-to-speech model builds on a Llama 3.2 base to generate more natural and nuanced audio.

Jan 12, 2026

Google DeepMind/Text / LLM

Google Releases TranslateGemma for Open Translation

The new 4B-parameter model is an instruction-tuned variant of Gemma, designed specifically for high-quality multilingual translation tasks.

Jan 12, 2026

Sunday, January 11, 2026

1 release

Kugelaudio/Text → Speech

Kugelaudio Releases Open TTS Model for European Languages

The new text-to-speech model uses a hybrid diffusion and autoregressive architecture for high-quality, multilingual synthesis.

Jan 11, 2026

Thursday, January 8, 2026

1 release

Zhipu AI/Text → Image

Zhipu AI Releases Open, Bilingual GLM-Image Model

The new text-to-image model is fluent in both Chinese and English, built on the CogView2 architecture and released under a permissive MIT license.

Jan 8, 2026

Wednesday, January 7, 2026

1 release

Google DeepMind/Vision-Language

Google's MedGemma brings open vision AI to medicine

The new 4-billion-parameter vision-language model is specialized for tasks in radiology, pathology, and complex clinical reasoning.

Jan 7, 2026

Tuesday, January 6, 2026

1 release

Supertone/Text → Speech

Supertone Open-Sources Supertonic 2 Voice Model

The new text-to-speech model from the audio AI company supports English, Korean, and Spanish and comes in the efficient ONNX format for deployment.

Jan 6, 2026

Saturday, January 3, 2026

1 release

Lightricks/Image → Video · Major release

Lightricks Releases LTX-2 Multimodal Video Generator

The new diffusion model from the creative app company can generate short video clips from text, images, audio, and even other videos.

Jan 3, 2026

Thursday, January 1, 2026

1 release

Moonshot AI/Vision-Language · Major release

Moonshot AI Releases Kimi K2.5 Multimodal Model

The new vision-language model from the Chinese AI firm uses a Mixture-of-Experts architecture and is now available on Hugging Face.

Jan 1, 2026

Tuesday, December 30, 2025

1 release

Qwen · Alibaba/Text → Image

Qwen Releases Bilingual Open-Source Image Model

Alibaba's latest text-to-image generator, Qwen-Image 2512, is optimized for creating visuals from both English and Chinese prompts.

Dec 30, 2025

Tuesday, December 23, 2025

1 release

Qwen · Alibaba/Any-to-Any

Qwen's Fun-Audio-Chat: An Open Speech-to-Speech LLM

The 8-billion-parameter model from Alibaba's Qwen team understands and generates spoken responses, enabling more natural audio-first applications.

Dec 23, 2025

Saturday, December 20, 2025

1 release

MiniMax/Text / LLM

MiniMax Debuts M2.1, an MoE Model Optimized with FP8

The new Mixture of Experts model from the Chinese AI firm uses 8-bit floating-point precision for a smaller memory footprint and faster inference.

Dec 20, 2025

Thursday, December 18, 2025

1 release

Google DeepMind/Speech → Text

Google Releases MedASR for Medical Transcription

The new speech recognition model from DeepMind is trained specifically on medical dictation, aiming for higher accuracy in clinical notes.

Dec 18, 2025

Wednesday, December 17, 2025

4 releases

YatharthS/Text → Speech

MiraTTS Brings Qwen2 to Bilingual Speech Synthesis

A new text-to-speech model from developer YatharthS leverages the Qwen2 architecture to generate speech in both English and Chinese.

Dec 17, 2025

ekwek/Text → Speech

Soprano-80M: A Tiny TTS Model Based on Qwen3

Developer 'ekwek' has released a compact 80-million-parameter text-to-speech model, notable for its unconventional use of a Qwen3 language model architecture.

Dec 17, 2025

Qwen · Alibaba/Image Editing

Qwen Releases Open, Bilingual Image Editing Model

The new diffusion model from Alibaba's Qwen team allows for precise, instruction-based image modifications in both English and Chinese.

Dec 17, 2025

NVIDIA/Speech → Text

NVIDIA Releases Streaming Speech-to-Text Model

The 600-million-parameter Nemotron model is designed for real-time English transcription using a cache-aware FastConformer architecture.

Dec 17, 2025

Monday, December 15, 2025

1 release

Qwen · Alibaba/Speech → Text

Qwen Releases Compact ASR Model for Streaming Audio

The new Fun-ASR-Nano model from Alibaba's Qwen team packs real-time multilingual transcription, speaker diarization, and hotword detection into an efficient package.

Dec 15, 2025

Saturday, December 13, 2025

1 release

Huaichang/Image → Video

PersonaLive Model Animates Portraits in Real Time

The new open-source model from developer Huaichang uses a diffusion-based architecture to generate expressive video from a single still image.

Dec 13, 2025

Friday, December 12, 2025

1 release

Tencent/Image → Video

Tencent's HY-WorldPlay Creates 3D Scenes from One Image

The new model from Tencent's Hunyuan team generates dynamic video and reconstructs 3D environments using a single static picture.

Dec 12, 2025

Thursday, December 11, 2025

1 release

Qwen · Alibaba/Text → Speech

Alibaba Releases CosyVoice 3 for Expressive TTS

The new 500-million-parameter text-to-speech model from the Qwen team offers multilingual voice cloning and emotional control.

Dec 11, 2025

Wednesday, December 10, 2025

1 release

Zhipu AI/Text → Speech

Zhipu AI Releases GLM-TTS for Zero-Shot Voice Cloning

This new text-to-speech model can replicate a voice from just a few seconds of audio, using a novel combination of flow matching and reinforcement learning.

Dec 10, 2025

Tuesday, December 9, 2025

1 release

Zhipu AI/Speech → Text

Zhipu AI Releases Compact Bilingual Speech Model

The new GLM-ASR-Nano model is designed for efficient automatic speech recognition in both English and Mandarin Chinese.

Dec 9, 2025

Sunday, December 7, 2025

1 release

Zhipu AI/Vision-Language

Zhipu AI Releases Fast, Open Vision Model GLM-4.6V-Flash

The new model from the GLM-4.6V family offers a fast, MIT-licensed option for developers working with both text and images.

Dec 7, 2025

Friday, December 5, 2025

2 releases

OpenBMB/Text → Speech

VoxCPM 1.5 Brings Open-Source Voice Cloning

The new 500-million-parameter text-to-speech model from OpenBMB supports both English and Chinese and can replicate a voice from a short audio sample.

Dec 5, 2025

Meituan/Image Editing

Meituan Releases Open, Bilingual Image Editing Model

The new LongCat-Image-Edit model follows natural language instructions to perform complex photo manipulations in both English and Chinese.

Dec 5, 2025

Thursday, December 4, 2025

2 releases

Quark Vision/Image → Video

Quark Vision's Live-Avatar Animates Photos With Audio

The new 14-billion-parameter model uses audio input to generate realistic talking head videos from a single still image.

Dec 4, 2025

Microsoft/Text → Speech

Microsoft Releases VibeVoice for Real-Time AI Speech

The new 500-million-parameter model is designed for generating natural, long-form speech with very low latency for interactive applications.

Dec 4, 2025

Tuesday, December 2, 2025

1 release

Resemble AI/Text → Speech

Resemble AI Releases Chatterbox Turbo for Open TTS

The new text-to-speech model focuses on performance and offers voice cloning capabilities for English under a permissive MIT license.

Dec 2, 2025

Monday, December 1, 2025

1 release

DeepSeek/Text / LLM

DeepSeek-V3.2 Arrives With FP8 Weights, MIT License

The new Mixture-of-Experts model from DeepSeek AI ships efficient FP8 weights under a fully permissive license for commercial use.

Dec 1, 2025

Friday, November 28, 2025

1 release

FlashLabs/Any-to-Any

FlashLabs Releases Chroma-4B, an Any-to-Any Model

The new 4-billion-parameter model handles text, image, and speech inputs and outputs, including direct speech-to-speech translation.

Nov 28, 2025

Tuesday, November 25, 2025

1 release

Qwen · Alibaba/Text → Image

Alibaba Releases Z-Image-Turbo, A Fast Open Image Model

The new text-to-image model from the team behind Qwen uses a diffusion transformer to generate high-resolution images in only a few sampling steps.

Nov 25, 2025

Saturday, November 22, 2025

1 release

Black Forest Labs/Text → Image · Major release

Black Forest Labs Releases Open-Source FLUX.2 Image Model

The developer preview of the next-generation text-to-image architecture promises significant architectural improvements over its predecessor.

Nov 22, 2025

Tuesday, November 18, 2025

2 releases

Tencent/Text → Video · Major release

Tencent Releases HunyuanVideo 1.5 Generation Model

The new diffusion model generates short video clips from text and image prompts, adding another major player to the open video space.

Nov 18, 2025

Tencent/Vision-Language

Tencent Releases 1B Parameter HunyuanOCR Model

The new vision-language model from Tencent Hunyuan offers a compact, end-to-end solution for optical character recognition.

Nov 18, 2025

Monday, November 17, 2025

1 release

Mistral AI/Text → Speech

Mistral AI Releases Voxtral, an Open-Source TTS Model

The French AI leader expands beyond large language models with a new, 4-billion-parameter model for generating multilingual speech.

Nov 17, 2025

Saturday, November 15, 2025

1 release

Nari Labs/Text → Speech

Nari Labs Releases Dia2-2B, an Open Voice Cloning Model

The 2-billion-parameter text-to-speech model can clone voices from a short audio sample and is available under an Apache 2.0 license.

Nov 15, 2025

Friday, November 7, 2025

2 releases

Meta AI/Vision-Language

Meta releases SAM 3 for image and video segmentation

The latest Segment Anything Model extends Meta's mask-generation lineage from still images into video, now available on Hugging Face.

Nov 7, 2025

Baidu/Vision-Language

Baidu Releases Open Vision-Language MoE Model

The new ERNIE 4.5 VL model brings advanced multimodal reasoning to the open-source community with an efficient Mixture-of-Experts architecture.

Nov 7, 2025

Tuesday, November 4, 2025

1 release

Moonshot AI/Reasoning · Major release

Moonshot AI Releases Kimi-K2 Reasoning Model

The new Mixture-of-Experts model is designed for complex tasks but arrives in a custom compressed format with a restrictive license.

Nov 4, 2025

Friday, October 31, 2025

1 release

BAAI/Any-to-Any

BAAI Releases Emu3.5, an 'Any-to-Any' Multimodal Model

The new open-source model from the Beijing Academy of Artificial Intelligence unifies text and image understanding and generation into a single architecture.

Oct 31, 2025

Thursday, October 30, 2025

1 release

Microsoft/Vision-Language

Microsoft Releases Fara-7B Vision Agent Model

The 7-billion-parameter model is designed to understand and interact with graphical user interfaces, building on Alibaba's open-source Qwen2.5-VL.

Oct 30, 2025

Monday, October 27, 2025

1 release

Soul AILab/Text → Speech

SoulX-Podcast 1.7B Offers Open Multi-Speaker TTS

The new 1.7-billion-parameter model from Soul AILab is trained on conversational data to generate natural dialogue in English and Chinese.

Oct 27, 2025

Friday, October 24, 2025

1 release

Meituan/Text → Video

Meituan Releases Open-Source LongCat-Video Model

The Chinese tech giant has released a new MIT-licensed model capable of generating video from text, images, or by continuing existing clips.

Oct 24, 2025

Thursday, October 23, 2025

1 release

Meituan/Any-to-Any

Meituan Debuts LongCat-Flash-Omni, an Any-to-Any AI Model

The new open-source Mixture-of-Experts model can process and generate any combination of text, images, video, audio, and 3D data.

Oct 23, 2025

Wednesday, October 22, 2025

2 releases

NVIDIA/Speech → Text

NVIDIA Releases Real-Time Speaker Diarization Model

The new Sortformer-based model is designed for streaming audio, identifying up to four distinct speakers in real time.

Oct 22, 2025

MiniMax/Text / LLM · Major release

MiniMax Releases M2, an Open-Weight MoE for Agents

The Shanghai-based AI startup has released a new Mixture-of-Experts model focused on complex reasoning, coding, and agentic tasks.

Oct 22, 2025

Tuesday, October 21, 2025

1 release

Datalab To/Vision-Language

Datalab Releases Chandra, a New OCR Vision Model

The new vision-language model from Datalab is fine-tuned from Qwen2-VL to specialize in extracting text and structure from complex documents.

Oct 21, 2025

Saturday, October 18, 2025

2 releases

Kuaishou/Any-to-Any

Kling Releases UniVideo for Generation and Understanding

The new open-source model combines both video generation and comprehension, a rare dual capability built on the Qwen2.5 vision-language foundation.

Oct 18, 2025

Maya Research/Text → Speech

Maya Research Releases Maya1, an Expressive TTS Model

The new Apache 2.0 licensed model uses a Llama-based architecture to generate more natural and emotionally nuanced speech from text.

Oct 18, 2025

Friday, October 17, 2025

1 release

DeepSeek/Vision-LanguageMajor release

DeepSeek-OCR Tackles Document Parsing with Vision AI

The new vision-language model uses a novel context compression technique to efficiently extract text and structure from complex documents.

Oct 17, 2025

Thursday, October 16, 2025

1 release

Baidu/Vision-Language

Baidu Releases PaddleOCR-VL for Document AI

The new vision-language model is fine-tuned to understand not just text, but the complex structure of tables, charts, and formulas.

Oct 16, 2025

Wednesday, October 15, 2025

1 release

NVIDIA/Speech → Text

NVIDIA's Parakeet ASR Tackles Multi-Speaker Audio

The 600-million-parameter model offers real-time speech-to-text with speaker diarization, built on the efficient FastConformer architecture.

Oct 15, 2025

Tuesday, October 14, 2025

1 release

inclusionAI/Any-to-Any

inclusionAI Debuts 'Any-to-Any' Multimodal MoE Model

The new Ming-flash-omni-Preview aims to handle any combination of data modalities using an efficient Mixture-of-Experts architecture.

Oct 14, 2025

Saturday, October 11, 2025

1 release

Qwen · Alibaba/Vision-Language

Alibaba Releases Qwen3-VL, an 8B Open-Source Vision Model

The latest vision-language model from the popular Qwen series is instruction-tuned and available under an Apache 2.0 license.

Oct 11, 2025

Wednesday, October 8, 2025

3 releases

Google DeepMind/Text / LLM

Google Releases Compact FunctionGemma Model

The new 270-million-parameter model from Google DeepMind is fine-tuned specifically for reliable function calling and tool use.

Oct 8, 2025

EPFL VITA/Image → Video

EPFL Releases SVI for Streaming Image-to-Video

The new open-source model from Swiss researchers uses a novel chunking method to generate indefinitely long videos from a single still image.

Oct 8, 2025

Krea/Text → Video

Krea Releases Open-Source Real-Time Video Model

The new 14-billion-parameter model is a distilled version of an existing video foundation model, designed for interactive, real-time video generation.

Oct 8, 2025

Tuesday, September 30, 2025

4 releases

inclusionAI/Any-to-Any

inclusionAI Releases Ming-UniVision MoE Multimodal Model

The new 16-billion-parameter model uses a sparse Mixture-of-Experts design to efficiently handle 'any-to-any' data combinations, from text to images.

Sep 30, 2025

Qwen · Alibaba/Vision-LanguageMajor release

Qwen Releases 30B MoE Vision Model, Qwen3-VL

The new open-source model from Alibaba uses a Mixture-of-Experts architecture to make its powerful vision-language capabilities more efficient to run.

Sep 30, 2025

nineninesix/Text → Speech

Kani TTS 370M Offers Compact Multilingual Speech

Built on Liquid AI's LFM2 architecture, the new 370-million-parameter model offers developers an efficient, compact option for multilingual speech synthesis.

Sep 30, 2025

chetwinlow1/Image → Video

Ovi Syncs Audio and Video in New Open-Source Model

Built on the Wan2.2 architecture, this new 5-billion-parameter model generates short video clips from a single image and simultaneously creates synchronized audio.

Sep 30, 2025

Monday, September 29, 2025

2 releases

Zhipu AI/Text / LLMMajor release

Zhipu AI Releases Open-Weight MoE Model GLM-4.6

The new Mixture-of-Experts model is available under a permissive MIT license and is optimized for complex reasoning and coding tasks.

Sep 29, 2025

inclusionAI/Any-to-Any

Ming-UniAudio Brings MoE to Unified Audio AI

A new 16-billion-parameter model from inclusionAI uses a Mixture-of-Experts architecture to handle a wide range of audio tasks efficiently.

Sep 29, 2025

Friday, September 26, 2025

1 release

ByteDance/Image → Video

ByteDance Releases Lynx for Identity-Preserving Video

The new model from the TikTok parent company generates short video clips that maintain a person's likeness from a single reference image.

Sep 26, 2025

Thursday, September 25, 2025

2 releases

Tencent/Text → Image

Tencent Debuts HunyuanImage 3.0 with MoE Design

The new text-to-image generator from the Chinese tech giant uses a Mixture-of-Experts architecture for more efficient and detailed image creation.

Sep 25, 2025

Tencent/Text → ImageMajor release

Tencent Releases HunyuanImage 3.0 Text-to-Image Model

The new text-to-image generator from the Chinese tech giant uses a Mixture-of-Experts architecture for improved efficiency and output quality.

Sep 25, 2025

Monday, September 22, 2025

1 release

Qwen · Alibaba/Image Editing

Qwen Releases Open-Source Instruction-Based Image Editor

The new model from Alibaba's Qwen team allows users to modify images using natural language prompts instead of complex tools or masks.

Sep 22, 2025

Saturday, September 20, 2025

1 release

Qwen · Alibaba/Any-to-AnyMajor release

Qwen3-Omni Arrives With Any-to-Any Multimodality

The new 30B Mixture-of-Experts model from Alibaba's Qwen team accepts text, image, audio, and video input and can respond in both text and speech.

Sep 20, 2025

Thursday, September 18, 2025

1 release

Xiaomi/Any-to-Any

Xiaomi's MiMo-Audio 7B Tackles Complex Speech Tasks

This new instruction-tuned model from Xiaomi can handle a flexible combination of audio and text inputs and outputs, from transcription to voice synthesis.

Sep 18, 2025

Tuesday, September 16, 2025

1 release

OpenBMB/Text → Speech

OpenBMB Releases VoxCPM for Open Voice Synthesis

The new 500-million-parameter model offers high-quality text-to-speech and zero-shot voice cloning under a permissive license.

Sep 16, 2025

Monday, September 15, 2025

3 releases

Qwen · Alibaba/Any-to-Any

Qwen Releases 'Thinking' Multimodal MoE Model

The new 30-billion-parameter Mixture-of-Experts model from Alibaba's Qwen team is designed to show its reasoning process for complex multimodal tasks.

Sep 15, 2025

Qwen · Alibaba/Any-to-Any

Qwen Releases 30B Model for Audio Captioning

The new Mixture-of-Experts model from Alibaba is fine-tuned to generate detailed, multilingual descriptions for complex audio content.

Sep 15, 2025

neuphonic/Text → Speech

Neuphonic Releases NeuTTS Air for On-Device AI Speech

The new Apache 2.0 text-to-speech model is built on a Qwen2 architecture and optimized for local inference with GGUF support.

Sep 15, 2025

Thursday, September 11, 2025

1 release

moondream/Vision-Language

Moondream 3 Arrives in Preview Release

The next generation of the efficient, open-source vision-language model is now available for early testing and feedback.

Sep 11, 2025

Wednesday, September 10, 2025

2 releases

ByteDance/Image → Video

ByteDance Releases HuMo for Human Video Generation

The new open-source model specializes in creating realistic videos of people, separating appearance from motion for greater control.

Sep 10, 2025

Qwen · Alibaba/Text → Video

Alibaba's Wan2.2 Adds Control to Open Video

The new 14-billion-parameter model from Alibaba's PAI team offers fine-grained control over video generation using inputs like sketches and depth maps.

Sep 10, 2025

Tuesday, September 9, 2025

2 releases

Qwen · Alibaba/Text / LLMMajor release

Qwen Releases 80B Mixture-of-Experts Model

The new Qwen3-Next model from Alibaba combines a large parameter count with an efficient MoE architecture to balance performance and computational cost.

Sep 9, 2025

Alpha-VLLM/Any-to-Any

Lumina-DiMOO: A Diffusion Model for Any-to-Any AI

This new open-source model uses fully discrete diffusion modeling, rather than the usual autoregressive approach, to generate and understand a mix of media types.

Sep 9, 2025

Monday, September 8, 2025

1 release

Tencent/Text → Image

Tencent SRPO Fine-Tunes FLUX.1-dev with Preference Optimization

The new text-to-image model applies a preference-optimization technique to align FLUX.1-dev more closely with human aesthetic preferences.

Sep 8, 2025

Friday, September 5, 2025

1 release

Tencent/Text → Image

Tencent Releases HunyuanImage 2.1 for Bilingual AI Art

The new text-to-image model from the Chinese tech giant is designed to understand both Chinese and English prompts at high resolutions.

Sep 5, 2025

Thursday, September 4, 2025

2 releases

Vibevoice/Text → Speech

Microsoft Releases VibeVoice, a 7B Podcast TTS Model

The new 7-billion-parameter model is designed for generating long-form, multi-speaker audio in English and Chinese under a permissive MIT license.

Sep 4, 2025

Aoi Ot/Text → Speech

Microsoft Releases VibeVoice, a Podcast-Ready TTS Model

The new open-source model specializes in generating long-form, multi-speaker audio in both English and Mandarin, mimicking a natural podcast conversation.

Sep 4, 2025

Thursday, August 28, 2025

1 release

StepFun/Any-to-Any

StepFun Releases Step-Audio 2 mini, a Unified Audio AI

The new open-source model handles both speech recognition and audio generation in a single, end-to-end architecture.

Aug 28, 2025

Wednesday, August 27, 2025

1 release

Tencent/Image → Video

Tencent's Voyager Model Turns Images into 3D Worlds

The new model from Tencent AI Lab generates temporally and spatially consistent video sequences from a single image, enabling virtual exploration of static scenes.

Aug 27, 2025

Monday, August 25, 2025

2 releases

Microsoft/Text → Speech

Microsoft Releases VibeVoice for Long-Form Audio

The new 1.5-billion-parameter text-to-speech model is designed to generate natural, multi-speaker audio for podcasts and other long-form content.

Aug 25, 2025

Qwen · Alibaba/Image → Video

Alibaba Releases 14B Model for Audio-Driven Video

The new Wan2.2-S2V model takes a still image and a speech track to generate a realistic talking-head animation, available under a permissive license.

Aug 25, 2025

Sunday, August 24, 2025

1 release

OpenBMB/Vision-Language

OpenBMB Releases Compact Multimodal Model MiniCPM-V 4.5

The new vision-language model from the open-source research group demonstrates strong OCR and video understanding capabilities in a small package.

Aug 24, 2025

Tuesday, August 19, 2025

1 release

DeepSeek/Text / LLMMajor release

DeepSeek Releases 671B MoE Model Under MIT License

The new DeepSeek-V3.1-Base is a massive 671-billion-parameter Mixture-of-Experts model designed for efficient, large-scale research and development.

Aug 19, 2025

Sunday, August 17, 2025

1 release

Qwen · Alibaba/Image EditingMajor release

Qwen Releases Open Model for Image Editing

The new open-source model from Alibaba lets users edit images with simple text commands in both English and Chinese.

Aug 17, 2025

Friday, August 15, 2025

1 release

NexaAI/Any-to-Any

NexaAI Releases OmniNeural-4B for On-Device AI

The new 4-billion-parameter model is designed for 'any-to-any' multimodal tasks and optimized to run efficiently on mobile hardware.

Aug 15, 2025

Wednesday, August 13, 2025

1 release

Tencent/Image → Video

Tencent Releases Controllable Game Video Model

The new Hunyuan-GameCraft 1.0 is an open image-to-video model that generates interactive game-like scenes with precise camera control.

Aug 13, 2025

Tuesday, August 12, 2025

1 release

FrancisRing/Image → Video

StableAvatar Brings Open Source Talking Heads to Life

A new diffusion-based model from developer FrancisRing animates a still image into a talking avatar driven by an audio track.

Aug 12, 2025

Sunday, August 10, 2025

1 release

Zhipu AI/Vision-LanguageMajor release

Zhipu AI Releases Open Vision Model GLM-4.5V

The new Mixture-of-Experts model offers strong multimodal reasoning capabilities under a permissive MIT license.

Aug 10, 2025

Friday, August 8, 2025

1 release

Skywork/Image → Video

Skywork Releases Open 'World Model' for Playable Video

The new 1.3-billion-parameter model functions as an interactive 'world model,' generating controllable video scenes from a single static image.

Aug 8, 2025

Tuesday, August 5, 2025

1 release

Google DeepMind/Text / LLM

Google Releases Gemma 3 270M for On-Device AI

The new ultra-compact model from DeepMind is designed for efficient performance in resource-constrained environments like mobile and web.

Aug 5, 2025

Monday, August 4, 2025

4 releases

OpenAI/ReasoningMajor release

OpenAI Releases 21B Open-Weight MoE Model

The new `gpt-oss-20b` is an Apache 2.0-licensed Mixture-of-Experts model designed to run efficiently on consumer-grade hardware.

Aug 4, 2025

OpenAI/ReasoningMajor release

OpenAI Releases Its First Open-Source MoE Model

The new 117-billion-parameter `gpt-oss-120b` is a Mixture-of-Experts model focused on reasoning, released under a permissive Apache 2.0 license.

Aug 4, 2025

NVIDIA/Speech → Text

NVIDIA Releases Canary 1B v2 Multilingual Speech Model

The new 1-billion-parameter model handles both transcription and translation across 25 European languages using the company's efficient FastConformer architecture.

Aug 4, 2025

NVIDIA/Speech → Text

NVIDIA Releases 600M Parakeet for Speech Recognition

The new FastConformer model uses a specialized training technique to improve transcription accuracy in noisy, real-world environments.

Aug 4, 2025

Saturday, August 2, 2025

1 release

Qwen · Alibaba/Text → ImageMajor release

Qwen releases open model for text-in-image generation

The new Apache 2.0 diffusion model from Alibaba's Qwen team focuses on accurately rendering both English and Chinese characters within generated images.

Aug 2, 2025

Thursday, July 31, 2025

1 release

Qwen · Alibaba/Code

Qwen Releases Compact 30B MoE for Coding Agents

The new Apache 2.0 model from Alibaba's Qwen team uses a Mixture-of-Experts architecture to deliver strong performance with only 3B active parameters.

Jul 31, 2025

Wednesday, July 30, 2025

1 release

rednote-hilab/Vision-Language

New VLM `dots.ocr` Takes on Complex Documents

The new 3B-parameter model from rednote-hilab uses a vision-language approach to parse tables, layouts, and even mathematical formulas.

Jul 30, 2025

Tuesday, July 29, 2025

1 release

Skywork/Any-to-Any

Skywork Releases UniPic, a Unified 1.5B Vision Model

The new autoregressive model from the Chinese AI lab can understand, generate, and edit images within a single, compact framework.

Jul 29, 2025

Monday, July 28, 2025

3 releases

Qwen · Alibaba/Image → VideoMajor release

Alibaba Releases Wan2.2, a 14B MoE Video Model

The new open-source diffusion model from the team behind Qwen uses a Mixture-of-Experts architecture to animate still images.

Jul 28, 2025

Tencent/Text → Video

Wan2.2, a 14B MoE Video Model, Arrives in Diffusers Format

The new Apache 2.0-licensed generator uses a Mixture-of-Experts architecture and is available in the popular Diffusers library format for easier integration.

Jul 28, 2025

Qwen · Alibaba/Text → Video

Qwen Releases Wan2.2, a 5B Open-Source Video Model

The new Apache 2.0 licensed model from Alibaba's team generates video from either text prompts or still images, offering a unified approach in a compact package.

Jul 28, 2025

Thursday, July 24, 2025

2 releases

Qwen · Alibaba/Text → Video

Qwen Unveils Wan2.2, a 14B Open Text-to-Video Model

The new Apache 2.0-licensed model from Alibaba's team uses a Mixture-of-Experts architecture for efficient, high-quality video generation.

Jul 24, 2025

Qwen · Alibaba/Image → Video

Qwen Releases Wan2.2, a 14B Image-to-Video Model

The new 14-billion-parameter model from Alibaba's AI team uses a Mixture-of-Experts design and is available under the permissive Apache 2.0 license.

Jul 24, 2025

Tuesday, July 22, 2025

1 release

Qwen · Alibaba/CodeMajor release

Qwen Releases 480B Open-Source Model for Code Agents

The new flagship coding model from Alibaba's Qwen team uses a massive Mixture-of-Experts architecture and is released under a permissive Apache-2.0 license.

Jul 22, 2025

Sunday, July 20, 2025

1 release

Zhipu AI/Text / LLMMajor release

Z.ai Releases 355B-Parameter GLM-4.5 Under MIT License

The new Mixture-of-Experts model combines massive scale with a fully permissive license, targeting complex reasoning and agentic applications.

Jul 20, 2025

Friday, July 18, 2025

1 release

Qwen · Alibaba/Text → VideoMajor release

Qwen Releases Wan2.2, a 5B Open Video AI Model

The new Apache 2.0-licensed model from Alibaba's team can generate video from both text and image prompts.

Jul 18, 2025

Wednesday, July 16, 2025

1 release

HiDream.ai/Image Editing

HiDream.ai Releases 17B Open Image Editing Model

The new MIT-licensed model, HiDream-E1.1, allows for complex image modifications by following natural language instructions.

Jul 16, 2025

Tuesday, July 15, 2025

1 release

inclusionAI/Any-to-Any

Ming-Lite-Omni 1.5 Brings Any-to-Any Modality to Open Source

The new MIT-licensed model from inclusionAI can process and generate a mix of text, images, audio, and video.

Jul 15, 2025

Monday, July 14, 2025

2 releases

RaphaelLiu/Image → Video

Pusa V1: A New Open Model for Image-to-Video Animation

Based on the Wan2.1 architecture, this new 14B-parameter model offers fine-grained control over video generation from still images and text.

Jul 14, 2025

T-Tech/Speech → Text

T-Tech Releases T-one for Russian Speech Recognition

The new streaming Conformer model from the Russian digital bank is optimized for real-time transcription of telephone conversations.

Jul 14, 2025

Friday, July 11, 2025

1 release

Moonshot AI/Text / LLMMajor release

Moonshot AI Releases Trillion-Parameter Kimi-K2 Model

The new Mixture-of-Experts model brings massive scale to the open-weights community, focusing on complex reasoning and coding tasks with a 128K context window.

Jul 11, 2025

Monday, July 7, 2025

1 release

Black Forest Labs/Text → ImageMajor release

Black Forest Labs Releases FLUX.1 Krea Image Model

The new 12-billion-parameter model, tuned by creative AI platform Krea, focuses on high-quality aesthetic output and prompt fidelity.

Jul 7, 2025

Wednesday, July 2, 2025

1 release

ByteDance/Any-to-Any

ByteDance Releases Tar-7B for 'Any-to-Any' Multimodality

The new 7-billion-parameter model from the company's SEED team unifies visual understanding and image generation with text in a single framework.

Jul 2, 2025

Tuesday, July 1, 2025

1 release

Bosonai/Text → Speech

Boson AI Releases Higgs Audio v2 for Expressive TTS

The new 3-billion-parameter model focuses on generating expressive, multilingual speech and is fully open for commercial use under an Apache 2.0 license.

Jul 1, 2025