Zhipu AI Releases MIT-Licensed GLM-5.2 MoE Model
The new bilingual model from the Chinese AI firm uses a Mixture of Experts architecture and sparse attention under a fully permissive license.
The feed
Every new open-source model release and major update — aggregated from across the ecosystem, deduplicated, and refreshed every 12 hours.
164 releases
The new bilingual model from the Chinese AI firm uses a Mixture of Experts architecture and sparse attention under a fully permissive license.
The new 3-billion-parameter model from the Chinese tech giant focuses on challenging benchmarks in mathematics, coding, and graduate-level questions.
The new Mixture-of-Experts model from the Chinese AI company can generate code while also understanding visual inputs, a rare combination in open models.
The new text-to-speech model offers a commercially permissive alternative for developers in a field still dominated by closed-source APIs.
The new 26B-parameter model from DeepMind uses a diffusion-based architecture, a technique more common in image generation, to produce text.
The new open-source diffusion model from the company's research arm generates video clips from a single character image and a sequence of poses.
The new Apache 2.0-licensed model is designed for code generation and agentic chat applications, using a Mixture-of-Experts architecture for efficiency.
The new 4-billion-parameter text-to-speech model is available for non-commercial use, promising fine-grained control over vocal delivery.
The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.
The new 9.3-billion-parameter model uses a Diffusion Transformer architecture and excels at rendering coherent text within generated images.
The new 12-billion-parameter model from Google DeepMind is designed to handle a flexible mix of data types, moving beyond traditional text and image inputs.
The new model, SANA-WM, uses a bidirectional diffusion process to give creators fine-grained control over camera movement and video editing.
The 600-million-parameter model uses a FastConformer architecture for real-time, multilingual speech-to-text applications.
The 3-billion-parameter model handles image and video generation, editing, and understanding from a single set of weights under a permissive license.
The new 8B-parameter SenseNova U1 model from SenseTime is designed for complex multimodal tasks, including the in-conversation generation and editing of infographics.
The new 'Identity-Control' adapter fine-tunes the company's LTX-2.3 video model to create realistic lip-syncing for dubbing workflows.
The 1.8-billion-parameter model from the Chinese tech giant is designed for high-quality translation across a wide range of language pairs.
The new Supertonic 3 model supports seven languages and is optimized for local inference with the portable ONNX format.
The new component is a specialized VAE decoder that works with Tongyi-MAI's Z-Image model to enhance super-resolution tasks.
The new 30-billion-parameter Mixture-of-Experts model handles text and images while using only 3 billion active parameters for inference.
The new 26-billion-parameter model from DeepMind uses a mixture-of-experts design for greater efficiency and is tuned for assistant-style tasks.
The new 31-billion-parameter model is an instruction-tuned, 'any-to-any' powerhouse released under a permissive Apache 2.0 license.
The new 4-billion-parameter model is instruction-tuned for 'any-to-any' tasks, handling a flexible mix of data types.
The new compact model from DeepMind is instruction-tuned for "any-to-any" tasks, capable of processing and generating mixed data types.
The new open-source model from the Chinese tech giant offers automatic speech recognition for Mandarin, Cantonese, and English under a permissive MIT license.
The new open-source model from inclusionAI uses a Mixture-of-Experts architecture to handle multiple vision tasks in a single, diffusion-based system.
The new flagship model combines a Mixture-of-Experts architecture with a permissive MIT license, positioning it for wide commercial adoption.
The new Mixture of Experts model from the Beijing-based AI lab is optimized for fast, efficient conversational AI and carries a fully permissive license.
The new SenseNova-U1 model unifies image understanding, generation, and editing within a single 8-billion-parameter framework.
The new dense model, licensed under Apache 2.0, brings both text and image understanding to the midrange parameter space.
The new 30-billion-parameter Mixture-of-Experts model handles any combination of modalities with just 3 billion active parameters.
The new text-to-speech model uses a diffusion-transformer architecture for high-quality, expressive audio and one-shot voice cloning.
The new two-billion-parameter model offers transcription capabilities for at least five major languages under a permissive Apache 2.0 license.
The new Qwen3.6-35B-A3B from Alibaba's Qwen team combines vision and language capabilities using an efficient sparse architecture.
The new Apache 2.0-licensed model uses a diffusion transformer architecture to offer an open alternative for video generation research.
The Chinese AI lab has published weights for its new vision-language model, though a restrictive license limits its use to research applications.
The new open-source vision-language model is designed for high-resolution image understanding on mobile and edge devices.
The new conversational language model from the Chinese AI company uses a Mixture-of-Experts architecture and 8-bit weights, but is released under a restrictive custom license.
The new vision-language model from the Chinese tech giant is designed for complex, multilingual optical character recognition and layout analysis.
The new open-source model from DeepMind uses a Mixture-of-Experts architecture to handle both text and image inputs efficiently.
The new 31-billion-parameter model is instruction-tuned and can process both text and images, marking a significant expansion for the Gemma family.
The new open-weight model offers a more compact, distilled version of the advanced FLUX architecture for text-to-image and editing tasks.
The new 3-billion-parameter model, based on the company's Eagle architecture, is designed for high-precision visual grounding tasks.
The new 4-billion-parameter model from Google DeepMind is designed for versatile input and output, handling text, images, and other data types.
The new open-source model from Xiaohongshu's FireRedTeam leverages the Qwen-Image-Edit pipeline to offer instruction-based image editing in both English and Chinese.
The new Apache 2.0-licensed model is part of the company's Granite family and aims to provide high-quality speech-to-text across several languages.
The new model, Tada-3B-ML, is designed for fine-grained control over vocal expression across more than 10 languages.
An independent researcher has released a new English text-to-speech model under a permissive license, built on a modern generative foundation.
The new Mixture-of-Experts model from the Chinese AI company combines an advanced architecture with a fully permissive MIT license for commercial use.
The new open-source Mixture-of-Experts model can process and generate content across text, images, and audio in any combination.
The new Llama-based model was trained from scratch on 3.5 trillion tokens of Chinese and English data to enhance its bilingual capabilities.
The new system from the OpenMOSS Team uses a novel 'delay-pattern' architecture to generate natural-sounding speech in Chinese, English, and Japanese.
The new model, SoulX-Singer, can replicate a singing voice from a short audio sample and supports both English and Chinese under a permissive license.
The new vision-language model from the creators of the GLM series is specialized for recognizing and extracting text from images across multiple languages.
The new model generates 360p video from text or images and creates corresponding audio tracks simultaneously, a notable step for integrated audiovisual synthesis.
The makers of the popular Qwen language models have published their first open-source text-to-image generator with a permissive Apache 2.0 license.
The new text-to-speech model is optimized for the ONNX runtime, making it a promising option for efficient, on-device audio generation.
The new open-source automatic speech recognition model handles multilingual transcription and speaker identification out of the box.
The new 600-million-parameter Qwen3-TTS model can generate speech in multiple languages and clone voices from short audio clips.
The new 600-million-parameter model from Alibaba's Qwen team can clone voices from short audio clips for multilingual speech synthesis.
The new Mixture-of-Experts model from the Beijing-based AI company is optimized for speed and released under the permissive MIT license.
The new text-to-image model emphasizes speed and efficiency with a novel architecture and FP8 quantization.
The new 80-million-parameter text-to-speech model adapts a powerful language model architecture for efficient, open-source audio generation.
The new 9-billion-parameter model uses a Diffusion Transformer architecture, promising higher performance than existing open-source alternatives.
The new 1-billion-parameter model combines a Llama 3.2 base with text-to-speech to generate more natural and nuanced audio.
The new text-to-speech model uses a hybrid diffusion and autoregressive architecture for high-quality, multilingual synthesis.
The new text-to-image model is fluent in both Chinese and English, built on the CogView2 architecture and released under a permissive MIT license.
The new text-to-speech model from the audio AI company supports English, Korean, and Spanish and comes in the efficient ONNX format for deployment.
The new diffusion model from the creative app company can generate short video clips from text, images, audio, and even other videos.
The new vision-language model from the Chinese AI firm uses a Mixture-of-Experts architecture and is now available on Hugging Face.
The 8-billion-parameter model from Alibaba's Qwen team understands and generates spoken responses, enabling more natural audio-first applications.
A new text-to-speech model from OpenMOSS leverages the Qwen2 architecture to generate speech in both English and Chinese.
The new diffusion model from Alibaba's team allows for precise, instruction-based image modifications in both English and Chinese.
The new Fun-ASR-Nano model from Alibaba's team packs real-time multilingual transcription, speaker diarization, and hotword detection into an efficient package.
The new model from Tencent's Hunyuan team generates dynamic video and reconstructs 3D environments using a single static picture.
The new LongCat-Image-Edit model follows natural language instructions to perform complex photo manipulations in both English and Chinese.
The new 14-billion-parameter model uses audio input to generate realistic talking head videos from a single still image.
The new 500-million-parameter model is designed for generating natural, long-form speech with very low latency for interactive applications.
The new 4-billion-parameter model handles text, image, and speech inputs and outputs, including direct speech-to-speech translation.
The new text-to-image model from the team behind Qwen uses a diffusion transformer to generate high-resolution images in just a single step.
The 2-billion-parameter text-to-speech model can clone voices from a short audio sample and is available under an Apache 2.0 license.
The new open-source model from the Allen Institute for AI unifies text and image understanding and generation into a single architecture.
The new 1.7-billion-parameter model from OpenMOSS is trained on conversational data to generate natural dialogue in English and Chinese.
The Chinese tech giant has released a new MIT-licensed model capable of generating video from text, images, or by continuing existing clips.
The new open-source Mixture-of-Experts model can process and generate any combination of text, images, video, audio, and 3D data.
The new Sortformer-based model is designed for streaming audio, identifying up to four distinct speakers in real time.
The Shanghai-based AI startup has released a new Mixture-of-Experts model focused on complex reasoning, coding, and agentic tasks.
The new vision-language model from Datalab is fine-tuned from Qwen2-VL to specialize in extracting text and structure from complex documents.
The new open-source model combines both video generation and comprehension, a rare dual capability built on the Qwen2.5 vision-language foundation.
The new Apache 2.0-licensed model uses a Llama-based architecture to generate more natural and emotionally nuanced speech from text.
The new vision-language model uses a novel context compression technique to efficiently extract text and structure from complex documents.
The new vision-language model is fine-tuned to understand not just text, but the complex structure of tables, charts, and formulas.
The 600-million-parameter model offers real-time speech-to-text with speaker diarization, built on the efficient FastConformer architecture.
The new Ming-flash-omni-Preview aims to handle any combination of data modalities using an efficient Mixture of Experts architecture.
The latest vision-language model from the popular Qwen series is instruction-tuned and available under an Apache 2.0 license.
The new 270-million-parameter model from Google DeepMind is fine-tuned specifically for reliable function calling and tool use.
The new open-source model from Swiss researchers uses a novel chunking method to generate indefinitely long videos from a single still image.
The new 14-billion-parameter model is a distilled, more efficient version of a larger foundation, designed for interactive video generation.
The new 16-billion-parameter model uses a sparse Mixture-of-Experts design to efficiently handle 'any-to-any' data combinations, from text to images.
The new open-source model from Alibaba uses a Mixture-of-Experts architecture to make its powerful vision-language capabilities more efficient to run.
Based on Liquid AI's LFM2 architecture, the new model offers an efficient solution for developers.
Built on the Wan2.2 architecture, this new 5-billion-parameter model generates short video clips from a single image and simultaneously creates synchronized audio.
The new Mixture-of-Experts model is available under a permissive MIT license and is optimized for complex reasoning and coding tasks.
A new 16-billion-parameter model from inclusionAI uses a Mixture-of-Experts architecture to handle a wide range of audio tasks efficiently.
The new model from the TikTok parent company generates short video clips that maintain a person's likeness from a single reference image.
The new text-to-image generator from the Chinese tech giant uses a Mixture-of-Experts architecture for improved efficiency and output quality.
The new model from Alibaba's Qwen team allows users to modify images using natural language prompts instead of complex tools or masks.
The new 30B Mixture-of-Experts model from Alibaba's Qwen team can process and generate content across text, image, and audio formats.
This new instruction-tuned model from Xiaomi can handle a flexible combination of audio and text inputs and outputs, from transcription to voice synthesis.
The new 500-million-parameter model offers high-quality text-to-speech and zero-shot voice cloning under a permissive license.
The new 30-billion-parameter Mixture-of-Experts model from Alibaba's Qwen team is designed to show its reasoning process for complex multimodal tasks.
The new Mixture-of-Experts model from Alibaba is fine-tuned to generate detailed, multilingual descriptions for complex audio content.
The new Apache 2.0 text-to-speech model is built on a Qwen2 architecture and optimized for local inference with GGUF support.
The next generation of the efficient, open-source vision-language model is now available for early testing and feedback.
The new open-source model specializes in creating realistic videos of people, separating appearance from motion for greater control.
The new 14-billion-parameter model from Alibaba's PAI team offers fine-grained control over video generation using inputs like sketches and depth maps.
The new Qwen3-Next model from Alibaba combines a large parameter count with an efficient MoE architecture to balance performance and computational cost.
This new open-source model uses a diffusion architecture instead of typical autoregressive decoding to generate and understand a mix of media types.
The new text-to-image model uses a novel rejection sampling technique to align Stable Diffusion XL more closely with human aesthetic preferences.
The new text-to-image model from the Chinese tech giant is designed to understand both Chinese and English prompts at high resolutions.
The new 7-billion-parameter model is designed for generating long-form, multi-speaker audio in English and Chinese under a permissive MIT license.
The new open-source model specializes in generating long-form, multi-speaker audio in both English and Mandarin, mimicking a natural podcast conversation.
The new open-source model handles both speech recognition and audio generation in a single, end-to-end architecture.
The new model from Tencent AI Lab generates temporally and spatially consistent video sequences from a single image, enabling virtual exploration of static scenes.
The new 1.5-billion-parameter text-to-speech model is designed to generate natural, multi-speaker audio for podcasts and other long-form content.
The new Wan2.2-S2V model takes a still image and a speech track to generate a realistic talking-head animation, available under a permissive license.
The new vision-language model from the open-source research group demonstrates strong OCR and video understanding capabilities in a small package.
The new DeepSeek-V3.1-Base is a massive 671-billion-parameter Mixture-of-Experts model designed for efficient, large-scale research and development.
The new open-source model from Alibaba lets users edit images with simple text commands in both English and Chinese.
The new 4-billion-parameter model is designed for 'any-to-any' multimodal tasks and optimized to run efficiently on mobile hardware.
The new Hunyuan-GameCraft 1.0 is an open image-to-video model that generates interactive game-like scenes with precise camera control.
A new diffusion-based model from developer FrancisRing animates still images into talking avatars using only an audio track.
The new Mixture-of-Experts model offers strong multimodal reasoning capabilities under a permissive MIT license.
The new 1.3-billion-parameter model functions as an interactive 'world model,' generating controllable video scenes from a single static image.
The new ultra-compact model from DeepMind is designed for efficient performance in resource-constrained environments like mobile and web.
The new `gpt-oss-20b` is an Apache 2.0-licensed Mixture-of-Experts model designed to run efficiently on consumer-grade hardware.
The new 117-billion-parameter `gpt-oss-120b` is a Mixture-of-Experts model focused on reasoning, released under a permissive Apache 2.0 license.
The new 1-billion-parameter model handles both transcription and translation across five languages using the company's efficient FastConformer architecture.
The new FastConformer model uses a specialized training technique to improve transcription accuracy in noisy, real-world environments.
The new Apache 2.0 diffusion model from Alibaba's Qwen team focuses on accurately rendering both English and Chinese characters within generated images.
The new Apache 2.0 model from Alibaba's Qwen team uses a Mixture-of-Experts architecture to deliver strong performance with only 3B active parameters.
The new 3B-parameter model from rednote-hilab uses a vision-language approach to parse tables, layouts, and even mathematical formulas.
The new autoregressive model from the Chinese AI lab can understand, generate, and edit images within a single, compact framework.
The new open-source diffusion model from the team behind Qwen uses a Mixture-of-Experts architecture to animate still images.
The new Apache 2.0-licensed model from Alibaba's team generates video from either text prompts or still images, offering a unified approach in a compact package.
The new Apache 2.0-licensed model from Alibaba's team uses a Mixture-of-Experts architecture for efficient, high-quality video generation.
The new 14-billion-parameter model from Alibaba's AI team uses a Mixture-of-Experts design and is available under the permissive Apache 2.0 license.
The new flagship coding model from Alibaba's Qwen team uses a massive Mixture-of-Experts architecture and is released under a permissive Apache 2.0 license.
The new Mixture-of-Experts model combines massive scale with a fully permissive license, targeting complex reasoning and agentic applications.
The new Apache 2.0-licensed model from Alibaba's team can generate video from both text and image prompts, adding a new tool to the open-source creative ecosystem.
The new MIT-licensed model, HiDream-E1.1, allows for complex image modifications by following natural language instructions.
The new MIT-licensed model from inclusionAI can process and generate a mix of text, images, audio, and video, pushing the boundaries of open multimodal AI.
Based on the Wan2.1 architecture, this new 14B-parameter model offers fine-grained control over video generation from still images and text.
The new streaming Conformer model from the Russian digital bank is optimized for real-time transcription of telephone conversations.
The new Mixture-of-Experts model brings massive scale to the open-weights community, focusing on complex reasoning and coding tasks with a 128K context window.
The new 12-billion-parameter model, tuned by creative AI platform Krea, focuses on high-quality aesthetic output and prompt fidelity.
The new 7-billion-parameter model from the company's SEED team can process and generate a mix of text, images, audio, and video in a single unified framework.
The new 3-billion-parameter model focuses on generating expressive, multilingual speech and is fully open for commercial use under an Apache 2.0 license.
The French AI lab's new open-source model generates streaming audio in English and French under a permissive license.
The new GLM-4.1V-9B-Thinking model makes its vision and chain-of-thought reasoning capabilities available under a permissive MIT license.
The new 3-billion-parameter model from AIDC-AI combines vision-language understanding and image generation into a single 'any-to-any' framework.
The 2.5-billion-parameter speech model combines a FastConformer encoder with a Qwen LLM decoder, a hybrid approach to transcription.
Maya Research has released a 3-billion-parameter model designed to generate natural-sounding speech in Hindi and English.
The new 7-billion-parameter model from FreedomIntelligence can process various inputs and generate or edit images based on text prompts.