Moonshot AI Releases Kimi, a Multimodal Coding Model
The new Mixture-of-Experts model from the Chinese AI company can generate code while also understanding visual inputs, a rare combination in open models.
The new bilingual model from the Chinese AI firm uses a Mixture of Experts architecture and sparse attention under a fully permissive license.

The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.
The new 12-billion-parameter model from Google DeepMind is designed to handle a flexible mix of data types, moving beyond traditional text and image inputs.
The new 3-billion-parameter model from the Chinese tech giant focuses on challenging benchmarks in mathematics, coding, and graduate-level questions.
The new text-to-speech model offers a commercially permissive alternative for developers in a field still dominated by closed-source APIs.
The new 26B parameter model from DeepMind uses a diffusion-based architecture, a technique more common in image generation, to produce text.
The new open-source diffusion model from the company's research arm generates video clips from a single character image and a sequence of poses.
The new Apache 2.0-licensed model is designed for code generation and agentic chat applications, using a Mixture-of-Experts architecture for efficiency.
Benchmarks
| # | Model | Avg. |
|---|---|---|
| 1 | MaziyarPanahi/calme-3.2-instruct-78b | 52.1 |
| 2 | MaziyarPanahi/calme-3.1-instruct-78b | 51.3 |
| 3 | dfurman/CalmeRys-78B-Orpo-v0.1 | 51.2 |
| 4 | MaziyarPanahi/calme-2.4-rys-78b | 50.8 |
| 5 | huihui-ai/Qwen2.5-72B-Instruct-abliterated | 48.1 |
| 6 | Qwen/Qwen2.5-72B-Instruct | 48.0 |

The new 9.3 billion parameter model uses a Diffusion Transformer architecture and excels at rendering coherent text within generated images.

The new model, SANA-WM, uses a bidirectional diffusion process to give creators fine-grained control over camera movement and video editing.