Alibaba Releases Z-Image-Turbo, A Fast Open Image Model
The new text-to-image model from Alibaba, the company behind Qwen, uses a diffusion transformer to generate high-resolution images in just eight sampling steps.

Alibaba's Tongyi-MAI team has released Z-Image-Turbo, a new open-source text-to-image model designed for speed. The model stands out for generating a 1024x1024 pixel image in only eight denoising steps (eight function evaluations), far fewer than the dozens of sampling steps most diffusion models require.
Z-Image-Turbo is built on a Diffusion Transformer (DiT) architecture. This approach, which has gained prominence in models like Sora, moves away from the U-Net architectures common in earlier Stable Diffusion versions. The transformer backbone is intended to capture complex relationships in text prompts, while the low step count comes from distilling the base Z-Image model into a few-step sampler rather than from the architecture alone.
The combination of speed and a modern architecture makes Z-Image-Turbo a compelling new option for developers. The model and its weights are available on Hugging Face under the permissive Apache 2.0 license, which allows for commercial use.
For the open-source AI ecosystem, Z-Image-Turbo represents another powerful, commercially friendly alternative to established models. Its few-step generation offers a practical advantage for applications where low latency is critical, such as real-time previews or interactive image creation tools.
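The latency argument comes down to how many times the network has to run per image. The sketch below is a toy illustration of that point, not Z-Image-Turbo's sampler: `euler_sample`, `CountingDenoiser`, and `count_denoiser_calls` are hypothetical names, and the "denoiser" is a stand-in function rather than a real model. It only shows that the number of forward passes grows one-for-one with the step count.

```python
from typing import Callable


def euler_sample(denoise: Callable[[float, float], float],
                 x: float, num_steps: int) -> float:
    """Walk from t=1 (noise) to t=0 (sample) in num_steps Euler steps."""
    dt = 1.0 / num_steps
    t = 1.0
    for _ in range(num_steps):
        velocity = denoise(x, t)  # in a real model: one network forward pass
        x = x - velocity * dt
        t -= dt
    return x


class CountingDenoiser:
    """Stand-in for a neural denoiser that records how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: float, t: float) -> float:
        self.calls += 1
        return x  # hypothetical velocity field; a real model predicts this


def count_denoiser_calls(num_steps: int) -> int:
    """Run the toy sampler and return the number of denoiser evaluations."""
    denoiser = CountingDenoiser()
    euler_sample(denoiser, x=1.0, num_steps=num_steps)
    return denoiser.calls


if __name__ == "__main__":
    for steps in (50, 8, 1):
        print(f"{steps:>2} steps -> {count_denoiser_calls(steps)} forward passes")
```

If each forward pass costs roughly the same, a sampler that needs 8 steps does about a sixth of the network work of a 50-step sampler, which is where the appeal for real-time previews comes from.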