Tencent SRPO Fine-Tunes SDXL with Preference Optimization
The new text-to-image model uses a novel rejection sampling technique to align Stable Diffusion XL more closely with human aesthetic preferences.

Tencent has released a new text-to-image model that uses a novel technique for aligning generative AI with human preferences. The model, called SRPO, is a fine-tuned version of the popular Stable Diffusion XL 1.0, designed to produce images that better match user intent and aesthetic tastes.
The project's name stands for Simple Rejection Preference Optimization, which hints at its technical approach. Instead of using more complex reinforcement learning methods, SRPO generates multiple candidate images for a given prompt. A separate reward model scores the candidates, the highest-scoring image is kept, and the base diffusion model is fine-tuned using only that preferred output. This rejection sampling method provides a simpler, potentially more stable path to preference alignment.
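The generate, score, and select loop described above can be sketched in a few lines of Python. This is a minimal illustration of generic best-of-N rejection sampling, not Tencent's code: `toy_generate`, `toy_reward`, `select_best`, and `build_finetune_set` are hypothetical names, the "images" are short lists of random numbers, and the final fine-tuning step is left out entirely.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    prompt: str
    seed: int
    image: List[float]  # toy stand-in for pixels or latents


def toy_generate(prompt: str, seed: int) -> Candidate:
    """Hypothetical stand-in for one sampling run of a diffusion model."""
    rng = random.Random(f"{prompt}|{seed}")  # deterministic per (prompt, seed)
    return Candidate(prompt, seed, [rng.random() for _ in range(16)])


def toy_reward(candidate: Candidate) -> float:
    """Hypothetical stand-in for a learned preference scorer (higher is better)."""
    return sum(candidate.image) / len(candidate.image)


def select_best(
    prompt: str,
    num_candidates: int,
    generate: Callable[[str, int], Candidate],
    reward: Callable[[Candidate], float],
) -> Tuple[Candidate, float]:
    """Sample several candidates and keep only the highest-scoring one."""
    candidates = [generate(prompt, seed) for seed in range(num_candidates)]
    scored = [(reward(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def build_finetune_set(prompts: List[str], num_candidates: int = 4) -> List[Candidate]:
    """Collect one preferred sample per prompt; the rest are rejected."""
    return [
        select_best(p, num_candidates, toy_generate, toy_reward)[0]
        for p in prompts
    ]


if __name__ == "__main__":
    for kept in build_finetune_set(["a red fox in snow", "a lighthouse at dusk"]):
        print(kept.prompt, kept.seed, round(toy_reward(kept), 3))
```

In a real pipeline the kept samples would become the training set for an ordinary supervised fine-tuning pass on the base model, which is where the simplicity relative to reinforcement learning comes from: no policy gradients or value estimates are needed, only sampling, scoring, and filtering.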
This technique is significant because it offers a more direct and computationally efficient way to imbue foundation models with specific styles or quality standards. For developers and researchers, SRPO presents a practical blueprint for creating highly specialized image models without the overhead of more elaborate alignment pipelines. The model and its underlying code are available on Hugging Face.
The model is based on Stability AI's SDXL-1.0 and is released under an OpenRAIL-M license, which permits commercial use but includes specific use-case restrictions. By building on a well-known open model, the release allows for direct comparison and demonstrates a clear path for improving existing generative tools.
Sources
- tencent/SRPO (Hugging Face)