TencentText → Image

Tencent SRPO Fine-Tunes SDXL with Preference Optimization

The new text-to-image model uses a novel rejection sampling technique to align Stable Diffusion XL more closely with human aesthetic preferences.

Sep 8, 2025

NotableOther

Tencent has released a new text-to-image model that uses a novel technique for aligning generative AI with human preferences. The model, called SRPO, is a fine-tuned version of the popular Stable Diffusion XL 1.0, designed to produce images that better match user intent and aesthetic tastes.

The project's name stands for Simple Rejection Preference Optimization, which hints at its technical approach. Instead of using more complex reinforcement learning methods, SRPO works by generating multiple candidate images for a given prompt. A separate reward model then scores and selects the best one, and the base diffusion model is fine-tuned using only that preferred output. This rejection sampling method provides a simpler, potentially more stable path to preference alignment.

This technique is significant because it offers a more direct and computationally efficient way to imbue foundation models with specific styles or quality standards. For developers and researchers, SRPO presents a practical blueprint for creating highly-specialized image models without the overhead of more elaborate alignment pipelines. The model and its underlying code are available on Hugging Face.

The model is based on Stability AI's SDXL-1.0 and is released under an OpenRAIL-M license, which permits commercial use but includes specific use-case restrictions. By building on a well-known open model, the release allows for direct comparison and demonstrates a clear path for improving existing generative tools.

Sources

tencent/SRPO
Hugging Face
Visit

0 comments

No comments yet. Be the first to weigh in.

Microsoft's Mage-Flow packs image editing into 4B

A compact model handles both text-to-image generation and instruction-based edits at native resolution, under a permissive MIT license.

Jul 21, 2026

Unknown/Any-to-Any

Boogu-Image-0.1 Brings Unified Multimodal to Open Source

A new Apache-licensed model family folds bilingual text-to-image generation and instruction editing into one system.

Jul 13, 2026

NVIDIA/Text → Image

NVIDIA distills Qwen-Image for few-step generation

A DMD2-distilled build of Qwen-Image trades sampling steps for speed while keeping the original model's output profile.

Jul 1, 2026