Skywork Releases UniPic, a Unified 1.5B Vision Model
The new autoregressive model from the Chinese AI lab can understand, generate, and edit images within a single, compact framework.

AI lab Skywork, known for its Chinese-language LLMs, has released Skywork-UniPic-1.5B, a compact multimodal model designed to handle a variety of vision tasks within a single architecture. At just 1.5 billion parameters, UniPic uses a unified autoregressive approach to process and create images, a departure from more specialized, single-task models.
UniPic's key feature is its versatility: rather than requiring a separate model for each function, it integrates image understanding, generation, and editing into one system. This multi-task design reflects a growing trend toward more efficient, general-purpose AI systems that handle visual data within a single model.
A Unified Vision Framework
UniPic performs three primary functions:
- Image Understanding: The model can interpret the content of an image and answer questions about it.
- Image Generation: It can create new images from descriptive text prompts.
- Image Editing: It can modify existing images based on user instructions.
The model, code, and further details are available on the project's Hugging Face repository. It is released under the Skywork License Agreement, which allows for research and commercial use with certain restrictions. Its relatively small size could make it an accessible tool for researchers and developers experimenting with unified vision architectures.
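To put the "relatively small size" in concrete terms, here is a back-of-the-envelope sketch of how much memory the weights alone would occupy at common numeric precisions. It assumes exactly 1.5 billion parameters and ignores activations, caches, and framework overhead. The precision the checkpoint actually ships in is not stated here, so the figures are illustrative rather than measured.

```python
# Rough weight-memory estimate for a 1.5B-parameter model.
# Assumptions (not from the release itself): exactly 1.5e9 parameters,
# and only the weights are counted (no activations or runtime overhead).

PARAMS = 1_500_000_000

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "float32": 4,
    "float16/bfloat16": 2,
    "int8": 1,
}


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the size of the weights alone in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def estimate_all(num_params: int = PARAMS) -> dict:
    """Map each precision name to its estimated weight size in GB."""
    return {
        name: weight_memory_gb(num_params, size)
        for name, size in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        print(f"{name}: {gb:.1f} GB")
```

Under these assumptions the weights come to about 6 GB in 32-bit floats and about 3 GB in 16-bit floats, which is within reach of a single consumer GPU.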
Sources
- Skywork/Skywork-UniPic-1.5B (Hugging Face)