NVIDIA/Image Editing

NVIDIA Releases PiD for High-Quality Image Upscaling

The new component is a specialized VAE decoder that works with Stability AI's Z-Image model to enhance super-resolution tasks.

Apr 28, 2026


NVIDIA has released a new, specialized component for generative AI workflows called PiD, or Pixel Diffusion Decoder. Rather than a full text-to-image model, PiD is a focused tool designed to improve one specific but crucial step in the image generation process: decoding and upscaling.

At its core, PiD is a Variational Autoencoder (VAE) decoder: it takes a compressed latent representation of an image and reconstructs a full-resolution picture from it. PiD performs this reconstruction with a pixel-diffusion technique, which can produce sharp, highly detailed results and makes it particularly well suited to super-resolution tasks where image quality is paramount.
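To make the "compressed latent in, full-resolution image out" contract concrete, here is a minimal sketch of the shape arithmetic a latent decoder implies. The 8x spatial scale, the 16 latent channels, and the names `LatentShape`, `decoded_image_shape`, and `expansion_ratio` are illustrative assumptions, not values or APIs taken from PiD; the code models only tensor shapes and does not implement pixel diffusion.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LatentShape:
    """Shape of a compressed latent (hypothetical layout: channels, height, width)."""
    channels: int
    height: int
    width: int


def decoded_image_shape(
    latent: LatentShape, scale: int = 8, out_channels: int = 3
) -> Tuple[int, int, int]:
    """Return (channels, height, width) of the image a decoder would emit.

    `scale` is the assumed spatial upsampling factor of the decoder.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    return (out_channels, latent.height * scale, latent.width * scale)


def expansion_ratio(
    latent: LatentShape, scale: int = 8, out_channels: int = 3
) -> float:
    """How many output values the decoder must produce per latent value."""
    c, h, w = decoded_image_shape(latent, scale, out_channels)
    return (c * h * w) / (latent.channels * latent.height * latent.width)


# Example with assumed numbers: a 16-channel, 128x128 latent.
example = LatentShape(channels=16, height=128, width=128)
print(decoded_image_shape(example))
print(expansion_ratio(example))
```

Under these assumed numbers the decoder has to emit twelve values for every latent value it receives, which is why the quality of the decoding step matters so much for fine detail.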

Notably, this decoder is not a standalone system. It was built specifically to work with stabilityai/z-image-base-1b, a base model from Stability AI. The pairing reflects a trend toward more modular, interoperable tools in the open-source ecosystem, where components from different research labs can be combined.

For developers and researchers, PiD offers a new building block for their image generation pipelines. By swapping in this specialized decoder, they may be able to achieve higher-fidelity outputs when upscaling images to larger sizes. NVIDIA released the model files on Hugging Face for developers to begin experimenting with the component.
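The idea of swapping one decoder for another can be sketched with a toy pipeline. Everything here is hypothetical: `Pipeline`, `baseline_decoder`, `upscaling_decoder`, and `fake_generator` are stand-ins invented for illustration, and the nearest-neighbour upscaler only marks where a specialized decoder would plug in. It is not PiD's method or any library's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

Grid = List[List[float]]          # stand-in for both latents and images
Decoder = Callable[[Grid], Grid]  # any callable mapping a latent to pixels


def baseline_decoder(latent: Grid) -> Grid:
    """Hypothetical default decoder: returns a copy at latent resolution."""
    return [row[:] for row in latent]


def upscaling_decoder(latent: Grid, factor: int = 2) -> Grid:
    """Hypothetical replacement decoder: nearest-neighbour upscaling by `factor`."""
    out: Grid = []
    for row in latent:
        # Repeat each value horizontally, then repeat the widened row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


@dataclass
class Pipeline:
    """Toy pipeline: a latent generator followed by an interchangeable decoder."""
    generate_latent: Callable[[str], Grid]
    decoder: Decoder

    def run(self, prompt: str) -> Grid:
        return self.decoder(self.generate_latent(prompt))


def fake_generator(prompt: str) -> Grid:
    """Placeholder for the base model; ignores the prompt and returns a 2x2 latent."""
    return [[0.0, 1.0], [1.0, 0.0]]


pipe = Pipeline(generate_latent=fake_generator, decoder=baseline_decoder)
small = pipe.run("a red fox")

# Swap only the decoding stage; the generator is untouched.
pipe.decoder = upscaling_decoder
large = pipe.run("a red fox")

print(len(small), len(small[0]), "->", len(large), len(large[0]))
```

Because the decoder is just a component behind a fixed latent-to-image interface, the rest of the pipeline does not need to change when it is replaced.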

Sources

nvidia/PiD (Hugging Face)

