Meta releases SAM 3 for image and video segmentation
The latest Segment Anything Model continues Meta's mask-generation lineage across still images and video, and is now available on Hugging Face.
Meta AI has published SAM 3, the newest entry in its Segment Anything Model family, on Hugging Face. The release targets mask generation across both images and video, continuing the lineage that made the original SAM a default building block for vision pipelines.
Segmentation models like SAM produce pixel-level masks that isolate objects from their surroundings. That capability underpins a wide range of downstream work — from photo and video editing tools to data labeling, robotics, and medical imaging — where reliably separating a subject from its background is the first step in a larger workflow.
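To make "pixel-level mask" concrete, here is a minimal sketch of how a downstream tool might use one to isolate a subject. It does not call SAM 3 or any Meta API: the `apply_mask` helper, the tiny 3x3 image, and the mask values are all hypothetical. Plain Python lists are used so the sketch runs without dependencies; a real pipeline would more likely use an array library.

```python
from typing import List

Image = List[List[int]]   # grayscale pixel values, one inner list per row
Mask = List[List[bool]]   # True where the pixel belongs to the object


def apply_mask(image: Image, mask: Mask, background: int = 0) -> Image:
    """Keep pixels where the mask is True; replace the rest with `background`."""
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else background for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Hypothetical 3x3 image and a plus-shaped mask (illustrative values only).
image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
mask = [
    [False, True, False],
    [True, True, True],
    [False, True, False],
]

isolated = apply_mask(image, mask)
for row in isolated:
    print(row)
```

Everything outside the mask is set to the background value, which is the "separate the subject from its background" step that editing and labeling tools build on.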
Why it matters
The Segment Anything line has been influential precisely because it shipped as a general-purpose, reusable component rather than a task-specific tool. SAM 2 already brought that approach to video, and SAM 3 continues to work on one of the harder problems in the space: keeping object masks consistent across frames as scenes move and change.
- Handles both image and video mask generation in a single model
- Distributed openly through Hugging Face for developers to build on
- Released under Meta's custom license, so teams should review the terms before commercial use
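One rough way to put a number on "consistent masks across frames" is the intersection over union (IoU) of the same object's mask in consecutive frames. The sketch below is an illustrative check, not a metric described by Meta for SAM 3. The helper names and the toy one-row frames are assumptions. A low score can also reflect genuine motion rather than a tracking failure, so treat it as a signal to inspect, not a verdict.

```python
from typing import List

Mask = List[List[bool]]  # True where the pixel belongs to the tracked object


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two same-shaped boolean masks."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for in_a, in_b in zip(row_a, row_b):
            if in_a and in_b:
                intersection += 1
            if in_a or in_b:
                union += 1
    # Two empty masks are treated as identical by convention here.
    return intersection / union if union else 1.0


def frame_to_frame_iou(masks: List[Mask]) -> List[float]:
    """IoU between each pair of consecutive frames."""
    return [mask_iou(prev, cur) for prev, cur in zip(masks, masks[1:])]


# Hypothetical masks for three frames: the object shifts right, then stays put.
frames = [
    [[True, True, False, False]],
    [[False, True, True, False]],
    [[False, True, True, False]],
]

scores = frame_to_frame_iou(frames)
print(scores)
```

The first pair overlaps in one pixel out of three covered, and the second pair is identical, so the scores drop and then recover. Comparing such scores between model versions on the same clip is one way a team could start answering the accuracy question for its own data.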
Meta has not paired this listing with detailed public specifications such as parameter counts or context limits, so practitioners will want to consult the model card and any accompanying documentation directly. For most users, the practical question is how SAM 3's accuracy and speed compare to its predecessors — answers that will emerge as the community puts it to work.
Sources
- facebook/sam3 (Hugging Face)