Meta releases SAM 3 for image and video segmentation

The latest Segment Anything Model carries Meta's mask-generation lineage across still images and video, and is now available on Hugging Face.

Nov 7, 2025

Meta AI has published SAM 3, the newest entry in its Segment Anything Model family, on Hugging Face. The release targets mask generation across both images and video, continuing the lineage that made the original SAM a default building block for vision pipelines.

Segmentation models like SAM produce pixel-level masks that isolate objects from their surroundings. That capability underpins a wide range of downstream work — from photo and video editing tools to data labeling, robotics, and medical imaging — where reliably separating a subject from its background is the first step in a larger workflow.
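To make "pixel-level mask" concrete, here is a minimal sketch of what a downstream tool does with one. It does not call SAM 3 or any Meta API; the toy image, the mask values, and the helper name `apply_mask` are all hypothetical and exist only for illustration.

```python
# Toy 3x3 grayscale image (hypothetical pixel values, 0-255).
image = [
    [10, 200, 30],
    [40, 220, 60],
    [70, 80, 90],
]

# Binary mask of the same shape: 1 marks subject pixels, 0 marks background.
# In a real pipeline a segmentation model would produce this; here it is hand-written.
mask = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]


def apply_mask(image, mask, background=0):
    """Keep pixels where the mask is 1; replace all others with `background`."""
    return [
        [pixel if keep else background for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


isolated = apply_mask(image, mask)
for row in isolated:
    print(row)
```

Only the two pixels flagged by the mask survive; everything else is replaced by the background value, which is the "separate the subject from its background" step that editing and labeling tools build on.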

Why it matters

The Segment Anything line has been influential because it shipped as a general-purpose, reusable component rather than a task-specific tool. By covering video alongside still images, SAM 3 takes on one of the harder problems in the space: maintaining consistent object masks across frames as scenes move and change.

Handles both image and video mask generation in a single model
Distributed openly through Hugging Face for developers to build on
Released under Meta's custom license, so teams should review the terms before commercial use
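One generic way to put a number on "consistent object masks across frames" is intersection-over-union (IoU) between the masks for the same object in consecutive frames. The sketch below is an illustration of that standard metric only; it is not taken from Meta's documentation, it is not necessarily how SAM 3 is evaluated, and the function name `mask_iou` and the toy masks are hypothetical.

```python
def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two same-shaped binary masks (lists of 0/1 rows)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks are treated as identical by convention.
    return intersection / union if union else 1.0


# Hand-written masks for one object in two consecutive frames (hypothetical data).
frame_1 = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
frame_2 = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]

print(f"frame-to-frame IoU: {mask_iou(frame_1, frame_2):.2f}")
```

A value near 1.0 means the mask barely changed between frames; a sudden drop can indicate that tracking lost the object, although fast motion or occlusion also lowers the score legitimately.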

Meta has not paired this listing with detailed public specifications such as parameter counts or context limits, so practitioners will want to consult the model card and any accompanying documentation directly. For most users, the practical question is how SAM 3's accuracy and speed compare to its predecessors — answers that will emerge as the community puts it to work.

Sources

facebook/sam3 (Hugging Face)
