The Open Weights

© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.

NVIDIA's New 3B VLM Pinpoints Objects in Images

The new 3-billion-parameter model, based on the company's Eagle architecture, is designed for high-precision visual grounding tasks.

Mar 2, 2026
NVIDIA · Vision-Language

NVIDIA has introduced LocateAnything-3B, a vision-language model designed to identify and outline objects in an image from a text prompt. The 3-billion-parameter model specializes in visual grounding, the task of connecting natural language descriptions to specific pixel regions in a picture.

Built on NVIDIA's Eagle architecture, LocateAnything-3B pairs a vision encoder with a language model. This dual structure lets it interpret complex visual scenes alongside user queries and answer the question, "Where is the object I'm describing?" with a high degree of accuracy.
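To make the idea of a grounding answer concrete, here is a minimal Python sketch of how a "where is it?" result could be represented and mapped onto an image. The article does not describe the model's actual output format, so everything here is an assumption for illustration: the `GroundedRegion` type, the `to_pixel_box` helper, and the normalized `(x_min, y_min, x_max, y_max)` box convention are all hypothetical and are not taken from the LocateAnything-3B model card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedRegion:
    """Hypothetical container for one grounding result.

    This is NOT the model's real output type; it only illustrates the idea
    of tying a text phrase to a region of an image.
    """
    phrase: str
    # Assumed convention: (x_min, y_min, x_max, y_max), each in [0, 1].
    box_norm: tuple[float, float, float, float]


def _clamp01(value: float) -> float:
    """Clamp a coordinate into the [0, 1] range."""
    return max(0.0, min(1.0, value))


def to_pixel_box(region: GroundedRegion, width: int, height: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to integer pixel coordinates for a given image size."""
    x0, y0, x1, y1 = (_clamp01(v) for v in region.box_norm)
    return (
        int(round(x0 * width)),
        int(round(y0 * height)),
        int(round(x1 * width)),
        int(round(y1 * height)),
    )


if __name__ == "__main__":
    # Made-up example values, not real model output.
    region = GroundedRegion("red sports car", (0.25, 0.5, 0.75, 1.0))
    print(region.phrase, to_pixel_box(region, width=640, height=480))
```

With the made-up values above, a 640×480 image yields the pixel box (160, 240, 480, 480). Consult the model card for the format the model really emits before relying on any such convention.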

Potential Applications

Precise object localization supports a range of applications, including:

  • Interactive Photo Editing: Allowing users to select complex objects with simple text commands like "select the red sports car."
  • Robotics and Automation: Providing visual intelligence for robots to identify and interact with specific items in their environment.
  • Accessibility Tools: Enhancing systems that describe image content for visually impaired users by specifying where objects are located.

The model and its weights are now available on the Hugging Face Hub. According to its model card, LocateAnything-3B is released for non-commercial research purposes only, a restriction developers should weigh before building on it.

Sources

  • nvidia/LocateAnything-3B (Hugging Face)

Specs

  • Parameters: 3B
  • Context window: not listed
  • License: Other
  • Downloads: 98.7K
  • Modalities: Vision-Language
