The Open Weights

The daily record of open-source AI. New model releases, leaderboards, and what's coming next — written for people who ship.

Refreshed every 12 hours


© 2026 The Open Weights. An independent publication.

Aggregated by Claude · written with Gemini · curated by humans.


Tencent Releases 2B Vision Model for Robotics

The new HY-Embodied 0.5 is a vision-language model designed specifically for multi-object tracking in dynamic, real-world environments.

Apr 2, 2026
Tencent · Vision-Language
HY-Embodied 0.5

Tencent's Hunyuan team has released HY-Embodied 0.5, a new 2-billion-parameter vision-language model aimed at the growing field of embodied AI.

Unlike many general-purpose VLMs that focus on static image captioning, HY-Embodied is built on an end-to-end Multi-object Tracking (MoT) architecture. This lets the model perceive and follow multiple distinct objects through a video sequence, a critical capability for robots and other autonomous agents that must understand dynamic scenes.
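Multi-object tracking, the task described above, means keeping a stable identity attached to each object as it moves from frame to frame. The article does not say how HY-Embodied does this internally, so the following is a deliberately simple, non-neural stand-in: classic tracking-by-detection with greedy intersection-over-union (IoU) matching. The class `IoUTracker`, its `threshold` default, and the `(x1, y1, x2, y2)` box format are illustrative assumptions, not part of the model or its API.

```python
from dataclasses import dataclass, field

# Illustrative box format (an assumption): (x1, y1, x2, y2) in pixels.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class IoUTracker:
    """Hypothetical greedy IoU tracker, not HY-Embodied's actual method."""
    threshold: float = 0.3  # minimum IoU to count as the same object
    next_id: int = 0
    tracks: dict[int, Box] = field(default_factory=dict)

    def update(self, detections: list[Box]) -> dict[int, Box]:
        # Score every (existing track, new detection) pair, best match first.
        pairs = sorted(
            ((iou(box, det), tid, di)
             for tid, box in self.tracks.items()
             for di, det in enumerate(detections)),
            reverse=True,
        )
        used_tracks: set[int] = set()
        used_dets: set[int] = set()
        new_tracks: dict[int, Box] = {}
        for score, tid, di in pairs:
            if score < self.threshold:
                break  # remaining pairs overlap too little to match
            if tid in used_tracks or di in used_dets:
                continue  # each track and detection is matched at most once
            new_tracks[tid] = detections[di]
            used_tracks.add(tid)
            used_dets.add(di)
        # Unmatched detections start new identities.
        for di, det in enumerate(detections):
            if di not in used_dets:
                new_tracks[self.next_id] = det
                self.next_id += 1
        # Unmatched tracks are dropped (no re-identification in this sketch).
        self.tracks = new_tracks
        return new_tracks


tracker = IoUTracker()
frame1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
# Same two objects, slightly shifted and listed in a different order:
frame2 = tracker.update([(52, 51, 62, 61), (1, 0, 11, 10)])
# IDs 0 and 1 stay attached to the same physical objects across frames.
```

Production trackers add motion models and appearance features so identities survive occlusion; this one simply drops any track that goes unmatched for a single frame.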

A Foundation for Physical Agents

The model's specialized design bridges the gap between passive visual understanding and the active interaction required in robotics. By providing a unified system for tracking objects over time, HY-Embodied could enable more sophisticated behaviors in applications like:

  • Robotic navigation and manipulation
  • Autonomous vehicle systems
  • Advanced video analysis

The release reflects a broader shift toward foundation models built for specific, complex domains rather than general-purpose text and image generation. HY-Embodied 0.5 is available on Hugging Face under a custom license agreement.

Sources

  • tencent/HY-Embodied-0.5 (Hugging Face)


Specs

  • Parameters: 2B
  • Context window: not listed
  • License: other (custom license)
  • Downloads: 338
  • Modalities: Vision-Language
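The spec sheet lists 2B parameters but no storage precision. As a rough sizing aid for anyone planning hardware, this sketch multiplies the nominal parameter count by standard bytes-per-parameter figures. It covers weights only; activations, cached attention state for long video inputs, and runtime overhead come on top. The dtype list and the helper name `weight_memory_gib` are illustrative assumptions, and the exact parameter count may differ from the rounded "2B".

```python
# Standard storage sizes per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NUM_PARAMS = 2e9  # nominal "2B" from the spec sheet; the true count may differ

for dtype in ("fp32", "bf16", "int8", "int4"):
    print(f"{dtype:>5}: {weight_memory_gib(NUM_PARAMS, dtype):.2f} GiB")
# fp32: 7.45 GiB, bf16: 3.73 GiB, int8: 1.86 GiB, int4: 0.93 GiB
```

At bf16, the weights of a 2B model fit comfortably on a single consumer GPU, which matters for on-robot deployment.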

More in Vision-Language

Moonshot AI · Code
Kimi-K2.7-Code

Moonshot AI Releases Kimi, a Multimodal Coding Model

The new Mixture-of-Experts model from the Chinese AI company can generate code while also understanding visual inputs, a rare combination in open models.

Jun 11, 2026
Google DeepMind · Text / LLM
DiffusionGemma 26B-A4B Instruct

Google Releases Open-Source DiffusionGemma 26B Model

The new 26B-parameter model from DeepMind uses a diffusion-based architecture, a technique more common in image generation, to produce text.

Jun 9, 2026
MiniMax · Vision-Language
MiniMax-M3

MiniMax Releases M3, a Multimodal MoE Model

The new open-weight model from MiniMax AI combines vision, coding, and reasoning using a Mixture-of-Experts architecture.

Jun 2, 2026