Tencent Releases 2B Vision Model for Robotics

The new HY-Embodied 0.5 is a vision-language model designed specifically for multi-object tracking in dynamic, real-world environments.

Apr 2, 2026

Tencent's Hunyuan team has released HY-Embodied 0.5, a new 2-billion-parameter vision-language model aimed at the growing field of embodied AI.

Unlike many general-purpose VLMs that focus on static image captioning, HY-Embodied is built on an end-to-end Multi-object Tracking (MoT) architecture. This allows the model to perceive and follow multiple distinct objects through video sequences—a critical capability for robots and other autonomous agents that need to understand dynamic scenes.
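To make "following multiple distinct objects" concrete, here is a minimal sketch of the classic version of the problem: keeping a stable ID attached to each object from frame to frame. This is not HY-Embodied's method, which the release describes as end-to-end. It is a toy greedy matcher based on box overlap, and every name in it (`GreedyIoUTracker`, `iou_threshold`, the box format) is a hypothetical chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Toy tracker that keeps one integer ID per object across frames."""

    def __init__(self, iou_threshold: float = 0.3) -> None:
        self.iou_threshold = iou_threshold
        self.tracks: Dict[int, Box] = {}  # track ID -> last known box
        self._next_id = 0

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        """Match this frame's boxes to existing tracks; return ID -> box."""
        # Score every (track, detection) pair, best overlap first.
        pairs = sorted(
            (
                (iou(box, det), tid, j)
                for tid, box in self.tracks.items()
                for j, det in enumerate(detections)
            ),
            reverse=True,
        )
        assigned: Dict[int, Box] = {}
        used = set()
        for score, tid, j in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little to match
            if tid in assigned or j in used:
                continue  # each track and each detection matches at most once
            assigned[tid] = detections[j]
            used.add(j)
        # Detections that matched no track start new tracks.
        for j, det in enumerate(detections):
            if j not in used:
                assigned[self._next_id] = det
                self._next_id += 1
        # Tracks with no match this frame are dropped (no re-identification).
        self.tracks = assigned
        return assigned


tracker = GreedyIoUTracker()
tracker.update([(0.0, 0.0, 10.0, 10.0), (50.0, 50.0, 60.0, 60.0)])
# Both objects shift slightly and arrive in the opposite order.
second = tracker.update([(52.0, 50.0, 62.0, 60.0), (1.0, 0.0, 11.0, 10.0)])
print(second)
```

In the second frame each box keeps the ID it was given in the first, even though the detections arrive in a different order. The hard cases for a robot are the ones this sketch ignores: occlusion, objects that leave and re-enter the scene, and look-alike objects that cross paths.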

A Foundation for Physical Agents

The model's specialized design bridges the gap between passive visual understanding and the active interaction required in robotics. By providing a unified system for tracking objects over time, HY-Embodied could enable more sophisticated behaviors in applications like:

Robotic navigation and manipulation
Autonomous vehicle systems
Advanced video analysis

The release signals a move toward foundation models built for specific, complex domains rather than general text and image generation. HY-Embodied 0.5 is available on Hugging Face under a custom license agreement.

Sources

tencent/HY-Embodied-0.5, Hugging Face


