Nanbeige Releases 3B Chinese-Enhanced Language Model

The new Llama-based model was trained from scratch on 3.5 trillion tokens of Chinese and English data to enhance its bilingual capabilities.

Feb 10, 2026

Chinese AI firm Nanbeige Technology has released Nanbeige4.1-3B, a new 3-billion-parameter language model designed for strong performance in both Chinese and English. The model is based on the popular Llama architecture but was trained from scratch on a custom, high-quality dataset of 3.5 trillion tokens.

This release adds a new contender to the growing field of specialized, efficient open models. By focusing on a specific language pair and a compact size, Nanbeige4.1-3B gives developers a capable option that can run on less powerful hardware. According to the company's own benchmarks, it performs competitively against other models in its class on Chinese-centric evaluations such as CMMLU and C-Eval.
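To put the "less powerful hardware" point in rough numbers, the sketch below estimates how much memory the weights alone would occupy at common precisions. It assumes a nominal count of exactly 3 billion parameters (the real figure may differ slightly) and ignores the KV cache, activations, and framework overhead, so actual usage will be higher. The function and constant names are illustrative, not taken from the release.

```python
# Rough weights-only memory estimate for a ~3-billion-parameter model.
# Assumption: exactly 3e9 parameters; the true count may differ slightly.
PARAMS = 3_000_000_000

# Bytes needed to store one parameter in common formats (4-bit is half a byte).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Return weights-only memory in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


def estimate_all(params: int = PARAMS) -> dict[str, float]:
    """Map each storage format to its weights-only footprint in GB."""
    return {name: weight_memory_gb(params, b) for name, b in BYTES_PER_PARAM.items()}


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        print(f"{name:>10}: {gb:.1f} GB")
```

Under these assumptions, half-precision weights come to about 6 GB and 4-bit quantized weights to about 1.5 GB, which is why models of this size are often considered for consumer GPUs and laptops.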

Key Specifications

The model's technical details make it a practical choice for a range of applications:

Architecture: Llama
Parameters: 3 billion
Context Length: 4096 tokens
Training Data: 3.5T tokens (Chinese & English)

According to its Hugging Face release card, Nanbeige4.1-3B is available under a custom license that permits free commercial use, an important consideration for teams looking to build products with the model.

Sources

Nanbeige/Nanbeige4.1-3B, Hugging Face
