FlashLabs Releases Chroma-4B, an Any-to-Any Model
The new 4-billion-parameter model handles text, image, and speech inputs and outputs, including direct speech-to-speech translation.
AI research group FlashLabs has released Chroma-4B, a new multimodal model designed for “any-to-any” use: it accepts and produces text, images, and speech. The 4-billion-parameter model is available under an Apache 2.0 license, which permits both research and commercial use.
Unlike many multimodal models, which are limited to text and image processing, Chroma-4B can both understand and generate text, images, and audio. This opens up use cases that previous open-source models have struggled to support.
A More Flexible Multimodal Architecture
The model's key feature is its ability to handle complex input and output combinations. According to the release documentation, Chroma-4B supports tasks such as:
- Direct speech-to-speech translation
- Generating an audio description from an image
- Answering text-based questions about an audio clip
This versatility stems from a unified architecture that processes all modalities within a single framework, rather than relying on separate, specialized components.
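To make the “any-to-any” idea concrete, the short sketch below enumerates the input/output combinations implied by three modalities and maps the three tasks listed above onto them. It is purely illustrative and does not use or describe Chroma-4B's actual API: the names `MODALITIES`, `EXAMPLE_TASKS`, `single_input_routes`, and `is_text_vision_only` are hypothetical, and treating speech and audio as a single modality is a simplifying assumption.

```python
from itertools import product

# Modalities named in the article. Treating "speech" and "audio" as one
# modality is a simplifying assumption made for this sketch.
MODALITIES = ("text", "image", "audio")

# Hypothetical mapping of the three listed tasks to (inputs, output).
EXAMPLE_TASKS = {
    "speech-to-speech translation": (("audio",), "audio"),
    "audio description of an image": (("image",), "audio"),
    "text question about an audio clip": (("audio", "text"), "text"),
}


def single_input_routes(modalities=MODALITIES):
    """Return every (input, output) pair for single-modality inputs."""
    return list(product(modalities, repeat=2))


def is_text_vision_only(inputs, output):
    """True if a task stays entirely within the text and image modalities."""
    return set(inputs) | {output} <= {"text", "image"}


if __name__ == "__main__":
    print(len(single_input_routes()), "single-input routes")
    for name, (inputs, output) in EXAMPLE_TASKS.items():
        scope = "text-vision" if is_text_vision_only(inputs, output) else "beyond text-vision"
        print(f"{name}: {' + '.join(inputs)} -> {output} ({scope})")
```

Under these assumptions, three modalities give nine single-input routes, and none of the three listed tasks fits inside a text-and-image-only model.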
At 4 billion parameters, Chroma-4B is smaller than many flagship models, but its release is a step forward for open, natively multimodal AI. By moving beyond the common text-vision paradigm, it provides a foundation for developing more integrated and intuitive applications. The model and its weights are available on Hugging Face.
Sources
- FlashLabs/Chroma-4B (Hugging Face)